# Regression Modelling of Superconducting Critical Temperature from Elemental Composition

In [1]:
# Base imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats

# Pipeline and Modelling
from sklearn.preprocessing import StandardScaler, RobustScaler, MaxAbsScaler, SplineTransformer
from sklearn.linear_model import LinearRegression, Ridge, BayesianRidge, RidgeCV
from sklearn.svm import LinearSVR
from sklearn.ensemble import GradientBoostingRegressor, HistGradientBoostingRegressor
from sklearn.metrics import accuracy_score, r2_score, roc_auc_score, roc_curve
from xgboost import XGBRegressor

# Plot styles
plt.style.use('seaborn-v0_8-darkgrid')
plt.rcParams['axes.titleweight'] = 'bold'
plt.rcParams['axes.titlesize'] = 13
plt.rcParams['font.size'] = 11
from matplotlib.ticker import FuncFormatter

In [2]:
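# Set up the same dataframes as in the EDA file
df = pd.read_csv('data/train.csv')
chemcomp = pd.read_csv('data/unique_m.csv')
# Keep only the per-element columns, then scale each row so its entries sum to 1
norm_chemcomp = chemcomp.drop(columns=['critical_temp', 'material'])
norm_chemcomp = norm_chemcomp.div(norm_chemcomp.sum(axis=1), axis=0)
# Re-attach the target so composition fractions and Tc live in one frame
norm_chemcomp['critical_temp'] = chemcomp['critical_temp']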
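
The row normalisation above can be checked on a tiny example. The sketch below uses a hypothetical two-row stand-in for `unique_m.csv`; the element columns, material labels, and Tc values are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for unique_m.csv (all values are made up for illustration).
toy = pd.DataFrame({
    "Ba": [2.0, 0.0],
    "Cu": [3.0, 1.0],
    "O": [7.0, 1.0],
    "critical_temp": [90.0, 5.0],
    "material": ["toy_1", "toy_2"],
})

# Same steps as the notebook cell: drop non-element columns, divide each row by its sum.
counts = toy.drop(columns=["critical_temp", "material"])
fractions = counts.div(counts.sum(axis=1), axis=0)
fractions["critical_temp"] = toy["critical_temp"]

print(fractions)
```

Each element row now holds fractions that sum to 1, so materials with different formula sizes become comparable. One caveat worth checking on the real data: a row whose element amounts sum to zero would produce NaN values after the division.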

## Context from Exploratory Data Analysis
In our exploratory analysis, we noted high correlation between our target, the critical temperature (Tc) for superconductivity, and features derived from elemental properties, both weighted by composition proportion and unweighted. However, we also noted high correlation between overall sample composition and Tc, possibly driven largely by the prevalence of cuprous boron oxide (*CBOx) superconductors in high-temperature superconductivity research. Our task is therefore twofold. First, the derived features should produce a better model than one built from composition alone. Second, Tc should be well predicted across the full range of available temperatures rather than overfit to the high-Tc *CBOx materials. In other words, the features must be meaningful apart from whether they simply typify a cuprous oxide.
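
One possible way to make both checks concrete is sketched below: fit the same simple model on two feature sets using an identical train/test split, then report the overall R² alongside the mean absolute error within separate Tc bands. The helper names (`score_by_tc_band`, `compare_feature_sets`), the band edges, and the choice of `Ridge` are assumptions made for illustration, not the method used later in this notebook, and the data here is synthetic so that the block runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def score_by_tc_band(y_true, y_pred, edges=(0, 20, 60, np.inf)):
    """Overall R^2 plus mean absolute error inside each Tc band (edges are illustrative)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    scores = {"overall_r2": r2_score(y_true, y_pred)}
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (y_true >= lo) & (y_true < hi)
        if mask.any():  # skip bands with no test samples
            scores[f"mae_{lo}-{hi}K"] = mean_absolute_error(y_true[mask], y_pred[mask])
    return pd.Series(scores)


def compare_feature_sets(X_comp, X_derived, y, seed=0):
    """Fit the same pipeline on both feature sets, sharing one train/test split."""
    y = np.asarray(y, dtype=float)
    idx_train, idx_test = train_test_split(
        np.arange(len(y)), test_size=0.25, random_state=seed
    )
    results = {}
    for name, X in {"composition_only": X_comp, "derived_features": X_derived}.items():
        X = np.asarray(X, dtype=float)
        model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
        model.fit(X[idx_train], y[idx_train])
        results[name] = score_by_tc_band(y[idx_test], model.predict(X[idx_test]))
    return pd.DataFrame(results)


# Synthetic stand-in data (NOT the superconductivity dataset).
rng = np.random.default_rng(0)
n = 400
X_comp = rng.random((n, 5))
X_derived = rng.normal(size=(n, 4))
y = np.clip(
    40 + 30 * X_derived[:, 0] + 10 * X_comp[:, 0] + rng.normal(scale=5, size=n),
    0,
    None,
)

comparison = compare_feature_sets(X_comp, X_derived, y)
print(comparison)
```

With the real data, `X_comp` would come from `norm_chemcomp` and `X_derived` from `df`, each with the target column removed; this assumes the two files share row order and that `df` also carries a `critical_temp` column, which should be verified first. Band-wise error is reported instead of band-wise R² because R² within a narrow target range can be very unstable.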