# 5. Regression models
After the basic data analysis, we can build regression models that characterise how the output depends on the input data.

## 5.1 Starting setup
Let's first return to the main starting point of our concrete data analysis example.

In [2]:
%pip install matplotlib
%pip install seaborn

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

cycle = plt.rcParams['axes.prop_cycle'].by_key()['color']

# %matplotlib ipympl
import pandas as pd
# Forward slashes work on Windows, macOS and Linux alike
df = pd.read_csv('files/concrete_data.csv')
rename = {
    'blast_furnace_slag': 'component1',
    'fly_ash': 'component2',
    'superplasticizer': 'component3',
    'coarse_aggregate': 'component4',
    'fine_aggregate ': 'component5',
    'concrete_compressive_strength': 'strength'
}
df.rename(columns=rename, inplace=True)


## 5.2 Scikit-learn Introduction
With the scikit-learn package, it is possible to fit models (e.g. regression models) and use them for predictive data analysis. More information about the scikit-learn package can be found here: https://scikit-learn.org/stable/.

In the code below, we install the package and import the `LinearRegression` and `PolynomialFeatures` classes, together with the `r2_score` metric and the `curve_fit` function from SciPy.

In [4]:
%pip install scikit-learn



In [5]:
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score

from scipy.optimize import curve_fit

## 5.3 Linear Regression
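With these packages imported, it is possible to make a linear regression model, for example between the `age` and the `strength` of the concrete. scikit-learn expects the inputs as a two-dimensional array (samples × features), which is why the one-dimensional columns are reshaped with `reshape(-1, 1)`: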

In [6]:
x = df.age
y = df.strength

model = LinearRegression()
model.fit(x.to_numpy().reshape(-1, 1), y.to_numpy().reshape(-1, 1));

The fitted model exposes its parameters as attributes: `intercept_` holds the intercept and `coef_` holds the slope. The R² score then measures how well the fitted line describes the data.

In [7]:
model.intercept_

array([31.84659489])

In [8]:
model.coef_

array([[0.08697285]])
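
Together, the two numbers above define the fitted line, strength = intercept + slope × age. As a minimal sketch, the line can be evaluated by hand; the helper name `predict_strength` is hypothetical, and the values are copied from the outputs above:

```python
# Values copied from the model.intercept_ and model.coef_ outputs above
intercept = 31.84659489  # predicted strength at age 0
slope = 0.08697285       # change in predicted strength per unit of age

def predict_strength(age):
    """Evaluate the fitted line: strength = intercept + slope * age."""
    return intercept + slope * age

# Prediction for a sample with age 28
print(predict_strength(28))
```

This should give the same value as `model.predict` for an age of 28, up to rounding of the copied coefficients.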

In [9]:
y_pred = model.predict(x.to_numpy().reshape(-1, 1))
r2 = r2_score(y, y_pred)
print(f'R² Score: {r2}')

R² Score: 0.10815745064203353
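
An R² of about 0.11 means that a straight line in `age` explains only about 11% of the variance in `strength`. To make the definition concrete, the sketch below computes R² = 1 − SS_res / SS_tot by hand and compares it with `r2_score`. The numbers are made up for illustration and are not taken from the concrete dataset:

```python
import numpy as np
from sklearn.metrics import r2_score

# Small made-up example (not the concrete data)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 6.5, 9.5])

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

# Both values should agree
print(r2_manual, r2_score(y_true, y_hat))
```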


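A fit can also be made with `curve_fit` from SciPy, which fits an arbitrary function to the data using non-linear least squares. First, define the candidate functions: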

In [10]:
def linear(x, a, b):
    return a*x + b

def polynomial(x, a, b, c, d):
    return a*x**3 + b*x**2 + c*x + d

In [None]:
x = df.age
y = df.strength

popt, pcov = curve_fit(polynomial, x, y)
popt

In [None]:
y_pred = polynomial(x, *popt)
r2 = r2_score(y, y_pred)
print(f'R² Score: {r2}')
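
Besides the optimal parameters `popt`, `curve_fit` returns their estimated covariance matrix `pcov`. The square roots of its diagonal give one-standard-deviation uncertainties for the parameters. The sketch below shows this on synthetic data; the names `x_demo`, `y_demo`, `popt_demo`, `pcov_demo` and `perr_demo` are introduced here for illustration only:

```python
import numpy as np
from scipy.optimize import curve_fit

def linear(x, a, b):
    return a * x + b

# Synthetic data (illustration only): slope 2, offset 1, plus Gaussian noise
rng = np.random.default_rng(0)
x_demo = np.linspace(0, 10, 50)
y_demo = linear(x_demo, 2.0, 1.0) + rng.normal(scale=0.5, size=x_demo.size)

popt_demo, pcov_demo = curve_fit(linear, x_demo, y_demo)

# One-standard-deviation uncertainty of each fitted parameter
perr_demo = np.sqrt(np.diag(pcov_demo))

for name, value, err in zip(['a', 'b'], popt_demo, perr_demo):
    print(f'{name} = {value:.3f} ± {err:.3f}')
```

The same two lines (`np.diag` and `np.sqrt`) can be applied to the `pcov` obtained from the polynomial fit above.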

In [None]:
x_plot = np.linspace(0, x.max())

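# plot_scatter is not defined in this section, so a plain scatter plot is drawn instead
fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.5, linewidths=0)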
ax.plot(x_plot, polynomial(x_plot, *popt), c=cycle[1])

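The Eurocode equation for the development of strength over time is:
$f_{c} (t) = f_{c,0} \exp{ \left( s \left[ 1 - \left( \dfrac{28}{t} \right)^{1/2} \right] \right)}$

Here $t$ is the age of the concrete in days, $s$ is a coefficient that depends on the cement type, and $f_{c,0}$ is the strength at 28 days (for $t = 28$ the exponent is zero, so $f_c(28) = f_{c,0}$).

Exercise: create a model for this dataset based on this equation.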
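
One possible approach is to pass the equation to `curve_fit` as a Python function with `f_{c,0}` and `s` as free parameters. The sketch below is a starting point rather than a reference solution: it uses synthetic data, and the function name `eurocode_strength`, the variable names and the starting guess `p0` are all assumptions made for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

def eurocode_strength(t, fc0, s):
    """Strength at age t (days): fc0 * exp(s * (1 - sqrt(28 / t)))."""
    return fc0 * np.exp(s * (1 - np.sqrt(28 / t)))

# Synthetic ages and strengths (illustration only, not the concrete dataset)
rng = np.random.default_rng(1)
t_demo = np.array([1, 3, 7, 14, 28, 56, 90, 180, 365], dtype=float)
f_demo = eurocode_strength(t_demo, 40.0, 0.25) + rng.normal(scale=1.0, size=t_demo.size)

# p0 is the starting guess for (fc0, s); these values are assumptions
popt_ec, pcov_ec = curve_fit(eurocode_strength, t_demo, f_demo, p0=[30.0, 0.3])
fc0_fit, s_fit = popt_ec
print(f'fc0 = {fc0_fit:.2f}, s = {s_fit:.3f}')
```

For the notebook's data, `df.age` and `df.strength` would take the place of `t_demo` and `f_demo`. Note that the equation requires `t > 0`.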

***

Multiple linear regression, using all remaining columns as input features:

In [None]:
x = df.drop('strength', axis=1)
y = df.strength

model = LinearRegression()
model.fit(x, y.to_numpy().reshape(-1, 1));

In [None]:
y_pred = model.predict(x)
r2 = r2_score(y, y_pred)
print(f'R² Score: {r2}')

In [None]:
fig, ax = plt.subplots()
ax.scatter(y, y_pred, alpha=0.5, linewidths=0)
ax.plot([0, 90], [0, 90], c='k')

Polynomial regression of degree 2 on the same input features:

In [None]:
degree = 2  # Degree of polynomial
poly_features = PolynomialFeatures(degree=degree)
x_poly = poly_features.fit_transform(x)

model = LinearRegression()
model.fit(x_poly, y);
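
To see what `PolynomialFeatures` does to the inputs, the sketch below expands a single made-up sample with two features. The names `X_demo`, `poly_demo` and `X_demo_poly` are introduced here for illustration only:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with two made-up features, a = 2 and b = 3
X_demo = np.array([[2.0, 3.0]])

poly_demo = PolynomialFeatures(degree=2)
X_demo_poly = poly_demo.fit_transform(X_demo)

# Columns are ordered as: 1, a, b, a^2, a*b, b^2
print(X_demo_poly)
```

For degree 2, `n` input features become `(n + 1)(n + 2) / 2` columns, including the constant column. The subsequent `LinearRegression` is still linear in its coefficients; only the inputs have been expanded.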

In [None]:
y_pred = model.predict(x_poly)
r2 = r2_score(y, y_pred)
print(f'R² Score: {r2}')

In [None]:
fig, ax = plt.subplots()
ax.scatter(y, y_pred, alpha=0.5, linewidths=0)
ax.plot([0, 90], [0, 90], c='k')