# Solving regression tasks with machine learning

## scikit-learn (the `sklearn` package)

[sklearn](https://scikit-learn.org/stable) is an open source machine learning library that supports supervised and unsupervised learning.  
It provides basic implementations for common machine learning tasks:
- data preprocessing
- model selection
- model building
- fitting (training, including automatic parameter search)
- predicting
- validation
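Most of these tasks share one estimator interface: build a model object, call `fit`, then `predict`, and optionally `score` it. Below is a minimal sketch of that pattern on a tiny dataset. The data (`y = 2x + 1`, no noise) is an assumption chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny illustrative dataset (assumed): y = 2x + 1, no noise
X = np.array([[0.0], [1.0], [2.0], [3.0]])  # 2-D: (n_samples, n_features)
y = np.array([1.0, 3.0, 5.0, 7.0])          # 1-D: (n_samples,)

model = LinearRegression()   # model building
model.fit(X, y)              # fitting: learns coef_ and intercept_
y_pred = model.predict(X)    # predicting
r2 = model.score(X, y)       # validation: R^2 on the given data

print(model.coef_, model.intercept_, r2)
```

The same `fit`/`predict` calls are used for the regression models later in this notebook.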

## Prerequisites
Install the required packages:
- numpy
- scikit-learn (imported as `sklearn`)
- matplotlib

Install the packages into the Python environment of the running Jupyter kernel:

In [None]:
import sys
# path of the Python interpreter that runs this notebook's kernel
print(sys.executable)

In [None]:
# use the kernel's own interpreter so the packages land in the right environment
!{sys.executable} -m pip install numpy
!{sys.executable} -m pip install scikit-learn
!{sys.executable} -m pip install matplotlib

Import required modules
- numpy
- from sklearn
    - preprocessing
    - linear_model
- matplotlib.pyplot

In [None]:
import numpy as np
from sklearn import preprocessing
from sklearn import linear_model
import matplotlib.pyplot as plt

# Display matplotlib figures inline, embedded in the notebook
%matplotlib inline

# Set the default figure size (width, height) in inches
plt.rcParams['figure.figsize'] = [15, 5]

## Steps to create a regression example
1. Create data
    1. create clean data
        - linear
        - non-linear
    2. add uniformly distributed noise to the data (repeat as many times as you like)
2. Print and/or plot the created data
3. Fit an approximation
    - linear
    - non-linear

## Create clean linear data
\begin{equation*}
y = a x + b
\end{equation*}

In [None]:
X_src = np.arange(-3, 3, 0.1)   # points in [-3, 3) with step 0.1
X = X_src.reshape(-1, 1)        # sklearn expects 2-D input: (n_samples, n_features)
a = 1
b = 0.12
y = a * X_src + b

## Create clean quadratic data
\begin{equation*}
y = a_1 x^2 + a_0 x + b
\end{equation*}

In [None]:
X_src = np.arange(-3, 3, 0.1)
X = X_src.reshape(-1, 1)
a1 = 1
a0 = 0.12
b = 2
y = a1 * X_src**2 + a0 * X_src + b

## Create clean cubic data
\begin{equation*}
y = a_2 x^3 + a_1 x^2 + a_0 x + b
\end{equation*}

In [None]:
X_src = np.arange(-3, 3, 0.1)
X = X_src.reshape(-1, 1)
a2 = 0.95
a1 = 0.7
a0 = 0.12
b = 2
y = a2 * X_src**3 + a1 * X_src**2 + a0 * X_src + b
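The three data cells above differ only in their coefficients, so they can be generated by a single helper built on `np.polyval`, which takes the coefficients from the highest degree down. The sketch below assumes a helper named `make_polynomial_data`; the name is hypothetical and not part of numpy or sklearn.

```python
import numpy as np

def make_polynomial_data(coeffs, start=-3.0, stop=3.0, step=0.1):
    """Hypothetical helper: return (X, y) with y = polynomial(x).

    coeffs are ordered from the highest degree down to the constant term b.
    """
    x = np.arange(start, stop, step)
    y = np.polyval(coeffs, x)        # c[0]*x**n + ... + c[n]
    return x.reshape(-1, 1), y       # 2-D X for sklearn, 1-D y

# linear:    y = 1*x + 0.12
X_lin, y_lin = make_polynomial_data([1, 0.12])
# quadratic: y = 1*x^2 + 0.12*x + 2
X_quad, y_quad = make_polynomial_data([1, 0.12, 2])
# cubic:     y = 0.95*x^3 + 0.7*x^2 + 0.12*x + 2
X_cub, y_cub = make_polynomial_data([0.95, 0.7, 0.12, 2])
```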

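## Add noise to the data
\begin{equation*}
y \leftarrow y + A \left( u - 0.5 \right), \quad u \sim \mathcal{U}[0, 1)
\end{equation*}
where $A$ is the noise amplitude (`noise_amp`), so each noise value lies in $[-A/2, A/2)$.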

In [None]:
noise_amp = 3   # width of the noise interval
# uniform noise in [-noise_amp/2, noise_amp/2), one value per data point
noise = (np.random.rand(len(y)) - 0.5) * noise_amp
y = y + noise
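`np.random.rand` produces different noise on every run. A minimal sketch of a reproducible alternative uses numpy's `Generator` API with a fixed seed; the seed value 42 is an arbitrary assumption. The distribution matches the cell above.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seeded generator: same noise on every run

x = np.arange(-3, 3, 0.1)
y_clean = x + 0.12                     # clean linear data as in the notebook
noise_amp = 3

# uniform noise in [-noise_amp/2, noise_amp/2), same distribution as
# (np.random.rand(n) - 0.5) * noise_amp
noise = rng.uniform(-0.5, 0.5, size=y_clean.shape) * noise_amp
y_noisy = y_clean + noise
```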

## Print created data

In [None]:
y

## Show created data

In [None]:
plt.scatter(X, y)
plt.show()

### Linear regression to approximate noisy data
Linear regression approximates linear data well.  
For non-linear (quadratic, cubic) data the straight-line approximation is inaccurate.

In [None]:
lin_reg = linear_model.LinearRegression()
X_src = X   # fit directly on the raw 2-D input x
lin_reg.fit(X_src, y)
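`LinearRegression` solves an ordinary least squares problem. As a cross-check, the same slope $a$ and intercept $b$ can be obtained with `np.linalg.lstsq` after appending a column of ones to $X$. This is a sketch on freshly generated noisy linear data; the seed is an assumption.

```python
import numpy as np
from sklearn import linear_model

rng = np.random.default_rng(0)
x = np.arange(-3, 3, 0.1)
y = 1 * x + 0.12 + (rng.random(len(x)) - 0.5) * 3   # noisy linear data
X = x.reshape(-1, 1)

# sklearn fit
lin_reg = linear_model.LinearRegression().fit(X, y)

# same least squares problem solved directly:
# the column of ones lets the solver fit the intercept b as well
A = np.column_stack([X, np.ones(len(x))])
(a_hat, b_hat), *_ = np.linalg.lstsq(A, y, rcond=None)

print(lin_reg.coef_[0], lin_reg.intercept_)
print(a_hat, b_hat)
```

Both lines should print the same values up to floating-point rounding.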

### (Second degree) Polynomial regression to approximate noisy data
Polynomial regression can approximate linear as well as non-linear data: the input $x$ is expanded into the features $x, x^2$, and a linear regression is fitted on them.  
\begin{equation*}
y = a_1 x^2 + a_0 x + b
\end{equation*}

In [None]:
# create a data transformer that generates the quadratic features x, x^2
transformer = preprocessing.PolynomialFeatures(degree=2, include_bias=False)
X_src = transformer.fit_transform(X)

# fit a linear model on the polynomial features
lin_reg = linear_model.LinearRegression()
lin_reg.fit(X_src, y)
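To see what `PolynomialFeatures` feeds into the regression, the sketch below transforms two sample points. With a single input feature, `degree=2` and `include_bias=False`, each row becomes `[x, x^2]`; the sample values are chosen only for illustration.

```python
import numpy as np
from sklearn import preprocessing

X_demo = np.array([[2.0], [3.0]])   # two samples, one feature x

transformer = preprocessing.PolynomialFeatures(degree=2, include_bias=False)
X_poly = transformer.fit_transform(X_demo)

# each row is [x, x^2]; no constant column because include_bias=False,
# LinearRegression fits the intercept b itself
print(X_poly)   # [[2. 4.]
                #  [3. 9.]]
```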

### (Third degree) Polynomial regression to approximate noisy data
Polynomial regression can approximate linear as well as non-linear data; with degree 3 the feature $x^3$ is added, so the cubic data can be followed as well.  
\begin{equation*}
y = a_2 x^3 + a_1 x^2 + a_0 x + b
\end{equation*}

In [None]:
# create a data transformer that generates the cubic features x, x^2, x^3
transformer = preprocessing.PolynomialFeatures(degree=3, include_bias=False)
X_src = transformer.fit_transform(X)

# fit a linear model on the polynomial features
lin_reg = linear_model.LinearRegression()
lin_reg.fit(X_src, y)
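The transform-then-fit steps can also be chained with `sklearn.pipeline.make_pipeline`, so `fit` and `predict` take the raw `X` and the feature expansion cannot be forgotten at prediction time. This is a sketch on clean cubic data, which lets the fitted coefficients be checked against the true ones.

```python
import numpy as np
from sklearn import linear_model, preprocessing
from sklearn.pipeline import make_pipeline

x = np.arange(-3, 3, 0.1)
X = x.reshape(-1, 1)
y = 0.95 * x**3 + 0.7 * x**2 + 0.12 * x + 2   # clean cubic data

# the pipeline applies PolynomialFeatures, then LinearRegression
model = make_pipeline(
    preprocessing.PolynomialFeatures(degree=3, include_bias=False),
    linear_model.LinearRegression(),
)
model.fit(X, y)
y_pred = model.predict(X)   # raw X in, polynomial expansion happens inside

reg = model[-1]             # the fitted LinearRegression step
print(reg.coef_, reg.intercept_)   # weights of x, x^2, x^3 and the intercept b
```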

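### Use the fitted model to predict (approximate) the source data
Run this cell after one of the regression cells: `X_src` must hold the same features the model was fitted on.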

In [None]:
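# predict with the same features the model was fitted on
y_pred = lin_reg.predict(X_src)

# coef_ holds the weights of x, x^2, ... (in that order); intercept_ is b
print("coefficients={0}, intercept={1}".format(lin_reg.coef_, lin_reg.intercept_))
plt.scatter(X, y)
plt.plot(X, y_pred, 'r')
plt.show()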
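The plot shows the fit visually. A numeric comparison of the three models can use `mean_squared_error` and `r2_score` from `sklearn.metrics`. The sketch below fits degrees 1, 2 and 3 to noisy cubic data; the seed is an assumption. The scores are measured on the training data, so they only show how closely each model follows these points. A held-out split (for example with `sklearn.model_selection.train_test_split`) would be needed to judge generalization.

```python
import numpy as np
from sklearn import linear_model, preprocessing
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(1)
x = np.arange(-3, 3, 0.1)
X = x.reshape(-1, 1)
y = 0.95 * x**3 + 0.7 * x**2 + 0.12 * x + 2 + (rng.random(len(x)) - 0.5) * 3

scores = {}
for degree in (1, 2, 3):
    # expand x into x, ..., x^degree, then fit and predict on the same data
    X_feat = preprocessing.PolynomialFeatures(degree=degree, include_bias=False).fit_transform(X)
    y_pred = linear_model.LinearRegression().fit(X_feat, y).predict(X_feat)
    scores[degree] = (mean_squared_error(y, y_pred), r2_score(y, y_pred))
    print(f"degree={degree}: MSE={scores[degree][0]:.3f}, R^2={scores[degree][1]:.3f}")
```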