<a href="https://colab.research.google.com/github/Lukas-Swc/data-science-bootcamp/blob/main/06_uczenie_maszynowe/03_metryki_regresja.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### scikit-learn
>Library website: [https://scikit-learn.org](https://scikit-learn.org)  
>
>Documentation/User Guide: [https://scikit-learn.org/stable/user_guide.html](https://scikit-learn.org/stable/user_guide.html)
>
>The core machine learning library for Python.
>
>To install scikit-learn, run the command below:
```
pip install scikit-learn
```

### Metrics - Regression problem:
1. [Importing libraries](#a0)
2. [Graphical interpretation](#a2)
3. [Mean Absolute Error - MAE](#a3)
4. [Mean Squared Error - MSE](#a4)
5. [Root Mean Squared Error - RMSE](#a5)
6. [Max Error](#a6)
7. [R2 score - coefficient of determination](#a7)

    

### <a name='a0'></a>  Importing libraries

In [1]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

In [2]:
y_true = 100 + 20 * np.random.randn(50)
y_true

array([ 76.67303523, 119.52101468,  91.74643631, 102.14712014,
       123.03591897,  76.32890593,  93.332969  ,  74.8341302 ,
        90.68152313, 118.17238449,  92.95094276,  88.98245266,
       115.23274636, 123.07212925,  87.13203094, 120.45847052,
       113.16915763,  71.59134852,  99.44643811,  82.15692275,
        94.93677668,  82.43285013, 112.23665072, 120.97236084,
        91.02341032,  80.56198694, 111.97284256,  85.56098723,
       115.01628464,  79.33907681,  94.73785243, 124.77722821,
        91.77897443,  88.77820111,  76.61700246, 109.76370443,
        91.48040178,  93.59968434,  95.39469487, 101.91240421,
        93.51433997,  75.507825  , 116.0246822 , 105.71827813,
        94.24392927, 113.97568969,  84.49590296,  83.99569067,
        73.98710431, 101.32041672])

In [3]:
y_pred = y_true + 10 * np.random.randn(50)
y_pred

array([ 85.09980526, 118.3866612 ,  61.96828179,  93.52875633,
       137.79654689,  78.75012013,  95.65630248,  77.65890595,
        97.67445691, 103.30694822, 101.04271643, 105.78401942,
       128.13719635, 127.78743402,  88.70428361, 100.44057383,
       101.5923425 ,  63.72809179, 107.76698122,  94.90168893,
       114.52804954,  87.6906581 ,  99.81227004, 105.93106003,
       103.37124207,  77.99145239,  85.91649623,  87.97419785,
       101.60431426,  72.63532322, 111.90517497, 144.2164356 ,
        81.4751791 ,  73.76731138,  73.35481761, 112.68460156,
       104.50100867,  98.40917357,  95.46799055, 105.86171316,
        89.86958345,  83.40226943, 126.81492107, 113.21508004,
       105.8261698 , 101.96434805,  90.1937423 ,  92.96799852,
        58.67324878,  92.35719542])
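
The arrays above change on every run because `np.random.randn` draws from NumPy's global, unseeded generator. A minimal sketch of a reproducible variant, assuming a seeded `np.random.default_rng` is acceptable here (the helper name `make_data` and the seed value 42 are illustrative and not part of the original notebook):

```python
import numpy as np


def make_data(n, seed=42):
    """Build (y_true, y_pred) like the cells above, but from a seeded generator."""
    rng = np.random.default_rng(seed)
    y_true = 100 + 20 * rng.standard_normal(n)      # targets: mean 100, std 20
    y_pred = y_true + 10 * rng.standard_normal(n)   # predictions: targets + noise (std 10)
    return y_true, y_pred


y_true_a, y_pred_a = make_data(50)
y_true_b, y_pred_b = make_data(50)

# Same seed, same draws: both runs produce identical arrays.
print(np.array_equal(y_true_a, y_true_b), np.array_equal(y_pred_a, y_pred_b))
```

With a fixed seed, the metric values computed later stay the same between runs, which makes them easier to compare.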

In [4]:
results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results.head()

|   | y_true | y_pred |
|---|---|---|
| 0 | 76.673035 | 85.099805 |
| 1 | 119.521015 | 118.386661 |
| 2 | 91.746436 | 61.968282 |
| 3 | 102.14712 | 93.528756 |
| 4 | 123.035919 | 137.796547 |


In [7]:
results['error'] = results['y_true'] - results['y_pred']
results.head()

|   | y_true | y_pred | error |
|---|---|---|---|
| 0 | 76.673035 | 85.099805 | -8.42677 |
| 1 | 119.521015 | 118.386661 | 1.134353 |
| 2 | 91.746436 | 61.968282 | 29.778155 |
| 3 | 102.14712 | 93.528756 | 8.618364 |
| 4 | 123.035919 | 137.796547 | -14.760628 |



### <a name='a2'></a> Graphical interpretation

In [10]:
def plot_regression_results(y_true, y_pred):
    """Scatter y_pred against y_true, with the y = x line marking a perfect fit."""
    results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
    # Shared axis range; named lo/hi to avoid shadowing the built-in min()/max()
    lo = results[['y_true', 'y_pred']].min().min()
    hi = results[['y_true', 'y_pred']].max().max()

    fig = go.Figure(data=[go.Scatter(x=results['y_true'], y=results['y_pred'], mode='markers'),
                          go.Scatter(x=[lo, hi], y=[lo, hi])],
                    layout=go.Layout(showlegend=False, width=800, height=500,
                                     xaxis_title='y_true',
                                     yaxis_title='y_pred',
                                     title='Regression results'))
    fig.show()


plot_regression_results(y_true, y_pred)

In [11]:
y_true = 100 + 20 * np.random.randn(1000)
y_pred = y_true + 10 * np.random.randn(1000)
results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results['error'] = results['y_true'] - results['y_pred']

px.histogram(results, x='error', nbins=50, width=800)
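
Because `y_pred` was built as `y_true` plus Gaussian noise with standard deviation 10, the histogram should look roughly bell-shaped, centred near 0 with a spread near 10. A rough numerical check of that reading, sketched with a seeded generator (the seed and the `*_demo` variable names are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
y_true_demo = 100 + 20 * rng.standard_normal(1000)
y_pred_demo = y_true_demo + 10 * rng.standard_normal(1000)
error_demo = y_true_demo - y_pred_demo

# The sample mean should be close to 0 and the sample std close to 10.
print(f"mean error:   {error_demo.mean():.2f}")
print(f"std of error: {error_demo.std():.2f}")
```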

### <a name='a3'></a> Mean Absolute Error - MAE
### $$MAE = \frac{1}{n}\sum_{i=1}^{n}|y_{true,i} - y_{pred,i}|$$

In [12]:
def mean_absolute_error(y_true, y_pred):
    # Mean of the absolute differences (expects NumPy arrays or pandas Series)
    return abs(y_true - y_pred).sum() / len(y_true)

mean_absolute_error(y_true, y_pred)

np.float64(7.743154827272271)

In [13]:
from sklearn.metrics import mean_absolute_error

print(mean_absolute_error(y_true, y_pred))

7.743154827272271
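
To make the formula concrete, here is a small worked example that can be checked by hand. The four values are made up for illustration and are unrelated to the random data above:

```python
import numpy as np

y_true_small = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_small = np.array([2.5, 0.0, 2.0, 8.0])

# Absolute error per sample: 0.5, 0.5, 0.0, 1.0
abs_errors = np.abs(y_true_small - y_pred_small)

# MAE = (0.5 + 0.5 + 0.0 + 1.0) / 4 = 0.5
mae_small = abs_errors.mean()
print(mae_small)
```

MAE is expressed in the same units as the target, so 0.5 here reads as "off by half a unit on average".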


### <a name='a4'></a> Mean Squared Error - MSE
### $$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_{true,i} - y_{pred,i})^{2}$$

In [15]:
def mean_squared_error(y_true, y_pred):
    # Mean of the squared differences; large errors are penalised more heavily
    return ((y_true - y_pred) ** 2).sum() / len(y_true)

mean_squared_error(y_true, y_pred)

np.float64(94.41480131747592)

In [16]:
from sklearn.metrics import mean_squared_error

print(mean_squared_error(y_true, y_pred))

94.41480131747592
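
Squaring makes MSE more sensitive to individual large errors than MAE. A small sketch with two made-up error vectors that share the same MAE but have different MSE (all names and values are illustrative):

```python
import numpy as np

# Two hypothetical sets of errors (y_true - y_pred)
error_sets = {
    "even": np.array([1.0, 1.0, 1.0, 1.0]),   # four small misses
    "spike": np.array([0.0, 0.0, 0.0, 4.0]),  # one large miss
}

summary = {}
for name, err in error_sets.items():
    mae = np.abs(err).mean()
    mse = (err ** 2).mean()
    summary[name] = (mae, mse)
    print(f"{name}: MAE={mae}, MSE={mse}")
```

Both sets have an MAE of 1.0, but the MSE rises from 1.0 to 4.0 when the whole error is concentrated in a single sample.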


### <a name='a5'></a> Root Mean Squared Error - RMSE
### $$RMSE = \sqrt{MSE}$$

In [17]:
def root_mean_squared_error(y_true, y_pred):
    # Square root of the MSE, which brings the error back to the units of y
    return np.sqrt(((y_true - y_pred) ** 2).sum() / len(y_true))

root_mean_squared_error(y_true, y_pred)

np.float64(9.716727912084187)

In [18]:
from sklearn.metrics import root_mean_squared_error

print(root_mean_squared_error(y_true, y_pred))

9.716727912084187
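
`root_mean_squared_error` was added to `sklearn.metrics` in scikit-learn 1.4, so the import above raises `ImportError` on older installations. One possible workaround is sketched below. The alias `rmse` and the NumPy fallback are assumptions for illustration, and the fallback covers only plain 1-D inputs without sample weights:

```python
import numpy as np

try:
    # Available in scikit-learn 1.4 and later
    from sklearn.metrics import root_mean_squared_error as rmse
except ImportError:
    # Hypothetical fallback: plain NumPy, 1-D inputs, no sample weights
    def rmse(y_true, y_pred):
        diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
        return float(np.sqrt(np.mean(diff ** 2)))


# Squared errors: 0.25, 0.25, 0.0, 1.0 -> MSE = 0.375 -> RMSE = sqrt(0.375)
rmse_value = rmse([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(rmse_value)
```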


### <a name='a6'></a>  Max Error

$$ME = \max_{i}|y_{true,i} - y_{pred,i}|$$

In [19]:
def max_error(y_true, y_pred):
    # Largest absolute difference, i.e. the single worst prediction
    return abs(y_true - y_pred).max()

max_error(y_true, y_pred)

np.float64(33.92469577976186)

In [20]:
from sklearn.metrics import max_error

print(max_error(y_true, y_pred))

33.92469577976186
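
The max error says how bad the worst prediction is, but not which sample produced it. A short sketch of locating that sample with `np.argmax`, using made-up values for illustration:

```python
import numpy as np

y_true_small = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_small = np.array([2.5, 0.0, 2.0, 8.0])

abs_errors = np.abs(y_true_small - y_pred_small)

# Index of the largest absolute error and the error itself
worst_idx = int(np.argmax(abs_errors))
worst_error = abs_errors[worst_idx]
print(worst_idx, worst_error)
```

Here the last sample (index 3) is the worst one, with an absolute error of 1.0.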


### <a name='a7'></a>  R2 score - coefficient of determination
### $$R^{2} = 1 - \frac{\sum_{i=1}^{n}(y_{true,i} - y_{pred,i})^{2}}{\sum_{i=1}^{n}(y_{true,i} - \overline{y}_{true})^{2}}$$

In [23]:
def R2_score(y_true, y_pred):
    # 1 - (residual sum of squares) / (total sum of squares around the mean of y_true)
    return 1 - (((y_true - y_pred) ** 2).sum() / ((y_true - np.mean(y_true)) ** 2).sum())

R2_score(y_true, y_pred)

np.float64(0.7399354296114118)

In [24]:
from sklearn.metrics import r2_score

print(r2_score(y_true, y_pred))

0.7399354296114118
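
Three reference points help when reading an R2 value: a perfect prediction gives 1, always predicting the mean of `y_true` gives 0, and predictions worse than that baseline give a negative score. A minimal sketch on made-up data (the helper `r2_manual` is illustrative and mirrors the formula above):

```python
import numpy as np


def r2_manual(y_true, y_pred):
    """R2 from the definition: 1 - SS_res / SS_tot."""
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot


y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # SS_tot = 4 + 1 + 0 + 1 + 4 = 10

r2_perfect = r2_manual(y, y)                        # SS_res = 0  -> 1.0
r2_mean = r2_manual(y, np.full_like(y, y.mean()))   # SS_res = 10 -> 0.0
r2_reversed = r2_manual(y, y[::-1])                 # SS_res = 40 -> -3.0

print(r2_perfect, r2_mean, r2_reversed)
```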
