<a href="https://colab.research.google.com/github/Szyseba/data-sience-bootcamp/blob/main/06_uczenie_maszynowe/03_metryki_regresja.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

* @author: krakowiakpawel9@gmail.com  
* @site: e-smartdata.org

### scikit-learn
>Library website: [https://scikit-learn.org](https://scikit-learn.org)  
>
>Documentation/User Guide: [https://scikit-learn.org/stable/user_guide.html](https://scikit-learn.org/stable/user_guide.html)
>
>The core machine learning library for Python.
>
>To install scikit-learn, run the command below:
```
pip install scikit-learn
```

### Metrics - Regression problem:
1. [Importing libraries](#a0)
2. [Graphical interpretation](#a2)
3. [Mean Absolute Error - MAE](#a3)
4. [Mean Squared Error - MSE](#a4)
5. [Root Mean Squared Error - RMSE](#a5)
6. [Max Error](#a6)
7. [R2 score - coefficient of determination](#a7)

    

### <a name='a0'></a>  Importing libraries

In [1]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

In [7]:
y_true = 100 + 20 * np.random.randn(50)
y_true

array([111.12961298,  69.79914572, 102.68747221,  82.41477095,
       123.0860062 ,  62.94029625,  71.2401004 , 112.94039163,
        87.19695383,  96.42692109,  92.11596821,  79.93513946,
       116.0994151 ,  78.39596168, 121.15229949, 129.60260408,
       112.32600913, 137.38347681,  87.17054002, 104.01912291,
        78.21611198,  70.4803073 , 138.19467722, 100.54580877,
        66.50131968, 103.05716572,  77.73713687,  90.42670217,
       105.04419195,  94.2711426 ,  82.28891222, 123.48830285,
       113.28691269, 131.93738959, 125.79114604,  64.86814979,
        87.61611766, 105.49388961,  88.70267911, 102.7388975 ,
        98.68471698,  82.31115758, 114.27475903, 105.35591929,
       131.70097094,  90.91476672, 100.71804991,  93.08552917,
        99.721672  , 109.33784856])

In [8]:
y_pred = y_true + 10 * np.random.randn(50)
y_pred

array([103.92540265,  62.33385598,  95.29573343,  73.80225517,
       112.51752281,  55.52631369,  74.46660501, 111.39257226,
        94.31533481,  93.3462924 ,  84.33984722,  76.580071  ,
       118.8421037 ,  70.65423855, 122.22792742, 133.90373305,
       104.86480174, 132.37119734,  74.15656292,  82.88699362,
        76.96191661,  79.85908248, 132.82447834,  98.25295196,
        69.68863827,  99.39097214,  75.60289161,  78.86133078,
       107.46784441,  97.05831974,  94.49700737, 130.53860296,
       120.81591178, 138.92390902, 124.60865243,  72.13366948,
       111.30453024, 120.62831268,  89.40144348,  98.76585279,
        99.4489592 ,  87.89547949, 123.9566622 ,  87.85224952,
       135.02847023,  89.98552509, 114.37967351,  97.81252613,
        80.17172138, 104.5436118 ])

In [9]:
results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results.head()

| | y_true | y_pred |
|---|---|---|
| 0 | 111.129613 | 103.925403 |
| 1 | 69.799146 | 62.333856 |
| 2 | 102.687472 | 95.295733 |
| 3 | 82.414771 | 73.802255 |
| 4 | 123.086006 | 112.517523 |


In [10]:
results['error'] = results['y_true'] - results['y_pred']
results.head()

| | y_true | y_pred | error |
|---|---|---|---|
| 0 | 111.129613 | 103.925403 | 7.20421 |
| 1 | 69.799146 | 62.333856 | 7.46529 |
| 2 | 102.687472 | 95.295733 | 7.391739 |
| 3 | 82.414771 | 73.802255 | 8.612516 |
| 4 | 123.086006 | 112.517523 | 10.568483 |
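
The arrays above come from an unseeded random generator, so each run of the notebook produces different values and therefore different metric results. One way to make the data repeatable is sketched below. It assumes NumPy's `default_rng` generator API; the seed value `42` and the `*_seeded` names are illustrative and not part of the original notebook.

```python
import numpy as np

# Assumption: a fixed seed (42 is arbitrary) makes the synthetic data repeatable.
rng = np.random.default_rng(42)

# Same construction as above: targets around 100, predictions = targets + noise.
y_true_seeded = 100 + 20 * rng.standard_normal(50)
y_pred_seeded = y_true_seeded + 10 * rng.standard_normal(50)

# Re-creating the generator with the same seed reproduces the same draws.
rng_again = np.random.default_rng(42)
y_true_again = 100 + 20 * rng_again.standard_normal(50)

print(np.array_equal(y_true_seeded, y_true_again))
```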



### <a name='a2'></a> Graphical interpretation

In [11]:
def plot_regression_results(y_true, y_pred):
    results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
    # Shared axis range; named so as not to shadow the built-ins min() and max()
    lower = results[['y_true', 'y_pred']].min().min()
    upper = results[['y_true', 'y_pred']].max().max()

    # Scatter of predictions plus the diagonal y_pred = y_true (perfect predictions)
    fig = go.Figure(data=[go.Scatter(x=results['y_true'], y=results['y_pred'], mode='markers'),
                    go.Scatter(x=[lower, upper], y=[lower, upper])],
                    layout=go.Layout(showlegend=False, width=800, height=500,
                                     xaxis_title='y_true',
                                     yaxis_title='y_pred',
                                     title='Regression results'))
    fig.show()

plot_regression_results(y_true, y_pred)

In [23]:
y_true = 100 + 20 * np.random.randn(1000)
y_pred = y_true + 10 * np.random.randn(1000)
results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results['error'] = results['y_true'] - results['y_pred']

px.histogram(results, x='error', nbins=50, width=800)
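
By construction, `error` equals minus ten times standard normal noise, so the histogram should look roughly bell-shaped and centred near zero with a spread of about 10. A quick numeric check of that expectation is sketched below; the seed and the `*_demo` names are assumptions introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: arbitrary seed for repeatability
y_true_demo = 100 + 20 * rng.standard_normal(1000)
y_pred_demo = y_true_demo + 10 * rng.standard_normal(1000)
error_demo = y_true_demo - y_pred_demo

# The error is -10 * (standard normal noise), so its mean should be
# close to 0 and its standard deviation close to 10.
print(f'mean: {error_demo.mean():.2f}, std: {error_demo.std():.2f}')
```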

### <a name='a3'></a> Mean Absolute Error - MAE
### $$MAE = \frac{1}{n}\sum_{i=1}^{n}|y_{true} - y_{pred}|$$

In [24]:
def mean_absolute_error(y_true, y_pred):
    return abs(y_true - y_pred).sum() / len(y_true)

mean_absolute_error(y_true, y_pred)

np.float64(7.642831919882392)

In [None]:
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_true, y_pred)

7.642831919882392
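
To see the formula at work on numbers that can be checked by hand, here is a small sketch with four made-up observations (the `*_small` arrays are illustrative, not data from the notebook).

```python
import numpy as np

y_true_small = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_small = np.array([2.5, 0.0, 2.0, 8.0])

abs_errors = np.abs(y_true_small - y_pred_small)  # [0.5, 0.5, 0.0, 1.0]
mae_small = abs_errors.mean()                      # (0.5 + 0.5 + 0.0 + 1.0) / 4
print(mae_small)
```

MAE is expressed in the same units as the target, so the result reads as "predictions are off by 0.5 on average".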

### <a name='a4'></a> Mean Squared Error - MSE
### $$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_{true} - y_{pred})^{2}$$

In [26]:
def mean_squared_error(y_true, y_pred):
    return ((y_true - y_pred) ** 2).sum() / len(y_true)

mean_squared_error(y_true, y_pred)

np.float64(90.55201991682415)

In [25]:
from sklearn.metrics import mean_squared_error

mean_squared_error(y_true, y_pred)

90.55201991682415
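
Because errors are squared before averaging, MSE penalises a few large errors much more heavily than many small ones. The sketch below compares two made-up error vectors (hypothetical values chosen for illustration) that share the same MAE but differ in MSE.

```python
import numpy as np

# Two hypothetical sets of residuals with the same total absolute error.
errors_even = np.array([2.0, 2.0, 2.0, 2.0])     # error spread evenly
errors_outlier = np.array([0.0, 0.0, 0.0, 8.0])  # one large miss

mae_even = np.abs(errors_even).mean()
mae_outlier = np.abs(errors_outlier).mean()
mse_even = (errors_even ** 2).mean()
mse_outlier = (errors_outlier ** 2).mean()

print(f'even:    MAE={mae_even}, MSE={mse_even}')
print(f'outlier: MAE={mae_outlier}, MSE={mse_outlier}')
```

Both vectors have an MAE of 2, yet the MSE of the vector with the single large miss is four times higher.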

### <a name='a5'></a> Root Mean Squared Error - RMSE
### $$RMSE = \sqrt{MSE}$$

In [33]:
def root_mean_squared_error(y_true, y_pred):
    return np.sqrt(((y_true - y_pred) ** 2).sum() / len(y_true))

root_mean_squared_error(y_true, y_pred)

np.float64(9.51588250856557)

In [34]:
np.sqrt(mean_squared_error(y_true, y_pred))

np.float64(9.51588250856557)
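
Taking the square root brings the metric back to the units of the target, which makes RMSE directly comparable with MAE. On the same data RMSE is never smaller than MAE. A minimal sketch on four made-up observations (the `*_small` names are illustrative):

```python
import numpy as np

y_true_small = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_small = np.array([2.5, 0.0, 2.0, 8.0])
errors_small = y_true_small - y_pred_small  # [0.5, -0.5, 0.0, -1.0]

mae_small = np.abs(errors_small).mean()     # 0.5
mse_small = (errors_small ** 2).mean()      # (0.25 + 0.25 + 0.0 + 1.0) / 4 = 0.375
rmse_small = np.sqrt(mse_small)             # square root of MSE

print(f'MAE={mae_small}, MSE={mse_small}, RMSE={rmse_small:.4f}')
```

Recent scikit-learn releases also provide `sklearn.metrics.root_mean_squared_error`; whether it is available depends on the installed version.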

### <a name='a6'></a>  Max Error

$$ME = \max(|y_{true} - y_{pred}|)$$

In [35]:
def max_error(y_true, y_pred):
    return abs(y_true - y_pred).max()

In [36]:
max_error(y_true, y_pred)

np.float64(35.597267246934905)

In [37]:
from sklearn.metrics import max_error

max_error(y_true, y_pred)

np.float64(35.597267246934905)
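
Max error reports only the single worst prediction, so it is often useful to also know which observation produced it. A small sketch on made-up data (the `*_small` arrays and the `worst_*` names are illustrative):

```python
import numpy as np

y_true_small = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_small = np.array([2.5, 0.0, 2.0, 8.0])

abs_errors = np.abs(y_true_small - y_pred_small)  # [0.5, 0.5, 0.0, 1.0]
worst_error = abs_errors.max()                     # largest absolute residual
worst_index = int(abs_errors.argmax())             # position of that observation

print(worst_error, worst_index)
```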

### <a name='a7'></a>  R2 score - coefficient of determination
### $$R^{2} = 1 - \frac{\sum_{i=1}^{n}(y_{true} - y_{pred})^{2}}{\sum_{i=1}^{n}(y_{true} - \overline{y_{true}})^{2}}$$

In [46]:
from sklearn.metrics import r2_score

r2_score(y_true, y_pred)

0.7743080916000764

In [43]:
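def r2_score(y_true, y_pred):
    numerator = ((y_true - y_pred) ** 2).sum()
    denominator = ((y_true - y_true.mean()) ** 2).sum()
    # NumPy floating-point division by zero yields inf or nan with a
    # RuntimeWarning instead of raising ZeroDivisionError, so check explicitly.
    if denominator == 0:
        print('Division by zero')
        return np.nan
    return 1 - numerator / denominator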

In [45]:
r2_score(y_true, y_pred)

np.float64(0.7743080916000764)
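
R2 compares the model with a baseline that always predicts the mean of `y_true`: a perfect model scores 1, the mean baseline scores 0, and a model worse than the baseline scores below 0. The sketch below checks these three cases on made-up data; `r2_manual` is a hypothetical helper name that mirrors the formula above.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    # Hypothetical helper: 1 - (residual sum of squares / total sum of squares)
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot

y_true_small = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

r2_perfect = r2_manual(y_true_small, y_true_small)                   # no residuals
r2_mean = r2_manual(y_true_small, np.full_like(y_true_small, 3.0))   # always predict the mean
r2_reversed = r2_manual(y_true_small, y_true_small[::-1])            # worse than the mean

print(r2_perfect, r2_mean, r2_reversed)
```

The reversed predictions have a residual sum of squares of 40 against a total sum of squares of 10, which gives 1 - 40/10 = -3.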