<a href="https://colab.research.google.com/github/LucjanSakowicz/data-science-bootcamp/blob/main/06_uczenie_maszynowe/03_metryki_regresja.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

* @author: krakowiakpawel9@gmail.com  
* @site: e-smartdata.org

### scikit-learn
>Library homepage: [https://scikit-learn.org](https://scikit-learn.org)  
>
>Documentation/User Guide: [https://scikit-learn.org/stable/user_guide.html](https://scikit-learn.org/stable/user_guide.html)
>
>A core library for machine learning in Python.
>
>To install scikit-learn, run the command below:
```
pip install scikit-learn
```

### Metrics - Regression problem:
1. [Importing libraries](#a0)
2. [Graphical interpretation](#a2)
3. [Mean Absolute Error - MAE](#a3)
4. [Mean Squared Error - MSE](#a4)
5. [Root Mean Squared Error - RMSE](#a5)
6. [Max Error](#a6)
7. [R2 score - coefficient of determination](#a7)

    

### <a name='a0'></a>  Importing libraries

In [1]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

In [2]:
y_true = 100 + 20 * np.random.randn(50)
y_true

array([ 53.93765349,  91.03999552,  55.50852806, 108.41238242,
       107.55410711, 120.21922785,  90.24528196,  95.61767802,
        95.90528617,  78.60539253, 100.82337886,  99.89831874,
       110.04564977, 122.02841735,  92.25437333,  92.4624223 ,
       138.19940789,  73.1804464 ,  74.1361351 , 136.19179327,
       107.38024548,  70.59299105, 125.11754607, 122.59341515,
        81.66917942,  73.17424899,  99.79142691, 120.23410424,
       123.2519013 ,  79.20776696,  98.29685606,  91.66664209,
       144.92943908, 130.14381267, 113.54116248, 104.01280155,
        99.57348931,  77.22223763, 107.79121148,  62.08486727,
        79.05741376, 104.04423341,  94.25962762, 116.29011363,
       107.59643937,  94.303694  , 106.53043305,  73.64460554,
        89.50277974,  85.63758497])

In [3]:
y_pred = y_true + 10 * np.random.randn(50)
y_pred

array([ 55.250412  ,  91.95613368,  62.03834748, 126.46607229,
       101.22624011, 111.37513268,  81.35606785, 107.39879558,
       113.276291  ,  81.64946528,  97.74035322,  84.84399155,
       109.24983645, 136.8179553 ,  82.97270766,  93.64941601,
       137.9346925 ,  83.9338996 ,  70.68729706, 128.68311648,
       100.90997477,  75.8794564 , 127.32173489, 131.69655171,
        86.51196943,  62.62790196,  92.45276085, 118.16944569,
       105.73358578,  76.49259533,  91.14775158, 104.26095287,
       131.0597971 , 120.50191611, 120.60312063, 104.29490004,
        90.52029491,  84.2776815 , 127.68207553,  73.14702118,
        98.57719488, 104.79206324, 102.65408483, 114.34740563,
       114.31789092, 101.31425602, 115.43830531,  69.6311852 ,
        80.01811462,  67.10280368])

In [4]:
results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results.head()

| | y_true | y_pred |
|---|---|---|
| 0 | 53.937653 | 55.250412 |
| 1 | 91.039996 | 91.956134 |
| 2 | 55.508528 | 62.038347 |
| 3 | 108.412382 | 126.466072 |
| 4 | 107.554107 | 101.22624 |


In [5]:
results['error'] = results['y_true'] - results['y_pred']
results.head()

| | y_true | y_pred | error |
|---|---|---|---|
| 0 | 53.937653 | 55.250412 | -1.312759 |
| 1 | 91.039996 | 91.956134 | -0.916138 |
| 2 | 55.508528 | 62.038347 | -6.529819 |
| 3 | 108.412382 | 126.466072 | -18.05369 |
| 4 | 107.554107 | 101.22624 | 6.327867 |
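
The sign of `error` follows from its definition, `y_true - y_pred`: a negative value means the prediction was too high, a positive value means it was too low. A minimal sketch of that convention, using made-up numbers (the `_toy` lists are hypothetical and not taken from the data above):

```python
# Hypothetical toy values, chosen only to illustrate the sign convention.
y_true_toy = [100.0, 100.0, 100.0]
y_pred_toy = [110.0, 100.0, 95.0]

# Same definition as the 'error' column: true value minus prediction.
errors = [t - p for t, p in zip(y_true_toy, y_pred_toy)]

for t, p, e in zip(y_true_toy, y_pred_toy, errors):
    if e < 0:
        label = "over-prediction"
    elif e > 0:
        label = "under-prediction"
    else:
        label = "exact"
    print(f"y_true={t}, y_pred={p}, error={e} ({label})")
```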



### <a name='a2'></a> Graphical interpretation

In [6]:
def plot_regression_results(y_true, y_pred):
    results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
    # Common axis range for both series; named lo/hi to avoid shadowing the built-ins min/max
    lo = results[['y_true', 'y_pred']].min().min()
    hi = results[['y_true', 'y_pred']].max().max()

    # Scatter of predictions against true values, plus the y = x reference line
    fig = go.Figure(data=[go.Scatter(x=results['y_true'], y=results['y_pred'], mode='markers'),
                    go.Scatter(x=[lo, hi], y=[lo, hi])],
                    layout=go.Layout(showlegend=False, width=800, height=500,
                                     xaxis_title='y_true',
                                     yaxis_title='y_pred',
                                     title='Regression results'))
    fig.show()

plot_regression_results(y_true, y_pred)

In [7]:
y_true = 100 + 20 * np.random.randn(1000)
y_pred = y_true + 10 * np.random.randn(1000)
results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results['error'] = results['y_true'] - results['y_pred']

px.histogram(results, x='error', nbins=50, width=800)
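
Because `y_pred` is built as `y_true` plus noise with standard deviation 10, the histogram of `error` should be centered near 0 with a spread of roughly 10. The sketch below checks this numerically. It is only an illustration: it swaps NumPy for the standard-library `random` and `statistics` modules and fixes a seed, so its numbers will not match the cell above.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Same construction as the cell above, with the stdlib generator instead of NumPy.
y_true_sim = [100 + 20 * rng.gauss(0, 1) for _ in range(1000)]
y_pred_sim = [t + 10 * rng.gauss(0, 1) for t in y_true_sim]
errors_sim = [t - p for t, p in zip(y_true_sim, y_pred_sim)]

# The mean should be close to 0 and the standard deviation close to 10.
print("mean error:", statistics.mean(errors_sim))
print("std of error:", statistics.stdev(errors_sim))
```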

### <a name='a3'></a> Mean Absolute Error - MAE
### $$MAE = \frac{1}{n}\sum_{i=1}^{n}|y_{true} - y_{pred}|$$

In [8]:
def mean_absolute_error(y_true, y_pred):
    return abs(y_true - y_pred).sum() / len(y_true)

mean_absolute_error(y_true, y_pred)

8.153039572560465

In [9]:
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_true, y_pred)

8.153039572560465
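
To make the formula concrete, here is a small worked example in plain Python, with four made-up observations (the `_toy` lists are hypothetical). Each absolute error is computed explicitly before averaging.

```python
# Hypothetical toy data for a by-hand MAE calculation.
y_true_toy = [3.0, 5.0, 8.0, 10.0]
y_pred_toy = [2.0, 5.0, 10.0, 7.0]

# Step 1: absolute error for each observation.
abs_errors = [abs(t - p) for t, p in zip(y_true_toy, y_pred_toy)]
print(abs_errors)  # [1.0, 0.0, 2.0, 3.0]

# Step 2: average them.
mae = sum(abs_errors) / len(abs_errors)
print(mae)  # 1.5
```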

### <a name='a4'></a> Mean Squared Error - MSE
### $$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_{true} - y_{pred})^{2}$$

In [10]:
def mean_squared_error(y_true, y_pred):
    return ((y_true - y_pred) ** 2).sum() / len(y_true)

mean_squared_error(y_true, y_pred)

102.01826498575063

In [11]:
from sklearn.metrics import mean_squared_error

mean_squared_error(y_true, y_pred)

102.01826498575063
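
Squaring makes MSE weigh large errors more heavily than MAE does. The sketch below compares two made-up error vectors with the same MAE: one with evenly spread errors and one with a single large miss. The helpers `mae_of` and `mse_of` are hypothetical names introduced only for this illustration.

```python
def mae_of(errors):
    """Mean of absolute errors (illustrative helper)."""
    return sum(abs(e) for e in errors) / len(errors)


def mse_of(errors):
    """Mean of squared errors (illustrative helper)."""
    return sum(e ** 2 for e in errors) / len(errors)


# Hypothetical error vectors with the same total absolute error.
steady = [2.0, 2.0, 2.0, 2.0]  # every prediction is off by 2
spiky = [0.0, 0.0, 0.0, 8.0]   # three exact predictions, one large miss

print(mae_of(steady), mae_of(spiky))  # 2.0 2.0
print(mse_of(steady), mse_of(spiky))  # 4.0 16.0
```

Both vectors look equally good under MAE, while MSE penalizes the single large miss four times as much.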

### <a name='a5'></a> Root Mean Squared Error - RMSE
### $$RMSE = \sqrt{MSE}$$

In [12]:
def root_mean_squared_error(y_true, y_pred):
    return np.sqrt(((y_true - y_pred) ** 2).sum() / len(y_true))

root_mean_squared_error(y_true, y_pred)

10.100409149423138

In [13]:
np.sqrt(mean_squared_error(y_true, y_pred))

10.100409149423138
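
One practical reason to report RMSE rather than MSE is that RMSE is expressed in the same units as the target. The sketch below rescales made-up targets by a factor of 1000 and shows that MSE grows by about 1000² while RMSE grows by about 1000. The data and the helper `toy_mse` are hypothetical.

```python
import math


def toy_mse(y_true, y_pred):
    """Mean squared error on plain lists (illustrative helper)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical targets expressed in thousands, then converted to single units.
y_true_k = [200.0, 300.0, 400.0]
y_pred_k = [210.0, 290.0, 400.0]
y_true_u = [1000 * v for v in y_true_k]
y_pred_u = [1000 * v for v in y_pred_k]

mse_k, mse_u = toy_mse(y_true_k, y_pred_k), toy_mse(y_true_u, y_pred_u)
rmse_k, rmse_u = math.sqrt(mse_k), math.sqrt(mse_u)

print("MSE ratio: ", mse_u / mse_k)    # approximately 1e6
print("RMSE ratio:", rmse_u / rmse_k)  # approximately 1e3
```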

### <a name='a6'></a>  Max Error

$$ME = \max(|y_{true} - y_{pred}|)$$

In [14]:
def max_error(y_true, y_pred):
    return abs(y_true - y_pred).max()

In [15]:
max_error(y_true, y_pred)

33.38931213435406

In [16]:
from sklearn.metrics import max_error

max_error(y_true, y_pred)

33.38931213435406
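
The max error is a single number, so it is often useful to know which observation produced it. A minimal sketch in plain Python with made-up values (the `_toy` lists are hypothetical):

```python
# Hypothetical toy data.
y_true_toy = [10.0, 20.0, 30.0, 40.0]
y_pred_toy = [12.0, 19.0, 30.0, 33.0]

abs_errors = [abs(t - p) for t, p in zip(y_true_toy, y_pred_toy)]

# Index of the observation with the largest absolute error.
worst_index = max(range(len(abs_errors)), key=abs_errors.__getitem__)

print(abs_errors[worst_index])  # 7.0
print(worst_index)              # 3
```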

### <a name='a7'></a>  R2 score - coefficient of determination
### $$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_{true} - y_{pred})^{2}}{\sum_{i=1}^{n}(y_{true} - \overline{y_{true}})^{2}}$$

In [17]:
from sklearn.metrics import r2_score

r2_score(y_true, y_pred)

0.7485013790416539

In [18]:
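def r2_score(y_true, y_pred):
    # Residual sum of squares and total sum of squares
    numerator = ((y_true - y_pred) ** 2).sum()
    denominator = ((y_true - y_true.mean()) ** 2).sum()
    # NumPy division by zero yields inf/nan with a warning instead of raising
    # ZeroDivisionError, so the constant-target case is checked explicitly
    if denominator == 0:
        raise ValueError('R2 is undefined when y_true is constant')
    return 1 - numerator / denominator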

In [19]:
r2_score(y_true, y_pred)

0.7485013790416539
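
Three reference points help when reading an R2 value: a perfect prediction gives 1, always predicting the mean of `y_true` gives 0, and a model worse than the mean gives a negative score. The sketch below reproduces these cases on made-up data. The helper `r2_toy` is a hypothetical plain-Python version of the formula above.

```python
def r2_toy(y_true, y_pred):
    """Coefficient of determination on plain lists (illustrative helper)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical toy target.
y = [1.0, 2.0, 3.0, 4.0, 5.0]

print(r2_toy(y, y))                          # 1.0: perfect prediction
print(r2_toy(y, [3.0] * 5))                  # 0.0: always predicting the mean
print(r2_toy(y, [5.0, 4.0, 3.0, 2.0, 1.0]))  # -3.0: worse than the mean
```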