<a href="https://colab.research.google.com/github/chrispi21/ml-bc/blob/main/supervised/02_regression/07_regression_metrics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### scikit-learn
Library website: [https://scikit-learn.org](https://scikit-learn.org)  

Documentation/User Guide: [https://scikit-learn.org/stable/user_guide.html](https://scikit-learn.org/stable/user_guide.html)

The core machine learning library for Python.

To install scikit-learn, run the command below:
```
!pip install scikit-learn
```
To upgrade scikit-learn to the latest version, run the command below:
```
!pip install --upgrade scikit-learn
```
This course was built on version `0.22.1`.
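
API details can differ between releases, so it may help to confirm which version is installed before running the cells. The following is a minimal sketch that relies only on the standard library; the helper name `installed_version` is hypothetical and not part of scikit-learn:

```python
from importlib import metadata


def installed_version(package="scikit-learn"):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string, or None when the package is not installed
print(installed_version())
```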

### Table of contents:
1. [Importing libraries](#a0)
2. [Graphical interpretation](#a2)
3. [Mean Absolute Error - MAE](#a3)
4. [Mean Squared Error - MSE](#a4)
5. [Root Mean Squared Error - RMSE](#a5)
6. [Max Error](#a6)
7. [R2 score - coefficient of determination](#a7)



    

### <a name='a0'></a>  Importing libraries

In [1]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

np.random.seed(42)

In [4]:
y_true = 100 + 20 * np.random.randn(50)
y_true

array([ 71.69258516,  91.58709354,  93.14570967,  83.95445462,
        96.77428577, 108.08101714, 137.72371802, 103.49155626,
       105.15100781,  98.51108168,  61.62457569,  99.46972249,
       101.2046042 , 149.26484225,  96.1527807 , 106.03094685,
        99.30576461,  76.62643925, 122.85645629, 115.03866065,
       115.82063894,  81.8122509 , 128.05588622,  71.96297874,
       111.73714188, 143.80911252,  80.1892735 ,  88.67404541,
       101.9930273 ,  89.93048692,  68.98673138, 101.3712595 ,
        78.75392573, 109.47184861,  81.61151532, 130.9986881 ,
        84.33493415,  93.55876968, 116.27034435,  75.38271367,
       104.54919869, 126.14285509,  67.85033531, 103.69267717,
       105.19765588, 115.63645744,  75.26098578,  73.59086774,
       110.43883131, 105.93969346])

In [5]:
y_pred = y_true + 10 * np.random.randn(50)
y_pred

array([ 74.19751366,  95.05157564,  86.34546245,  86.27699159,
        99.7050105 , 100.93750296, 156.38146314, 108.22988547,
        93.23797284, 105.07661777,  51.87775899, 107.34056853,
       112.79055999, 141.05801907, 105.786542  , 110.15875612,
       107.52636621,  95.59436907, 120.40257513, 107.50129901,
       106.92549464,  73.65414805, 127.28486912,  75.37449849,
       114.50404987, 152.08094501,  80.31929242, 103.20938618,
        99.34645897, 117.13217858,  75.24340486,  92.79968393,
        68.04500074, 114.29657277,  79.37688746, 138.13869304,
        89.0673104 ,  92.83048055, 107.80240717,  60.23424142,
       100.08404917, 134.70684303,  69.99127275,  91.23528938,
       106.92946514, 119.48963123,  66.42241142,  75.1281188 ,
       111.0209185 ,  94.50999049])

In [6]:
results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results.head()

|   | y_true | y_pred |
|---|---|---|
| 0 | 71.692585 | 74.197514 |
| 1 | 91.587094 | 95.051576 |
| 2 | 93.14571 | 86.345462 |
| 3 | 83.954455 | 86.276992 |
| 4 | 96.774286 | 99.70501 |


In [8]:
results['error'] = results['y_true'] - results['y_pred']
results['error_squared'] = results['error'] ** 2
results.head()

|   | y_true | y_pred | error | error_squared |
|---|---|---|---|---|
| 0 | 71.692585 | 74.197514 | -2.504929 | 6.274667 |
| 1 | 91.587094 | 95.051576 | -3.464482 | 12.002636 |
| 2 | 93.14571 | 86.345462 | 6.800247 | 46.243362 |
| 3 | 83.954455 | 86.276992 | -2.322537 | 5.394178 |
| 4 | 96.774286 | 99.70501 | -2.930725 | 8.589147 |


In [9]:
print(f"MAE - mean absolute error: {results['error'].abs().sum() / len(results):.4f}")

print(f"MSE - mean squared error: {results['error_squared'].sum() / len(results):.4f}")

print(f"RMSE - root mean squared error: {np.sqrt(results['error_squared'].sum() / len(results)):.4f}")

MAE - mean absolute error: 7.1319
MSE - mean squared error: 78.9787
RMSE - root mean squared error: 8.8870
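
To make the three computations above easy to verify by hand, here is a small sketch on three made-up observations. The arrays and the `_small` names are hypothetical and chosen only so the arithmetic fits in the comments:

```python
import numpy as np

# Three hypothetical observations, small enough to check by hand
y_true_small = np.array([3.0, 5.0, 2.0])
y_pred_small = np.array([2.0, 5.0, 4.0])

errors = y_true_small - y_pred_small   # [ 1.,  0., -2.]
mae = np.abs(errors).mean()            # (1 + 0 + 2) / 3 = 1.0
mse = (errors ** 2).mean()             # (1 + 0 + 4) / 3 = 1.6667
rmse = np.sqrt(mse)                    # sqrt(1.6667) = 1.2910

print(f"MAE: {mae:.4f}, MSE: {mse:.4f}, RMSE: {rmse:.4f}")
```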



### <a name='a2'></a> Graphical interpretation

In [14]:
results[['y_true', 'y_pred']].min().min()

51.87775899174596

In [13]:
def plot_regression_results(y_true, y_pred):
    # Scatter predictions against true values and add the y = x reference line
    results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
    # Shared axis range; named so the built-in min() and max() are not shadowed
    min_val = results[['y_true', 'y_pred']].min().min()
    max_val = results[['y_true', 'y_pred']].max().max()

    fig = go.Figure(data=[go.Scatter(x=results['y_true'], y=results['y_pred'], mode='markers'),
                          go.Scatter(x=[min_val, max_val], y=[min_val, max_val])],
                    layout=go.Layout(showlegend=False, width=800,
                                     xaxis_title='y_true',
                                     yaxis_title='y_pred',
                                     title='Regression: y_true vs. y_pred'))
    fig.show()

plot_regression_results(y_true, y_pred)

In [17]:
y_true = 100 + 20 * np.random.randn(1000)
y_pred = y_true + 10 * np.random.randn(1000)

results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results['error'] = results['y_true'] - results['y_pred']

px.histogram(results, x='error', nbins=50, width=800)
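
Since `y_pred` is built by adding `10 * np.random.randn(...)` noise to `y_true`, the errors should be roughly centered at 0 with a standard deviation near 10, which is the shape the histogram is expected to show. A quick numeric check under the same setup follows; the seed value and the `_demo` names are arbitrary choices for this sketch:

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only to make the sketch repeatable

y_true_demo = 100 + 20 * np.random.randn(1000)
y_pred_demo = y_true_demo + 10 * np.random.randn(1000)
error_demo = y_true_demo - y_pred_demo

# Sample estimates; expected to land close to 0 and 10 respectively
print(f"mean error: {error_demo.mean():.3f}")
print(f"std of error: {error_demo.std():.3f}")
```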

### <a name='a3'></a> Mean Absolute Error - MAE
### $$MAE = \frac{1}{n}\sum_{i=1}^{n}|y_{true,i} - y_{pred,i}|$$

In [18]:
def mean_absolute_error(y_true, y_pred):
    return abs(y_true - y_pred).sum() / len(y_true)

mean_absolute_error(y_true, y_pred)

8.029713522680673

In [19]:
from sklearn.metrics import mean_absolute_error

mean_absolute_error(y_true, y_pred)

8.029713522680673

### <a name='a4'></a> Mean Squared Error - MSE
### $$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_{true,i} - y_{pred,i})^{2}$$

In [20]:
def mean_squared_error(y_true, y_pred):
    return ((y_true - y_pred) ** 2).sum() / len(y_true)

mean_squared_error(y_true, y_pred)

102.17201792898915

In [21]:
from sklearn.metrics import mean_squared_error

mean_squared_error(y_true, y_pred)

102.17201792898915
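
Because MSE squares each error, a single large miss affects it far more than it affects MAE. The sketch below illustrates this with made-up numbers; the arrays and the short helper names `mae` and `mse` are hypothetical and not taken from the data above:

```python
import numpy as np

y_true_demo = np.array([10.0, 10.0, 10.0, 10.0])
pred_even = np.array([12.0, 8.0, 12.0, 8.0])       # every error has magnitude 2
pred_outlier = np.array([10.0, 10.0, 10.0, 18.0])  # one error of magnitude 8


def mae(y_true, y_pred):
    return np.abs(y_true - y_pred).mean()


def mse(y_true, y_pred):
    return ((y_true - y_pred) ** 2).mean()


# Both predictions have the same MAE (2.0) ...
print(mae(y_true_demo, pred_even), mae(y_true_demo, pred_outlier))
# ... but the MSE is 4.0 for the even errors and 16.0 with the single outlier
print(mse(y_true_demo, pred_even), mse(y_true_demo, pred_outlier))
```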

### <a name='a5'></a> Root Mean Squared Error - RMSE
### $$RMSE = \sqrt{MSE}$$

In [22]:
def root_mean_squared_error(y_true, y_pred):
    return np.sqrt(((y_true - y_pred) ** 2).sum() / len(y_true))

root_mean_squared_error(y_true, y_pred)

10.1080175073547

In [23]:
np.sqrt(mean_squared_error(y_true, y_pred))

10.1080175073547
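
Newer scikit-learn releases provide `sklearn.metrics.root_mean_squared_error`; it is not available in the version this course was built on, so treat its presence as an assumption about your environment. A hedged sketch of a helper that uses it when present and otherwise falls back to NumPy (the name `rmse_compat` is hypothetical):

```python
import numpy as np

try:
    # Only present in newer scikit-learn releases (assumption about the environment)
    from sklearn.metrics import root_mean_squared_error as _sk_rmse
except ImportError:
    _sk_rmse = None


def rmse_compat(y_true, y_pred):
    """Return RMSE via scikit-learn when available, otherwise via NumPy."""
    if _sk_rmse is not None:
        return float(_sk_rmse(y_true, y_pred))
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


print(rmse_compat([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))
```

Either path computes the square root of the mean squared error, so the result should match `np.sqrt(mean_squared_error(y_true, y_pred))` up to floating-point rounding.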

### <a name='a6'></a>  Max Error

$$ME = \max_{i}|y_{true,i} - y_{pred,i}|$$

In [24]:
def max_error(y_true, y_pred):
    return abs(y_true - y_pred).max()

In [25]:
max_error(y_true, y_pred)

30.982994340712565

In [26]:
from sklearn.metrics import max_error

max_error(y_true, y_pred)

30.982994340712565
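
Max error reports only the single worst prediction, so it is often useful to also find which observation produced it. A small sketch with made-up values (the arrays are hypothetical):

```python
import numpy as np

y_true_demo = np.array([3.0, 5.0, 2.0, 7.0])
y_pred_demo = np.array([2.5, 5.0, 4.0, 6.0])

abs_errors = np.abs(y_true_demo - y_pred_demo)  # [0.5, 0. , 2. , 1. ]
worst_idx = int(abs_errors.argmax())            # position of the largest miss

print(abs_errors.max(), worst_idx)  # 2.0 2
```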

### <a name='a7'></a>  R2 score - coefficient of determination
### $$R^{2} = 1 - \frac{\sum_{i=1}^{n}(y_{true,i} - y_{pred,i})^{2}}{\sum_{i=1}^{n}(y_{true,i} - \overline{y}_{true})^{2}}$$

In [27]:
from sklearn.metrics import r2_score

r2_score(y_true, y_pred)

0.7379968537750543

In [28]:
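def r2_score(y_true, y_pred):
    # Residual sum of squares and total sum of squares
    numerator = ((y_true - y_pred) ** 2).sum()
    denominator = ((y_true - y_true.mean()) ** 2).sum()
    # NumPy division by zero does not raise ZeroDivisionError, so check explicitly
    if denominator == 0:
        raise ValueError('R2 is undefined: y_true is constant (division by zero)')
    return 1 - numerator / denominator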

In [29]:
r2_score(y_true, y_pred)

0.7379968537750543
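
Three reference points help when reading an R2 value: perfect predictions give 1, always predicting the mean of `y_true` gives 0, and predictions worse than that baseline give a negative score. The sketch below checks this on a tiny made-up array; `r2_demo` is a hypothetical helper that follows the formula above:

```python
import numpy as np


def r2_demo(y_true, y_pred):
    ss_res = ((y_true - y_pred) ** 2).sum()          # residual sum of squares
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()   # total sum of squares
    return 1 - ss_res / ss_tot


y = np.array([1.0, 2.0, 3.0, 4.0])

perfect = r2_demo(y, y)                              # 1 - 0 / 5  = 1.0
baseline = r2_demo(y, np.full_like(y, y.mean()))     # 1 - 5 / 5  = 0.0
worse = r2_demo(y, np.array([4.0, 3.0, 2.0, 1.0]))   # 1 - 20 / 5 = -3.0

print(perfect, baseline, worse)
```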