<a href="https://colab.research.google.com/github/Lukas-Swc/machine-learning-bootcamp/blob/main/supervised/02_regression%20/07_regression_metrics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### scikit-learn
Library website: [https://scikit-learn.org](https://scikit-learn.org)  

Documentation/User Guide: [https://scikit-learn.org/stable/user_guide.html](https://scikit-learn.org/stable/user_guide.html)

A core machine learning library for Python.

To install scikit-learn, run the command below:
```
!pip install scikit-learn
```
To upgrade scikit-learn to the latest version, run the command below:
```
!pip install --upgrade scikit-learn
```
This course was built on version `0.22.1`.
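
Because the API differs between releases, it can help to check which version is actually installed before running the cells. A minimal sketch, assuming scikit-learn is already installed in the current environment:

```python
import sklearn

# Print the installed scikit-learn version to compare with the course version
print(sklearn.__version__)
```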

### Table of contents:
1. [Importing libraries](#a0)
2. [Graphical interpretation](#a2)
3. [Mean Absolute Error (MAE)](#a3)
4. [Mean Squared Error (MSE)](#a4)
5. [Root Mean Squared Error (RMSE)](#a5)
6. [Max Error](#a6)
7. [R2 score - coefficient of determination](#a7)


### <a name='a0'></a>  Importing libraries

In [1]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

np.random.seed(42)

In [11]:
y_true = 100 + 20 *  np.random.randn(50)
y_true

array([ 80.70153079, 113.7210292 , 121.16848974,  64.82521027,
        76.33482975,  59.21535644,  94.61186331, 114.35084512,
       130.04714104, 101.48189561, 132.57231091,  72.39797084,
        65.93235121,  98.88904602, 107.68130898,  99.34610504,
        58.651158  ,  98.21759921,  73.91060999, 113.39345098,
       107.33196492,  81.20240427,  89.72266165,  78.81572956,
        98.74641805, 119.10284641,  80.28547907, 110.08093031,
        89.39484763,  84.14254335,  97.8593928 ,  79.29515355,
        88.92701389,  76.04244215, 139.29450266, 100.70527104,
        86.00548984, 104.27959821,  97.75343901,  95.58060801,
       112.283334  , 115.1501542 ,  89.38997705,  88.48363519,
        94.49896606,  53.96157671,  69.69617876, 127.33748535,
       132.89935427,  95.01927921])

In [12]:
y_pred = y_true + 10 * np.random.randn(50)
y_pred

array([ 86.46710042, 116.83353075, 151.95729782,  76.02095939,
        75.05565383,  49.65995204,  78.54740011, 116.38548147,
       122.48363359,  87.25935851, 126.10658207,  61.5824908 ,
        82.80376756, 107.70544359, 107.60158257, 114.14554643,
        59.42484108,  89.6047572 ,  89.14185076, 118.78255141,
        96.95950338,  79.29901749,  80.96647912,  64.98773225,
       108.00819353, 138.19701281,  66.29980334, 115.71062268,
        82.88842194,  79.27128952,  91.93545356,  70.65524585,
        89.41223017,  67.73294098, 141.99907092, 100.20288994,
        83.61600937,  95.20396159,  91.9857257 , 103.13452027,
       117.29250588, 105.37460175,  90.3833001 ,  95.99750642,
        77.80491325,  59.39517863,  63.06994117, 133.04347203,
       125.2667627 ,  76.9704582 ])

In [14]:
results = pd.DataFrame(data={'y_true': y_true, 'y_pred': y_pred})
results.head()

|   | y_true | y_pred |
|---|---|---|
| 0 | 80.701531 | 86.4671 |
| 1 | 113.721029 | 116.833531 |
| 2 | 121.16849 | 151.957298 |
| 3 | 64.82521 | 76.020959 |
| 4 | 76.33483 | 75.055654 |


In [18]:
results['error'] = results['y_true'] - results['y_pred']
results['error_squared'] = results['error'] ** 2
results.head()

|   | y_true | y_pred | error | error_squared |
|---|---|---|---|---|
| 0 | 80.701531 | 86.4671 | -5.76557 | 33.241793 |
| 1 | 113.721029 | 116.833531 | -3.112502 | 9.687666 |
| 2 | 121.16849 | 151.957298 | -30.788808 | 947.950703 |
| 3 | 64.82521 | 76.020959 | -11.195749 | 125.344798 |
| 4 | 76.33483 | 75.055654 | 1.279176 | 1.636291 |


In [19]:
print(f"MAE - mean absolute error: {results['error'].abs().sum() / len(results):.4f}")

print(f"MSE - mean squared error: {results['error_squared'].sum() / len(results):.4f}")

print(f"RMSE - root mean squared error: {np.sqrt(results['error_squared'].sum() / len(results)):.4f}")

MAE - mean absolute error: 8.3686
MSE - mean squared error: 104.8293
RMSE - root mean squared error: 10.2386


### <a name='a2'></a>  Graphical interpretation

In [20]:
def plot_regression_results(y_true, y_pred):

    results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
    # Common axis range; named lo/hi to avoid shadowing the built-in min/max
    lo = results[['y_true', 'y_pred']].min().min()
    hi = results[['y_true', 'y_pred']].max().max()

    # Scatter of predictions plus the diagonal y_pred = y_true (perfect predictions)
    fig = go.Figure(data=[go.Scatter(x=results['y_true'], y=results['y_pred'], mode='markers'),
                    go.Scatter(x=[lo, hi], y=[lo, hi])],
                    layout=go.Layout(showlegend=False, width=800,
                                     xaxis_title='y_true',
                                     yaxis_title='y_pred',
                                     title='Regression: y_true vs. y_pred'))
    fig.show()

plot_regression_results(y_true, y_pred)

In [21]:
y_true = 100 + 20 * np.random.randn(1000)
y_pred = y_true + 10 * np.random.randn(1000)

results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results['error'] = results['y_true'] - results['y_pred']

px.histogram(results, x='error', nbins=50, width=800)
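
The histogram should look roughly bell-shaped and centered at zero, because the errors here are just the simulated noise term `10 * np.random.randn(...)` with its sign flipped. A quick numeric check, sketched with a separate `default_rng` generator and an arbitrary seed (both are assumptions of this sketch, not the notebook's seeding), so the exact values will differ from the plotted data:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch

# Same construction as in the cell above: true values plus Gaussian noise
y_true = 100 + 20 * rng.standard_normal(1000)
y_pred = y_true + 10 * rng.standard_normal(1000)
errors = y_true - y_pred

# The mean should be close to 0 and the standard deviation close to 10
print(f"mean of errors: {errors.mean():.3f}")
print(f"std of errors:  {errors.std():.3f}")
```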

### <a name='a3'></a>  Mean Absolute Error (MAE)
### $$MAE = \frac{1}{n}\sum_{i=1}^{n}|y_{true} - y_{pred}|$$

In [22]:
def mean_absolute_error(y_true, y_pred):
  return abs(y_true - y_pred).sum() / len(y_true)

mean_absolute_error(y_true, y_pred)

7.755456919872209

In [23]:
from sklearn.metrics import mean_absolute_error

mean_absolute_error(y_true, y_pred)

7.755456919872209

### <a name='a4'></a>  Mean Squared Error (MSE)
### $$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_{true} - y_{pred})^{2}$$

In [27]:
def mean_squared_error(y_true, y_pred):
  return ((y_true - y_pred) ** 2).sum() / len(y_true)

mean_squared_error(y_true, y_pred)

95.56588322626148

In [28]:
from sklearn.metrics import mean_squared_error

mean_squared_error(y_true, y_pred)

95.56588322626148
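
Squaring the errors means that MSE penalizes a few large errors more heavily than many small ones, while MAE treats them proportionally. A small sketch with made-up data and local helpers `mae` and `mse` (hypothetical names used only here) shows two prediction sets with the same MAE but different MSE:

```python
import numpy as np

y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred_even = np.array([12.0, 18.0, 32.0, 38.0])     # four errors of size 2
y_pred_outlier = np.array([10.0, 20.0, 30.0, 48.0])  # one error of size 8

def mae(y_true, y_pred):
    # Mean of absolute errors
    return np.abs(y_true - y_pred).mean()

def mse(y_true, y_pred):
    # Mean of squared errors
    return ((y_true - y_pred) ** 2).mean()

mae_even, mse_even = mae(y_true, y_pred_even), mse(y_true, y_pred_even)
mae_outlier, mse_outlier = mae(y_true, y_pred_outlier), mse(y_true, y_pred_outlier)

print(mae_even, mse_even)        # 2.0 4.0
print(mae_outlier, mse_outlier)  # 2.0 16.0
```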

### <a name='a5'></a>  Root Mean Squared Error (RMSE)
### $$RMSE = \sqrt{MSE}$$

In [30]:
def root_mean_squared_error(y_true, y_pred):
  return np.sqrt(((y_true - y_pred) ** 2).sum() / len(y_true))

root_mean_squared_error(y_true, y_pred)

9.775780440776147

In [31]:
# Note: sklearn.metrics.root_mean_squared_error requires scikit-learn 1.4 or newer
from sklearn.metrics import root_mean_squared_error

root_mean_squared_error(y_true, y_pred)

9.775780440776147
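
In an environment pinned to an older scikit-learn release, such as the `0.22.1` the course was built on, the import of `root_mean_squared_error` may fail. One possible workaround, sketched here on made-up data, is to fall back to the square root of `mean_squared_error`, which follows directly from the definition $RMSE = \sqrt{MSE}$:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

try:
    from sklearn.metrics import root_mean_squared_error
except ImportError:
    # Fallback for older scikit-learn releases: RMSE = sqrt(MSE)
    def root_mean_squared_error(y_true, y_pred):
        return np.sqrt(mean_squared_error(y_true, y_pred))

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])  # errors: 1, 0, -2

# MSE = (1 + 0 + 4) / 3, so RMSE = sqrt(5 / 3)
rmse = root_mean_squared_error(y_true, y_pred)
print(rmse)
```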

### <a name='a6'></a>  Max Error
$$ME = \max(|y_{true} - y_{pred}|)$$

In [33]:
def max_error(y_true, y_pred):
  # Vectorized NumPy maximum instead of the slower built-in max() over an array
  return np.abs(y_true - y_pred).max()

max_error(y_true, y_pred)

31.931075678448607

In [34]:
from sklearn.metrics import max_error

max_error(y_true, y_pred)

31.931075678448607
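
The max error alone does not say which observation was predicted worst. A short sketch on made-up data, using `np.argmax` to find the position of the largest absolute error:

```python
import numpy as np

y_true = np.array([100.0, 90.0, 110.0, 95.0])
y_pred = np.array([98.0, 97.0, 80.0, 96.0])

abs_errors = np.abs(y_true - y_pred)  # [2, 7, 30, 1]
worst = int(np.argmax(abs_errors))    # index of the largest absolute error

print(worst, abs_errors[worst])  # 2 30.0
```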

### <a name='a7'></a>  R2 score - coefficient of determination
### $$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_{true} - y_{pred})^{2}}{\sum_{i=1}^{n}(y_{true} - \overline{y_{true}})^{2}}$$

In [38]:
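def r2_score(y_true, y_pred):
  numerator = ((y_true - y_pred) ** 2).sum()
  denominator = ((y_true - y_true.mean()) ** 2).sum()
  # NumPy division by zero yields inf/nan with a warning instead of raising
  # ZeroDivisionError, so the denominator is checked explicitly.
  if denominator == 0:
      raise ZeroDivisionError('Division by zero: y_true is constant')
  return 1 - numerator / denominator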

r2_score(y_true, y_pred)

0.7582801303266378

In [39]:
from sklearn.metrics import r2_score

r2_score(y_true, y_pred)

0.7582801303266378
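
Three reference cases help with reading the score: a perfect prediction gives 1, always predicting the mean of `y_true` gives 0, and predictions worse than that baseline give a negative value. A small sketch with made-up data and a local helper `r2` (a hypothetical name, following the formula above):

```python
import numpy as np

def r2(y_true, y_pred):
    ss_res = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    return 1 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

r2_perfect = r2(y_true, y_true)                                 # exact predictions
r2_mean = r2(y_true, np.full_like(y_true, y_true.mean()))       # constant mean baseline
r2_reversed = r2(y_true, y_true[::-1])                          # worse than the baseline

print(r2_perfect)   # 1.0
print(r2_mean)      # 0.0
print(r2_reversed)  # -3.0
```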