<a href="https://colab.research.google.com/github/KacperKaczmarczyk/data-science-bootcamp/blob/main/05_sklearn/03_metryki_regresji.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### scikit-learn
>Library website: [https://scikit-learn.org](https://scikit-learn.org)  
>
>Documentation/User Guide: [https://scikit-learn.org/stable/user_guide.html](https://scikit-learn.org/stable/user_guide.html)
>
>The core machine learning library for Python.
>
>To install scikit-learn, run the command below:
```
pip install scikit-learn
```

### Metrics - Regression problem:
1. [Importing libraries](#a0)
2. [Graphical interpretation](#a2)
3. [Mean Absolute Error - MAE](#a3)
4. [Mean Squared Error - MSE](#a4)
5. [Root Mean Squared Error - RMSE](#a5)
6. [Max Error](#a6)
7. [R2 score - coefficient of determination](#a7)

    

In [9]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go


In [3]:
y_true = 100 + 20 * np.random.randn(50)
y_true

array([102.27256772,  94.26588858,  87.12160775,  88.01427049,
       102.84018022, 103.17300784,  78.23508195, 138.0939243 ,
       114.51573422, 120.65670443, 119.02484089, 108.41674867,
        89.67412664, 131.19419123,  82.07383497,  97.08304809,
       130.62816395,  95.388023  , 116.51752986,  69.277204  ,
       102.69233051, 132.24708053,  98.88451377, 113.3301429 ,
       119.36313618, 106.67751037, 111.14846327, 103.72509448,
       103.65207397,  85.05169227,  80.70722305,  87.95001398,
       101.01658474,  66.48124961, 116.41850559,  81.02337744,
        92.39095799,  73.87590232, 103.66512617,  81.81235614,
        83.67602274,  96.65949383,  84.9357469 , 109.5165712 ,
        94.48153992,  92.60865172,  53.57654812,  97.73621583,
        92.55620826, 103.10560099])

In [4]:
y_pred = y_true + 10 * np.random.randn(50)
y_pred

array([ 84.4151734 ,  94.28174072,  66.3657273 ,  83.50639951,
       108.93430055, 102.24357903,  80.97592874, 138.33974025,
       107.05004178, 114.48632132,  97.23179728, 129.35705835,
        89.52103692, 131.55425477,  90.3557972 ,  92.28074859,
       133.83375852,  79.81575039, 123.90946535,  70.89694603,
       102.74813214, 132.10528392, 103.93441096,  99.29342545,
       115.38767656,  89.26137478, 105.09450298, 104.07085925,
        89.63177821,  82.13775447,  65.28173024,  70.75501829,
       101.61054297,  63.34775134, 124.91075603,  88.65952204,
        76.88251403,  90.45351737,  83.9450671 ,  79.48417831,
        82.28473731, 106.83826777,  73.21321892,  83.40635205,
        84.05491993,  88.27191609,  52.68938298,  99.04917892,
       106.39346205, 114.2991058 ])

In [5]:
results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results.head()

| | y_true | y_pred |
|---|---|---|
| 0 | 102.272568 | 84.415173 |
| 1 | 94.265889 | 94.281741 |
| 2 | 87.121608 | 66.365727 |
| 3 | 88.01427 | 83.5064 |
| 4 | 102.84018 | 108.934301 |


In [7]:
results['error'] = results['y_true'] - results['y_pred']
results.head()

| | y_true | y_pred | error |
|---|---|---|---|
| 0 | 102.272568 | 84.415173 | 17.857394 |
| 1 | 94.265889 | 94.281741 | -0.015852 |
| 2 | 87.121608 | 66.365727 | 20.75588 |
| 3 | 88.01427 | 83.5064 | 4.507871 |
| 4 | 102.84018 | 108.934301 | -6.09412 |


### <a name='a2'></a> Graphical interpretation

In [10]:
def plot_regression_results(y_true, y_pred):
    results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
    # Common axis range; named lo/hi to avoid shadowing the built-in min/max
    lo = results[['y_true', 'y_pred']].min().min()
    hi = results[['y_true', 'y_pred']].max().max()

    # Scatter of predictions plus the y_pred = y_true reference line
    fig = go.Figure(data=[go.Scatter(x=results['y_true'], y=results['y_pred'], mode='markers'),
                          go.Scatter(x=[lo, hi], y=[lo, hi])],
                    layout=go.Layout(showlegend=False, width=800, height=500,
                                     xaxis_title='y_true',
                                     yaxis_title='y_pred',
                                     title='Regression results'))
    fig.show()

plot_regression_results(y_true, y_pred)

In [15]:
y_true = 100 + 20 * np.random.randn(1000)
y_pred = y_true + 10 * np.random.randn(1000)
results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results['error'] = results['y_true'] - results['y_pred']

px.histogram(results, x='error', nbins=50, width=800)

### <a name='a3'></a> Mean Absolute Error - MAE
### $$MAE = \frac{1}{n}\sum_{i=1}^{n}|y_{true,i} - y_{pred,i}|$$

In [17]:
def mean_absolute_error(y_true, y_pred):
  return abs(y_true-y_pred).sum()/len(y_true)

mean_absolute_error(y_true, y_pred)

7.753328469546394

In [18]:
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_true, y_pred)

7.753328469546394
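
To make the formula concrete, here is a small example that can be checked by hand. The three-element arrays are made up for illustration and are not part of the notebook's data.

```python
import numpy as np

# Hypothetical toy data, chosen so the result is easy to verify by hand
y_true_toy = np.array([3.0, 5.0, 2.0])
y_pred_toy = np.array([2.0, 5.0, 4.0])

# Absolute errors: |3-2| = 1, |5-5| = 0, |2-4| = 2
abs_errors = np.abs(y_true_toy - y_pred_toy)

# MAE is their mean: (1 + 0 + 2) / 3 = 1.0
mae_toy = abs_errors.mean()
print(mae_toy)  # 1.0
```

Because MAE averages absolute errors, it is expressed in the same units as the target variable.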

### <a name='a4'></a> Mean Squared Error - MSE
### $$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_{true,i} - y_{pred,i})^{2}$$

In [19]:
def mean_squared_error(y_true, y_pred):
  # Squared errors are already non-negative, so no abs() is needed
  return ((y_true - y_pred) ** 2).sum() / len(y_true)

mean_squared_error(y_true, y_pred)

96.00756013868435

In [20]:
from sklearn.metrics import mean_squared_error
mean_squared_error(y_true, y_pred)

96.00756013868435
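
Squaring the errors makes MSE more sensitive to a single large miss than MAE. The sketch below illustrates this with two hypothetical sets of predictions that have the same total absolute error; the arrays and the helper names `mae` and `mse` are assumptions introduced only for this example.

```python
import numpy as np

y_true_toy = np.array([10.0, 10.0, 10.0, 10.0])
# Two hypothetical predictions with the same total absolute error (4)
y_pred_even = np.array([11.0, 11.0, 11.0, 11.0])     # four errors of 1
y_pred_outlier = np.array([10.0, 10.0, 10.0, 14.0])  # one error of 4

def mae(y_true, y_pred):
    """Mean of the absolute errors."""
    return np.abs(y_true - y_pred).mean()

def mse(y_true, y_pred):
    """Mean of the squared errors."""
    return ((y_true - y_pred) ** 2).mean()

# MAE cannot tell the two cases apart
print(mae(y_true_toy, y_pred_even), mae(y_true_toy, y_pred_outlier))  # 1.0 1.0
# MSE penalizes the single large error: 16 / 4 = 4.0
print(mse(y_true_toy, y_pred_even), mse(y_true_toy, y_pred_outlier))  # 1.0 4.0
```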

### <a name='a5'></a> Root Mean Squared Error - RMSE
### $$RMSE = \sqrt{MSE}$$

In [23]:
def root_mean_squared_error(y_true, y_pred):
  # Square root of the MSE; squared errors are non-negative, so no abs() is needed
  return np.sqrt(((y_true - y_pred) ** 2).sum() / len(y_true))

root_mean_squared_error(y_true, y_pred)

9.798344765249096
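
The other metrics in this notebook are cross-checked against scikit-learn, and a comparable check for RMSE might look like the sketch below. It takes the square root of `sklearn.metrics.mean_squared_error` rather than relying on a dedicated RMSE function, since the availability of such a function depends on the installed scikit-learn version. The sample arrays are regenerated here with a fixed seed (an assumption, the notebook does not set one) so that the block runs on its own.

```python
import numpy as np
from sklearn import metrics

# Seeded generator is an assumption, used only to make this sketch reproducible
rng = np.random.default_rng(0)
y_true_demo = 100 + 20 * rng.standard_normal(1000)
y_pred_demo = y_true_demo + 10 * rng.standard_normal(1000)

# Manual RMSE: square root of the mean squared error
rmse_manual = np.sqrt(((y_true_demo - y_pred_demo) ** 2).mean())

# Same quantity built from scikit-learn's MSE
rmse_sklearn = np.sqrt(metrics.mean_squared_error(y_true_demo, y_pred_demo))

print(np.isclose(rmse_manual, rmse_sklearn))  # True
```

Importing the `metrics` module instead of individual functions keeps the hand-written functions defined earlier in the notebook from being overwritten.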

### <a name='a6'></a> Max Error

$$ME = \max_{i}|y_{true,i} - y_{pred,i}|$$

In [25]:
def max_error(y_true, y_pred):
  return abs(y_true - y_pred).max()

max_error(y_true,y_pred)

34.913136764356324

In [26]:
from sklearn.metrics import max_error
max_error(y_true,y_pred)

34.913136764356324
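
The max error reports only the size of the worst miss. When it is also useful to know which observation produced it, `np.argmax` on the absolute errors gives its position. The sketch below uses made-up toy arrays, not the notebook's data.

```python
import numpy as np

# Hypothetical toy data for illustration
y_true_toy = np.array([3.0, 5.0, 2.0, 7.0])
y_pred_toy = np.array([2.5, 5.0, 4.0, 6.0])

# Absolute errors: 0.5, 0.0, 2.0, 1.0
abs_errors = np.abs(y_true_toy - y_pred_toy)

# Position and size of the largest absolute error
worst_idx = int(np.argmax(abs_errors))
worst_error = float(abs_errors[worst_idx])
print(worst_idx, worst_error)  # 2 2.0
```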

### <a name='a7'></a> R2 score - coefficient of determination
### $$R^{2} = 1 - \frac{\sum_{i=1}^{n}(y_{true,i} - y_{pred,i})^{2}}{\sum_{i=1}^{n}(y_{true,i} - \overline{y}_{true})^{2}}$$

In [27]:
from sklearn.metrics import r2_score

r2_score(y_true, y_pred)

0.7631117325393175

In [30]:
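def r2_score(y_true, y_pred):
    # Residual sum of squares
    numerator = ((y_true - y_pred) ** 2).sum()
    # Total sum of squares around the mean of y_true (not y_pred)
    denominator = ((y_true - y_true.mean()) ** 2).sum()
    # NumPy division does not raise ZeroDivisionError, so check explicitly
    if denominator == 0:
        print('Division by zero')
        return np.nan
    return 1 - numerator / denominator

In [31]:
# With the mean of y_true in the denominator, this should agree with the
# sklearn value above (0.7631117325393175) up to floating-point rounding.
r2_score(y_true, y_pred)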
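
Three reference points help when reading an R2 value: a perfect prediction scores 1, always predicting the mean of `y_true` scores 0, and predictions worse than that mean baseline score below 0. The sketch below checks these cases with scikit-learn on made-up toy data; the array names are assumptions for illustration.

```python
import numpy as np
from sklearn import metrics

# Hypothetical toy target with mean 3.0
y_true_toy = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

perfect = y_true_toy.copy()                  # exact predictions
baseline = np.full(5, y_true_toy.mean())     # always predict the mean
reversed_pred = y_true_toy[::-1]             # worse than the mean baseline

r2_perfect = metrics.r2_score(y_true_toy, perfect)
r2_baseline = metrics.r2_score(y_true_toy, baseline)
# Residual sum of squares 40, total sum of squares 10: 1 - 40/10 = -3.0
r2_reversed = metrics.r2_score(y_true_toy, reversed_pred)

print(r2_perfect, r2_baseline, r2_reversed)  # 1.0 0.0 -3.0
```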