# Loss Functions

In this exercise, you will compare the effects of different Loss functions on a linear regression model trained with `SGDRegressor`.

👇 Let's download a CSV file to use for this challenge and parse it into a DataFrame.

In [7]:
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.model_selection import cross_validate
from sklearn.metrics import mean_squared_error

In [2]:
import pandas as pd

data = pd.read_csv("https://wagon-public-datasets.s3.amazonaws.com/05-Machine-Learning/04-Under-the-Hood/loss_functions_dataset.csv")
data.sample(5)

| | Relative Compactness | Surface Area | Wall Area | Roof Area | Overall Height | Glazing Area | Average Temperature |
|---|---|---|---|---|---|---|---|
| 593 | 0.79 | 637.0 | 343.0 | 147.0 | 7.0 | 0.4 | 41.255 |
| 476 | 0.62 | 808.5 | 367.5 | 220.5 | 3.5 | 0.25 | 14.87 |
| 294 | 0.9 | 563.5 | 318.5 | 122.5 | 7.0 | 0.25 | 32.775 |
| 678 | 0.9 | 563.5 | 318.5 | 122.5 | 7.0 | 0.4 | 37.31 |
| 134 | 0.66 | 759.5 | 318.5 | 220.5 | 3.5 | 0.1 | 13.165 |


🎯 Your task is to predict the average temperature inside a greenhouse based on its design. Your temperature predictions will help you select the appropriate greenhouse design for each plant, based on their climate needs. 

🌿 You know that plants can handle small temperature variations, but are exponentially more sensitive as the temperature variations increase. 

## 1. Theory 

❓ Theoretically, which Loss function would you train your model on to limit the risk of killing plants?

<details>
<summary> 🆘 Answer </summary>
    
In theory, you would use a Mean Squared Error (MSE) Loss function. Because each error is squared, large prediction errors are penalized far more heavily than small ones, so training pushes the model away from committing large errors. This keeps temperature deviations smaller and lowers the risk for the plants.

</details>

Mean Squared Error (MSE) loss function
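To see why MSE fits this goal, here is a small numeric sketch with made-up temperature errors in °C (hypothetical values, not taken from the dataset): two error vectors with the same mean absolute error, where only MSE distinguishes the one containing a single large miss.

```python
import numpy as np

def mse(residuals: np.ndarray) -> float:
    """Mean squared error of a residual vector (y_true - y_pred)."""
    return float(np.mean(residuals ** 2))

def mae(residuals: np.ndarray) -> float:
    """Mean absolute error of a residual vector."""
    return float(np.mean(np.abs(residuals)))

# Hypothetical prediction errors in °C with the same total absolute error:
# "spread" makes four moderate mistakes, "spiky" makes one large mistake.
spread = np.array([2.0, -2.0, 2.0, -2.0])
spiky = np.array([0.0, 0.0, 0.0, 8.0])

print(mae(spread), mae(spiky))  # 2.0 2.0  -> MAE cannot tell them apart
print(mse(spread), mse(spiky))  # 4.0 16.0 -> MSE punishes the single big error
```

The squaring is what makes a single 8 °C miss cost four times as much as four 2 °C misses under MSE, while MAE treats them as equivalent.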

## 2. Application

### 2.1 Preprocessing

❓ Standardise the features

In [30]:
# Select only the feature columns (from 'Relative Compactness' to 'Glazing Area')
X = data.loc[:, 'Relative Compactness':'Glazing Area']
y = data['Average Temperature']

# Fit the scaler on the features
scaler = StandardScaler().fit(X)

# Standardise the features and keep the original column names
X_scaled = pd.DataFrame(scaler.transform(X), columns=X.columns)
X_scaled

| | Relative Compactness | Surface Area | Wall Area | Roof Area | Overall Height | Glazing Area |
|---|---|---|---|---|---|---|
| 0 | 2.041777 | -1.785875 | -0.561951 | -1.470077 | 1.0 | -1.760447 |
| 1 | 2.041777 | -1.785875 | -0.561951 | -1.470077 | 1.0 | -1.760447 |
| 2 | 2.041777 | -1.785875 | -0.561951 | -1.470077 | 1.0 | -1.760447 |
| 3 | 2.041777 | -1.785875 | -0.561951 | -1.470077 | 1.0 | -1.760447 |
| 4 | 1.284979 | -1.229239 | 0.000000 | -1.198678 | 1.0 | -1.760447 |
| ... | ... | ... | ... | ... | ... | ... |
| 763 | -1.174613 | 1.275625 | 0.561951 | 0.972512 | -1.0 | 1.244049 |
| 764 | -1.363812 | 1.553943 | 1.123903 | 0.972512 | -1.0 | 1.244049 |
| 765 | -1.363812 | 1.553943 | 1.123903 | 0.972512 | -1.0 | 1.244049 |
| 766 | -1.363812 | 1.553943 | 1.123903 | 0.972512 | -1.0 | 1.244049 |
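What `StandardScaler` does to each column can be checked by hand. The sketch below uses a hypothetical two-column toy matrix (not the greenhouse data) and compares the scaler's output with the formula (x − mean) / std, where std is the population standard deviation (`ddof=0`).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy feature matrix with two columns on very different scales.
X_toy = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 60.0]])

# Manual standardisation: subtract each column's mean and divide by its
# population standard deviation (np.std uses ddof=0 by default).
manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

scaled = StandardScaler().fit_transform(X_toy)

print(np.allclose(manual, scaled))  # True
# Each standardised column now has mean ~0 and standard deviation ~1.
print(scaled.mean(axis=0), scaled.std(axis=0))
```

Putting all features on the same scale matters here because SGD takes one step size for every coefficient; unscaled columns such as `Surface Area` (hundreds) and `Glazing Area` (fractions) would otherwise receive very uneven updates.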


### 2.2 Modeling

In this section, you are going to verify the theory by evaluating models optimized on different Loss functions.

### Least Squares (MSE) Loss

❓ **10-Fold Cross-validate** a Linear Regression model optimized by **Stochastic Gradient Descent** (SGD) on a **Least Squares Loss** (MSE)



In [32]:
# Stochastic Gradient Descent on a squared (least squares / MSE) loss
sgd_model = SGDRegressor(loss='squared_error')

# 10-fold cross-validation, recording the max error and R2 of each fold
sgd_model_cv = cross_validate(
    sgd_model,
    X_scaled, y,
    cv=10,
    scoring=['max_error', 'r2']
)

sgd_model_cv

{'fit_time': array([0.01019287, 0.00435781, 0.00388384, 0.00347996, 0.00396228,
        0.00383592, 0.00369716, 0.00315309, 0.00274611, 0.00252986]),
 'score_time': array([0.00139308, 0.0008719 , 0.00072908, 0.00082374, 0.00079894,
        0.00070214, 0.00067496, 0.00046992, 0.00047207, 0.00046396]),
 'test_max_error': array([-9.77581446, -8.69455311, -8.81116796, -9.23945736, -8.8429631 ,
        -8.60759781, -8.55415236, -8.80749787, -8.34914959, -7.72541889]),
 'test_r2': array([0.78766352, 0.90942944, 0.89579133, 0.88354452, 0.93136952,
        0.89663737, 0.92720915, 0.9159022 , 0.89542898, 0.93836021])}

❓ Compute 
- the mean cross-validated R2 score and save it in the variable `r2`
- the single biggest prediction error in °C across all your folds and save it in the variable `max_error`

(Tip: `max_error` is an accepted scoring metric in sklearn.)

In [34]:
r2 = sgd_model_cv['test_r2'].mean()
r2

0.898133625112802
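Each fold's R2 compares the model's squared errors with those of a constant prediction at the mean of `y`: R2 = 1 − SS_res / SS_tot. A minimal sketch with hypothetical temperatures (not from the dataset) that computes it by hand and checks it against `sklearn.metrics.r2_score`:

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical true and predicted temperatures in °C.
y_true = np.array([14.0, 20.0, 33.0, 41.0])
y_pred = np.array([15.0, 19.0, 30.0, 42.0])

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares: 12
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares: 450
r2_manual = 1 - ss_res / ss_tot                  # 1 - 12/450 ≈ 0.973

print(r2_manual, r2_score(y_true, y_pred))
```

The `r2` value above is simply the mean of ten such per-fold scores.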

In [38]:
max_error = abs(sgd_model_cv['test_max_error']).max()
max_error


9.775814455714215
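The `abs()` above is needed because scikit-learn scorers follow a "greater is better" convention, so error metrics such as `max_error` come back negated from `cross_validate`. A small sketch on a hypothetical four-point dataset shows the sign flip. It imports `sklearn.metrics` as a module rather than `from sklearn.metrics import max_error`, so it does not overwrite the notebook's `max_error` variable.

```python
import numpy as np
from sklearn import metrics
from sklearn.linear_model import LinearRegression

# Hypothetical data that a straight line cannot fit exactly.
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = np.array([0.0, 1.0, 2.0, 5.0])

toy_model = LinearRegression().fit(X_toy, y_toy)
y_pred = toy_model.predict(X_toy)

raw = metrics.max_error(y_toy, y_pred)                             # largest |y_true - y_pred|, >= 0
scored = metrics.get_scorer("max_error")(toy_model, X_toy, y_toy)  # same magnitude, negated

print(raw, scored)  # raw ≈ 0.8, scored ≈ -0.8
```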

### Mean Absolute Error (MAE) Loss

What if we optimize our model on the MAE instead?

❓ **10-Fold Cross-validate** a Linear Regression model optimized by **Stochastic Gradient Descent** (SGD) on a **MAE** Loss

<details>
<summary>💡 Hints</summary>

- MAE loss cannot be directly specified in `SGDRegressor`. It must be engineered by adjusting the right parameters

</details>

In [39]:
# MAE loss engineered by setting loss='epsilon_insensitive' with epsilon=0
# (no .fit() here: cross_validate clones the estimator and fits it on each fold)
lin_reg_sgd = SGDRegressor(loss='epsilon_insensitive', epsilon=0)
lin_reg_sgd
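Why `epsilon=0` turns this into an MAE loss: the per-sample epsilon-insensitive loss is max(0, |r| − ε), where r is the residual. A minimal NumPy sketch of that formula (the helper `epsilon_insensitive` is hypothetical, written for illustration, not part of scikit-learn's public API):

```python
import numpy as np

def epsilon_insensitive(residuals: np.ndarray, epsilon: float) -> np.ndarray:
    """Hypothetical helper: per-sample loss max(0, |r| - epsilon)."""
    return np.maximum(0.0, np.abs(residuals) - epsilon)

# Hypothetical residuals (y_true - y_pred) in °C.
r = np.array([-3.0, -0.5, 0.0, 0.2, 4.0])

# With epsilon > 0, errors inside the ±epsilon band cost nothing.
print(epsilon_insensitive(r, epsilon=0.1))
# With epsilon = 0, the loss is exactly |r|, i.e. the absolute error behind MAE.
print(epsilon_insensitive(r, epsilon=0.0))
```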



❓ Compute 
- the mean cross-validated R2 score, store it in `r2_mae`
- the single biggest prediction error across all your folds, store it in `max_error_mae`

In [42]:
mae_sgd_cv = cross_validate(
    lin_reg_sgd,
    X_scaled, y,
    cv=10,
    scoring=['max_error','r2']
)
mae_sgd_cv

{'fit_time': array([0.0133183 , 0.00608206, 0.0082581 , 0.00547814, 0.00576282,
        0.00540614, 0.00449514, 0.00572491, 0.00466585, 0.00473332]),
 'score_time': array([0.00175691, 0.00095892, 0.0012269 , 0.00095224, 0.00069904,
        0.00066781, 0.00082684, 0.00061512, 0.00054717, 0.00053668]),
 'test_max_error': array([-11.11595318, -10.60064642, -10.68871433, -11.23562693,
        -11.14611541, -10.95838664, -10.78674204, -11.18118408,
        -10.95023597, -10.16557683]),
 'test_r2': array([0.74236402, 0.87716769, 0.87361583, 0.84528871, 0.91735772,
        0.87358398, 0.91809089, 0.89846051, 0.87831834, 0.93535156])}

In [43]:
r2_mae = mae_sgd_cv['test_r2'].mean()
r2_mae

0.8759599249527742

In [46]:
max_error_mae = abs(mae_sgd_cv['test_max_error']).max()

In [47]:
max_error_mae

11.235626925129441

## 3. Conclusion

❓Which of the models you evaluated seems the most appropriate for your task?

<details>
<summary> 🆘Answer </summary>
    
Although the mean cross-validated R2 scores of the two models are similar (about 0.898 for MSE vs 0.876 for MAE in the run above), the model optimized on MAE is more likely to make occasional large mistakes (worst error of about 11.2 °C vs 9.8 °C), increasing the risk of killing plants!

    
</details>
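One way to see why the MAE-trained model tolerates larger mistakes is to compare how strongly each loss pushes on a prediction. Using the 0.5·r² convention for the squared loss (an assumption for this sketch), the gradient with respect to the prediction grows with the residual r, while for the absolute loss it has the same size for every nonzero error. Residuals below are hypothetical.

```python
import numpy as np

# Hypothetical residuals r = y_true - y_pred in °C.
r = np.array([0.5, -1.0, 3.0, 10.0])

# Gradient of each per-sample loss with respect to y_pred:
#   squared loss 0.5 * r**2 -> -r        (a 10 °C miss pulls 20x harder than 0.5 °C)
#   absolute loss |r|       -> -sign(r)  (every miss pulls with the same strength)
grad_squared = -r
grad_absolute = -np.sign(r)

print(grad_squared)
print(grad_absolute)
```

Under SGD, the squared loss therefore spends most of its updates correcting the worst predictions, which matches the smaller max error observed for the MSE model.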

# 🏁 Check your code and push your notebook

In [48]:
from nbresult import ChallengeResult

result = ChallengeResult(
    'loss_functions',
    r2 = r2,
    r2_mae = r2_mae,
    max_error = max_error,
    max_error_mae = max_error_mae
)

result.write()
print(result.check())


platform darwin -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0 -- /Users/orchidaung/.pyenv/versions/lewagon/bin/python3
cachedir: .pytest_cache
rootdir: /Users/orchidaung/code/NwayEi/data-loss-functions/tests
plugins: asyncio-0.19.0, anyio-3.6.2
asyncio: mode=strict
[1mcollecting ... [0mcollected 3 items

test_loss_functions.py::TestLossFunctions::test_max_error_order [32mPASSED[0m[32m   [ 33%][0m
test_loss_functions.py::TestLossFunctions::test_r2 [32mPASSED[0m[32m                [ 66%][0m
test_loss_functions.py::TestLossFunctions::test_r2_mae [32mPASSED[0m[32m            [100%][0m



💯 You can commit your code:

[1;32mgit[39m add tests/loss_functions.pickle

[32mgit[39m commit -m [33m'Completed loss_functions step'[39m

[32mgit[39m push origin master

