# Stochastic Gradient Descent

In this exercise, you will compare Batch Gradient Descent and Stochastic Gradient Descent on a binary classification task, looking at both training time and classification performance.

## 1. Data Exploration

👇 Import the dataset `data.csv` located in the exercise folder

```python
import pandas as pd

df = pd.read_csv("data.csv")

df.head()
```

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | sulphates | alcohol | quality rating |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12.55 | 12.73 | 10.92 | 10.12 | 9.56 | 10.46 | 9.82 | 10.54 | 9.81 | 9.17 | 2 |
| 1 | 10.67 | 13.67 | 11.74 | 9.06 | 13.47 | 9.66 | 10.31 | 9.05 | 10.51 | 11.75 | 1 |
| 2 | 10.6 | 10.34 | 9.9 | 8.44 | 11.01 | 10.94 | 6.9 | 8.13 | 10.2 | 9.9 | 2 |
| 3 | 9.54 | 13.5 | 12.35 | 8.78 | 14.72 | 9.06 | 9.67 | 10.15 | 11.17 | 12.17 | 3 |
| 4 | 12.01 | 12.38 | 11.32 | 11.41 | 8.64 | 11.2 | 11.63 | 9.83 | 10.35 | 8.82 | 1 |


- Each row of the dataset describes one wine.
- The features describe properties of the wines.
- The target, `quality rating`, is a score given by an expert.

👇 Check how many observations make up the dataset

```python
len(df)
```

```
500000
```

👇 Check the range of quality ratings

```python
df['quality rating'].unique()
```

```
array([ 2,  1,  3,  5,  4,  6,  9, 10,  8,  7])
```

👇 Check the number of observations for each quality rating

```python
df['quality rating'].value_counts()
```

```
5     50479
10    50426
6     50195
8     50144
2     50093
1     49872
4     49825
3     49660
9     49658
7     49648
Name: quality rating, dtype: int64
```
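Raw counts are hard to compare at a glance. A minimal sketch of the same check expressed as proportions, using `value_counts(normalize=True)` on a small toy Series (the values are made up for illustration, not taken from the wine data). On the wine data each of the ten ratings would come out at roughly 0.10, which confirms the classes are balanced.

```python
import pandas as pd

# Toy ratings (illustrative only, not the wine data)
ratings = pd.Series([1, 1, 2, 3], name="quality rating")

# normalize=True returns the share of each value instead of the raw count
shares = ratings.value_counts(normalize=True)
print(shares)
```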

## 2. Data Preprocessing

👇 Turn the target into a binary label: quality ratings below 6 are bad (0), and ratings of 6 and above are good (1).

```python
# pd.cut uses right-closed bins: (0, 5] -> 0 (bad), (5, 10] -> 1 (good)
df['binary_quality'] = pd.cut(x=df['quality rating'],
                              bins=[0, 5, 10],
                              labels=[0, 1])
df.head()
```

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | sulphates | alcohol | quality rating | binary_quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12.55 | 12.73 | 10.92 | 10.12 | 9.56 | 10.46 | 9.82 | 10.54 | 9.81 | 9.17 | 2 | 0 |
| 1 | 10.67 | 13.67 | 11.74 | 9.06 | 13.47 | 9.66 | 10.31 | 9.05 | 10.51 | 11.75 | 1 | 0 |
| 2 | 10.6 | 10.34 | 9.9 | 8.44 | 11.01 | 10.94 | 6.9 | 8.13 | 10.2 | 9.9 | 2 | 0 |
| 3 | 9.54 | 13.5 | 12.35 | 8.78 | 14.72 | 9.06 | 9.67 | 10.15 | 11.17 | 12.17 | 3 | 0 |
| 4 | 12.01 | 12.38 | 11.32 | 11.41 | 8.64 | 11.2 | 11.63 | 9.83 | 10.35 | 8.82 | 1 | 0 |
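`pd.cut` with `bins=[0, 5, 10]` builds the right-closed intervals (0, 5] and (5, 10], so ratings 1 to 5 map to 0 and ratings 6 to 10 map to 1. A quick sketch on a toy Series (illustrative values, not the wine data) checking that this matches a plain threshold comparison, which also yields an integer column rather than a categorical one:

```python
import pandas as pd

# Toy ratings covering the full 1-10 range (illustrative data)
ratings = pd.Series(range(1, 11), name="quality rating")

# Same binning as the exercise: (0, 5] -> 0, (5, 10] -> 1
binned = pd.cut(ratings, bins=[0, 5, 10], labels=[0, 1]).astype(int)

# Equivalent threshold rule: rating >= 6 -> 1, otherwise 0
thresholded = (ratings >= 6).astype(int)

print(binned.tolist())
print((binned == thresholded).all())
```

One difference worth knowing: a rating outside the bins (for example 0) becomes `NaN` with `pd.cut`, whereas the threshold rule silently maps it to 0.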


👇 Check the class balance of the new binary target

```python
df['binary_quality'].value_counts()
```

```
1    250071
0    249929
Name: binary_quality, dtype: int64
```

👇 Scale the features using `MinMaxScaler`

```python
from sklearn.preprocessing import MinMaxScaler

# Select only the features
X = df.loc[:, 'fixed acidity':'alcohol']

# Fit scaler to features
scaler = MinMaxScaler().fit(X)

# Scale features
X_scaled = scaler.transform(X)
```
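`MinMaxScaler` rescales each column independently to [0, 1] with x' = (x - min) / (max - min), where min and max are learned from the data passed to `fit`. A small sketch on a made-up 3×2 array, computing the formula by hand and comparing it with scikit-learn:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up feature matrix: 3 rows, 2 columns
X = np.array([[10.0, 8.0],
              [12.0, 9.0],
              [14.0, 13.0]])

# Column-wise min-max formula
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# scikit-learn equivalent
sk = MinMaxScaler().fit_transform(X)

print(manual)
print(np.allclose(manual, sk))
```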

## 3. Batch Gradient Descent

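👇 Run 10-fold cross-validation on a `LogisticRegression` model optimized by Batch Gradient Descent (strictly speaking, scikit-learn's default `lbfgs` solver is a quasi-Newton method, but like Batch Gradient Descent it uses the whole training set at every iteration). Return the following metrics: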

- Accuracy
- Precision
- Recall
- F1 

```python
from sklearn.model_selection import cross_validate
from sklearn.linear_model import LogisticRegression

log_model = LogisticRegression()

cv_log = cross_validate(log_model,
                        X_scaled, df.binary_quality,
                        cv=10,
                        scoring=['accuracy', 'precision', 'recall', 'f1'])
```
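Because the scaler is fit on all 500,000 rows before `cross_validate` splits them, each test fold has contributed to the min/max used to scale it. With this much data the effect is likely negligible, but wrapping the scaler and the model in a `Pipeline` keeps the folds fully separate. A sketch on synthetic data from `make_classification` (an assumption standing in for `X` and `df.binary_quality`):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the wine data (assumption: 10 features, binary target)
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# The scaler is refit on each training fold only, so test folds never influence it
pipe = make_pipeline(MinMaxScaler(), LogisticRegression())

metrics = ["accuracy", "precision", "recall", "f1"]
cv_pipe = cross_validate(pipe, X, y, cv=10, scoring=metrics)

for metric in metrics:
    print(metric, round(float(cv_pipe[f"test_{metric}"].mean()), 3))
```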

❓ What is the model's total training time?

```python
import numpy as np

# Total fit time over the 10 folds, in seconds
np.sum(cv_log['fit_time'])
```

```
26.105647563934326
```

## 4. Stochastic Gradient Descent

👇 Run 10-fold cross-validation on the same model, this time optimized by Stochastic Gradient Descent. Return the following metrics:

- Accuracy
- Precision
- Recall
- F1 

[SGDClassifier documentation](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html)


<details>
<summary>💡 Hint</summary>
Set the `loss` parameter so that the model optimizes the logistic regression (log) loss.
</details>



```python
from sklearn.linear_model import SGDClassifier

# Logistic (log) loss; this option was named "log" before scikit-learn 1.1 and removed in 1.3
sgd_model = SGDClassifier(loss="log_loss")

cv_sgd = cross_validate(sgd_model,
                        X_scaled, df.binary_quality,
                        cv=10,
                        scoring=['accuracy', 'precision', 'recall', 'f1'])
```

❓ What is the model's total training time?

```python
np.sum(cv_sgd['fit_time'])
```

```
8.8377046585083
```

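The SGD model trains about three times faster here (8.8 s versus 26.1 s over the 10 folds). Each SGD update uses a single observation instead of the whole training set, so updates are cheap, and on a large dataset like this one a few passes (epochs) over the data are usually enough to converge.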
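To make the difference concrete, here is a minimal NumPy sketch of both update schemes for logistic regression. It assumes no intercept, fixed learning rates and synthetic data, and it is an illustration of plain batch GD versus SGD, not how scikit-learn implements either solver. Batch GD makes one update per epoch from the gradient over all rows; SGD makes one update per row, so one epoch contains `n_samples` cheap updates.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(w, X, y):
    # Mean logistic loss, with probabilities clipped to avoid log(0)
    p = np.clip(sigmoid(X @ w), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def batch_gd(X, y, lr=0.5, epochs=20):
    w = np.zeros(X.shape[1])
    updates = 0
    for _ in range(epochs):
        # One update per epoch, using the gradient averaged over ALL samples
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
        updates += 1
    return w, updates

def sgd(X, y, lr=0.05, epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    updates = 0
    for _ in range(epochs):
        # One update per sample, visiting samples in a new shuffled order each epoch
        for i in rng.permutation(len(y)):
            grad = X[i] * (sigmoid(X[i] @ w) - y[i])
            w -= lr * grad
            updates += 1
    return w, updates

# Synthetic, linearly separable data (assumption)
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = (X @ np.array([2.0, -1.0, 0.5]) > 0).astype(float)

w_b, n_b = batch_gd(X, y)
w_s, n_s = sgd(X, y)
print("batch updates:", n_b, "loss:", round(log_loss(w_b, X, y), 3))
print("SGD updates:  ", n_s, "loss:", round(log_loss(w_s, X, y), 3))
```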

## 5. Model Selection

❓ Your task is to recommend wines to returning customers who are unhappy with your last suggestions. Your supervisor warns you that you had better get your recommendations right this time. Which model would you choose to guide your decision? Recommending a wine that turns out to be bad is a false positive, so compare the models on precision.

```python
cv_log['test_precision'].mean()
```

```
0.8729888563990306
```

```python
cv_sgd['test_precision'].mean()
```

```
0.8764951507633173
```
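Precision is TP / (TP + FP): of the wines the model calls good, the fraction that really are good. A tiny worked example with made-up labels, computing it by hand and with `sklearn.metrics.precision_score`:

```python
import numpy as np
from sklearn.metrics import precision_score

# Made-up labels for illustration: 1 = good wine, 0 = bad wine
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # predicted good, actually good
fp = np.sum((y_pred == 1) & (y_true == 0))  # predicted good, actually bad
manual = tp / (tp + fp)

print(manual, precision_score(y_true, y_pred))  # 0.75 0.75
```

The two mean precisions above differ by only about 0.0035, so it is worth checking `cv_log['test_precision'].std()` and `cv_sgd['test_precision'].std()` before reading much into the gap.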

## 6. Prediction

❓ Using the model you deemed appropriate for the task, would you recommend the following wine to a client?

```python
new_data = pd.read_csv('new_data.csv')

new_data
```

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | sulphates | alcohol |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9.54 | 13.5 | 12.35 | 8.78 | 14.72 | 9.06 | 9.67 | 10.15 | 11.17 | 12.17 |


👇 Compute your answer

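```python
# Refit the SGD model (the one with the better mean precision) on the full dataset
sgd_model.fit(X_scaled, df['binary_quality'])

# Keep the same feature columns as in training, then scale them with the scaler fitted earlier
new_data_scaled = scaler.transform(new_data.loc[:, 'fixed acidity':'alcohol'])

# Predict the quality class of the new wine (0 = bad, 1 = good)
sgd_model.predict(new_data_scaled)
```

```
array([0])
```

The model classifies this wine as bad (0), so it should not be recommended.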

### ⚠️ Please push your exercise when you are done 🙃

# 🏁 