# Solvers ⚙️

In this exercise, you will investigate the effects of different `solvers` on `LogisticRegression` models.

👇 Run the code below to import the dataset

## Imports 

In [1]:
import pandas as pd
from sklearn.preprocessing import RobustScaler, MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.linear_model import SGDClassifier

In [2]:
df = pd.read_csv("https://wagon-public-datasets.s3.amazonaws.com/05-Machine-Learning/04-Under-the-Hood/solvers_dataset.csv")
df.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,sulphates,alcohol,quality rating
0,9.47,5.97,7.36,10.17,6.84,9.15,9.78,9.52,10.34,8.8,6
1,10.05,8.84,9.76,8.38,10.15,6.91,9.7,9.01,9.23,8.8,7
2,10.59,10.71,10.84,10.97,9.03,10.42,11.46,11.25,11.34,9.06,4
3,11.0,8.44,8.32,9.65,7.87,10.92,6.97,11.07,10.66,8.89,8
4,12.12,13.44,10.35,9.95,11.09,9.38,10.22,9.04,7.68,11.38,3


- The dataset consists of different wines 🍷
- The features describe different properties of the wines 
- The target 🎯 is a quality rating given by an expert

## Target engineering

In this section, you are going to transform the ratings into a binary target.

👇 How many observations are there for each rating?

In [3]:
df["quality rating"].value_counts()

quality rating
10    10143
5     10124
1     10090
2     10030
8      9977
6      9961
9      9955
7      9954
4      9928
3      9838
Name: count, dtype: int64

❓ Create `y` by transforming the target into a binary classification task where quality ratings below 6 are bad [0], and ratings of 6 and above are good [1]

In [4]:
y = df["quality rating"].apply(lambda x : 0 if x < 6 else 1)
y

0        1
1        1
2        0
3        1
4        0
        ..
99995    1
99996    1
99997    0
99998    1
99999    0
Name: quality rating, Length: 100000, dtype: int64
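
The same binarisation can also be written as a vectorised comparison instead of a row-by-row `apply`. The sketch below is a minimal illustration on a toy series; `ratings` and `y_toy` are hypothetical names, and the five values simply mirror the first five ratings shown above.

```python
import pandas as pd

# Toy ratings standing in for df["quality rating"] (hypothetical stand-in data)
ratings = pd.Series([6, 7, 4, 8, 3], name="quality rating")

# Ratings of 6 and above become 1 (good), everything below 6 becomes 0 (bad)
y_toy = (ratings >= 6).astype(int)
print(y_toy.tolist())
```

On a 100,000-row column the vectorised form usually runs faster than `apply` with a Python lambda, and it should give the same result.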

❓ Check the class balance of the new binary target

In [5]:
y.value_counts()

quality rating
0    50010
1    49990
Name: count, dtype: int64

❓ Create your `X` by normalising the features. This will allow for fair comparison of different solvers.

In [6]:
X = df.drop(columns="quality rating")
X

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,sulphates,alcohol
0,9.47,5.97,7.36,10.17,6.84,9.15,9.78,9.52,10.34,8.80
1,10.05,8.84,9.76,8.38,10.15,6.91,9.70,9.01,9.23,8.80
2,10.59,10.71,10.84,10.97,9.03,10.42,11.46,11.25,11.34,9.06
3,11.00,8.44,8.32,9.65,7.87,10.92,6.97,11.07,10.66,8.89
4,12.12,13.44,10.35,9.95,11.09,9.38,10.22,9.04,7.68,11.38
...,...,...,...,...,...,...,...,...,...,...
99995,6.93,4.49,8.25,8.60,9.41,11.07,8.38,10.89,12.42,8.99
99996,10.57,9.56,9.83,8.98,9.77,10.04,10.87,11.28,9.57,8.97
99997,10.23,10.98,11.74,11.76,8.87,9.03,9.93,9.86,10.04,8.66
99998,7.25,3.95,7.03,8.90,8.49,9.75,11.45,10.21,8.32,9.44


In [7]:
r_scaler = RobustScaler() 
r_scaler.fit(X) 
X_scaled_r = pd.DataFrame(r_scaler.transform(X), columns=r_scaler.get_feature_names_out())
X_scaled_r

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,sulphates,alcohol
0,-0.884354,-1.231405,-1.252294,0.410526,-1.088,-0.634328,-0.162963,-0.358209,0.251852,-0.282443
1,-0.489796,-0.440771,-0.151376,-0.531579,0.236,-2.305970,-0.222222,-0.738806,-0.570370,-0.282443
2,-0.122449,0.074380,0.344037,0.831579,-0.212,0.313433,1.081481,0.932836,0.992593,-0.083969
3,0.156463,-0.550964,-0.811927,0.136842,-0.676,0.686567,-2.244444,0.798507,0.488889,-0.213740
4,0.918367,0.826446,0.119266,0.294737,0.612,-0.462687,0.162963,-0.716418,-1.718519,1.687023
...,...,...,...,...,...,...,...,...,...,...
99995,-2.612245,-1.639118,-0.844037,-0.415789,-0.060,0.798507,-1.200000,0.664179,1.792593,-0.137405
99996,-0.136054,-0.242424,-0.119266,-0.215789,0.084,0.029851,0.644444,0.955224,-0.318519,-0.152672
99997,-0.367347,0.148760,0.756881,1.247368,-0.276,-0.723881,-0.051852,-0.104478,0.029630,-0.389313
99998,-2.394558,-1.787879,-1.403670,-0.257895,-0.428,-0.186567,1.074074,0.156716,-1.244444,0.206107


In [8]:
mmscaler = MinMaxScaler().fit(X)
X_scaled_m = pd.DataFrame(mmscaler.transform(X), columns=mmscaler.get_feature_names_out())
X_scaled_m

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,sulphates,alcohol
0,0.531348,0.285244,0.265966,0.504968,0.229879,0.363248,0.451878,0.432173,0.557503,0.413523
1,0.576803,0.420113,0.459984,0.343270,0.412348,0.123932,0.442488,0.370948,0.435926,0.413523
2,0.619122,0.507989,0.547292,0.577236,0.350606,0.498932,0.649061,0.639856,0.667032,0.432028
3,0.651254,0.401316,0.343573,0.457995,0.286659,0.552350,0.122066,0.618247,0.592552,0.419929
4,0.739028,0.636278,0.507680,0.485095,0.464168,0.387821,0.503521,0.374550,0.266156,0.597153
...,...,...,...,...,...,...,...,...,...,...
99995,0.332288,0.215695,0.337914,0.363144,0.371555,0.568376,0.287559,0.596639,0.785323,0.427046
99996,0.617555,0.453947,0.465643,0.397471,0.391400,0.458333,0.579812,0.643457,0.473165,0.425623
99997,0.590909,0.520677,0.620049,0.648600,0.341786,0.350427,0.469484,0.472989,0.524644,0.403559
99998,0.357367,0.190320,0.239289,0.390244,0.320838,0.427350,0.647887,0.515006,0.336254,0.459075
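
To make the difference between the two scalers concrete, here is a small sketch on a toy column (the array `toy` is an assumption for illustration, not part of the dataset). With default settings, `RobustScaler` subtracts the median and divides by the interquartile range, while `MinMaxScaler` maps the minimum to 0 and the maximum to 1.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, MinMaxScaler

# One toy feature with five values (hypothetical data)
toy = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

# RobustScaler: (x - median) / IQR, here median = 3 and IQR = 4 - 2 = 2
robust = RobustScaler().fit_transform(toy).ravel()

# MinMaxScaler: (x - min) / (max - min), here min = 1 and max = 5
minmax = MinMaxScaler().fit_transform(toy).ravel()

print(robust)
print(minmax)
```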


## LogisticRegression solvers

❓ Logistic Regression models can be optimized using different **solvers**. Compare the available solvers on:
- Fit time - which solver is **the fastest**?
- Precision - **how different** are their respective precision scores?

Available solvers for Logistic Regression are `['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']`
 
For more information on these 5 solvers, check out [this Stack Overflow thread](https://stackoverflow.com/questions/38640109/logistic-regression-python-solvers-defintions)

In [9]:
solvers = ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']
solvers_data = {}

for solver in solvers:
    # max_iter is raised so that every solver has room to converge
    model = LogisticRegression(solver=solver, max_iter=5000)
    cv_results = cross_validate(model, X_scaled_m, y, cv=10, scoring=["precision"])
    # Store [mean fit time in seconds, mean precision] for each solver
    solvers_data[solver] = [cv_results["fit_time"].mean(), cv_results["test_precision"].mean()]

print(solvers_data)

{'newton-cg': [0.483899188041687, 0.8742370014407493], 'lbfgs': [0.5039994239807128, 0.8742344676126175], 'liblinear': [0.29310040473937987, 0.8743399285518076], 'sag': [0.8540915012359619, 0.8742343485044115], 'saga': [1.6895635366439818, 0.8742370014407493]}
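
The raw dictionary is hard to read at a glance. One possible way to answer the two questions is to load it into a DataFrame and sort by fit time. The sketch below copies the values printed above so that it runs on its own; the names `comparison` and `precision_spread` are illustrative choices.

```python
import pandas as pd

# Results copied from the printed output above: [mean fit time (s), mean precision]
solvers_data = {
    'newton-cg': [0.483899188041687, 0.8742370014407493],
    'lbfgs': [0.5039994239807128, 0.8742344676126175],
    'liblinear': [0.29310040473937987, 0.8743399285518076],
    'sag': [0.8540915012359619, 0.8742343485044115],
    'saga': [1.6895635366439818, 0.8742370014407493],
}

# One row per solver, fastest first
comparison = pd.DataFrame.from_dict(
    solvers_data, orient="index", columns=["fit_time", "precision"]
).sort_values("fit_time")

# Gap between the best and the worst precision across solvers
precision_spread = comparison["precision"].max() - comparison["precision"].min()

print(comparison)
print(precision_spread)
```

In this run `liblinear` has the shortest fit time, and the precision scores differ only from the fourth decimal place onwards. Fit times depend on the machine, so the ordering may differ elsewhere.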


## Stochastic Gradient Descent

Logistic Regression models can also be optimized via Stochastic Gradient Descent.

❓ Evaluate a Logistic Regression model optimized via **Stochastic Gradient Descent**. How do its precision score and fit time compare to those of the models trained in the previous section?

In [10]:
model = SGDClassifier(loss="log_loss")
res = cross_validate(model, X_scaled_r, y, cv=10, scoring=["precision", "f1"])
fit_time = res["fit_time"].mean()
precision = res['test_precision'].mean()  
print(fit_time, precision)

0.28640387058258054 0.8733753492787535


☝️ The SGD model should have one of the shortest fit times (maybe even shorter than `liblinear`), for similar precision. This is a direct effect of Stochastic Gradient Descent updating the weights from a single row at a time, rather than computing the gradient over all of the training rows for every update.
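
The single-row update can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not how `SGDClassifier` is implemented: it uses a constant learning rate and no regularisation, and `sgd_logistic`, `X_toy` and `y_toy` are hypothetical names with synthetic data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_logistic(X, y, lr=0.1, epochs=5, seed=0):
    """Fit logistic regression weights with one update per row (plain SGD)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        # One epoch visits every row once, in a shuffled order
        for i in rng.permutation(len(X)):
            # Gradient of the log loss for this single row
            error = sigmoid(X[i] @ w + b) - y[i]
            w -= lr * error * X[i]
            b -= lr * error
    return w, b

# Synthetic, linearly separable toy data (assumption for illustration)
rng = np.random.default_rng(42)
X_toy = rng.normal(size=(200, 2))
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(int)

w, b = sgd_logistic(X_toy, y_toy)
accuracy = ((sigmoid(X_toy @ w + b) > 0.5).astype(int) == y_toy).mean()
print(w, b, accuracy)
```

Each update only touches one row, so the cost per step does not grow with the size of the dataset.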

## Predictions

❓ Use the best model (the best balance of short fit time and high precision) to predict the binary quality (0 or 1) of the following wine. Store your:
- `predicted_class`
- `predicted_proba_of_class` (i.e. the probability the model assigns to the class it predicted: if it predicted class 1, the probability of class 1; the value should be between 0 and 1)

In [11]:
new_wine = pd.read_csv('https://wagon-public-datasets.s3.amazonaws.com/05-Machine-Learning/04-Under-the-Hood/solvers_new_wine.csv')
new_wine

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,sulphates,alcohol
0,9.54,13.5,12.35,8.78,14.72,9.06,9.67,10.15,11.17,12.17


In [12]:
# Scale the new wine with the scaler already fitted on the training features
new_wine_scale = pd.DataFrame(r_scaler.transform(new_wine), columns=r_scaler.get_feature_names_out())
model = LogisticRegression(solver="lbfgs", max_iter=5000).fit(X_scaled_r, y)
predicted_class = model.predict(new_wine_scale)
# Probability of the predicted class, i.e. the largest probability in the row
predicted_proba_of_class = model.predict_proba(new_wine_scale)[0].max()
print(predicted_class, predicted_proba_of_class)

[0] 0.9686586300733363
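
The columns of `predict_proba` follow the order of `model.classes_`, so the probability of the predicted class can also be looked up explicitly. The sketch below shows this on a tiny toy problem; `X_small`, `y_small` and `sample` are hypothetical stand-ins for the wine data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny toy problem (hypothetical data): low values are class 0, high values class 1
X_small = np.array([[0.0], [1.0], [2.0], [3.0]])
y_small = np.array([0, 0, 1, 1])
sample = np.array([[2.5]])

clf = LogisticRegression().fit(X_small, y_small)

predicted_class = clf.predict(sample)[0]
probas = clf.predict_proba(sample)[0]

# Find the predict_proba column that corresponds to the predicted class
class_index = list(clf.classes_).index(predicted_class)
predicted_proba_of_class = probas[class_index]
print(predicted_class, predicted_proba_of_class)
```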


In [13]:
model = SGDClassifier(loss="log_loss", max_iter=5000).fit(X_scaled_r, y)
predicted_class = model.predict(new_wine_scale)
# Probability of the predicted class, i.e. the largest probability in the row
predicted_proba_of_class = model.predict_proba(new_wine_scale)[0].max()
print(predicted_class, predicted_proba_of_class)

[0] 0.9650464889101875
