<img src="https://drive.google.com/uc?id=1-cL5eOpEsbuIEkvwW2KnpXC12-PAbamr" style="Width:1000px">

# Solvers ⚙️

In this exercise, you will investigate the effects of different `solvers` on `LogisticRegression` models.

👇 Run the code below

In [1]:
from nbta.utils import download_data
download_data(id='1ZvgTmG7Hy5Ot_H1TO0iSsZH2itDhtLdB')

In [2]:
import pandas as pd

df = pd.read_csv("raw_data/data.csv")

df.head()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | sulphates | alcohol | quality rating |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9.47 | 5.97 | 7.36 | 10.17 | 6.84 | 9.15 | 9.78 | 9.52 | 10.34 | 8.8 | 6 |
| 1 | 10.05 | 8.84 | 9.76 | 8.38 | 10.15 | 6.91 | 9.7 | 9.01 | 9.23 | 8.8 | 7 |
| 2 | 10.59 | 10.71 | 10.84 | 10.97 | 9.03 | 10.42 | 11.46 | 11.25 | 11.34 | 9.06 | 4 |
| 3 | 11.0 | 8.44 | 8.32 | 9.65 | 7.87 | 10.92 | 6.97 | 11.07 | 10.66 | 8.89 | 8 |
| 4 | 12.12 | 13.44 | 10.35 | 9.95 | 11.09 | 9.38 | 10.22 | 9.04 | 7.68 | 11.38 | 3 |


- The dataset consists of different wines 🍷
- The features describe different properties of the wines 
- The target 🎯 is a quality rating given by an expert

## Target engineering

In this section, you are going to transform the ratings into a binary target.

👇 How many observations are there for each rating?

In [3]:
df['quality rating'].value_counts()

quality rating
10    10143
5     10124
1     10090
2     10030
8      9977
6      9961
9      9955
7      9954
4      9928
3      9838
Name: count, dtype: int64

👇 Create `y` by transforming the target into a binary classification task where quality ratings below 6 are bad [0], and ratings of 6 and above are good [1]

In [4]:
# Ratings below 6 become 0 (bad); ratings of 6 and above become 1 (good)
y = df['quality rating'].map(lambda x: 0 if x < 6 else 1)
y

0        1
1        1
2        0
3        1
4        0
        ..
99995    1
99996    1
99997    0
99998    1
99999    0
Name: quality rating, Length: 100000, dtype: int64

👇 Check the class balance of the new binary target

In [5]:
y.value_counts()

quality rating
0    50010
1    49990
Name: count, dtype: int64

Create your `X` by scaling the features. This will allow for fair comparison of different solvers.

In [6]:
from sklearn.preprocessing import MinMaxScaler

features_df = df.drop('quality rating', axis=1)
features = features_df.columns

scaler = MinMaxScaler().fit(features_df)
X = scaler.transform(features_df)
X

array([[0.53134796, 0.28524436, 0.26596605, ..., 0.43217287, 0.55750274,
        0.41352313],
       [0.57680251, 0.42011278, 0.45998383, ..., 0.37094838, 0.43592552,
        0.41352313],
       [0.61912226, 0.50798872, 0.54729184, ..., 0.63985594, 0.66703176,
        0.43202847],
       ...,
       [0.59090909, 0.52067669, 0.6200485 , ..., 0.4729892 , 0.52464403,
        0.40355872],
       [0.35736677, 0.19031955, 0.2392886 , ..., 0.515006  , 0.33625411,
        0.45907473],
       [0.82523511, 0.54934211, 0.37590946, ..., 0.6062425 , 0.44797371,
        0.47829181]])

In [7]:
pd.DataFrame(X).describe()

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 100000.0 | 100000.0 | 100000.0 | 100000.0 | 100000.0 | 100000.0 | 100000.0 | 100000.0 | 100000.0 | 100000.0 |
| mean | 0.612494 | 0.478552 | 0.479484 | 0.445078 | 0.399353 | 0.453899 | 0.477875 | 0.489762 | 0.520226 | 0.463207 |
| std | 0.102899 | 0.126472 | 0.123195 | 0.129879 | 0.111011 | 0.106351 | 0.117424 | 0.120383 | 0.109563 | 0.10143 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.559561 | 0.388628 | 0.392886 | 0.355014 | 0.324146 | 0.382479 | 0.399061 | 0.409364 | 0.445783 | 0.408541 |
| 50% | 0.633229 | 0.495301 | 0.486661 | 0.434508 | 0.379824 | 0.45406 | 0.4777 | 0.489796 | 0.520263 | 0.439858 |
| 75% | 0.674765 | 0.559211 | 0.569119 | 0.526649 | 0.461963 | 0.525641 | 0.557512 | 0.570228 | 0.593647 | 0.501779 |
| max | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
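
As a minimal sketch of what the scaling step does, the snippet below applies the min-max formula `(x - min) / (max - min)` by hand and compares it with `MinMaxScaler`. The `toy` array is a hypothetical example and is not taken from the wine dataset.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy feature matrix (hypothetical values, not from the wine dataset)
toy = np.array([[1.0, 10.0],
                [2.0, 30.0],
                [4.0, 20.0]])

# Manual min-max scaling, column by column: (x - min) / (max - min)
col_min = toy.min(axis=0)
col_max = toy.max(axis=0)
manual_scaled = (toy - col_min) / (col_max - col_min)

# The same transformation with scikit-learn
sklearn_scaled = MinMaxScaler().fit_transform(toy)

print(manual_scaled)
print(np.allclose(manual_scaled, sklearn_scaled))
```

Every column ends up in the range [0, 1], which matches the `min` and `max` rows of the `describe()` output above.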


## LogisticRegression solvers

👇 Logistic Regression models can be optimized using different **solvers**. Find out:
- Which is the `fastest_solver`?
- What can you say about their respective precision scores?

`solvers = ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']`
 
For more information on these 5 solvers, check out [this Stack Overflow thread](https://stackoverflow.com/questions/38640109/logistic-regression-python-solvers-defintions).

In [8]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
import numpy as np

solvers = ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']

for solver in solvers:
    model = LogisticRegression(solver=solver)
    cv = cross_validate(model, X, y, cv=5, scoring='precision')
    print(f"Time for {solver}:{np.mean(cv['fit_time'])}, Precision:{np.mean(cv['test_score'])}")

Time for newton-cg:0.21315593719482423, Precision:0.8743861578395332
Time for lbfgs:0.2950770378112793, Precision:0.8743887617071217
Time for liblinear:0.17984409332275392, Precision:0.8744491043503466
Time for sag:0.30013437271118165, Precision:0.8744398062395279
Time for saga:0.44750180244445803, Precision:0.8743861578395332


In [9]:
# YOUR ANSWER
fastest_solver = "liblinear"

<details>
    <summary>☝️ Intuition</summary>

All solvers should produce similar precision scores because our cost function is "easy" enough: it is convex, so it has a single global minimum, which all 5 solvers find. For very complex cost functions, such as those in Deep Learning, different solvers may stop at different values of the loss function.

</details> 
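
One way to check this intuition is to compare the fitted coefficients directly. The sketch below is an illustration on synthetic data (`make_classification` stands in for the wine features; the `max_iter` and `tol` values are assumptions chosen so that every solver converges). `liblinear` is left out because it also regularizes the intercept by default, so its optimum differs slightly from the others.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (hypothetical stand-in for the wine features)
X_toy, y_toy = make_classification(n_samples=2000, n_features=5,
                                   n_informative=3, n_redundant=0,
                                   random_state=0)
X_toy = StandardScaler().fit_transform(X_toy)

# Fit the same model with several solvers and keep the coefficients
coefs = {}
for solver in ['newton-cg', 'lbfgs', 'sag', 'saga']:
    clf = LogisticRegression(solver=solver, max_iter=5000, tol=1e-6,
                             random_state=0)
    clf.fit(X_toy, y_toy)
    coefs[solver] = clf.coef_.ravel()

# Largest absolute coefficient difference with respect to lbfgs
reference = coefs['lbfgs']
max_gap = {s: float(np.max(np.abs(c - reference))) for s, c in coefs.items()}
print(max_gap)
```

If the gaps are tiny, the solvers have reached essentially the same minimum, and differences between them come down to training time.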

###  🧪 Test your code

In [None]:
from nbresult import ChallengeResult

result = ChallengeResult('solvers',
                         fastest_solver=fastest_solver
                         )
result.write()
print(result.check())

## Stochastic Gradient Descent

Logistic Regression models can also be optimized via Stochastic Gradient Descent.

👇 Evaluate a Logistic Regression model optimized via **Stochastic Gradient Descent**. How do its precision score and training time compare to those of the models trained in the previous section (LogisticRegression solvers)?


<details>
<summary>💡 Hint</summary>

- If you are stuck, look at the [SGDClassifier doc](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html)!

</details>



In [10]:
from sklearn.linear_model import SGDClassifier

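# Note: the default loss is 'hinge' (a linear SVM). For a logistic regression,
# pass loss='log_loss' (named 'log' in scikit-learn versions before 1.1).
model = SGDClassifier()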
cv = cross_validate(model, X, y, cv=5, scoring='precision')
print(f"Time for SGD:{np.mean(cv['fit_time'])}, Precision:{np.mean(cv['test_score'])}")

Time for SGD:0.11465349197387695, Precision:0.8731077623394675


☝️ The SGD model should have the shortest training time while reaching a similar precision. This is a direct effect of computing each Gradient Descent update on a single data point rather than on the whole dataset.
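
To make the "one data point per update" idea concrete, here is a minimal NumPy sketch of SGD for a logistic regression. It is an illustration on synthetic data, not the algorithm `SGDClassifier` runs internally; the learning rate, number of epochs and `true_w` are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (hypothetical, not the wine dataset)
n_samples, n_features = 1000, 3
X_toy = rng.normal(size=(n_samples, n_features))
true_w = np.array([2.0, -1.0, 0.5])
noise = rng.normal(scale=0.5, size=n_samples)
y_toy = (X_toy @ true_w + noise > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(n_features)
b = 0.0
learning_rate = 0.1  # illustrative constant step size

for epoch in range(5):                    # one epoch = one full pass over the data
    for i in rng.permutation(n_samples):  # one parameter update per observation
        # Derivative of the log loss w.r.t. the linear output, for sample i only
        error = sigmoid(X_toy[i] @ w + b) - y_toy[i]
        w -= learning_rate * error * X_toy[i]
        b -= learning_rate * error

accuracy = np.mean((sigmoid(X_toy @ w + b) > 0.5) == y_toy)
print(f"weights: {w}, accuracy: {accuracy:.3f}")
```

Each update only touches one row of `X_toy`, so it is cheap; the trade-off is that individual steps are noisy compared with a full-batch gradient.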

## Predictions

👇 Use the best model to predict the binary quality (0 or 1) of the following wine. Store the results in:
- `predicted_class`
- `predicted_proba_of_class`

In [11]:
new_data = pd.read_csv('raw_data/new_data.csv')

new_data

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | sulphates | alcohol |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9.54 | 13.5 | 12.35 | 8.78 | 14.72 | 9.06 | 9.67 | 10.15 | 11.17 | 12.17 |


In [15]:
best_model = LogisticRegression(solver='liblinear')

best_model.fit(X,y)

new_data_sc = scaler.transform(new_data)

predicted_class = best_model.predict(new_data_sc)[0]

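# Take the probability of the predicted class (the classes are 0 and 1,
# so the class label is also the column index in predict_proba)
predicted_proba_of_class = best_model.predict_proba(new_data_sc)[0][predicted_class]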

print(f"This wine is of class {predicted_class} with a probability of {predicted_proba_of_class}. In other words, it's bad!")

This wine is of class 0 with a probability of 0.9669923040921434. In other words, it's bad!
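
The columns returned by `predict_proba` follow the order of the fitted model's `classes_` attribute. The sketch below shows one way to look up the probability of whichever class was predicted; the toy data and the variable names (`X_toy`, `sample`, `proba_of_pred`) are hypothetical and only serve the illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny toy problem (hypothetical data): one feature, two classes
X_toy = np.array([[0.0], [0.2], [0.4], [0.6], [0.8], [1.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression(solver='liblinear').fit(X_toy, y_toy)

sample = np.array([[0.9]])
pred = clf.predict(sample)[0]
probas = clf.predict_proba(sample)[0]

# Columns of predict_proba are ordered like clf.classes_
col = list(clf.classes_).index(pred)
proba_of_pred = probas[col]

print(clf.classes_, pred, proba_of_pred)
```

Looking the column up through `classes_` keeps the code correct even when the labels are not simply 0 and 1.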


# 🏁  Check your code

In [None]:
from nbresult import ChallengeResult

result = ChallengeResult('new_data_prediction',
    predicted_class=predicted_class,
    predicted_proba_of_class=predicted_proba_of_class
)
result.write()
print(result.check())

# 🏁 Finished!

Well done! <span style="color:teal">**Push your exercise to GitHub**</span>, and move on to the next one.