
# Multi-Class Logistic Regression

## Iris Flower Dataset

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Get the data

In [2]:
df = pd.read_csv("iris.csv")

In [5]:
X = df.drop(["species"], axis=1)
y = df["species"]
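If `iris.csv` is not at hand, the same 150 flowers ship with scikit-learn. A minimal sketch that builds an equivalent `X`/`y` from `load_iris`; the feature column names follow scikit-learn's naming and may differ from the CSV header, and the species labels may be spelled differently in some copies of the CSV:

```python
from sklearn.datasets import load_iris

# Load the bundled Iris data as pandas objects (features + integer target).
iris = load_iris(as_frame=True)

# Same four measurements; columns are named like "sepal length (cm)".
X = iris.data

# Map integer codes 0/1/2 to species names so y holds strings,
# like the "species" column read from the CSV.
y = iris.target.map(dict(enumerate(iris.target_names))).rename("species")

print(X.shape)           # (150, 4)
print(y.value_counts())  # 50 flowers of each species
```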

### Split and Scale

In [8]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

In [9]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [10]:
scaler = StandardScaler()
scaler.fit(X_train)

In [11]:
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
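`StandardScaler` learns each column's mean and standard deviation from the training split only, then applies z = (x - mean) / std to any data it transforms, which is why `fit` is called on `X_train` alone. A sketch that reproduces the transform by hand, using scikit-learn's bundled Iris data as a stand-in for `iris.csv`:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

scaler = StandardScaler().fit(X_train)

# Statistics come from the TRAINING split only.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# The same formula is then applied to the test split.
manual_test = (X_test - mu) / sigma

print(np.allclose(scaler.mean_, mu))                       # True
print(np.allclose(scaler.transform(X_test), manual_test))  # True
```

Only the scaled training columns are guaranteed to have mean 0 and standard deviation 1; the test columns come out close to that, not exactly.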

### Model and GridSearch

In [13]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

In [21]:
logistic_model = LogisticRegression(solver="saga", multi_class="ovr", max_iter=5000, verbose=0)
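`multi_class="ovr"` fits one binary logistic regression per class (that class against the other two) and predicts the class with the highest score. Recent scikit-learn releases deprecate the `multi_class` argument in favour of wrapping the estimator explicitly. A sketch of that equivalent form, fit on the bundled Iris data rather than the notebook's CSV:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

# One binary LogisticRegression per class, without relying on multi_class="ovr".
ovr_model = OneVsRestClassifier(LogisticRegression(solver="saga", max_iter=5000))
ovr_model.fit(X, y)

print(len(ovr_model.estimators_))  # 3: one binary classifier per species
# One score per class for each sample; predict() takes the argmax across them.
print(ovr_model.decision_function(X[:2]))
```

Inside `GridSearchCV`, the wrapped estimator's parameters would then be addressed with an `estimator__` prefix, for example `estimator__C`.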

### GridSearch parameters

In [22]:
penalty = ["l1","l2","elasticnet"]
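l1_ratio = np.linspace(0,1,20)  # L1/L2 mix; only read when penalty="elasticnet"
C = np.logspace(0,10,20)        # inverse regularization strength: 20 values from 1 to 1e10 (weak regularization only)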

param_grid = {
    "penalty" : penalty,
    "l1_ratio" : l1_ratio,
    "C" : C
}
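Because `l1_ratio` only has an effect when `penalty="elasticnet"`, the flat grid above refits identical `l1` and `l2` models once per `l1_ratio` value. A sketch of the same search space written as a list of sub-grids, with scikit-learn's `ParameterGrid` used only to count the candidates each way:

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

l1_ratio = np.linspace(0, 1, 20)
C = np.logspace(0, 10, 20)

# Flat grid, as in the notebook: every penalty is crossed with every l1_ratio.
flat_grid = {"penalty": ["l1", "l2", "elasticnet"], "l1_ratio": l1_ratio, "C": C}

# List of sub-grids: l1_ratio is only searched for the penalty that uses it.
split_grid = [
    {"penalty": ["l1", "l2"], "C": C},
    {"penalty": ["elasticnet"], "l1_ratio": l1_ratio, "C": C},
]

print(len(ParameterGrid(flat_grid)))   # 1200 = 3 * 20 * 20
print(len(ParameterGrid(split_grid)))  # 440  = 2 * 20 + 20 * 20
```

With `GridSearchCV`'s default 5-fold cross-validation that is 6000 versus 2200 fits. The list can be passed as `param_grid` as-is; `best_params_` then omits `l1_ratio` whenever the winning penalty is `l1` or `l2`.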

In [23]:
grid_model = GridSearchCV(logistic_model, param_grid=param_grid)
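`GridSearchCV` runs 5-fold cross-validation by default on the data it is given. Since `scaler` was fit on all of `X_train` beforehand, every validation fold has already contributed to the scaling statistics, a mild form of leakage. A sketch that moves the scaler into a `Pipeline` so it is refit on each training fold; the step name `logreg` and the reduced grid are illustrative choices, not the notebook's:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# The scaler is a pipeline step, so each CV split refits it on that split's training part only.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("logreg", LogisticRegression(solver="saga", max_iter=5000)),
])

# Pipeline parameters are addressed as "<step name>__<parameter>".
param_grid = {
    "logreg__penalty": ["l1", "l2"],
    "logreg__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid=param_grid, cv=5)
search.fit(X_train, y_train)  # raw, unscaled features go in

print(search.best_params_)
print(search.score(X_test, y_test))  # test accuracy of the refit best pipeline
```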

### Train

In [None]:
grid_model.fit(X_train_scaled, y_train)

In [26]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [27]:
grid_model.best_params_

{'C': 3.3598182862837818, 'l1_ratio': 0.0, 'penalty': 'l1'}

In [30]:
y_preds = grid_model.predict(X_test_scaled)

In [31]:
accuracy_score(y_test, y_preds)

1.0

In [32]:
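# scikit-learn's argument order is (y_true, y_pred): rows are true classes, columns are predictions
confusion_matrix(y_test, y_preds)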

array([[19,  0,  0],
       [ 0, 13,  0],
       [ 0,  0, 13]])
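`classification_report` is imported above but never used; it breaks the same predictions down into per-class precision, recall, and F1. A self-contained sketch on scikit-learn's bundled Iris data that refits a single model with the `C` and `penalty` reported by `best_params_` instead of rerunning the search. It is an assumption made for brevity that this stands in for `grid_model`: it uses the default multinomial loss rather than one-vs-rest, so its numbers need not match the notebook's exactly.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
scaler = StandardScaler().fit(X_train)

# Single model with the C and penalty reported by best_params_ above.
model = LogisticRegression(solver="saga", penalty="l1", C=3.36, max_iter=5000)
model.fit(scaler.transform(X_train), y_train)
y_preds = model.predict(scaler.transform(X_test))

cm = confusion_matrix(y_test, y_preds)
print(cm)
print(classification_report(y_test, y_preds, target_names=iris.target_names))
```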

### Solver

The "solver" in the context of logistic regression refers to the optimization algorithm used to find the parameters (weights and bias) that minimize the loss function. Fitting logistic regression means finding the maximum likelihood estimate of the parameters (a penalized estimate when regularization is used). There is no closed-form solution, so the estimate is computed iteratively by one of several optimization algorithms.
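For reference, the objective being minimized, following the binary-case formulation in scikit-learn's user guide with labels $y_i \in \{-1, 1\}$. The penalty term $r(w)$ is selected by `penalty`, and $C$ multiplies the data-fit term, so a smaller $C$ means stronger regularization:

$$
\min_{w,\,c}\; r(w) + C \sum_{i=1}^{n} \log\bigl(1 + \exp(-y_i (x_i^\top w + c))\bigr),
\qquad
r(w) =
\begin{cases}
\lVert w \rVert_1 & \texttt{l1} \\
\tfrac{1}{2}\lVert w \rVert_2^2 & \texttt{l2} \\
\tfrac{1-\rho}{2}\lVert w \rVert_2^2 + \rho \lVert w \rVert_1 & \texttt{elasticnet},\ \rho = \texttt{l1\_ratio} \\
0 & \texttt{None}
\end{cases}
$$

The sum is the negative log-likelihood, so with $r(w) = 0$ minimizing it is exactly maximum likelihood estimation.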

Different solvers may be preferable depending on the size of the data, the number of classes, and the penalty. Here are the main solvers provided in `scikit-learn`'s implementation of logistic regression:

1. **liblinear**:
    - Algorithm: It's a library for large linear classification which uses a coordinate descent (CD) algorithm.
    - Pros: Good for small datasets.
    - Limitations: Does not handle multinomial loss natively; instead, it uses a one-vs-rest scheme.

2. **lbfgs**:
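    - Algorithm: Stands for Limited-memory Broyden–Fletcher–Goldfarb–Shanno, a quasi-Newton method. Instead of computing the second-derivative (Hessian) matrix, it builds an approximation from the last few gradient evaluations, so memory use stays small. It is scikit-learn's default solver.
    - Pros: Works well for small to medium datasets and supports the multinomial loss.
    - Limitations: Might be slow for very large datasets; supports only the `l2` penalty (or none).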

3. **newton-cg**:
    - Algorithm: A Newton method that, like `lbfgs`, uses second-order (curvature) information; each Newton step is solved approximately with the conjugate gradient (CG) method.
    - Pros: Can converge in fewer iterations than first-order methods because it uses curvature information.
    - Limitations: Each iteration needs Hessian-vector products, which makes iterations expensive on large datasets; supports only the `l2` penalty (or none).

4. **sag**:
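    - Algorithm: Stochastic Average Gradient. At each step it computes the gradient of one randomly chosen sample and updates the weights using the average of the most recently stored gradients of all samples, so it keeps a memory of past gradients.
    - Pros: Fast for large datasets, especially when both the number of samples and the number of features are large.
    - Limitations: May need many passes over the data, and fast convergence is only guaranteed when features are on approximately the same scale, so standardize first; supports only the `l2` penalty (or none).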

5. **saga**:
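    - Algorithm: A variant of `sag` that also supports the non-smooth `l1` and `elasticnet` penalties.
    - Pros: Useful for large datasets with L1 regularization; the only solver that supports `elasticnet`.
    - Limitations: Like `sag`, it may need many passes over the data and converges quickly only on features of similar scale.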
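A rough side-by-side of the solvers that handle the multinomial loss: fit each one on the scaled Iris training split with the default L2 penalty and compare iteration counts and test accuracy. This is a sketch, not a benchmark: an iteration means different things for different algorithms (for `sag`/`saga` it is a pass over the data), and 105 training samples say little about large datasets. `liblinear` is left out because it only does one-vs-rest.

```python
import warnings

from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

results = {}
for solver in ["lbfgs", "newton-cg", "sag", "saga"]:
    model = LogisticRegression(solver=solver, max_iter=5000)
    with warnings.catch_warnings():
        # Ignore a possible max_iter warning; only the counts matter here.
        warnings.simplefilter("ignore", ConvergenceWarning)
        model.fit(X_train_s, y_train)
    # n_iter_ holds the number of iterations the solver actually ran.
    results[solver] = (int(model.n_iter_.max()), model.score(X_test_s, y_test))

for solver, (n_iter, acc) in results.items():
    print(f"{solver:>10}: {n_iter:5d} iterations, test accuracy {acc:.3f}")
```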

When to use which solver:

- For **small datasets**, `liblinear` is often good enough.
- For **multi-class problems** where you want the true **multinomial** (softmax) loss rather than one-vs-rest, use `lbfgs`, `newton-cg`, `sag`, or `saga`.
- For **large datasets**, `sag` or `saga` might be the best options because of their efficiency with large sample sizes.

Not every solver supports every penalty: `lbfgs`, `newton-cg`, and `sag` support only `l2` (or no penalty), `liblinear` supports `l1` and `l2`, and `saga` supports all of them, including `elasticnet`. The choice of penalty therefore narrows the choice of solver.
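A sketch that probes these rules directly: try several solver/penalty pairs and record which ones scikit-learn rejects with a `ValueError` at fit time. A synthetic binary problem from `make_classification` is used so that `liblinear`'s one-vs-rest restriction plays no role.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

# Small synthetic binary problem.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

combos = [
    ("liblinear", "l1"), ("liblinear", "l2"), ("liblinear", "elasticnet"),
    ("lbfgs", "l1"), ("lbfgs", "l2"), ("lbfgs", "elasticnet"),
    ("saga", "l1"), ("saga", "l2"), ("saga", "elasticnet"),
]

supported = {}
for solver, penalty in combos:
    # l1_ratio is only passed for elasticnet, the one penalty that reads it.
    extra = {"l1_ratio": 0.5} if penalty == "elasticnet" else {}
    model = LogisticRegression(solver=solver, penalty=penalty, max_iter=5000, **extra)
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", ConvergenceWarning)
            model.fit(X, y)
        supported[(solver, penalty)] = True
    except ValueError:
        # Unsupported solver/penalty pairs are rejected when fit() is called.
        supported[(solver, penalty)] = False

for (solver, penalty), ok in supported.items():
    print(f"{solver:>9} + {penalty:<10} -> {'ok' if ok else 'rejected'}")
```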

## "solver" and "penalty"

1. **Solver**:
   - **What it is**: The optimization algorithm used to find the parameters (coefficients) of the logistic regression model that minimize the loss function.
   - **Purpose**: Solves the optimization problem, i.e., finds the best weights/coefficients for the logistic regression model.
   - **Common Options**: `liblinear`, `lbfgs`, `newton-cg`, `sag`, `saga` (as detailed in the previous section).

2. **Penalty**:
   - **What it is**: The type of regularization applied to the logistic regression model.
   - **Purpose**: Regularization is a technique used to prevent overfitting by adding a penalty to the magnitude of coefficients. Depending on the type of penalty, it can drive some coefficients to zero (L1) or just shrink coefficients (L2).
   - **Common Options**:
     - `l1`: Lasso regularization. Can drive some coefficients to exactly zero, which acts as feature selection. Supported only by `liblinear` and `saga`.
     - `l2`: Ridge regularization. Shrinks the coefficients but doesn't set any to zero. Compatible with a wider range of solvers, like `lbfgs`, `newton-cg`, `sag`, `saga`, and `liblinear`.
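     - `elasticnet`: A combination of L1 and L2 regularization that blends the properties of Lasso and Ridge; the mix is set by `l1_ratio` (0 = pure L2, 1 = pure L1). Supported only by the `saga` solver.
     - `None`: No regularization. Passed as Python `None` in current scikit-learn (older versions used the string `'none'`); not supported by `liblinear`.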
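The difference between "drive to zero" and "shrink" can be seen by counting exactly-zero coefficients as `C` varies. A sketch on the scaled Iris data using `saga`, since it supports both penalties; the `C` values are arbitrary illustration choices:

```python
import warnings

import numpy as np
from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

zero_counts = {}
for penalty in ["l1", "l2"]:
    for C in [0.01, 0.1, 1.0, 100.0]:
        model = LogisticRegression(solver="saga", penalty=penalty, C=C, max_iter=5000)
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", ConvergenceWarning)
            model.fit(X, y)
        # coef_ has one row per class and one column per feature (3 x 4 here).
        zero_counts[(penalty, C)] = int(np.sum(model.coef_ == 0))

for (penalty, C), n_zero in zero_counts.items():
    print(f"{penalty}, C={C:>6}: {n_zero} of 12 coefficients exactly zero")
```

With `l1`, smaller `C` (stronger regularization) zeroes out more coefficients; with `l2` the coefficients shrink toward zero but stay nonzero.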

The interplay:

- Not all combinations of solvers and penalties are compatible in `scikit-learn`. For example, the `liblinear` solver doesn't support the `elasticnet` penalty.
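- Regularization (specified by `penalty`) helps prevent overfitting, especially when the number of features is high relative to the number of training samples. In `LogisticRegression` its strength is controlled by `C`, the *inverse* of the regularization strength: smaller `C` means stronger regularization. Other scikit-learn estimators, such as `Ridge`, `Lasso`, and `SGDClassifier`, use `alpha` instead, where larger means stronger.
- The choice of solver mainly affects speed and memory use, and some solvers work better for certain types of data or penalties. Because the objective is convex, any solver that converges reaches essentially the same optimum; differences in the result usually come from stopping early (`max_iter`, `tol`) rather than from the algorithm itself.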
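A small check of the convergence point, as a sketch: fit the same L2-penalized model with two different solvers at a tight tolerance and compare the coefficients. The `tol` and `max_iter` values are arbitrary choices for the illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

coefs = {}
for solver in ["lbfgs", "newton-cg"]:
    # Same (default) L2 penalty and C=1.0; only the optimizer changes.
    model = LogisticRegression(solver=solver, C=1.0, tol=1e-8, max_iter=10000)
    model.fit(X, y)
    coefs[solver] = model.coef_

# Largest absolute difference between the two solutions.
print(np.abs(coefs["lbfgs"] - coefs["newton-cg"]).max())
```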

In summary, while the solver determines how the model will optimize its weights, the penalty determines how those weights will be regularized to prevent overfitting.