# Logistic Regression Model

**Model**: `LogisticRegression()`

Logistic regression is a linear classifier suited to both binary and multiclass classification problems.

**Hyperparameter tuning**: A grid search (`GridSearchCV` with 5-fold cross-validation) was performed over the following hyperparameters:

- **C**: `[0.01, 0.05, 0.1, 0.5, 1, 10]`  
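  Sets the inverse regularization strength: smaller values of C mean stronger regularization, larger values mean weaker regularization. The search reported 0.01, but because the selected `penalty` is `None`, scikit-learn ignores `C`, so this value has no effect on the final model and should not be read as evidence that strong regularization prevented overfitting.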

- **solver**: `['lbfgs', 'newton-cg', 'saga']`  
  Selects the optimization algorithm. `'newton-cg'` gave the best cross-validated score for this model.

- **penalty**: `['l2', 'elasticnet', None]`  
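  Determines the type of regularization. `None` disables regularization entirely and gave the best results for this model. Note that `'elasticnet'` is supported only by the `saga` solver and requires an `l1_ratio`, which this grid does not set, so every candidate using it fails during the search.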

- **max_iter**: `[100, 200, 500, 1000]`  
  Sets the maximum number of solver iterations. The search selected 100, which suggests the solver converged within that budget on this dataset and that additional iterations did not improve the score.

- **multi_class**: `['ovr', 'multinomial']`  
  Specifies the strategy for multiclass classification: one-vs-rest (`'ovr'`) or a single multinomial (softmax) model. `'multinomial'` gave the best cross-validated accuracy. This parameter is deprecated in scikit-learn 1.5 and later.

**Best Parameters**:  
`{'C': 0.01, 'max_iter': 100, 'multi_class': 'multinomial', 'penalty': None, 'solver': 'newton-cg'}`

These settings gave the best mean 5-fold cross-validation accuracy for the Logistic Regression model on the given dataset.

In [1]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from joblib import dump

In [2]:
# Forward slashes avoid accidental escape sequences such as '\X' in string literals
X_train = pd.read_csv('traintest/X_train.csv')
y_train = pd.read_csv('traintest/y_train.csv').values.ravel()

In [3]:
model = LogisticRegression()

In [4]:
param_grid = {
    'C': [0.01, 0.05, 0.1, 0.5, 1, 10],
    'solver': ['lbfgs', 'newton-cg', 'saga'],
    'penalty': ['l2', 'elasticnet', None],
    'max_iter': [100, 200, 500, 1000],
    'multi_class': ['ovr', 'multinomial']
}
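The grid above crosses every penalty with every solver, but `'elasticnet'` is only supported by `saga` and also needs an `l1_ratio`. One possible way to avoid invalid combinations is to pass a list of dictionaries, which `GridSearchCV` also accepts. The sketch below is an assumption about how the grid could be restructured, not the grid used in this notebook: the `l1_ratio` value of 0.5 is hypothetical, `C` is left out where `penalty` is `None` because it would be ignored, and `multi_class` is omitted for brevity.

```python
from sklearn.model_selection import ParameterGrid

C_values = [0.01, 0.05, 0.1, 0.5, 1, 10]
max_iters = [100, 200, 500, 1000]
solvers = ['lbfgs', 'newton-cg', 'saga']

valid_param_grid = [
    # L2 penalty: supported by all three solvers
    {'penalty': ['l2'], 'solver': solvers,
     'C': C_values, 'max_iter': max_iters},
    # Elastic net: saga only, and l1_ratio must be set (0.5 is hypothetical)
    {'penalty': ['elasticnet'], 'solver': ['saga'], 'l1_ratio': [0.5],
     'C': C_values, 'max_iter': max_iters},
    # No penalty: C is ignored, so it is not searched
    {'penalty': [None], 'solver': solvers, 'max_iter': max_iters},
]

# ParameterGrid enumerates the candidates GridSearchCV would try
n_candidates = len(ParameterGrid(valid_param_grid))
print(n_candidates)  # 72 + 24 + 12 = 108
```

A grid shaped like this could be passed as `param_grid` in place of the single dictionary, so that no fits are spent on combinations that cannot run.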

In [5]:
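# 5-fold cross-validated search; scoring defaults to the classifier's accuracy, n_jobs=6 runs fits in parallel
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=5, n_jobs=6, verbose=1)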

In [6]:
grid_search.fit(X_train, y_train)

Fitting 5 folds for each of 432 candidates, totalling 2160 fits


720 fits failed out of a total of 2160.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
240 fits failed with the following error:
Traceback (most recent call last):
  File "c:\Users\eftekhari\Desktop\DataScienseTest\Chapter4\myenv\lib\site-packages\sklearn\model_selection\_validation.py", line 866, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "c:\Users\eftekhari\Desktop\DataScienseTest\Chapter4\myenv\lib\site-packages\sklearn\base.py", line 1389, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "c:\Users\eftekhari\Desktop\DataScienseTest\Chapter4\myenv\lib\site-packages\sklearn\linear_model\_logistic.py", line 1193, in fit
    solver = _check_solver(self.solver, self.penalty, self.dual)
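The failure count in the log can be reproduced by enumerating the grid. The sketch below assumes that every candidate with `penalty='elasticnet'` fails: `lbfgs` and `newton-cg` do not support that penalty, and `saga` requires an `l1_ratio`, which the grid does not set.

```python
from itertools import product

param_grid = {
    'C': [0.01, 0.05, 0.1, 0.5, 1, 10],
    'solver': ['lbfgs', 'newton-cg', 'saga'],
    'penalty': ['l2', 'elasticnet', None],
    'max_iter': [100, 200, 500, 1000],
    'multi_class': ['ovr', 'multinomial'],
}
cv_folds = 5

# Expand the grid into one dict per candidate
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

# Assumption: all elasticnet candidates fail (see the explanation above)
failing = [c for c in candidates if c['penalty'] == 'elasticnet']

total_fits = len(candidates) * cv_folds
failed_fits = len(failing) * cv_folds
failed_by_solver = {
    solver: cv_folds * sum(c['solver'] == solver for c in failing)
    for solver in param_grid['solver']
}

print(len(candidates), total_fits, failed_fits, failed_by_solver)
# 432 2160 720 {'lbfgs': 240, 'newton-cg': 240, 'saga': 240}
```

The totals agree with the log: 432 candidates, 2160 fits, and 720 failures split into groups of 240 per solver.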
  

In [7]:
print("Best Parameters: ", grid_search.best_params_)

Best Parameters:  {'C': 0.01, 'max_iter': 100, 'multi_class': 'multinomial', 'penalty': None, 'solver': 'newton-cg'}


In [8]:
best_model = grid_search.best_estimator_
dump(best_model, 'logistic_regression_model.joblib')

['logistic_regression_model.joblib']
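
To reuse the saved estimator later, it can be loaded back with `joblib.load`. The sketch below shows a round trip on a small synthetic model in a temporary directory. The data and the temporary path are illustrative assumptions; for the real model, the path would be `'logistic_regression_model.joblib'` and the input would need the same columns as `X_train`.

```python
import tempfile
from pathlib import Path

import numpy as np
from joblib import dump, load
from sklearn.linear_model import LogisticRegression

# Small synthetic dataset (assumption) standing in for the training data
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=60) > 0).astype(int)

model = LogisticRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'logistic_regression_model.joblib'
    dump(model, path)      # persist the fitted estimator
    restored = load(path)  # read it back into memory

# The restored model should reproduce the original predictions
same_predictions = bool((restored.predict(X) == model.predict(X)).all())
print(same_predictions)
```

Loading a joblib file executes pickled code, so only files from a trusted source should be loaded, ideally with the same scikit-learn version that wrote them.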