# *Logistic Regression*

A supervised machine learning algorithm used for binary classification, i.e., 0/1, Yes/No, True/False.

- **Key Points**:
    - Predicts the probability of an event occurring.
    - Uses the sigmoid function to convert linear output into a value between 0 and 1.
    - Decision rule:
        - If probability >= 0.5 --> Class 1
        - If probability < 0.5 --> Class 0
    - Useful for problems like spam detection, pass/fail prediction, customer churn, etc.
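
The sigmoid mapping and the 0.5 decision rule above can be sketched in a few lines. This is a minimal illustration, not a full logistic regression: the function names and the example linear outputs are hypothetical, and no model is fitted.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued linear output z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(z: float, threshold: float = 0.5) -> int:
    """Decision rule: class 1 if the probability is >= threshold, else class 0."""
    return 1 if sigmoid(z) >= threshold else 0

# Hypothetical linear outputs (w.x + b) for three samples
for z in (-2.0, 0.0, 3.0):
    print(f"z={z:+.1f}  probability={sigmoid(z):.3f}  class={predict_class(z)}")
```

A linear output of exactly 0 gives a probability of 0.5, which the rule assigns to class 1.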

# *Regularization*

Regularization is used to prevent overfitting by adding a penalty on large coefficients to the loss function.

- **Types**
    1. Lasso Regression (L1)
    2. Ridge Regression (L2)
    3. Elastic Net (L1 + L2)
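
Written as objectives, the three types differ only in the term added to the squared-error loss. The notation here is an assumption for illustration: $w$ is the coefficient vector, $\lambda \ge 0$ is the regularization strength, and $\rho \in [0, 1]$ is the L1/L2 mix. Libraries scale these terms in slightly different ways.

$$
\begin{aligned}
\text{Lasso (L1):} \quad & \min_{w} \; \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_1 \\
\text{Ridge (L2):} \quad & \min_{w} \; \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2 \\
\text{Elastic Net:} \quad & \min_{w} \; \lVert y - Xw \rVert_2^2 + \lambda \left( \rho \lVert w \rVert_1 + (1 - \rho) \lVert w \rVert_2^2 \right)
\end{aligned}
$$

Setting $\rho = 1$ recovers the Lasso objective and $\rho = 0$ recovers the Ridge objective.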

## *Lasso Regression*

It adds a penalty equal to the sum of the absolute values of the coefficients (L1 norm) to the loss function.

- **Features**
    - Can shrink some coefficients exactly to zero.
    - Performs feature selection automatically.
    - Useful when only a few features are important.
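
One way to see why an L1 penalty produces exact zeros: in the simplified case of orthonormal features, the Lasso solution is the least-squares coefficient passed through a soft-thresholding step. The sketch below assumes that simplified setting; the coefficient values and `lam` are hypothetical.

```python
import numpy as np

def soft_threshold(beta_ols: np.ndarray, lam: float) -> np.ndarray:
    """Shrink each coefficient toward zero by lam; anything within lam of zero becomes 0."""
    return np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)

# Hypothetical least-squares coefficients
beta_ols = np.array([2.5, -0.3, 0.1, -1.2])
beta_lasso = soft_threshold(beta_ols, lam=0.5)

print(beta_lasso)
print("features kept:", np.flatnonzero(beta_lasso))
```

Coefficients smaller in magnitude than `lam` are set to zero, which is the automatic feature selection described above.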

## *Ridge Regression*

It adds a penalty equal to the sum of the squared coefficients (squared L2 norm) to the loss function.

- **Features**
    - Shrinks coefficients toward zero but does not set them exactly to zero.
    - Works well when many features contribute to the output.
    - Good for handling multicollinearity.
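
The shrinking behaviour can be illustrated with the closed-form Ridge solution, $w = (X^\top X + \lambda I)^{-1} X^\top y$. The sketch below is an assumption-laden toy: the data are synthetic, two columns are made nearly collinear on purpose, and no intercept is fitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: column 2 is almost a copy of column 1 (multicollinearity)
X = rng.normal(size=(50, 3))
X[:, 2] = X[:, 1] + 0.01 * rng.normal(size=50)
y = X @ np.array([1.0, 2.0, 0.0]) + 0.1 * rng.normal(size=50)

def ridge_coefficients(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form Ridge solution without an intercept."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

for lam in (0.0, 1.0, 100.0):
    w = ridge_coefficients(X, y, lam)
    print(f"lambda={lam:>5}: coefficients={np.round(w, 3)}, norm={np.linalg.norm(w):.3f}")
```

As `lam` grows the coefficient norm decreases, but the individual coefficients generally stay non-zero.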

In [1]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import warnings 
warnings.filterwarnings("ignore")

In [2]:
df = sns.load_dataset('mpg')
df

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,model_year,origin,name
0,18.0,8,307.0,130.0,3504,12.0,70,usa,chevrolet chevelle malibu
1,15.0,8,350.0,165.0,3693,11.5,70,usa,buick skylark 320
2,18.0,8,318.0,150.0,3436,11.0,70,usa,plymouth satellite
3,16.0,8,304.0,150.0,3433,12.0,70,usa,amc rebel sst
4,17.0,8,302.0,140.0,3449,10.5,70,usa,ford torino
...,...,...,...,...,...,...,...,...,...
393,27.0,4,140.0,86.0,2790,15.6,82,usa,ford mustang gl
394,44.0,4,97.0,52.0,2130,24.6,82,europe,vw pickup
395,32.0,4,135.0,84.0,2295,11.6,82,usa,dodge rampage
396,28.0,4,120.0,79.0,2625,18.6,82,usa,ford ranger


In [3]:
# drop the 'name' column
df.drop("name", axis = 1, inplace = True)

In [4]:
df.head()

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,model_year,origin
0,18.0,8,307.0,130.0,3504,12.0,70,usa
1,15.0,8,350.0,165.0,3693,11.5,70,usa
2,18.0,8,318.0,150.0,3436,11.0,70,usa
3,16.0,8,304.0,150.0,3433,12.0,70,usa
4,17.0,8,302.0,140.0,3449,10.5,70,usa


In [5]:
# check for null values in the dataset
df.isna().sum()

mpg             0
cylinders       0
displacement    0
horsepower      6
weight          0
acceleration    0
model_year      0
origin          0
dtype: int64

In [6]:
# type of data
df.dtypes

mpg             float64
cylinders         int64
displacement    float64
horsepower      float64
weight            int64
acceleration    float64
model_year        int64
origin           object
dtype: object

In [7]:
df.shape

(398, 8)

Since we have not done any outlier treatment, the idea is to replace the missing values with the median, which is less sensitive to outliers than the mean.

In [8]:
df['horsepower'].median()

93.5
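
To make the choice of median concrete, here is a small sketch with made-up numbers (not taken from the mpg dataset) comparing the mean and median as fill values when one outlier is present.

```python
import numpy as np

# Hypothetical values with one missing entry and one outlier (400.0)
values = np.array([90.0, 95.0, 100.0, np.nan, 105.0, 400.0])

mean_fill = np.nanmean(values)      # pulled upward by the outlier
median_fill = np.nanmedian(values)  # stays near the typical value

# Replace the missing entry with the median
filled = np.where(np.isnan(values), median_fill, values)

print("mean fill:  ", mean_fill)
print("median fill:", median_fill)
print("filled:     ", filled)
```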

In [9]:
# fill in the missing values with the median
df['horsepower'] = df['horsepower'].fillna(df['horsepower'].median())

In [10]:
df.isna().sum()

mpg             0
cylinders       0
displacement    0
horsepower      0
weight          0
acceleration    0
model_year      0
origin          0
dtype: int64

In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 398 entries, 0 to 397
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   mpg           398 non-null    float64
 1   cylinders     398 non-null    int64  
 2   displacement  398 non-null    float64
 3   horsepower    398 non-null    float64
 4   weight        398 non-null    int64  
 5   acceleration  398 non-null    float64
 6   model_year    398 non-null    int64  
 7   origin        398 non-null    object 
dtypes: float64(4), int64(3), object(1)
memory usage: 25.0+ KB


In [12]:
df['origin'].value_counts()

origin
usa       249
japan      79
europe     70
Name: count, dtype: int64

In [13]:
# data encoding 
df['origin'] = df['origin'].map({"usa": 1, "japan": 2, "europe":3})

In [14]:
df['origin']

0      1
1      1
2      1
3      1
4      1
      ..
393    1
394    3
395    1
396    1
397    1
Name: origin, Length: 398, dtype: int64

In [15]:
# converting to int "astype"
# df['origin'] = df['origin'].astype(int)

In [16]:
df.dtypes

mpg             float64
cylinders         int64
displacement    float64
horsepower      float64
weight            int64
acceleration    float64
model_year        int64
origin            int64
dtype: object

In [17]:
# separate into x and y

In [18]:
x = df.drop('mpg', axis= 1)
y = df['mpg']

In [19]:
x

Unnamed: 0,cylinders,displacement,horsepower,weight,acceleration,model_year,origin
0,8,307.0,130.0,3504,12.0,70,1
1,8,350.0,165.0,3693,11.5,70,1
2,8,318.0,150.0,3436,11.0,70,1
3,8,304.0,150.0,3433,12.0,70,1
4,8,302.0,140.0,3449,10.5,70,1
...,...,...,...,...,...,...,...
393,4,140.0,86.0,2790,15.6,82,1
394,4,97.0,52.0,2130,24.6,82,3
395,4,135.0,84.0,2295,11.6,82,1
396,4,120.0,79.0,2625,18.6,82,1


In [20]:
# train test split

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test =  train_test_split(x, y, test_size= 0.3, random_state= 1)

In [21]:
x_train.shape

(278, 7)

In [22]:
x_test.shape

(120, 7)
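
The 278/120 shapes follow from the 398 rows and `test_size=0.3`. The sketch below reproduces only the size arithmetic and the idea of a shuffled split; it assumes the test size is rounded up, and it does not reproduce the exact rows that `train_test_split` selects.

```python
import math
import numpy as np

n_samples, test_size = 398, 0.3

# Assumption: the test share is rounded up, the rest goes to training
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# Shuffle row positions, then cut once
rng = np.random.default_rng(1)
indices = rng.permutation(n_samples)
test_idx, train_idx = indices[:n_test], indices[n_test:]

print(n_train, n_test)
```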

In [23]:
# plain linear regression model (no regularization)

from sklearn.linear_model import LinearRegression
regression_model = LinearRegression()

In [24]:
regression_model

In [25]:
# train the model
regression_model.fit(x_train, y_train)

In [26]:
regression_model.coef_

array([-0.31761423,  0.02623748, -0.01827076, -0.00748775,  0.05040673,
        0.84709514,  1.51909584])

In [27]:
for i, col_name in enumerate(x_train.columns):
    print(f"The coefficient for {col_name} is {regression_model.coef_[i]}")

The coefficient for cylinders is -0.3176142302799293
The coefficient for displacement is 0.02623748259907899
The coefficient for horsepower is -0.01827076491312455
The coefficient for weight is -0.007487750398361904
The coefficient for acceleration is 0.05040673461971422
The coefficient for model_year is 0.847095142706137
The coefficient for origin is 1.519095838797506


In [28]:
# Observation:
# The coefficients are relatively small, so a change in one independent variable
# does not change the prediction much.
# This is sometimes called a smoother model.

# Features with near-zero coefficients might not be contributing much to the model
# (coefficient size also depends on the units of each feature).

In [29]:
from sklearn.metrics import r2_score
y_pred_linear = regression_model.predict(x_test)

In [30]:
r2_linear = r2_score(y_test, y_pred_linear)
print(f"The R square of linear regression {r2_linear}")

The R square of linear regression 0.8348001123742284
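
For reference, the R square reported here is $1 - SS_{res} / SS_{tot}$. The sketch below computes it by hand on a few made-up values; the helper name `r2_by_hand` is hypothetical and the numbers are not from the mpg data.

```python
import numpy as np

def r2_by_hand(y_true, y_pred) -> float:
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical targets and predictions
y_true_demo = [3.0, 5.0, 7.0, 9.0]
y_pred_demo = [2.5, 5.5, 7.0, 9.0]

print(r2_by_hand(y_true_demo, y_pred_demo))
```

A model that always predicts the mean of the targets scores 0, and a perfect model scores 1.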


Classifier using Ridge regression (`RidgeClassifier`, as described in the scikit-learn documentation).

This classifier first converts the target values into {-1, 1} and then treats the problem as a regression task (multi-output regression in the multiclass case). The parameters listed here are shared with `Ridge`, the regressor used below.

- **alpha** (float, default=1.0)
    - Regularization strength; must be a positive float.
    - Regularization improves the conditioning of the problem and reduces the variance of the estimates.
    - Larger values specify stronger regularization. Alpha corresponds to 1 / (2C) in other linear models such as LogisticRegression or LinearSVC.
- **fit_intercept** (bool, default=True)
    - Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
- **copy_X** (bool, default=True)
    - If True, X will be copied; else, it may be overwritten.
- **max_iter** (int, default=None)
    - Maximum number of iterations for conjugate gradient solver. The default value is determined by scipy.sparse.linalg.
- **tol** (float, default=1e-4)
    - The precision of the solution (coef_) is determined by tol, which specifies a different convergence criterion for each solver:
        - `svd`: tol has no impact.
        - `cholesky`: tol has no impact.
        - `sparse_cg`: norm of residuals smaller than tol.
        - `lsqr`: tol is set as atol and btol of scipy.sparse.linalg.lsqr, which control the norm of the residual vector in terms of the norms of matrix and coefficients.
        - `sag` and `saga`: relative change of coef smaller than tol.
        - `lbfgs`: maximum of the absolute (projected) gradient=max|residuals| smaller than tol.

In [31]:
# regularised model
# ridge regression

In [32]:
from sklearn.linear_model import Ridge
ridge_regression_model = Ridge(alpha=0.1)
ridge_regression_model

# in the practical implementation, lambda is called alpha

In [33]:
ridge_regression_model.fit(x_train, y_train)

In [34]:
for i, col_name in enumerate(x_train.columns):
    print(f"The coefficient for {col_name} is {ridge_regression_model.coef_[i]}")

The coefficient for cylinders is -0.3170032101006747
The coefficient for displacement is 0.026213249757983826
The coefficient for horsepower is -0.018263252481448698
The coefficient for weight is -0.007487326050213167
The coefficient for acceleration is 0.05036896947443218
The coefficient for model_year is 0.8470062938903183
The coefficient for origin is 1.5174528285653954


Ridge regression evaluation

In [35]:
y_pred_ridge = ridge_regression_model.predict(x_test)
r2_ridge = r2_score(y_test, y_pred_ridge)
print(f"The R square of Ridge regression {r2_ridge}")

The R square of Ridge regression 0.8348084889168355


In [36]:
# we don't see much variation in the Ridge coefficients compared with plain linear regression

In [37]:
from sklearn.linear_model import Lasso
lasso_regression_model = Lasso(alpha = 0.5)
lasso_regression_model

In [38]:
lasso_regression_model.fit(x_train, y_train)

In [39]:
for i, col_name in enumerate(x_train.columns):
    print(f"The coefficient for {col_name} is {lasso_regression_model.coef_[i]}")

The coefficient for cylinders is -0.0
The coefficient for displacement is 0.0062081988883003845
The coefficient for horsepower is -0.011058382987169572
The coefficient for weight is -0.00698267316802309
The coefficient for acceleration is 0.0
The coefficient for model_year is 0.744654952003819
The coefficient for origin is 0.0


In [40]:
# three feature coefficients are 0; Lasso helps with feature selection

In [41]:
y_pred_lasso = lasso_regression_model.predict(x_test)
r2_lasso = r2_score(y_test, y_pred_lasso)
print(f"The R square of lasso regression {r2_lasso}")

The R square of lasso regression 0.8277934716635555


ELASTIC NET

In [42]:
from sklearn.linear_model import ElasticNet
elasticNet_regression_model = ElasticNet(alpha=1, l1_ratio=0.5)
elasticNet_regression_model
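
To show how `alpha` and `l1_ratio` combine, the sketch below evaluates the penalty term as scikit-learn documents it for `ElasticNet`: `alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2`. The helper name and the coefficient vector are hypothetical, and only the penalty is computed, not the full objective.

```python
import numpy as np

def elastic_net_penalty(coef, alpha: float = 1.0, l1_ratio: float = 0.5) -> float:
    """Penalty term only: a weighted mix of the L1 norm and the squared L2 norm."""
    coef = np.asarray(coef, dtype=float)
    l1 = np.sum(np.abs(coef))
    l2_squared = np.sum(coef ** 2)
    return float(alpha * l1_ratio * l1 + 0.5 * alpha * (1.0 - l1_ratio) * l2_squared)

coef_demo = [0.5, -2.0, 0.0]  # hypothetical coefficients

print(elastic_net_penalty(coef_demo, alpha=1.0, l1_ratio=0.5))  # mixed penalty
print(elastic_net_penalty(coef_demo, alpha=1.0, l1_ratio=1.0))  # pure L1 (Lasso-like)
print(elastic_net_penalty(coef_demo, alpha=1.0, l1_ratio=0.0))  # pure L2 (Ridge-like)
```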

In [43]:
# train
elasticNet_regression_model.fit(x_train, y_train)

In [44]:
for i, col_name in enumerate(x_train.columns):
    print(f"The coefficient for {col_name} is {elasticNet_regression_model.coef_[i]}")

The coefficient for cylinders is -0.0
The coefficient for displacement is 0.005888869953667572
The coefficient for horsepower is -0.0124038749335701
The coefficient for weight is -0.006934550516257633
The coefficient for acceleration is 0.0
The coefficient for model_year is 0.7133150744603874
The coefficient for origin is 0.0


In [45]:
y_pred_elasticNet = elasticNet_regression_model.predict(x_test)
r2_Elastic = r2_score(y_test, y_pred_elasticNet)
print(f"The R square of Elastic Net regression {r2_Elastic}")

The R square of Elastic Net regression 0.8284840073256803


Regularization with cross-validation
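
As a rough picture of what `cv = 5` means in the cells below, the sketch here splits 278 training rows into five contiguous folds; each fold is held out once while the other four are used for fitting. The helper is hypothetical and does not fit any model.

```python
import numpy as np

def k_fold_indices(n_samples: int, n_splits: int = 5):
    """Yield (train_idx, val_idx) pairs; each fold serves as the validation set once."""
    folds = np.array_split(np.arange(n_samples), n_splits)
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, val_idx

# 278 is the number of training rows in this notebook
for fold, (train_idx, val_idx) in enumerate(k_fold_indices(278, n_splits=5)):
    print(f"fold {fold}: train={len(train_idx)} rows, validation={len(val_idx)} rows")
```

The candidate regularization strength with the best average validation score across the folds is the one that gets selected.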

In [46]:
from sklearn.linear_model import LassoCV
lasso_cv = LassoCV(cv = 5, verbose=1)

In [47]:
lasso_cv

In [48]:
# train the model with "fit"
lasso_cv.fit(x_train, y_train)


In [49]:
# predict on the test set with "predict"
y_pred =  lasso_cv.predict(x_test)
score = r2_score(y_test, y_pred)
print(f"The R square of LassoCV {score}")

The R square of LassoCV 0.808280598384475


In [50]:
from sklearn.linear_model import RidgeCV
ridgecv = RidgeCV(cv = 5)
ridgecv.fit(x_train, y_train)  # train the model

In [51]:
y_pred = ridgecv.predict(x_test)
r2_score(y_test, y_pred)

0.8354145247502053

In [52]:
ridgecv.get_params()

{'alpha_per_target': False,
 'alphas': (0.1, 1.0, 10.0),
 'cv': 5,
 'fit_intercept': True,
 'gcv_mode': None,
 'scoring': None,
 'store_cv_results': None,
 'store_cv_values': 'deprecated'}

In [53]:
from sklearn.linear_model import ElasticNetCV

In [54]:
elastic_cv = ElasticNetCV(cv = 5)
elastic_cv.fit(x_train, y_train)

In [55]:
y_pred = elastic_cv.predict(x_test)

In [56]:
r2_score(y_test, y_pred)

0.792863401804916

## Lasso, Ridge and Elastic Net implementation

In [80]:
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.linear_model import Lasso, Ridge, ElasticNet

In [58]:
# GridSearchCV for Lasso
# define the model
lasso = Lasso()

In [59]:
lasso

GridSearchCV `param_grid`: a dictionary with parameter names (str) as keys and lists of parameter settings to try as values.

In [60]:
param_grid = {'alpha': [0.001, 0.01, 0.1, 1, 10, 100]}

In [61]:
param_grid

{'alpha': [0.001, 0.01, 0.1, 1, 10, 100]}
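
A grid search tries every combination in the dictionary, so the number of fits is (number of combinations) x (number of folds). The sketch below counts them for the Lasso grid above and for the two-parameter Elastic Net grid used later in this notebook, assuming 5 folds; the helper name is hypothetical.

```python
from itertools import product

def grid_candidates(param_grid: dict) -> list:
    """Expand a parameter grid into the list of all parameter combinations."""
    names = list(param_grid)
    return [dict(zip(names, values)) for values in product(*(param_grid[n] for n in names))]

lasso_grid = {"alpha": [0.001, 0.01, 0.1, 1, 10, 100]}
elastic_grid = {"alpha": [0.001, 0.01, 0.1, 1, 10, 100], "l1_ratio": [0.1, 0.4, 0.9]}
n_folds = 5

for name, grid in (("lasso", lasso_grid), ("elastic net", elastic_grid)):
    candidates = grid_candidates(grid)
    print(f"{name}: {len(candidates)} candidates, {len(candidates) * n_folds} fits")
```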

In [62]:
grid_search = GridSearchCV(estimator=lasso, param_grid=param_grid, cv= 5, scoring= 'r2', verbose=2)

In [63]:
grid_search

In [64]:
grid_search.fit(x_train, y_train)

Fitting 5 folds for each of 6 candidates, totalling 30 fits
[CV] END ........................................alpha=0.001; total time=   0.0s
[CV] END ........................................alpha=0.001; total time=   0.0s
[CV] END ........................................alpha=0.001; total time=   0.0s
[CV] END ........................................alpha=0.001; total time=   0.0s
[CV] END ........................................alpha=0.001; total time=   0.0s
[CV] END .........................................alpha=0.01; total time=   0.0s
[CV] END .........................................alpha=0.01; total time=   0.0s
[CV] END .........................................alpha=0.01; total time=   0.0s
[CV] END .........................................alpha=0.01; total time=   0.0s
[CV] END .........................................alpha=0.01; total time=   0.0s
[CV] END ..........................................alpha=0.1; total time=   0.0s
[CV] END ........................................

In [65]:
grid_search.best_params_

{'alpha': 0.1}

In [66]:
grid_search.best_params_['alpha']

0.1

In [67]:
grid_search.best_score_  # mean cross-validated score of the best_estimator_

0.7964209726696481

In [68]:
# Estimator that was chosen by the search, i.e. estimator which gave highest score (or smallest loss if specified) on the left out data.
grid_search.best_estimator_ 

In [69]:
y_pred= grid_search.best_estimator_.predict(x_test)
r2_score(y_test, y_pred)

0.8345318641232302

In [70]:
# randomized search cv for lasso 
lasso = Lasso()
param_distributions = {'alpha': [0.001, 0.01, 0.1, 10, 1, 100]}
Random_search_lasso = RandomizedSearchCV(estimator=lasso, param_distributions=param_distributions, cv=5, scoring="r2", verbose=2)

In [71]:
Random_search_lasso

In [72]:
Random_search_lasso.fit(x_train, y_train)
# n_iter is left at its default of 10, but the list holds only 6 values, so all 6 candidates are evaluated

Fitting 5 folds for each of 6 candidates, totalling 30 fits
[CV] END ........................................alpha=0.001; total time=   0.0s
[CV] END ........................................alpha=0.001; total time=   0.0s
[CV] END ........................................alpha=0.001; total time=   0.0s
[CV] END ........................................alpha=0.001; total time=   0.0s
[CV] END ........................................alpha=0.001; total time=   0.0s
[CV] END .........................................alpha=0.01; total time=   0.0s
[CV] END .........................................alpha=0.01; total time=   0.0s
[CV] END .........................................alpha=0.01; total time=   0.0s
[CV] END .........................................alpha=0.01; total time=   0.0s
[CV] END .........................................alpha=0.01; total time=   0.0s
[CV] END ..........................................alpha=0.1; total time=   0.0s
[CV] END ........................................

In [73]:
Random_search_lasso.best_estimator_

In [74]:
Random_search_lasso.best_score_

0.7964209726696481
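
This score matches the grid search result because all six alpha values were tried. A randomized search only differs when `n_iter` is smaller than the number of candidates. The sketch below mimics that behaviour with the standard library; it assumes sampling without replacement from a plain list, and the helper name and seed are hypothetical.

```python
import random

def sample_candidates(values: list, n_iter: int = 10, seed: int = 0) -> list:
    """Pick at most n_iter distinct values from the list, without replacement."""
    rng = random.Random(seed)
    k = min(n_iter, len(values))
    return rng.sample(values, k)

alphas = [0.001, 0.01, 0.1, 1, 10, 100]

# n_iter larger than the list: every value is tried, same as a grid search
print(sorted(sample_candidates(alphas, n_iter=10)))

# n_iter=2: only two of the six values are tried
print(sample_candidates(alphas, n_iter=2))
```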

In [None]:
y_pred = Random_search_lasso.best_estimator_.predict(x_test)
r2_score(y_test, y_pred)

In [90]:
# Ridge Grid search CV
ridge = Ridge()
# note: 0.1 is listed twice (0.01 was probably intended), so that candidate is fitted twice
param_grid = {'alpha' : [0.001, 0.1, 0.1, 1, 10, 100]}
ridge = GridSearchCV(estimator=ridge, param_grid=param_grid, cv=5, scoring='r2', verbose=2)
ridge.fit(x_train, y_train)
print("Best parameters:", ridge.best_params_)
print("Best estimator:", ridge.best_estimator_)
y_pred= ridge.best_estimator_.predict(x_test)
print("Test R² score:", r2_score(y_test, y_pred))

Fitting 5 folds for each of 6 candidates, totalling 30 fits
[CV] END ........................................alpha=0.001; total time=   0.0s
[CV] END ........................................alpha=0.001; total time=   0.0s
[CV] END ........................................alpha=0.001; total time=   0.0s
[CV] END ........................................alpha=0.001; total time=   0.0s
[CV] END ........................................alpha=0.001; total time=   0.0s
[CV] END ..........................................alpha=0.1; total time=   0.0s
[CV] END ..........................................alpha=0.1; total time=   0.0s
[CV] END ..........................................alpha=0.1; total time=   0.0s
[CV] END ..........................................alpha=0.1; total time=   0.0s
[CV] END ..........................................alpha=0.1; total time=   0.0s
[CV] END ..........................................alpha=0.1; total time=   0.0s
[CV] END ........................................

In [89]:
# Ridge RandomizedSearchCV
ridge = Ridge()
# note: 0.1 is listed twice (0.01 was probably intended)
param_distributions = {'alpha' : [0.001, 0.1, 0.1, 1, 10, 100]}
ridge = RandomizedSearchCV(estimator=ridge, param_distributions=param_distributions,n_iter=2, cv=5, scoring='r2', verbose=2)
ridge.fit(x_train, y_train)
print("Best parameters:", ridge.best_params_)
print("Best estimator:", ridge.best_estimator_)
y_pred= ridge.best_estimator_.predict(x_test)
print("Test R² score:", r2_score(y_test, y_pred))

Fitting 5 folds for each of 2 candidates, totalling 10 fits
[CV] END ..........................................alpha=0.1; total time=   0.0s
[CV] END ..........................................alpha=0.1; total time=   0.0s
[CV] END ..........................................alpha=0.1; total time=   0.0s
[CV] END ..........................................alpha=0.1; total time=   0.0s
[CV] END ..........................................alpha=0.1; total time=   0.0s
[CV] END ............................................alpha=1; total time=   0.0s
[CV] END ............................................alpha=1; total time=   0.0s
[CV] END ............................................alpha=1; total time=   0.0s
[CV] END ............................................alpha=1; total time=   0.0s
[CV] END ............................................alpha=1; total time=   0.0s
Best parameters: {'alpha': 1}
Best estimator: Ridge(alpha=1)
Test R² score: 0.834881492591245


In [88]:
# Elastic net 
from sklearn.linear_model import ElasticNet
model = ElasticNet()
param_grid = {"alpha": [0.001, 0.01, 0.1, 1, 10, 100],
             "l1_ratio": [0.1, 0.4, 0.9]}

model = GridSearchCV(estimator=model, param_grid=param_grid, cv = 5, scoring = 'r2', verbose=2)
model.fit(x_train, y_train)
print("Best parameters:", model.best_params_)
print("Best estimator:", model.best_estimator_)
y_pred = model.best_estimator_.predict(x_test)
print("Test R² score:", r2_score(y_test, y_pred))

Fitting 5 folds for each of 18 candidates, totalling 90 fits
[CV] END ..........................alpha=0.001, l1_ratio=0.1; total time=   0.0s
[CV] END ..........................alpha=0.001, l1_ratio=0.1; total time=   0.0s
[CV] END ..........................alpha=0.001, l1_ratio=0.1; total time=   0.0s
[CV] END ..........................alpha=0.001, l1_ratio=0.1; total time=   0.0s
[CV] END ..........................alpha=0.001, l1_ratio=0.1; total time=   0.0s
[CV] END ..........................alpha=0.001, l1_ratio=0.4; total time=   0.0s
[CV] END ..........................alpha=0.001, l1_ratio=0.4; total time=   0.0s
[CV] END ..........................alpha=0.001, l1_ratio=0.4; total time=   0.0s
[CV] END ..........................alpha=0.001, l1_ratio=0.4; total time=   0.0s
[CV] END ..........................alpha=0.001, l1_ratio=0.4; total time=   0.0s
[CV] END ..........................alpha=0.001, l1_ratio=0.9; total time=   0.0s
[CV] END ..........................alpha=0.001, 