# 8. Logistic Regression

***Importing necessary python libraries***

In [22]:
import pandas as pd
import numpy as np

***Reading the contents of the CSV file "diabetes.csv" into a dataframe df***

In [23]:
df = pd.read_csv('diabetes.csv')

***Returns a tuple with the number of rows and columns in df***

In [24]:
df.shape


(768, 9)

***Returns the first few rows of the DataFrame df***

In [25]:
df.head()

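| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |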


***Counts the missing (null) values in each column of df***

In [26]:
df.isnull().sum()

Pregnancies                 0
Glucose                     0
BloodPressure               0
SkinThickness               0
Insulin                     0
BMI                         0
DiabetesPedigreeFunction    0
Age                         0
Outcome                     0
dtype: int64
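A null count of zero does not rule out placeholder values. The rows printed by `df.head()` contain zeros in `SkinThickness` and `Insulin`; whether those zeros stand for missing measurements is an assumption, not something the notebook states. A minimal sketch of counting zeros per column, using only the five rows shown above (the name `head_rows` is introduced here for illustration):

```python
import pandas as pd

# The five rows shown by df.head(), restricted to the measurement columns.
head_rows = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137],
    "BloodPressure": [72, 66, 64, 66, 40],
    "SkinThickness": [35, 29, 0, 23, 35],
    "Insulin": [0, 0, 0, 94, 168],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
})

# isnull() only detects NaN/None, so zeros have to be counted separately.
zero_counts = (head_rows == 0).sum()
print(zero_counts)
```

The same expression applied to the full `df` would show how widespread such zeros are before deciding how to treat them.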

***Creating a new dataframe x (the features) by dropping the column "Outcome" from the original dataframe df***

In [None]:
x = df.drop(columns= "Outcome")
x

***The dropped column "Outcome" is stored in y as the target variable***

In [None]:
y = df["Outcome"]
y

***Importing the scikit-learn utilities for splitting the data, training the logistic regression model, and computing error metrics***

In [30]:
# import ML related packages of sklearn
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score


***Splitting the features and target into training and test sets, with `test_size=0.25` indicating that 25% of the data is allocated for testing and `random_state=0` for reproducibility.***

In [44]:

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0)
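
With 768 rows, a 25% test split should leave 576 rows for training and 192 for testing, which matches the support of 192 reported in the evaluation further down. A small sketch to confirm the split sizes, using zero-filled stand-in arrays (`x_demo`, `y_demo` are illustrative names, not the notebook's data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in arrays with the dataset's shape: 768 rows, 8 feature columns.
x_demo = np.zeros((768, 8))
y_demo = np.zeros(768, dtype=int)

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.25, random_state=0)

print(x_tr.shape, x_te.shape)  # (576, 8) (192, 8)
```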

***This line initializes an instance of the Logistic Regression model from scikit-learn.***

In [46]:
lgr = LogisticRegression()

***This line fits the logistic regression model to the training data `x_train` and corresponding target values `y_train`, allowing the model to learn the relationship between the features and the target variable.***

In [47]:
lgr.fit(x_train,y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
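
The warning above suggests two remedies: raising `max_iter` or scaling the features. A minimal sketch of both combined, assuming a `StandardScaler` placed in front of the classifier in a pipeline; the synthetic data and the names `x_demo`, `y_demo`, and `scaled_lgr` are illustrative and not part of the notebook:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic features on very different scales, similar in spirit to
# columns such as DiabetesPedigreeFunction (about 0 to 2) and Insulin (0 to hundreds).
x_demo = rng.normal(size=(400, 3)) * np.array([1.0, 100.0, 1000.0])
signal = x_demo[:, 0] + x_demo[:, 1] / 100 + x_demo[:, 2] / 1000
y_demo = (signal + rng.normal(size=400) > 0).astype(int)

# Standardize each feature, then fit with a higher iteration cap.
scaled_lgr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_lgr.fit(x_demo, y_demo)

n_iter = scaled_lgr.named_steps["logisticregression"].n_iter_[0]
print("iterations used:", n_iter)
print("training accuracy:", scaled_lgr.score(x_demo, y_demo))
```

In the notebook, the same pipeline could be fitted on `x_train` and `y_train` in place of the bare `LogisticRegression()`. Note that coefficients learned on standardized features are expressed per standard deviation rather than per original unit.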


***This line predicts the target variable values (y_pred) for the given test features (x_test) using the trained model lgr***

In [48]:
y_pred = lgr.predict(x_test)
y_pred

array([1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1,
       1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1,
       1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1,
       0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0], dtype=int64)
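
For a binary problem, `predict` returns the class whose estimated probability exceeds 0.5. The sketch below illustrates that relationship with `predict_proba` on small synthetic data; `x_demo`, `y_demo`, and `clf` are illustrative names, not the notebook's variables:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Two synthetic features; the label depends on their sum.
x_demo = rng.normal(size=(200, 2))
y_demo = (x_demo[:, 0] + x_demo[:, 1] > 0).astype(int)

clf = LogisticRegression().fit(x_demo, y_demo)

# Column 1 of predict_proba holds the estimated probability of class 1.
proba_pos = clf.predict_proba(x_demo)[:, 1]
labels_from_proba = (proba_pos > 0.5).astype(int)

print(np.array_equal(labels_from_proba, clf.predict(x_demo)))
```

Working with the probabilities directly makes it possible to choose a threshold other than 0.5, for example to trade precision against recall.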

***The coefficients of the model: each value is the change in the log-odds of a positive outcome for a one-unit increase in the corresponding predictor, holding the other predictors constant.***

In [49]:
m = lgr.coef_
m

array([[ 0.10321994,  0.03481424, -0.01158648,  0.00956814, -0.00133   ,
         0.07770235,  1.26974503,  0.02158809]])
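
Logistic regression coefficients are often easier to read as odds ratios, obtained by exponentiating each value. A short sketch using the coefficients printed above, assuming they follow the column order of `x` (the names `feature_names` and `odds_ratios` are introduced here for illustration):

```python
import numpy as np

# Column order of x, i.e. df without "Outcome".
feature_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
                 "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]

# Coefficients copied from the lgr.coef_ output above.
coef = np.array([0.10321994, 0.03481424, -0.01158648, 0.00956814,
                 -0.00133, 0.07770235, 1.26974503, 0.02158809])

# exp(coefficient) is the multiplicative change in the odds per one-unit increase.
odds_ratios = dict(zip(feature_names, np.exp(coef)))
for name, ratio in odds_ratios.items():
    print(f"{name:<26}{ratio:.3f}")
```

A ratio above 1 means the odds of a positive outcome rise as the feature increases, and a ratio below 1 means they fall. Because the features are on different scales, the magnitudes are not directly comparable across features.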

In [50]:
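# Compare actual and predicted labels side by side.
# A separate name is used so the original dataset in df is not overwritten.
results = pd.DataFrame({'Actual' : y_test, 'Predicted' : y_pred})
results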

| | Actual | Predicted |
|---|---|---|
| 661 | 1 | 1 |
| 122 | 0 | 0 |
| 113 | 0 | 0 |
| 14 | 1 | 1 |
| 529 | 0 | 0 |
| ... | ... | ... |
| 366 | 1 | 0 |
| 301 | 1 | 0 |
| 382 | 0 | 0 |
| 140 | 0 | 0 |


***Model evaluation using a confusion matrix (rows are actual classes, columns are predicted classes)***

In [53]:
from sklearn import metrics
cnf_matrix = metrics.confusion_matrix(y_test, y_pred)
cnf_matrix

array([[115,  15],
       [ 25,  37]], dtype=int64)

***Evaluation metrics derived from the confusion matrix***

In [54]:
print(metrics.classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.82      0.88      0.85       130
           1       0.71      0.60      0.65        62

    accuracy                           0.79       192
   macro avg       0.77      0.74      0.75       192
weighted avg       0.79      0.79      0.79       192



***Computing accuracy, precision and recall to assess the classifier***

In [55]:
print('Accuracy', metrics.accuracy_score(y_test, y_pred))
print('Precision', metrics.precision_score(y_test, y_pred))
print('Recall', metrics.recall_score(y_test, y_pred))

Accuracy 0.7916666666666666
Precision 0.7115384615384616
Recall 0.5967741935483871
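
These three scores can be recomputed by hand from the confusion matrix, which helps clarify what each one measures. A minimal sketch using the counts printed earlier, with class 1 treated as the positive class:

```python
import numpy as np

# Confusion matrix from above: rows are actual classes, columns are predicted.
cnf = np.array([[115, 15],
                [25, 37]])
tn, fp, fn, tp = cnf.ravel()

accuracy = (tp + tn) / cnf.sum()   # correct predictions over all samples
precision = tp / (tp + fp)         # predicted positives that are truly positive
recall = tp / (tp + fn)            # actual positives that were found

print("Accuracy", accuracy)
print("Precision", precision)
print("Recall", recall)
```

The recall of about 0.60 means roughly 40% of the actual positive cases in the test set were predicted as negative.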


***Mean squared error of the predictions on the test set***

In [57]:
mse = mean_squared_error(y_test, y_pred)
print("MSE --> ", mse)

MSE -->  0.20833333333333334
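
For 0/1 labels, each squared error is either 0 or 1, so the MSE is the fraction of misclassified samples, that is, 1 minus the accuracy. A quick check using label vectors rebuilt from the confusion-matrix counts (the order of samples is arbitrary, and `y_true` and `y_hat` are illustrative names):

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_squared_error

# 115 true negatives, 15 false positives, 25 false negatives, 37 true positives.
y_true = np.array([0] * 115 + [0] * 15 + [1] * 25 + [1] * 37)
y_hat = np.array([0] * 115 + [1] * 15 + [0] * 25 + [1] * 37)

mse_check = mean_squared_error(y_true, y_hat)
acc_check = accuracy_score(y_true, y_hat)

print("MSE", mse_check)
print("1 - accuracy", 1 - acc_check)
```

Since it carries no information beyond accuracy here, MSE on hard labels is rarely reported for classifiers. Metrics computed on predicted probabilities, such as log loss, are a more common choice when a continuous error measure is wanted.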
