# 20MAI0005_Abhishek_Kumar

# Logistic Regression and Naïve Bayes Classifier

# Logistic Regression

Despite being used for classification, logistic regression is a regression model: it fits a regression that predicts the probability that a given data entry belongs to the class labelled "1". Just as linear regression assumes that the response is a linear function of the predictors, logistic regression passes a linear combination of the predictors through the sigmoid function, which maps any real number into the interval (0, 1).

$$g(z) = \frac{1}{1 + e^{-z}}$$
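To make the formula concrete, here is a minimal sketch of the sigmoid in plain Python. The scores `z` are hypothetical values chosen only for illustration, and the 0.5 cut-off is the conventional default rather than something fixed by the formula itself.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))."""
    # Fine for moderate z; very large negative z would overflow math.exp.
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical linear scores, not taken from the fitted model
for z in (-4.0, 0.0, 4.0):
    p = sigmoid(z)
    label = 1 if p >= 0.5 else 0  # conventional 0.5 threshold (assumption)
    print(f"z={z:+.1f}  P(class 1)={p:.3f}  predicted label={label}")
```

Negative scores map to probabilities below 0.5, a score of zero maps to exactly 0.5, and positive scores map to probabilities above 0.5.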

# Naïve Bayes Classifier

Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite set. There is not a single algorithm for training such classifiers, but a family of algorithms based on a common principle: all naive Bayes classifiers assume that the value of a particular feature is independent of the value of any other feature, given the class variable. For example, a fruit may be considered to be an apple if it is red, round, and about 10 cm in diameter. A naive Bayes classifier considers each of these features to contribute independently to the probability that this fruit is an apple, regardless of any possible correlations between the color, roundness, and diameter features.
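The independence assumption can be sketched by hand for the fruit example. All priors and likelihoods below are made-up numbers used only to show the arithmetic; the class `other` and the feature names are hypothetical as well.

```python
from math import prod

# Hypothetical prior probabilities of each class
priors = {"apple": 0.3, "other": 0.7}

# Hypothetical per-feature likelihoods P(feature | class)
likelihoods = {
    "apple": {"red": 0.8, "round": 0.9, "about_10cm": 0.7},
    "other": {"red": 0.3, "round": 0.4, "about_10cm": 0.2},
}

observed = ["red", "round", "about_10cm"]

# Naive Bayes: multiply the prior by each feature likelihood independently
unnormalised = {
    c: priors[c] * prod(likelihoods[c][f] for f in observed)
    for c in priors
}

# Normalise so the posteriors sum to 1
evidence = sum(unnormalised.values())
posterior = {c: score / evidence for c, score in unnormalised.items()}

prediction = max(posterior, key=posterior.get)
print(posterior, prediction)
```

Each feature contributes one multiplicative factor per class, which is exactly the "independent contribution" described above; no term models how colour, roundness, and diameter vary together.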

In [4]:
# Suppress warnings

import warnings
warnings.filterwarnings('ignore')

In [1]:
#Import the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
#Read the dataset onto a variable

train=pd.read_csv("titanic_data.csv") #titanic dataset
train.head(5)

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


In [3]:
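#Select the response (Survived) and the predictor variables
df=train[['Survived','Pclass','Sex','Age','Fare']].copy() #copy so that later column assignments do not act on a view of train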

In [5]:
#Encode Sex as an integer: male -> 1, female -> 0
df["Sex"]=df["Sex"].apply(lambda sex:1 if sex=="male" else 0)

In [6]:
df['Sex'].value_counts()

1    577
0    314
Name: Sex, dtype: int64

In [7]:
# Handling the missing values (data imputation): first count the missing entries per column
df.isna().sum()

Survived      0
Pclass        0
Sex           0
Age         177
Fare          0
dtype: int64

In [8]:
# Impute the missing Age values with the median, which is less sensitive to outliers than the mean
df['Age']=df['Age'].fillna(df['Age'].median())
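As a small illustration of why the median is a reasonable fill value, the sketch below imputes a toy list of ages in plain Python. The ages are hypothetical, not rows from the Titanic data, and `None` stands in for a missing value.

```python
from statistics import mean, median

# Hypothetical ages with two missing entries and one large value (80.0)
ages = [22.0, 38.0, None, 35.0, None, 80.0]

# Compute the fill value from the observed ages only
observed = [a for a in ages if a is not None]
fill_value = median(observed)

# Replace each missing entry with the median
imputed = [fill_value if a is None else a for a in ages]

print("median:", fill_value, "mean:", mean(observed))
print(imputed)
```

Here the single large age pulls the mean up to 43.75, while the median stays at 36.5, closer to the bulk of the observed values.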

In [9]:
# Take a look at the Dataframe
df.head()

| | Survived | Pclass | Sex | Age | Fare |
|---|---|---|---|---|---|
| 0 | 0 | 3 | 1 | 22.0 | 7.25 |
| 1 | 1 | 1 | 0 | 38.0 | 71.2833 |
| 2 | 1 | 3 | 0 | 26.0 | 7.925 |
| 3 | 1 | 1 | 0 | 35.0 | 53.1 |
| 4 | 0 | 3 | 1 | 35.0 | 8.05 |


In [10]:
# Set the predictor (X) and response (Y) variables
X=df.drop("Survived", axis=1)
Y=df["Survived"]

# Logistic Regression

In [11]:
#Splitting into training and test set
from sklearn.model_selection import train_test_split
X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.3,random_state=25)
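The sizes of the two subsets follow from `test_size=0.3` and the 891 rows in the dataframe. The sketch below reproduces that arithmetic, assuming that scikit-learn rounds a fractional test size up to the next whole row; the result agrees with the support of 268 shown in the classification reports.

```python
import math

n_samples = 891   # 577 + 314 rows, per the Sex value counts
test_size = 0.3   # fraction passed to train_test_split

# Assumption: the test count is the ceiling of test_size * n_samples
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print("train rows:", n_train, "test rows:", n_test)
```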

In [12]:
#Logistic regression Model
from sklearn.linear_model import LogisticRegression
logit = LogisticRegression()
logit.fit(X_train, Y_train)

LogisticRegression()

In [13]:
#Compute the Predictions 
Y_pred_L= logit.predict(X_test)
Y_pred_L

array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1,
       0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0,
       0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1,
       0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
       0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1,
       1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1,
       1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0,
       1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1,
       0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1,
       1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0,
       1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1,
       0, 0, 1, 0], dtype=int64)

In [14]:
#Confusion Matrix
from sklearn.metrics import confusion_matrix
cm_L=confusion_matrix(Y_test,Y_pred_L)
cm_L

array([[136,  29],
       [ 31,  72]], dtype=int64)

In [15]:
confusion_L=pd.crosstab(Y_pred_L,Y_test,rownames=['predicted'],colnames=['actual'])
print("\n The Confusion Matrix is:")
confusion_L


 The Confusion Matrix is:


| predicted \ actual | 0 | 1 |
|---|---|---|
| 0 | 136 | 31 |
| 1 | 29 | 72 |


In [16]:
#Accuracy Score
from sklearn.metrics import accuracy_score
accuracy_logistics=accuracy_score(Y_test,Y_pred_L)
print("Accuracy using Logistic Regression is: ",accuracy_logistics)

Accuracy using Logistic Regression is:  0.7761194029850746
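The accuracy can be checked directly against the confusion matrix: it is the sum of the diagonal (correct predictions) divided by the total number of test rows. A minimal sketch, with the matrix copied from the `confusion_matrix` output (rows are actual classes, columns are predicted classes):

```python
# Confusion matrix for logistic regression: rows = actual, columns = predicted
cm = [[136, 29],
      [31, 72]]

correct = cm[0][0] + cm[1][1]          # true negatives + true positives
total = sum(sum(row) for row in cm)    # all test rows

accuracy = correct / total
print(f"{correct} correct out of {total}: accuracy = {accuracy:.4f}")
```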


In [17]:
#Classification Report
from sklearn.metrics import classification_report
report=classification_report(Y_test,Y_pred_L)
print(report)

              precision    recall  f1-score   support

           0       0.81      0.82      0.82       165
           1       0.71      0.70      0.71       103

    accuracy                           0.78       268
   macro avg       0.76      0.76      0.76       268
weighted avg       0.78      0.78      0.78       268



# Gaussian Naive Bayes Classifier

In [18]:
#Gaussian Naive Bayes Classifier
from sklearn.naive_bayes import GaussianNB
gnbt=GaussianNB()
gnbt.fit(X_train, Y_train)

GaussianNB()

In [19]:
#Compute the Predictions
Y_pred_G= gnbt.predict(X_test)
Y_pred_G

array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1,
       0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0,
       0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1,
       0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,
       0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1,
       1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1,
       1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1,
       1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1,
       1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1,
       1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0,
       1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1,
       0, 0, 1, 1], dtype=int64)

In [20]:
#Confusion Matrix
from sklearn.metrics import confusion_matrix
cm_G=confusion_matrix(Y_test,Y_pred_G)
cm_G

array([[130,  35],
       [ 27,  76]], dtype=int64)

In [21]:
confusion_G=pd.crosstab(Y_pred_G,Y_test,rownames=['predicted'],colnames=['actual'])
print("\n The Confusion Matrix is:")
confusion_G


 The Confusion Matrix is:


| predicted \ actual | 0 | 1 |
|---|---|---|
| 0 | 130 | 27 |
| 1 | 35 | 76 |


In [22]:
#Accuracy Score
from sklearn.metrics import accuracy_score
accuracy_gnbt=accuracy_score(Y_test,Y_pred_G)
print("Accuracy using Gaussian Naïve Bayes classifier is: ",accuracy_gnbt)

Accuracy using Gaussian Naïve Bayes classifier is:  0.7686567164179104


In [23]:
#Classification Report
from sklearn.metrics import classification_report
report_g=classification_report(Y_test,Y_pred_G)
print(report_g)

              precision    recall  f1-score   support

           0       0.83      0.79      0.81       165
           1       0.68      0.74      0.71       103

    accuracy                           0.77       268
   macro avg       0.76      0.76      0.76       268
weighted avg       0.77      0.77      0.77       268
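The class-1 rows of both classification reports can be recovered from the two confusion matrices, which makes the trade-off between the models easier to see. The helper `class1_metrics` below is a hypothetical name introduced for this sketch; the matrices are copied from the outputs above (rows are actual classes, columns are predicted classes).

```python
def class1_metrics(cm):
    """Precision, recall, and F1 for class 1 from a 2x2 confusion matrix."""
    tp = cm[1][1]  # actual 1, predicted 1
    fp = cm[0][1]  # actual 0, predicted 1
    fn = cm[1][0]  # actual 1, predicted 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

cm_logistic = [[136, 29], [31, 72]]
cm_gaussian_nb = [[130, 35], [27, 76]]

results = {
    "Logistic Regression": class1_metrics(cm_logistic),
    "Gaussian Naive Bayes": class1_metrics(cm_gaussian_nb),
}

for name, (precision, recall, f1) in results.items():
    print(f"{name}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

On this split, logistic regression is more precise for survivors, while Gaussian Naive Bayes recovers more of them; the F1 scores are nearly identical.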

