## Logistic regression

In [None]:
#Load the dataset
import pandas as pd

pima = pd.read_csv("diabetes.csv")
pima

In [None]:
# separate the dataset into features and target variable
feature_cols = ['Pregnancies','Glucose','BloodPressure','SkinThickness','Insulin','BMI','DiabetesPedigreeFunction','Age']
X = pima[feature_cols] #features
y = pima.Outcome #target variable

In [None]:
#split the dataset into training and testing data
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.25,random_state = 0)

# Here, the dataset is split in a 75:25 ratio: 75% of the data is used for model training and 25% for model testing.
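
The idea behind a 75:25 shuffle-split can be sketched with the standard library alone. This is only an illustration of the concept, not how `train_test_split` is implemented; the helper name `shuffle_split`, the sample count of 100, and the choice to round the test count up are assumptions made for the example.

```python
import math
import random

def shuffle_split(n_samples, test_size=0.25, seed=0):
    """Hypothetical helper: shuffle row indices and cut them into train/test parts."""
    indices = list(range(n_samples))
    # A seeded generator makes the shuffle reproducible, similar in spirit to random_state.
    random.Random(seed).shuffle(indices)
    # Rounding the test count up is a choice made for this sketch.
    n_test = math.ceil(test_size * n_samples)
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    return train_idx, test_idx

train_idx, test_idx = shuffle_split(100)
print(len(train_idx), len(test_idx))
```

Every row lands in exactly one of the two parts, so the model is never evaluated on rows it was trained on.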

In [None]:
#import the class
from sklearn.linear_model import LogisticRegression

In [None]:
#instantiate the model with default parameters
logreg = LogisticRegression()
#fit the model with the training data
logreg.fit(X_train,y_train)
#predict class labels for the test data
y_predict = logreg.predict(X_test)

In [None]:
#import the metrics module and compute the confusion matrix
from sklearn import metrics
cnf_matrix = metrics.confusion_matrix(y_test,y_predict)
cnf_matrix

In [None]:
#confusion matrix evaluation metrics
print("Accuracy:",metrics.accuracy_score(y_test,y_predict))
print("Precision:",metrics.precision_score(y_test,y_predict))
print("Recall:",metrics.recall_score(y_test,y_predict))

Precision: Of all the cases the model predicts as positive, precision is the fraction that are actually positive. In other words, when the model predicts that a patient has diabetes, precision tells you how often that prediction is correct.

Recall: Of the patients in the test set who actually have diabetes, recall is the fraction that the logistic regression model correctly identifies (here, 58%).

Here, you can see the confusion matrix as an array object. Its dimensions are 2×2 because this is a binary classification problem with two classes, 0 and 1. Diagonal values represent correct predictions, while off-diagonal values represent incorrect predictions.
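
To make the link between the confusion matrix and the metrics concrete, here is a minimal sketch that derives accuracy, precision, and recall from the four cells by hand. The counts are hypothetical and are not this notebook's output; the helper name `summarize` is also an assumption. The layout follows the scikit-learn convention for labels 0 and 1: rows are actual classes, columns are predicted classes.

```python
# Hypothetical 2x2 confusion matrix (illustrative counts only).
# Layout: [[true negatives, false positives],
#          [false negatives, true positives]]
example_matrix = [[50, 10],
                  [20, 20]]

def summarize(matrix):
    """Compute accuracy, precision, and recall from a 2x2 confusion matrix."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,   # diagonal cells over all cells
        "precision": tp / (tp + fp),     # correct among predicted positives
        "recall": tp / (tp + fn),        # correct among actual positives
    }

summary = summarize(example_matrix)
print(summary)
```

With these counts the model is right on 70 of 100 cases, two thirds of its positive predictions are correct, and it finds half of the actual positives.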

In [None]:
#import required modules
import matplotlib.pyplot as plt
import seaborn as sns
class_names = [0,1] #names of classes
fig,ax = plt.subplots()
#create heatmap; the DataFrame index and columns supply the tick labels
sns.heatmap(pd.DataFrame(cnf_matrix,index = class_names,columns = class_names),annot = True,cmap = 'Purples',fmt='g',ax = ax)
ax.xaxis.set_label_position("top")
plt.tight_layout()
plt.xlabel("Predicted label")
plt.ylabel("Actual label")
plt.title("Confusion Matrix")
plt.show()

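The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) at different classification thresholds, showing the tradeoff between sensitivity and specificity.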
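
Each point on a ROC curve comes from one choice of threshold. The sketch below computes a few such points by hand on a toy example; the labels, scores, and the helper name `roc_point` are assumptions for illustration and are not taken from the diabetes data.

```python
def roc_point(y_true, scores, threshold):
    """Return (false positive rate, true positive rate) when predicting 1 for score >= threshold."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 0 and predicted == 1:
            fp += 1
        elif label == 1 and predicted == 0:
            fn += 1
        else:
            tn += 1
    return fp / (fp + tn), tp / (tp + fn)

# Toy labels and predicted probabilities (hypothetical values).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]

# Lowering the threshold moves the point from (0, 0) towards (1, 1).
roc_points = [roc_point(toy_labels, toy_scores, t) for t in (0.9, 0.5, 0.3, 0.0)]
print(roc_points)
```

A lower threshold catches more true positives but also lets in more false positives, which is the tradeoff the curve visualises.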

In [None]:
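#predicted probability of the positive class (column 1)
y_predict_proba = logreg.predict_proba(X_test)[:,1]
fpr,tpr,_= metrics.roc_curve(y_test,y_predict_proba)
auc = metrics.roc_auc_score(y_test,y_predict_proba)
plt.plot(fpr,tpr,label = "data 1, auc = "+str(auc))
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend(loc = 'best')
plt.show()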

The AUC score in this case is 0.86. A score close to 1 represents a perfect classifier, while a score of 0.5 represents a worthless classifier that does no better than random guessing.
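
One way to build intuition for these values is the ranking interpretation of AUC: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. The sketch below computes it by comparing every positive/negative pair. It is an illustration only, not how `roc_auc_score` is implemented, and the helper name `pairwise_auc` and the toy scores are assumptions.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

labels = [0, 0, 1, 1]  # toy labels (hypothetical)

mixed_auc = pairwise_auc(labels, [0.1, 0.4, 0.35, 0.8])    # one pair ranked wrongly
perfect_auc = pairwise_auc(labels, [0.1, 0.2, 0.8, 0.9])   # every positive above every negative
constant_auc = pairwise_auc(labels, [0.5, 0.5, 0.5, 0.5])  # scores carry no information
print(mixed_auc, perfect_auc, constant_auc)
```

The second and third calls reproduce the two reference points: a classifier that ranks every positive above every negative scores 1, and one whose scores carry no information scores 0.5.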