In [None]:
Don't be misled by the name!
Logistic regression is a classification algorithm, not a regression algorithm.
It is used to estimate discrete values (binary values such as 0/1, yes/no, true/false)
based on a given set of independent variable(s).
In simple terms, it predicts the probability that an event occurs by fitting the data to a logistic function, whose inverse is the logit function.
Hence, it is also known as logit regression.
Because it predicts a probability, its output values lie between 0 and 1 (as expected).

Again, let us try and understand this through a simple example.

Let’s say your friend gives you a puzzle to solve. 
There are only 2 outcome scenarios – either you solve it or you don’t. 
Now imagine that you are given a wide range of puzzles and quizzes in an attempt to understand which
subjects you are good at. The outcome of this study would be something like this:
if you are given a tenth-grade trigonometry problem, you are 70% likely to solve it.
On the other hand, if it is a fifth-grade history question, the probability of getting the answer right is only 30%.
This is what Logistic Regression provides you.

Coming to the math, the log odds of the outcome is modeled as a linear combination of the predictor variables.

odds = p / (1 - p) = probability of the event occurring / probability of the event not occurring
ln(odds) = ln(p / (1 - p))
logit(p) = ln(p / (1 - p)) = b0 + b1*X1 + b2*X2 + b3*X3 + ... + bk*Xk
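
As a rough illustration of the formulas above, the sketch below computes the odds and the log-odds for the two probabilities from the puzzle example (70% and 30%). The helper names `odds`, `logit`, and `probability_from_odds` are made up for this example and are not part of any library.

```python
import math

def odds(p):
    """Odds of an event with probability p, where 0 < p < 1."""
    return p / (1 - p)

def logit(p):
    """Natural logarithm of the odds (the log-odds)."""
    return math.log(odds(p))

def probability_from_odds(o):
    """Invert odds = p / (1 - p) algebraically: p = odds / (1 + odds)."""
    return o / (1 + o)

# Probabilities taken from the puzzle example in the text
for p in (0.7, 0.3):
    o = odds(p)
    print(f"p={p}: odds={o:.3f}, log-odds={logit(p):.3f}, "
          f"recovered p={probability_from_odds(o):.3f}")
```

The two log-odds values have the same magnitude and opposite signs, because logit(1 - p) = -logit(p). A probability of 0.5 corresponds to odds of 1 and log-odds of 0.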

NB/  *Logistic regression is a machine learning algorithm used for binary classification problems;
      it is a predictive analysis algorithm based on the concept of probability.
     *Logistic regression can be used for various classification problems, such as spam detection.
     *Logistic regression is one of the simplest and most commonly used machine learning algorithms for two-class classification.
     *It is a go-to method for binary classification problems (problems with two class values).
     *Logistic regression is named for the function at the core of the method, the logistic function (sigmoid function).
     *The sigmoid function is an S-shaped curve that takes any real-valued number and maps it to a value between 0 and 1,
      but never exactly at those limits.

     sigmoid(value) = 1 / (1 + e^(-value))
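
To make the S-shaped curve concrete, here is a minimal sketch of the sigmoid applied to a linear score. The coefficients `b0` and `b1` and the input values are hypothetical, chosen only for illustration; they are not fitted to any data.

```python
import math

def sigmoid(value):
    """Logistic function: maps a real number into the open interval (0, 1)."""
    return 1 / (1 + math.exp(-value))

# Hypothetical coefficients for a one-feature model (illustrative, not fitted)
b0, b1 = -4.0, 0.05

for x in (20, 80, 140):
    score = b0 + b1 * x  # linear combination of the predictor, i.e. the log-odds
    print(f"x={x}: log-odds={score:+.1f}, probability={sigmoid(score):.3f}")
```

A score of 0 maps to a probability of 0.5; large negative scores approach 0 and large positive scores approach 1.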

In [None]:
"""
code example
"""

# importing required libraries
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# read the train and test dataset
train_data = pd.read_csv('train-data.csv')
test_data = pd.read_csv('test-data.csv')


print(train_data.head())

# shape of the dataset
print('Shape of training data :',train_data.shape)
print('Shape of testing data :',test_data.shape)

# Now, we need to predict the missing target variable in the test data
# target variable - Survived

# separate the independent and target variables on training data
train_x = train_data.drop(columns=['Survived'])
train_y = train_data['Survived']

# separate the independent and target variables on testing data
test_x = test_data.drop(columns=['Survived'])
test_y = test_data['Survived']

'''
Create the object of the Logistic Regression model
You can also add other parameters and test your code here
Some parameters are : fit_intercept and penalty
Documentation of sklearn LogisticRegression: 

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

 '''
model = LogisticRegression()

# fit the model with the training data
model.fit(train_x,train_y)

# coefficients of the trained model
print('Coefficient of model :', model.coef_)

# intercept of the model
print('Intercept of model',model.intercept_)

# predict the target on the train dataset
predict_train = model.predict(train_x)
print('Target on train data',predict_train) 

# Accuracy Score on train dataset
accuracy_train = accuracy_score(train_y,predict_train)
print('accuracy_score on train dataset : ', accuracy_train)

# predict the target on the test dataset
predict_test = model.predict(test_x)
print('Target on test data',predict_test) 

# Accuracy Score on test dataset
accuracy_test = accuracy_score(test_y,predict_test)
print('accuracy_score on test dataset : ', accuracy_test)
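
The example above prints only class labels. As a hedged sketch of how those labels relate to the probabilities described earlier, the snippet below fits scikit-learn's `LogisticRegression` on synthetic two-class data (generated with `make_classification`, not the train/test CSV files used above) and compares `predict_proba`, `decision_function`, and `predict`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic two-class data, used only for illustration
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

model = LogisticRegression()
model.fit(X, y)

# Probability of the positive class (column 1) for each row
proba = model.predict_proba(X)[:, 1]

# Linear score b0 + b1*X1 + ... + bk*Xk, i.e. the log-odds
log_odds = model.decision_function(X)

# Applying the sigmoid to the log-odds should reproduce the probabilities
sigmoid_of_log_odds = 1 / (1 + np.exp(-log_odds))
print('Probabilities match sigmoid(log-odds):', np.allclose(proba, sigmoid_of_log_odds))

# predict() should correspond to thresholding the probability at 0.5
labels_from_proba = (proba > 0.5).astype(int)
print('Labels match thresholded probabilities:',
      np.array_equal(labels_from_proba, model.predict(X)))
```

If a cutoff other than 0.5 suits the problem better, one option is to threshold `predict_proba` directly instead of calling `predict`.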

In [None]:
"""
Working out an example using the Pima Indians Diabetes dataset
"""
import pandas as pd
import numpy as np
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix,accuracy_score
from sklearn.linear_model import LogisticRegression
import seaborn as sn
from sklearn import metrics
import matplotlib.pyplot as plt


column_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age', 'label']

# load dataset
pima = pd.read_csv("pima-indians-diabetes.csv", header=None, names=column_names)


#split dataset in features and target variable
feature_columns = ['pregnant', 'insulin', 'bmi', 'age','glucose','bp','pedigree']
X = pima[feature_columns] # Features
y = pima.label # Target variable

# split X and y into training and testing sets with our test data taking 25% & train data 75%
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.25,random_state=0)

model = LogisticRegression(max_iter=1000)

model.fit(X_train,y_train)

y_pred=model.predict(X_test)

# preview the first rows (a bare pima.head() in the middle of a cell displays nothing)
print(pima.head())

print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))

#print shape of our dataset
print('Shape of training data:',X_train.shape)
print('Shape of testing data :',X_test.shape)

# confusion matrix as a labelled table
# (named cm_table so it does not shadow sklearn's confusion_matrix function imported above)
cm_table = pd.crosstab(y_test, y_pred, rownames=['Actual'], colnames=['Predicted'])
sn.heatmap(cm_table, annot=True)

print('Accuracy: ',metrics.accuracy_score(y_test, y_pred))
plt.show()
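
The accuracy, precision, and recall reported above can all be derived from the four cells of a binary confusion matrix. The sketch below shows the arithmetic with hypothetical counts; these numbers are made up for illustration and are not results from the Pima dataset. The layout follows scikit-learn's convention of actual classes in rows and predicted classes in columns.

```python
# Hypothetical confusion-matrix counts (illustrative, not results from the Pima data)
tn, fp = 50, 10  # actual 0: predicted 0, predicted 1
fn, tp = 15, 25  # actual 1: predicted 0, predicted 1

total = tn + fp + fn + tp

accuracy = (tp + tn) / total   # share of all predictions that are correct
precision = tp / (tp + fp)     # share of predicted positives that are truly positive
recall = tp / (tp + fn)        # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"accuracy={accuracy:.3f}, precision={precision:.3f}, "
      f"recall={recall:.3f}, f1={f1:.3f}")
```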