# Logistic Regression

- It is a supervised learning algorithm used for classification.
- It uses the <b>logistic</b> (<b>sigmoid</b>) function.
- <b>It models the probability of occurrence of a binary event using the logit (log of odds) and its inverse, the sigmoid function.</b>
- <b>Odds = P(event happening)/(1-P(event happening))</b>
- <b>Log of Odds = log(P(event happening)/(1-P(event happening))) = log(p/(1-p)) = a*x + b</b>

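Here the log of odds plays the role of y in the straight-line equation y = mx + c: it is a linear function of x, with slope a (m) and intercept b (c).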
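As a quick numeric check of these definitions, the following sketch converts a probability to odds and log of odds, then back again. The value p = 0.8 and the helper names are illustrative assumptions, not taken from the dataset.

```python
import math

def odds(p):
    """Odds of an event with probability p (0 < p < 1)."""
    return p / (1 - p)

def log_odds(p):
    """Natural log of the odds, also called the logit."""
    return math.log(odds(p))

def inverse_logit(z):
    """Maps a log-odds value z back to a probability (the sigmoid)."""
    return 1 / (1 + math.exp(-z))

p = 0.8                      # illustrative probability, not from the data
z = log_odds(p)
print(odds(p))               # odds of 4 to 1, up to floating-point rounding
print(z)                     # log of odds, about 1.386
print(inverse_logit(z))      # recovers p, up to floating-point rounding
```

A probability of 0.5 gives odds of 1 and a log of odds of 0, which is why the sign of a*x + b decides the predicted class.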

### Sigmoid Function

<b>S(x) = 1/(1 + e^-(ax+b))</b>
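The sigmoid follows from the log-odds equation by solving for p. A short derivation, using the same symbols a, b and p as in the bullets at the top:

```latex
\begin{aligned}
\log\frac{p}{1-p} &= ax + b \\
\frac{p}{1-p} &= e^{ax+b} \\
p &= (1-p)\,e^{ax+b} \\
p\,\bigl(1 + e^{ax+b}\bigr) &= e^{ax+b} \\
p &= \frac{e^{ax+b}}{1 + e^{ax+b}} = \frac{1}{1 + e^{-(ax+b)}} = S(x)
\end{aligned}
```

So p > 0.5 exactly when ax + b > 0, which means a 0.5 probability threshold corresponds to a linear decision boundary in x.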

###### Problem Statement : Based on Age, predict whether the person bought the insurance or not. 

In [8]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [9]:
df = pd.read_csv('./insurance_data.csv')
df.head()

   age  bought_insurance
0   22                 0
1   25                 0
2   47                 1
3   52                 0
4   46                 1


In [10]:
df.shape

(27, 2)

In [13]:
df.isnull().sum()

age                 0
bought_insurance    0
dtype: int64

In [15]:
x = df[['age']]
y = df['bought_insurance']
print(type(x), x.shape)
print(type(y), y.shape)

<class 'pandas.core.frame.DataFrame'> (27, 1)
<class 'pandas.core.series.Series'> (27,)


In [18]:
from sklearn.model_selection import train_test_split

In [20]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)

(20, 1)
(7, 1)
(20,)
(7,)


In [21]:
# the test_size parameter sets the fraction of the data held out as test data
# common train/test splits (in %): (70,30), (75,25), (80,20), (85,15)
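To see how test_size maps to row counts, here is a minimal sketch of a shuffled split in plain Python. It is an illustration only, not scikit-learn's implementation; the function name `simple_split` and its `seed` parameter are made up for this example. It assumes the test count is rounded up, which is consistent with the 20 training rows and 7 test rows printed above for 27 rows.

```python
import math
import random

def simple_split(n_rows, test_size=0.25, seed=0):
    """Return (train_indices, test_indices) for a shuffled split.

    Hypothetical helper for illustration; the test count is rounded up.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)      # shuffle row positions reproducibly
    n_test = math.ceil(test_size * n_rows)    # e.g. ceil(0.25 * 27) = 7
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = simple_split(27, test_size=0.25)
print(len(train_idx), len(test_idx))          # 20 7
```

Passing a fixed seed makes the split repeatable; `train_test_split` offers the `random_state` parameter for the same purpose.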

In [22]:
x_test.head()

    age
14   49
24   50
19   18
0    22
6    55


In [23]:
y_test.head()

14    1
24    1
19    0
0     0
6     0
Name: bought_insurance, dtype: int64

In [24]:
from sklearn.linear_model import LogisticRegression

In [25]:
m1 = LogisticRegression()
m1.fit(x_train, y_train)

In [26]:
# For a regression model, score() returns R^2 (the same value as r2_score)
# For a classifier such as LogisticRegression, score() returns the mean accuracy

In [27]:
# accuracy
print("Training Score = ", m1.score(x_train, y_train))
print("Test Score = ", m1.score(x_test, y_test))

Training Score =  0.9
Test Score =  0.8571428571428571


In [29]:
y_pred_m1 = m1.predict(x_test)
print(y_pred_m1)

[1 1 0 0 1 0 1]


In [30]:
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score

In [31]:
cm = confusion_matrix(y_test, y_pred_m1)
cr = classification_report(y_test, y_pred_m1)
print(cm)
print(cr)

[[3 1]
 [0 3]]
              precision    recall  f1-score   support

           0       1.00      0.75      0.86         4
           1       0.75      1.00      0.86         3

    accuracy                           0.86         7
   macro avg       0.88      0.88      0.86         7
weighted avg       0.89      0.86      0.86         7



In [32]:
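# confusion_matrix puts true labels in rows and predicted labels in columns (order 0, 1),
# so with 1 (bought insurance) as the positive class we get TN=3, FP=1, FN=0, TP=3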
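The figures in the classification report can be recomputed from the four confusion-matrix counts. The sketch below hard-codes the matrix printed above and assumes scikit-learn's layout (rows are true labels, columns are predicted labels, in the order 0 then 1); the variable and function names are illustrative.

```python
# Confusion matrix copied from the printed output: rows = true, columns = predicted
cm_counts = [[3, 1],
             [0, 3]]
(tn, fp), (fn, tp) = cm_counts   # positive class is 1 (bought insurance)

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Class 1 (bought insurance)
precision_1 = tp / (tp + fp)     # 3 / 4
recall_1 = tp / (tp + fn)        # 3 / 3
# Class 0 (did not buy): the roles of the counts swap
precision_0 = tn / (tn + fn)     # 3 / 3
recall_0 = tn / (tn + fp)        # 3 / 4

print(round(precision_1, 2), round(recall_1, 2), round(f1(precision_1, recall_1), 2))  # 0.75 1.0 0.86
print(round(precision_0, 2), round(recall_0, 2), round(f1(precision_0, recall_0), 2))  # 1.0 0.75 0.86
```

Both rows agree with the precision, recall and f1-score columns of the report.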

In [33]:
# accuracy
print("Test Score = ", m1.score(x_test, y_test))
print("Accuracy Score = ", accuracy_score(y_test, y_pred_m1))

Test Score =  0.8571428571428571
Accuracy Score =  0.8571428571428571


In [34]:
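# accuracy = (TP + TN) / (TP + TN + FP + FN)
tn, fp, fn, tp = cm.ravel()  # order of the counts in a binary confusion matrix
print((tp + tn) / (tp + tn + fp + fn))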

0.8571428571428571


In [36]:
# the linear equation models the log of odds, not the probability itself
# log(p/(1-p)) = ax + b = mx + c, so p = S(x) = 1/(1 + e^-(mx+c))

In [37]:
m = m1.coef_
c = m1.intercept_
print("Coefficient or Slope = ", m)
print("Intercept or Constant = ", c)

Coefficient or Slope =  [[0.15103038]]
Intercept or Constant =  [-5.49914654]
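The fitted slope and intercept can be interpreted directly. A small sketch, using the values printed above (rounded to eight decimals, so the results are approximate): the age at which the predicted probability crosses 0.5 is where m*age + c = 0, and e^m is the factor by which the odds are multiplied for each additional year of age. The variable names are illustrative.

```python
import math

slope = 0.15103038        # m, copied from the printed coefficient
intercept = -5.49914654   # c, copied from the printed intercept

# p = 0.5 exactly when the log of odds, slope*age + intercept, equals 0
boundary_age = -intercept / slope
print(round(boundary_age, 2))          # about 36.41

# Each extra year adds `slope` to the log of odds, i.e. multiplies the odds by e^slope
odds_multiplier = math.exp(slope)
print(round(odds_multiplier, 3))       # about 1.163
```

A boundary near 36 years is consistent with the test-set predictions shown earlier: 0 for ages 18 and 22, and 1 for ages 49, 50 and 55.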


In [38]:
def sigmoid(x, m, c):
    # probability of class 1: the sigmoid applied to the linear term m*x + c
    prob = 1/(1+np.exp(-(m*x+c)))
    print(prob)

##### Predict whether the person bought insurance or not when :
1) Age = 59<br>
2) Age = 27

In [39]:
y_pred_59 = m1.predict(pd.DataFrame({'age': [59]}))  # same column name as the training data
print(y_pred_59)
sigmoid(59,m,c)

[1]
[[0.96806653]]




In [40]:
y_pred_27 = m1.predict(pd.DataFrame({'age': [27]}))  # same column name as the training data
print(y_pred_27)
sigmoid(27,m,c)

[0]
[[0.19445376]]
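Both results can be cross-checked without the fitted model object. The sketch below plugs the printed slope and intercept into the sigmoid and applies a 0.5 threshold, which is the cut-off that `predict` uses for a binary logistic regression. The function names are illustrative, and the coefficients are the rounded values printed earlier.

```python
import math

SLOPE = 0.15103038        # m, from m1.coef_ as printed earlier
INTERCEPT = -5.49914654   # c, from m1.intercept_ as printed earlier

def probability(age):
    """Probability of buying insurance: sigmoid of SLOPE*age + INTERCEPT."""
    return 1 / (1 + math.exp(-(SLOPE * age + INTERCEPT)))

def predicted_class(age, threshold=0.5):
    """Class label: 1 if the probability exceeds the threshold, else 0."""
    return int(probability(age) > threshold)

for age in (59, 27):
    print(age, round(probability(age), 4), predicted_class(age))
```

This gives a probability of about 0.968 (class 1) for age 59 and about 0.194 (class 0) for age 27, matching the outputs above. With the model object available, `m1.predict_proba(...)` returns the same class-1 probability in its second column.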


