# Logistic Regression

Logistic regression is a classification algorithm that estimates the probability of an event occurring, such as whether a user purchases a product.

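The model first computes a linear combination of the input:

$y = b_0 + b_1x_1$

It then passes $y$ through the sigmoid (logistic) function, which maps any real number to a probability between 0 and 1:

$p = \frac{1}{1 + e^{-y}}$

Solving for $y$ shows that the log-odds of the event are linear in the input:

$\ln\left(\frac{p}{1-p}\right) = b_0 + b_1x_1$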
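As a quick numerical check of these equations, the following NumPy sketch defines two illustrative helpers, `sigmoid` and `log_odds` (hypothetical names, not library functions), and uses made-up coefficients `b0` and `b1`. It shows that the log-odds transform undoes the sigmoid, so the linear score can be recovered from the probability.

```python
import numpy as np

def sigmoid(y):
    # Map a linear score y = b0 + b1*x1 to a probability p in (0, 1)
    return 1.0 / (1.0 + np.exp(-y))

def log_odds(p):
    # Inverse of the sigmoid: ln(p / (1 - p))
    return np.log(p / (1.0 - p))

# Hypothetical coefficients and inputs, chosen only for illustration
b0, b1 = -1.0, 0.5
x1 = np.array([-4.0, 0.0, 2.0, 6.0])

y = b0 + b1 * x1          # linear score: [-3.0, -1.0, 0.0, 2.0]
p = sigmoid(y)            # a score of 0 maps to a probability of exactly 0.5
print(p)
print(np.allclose(log_odds(p), y))  # True: the log-odds are linear in x1
```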

![Image](https://miro.medium.com/max/800/1*dm6ZaX5fuSmuVvM4Ds-vcg.jpeg)

## Importing Libraries

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```

## Importing Dataset

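```python
# Load the dataset and show the first five rows
df = pd.read_csv("./Social_Network_Ads.csv")
df.head()
```

|   | Age | EstimatedSalary | Purchased |
|---|---|---|---|
| 0 | 19 | 19000 | 0 |
| 1 | 35 | 20000 | 0 |
| 2 | 26 | 43000 | 0 |
| 3 | 27 | 57000 | 0 |
| 4 | 19 | 76000 | 0 |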


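```python
# Features: every column except the last (Age, EstimatedSalary)
X = df.iloc[:, :-1].values
# Target: the last column (Purchased)
y = df.iloc[:, -1].values
```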
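The slicing assumes the target is the last column. The sketch below rebuilds the five `df.head()` rows by hand (the names `df_small`, `X_small`, and `y_small` are hypothetical) so the column selection can be checked without the CSV file:

```python
import pandas as pd

# The five rows shown by df.head() above, rebuilt by hand for illustration
df_small = pd.DataFrame(
    {
        "Age": [19, 35, 26, 27, 19],
        "EstimatedSalary": [19000, 20000, 43000, 57000, 76000],
        "Purchased": [0, 0, 0, 0, 0],
    }
)

# iloc[:, :-1] selects every column except the last -> the features
X_small = df_small.iloc[:, :-1].values
# iloc[:, -1] selects only the last column -> the target label
y_small = df_small.iloc[:, -1].values

print(df_small.columns[:-1].tolist())  # ['Age', 'EstimatedSalary']
print(X_small.shape, y_small.shape)    # (5, 2) (5,)
```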

```python
X[:5]
```

```
array([[   19, 19000],
       [   35, 20000],
       [   26, 43000],
       [   27, 57000],
       [   19, 76000]], dtype=int64)
```

```python
y[:5]
```

```
array([0, 0, 0, 0, 0], dtype=int64)
```

## Splitting the Dataset into Training and Test Sets

```python
from sklearn.model_selection import train_test_split

# Hold out 25% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
```
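The split above does not stratify by class. As an optional variation, not something the original notebook does, `train_test_split` also accepts `stratify=`, which keeps the class ratio equal in both parts. This sketch uses placeholder arrays (`X_demo` and `y_demo` are hypothetical, with an arbitrary 36% positive rate) of the same length as the dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with 400 rows; the labels are 36% positive, chosen for illustration
X_demo = np.arange(800).reshape(400, 2)
y_demo = np.array([1] * 144 + [0] * 256)

# test_size=0.25 keeps 25% of the rows for testing: 400 * 0.25 = 100
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo
)
print(X_tr.shape, X_te.shape)    # (300, 2) (100, 2)

# stratify=y_demo keeps the positive rate the same in both splits
print(y_tr.mean(), y_te.mean())  # 0.36 0.36
```

With 400 rows, `test_size=0.25` yields the same 300/100 shapes reported for the real data.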

```python
X_train[:5]
```

```
array([[    44,  39000],
       [    32, 120000],
       [    38,  50000],
       [    32, 135000],
       [    52,  21000]], dtype=int64)
```

```python
X_train.shape, X_test.shape
```

```
((300, 2), (100, 2))
```

```python
y_train.shape, y_test.shape
```

```
((300,), (100,))
```

## Feature Scaling

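```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Learn the mean and standard deviation from the training set only...
X_train_scaled = scaler.fit_transform(X_train)
# ...and reuse them on the test set, so no test information leaks into training
X_test_scaled = scaler.transform(X_test)
```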
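`StandardScaler` standardizes each column as $z = (x - \mu)/\sigma$, using the mean and population standard deviation learned during `fit`. The sketch below checks that formula by hand on the five `X_train` rows printed in this notebook (`X_part` and `scaler_demo` are hypothetical names). Because it is fitted on only five rows, its numbers will differ from the `X_train_scaled` values in this notebook.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Five training rows copied from the X_train output (Age, EstimatedSalary)
X_part = np.array(
    [[44, 39000], [32, 120000], [38, 50000], [32, 135000], [52, 21000]],
    dtype=float,
)

scaler_demo = StandardScaler().fit(X_part)

# Manual standardization per column: z = (x - mean) / std,
# where std is the population standard deviation (NumPy's default ddof=0)
mean = X_part.mean(axis=0)
std = X_part.std(axis=0)
z_manual = (X_part - mean) / std

print(np.allclose(z_manual, scaler_demo.transform(X_part)))  # True
print(np.allclose(scaler_demo.mean_, mean), np.allclose(scaler_demo.scale_, std))  # True True
```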

```python
X_train_scaled[:4]
```

```
array([[ 0.58164944, -0.88670699],
       [-0.60673761,  1.46173768],
       [-0.01254409, -0.5677824 ],
       [-0.60673761,  1.89663484]])
```

```python
X_test_scaled[:4]
```

```
array([[-0.80480212,  0.50496393],
       [-0.01254409, -0.5677824 ],
       [-0.30964085,  0.1570462 ],
       [-0.80480212,  0.27301877]])
```

## Training the Logistic Regression Model

```python
from sklearn.linear_model import LogisticRegression

classifier = LogisticRegression()
classifier.fit(X_train_scaled, y_train)
```

```
LogisticRegression()
```
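With two inputs, the linear score from the introduction extends to $y = b_0 + b_1x_1 + b_2x_2$. A fitted `LogisticRegression` stores $b_0$ in `intercept_` and $b_1, b_2$ in `coef_`. The sketch below uses synthetic data from `make_classification` as a stand-in for the scaled features (`X_syn`, `y_syn`, and `clf` are hypothetical names) and recomputes the probabilities by hand:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic 2-feature data stands in for the scaled Age / EstimatedSalary columns
X_syn, y_syn = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0
)
clf = LogisticRegression().fit(X_syn, y_syn)

b0 = clf.intercept_[0]   # intercept b0
b = clf.coef_[0]         # one weight per feature: [b1, b2]

# Linear score y = b0 + b1*x1 + b2*x2, then the sigmoid gives p
score = b0 + X_syn @ b
p_manual = 1.0 / (1.0 + np.exp(-score))

print(np.allclose(score, clf.decision_function(X_syn)))       # True
print(np.allclose(p_manual, clf.predict_proba(X_syn)[:, 1]))  # True
```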

## Predicting a New Result

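```python
def predict(data):
    # Turn a single [Age, EstimatedSalary] sample into a 2-D array of shape (1, 2)
    data = np.array(data).reshape(1, -1)
    # Apply the scaling fitted on the training set
    data = scaler.transform(data)
    # Return the predicted class label (1 = purchased, 0 = not purchased)
    return classifier.predict(data)
```

```python
X_new = [2, 500000]  # [Age, EstimatedSalary]
print(predict(X_new))
```

```
[1]
```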
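The `predict` helper has to call `scaler.transform` itself. An alternative, not used in the original notebook, is a scikit-learn `Pipeline` that bundles the scaler and the classifier so the scaling step cannot be forgotten. In this sketch the training data is synthetic and rescaled to Age/salary-like ranges, and the sample `[30, 87000]` is hypothetical:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data, shifted and stretched to Age/salary-like ranges
X_syn, y_syn = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=1
)
X_syn = X_syn * [10, 30000] + [40, 70000]

# The pipeline fits the scaler and the classifier together...
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_syn, y_syn)

# ...and applies the fitted scaling automatically at prediction time
x_new = np.array([[30, 87000]])            # hypothetical [Age, EstimatedSalary]
label = model.predict(x_new)[0]
prob = model.predict_proba(x_new)[0, 1]    # probability of class 1
print(label, round(prob, 3))
```

`predict_proba` also exposes the probability behind the label; `predict` returns class 1 when that probability exceeds 0.5.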


```python
print(f"Accuracy Score {classifier.score(X_test_scaled, y_test) * 100}%")
```

```
Accuracy Score 89.0%
```
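For a classifier, `score` returns the mean accuracy: the fraction of test rows whose predicted label equals the true label, so 89.0% means 89 of the 100 test rows were classified correctly. A minimal check on hand-made labels (`y_true` and `y_pred` are hypothetical):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Small hand-made label vectors, for illustration only
y_true = np.array([0, 0, 1, 1, 1])
y_pred = np.array([0, 1, 1, 1, 0])

# Accuracy is the fraction of predictions that match the true labels
manual = np.mean(y_true == y_pred)
print(manual, accuracy_score(y_true, y_pred))  # 0.6 0.6
```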


## Predicting the Test Set Results

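```python
# Put the true and predicted test labels side by side
df_p = pd.DataFrame(y_test, columns=["Actual"])
y_p = classifier.predict(X_test_scaled)
df_p["Predicted"] = y_p
df_p
```

|   | Actual | Predicted |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 0 | 0 |
| 2 | 0 | 0 |
| 3 | 0 | 0 |
| 4 | 0 | 0 |
| ... | ... | ... |
| 95 | 1 | 0 |
| 96 | 0 | 0 |
| 97 | 1 | 0 |
| 98 | 1 | 1 |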
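To inspect the mistakes, the table can be filtered to rows where `Actual` and `Predicted` differ. This sketch applies the filter to the rows 95 to 98 shown above, copied by hand into a hypothetical `df_tail`; applied to the full `df_p`, the same expression would list every misclassified test row (11 of them, matching the off-diagonal entries of the confusion matrix).

```python
import pandas as pd

# Rows 95-98 of the Actual/Predicted table, copied by hand
df_tail = pd.DataFrame(
    {"Actual": [1, 0, 1, 1], "Predicted": [0, 0, 0, 1]},
    index=[95, 96, 97, 98],
)

# Keep only the rows where the prediction disagrees with the true label
wrong = df_tail[df_tail["Actual"] != df_tail["Predicted"]]
print(wrong.index.tolist())  # [95, 97]
```

Both mistakes in this excerpt are actual purchasers predicted as non-purchasers.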


## Confusion Matrix

```python
from sklearn.metrics import confusion_matrix

# Rows are the actual classes (0, 1); columns are the predicted classes (0, 1)
confusion_matrix(y_test, y_p)
```

```
array([[65,  3],
       [ 8, 24]], dtype=int64)
```
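The confusion matrix can be turned into the usual summary metrics. scikit-learn puts the actual classes on the rows and the predicted classes on the columns, so for labels 0 and 1 the layout is `[[TN, FP], [FN, TP]]`. The arithmetic below uses the matrix printed above:

```python
import numpy as np

# Confusion matrix from the output above, laid out as
# [[TN, FP],
#  [FN, TP]]
cm = np.array([[65, 3],
               [8, 24]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # (24 + 65) / 100
precision = tp / (tp + fp)        # 24 / 27: share of predicted purchasers who purchased
recall = tp / (tp + fn)           # 24 / 32: share of actual purchasers that were found

print(accuracy, round(precision, 3), recall)  # 0.89 0.889 0.75
```

The accuracy of 0.89 agrees with the 89.0% accuracy score reported earlier, while the recall of 0.75 shows that 8 of the 32 actual purchasers in the test set were missed.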