## **Machine learning model to predict whether a customer will buy a new car, using logistic regression**
### **Steps:**
  1. Importing the needed libraries
  2. Data preprocessing
  3. Training the logistic regression model
  4. Predicting a new result
  5. Predicting the test-set results
  6. Making the confusion matrix and computing accuracy

In [1]:
# 1. importing the needed libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
# 2. data preprocessing
# 2.1. loading the dataset
df = pd.read_csv('Social_Network_Ads.csv')
df.head()

|   | User ID | Gender | Age | EstimatedSalary | Purchased |
|---|---|---|---|---|---|
| 0 | 15624510 | Male | 19 | 19000 | 0 |
| 1 | 15810944 | Male | 35 | 20000 | 0 |
| 2 | 15668575 | Female | 26 | 43000 | 0 |
| 3 | 15603246 | Female | 27 | 57000 | 0 |
| 4 | 15804002 | Male | 19 | 76000 | 0 |


In [3]:
# 2.2. checking for missing values
df.isnull().sum()

User ID            0
Gender             0
Age                0
EstimatedSalary    0
Purchased          0
dtype: int64

In [4]:
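# 2.3. splitting the independent (X) and dependent (y) variables
# X: Age and EstimatedSalary (column 2 up to, but not including, the last column)
# y: Purchased (the last column)
X = df.iloc[:, 2:-1].values
y = df.iloc[:, -1].values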
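Positional slicing with `iloc[:, 2:-1]` works only because Age and EstimatedSalary sit between Gender and Purchased in the CSV. Selecting columns by name gives the same arrays but does not silently break if the column order changes. A minimal sketch, rebuilding the five rows shown by `df.head()` by hand (the `feature_cols` name is illustrative):

```python
import pandas as pd

# The five rows printed by df.head() above, rebuilt by hand
df = pd.DataFrame({
    "User ID": [15624510, 15810944, 15668575, 15603246, 15804002],
    "Gender": ["Male", "Male", "Female", "Female", "Male"],
    "Age": [19, 35, 26, 27, 19],
    "EstimatedSalary": [19000, 20000, 43000, 57000, 76000],
    "Purchased": [0, 0, 0, 0, 0],
})

# Positional selection, as in the notebook: column 2 up to (not including) the last
X_pos = df.iloc[:, 2:-1].values

# Name-based selection: same result, but robust to column reordering
feature_cols = ["Age", "EstimatedSalary"]
X_named = df[feature_cols].values
y = df["Purchased"].values

print(X_named.shape)  # (5, 2)
```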

In [5]:
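# 2.4. splitting the training & test sets
# hold out 20% of the rows as the test set; without random_state the split differs on every run
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)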
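Because the split above is random, `y_pred` and the accuracy printed further down will change from run to run. Passing `random_state` fixes the split, and `stratify=y` keeps the share of buyers the same in the training and test sets. A minimal sketch on a synthetic stand-in; the 400-row size, the features, and the 140/260 label counts are made up for illustration and are not taken from the dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 400 rows, 2 features, 140 positive and 260 negative labels (made up)
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = np.array([1] * 140 + [0] * 260)

# Fixed seed -> identical split on every run; stratify -> same class ratio in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

print(len(y_test), y_test.mean(), y_train.mean())
```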

In [6]:
# 2.5. feature scaling
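# standardize each feature to zero mean and unit variance, so that Age (tens) and
# EstimatedSalary (tens of thousands) are on comparable scales; this helps the solver
# converge and keeps the L2 penalty from weighting the two features unevenly
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
# fit the scaler on the training set only, then reuse its statistics on the test set
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)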
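`StandardScaler` maps each feature to z = (x - μ) / σ, where μ and σ are the training-set mean and population (ddof=0) standard deviation. `transform` reuses those training statistics on new data, which is why the test set is only transformed, never refitted. A minimal check using the Age and EstimatedSalary values from the `df.head()` rows as a toy training set; the extra point `X_new` is hypothetical:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Age and EstimatedSalary from the five df.head() rows, used here as a toy training set
X_train = np.array([[19, 19000], [35, 20000], [26, 43000], [27, 57000], [19, 76000]], dtype=float)
X_new = np.array([[30, 50000]], dtype=float)  # hypothetical unseen point

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)

# Manual z-score with the population standard deviation (ddof=0), as StandardScaler uses
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)
manual_train = (X_train - mu) / sigma
manual_new = (X_new - mu) / sigma  # new data reuses the *training* mu and sigma

print(np.allclose(X_train_scaled, manual_train), np.allclose(sc.transform(X_new), manual_new))
```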

In [7]:
# 3. training logistic regression model
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr.fit(X_train, y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)
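Logistic regression models the purchase probability as P(y = 1 | x) = σ(w·x + b), with σ(z) = 1 / (1 + e^(-z)). In scikit-learn, `coef_` holds w, `intercept_` holds b, and `decision_function` returns w·x + b, so applying the sigmoid to it reproduces `predict_proba`. A minimal sketch on synthetic, already-standardized data (the features and labels are made up):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic standardized features; label is 1 when a noisy linear score is positive (made up)
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = (X @ np.array([2.0, 1.0]) + rng.normal(scale=0.5, size=200) > 0).astype(int)

lr = LogisticRegression().fit(X, y)

# Linear score z = w.x + b, computed by hand and by scikit-learn
z_manual = X @ lr.coef_.ravel() + lr.intercept_[0]
z_sklearn = lr.decision_function(X)

# The sigmoid turns the score into P(y = 1 | x)
p_manual = 1.0 / (1.0 + np.exp(-z_manual))
p_sklearn = lr.predict_proba(X)[:, 1]

print(np.allclose(z_manual, z_sklearn), np.allclose(p_manual, p_sklearn))
```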

In [8]:
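# 4. predicting a new result: a 19-year-old customer with an estimated salary of 76000
# the raw input must be scaled with the same fitted scaler before prediction
lr.predict(sc.transform([[19, 76000]]))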

array([0])
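Predicting for a new customer only works if the raw `[age, salary]` pair is first passed through the same fitted `sc`, as the cell above does. Bundling the scaler and the model in a scikit-learn `Pipeline` applies that step automatically on every `fit` and `predict`. A minimal sketch on made-up data with the same two columns; the value ranges and labeling rule are assumptions, not the real dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up raw features: age 18-60, salary 15k-150k; buyers skew older and richer
rng = np.random.default_rng(1)
age = rng.integers(18, 61, size=300)
salary = rng.integers(15_000, 150_001, size=300)
X = np.column_stack([age, salary]).astype(float)
y = ((age - 40) / 10 + (salary - 80_000) / 30_000 + rng.normal(size=300) > 0).astype(int)

# Manual two-step approach, as in the notebook
sc = StandardScaler().fit(X)
lr = LogisticRegression().fit(sc.transform(X), y)
manual = lr.predict(sc.transform([[19, 76000]]))

# Pipeline: scaling is fitted and applied inside fit/predict
pipe = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
piped = pipe.predict([[19, 76000]])
prob = pipe.predict_proba([[19, 76000]])[0, 1]

print(manual, piped, prob)
```

Both routes fit the same scaler and the same deterministic solver on the same data, so they should give the same prediction; the pipeline simply removes the chance of forgetting `sc.transform` for new inputs.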

In [9]:
# 5. predicting test results
y_pred = lr.predict(X_test)
y_pred

array([1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1,
       0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0,
       0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0,
       0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0])

In [10]:
# comparing actual (y_test) and predicted (y_pred) labels side by side
pd.DataFrame({'y_test': y_test, 'y_pred': y_pred})

|   | y_test | y_pred |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 0 | 0 |
| 2 | 1 | 1 |
| 3 | 1 | 1 |
| 4 | 1 | 1 |
| ... | ... | ... |
| 75 | 0 | 0 |
| 76 | 1 | 1 |
| 77 | 0 | 0 |
| 78 | 1 | 1 |
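The side-by-side table shows only a few of the test rows, so misclassifications in the middle are hidden. Filtering the comparison frame on `y_test != y_pred` lists every disagreement. A minimal sketch with short hypothetical label arrays standing in for the notebook's `y_test` and `y_pred`:

```python
import numpy as np
import pandas as pd

# Hypothetical labels for illustration; in the notebook these are y_test and y_pred
y_test = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

results = pd.DataFrame({"y_test": y_test, "y_pred": y_pred})

# Rows where prediction and ground truth disagree
errors = results[results["y_test"] != results["y_pred"]]
print(errors.index.tolist())  # [2, 5]
print(f"{len(errors)} of {len(results)} misclassified")
```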


In [11]:
# 6. making the confusion matrix & computing accuracy
# confusion matrix rows are the actual classes (0, 1), columns are the predicted classes
from sklearn.metrics import confusion_matrix, accuracy_score
conMatrix = confusion_matrix(y_test, y_pred)
accScore = accuracy_score(y_test, y_pred)
print(conMatrix, accScore)

[[43  3]
 [ 8 26]] 0.8625
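With labels ordered [0, 1], scikit-learn's `confusion_matrix` puts actual classes in rows and predicted classes in columns, so the output above reads TN = 43, FP = 3, FN = 8, TP = 26. The reported accuracy follows directly from these counts, and the same counts give precision and recall for the "will buy" class. A short worked calculation using the printed matrix:

```python
import numpy as np

# Confusion matrix printed above: rows = actual (0, 1), columns = predicted (0, 1)
con_matrix = np.array([[43, 3],
                       [8, 26]])
tn, fp, fn, tp = con_matrix.ravel()

accuracy = (tp + tn) / con_matrix.sum()   # 69 / 80
precision = tp / (tp + fp)                # 26 / 29: share of predicted buyers who actually bought
recall = tp / (tp + fn)                   # 26 / 34: share of actual buyers the model found

print(accuracy, round(precision, 4), round(recall, 4))  # 0.8625 0.8966 0.7647
```

Recall (about 0.76) is noticeably lower than precision (about 0.90): in this run the model misses more actual buyers (8) than it wrongly flags non-buyers (3).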
