# Logistic Regression

## Data Collection

The dataset used in this notebook is "Social Network Ads", which contains data on purchases of a particular product. The dataset can be obtained from https://github.com/TarunNanduri/Artificial-Intelligence/tree/master/LogisticRegression/

In [10]:
import pandas as pd

In [11]:
file = './Social_Network_Ads.csv'
data = pd.read_csv(file)
data.head()

Unnamed: 0,User ID,Gender,Age,EstimatedSalary,Purchased
0,15624510,Male,19,19000,0
1,15810944,Male,35,20000,0
2,15668575,Female,26,43000,0
3,15603246,Female,27,57000,0
4,15804002,Male,19,76000,0


## Data pre-processing

The data describes purchases of a particular product, so the factors most likely to affect the outcome are the customer's age and estimated salary.

In [12]:
# Columns 2 and 3 are Age and EstimatedSalary; column 4 is Purchased.
features = data.iloc[:, 2:4]
labels = data.iloc[:, 4]
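
Selecting columns by position works for this file, but selecting by name is less fragile if the column order ever changes. A minimal sketch, using a hand-built DataFrame with the five rows shown earlier in place of the CSV file (the names `sample`, `features_by_name`, and `labels_by_name` are illustrative, not part of the notebook):

```python
import pandas as pd

# The five rows displayed by data.head(), rebuilt by hand so the sketch
# runs without the CSV file.
sample = pd.DataFrame({
    "User ID": [15624510, 15810944, 15668575, 15603246, 15804002],
    "Gender": ["Male", "Male", "Female", "Female", "Male"],
    "Age": [19, 35, 26, 27, 19],
    "EstimatedSalary": [19000, 20000, 43000, 57000, 76000],
    "Purchased": [0, 0, 0, 0, 0],
})

# Select by column name instead of by position.
features_by_name = sample[["Age", "EstimatedSalary"]]
labels_by_name = sample["Purchased"]

# Compare against the positional selection used in the notebook.
same_features = features_by_name.equals(sample.iloc[:, 2:4])
same_labels = labels_by_name.equals(sample.iloc[:, 4])
print(same_features, same_labels)
```

For this column layout both approaches pick the same data; the name-based form simply documents the intent more directly.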

In [13]:
features.head()

Unnamed: 0,Age,EstimatedSalary
0,19,19000
1,35,20000
2,26,43000
3,27,57000
4,19,76000


A closer look at the features shows that they are on very different scales, so let's bring them to a common scale.

In [14]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
scaledFeatures = sc.fit_transform(features)
pd.DataFrame(scaledFeatures,columns = features.columns).head(10)

Unnamed: 0,Age,EstimatedSalary
0,-1.781797,-1.490046
1,-0.253587,-1.460681
2,-1.113206,-0.78529
3,-1.017692,-0.374182
4,-1.781797,0.183751
5,-1.017692,-0.344817
6,-1.017692,0.418669
7,-0.540127,2.35675
8,-1.208719,-1.078938
9,-0.253587,-0.139263


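**StandardScaler** standardizes each feature to zero mean and unit variance. The scaled values are not confined to a fixed range, although most of them fall roughly within (-3, 3) when the data is approximately normally distributed.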
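
To make the transformation concrete, the sketch below checks that `StandardScaler` computes `(x - mean) / std` for each column. The array `toy` is a small stand-in built from the five rows shown earlier, not the full dataset, and the variable names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy values standing in for the Age and EstimatedSalary columns.
toy = np.array(
    [[19, 19000], [35, 20000], [26, 43000], [27, 57000], [19, 76000]],
    dtype=float,
)

scaled = StandardScaler().fit_transform(toy)

# Manual standardization: subtract the column mean and divide by the
# column standard deviation (population form, ddof=0).
manual = (toy - toy.mean(axis=0)) / toy.std(axis=0)

print(np.allclose(scaled, manual))
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```

After scaling, each column has mean 0 and standard deviation 1, so Age and EstimatedSalary contribute on comparable scales.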

Now the data is ready to be fed to the algorithm. Before we do that, let's split it into training and test sets.

In [15]:
from sklearn.model_selection import train_test_split
# By default, 25% of the rows are held out for testing.
Xtrain, Xtest, Ytrain, Ytest = train_test_split(scaledFeatures, labels)
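
The split above is random, so the numbers reported later will vary from run to run. The sketch below shows one possible variant, not the notebook's method: it fixes `random_state` for reproducibility, uses `stratify` to keep the class ratio equal in both parts, and fits the scaler on the training part only so that no test-set statistics leak into preprocessing. The data is synthetic (an assumption for illustration), with the same row count as the split above and an arbitrary positive rate.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: 400 rows, 2 features,
# with an arbitrary share of positive labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (rng.random(400) < 0.36).astype(int)

# random_state fixes the shuffle; stratify keeps the class ratio
# (nearly) identical in the training and test parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the scaler on the training part only, then reuse it on the test part.
scaler = StandardScaler().fit(X_tr)
X_tr_scaled = scaler.transform(X_tr)
X_te_scaled = scaler.transform(X_te)

print(X_tr.shape, X_te.shape)
```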

## Training the model

In [16]:
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
classifier.fit(Xtrain,Ytrain)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)
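
To show what the fitted classifier computes, the sketch below reproduces its predicted probabilities by hand: a linear score `w . x + b` passed through the sigmoid function. It uses synthetic two-feature data (an assumption for illustration) and the default binary `LogisticRegression` settings; the names `model`, `p_manual`, and `p_sklearn` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic two-feature data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Linear score z = w . x + b, then the sigmoid maps it to P(class 1).
z = X @ model.coef_.ravel() + model.intercept_[0]
p_manual = 1.0 / (1.0 + np.exp(-z))

# predict_proba[:, 1] is the library's probability for class 1.
p_sklearn = model.predict_proba(X)[:, 1]
print(np.allclose(p_manual, p_sklearn))

# predict() corresponds to thresholding that probability at 0.5.
print(np.array_equal((p_manual > 0.5).astype(int), model.predict(X)))
```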

## Validating the model

In [17]:
YPredict = classifier.predict(Xtest)

In [18]:
from sklearn.metrics import confusion_matrix,classification_report
cm = confusion_matrix(Ytest, YPredict)
print(cm)
print(classification_report(Ytest,YPredict))

[[63  2]
 [14 21]]
              precision    recall  f1-score   support

           0       0.82      0.97      0.89        65
           1       0.91      0.60      0.72        35

    accuracy                           0.84       100
   macro avg       0.87      0.78      0.81       100
weighted avg       0.85      0.84      0.83       100
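
The figures in the report can be derived directly from the confusion matrix. The sketch below recomputes accuracy and the class 1 precision, recall, and F1 score from the four counts printed above; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix printed above: rows are true classes,
# columns are predicted classes.
cm = np.array([[63, 2],
               [14, 21]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()        # correct predictions / all predictions
precision_1 = tp / (tp + fp)           # predicted buyers that really bought
recall_1 = tp / (tp + fn)              # actual buyers the model found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(round(accuracy, 2), round(precision_1, 2), round(recall_1, 2), round(f1_1, 2))
```

The recall for class 1 is noticeably lower than its precision: 14 of the 35 actual buyers in the test set were predicted as non-buyers.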



### Our model achieves an overall accuracy of 84% on the test set