### In this project we will work with a fake advertising data set that indicates whether or not a particular internet user clicked on an advertisement on a company website. We will try to build a model that predicts whether a user will click on an ad based on that user's features.

In [14]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

This data set contains the following features:

'Daily Time Spent on Site': consumer time on site in minutes  
    
'Age': customer age in years  
    
'Area Income': Avg. Income of geographical area of consumer  
    
'Daily Internet Usage': Avg. minutes a day consumer is on the internet  
    
'Ad Topic Line': Headline of the advertisement  
    
'City': City of consumer  
    
'Male': Whether or not consumer was male  
    
'Country': Country of consumer  
    
'Timestamp': Time at which consumer clicked on Ad or closed window  
    
'Clicked on Ad': 0 or 1, indicating whether the consumer clicked on the ad

In [5]:
ad_data = pd.read_csv('advertising.csv')

In [7]:
ad_data.head(7)

Unnamed: 0,Daily Time Spent on Site,Age,Area Income,Daily Internet Usage,Ad Topic Line,City,Male,Country,Timestamp,Clicked on Ad
0,68.95,35,61833.9,256.09,Cloned 5thgeneration orchestration,Wrightburgh,0,Tunisia,2016-03-27 00:53:11,0
1,80.23,31,68441.85,193.77,Monitored national standardization,West Jodi,1,Nauru,2016-04-04 01:39:02,0
2,69.47,26,59785.94,236.5,Organic bottom-line service-desk,Davidton,0,San Marino,2016-03-13 20:35:42,0
3,74.15,29,54806.18,245.89,Triple-buffered reciprocal time-frame,West Terrifurt,1,Italy,2016-01-10 02:31:19,0
4,68.37,35,73889.99,225.58,Robust logistical utilization,South Manuel,0,Iceland,2016-06-03 03:36:18,0
5,59.99,23,59761.56,226.74,Sharable client-driven software,Jamieberg,1,Norway,2016-05-19 14:30:17,0
6,88.91,33,53852.85,208.36,Enhanced dedicated support,Brandonstad,0,Myanmar,2016-01-28 20:59:32,0


In [11]:
ad_data.dropna(inplace=True)

In [12]:
ad_data.describe()

Unnamed: 0,Daily Time Spent on Site,Age,Area Income,Daily Internet Usage,Male,Clicked on Ad
count,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,65.0002,36.009,55000.00008,180.0001,0.481,0.5
std,15.853615,8.785562,13414.634022,43.902339,0.499889,0.50025
min,32.6,19.0,13996.5,104.78,0.0,0.0
25%,51.36,29.0,47031.8025,138.83,0.0,0.0
50%,68.215,35.0,57012.3,183.13,0.0,0.5
75%,78.5475,42.0,65470.635,218.7925,1.0,1.0
max,91.43,61.0,79484.8,269.96,1.0,1.0


## Logistic Regression
Now it's time to split the data into training and test sets and train our model!

In [16]:
from sklearn.model_selection import train_test_split

In [17]:
X = ad_data[['Daily Time Spent on Site', 'Age', 'Area Income','Daily Internet Usage', 'Male']]
y = ad_data['Clicked on Ad']

In [18]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=27)
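
To make the split above concrete, here is a minimal sketch on a synthetic stand-in for `ad_data` (the `demo` frame and its values are made up for illustration, not taken from `advertising.csv`). It shows that `test_size=0.3` on 1000 rows holds out 300 rows, that a fixed `random_state` reproduces the same split, and how the optional `stratify` argument, which this notebook does not use, would keep the class balance identical in both parts.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for ad_data (hypothetical values, not the real CSV).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Age": rng.integers(19, 62, size=1000),
    "Male": rng.integers(0, 2, size=1000),
    "Clicked on Ad": np.repeat([0, 1], 500),  # balanced target, 500 per class
})
X_demo = demo[["Age", "Male"]]
y_demo = demo["Clicked on Ad"]

# Same arguments as the notebook cell: 30% held out, fixed seed.
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=27
)
print(Xd_train.shape, Xd_test.shape)

# Re-running with the same random_state selects the same rows.
_, Xd_test_again, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=27
)
same_rows = Xd_test.index.equals(Xd_test_again.index)
print(same_rows)

# Optional: stratify=y keeps the 50/50 class ratio in the test set.
_, _, _, yd_test_strat = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=27, stratify=y_demo
)
strat_counts = yd_test_strat.value_counts().to_dict()
print(strat_counts)
```

Without `stratify`, the class counts in the test set can drift from the overall ratio by chance.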

In [19]:
from sklearn.linear_model import LogisticRegression

In [20]:
model = LogisticRegression()
model.fit(X_train,y_train)
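
The features used here sit on very different scales ('Age' in tens, 'Area Income' in tens of thousands), which can slow down or destabilize the solver behind `LogisticRegression`. One common option, not used in this notebook, is to standardize the features inside a pipeline. The sketch below assumes synthetic data (`X_syn`, `y_syn` are invented for illustration) and is only meant to show the shape of that approach.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data on scales similar to the notebook's columns (hypothetical).
rng = np.random.default_rng(0)
n_rows = 1000
time_on_site = rng.normal(65, 15, n_rows)
area_income = rng.normal(55000, 13000, n_rows)
X_syn = np.column_stack([time_on_site, area_income])
# Toy rule for the label: users with below-average time on site "click".
y_syn = (time_on_site < 65).astype(int)

Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=27
)

# The scaler is fit on the training data only, then applied to the test data.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression())
scaled_model.fit(Xs_train, ys_train)

scaled_accuracy = scaled_model.score(Xs_test, ys_test)
print(f"test accuracy: {scaled_accuracy:.2f}")
```

Wrapping the scaler and the classifier in one pipeline keeps the test set out of the scaling statistics, so the evaluation stays honest.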

## Evaluation and Predictions

In [21]:
predictions = model.predict(X_test)

In [22]:
from sklearn.metrics import classification_report

In [23]:
print(classification_report(y_test,predictions))

              precision    recall  f1-score   support

           0       0.92      0.88      0.90       162
           1       0.86      0.91      0.88       138

    accuracy                           0.89       300
   macro avg       0.89      0.89      0.89       300
weighted avg       0.89      0.89      0.89       300
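
To see where the precision and recall columns in a report like this come from, the following sketch computes them by hand from a confusion matrix. The labels in `y_true_toy` and `y_pred_toy` are a tiny made-up example, not the notebook's actual predictions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels for illustration only (not the notebook's y_test / predictions).
y_true_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_pred_toy = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 0])

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true_toy, y_pred_toy)
tn, fp, fn, tp = cm.ravel()

# Precision for class 1: of everything predicted as a click, how much was a click?
precision_1 = tp / (tp + fp)
# Recall for class 1: of all real clicks, how many did the model find?
recall_1 = tp / (tp + fn)
# F1 is the harmonic mean of precision and recall.
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(cm)
print(precision_1, recall_1, f1_1)
```

The 'support' column in the report is simply the number of true samples per class, so it sums to the size of the test set.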

