# **Machine Learning with Logistic Regression**
This is the second project in the ML micro-project series. We will work with a fake advertising data set that records whether or not a particular internet user clicked on an advertisement.

We will create a logistic regression model that predicts whether or not a user will click on an ad, based on that user's features. Because this is a binary classification problem (click or no click), logistic regression is well suited here.
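To make the idea concrete, here is a minimal sketch of how a logistic regression model turns a user's features into a click prediction: a weighted sum of the features is passed through the sigmoid function, and the resulting probability is compared with a threshold. The helper names, the two features, and the weights are all made up for illustration; a fitted model would learn its weights from the data.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_click(features, weights, bias, threshold=0.5):
    """Return (probability, 0/1 label) for a single user.

    Hypothetical helper: the score is a linear combination of the
    features, and the label is 1 when the probability reaches the threshold.
    """
    score = np.dot(features, weights) + bias
    prob = sigmoid(score)
    return prob, int(prob >= threshold)

# Assumed weights for two made-up features, purely for illustration.
weights = np.array([-0.5, 0.25])
bias = 0.0
prob, label = predict_click(np.array([2.0, 4.0]), weights, bias)
```

A score of zero corresponds to a probability of exactly 0.5, which is why 0.5 is the usual default threshold.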

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# **Data**
The data set contains the following features:


*   'Daily Time Spent on Site': consumer time on site in minutes
*   'Age': customer age in years
*   'Area Income': average income of the consumer's geographical area
*   'Daily Internet Usage': average minutes per day the consumer is on the internet
*   'Ad Topic Line': Headline of the advertisement
*   'City': City of consumer
*   'Male': Whether or not consumer was male
*   'Country': Country of consumer
*   'Timestamp': Time at which consumer clicked on Ad or closed window
*   'Clicked on Ad': 0 or 1, indicating whether the consumer clicked on the ad
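Two checks are often useful for a data set with this layout: how balanced the 0/1 target is, and whether the timestamp can be parsed into a proper datetime. The sketch below uses a tiny made-up frame that only mimics a few of the columns listed above; the rows and the timestamp format are assumptions, not values from the real file.

```python
import pandas as pd

# Made-up rows that only mimic the column layout described above;
# they are not taken from advertising.csv.
toy = pd.DataFrame({
    'Age': [25, 41, 33, 52],
    'Timestamp': ['2016-01-01 08:30:00', '2016-01-02 21:15:00',
                  '2016-01-03 13:05:00', '2016-01-04 02:45:00'],
    'Clicked on Ad': [0, 1, 0, 1],
})

# Share of each class in the target column.
class_share = toy['Clicked on Ad'].value_counts(normalize=True)

# Parse the text timestamps so that parts such as the hour can be extracted.
toy['Timestamp'] = pd.to_datetime(toy['Timestamp'])
toy['Hour'] = toy['Timestamp'].dt.hour
```

The same two steps can be applied to `ad_data` once it has been loaded.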

In [None]:
ad_data = pd.read_csv('data/advertising.csv')

In [None]:
ad_data.head()

In [None]:
ad_data.info()

In [None]:
ad_data.describe()

# **Exploratory Analysis**
Checking out the distribution of user age.

In [None]:
plt.hist(ad_data['Age'], bins=30)
plt.xlabel('Age')
plt.show()

Checking out the relationship between age and daily time spent on site.

In [None]:
# Recent seaborn releases require x, y and data to be passed as keywords
sns.jointplot(x='Age', y='Daily Time Spent on Site', data=ad_data)

In [None]:
# Recent seaborn releases require x, y and data to be passed as keywords
sns.jointplot(x='Daily Time Spent on Site', y='Daily Internet Usage', data=ad_data)

Finally, a pairplot to visualise everything else, colored by whether or not the user clicked on the ad.

In [None]:
sns.pairplot(ad_data,hue='Clicked on Ad')
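A numeric summary can complement the pairplot. The sketch below averages each numeric column separately for users who did and did not click. `mean_by_click` is a hypothetical helper, and the rows are made up for illustration only.

```python
import pandas as pd

def mean_by_click(df, target='Clicked on Ad'):
    """Average of every numeric column, split by the 0/1 target."""
    return df.groupby(target).mean(numeric_only=True)

# Made-up rows for illustration only; not taken from advertising.csv.
toy = pd.DataFrame({
    'Age': [25, 45, 35, 55],
    'Daily Internet Usage': [240.0, 150.0, 220.0, 130.0],
    'Clicked on Ad': [0, 1, 0, 1],
})
summary = mean_by_click(toy)
```

In the notebook, the equivalent call would be `mean_by_click(ad_data)`.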