**Introduction**

For a marketing campaign, companies want to know how much of its success can be attributed to the ads. To measure this, an A/B test is run: the experimental group (the majority of people) is exposed to ads, while a small control group instead sees a Public Service Announcement (PSA), or nothing, in the exact size and place the ad would normally occupy.

We analyze the two groups to determine whether the ads were successful, how much the company can make from the ads, and whether the difference between the groups is statistically significant.
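
One common way to frame the significance question is a two-proportion z-test on the conversion rates. The sketch below is only an illustration of that idea, not necessarily the test used later in this analysis; the function name `two_proportion_ztest` and the counts in the usage lines are hypothetical placeholders, not figures from the dataset.

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_* are conversion counts, n_* are group sizes (hypothetical names).
    Assumes both groups are large enough for the normal approximation
    and that group b has at least one conversion (needed for the lift).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal: P(|Z| > |z|)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Relative lift of group a over group b
    lift = (p_a - p_b) / p_b
    return p_a, p_b, lift, z, p_value

# Made-up counts for illustration only (not taken from marketing_AB.csv)
rate_ad, rate_psa, lift, z, p_value = two_proportion_ztest(250, 10_000, 18, 1_000)
print(f"ad rate={rate_ad:.4f}, psa rate={rate_psa:.4f}, lift={lift:.1%}")
print(f"z={z:.3f}, p-value={p_value:.3f}")
```

With real data, the four inputs would come from counting rows and conversions per test group. A chi-square test on the 2x2 contingency table is an equivalent alternative for the two-sided case.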

**Data Dictionary**

- **index**: Row Index  
- **User ID**: User ID (unique)  
- **Test group**: If 'ad', the person saw the advertisement; if 'PSA', they only saw the public service announcement.  
- **converted**: If a person bought the product, then True; otherwise, False  
- **total ads**: Number of ads seen by a person  
- **most ads day**: Day the person saw the most ads  
- **most ads hour**: Hour of the day the person saw the most ads  


**Importing the necessary libraries**

In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

**Exploratory Data Analysis**

In [None]:
#Reading the data

df = pd.read_csv('marketing_AB.csv')

#Checking the first five observations

df.head(5)

| | Unnamed: 0 | user id | test group | converted | total ads | most ads day | most ads hour |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 1069124 | ad | False | 130 | Monday | 20 |
| 1 | 1 | 1119715 | ad | False | 93 | Tuesday | 22 |
| 2 | 2 | 1144181 | ad | False | 21 | Tuesday | 18 |
| 3 | 3 | 1435133 | ad | False | 355 | Tuesday | 10 |
| 4 | 4 | 1015700 | ad | False | 276 | Friday | 14 |
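
As a quick sanity check, the loaded columns can be compared against the data dictionary. The sketch below rebuilds the five rows shown above from an in-memory string so it runs without `marketing_AB.csv`; the names `sample`, `expected_columns`, and `missing` are assumptions introduced for illustration.

```python
import io
import pandas as pd

# The five rows printed by df.head(5); the empty first header field is
# what pandas turns into the 'Unnamed: 0' column.
sample_csv = """,user id,test group,converted,total ads,most ads day,most ads hour
0,1069124,ad,False,130,Monday,20
1,1119715,ad,False,93,Tuesday,22
2,1144181,ad,False,21,Tuesday,18
3,1435133,ad,False,355,Tuesday,10
4,1015700,ad,False,276,Friday,14
"""
sample = pd.read_csv(io.StringIO(sample_csv))

# Column names as they appear in the file (lower-case, space-separated)
expected_columns = ['user id', 'test group', 'converted',
                    'total ads', 'most ads day', 'most ads hour']

# Any dictionary column that did not make it into the frame
missing = [col for col in expected_columns if col not in sample.columns]
print("Missing columns:", missing)
print(sample.dtypes)
```

Running the same comparison on the full `df` would flag a renamed or absent column before any analysis depends on it.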


In [11]:
#Checking for duplicate user id as it's the unique identifier

df.duplicated(subset = 'user id').sum()

0

There are no duplicate user ids, so every row corresponds to a distinct user.
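
To make clear what this check counts, here is a small sketch on a made-up frame (the ids below are invented, not taken from the dataset). By default, `duplicated(subset=...)` flags every repeat after the first occurrence, so the sum is the number of surplus rows rather than the number of rows involved in a duplication.

```python
import pandas as pd

# Toy data for illustration only: id 102 appears three times
toy = pd.DataFrame({
    'user id': [101, 102, 102, 103, 102],
    'converted': [False, True, True, False, True],
})

# Default keep='first': only the 2nd and 3rd occurrences are flagged
n_surplus = int(toy.duplicated(subset='user id').sum())

# keep=False flags every row whose id occurs more than once
n_involved = int(toy.duplicated(subset='user id', keep=False).sum())

# Equivalent yes/no check on the column itself
ids_unique = toy['user id'].is_unique

print(n_surplus, n_involved, ids_unique)
```

A result of 0 from the default call, as in the cell above, therefore means the column is fully unique.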

In [13]:
#Dropping irrelevant columns: 'Unnamed: 0' repeats the row index, and 'user id' alone is not a predictor of anything

df.drop(['Unnamed: 0', 'user id'], axis = 1, inplace = True)