## A/B Testing for ShoeFly.com

Our favorite online shoe store, ShoeFly.com, is performing an A/B test. They have two different versions of an ad, which they have placed in emails as well as in banner ads on Facebook, Twitter, and Google. They want to know how the two ads are performing on each platform and on each day of the week.
 
In this project, we will help them analyze the data using aggregate measures and pivot tables.
The data for this project is available at https://www.kaggle.com/davidshahshankhar/ad-clicks?select=ad_clicks.csv

In [1]:
# Let's start by importing pandas
import pandas as pd

In [21]:
# loading and reading the dataset.
ad_clicks = pd.read_csv(r"C:\Users\amanp\Downloads\ad_clicks.csv")
ad_clicks.head()

Unnamed: 0,user_id,utm_source,day,ad_click_timestamp,experimental_group
0,008b7c6c-7272-471e-b90e-930d548bd8d7,google,6 - Saturday,7:18,A
1,009abb94-5e14-4b6c-bb1c-4f4df7aa7557,facebook,7 - Sunday,,B
2,00f5d532-ed58-4570-b6d2-768df5f41aed,twitter,2 - Tuesday,,A
3,011adc64-0f44-4fd9-a0bb-f1506d2ad439,google,2 - Tuesday,,B
4,012137e6-7ae7-4649-af68-205b4702169c,facebook,7 - Sunday,,B


The manager wants to know which ad platform is getting the most views.
Let's check how many views (i.e., rows of the table) came from each utm_source.

In [22]:
ad_clicks.groupby('utm_source').user_id.count().reset_index()

Unnamed: 0,utm_source,user_id
0,email,255
1,facebook,504
2,google,680
3,twitter,215


So the most views came from google, followed by facebook. Twitter and email had far fewer views.
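
As a side note, the same per-source count can be obtained with `value_counts()`, which also sorts the result from most to fewest views. The sketch below runs on a small hypothetical frame (`sample` and its six rows are made up for illustration and are not part of the dataset), so it works without the CSV file.

```python
import pandas as pd

# Hypothetical miniature of ad_clicks, used only for illustration.
sample = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4", "u5", "u6"],
    "utm_source": ["google", "facebook", "google", "email", "google", "facebook"],
})

# Each row is one ad view, so counting rows per source counts views.
views_by_source = (
    sample["utm_source"]
    .value_counts()               # sorted from most to fewest views
    .rename_axis("utm_source")    # name the index so reset_index gives a clear column
    .reset_index(name="views")
)
print(views_by_source)
```

On the real data, replacing `sample` with `ad_clicks` should give the same counts as the `groupby` call above, only ordered by size.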

In [23]:
# If ad_click_timestamp is not null, the user actually clicked on the ad that was displayed.
# Let's create a new column called is_click, which is True if ad_click_timestamp is not null and False otherwise.

ad_clicks['is_click'] = ~ad_clicks.ad_click_timestamp.isnull()

# ~ is the element-wise NOT operator, and isnull() tests whether each value of ad_click_timestamp is null.

ad_clicks.head()

Unnamed: 0,user_id,utm_source,day,ad_click_timestamp,experimental_group,is_click
0,008b7c6c-7272-471e-b90e-930d548bd8d7,google,6 - Saturday,7:18,A,True
1,009abb94-5e14-4b6c-bb1c-4f4df7aa7557,facebook,7 - Sunday,,B,False
2,00f5d532-ed58-4570-b6d2-768df5f41aed,twitter,2 - Tuesday,,A,False
3,011adc64-0f44-4fd9-a0bb-f1506d2ad439,google,2 - Tuesday,,B,False
4,012137e6-7ae7-4649-af68-205b4702169c,facebook,7 - Sunday,,B,False
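
The negated `isnull()` can also be written as `notna()`, which some readers find easier to scan. This is a minimal sketch on a hypothetical three-row frame (`sample` is made up for illustration); it checks that both spellings give the same column.

```python
import pandas as pd

# Hypothetical rows: two users with a click timestamp, one without.
sample = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],
    "ad_click_timestamp": ["7:18", None, "15:21"],
})

# notna() is True wherever a timestamp is present.
sample["is_click"] = sample["ad_click_timestamp"].notna()

# Same result as negating isnull(), as done in the cell above.
same_as_negated = sample["is_click"].equals(~sample["ad_click_timestamp"].isnull())
print(sample)
print(same_as_negated)
```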


Let's check the share of people who clicked on ads from each utm_source.

In [24]:
# 1. Let's start by grouping by utm_source and is_click and counting the number of user_ids in each of those groups.

clicks_by_source = ad_clicks.groupby(['utm_source', 'is_click']).user_id.count().reset_index()
clicks_by_source

Unnamed: 0,utm_source,is_click,user_id
0,email,False,175
1,email,True,80
2,facebook,False,324
3,facebook,True,180
4,google,False,441
5,google,True,239
6,twitter,False,149
7,twitter,True,66


In [25]:
# 2. Now let's pivot the data so that the columns are is_click (either True or False), the index is utm_source,
# and the values are the user_id counts.

clicks_pivot = clicks_by_source.pivot(
    index='utm_source',
    columns='is_click',
    values='user_id').reset_index()
clicks_pivot

is_click,utm_source,False,True
0,email,175,80
1,facebook,324,180
2,google,441,239
3,twitter,149,66
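
The group-then-pivot steps above can also be collapsed into a single `pivot_table` call. The sketch below is one possible way to do that, shown on a small hypothetical frame (`sample` is made up for illustration) rather than on the real data.

```python
import pandas as pd

# Hypothetical miniature of ad_clicks with the is_click column already added.
sample = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4", "u5", "u6"],
    "utm_source": ["email", "email", "google", "google", "google", "email"],
    "is_click": [True, False, False, True, True, False],
})

# pivot_table groups, counts, and reshapes in one step.
# fill_value=0 covers sources that have no rows for one of the two outcomes.
pivot = sample.pivot_table(
    index="utm_source",
    columns="is_click",
    values="user_id",
    aggfunc="count",
    fill_value=0,
).reset_index()
print(pivot)
```

Unlike `pivot`, `pivot_table` aggregates, so it does not need the intermediate `groupby` result.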


In [26]:
# 3. Finally, let's create a new column in clicks_pivot called percent_clicked, which holds the share of users who clicked on
# the ad from each utm_source (a fraction between 0 and 1; multiply by 100 for a percentage).

clicks_pivot['percent_clicked'] = clicks_pivot[True] / (clicks_pivot[True] + clicks_pivot[False])

# clicks_pivot[True] is the number of people who clicked (because is_click was True for those users)
# clicks_pivot[False] is the number of people who did not click (because is_click was False for those users)
# So, the share of people who clicked is (Total Who Clicked) / (Total Who Clicked + Total Who Did Not Click)


clicks_pivot


is_click,utm_source,False,True,percent_clicked
0,email,175,80,0.313725
1,facebook,324,180,0.357143
2,google,441,239,0.351471
3,twitter,149,66,0.306977


So facebook and google had the highest click rates (about 35.7% and 35.1%), while email and twitter were lower (about 31.4% and 30.7%).
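
As a cross-check, the same rates fall out of a single `groupby(...).mean()`, because the mean of a boolean column is the fraction of `True` values. The sketch below rebuilds one row per view from the counts in the `clicks_by_source` output above so that it runs without the CSV; the names `counts`, `views`, and `click_rate` are introduced here for illustration only.

```python
import pandas as pd

# (not clicked, clicked) counts copied from the clicks_by_source output above.
counts = {
    "email": (175, 80),
    "facebook": (324, 180),
    "google": (441, 239),
    "twitter": (149, 66),
}

# Rebuild one row per view from those counts.
rows = [
    {"utm_source": source, "is_click": clicked}
    for source, (n_false, n_true) in counts.items()
    for clicked in [False] * n_false + [True] * n_true
]
views = pd.DataFrame(rows)

# The mean of a boolean column is the share of True values, i.e. the click rate.
click_rate = views.groupby("utm_source")["is_click"].mean()
print(click_rate.round(6))
```

With the real data, `ad_clicks.groupby('utm_source').is_click.mean()` should reproduce the `percent_clicked` column.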

### Analysing the A/B Test

In [27]:
# The column experimental_group tells us whether the user was shown Ad A or Ad B.
# Let's check whether the same number of people were shown each ad.

a_or_b = ad_clicks.groupby('experimental_group').user_id.count().reset_index()
a_or_b

Unnamed: 0,experimental_group,user_id
0,A,827
1,B,827


This shows that exactly the same number of people (827) were shown each ad.

In [28]:
# Using the column is_click that we defined earlier, let's check to see if a greater percentage of users clicked on Ad A or Ad B.

ab_click = ad_clicks.groupby(['experimental_group','is_click']).user_id.count().reset_index()
ab_click

Unnamed: 0,experimental_group,is_click,user_id
0,A,False,517
1,A,True,310
2,B,False,572
3,B,True,255


So more people clicked on Ad A: 310 of 827 (about 37.5%), compared with 255 of 827 (about 30.8%) for Ad B.
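
To turn those counts into click rates, the pivot approach used for `utm_source` can be applied again. This is a small sketch that re-enters the counts from the `ab_click` output above by hand, so it runs on its own; `ab_pivot` and `click_rate` are names chosen here for illustration.

```python
import pandas as pd

# Counts copied from the ab_click output above.
ab_click = pd.DataFrame({
    "experimental_group": ["A", "A", "B", "B"],
    "is_click": [False, True, False, True],
    "user_id": [517, 310, 572, 255],
})

# One row per group, one column per outcome.
ab_pivot = ab_click.pivot(
    index="experimental_group",
    columns="is_click",
    values="user_id",
).reset_index()

# Share of users in each group who clicked.
ab_pivot["click_rate"] = ab_pivot[True] / (ab_pivot[True] + ab_pivot[False])
print(ab_pivot)
```

Because both groups have the same size, comparing raw click counts and comparing click rates lead to the same ranking here.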

In [30]:
# The Product Manager for the A/B test thinks that the clicks might have changed by day of the week.
# Let's start by creating two DataFrames, a_clicks and b_clicks, which contain only the results for group A and group B,
# respectively. Then, for each group, we will calculate the share of users who clicked on the ad by day.

a_clicks = ad_clicks[ad_clicks.experimental_group == 'A']
b_clicks = ad_clicks[ad_clicks.experimental_group == 'B']
a_clicks.head()


Unnamed: 0,user_id,utm_source,day,ad_click_timestamp,experimental_group,is_click
0,008b7c6c-7272-471e-b90e-930d548bd8d7,google,6 - Saturday,7:18,A,True
2,00f5d532-ed58-4570-b6d2-768df5f41aed,twitter,2 - Tuesday,,A,False
5,013b0072-7b72-40e7-b698-98b4d0c9967f,facebook,1 - Monday,,A,False
6,0153d85b-7660-4c39-92eb-1e1acd023280,google,4 - Thursday,,A,False
7,01555297-d6e6-49ae-aeba-1b196fdbb09f,google,3 - Wednesday,,A,False


In [31]:

b_clicks.head()

Unnamed: 0,user_id,utm_source,day,ad_click_timestamp,experimental_group,is_click
1,009abb94-5e14-4b6c-bb1c-4f4df7aa7557,facebook,7 - Sunday,,B,False
3,011adc64-0f44-4fd9-a0bb-f1506d2ad439,google,2 - Tuesday,,B,False
4,012137e6-7ae7-4649-af68-205b4702169c,facebook,7 - Sunday,,B,False
9,01a210c3-fde0-4e6f-8efd-4f0e38730ae6,email,2 - Tuesday,15:21,B,True
10,01adb2e7-f711-4ae4-a7c6-29f48457eea1,google,3 - Wednesday,,B,False


In [32]:
# Count clicks and non-clicks per day within each group.
aa = a_clicks.groupby(['day', 'is_click']).user_id.count().reset_index()
bb = b_clicks.groupby(['day', 'is_click']).user_id.count().reset_index()
aa

Unnamed: 0,day,is_click,user_id
0,1 - Monday,False,70
1,1 - Monday,True,43
2,2 - Tuesday,False,76
3,2 - Tuesday,True,43
4,3 - Wednesday,False,86
5,3 - Wednesday,True,38
6,4 - Thursday,False,69
7,4 - Thursday,True,47
8,5 - Friday,False,77
9,5 - Friday,True,51
10,6 - Saturday,False,73
11,6 - Saturday,True,45
12,7 - Sunday,False,66
13,7 - Sunday,True,43


In [33]:
bb

Unnamed: 0,day,is_click,user_id
0,1 - Monday,False,81
1,1 - Monday,True,32
2,2 - Tuesday,False,74
3,2 - Tuesday,True,45
4,3 - Wednesday,False,89
5,3 - Wednesday,True,35
6,4 - Thursday,False,87
7,4 - Thursday,True,29
8,5 - Friday,False,90
9,5 - Friday,True,38
10,6 - Saturday,False,76
11,6 - Saturday,True,42
12,7 - Sunday,False,75
13,7 - Sunday,True,34


In [34]:
# Let's pivot these tables with is_click as columns, day as index and user_id as values.
aa_pivot = aa.pivot(index='day', columns='is_click', values='user_id').reset_index()
bb_pivot = bb.pivot(index='day', columns='is_click', values='user_id').reset_index()



In [35]:
aa_pivot

is_click,day,False,True
0,1 - Monday,70,43
1,2 - Tuesday,76,43
2,3 - Wednesday,86,38
3,4 - Thursday,69,47
4,5 - Friday,77,51
5,6 - Saturday,73,45
6,7 - Sunday,66,43


In [36]:
bb_pivot

is_click,day,False,True
0,1 - Monday,81,32
1,2 - Tuesday,74,45
2,3 - Wednesday,89,35
3,4 - Thursday,87,29
4,5 - Friday,90,38
5,6 - Saturday,76,42
6,7 - Sunday,75,34


In [37]:
# Finally, let's compute the click rate for each day (a fraction between 0 and 1; multiply by 100 for a percentage).
aa_pivot['percent'] = aa_pivot[True] / (aa_pivot[True] + aa_pivot[False])
bb_pivot['percent'] = bb_pivot[True] / (bb_pivot[True] + bb_pivot[False])


In [38]:
aa_pivot

is_click,day,False,True,percent
0,1 - Monday,70,43,0.380531
1,2 - Tuesday,76,43,0.361345
2,3 - Wednesday,86,38,0.306452
3,4 - Thursday,69,47,0.405172
4,5 - Friday,77,51,0.398438
5,6 - Saturday,73,45,0.381356
6,7 - Sunday,66,43,0.394495


In [39]:
bb_pivot

is_click,day,False,True,percent
0,1 - Monday,81,32,0.283186
1,2 - Tuesday,74,45,0.378151
2,3 - Wednesday,89,35,0.282258
3,4 - Thursday,87,29,0.25
4,5 - Friday,90,38,0.296875
5,6 - Saturday,76,42,0.355932
6,7 - Sunday,75,34,0.311927
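
Putting the two groups side by side makes the day-by-day comparison easier to read. The sketch below re-enters the counts from `aa_pivot` and `bb_pivot` above by hand, so it runs without the CSV; `click_rates`, `comparison`, and `b_wins` are names introduced here for illustration.

```python
import pandas as pd

days = ["1 - Monday", "2 - Tuesday", "3 - Wednesday", "4 - Thursday",
        "5 - Friday", "6 - Saturday", "7 - Sunday"]

# (not clicked, clicked) counts copied from aa_pivot and bb_pivot above.
a_counts = [(70, 43), (76, 43), (86, 38), (69, 47), (77, 51), (73, 45), (66, 43)]
b_counts = [(81, 32), (74, 45), (89, 35), (87, 29), (90, 38), (76, 42), (75, 34)]


def click_rates(counts):
    """Return clicked / (clicked + not clicked) for each (not clicked, clicked) pair."""
    return [clicked / (clicked + not_clicked) for not_clicked, clicked in counts]


comparison = pd.DataFrame(
    {"A": click_rates(a_counts), "B": click_rates(b_counts)},
    index=pd.Index(days, name="day"),
)
comparison["A_minus_B"] = comparison["A"] - comparison["B"]
print(comparison.round(3))

# Days on which Ad B had the higher click rate.
b_wins = comparison.index[comparison["A_minus_B"] < 0].tolist()
print(b_wins)
```

A negative `A_minus_B` marks a day where Ad B outperformed Ad A, and `idxmax()` on either column gives that ad's best day.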


### Conclusion:
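From the results, we can conclude the following:

1. Neither ad shows a clear weekend effect. Ad A's click rate stayed between roughly 36% and 41% on every day except Wednesday (30.6%), and Ad B's weekend rates (35.6% on Saturday, 31.2% on Sunday) were below its Tuesday rate.
2. Ad A had a higher click rate than Ad B on every day except Tuesday, when Ad B was slightly ahead (37.8% vs. 36.1%).
3. The highest click rate for Ad A was on Thursday (40.5%), whereas for Ad B it was on Tuesday (37.8%), followed by Saturday (35.6%).

Overall, Ad A performed better than Ad B: 310 of 827 users (about 37.5%) clicked on Ad A, compared with 255 of 827 (about 30.8%) for Ad B.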