# A/B Testing for ShoeFly.com

Our favorite online shoe store, ShoeFly.com, is performing an A/B test. They have two different versions of an ad, which they have placed in emails as well as in banner ads on Facebook, Twitter, and Google. They want to know how the two ads are performing on each platform and on each day of the week. Help them analyze the data using aggregate measures.

## Analyzing Ad Sources

In [106]:
import pandas as pd

In [107]:
ad_clicks = pd.read_csv('ad_clicks.csv')

ad_clicks.head()

Unnamed: 0,user_id,utm_source,day,ad_click_timestamp,experimental_group
0,008b7c6c-7272-471e-b90e-930d548bd8d7,google,6 - Saturday,7:18,A
1,009abb94-5e14-4b6c-bb1c-4f4df7aa7557,facebook,7 - Sunday,,B
2,00f5d532-ed58-4570-b6d2-768df5f41aed,twitter,2 - Tuesday,,A
3,011adc64-0f44-4fd9-a0bb-f1506d2ad439,google,2 - Tuesday,,B
4,012137e6-7ae7-4649-af68-205b4702169c,facebook,7 - Sunday,,B


Your manager wants to know which ad platform is getting you the most views.

In [108]:
platform = ad_clicks.groupby('utm_source').user_id.count().reset_index()

platform

Unnamed: 0,utm_source,user_id
0,email,255
1,facebook,504
2,google,680
3,twitter,215
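
As a minimal sketch of the same count, the snippet below ranks platforms by views on a small made-up frame. The rows in `sample` and the names `views` and `top_platform` are illustrative assumptions; only the column names `user_id` and `utm_source` come from the notebook.

```python
import pandas as pd

# Hypothetical miniature of ad_clicks; the rows are invented for illustration.
sample = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3', 'u4', 'u5', 'u6'],
    'utm_source': ['google', 'facebook', 'google', 'email', 'google', 'facebook'],
})

# Count one view per row, then sort so the busiest platform comes first.
views = (
    sample.groupby('utm_source').user_id.count()
    .reset_index()
    .rename(columns={'user_id': 'views'})
    .sort_values('views', ascending=False)
)

top_platform = views.iloc[0]['utm_source']
print(views)
print(top_platform)
```

Sorting is optional with only four platforms, but it makes the answer to the manager's question the first row of the result.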


If the column ad_click_timestamp is not null, then someone actually clicked on the ad that was displayed.

Create a new column called is_click, which is True if ad_click_timestamp is not null and False otherwise.

In [109]:
ad_clicks['is_click'] = ~ad_clicks.ad_click_timestamp.isnull()

ad_clicks

Unnamed: 0,user_id,utm_source,day,ad_click_timestamp,experimental_group,is_click
0,008b7c6c-7272-471e-b90e-930d548bd8d7,google,6 - Saturday,7:18,A,True
1,009abb94-5e14-4b6c-bb1c-4f4df7aa7557,facebook,7 - Sunday,,B,False
2,00f5d532-ed58-4570-b6d2-768df5f41aed,twitter,2 - Tuesday,,A,False
3,011adc64-0f44-4fd9-a0bb-f1506d2ad439,google,2 - Tuesday,,B,False
4,012137e6-7ae7-4649-af68-205b4702169c,facebook,7 - Sunday,,B,False
...,...,...,...,...,...,...
1649,fe8b5236-78f6-4192-9da6-a76bba67cfe6,twitter,7 - Sunday,,A,False
1650,fed3db6d-8c92-40e3-a4fb-1fb9d7337eb1,facebook,5 - Friday,,B,False
1651,ff3a22ff-521c-478c-87ca-7dc7b8f34372,twitter,3 - Wednesday,,B,False
1652,ff3af0d6-b092-4c4d-9f2e-2bdd8f7c0732,google,1 - Monday,22:57,A,True
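
For reference, `Series.notnull()` expresses the same condition without the `~` negation. The sketch below checks that the two forms agree on a small made-up frame; `sample` and `same` are illustrative names, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: a missing timestamp means the ad was shown but not clicked.
sample = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3'],
    'ad_click_timestamp': ['7:18', np.nan, '22:57'],
})

# notnull() is True exactly where a timestamp is present.
sample['is_click'] = sample.ad_click_timestamp.notnull()

# Compare with the negated isnull() form used in the cell above.
same = (~sample.ad_click_timestamp.isnull()).tolist() == sample['is_click'].tolist()
print(sample)
print(same)
```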


We want to know the percent of people who clicked on ads from each utm_source.

In [110]:
clicks_by_source = ad_clicks.groupby(['utm_source', 'is_click']).user_id.count().reset_index()

clicks_by_source

Unnamed: 0,utm_source,is_click,user_id
0,email,False,175
1,email,True,80
2,facebook,False,324
3,facebook,True,180
4,google,False,441
5,google,True,239
6,twitter,False,149
7,twitter,True,66


In [111]:
clicks_pivot = clicks_by_source.pivot(
    columns = 'is_click',
    index = 'utm_source',
    values = 'user_id'
).reset_index()

clicks_pivot

is_click,utm_source,False,True
0,email,175,80
1,facebook,324,180
2,google,441,239
3,twitter,149,66


In [112]:
clicks_pivot['percent_clicked'] = 100 * (clicks_pivot[True] / (clicks_pivot[True] + clicks_pivot[False]))
clicks_pivot

is_click,utm_source,False,True,percent_clicked
0,email,175,80,31.372549
1,facebook,324,180,35.714286
2,google,441,239,35.147059
3,twitter,149,66,30.697674
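
Because `is_click` is boolean, its mean within a group is the fraction of views that were clicked, so the count, pivot, and divide steps above can also be sketched as a single aggregation. This is an alternative under that assumption, not the notebook's own method; the rows in `sample` are made up.

```python
import pandas as pd

# Hypothetical views: email has 1 click out of 2, google has 1 click out of 4.
sample = pd.DataFrame({
    'utm_source': ['email', 'email', 'google', 'google', 'google', 'google'],
    'is_click': [True, False, True, False, False, False],
})

# The mean of a boolean column is the share of True values in each group.
percent_clicked = (
    sample.groupby('utm_source').is_click.mean()
    .mul(100)
    .reset_index(name='percent_clicked')
)
print(percent_clicked)
```

The pivot-based version keeps the raw click and no-click counts visible, which is useful when you also want to report them; the mean-based version is shorter when only the rate matters.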


## Analyzing an A/B Test

In [113]:
experiment_group = ad_clicks.groupby('experimental_group').user_id.count().reset_index()
experiment_group

Unnamed: 0,experimental_group,user_id
0,A,827
1,B,827


In [114]:
experimental = ad_clicks.groupby(['experimental_group', 'is_click']).user_id.count().reset_index()
experimental

Unnamed: 0,experimental_group,is_click,user_id
0,A,False,517
1,A,True,310
2,B,False,572
3,B,True,255


In [117]:
experimental_pivot = experimental.pivot(
    columns = 'is_click',
    index = 'experimental_group',
    values = 'user_id'
).reset_index()
experimental_pivot

is_click,experimental_group,False,True
0,A,517,310
1,B,572,255


In [118]:
experimental_pivot['percent_clicked'] = 100*(experimental_pivot[True] / (experimental_pivot[True] + experimental_pivot[False]))
experimental_pivot

is_click,experimental_group,False,True,percent_clicked
0,A,517,310,37.484885
1,B,572,255,30.834341


In [119]:
a_clicks = ad_clicks[ad_clicks.experimental_group == 'A']
b_clicks = ad_clicks[ad_clicks.experimental_group == 'B']

In [120]:
a_pivot = a_clicks.groupby(['is_click', 'day']).user_id.count().reset_index().pivot(
    columns = 'is_click',
    index = 'day',
    values = 'user_id'
).reset_index()

a_pivot

is_click,day,False,True
0,1 - Monday,70,43
1,2 - Tuesday,76,43
2,3 - Wednesday,86,38
3,4 - Thursday,69,47
4,5 - Friday,77,51
5,6 - Saturday,73,45
6,7 - Sunday,66,43


In [126]:
b_pivot = b_clicks.groupby(['is_click', 'day']).user_id.count().reset_index().pivot(
    columns = 'is_click',
    index = 'day',
    values = 'user_id'
).reset_index()

b_pivot

is_click,day,False,True
0,1 - Monday,81,32
1,2 - Tuesday,74,45
2,3 - Wednesday,89,35
3,4 - Thursday,87,29
4,5 - Friday,90,38
5,6 - Saturday,76,42
6,7 - Sunday,75,34


In [121]:
a_pivot['percent_clicked'] = 100*(a_pivot[True] / (a_pivot[True] + a_pivot[False]))

a_pivot

is_click,day,False,True,percent_clicked
0,1 - Monday,70,43,38.053097
1,2 - Tuesday,76,43,36.134454
2,3 - Wednesday,86,38,30.645161
3,4 - Thursday,69,47,40.517241
4,5 - Friday,77,51,39.84375
5,6 - Saturday,73,45,38.135593
6,7 - Sunday,66,43,39.449541


In [127]:
b_pivot['percent_clicked'] = 100*(b_pivot[True] / (b_pivot[True] + b_pivot[False]))

b_pivot

is_click,day,False,True,percent_clicked
0,1 - Monday,81,32,28.318584
1,2 - Tuesday,74,45,37.815126
2,3 - Wednesday,89,35,28.225806
3,4 - Thursday,87,29,25.0
4,5 - Friday,90,38,29.6875
5,6 - Saturday,76,42,35.59322
6,7 - Sunday,75,34,31.192661
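
To read the two daily tables side by side, one option is to merge them on `day` and subtract the click rates. The sketch below rebuilds both tables from the counts printed above; the names `comparison`, `a_minus_b`, and `days_b_ahead` are illustrative assumptions.

```python
import pandas as pd

days = ['1 - Monday', '2 - Tuesday', '3 - Wednesday', '4 - Thursday',
        '5 - Friday', '6 - Saturday', '7 - Sunday']

# Click / no-click counts copied from the a_pivot and b_pivot outputs.
a = pd.DataFrame({'day': days,
                  'clicked': [43, 43, 38, 47, 51, 45, 43],
                  'not_clicked': [70, 76, 86, 69, 77, 73, 66]})
b = pd.DataFrame({'day': days,
                  'clicked': [32, 45, 35, 29, 38, 42, 34],
                  'not_clicked': [81, 74, 89, 87, 90, 76, 75]})

for df in (a, b):
    df['percent_clicked'] = 100 * df['clicked'] / (df['clicked'] + df['not_clicked'])

# One row per day, with Ad A and Ad B rates in adjacent columns.
comparison = a[['day', 'percent_clicked']].merge(
    b[['day', 'percent_clicked']], on='day', suffixes=('_a', '_b'))
comparison['a_minus_b'] = comparison['percent_clicked_a'] - comparison['percent_clicked_b']

# Days on which Ad B had the higher click rate.
days_b_ahead = comparison.loc[comparison['a_minus_b'] < 0, 'day'].tolist()
print(comparison.round(2))
print(days_b_ahead)
```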


Compare the results for A and B. What happened over the course of the week?

Do you recommend that your company use Ad A or Ad B?

Answer:

The 1,654 views are split evenly between the two ads: 827 for Ad A and 827 for Ad B.

Ad A performed better than Ad B:

- Overall, Ad A received 310 clicks (37.48%) against 255 clicks (30.83%) for Ad B, a difference of 55 clicks.
- Across the week, Ad A had the higher click rate on every day except Tuesday, when Ad B led by 1.68 percentage points (37.82% vs. 36.13%).

Based on these click rates, it might be a good idea to choose Ad A.
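
The notebook stops at aggregate percentages. As an optional extra check that is not part of the original analysis, a pooled two-proportion z-test is one common way to ask whether a gap of this size could plausibly be chance. The sketch below uses only the overall counts reported above and the standard library; the large-sample normal approximation is an assumption.

```python
import math

# Overall counts taken from the experimental_pivot output.
clicks_a, total_a = 310, 827
clicks_b, total_b = 255, 827

p_a = clicks_a / total_a
p_b = clicks_b / total_b

# Pooled click rate under the null hypothesis that both ads perform the same.
pooled = (clicks_a + clicks_b) / (total_a + total_b)
std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))

z = (p_a - p_b) / std_err
# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value would be consistent with the recommendation above, though it says nothing about differences between platforms or days.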