# A/B Testing

A/B testing, also known as split testing, is a quantitative research method that compares two or more versions of a product, design, or 
message to see which one performs best. 

The goal is to learn how people behave on a page and make decisions based on that data. 


## Requirement

Our favorite online shoe store, ShoeFly.com, is performing an A/B test. They have two different versions of an ad, which they have placed 
in emails, as well as in banner ads on Facebook, Twitter, and Google. 

They want to know how the two ads are performing on each of the different platforms on each day of the week. 

Help them analyze the data using aggregate measures.

In [2]:
import pandas as pd

In [3]:
ad_clicks_df = pd.read_csv(r'D:\GIT_Repositories\pandas\ad_clicks.csv')

## Task 1

Examine the first few rows of ad_clicks.

In [4]:
ad_clicks_df.head(10)

Unnamed: 0,user_id,utm_source,day,ad_click_timestamp,experimental_group
0,008b7c6c-7272-471e-b90e-930d548bd8d7,google,6 - Saturday,07:18,A
1,009abb94-5e14-4b6c-bb1c-4f4df7aa7557,facebook,7 - Sunday,,B
2,00f5d532-ed58-4570-b6d2-768df5f41aed,twitter,2 - Tuesday,,A
3,011adc64-0f44-4fd9-a0bb-f1506d2ad439,google,2 - Tuesday,,B
4,012137e6-7ae7-4649-af68-205b4702169c,facebook,7 - Sunday,,B
5,013b0072-7b72-40e7-b698-98b4d0c9967f,facebook,1 - Monday,,A
6,0153d85b-7660-4c39-92eb-1e1acd023280,google,4 - Thursday,,A
7,01555297-d6e6-49ae-aeba-1b196fdbb09f,google,3 - Wednesday,,A
8,018cea61-19ea-4119-895b-1a4309ccb148,email,1 - Monday,18:33,A
9,01a210c3-fde0-4e6f-8efd-4f0e38730ae6,email,2 - Tuesday,15:21,B


## Task 2

Your manager wants to know which ad platform is getting you the most views.

How many views (i.e., rows of the table) came from each utm_source?

In [87]:
utm_most_views = ad_clicks_df.groupby('utm_source').user_id.count() \
                 .sort_values(ascending = False) \
                 .reset_index()

In [88]:
utm_most_views

Unnamed: 0,utm_source,user_id
0,google,680
1,facebook,504
2,email,255
3,twitter,215
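
As a cross-check on the groupby-and-count approach, the same per-source counts can be obtained with value_counts(), which counts rows per distinct value and sorts largest first by default. The sketch below is a minimal illustration on a small hypothetical frame; toy_df and its rows are made up for the example and are not the ShoeFly data.

```python
import pandas as pd

# Hypothetical toy data standing in for ad_clicks_df (not the real file).
toy_df = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3', 'u4', 'u5', 'u6'],
    'utm_source': ['google', 'facebook', 'google', 'email', 'google', 'facebook'],
})

# value_counts() counts rows per distinct value, largest count first by default.
views_by_source = toy_df['utm_source'].value_counts()
print(views_by_source)
```

Unlike the groupby version, this returns a Series indexed by utm_source rather than a DataFrame with a user_id column.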


From the table above: Google, followed by Facebook, brings in the most views. Both brought in more than 500 views.

## Task 3

Tricky part: .isnull()

If the column ad_click_timestamp is not null, then someone actually clicked on the ad that was displayed.

Create a new column called is_click, which is True if ad_click_timestamp is not null and False otherwise.

In [89]:
ad_clicks_df['is_click'] = ~(ad_clicks_df \
                             .ad_click_timestamp \
                             .isnull())

In [90]:
ad_clicks_df

Unnamed: 0,user_id,utm_source,day,ad_click_timestamp,experimental_group,is_click
0,008b7c6c-7272-471e-b90e-930d548bd8d7,google,6 - Saturday,07:18,A,True
1,009abb94-5e14-4b6c-bb1c-4f4df7aa7557,facebook,7 - Sunday,,B,False
2,00f5d532-ed58-4570-b6d2-768df5f41aed,twitter,2 - Tuesday,,A,False
3,011adc64-0f44-4fd9-a0bb-f1506d2ad439,google,2 - Tuesday,,B,False
4,012137e6-7ae7-4649-af68-205b4702169c,facebook,7 - Sunday,,B,False
...,...,...,...,...,...,...
1649,fe8b5236-78f6-4192-9da6-a76bba67cfe6,twitter,7 - Sunday,,A,False
1650,fed3db6d-8c92-40e3-a4fb-1fb9d7337eb1,facebook,5 - Friday,,B,False
1651,ff3a22ff-521c-478c-87ca-7dc7b8f34372,twitter,3 - Wednesday,,B,False
1652,ff3af0d6-b092-4c4d-9f2e-2bdd8f7c0732,google,1 - Monday,22:57,A,True
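
The negated .isnull() used above can also be written with .notnull(), which returns True wherever a value is present. The following is a small sketch on a hypothetical column (toy_df is invented for illustration) showing that the two forms give the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical timestamps: two present, two missing (NaN and None both count as null).
toy_df = pd.DataFrame({'ad_click_timestamp': ['07:18', np.nan, None, '15:21']})

# Form used in the notebook: negate the null mask.
negated = ~toy_df['ad_click_timestamp'].isnull()

# Equivalent form: ask directly for the non-null mask.
direct = toy_df['ad_click_timestamp'].notnull()

print(direct.tolist())
```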


## Task 4

We want to know the percent of people who clicked on ads from each utm_source.

Start by grouping by utm_source and is_click and counting the number of user_ids in each of those groups. 

Save your answer to the variable clicks_by_source.

In [8]:
ad_clicks_df.columns

Index(['user_id', 'utm_source', 'day', 'ad_click_timestamp',
       'experimental_group', 'is_click'],
      dtype='object')

In [91]:
clicks_by_source = ad_clicks_df.groupby(['utm_source', 'is_click']) \
                   .user_id \
                   .count() \
                   .reset_index()

In [92]:
clicks_by_source

Unnamed: 0,utm_source,is_click,user_id
0,email,False,175
1,email,True,80
2,facebook,False,324
3,facebook,True,180
4,google,False,441
5,google,True,239
6,twitter,False,149
7,twitter,True,66


In [18]:
clicks_pivot = clicks_by_source.pivot(
    columns = 'is_click',
    index = 'utm_source',
    values = 'user_id'
).reset_index()

In [19]:
clicks_pivot

is_click,utm_source,False,True
0,email,175,80
1,facebook,324,180
2,google,441,239
3,twitter,149,66


In [20]:
clicks_pivot['percent_clicked'] = clicks_pivot[True] / ( clicks_pivot[True] + clicks_pivot[False] )

In [21]:
clicks_pivot

is_click,utm_source,False,True,percent_clicked
0,email,175,80,0.313725
1,facebook,324,180,0.357143
2,google,441,239,0.351471
3,twitter,149,66,0.306977
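
The pivot-and-divide steps can be cross-checked by taking the mean of the boolean is_click column per source, since the mean of a boolean column is the fraction of True values. The sketch below assumes only the counts shown in the clicks_by_source output; rebuilt_df is a hypothetical stand-in for ad_clicks_df, reconstructed from those counts rather than read from the CSV.

```python
import pandas as pd

# (not clicked, clicked) counts copied from the clicks_by_source output.
counts = {
    'email': (175, 80),
    'facebook': (324, 180),
    'google': (441, 239),
    'twitter': (149, 66),
}

# Rebuild one row per view; rebuilt_df is a hypothetical stand-in for ad_clicks_df.
rows = []
for source, (not_clicked, clicked) in counts.items():
    rows += [(source, False)] * not_clicked + [(source, True)] * clicked
rebuilt_df = pd.DataFrame(rows, columns=['utm_source', 'is_click'])

# The mean of a boolean column is the fraction of True values.
click_rate = rebuilt_df.groupby('utm_source')['is_click'].mean()
print(click_rate.round(6))
```

Note that percent_clicked, like click_rate here, is a fraction between 0 and 1; multiply by 100 to report it as a percentage.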


## Analyzing an A/B Test

## Task 1

The column experimental_group tells us whether the user was shown Ad A or Ad B.

Were approximately the same number of people shown both ads?

In [23]:
ad_clicks_df

Unnamed: 0,user_id,utm_source,day,ad_click_timestamp,experimental_group,is_click
0,008b7c6c-7272-471e-b90e-930d548bd8d7,google,6 - Saturday,07:18,A,True
1,009abb94-5e14-4b6c-bb1c-4f4df7aa7557,facebook,7 - Sunday,,B,False
2,00f5d532-ed58-4570-b6d2-768df5f41aed,twitter,2 - Tuesday,,A,False
3,011adc64-0f44-4fd9-a0bb-f1506d2ad439,google,2 - Tuesday,,B,False
4,012137e6-7ae7-4649-af68-205b4702169c,facebook,7 - Sunday,,B,False
...,...,...,...,...,...,...
1649,fe8b5236-78f6-4192-9da6-a76bba67cfe6,twitter,7 - Sunday,,A,False
1650,fed3db6d-8c92-40e3-a4fb-1fb9d7337eb1,facebook,5 - Friday,,B,False
1651,ff3a22ff-521c-478c-87ca-7dc7b8f34372,twitter,3 - Wednesday,,B,False
1652,ff3af0d6-b092-4c4d-9f2e-2bdd8f7c0732,google,1 - Monday,22:57,A,True


In [35]:
experimental_group_cnt = ad_clicks_df.groupby('experimental_group').user_id.count().reset_index()

In [36]:
experimental_group_cnt

Unnamed: 0,experimental_group,user_id
0,A,827
1,B,827

Yes: each ad was shown to 827 users.

## Task 2

Using the column is_click that we defined earlier, check to see if a greater percentage of users clicked on Ad A or Ad B.

In [37]:
clicked_A_or_B = ad_clicks_df.groupby(['experimental_group','is_click']).user_id.count().reset_index()

In [38]:
clicked_A_or_B

Unnamed: 0,experimental_group,is_click,user_id
0,A,False,517
1,A,True,310
2,B,False,572
3,B,True,255


In [39]:
percent_clicked_A_or_B = clicked_A_or_B.pivot(
        index   = 'experimental_group',
        columns = 'is_click',
        values  = 'user_id'
)

In [40]:
percent_clicked_A_or_B

experimental_group,False,True
A,517,310
B,572,255
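
The pivot above holds raw counts, so one more step is needed to answer the question about percentages. A minimal sketch of that step, assuming only the counts from the clicked_A_or_B output (ab_counts and ab_pivot are names introduced here for illustration):

```python
import pandas as pd

# Counts copied from the clicked_A_or_B output.
ab_counts = pd.DataFrame({
    'experimental_group': ['A', 'A', 'B', 'B'],
    'is_click': [False, True, False, True],
    'user_id': [517, 310, 572, 255],
})

ab_pivot = ab_counts.pivot(
    index='experimental_group',
    columns='is_click',
    values='user_id'
)

# Fraction of users in each group who clicked the ad.
ab_pivot['percent_clicked'] = ab_pivot[True] / (ab_pivot[True] + ab_pivot[False])
print(ab_pivot)
```

With these counts, roughly 37.5% of users shown Ad A clicked (310 of 827), compared with roughly 30.8% for Ad B (255 of 827).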


## Task 3

The Product Manager for the A/B test thinks that the clicks might have changed by day of the week.

Start by creating two DataFrames: a_clicks and b_clicks, which contain only the results for A group and B group, respectively.

In [41]:
ad_clicks_df.head()

Unnamed: 0,user_id,utm_source,day,ad_click_timestamp,experimental_group,is_click
0,008b7c6c-7272-471e-b90e-930d548bd8d7,google,6 - Saturday,07:18,A,True
1,009abb94-5e14-4b6c-bb1c-4f4df7aa7557,facebook,7 - Sunday,,B,False
2,00f5d532-ed58-4570-b6d2-768df5f41aed,twitter,2 - Tuesday,,A,False
3,011adc64-0f44-4fd9-a0bb-f1506d2ad439,google,2 - Tuesday,,B,False
4,012137e6-7ae7-4649-af68-205b4702169c,facebook,7 - Sunday,,B,False


In [42]:
a_clicks = ad_clicks_df[ad_clicks_df['experimental_group'] == 'A']
b_clicks = ad_clicks_df[ad_clicks_df['experimental_group'] == 'B']

In [43]:
a_clicks

Unnamed: 0,user_id,utm_source,day,ad_click_timestamp,experimental_group,is_click
0,008b7c6c-7272-471e-b90e-930d548bd8d7,google,6 - Saturday,07:18,A,True
2,00f5d532-ed58-4570-b6d2-768df5f41aed,twitter,2 - Tuesday,,A,False
5,013b0072-7b72-40e7-b698-98b4d0c9967f,facebook,1 - Monday,,A,False
6,0153d85b-7660-4c39-92eb-1e1acd023280,google,4 - Thursday,,A,False
7,01555297-d6e6-49ae-aeba-1b196fdbb09f,google,3 - Wednesday,,A,False
...,...,...,...,...,...,...
1643,fceb13ea-fd8c-446a-a61f-f977d404330a,twitter,6 - Saturday,,A,False
1646,fd7d06ea-38b5-4ed9-acc9-777047db8c56,google,4 - Thursday,,A,False
1647,fe570a20-448f-40ed-930b-8482b8a7c231,facebook,1 - Monday,20:07,A,True
1649,fe8b5236-78f6-4192-9da6-a76bba67cfe6,twitter,7 - Sunday,,A,False


In [44]:
b_clicks

Unnamed: 0,user_id,utm_source,day,ad_click_timestamp,experimental_group,is_click
1,009abb94-5e14-4b6c-bb1c-4f4df7aa7557,facebook,7 - Sunday,,B,False
3,011adc64-0f44-4fd9-a0bb-f1506d2ad439,google,2 - Tuesday,,B,False
4,012137e6-7ae7-4649-af68-205b4702169c,facebook,7 - Sunday,,B,False
9,01a210c3-fde0-4e6f-8efd-4f0e38730ae6,email,2 - Tuesday,15:21,B,True
10,01adb2e7-f711-4ae4-a7c6-29f48457eea1,google,3 - Wednesday,,B,False
...,...,...,...,...,...,...
1645,fd2a5852-f0ef-4162-84a6-107a42dc46b5,twitter,3 - Wednesday,,B,False
1648,fe6cfa5a-cc63-4770-8d56-c13ac8cf5bef,google,3 - Wednesday,15:06,B,True
1650,fed3db6d-8c92-40e3-a4fb-1fb9d7337eb1,facebook,5 - Friday,,B,False
1651,ff3a22ff-521c-478c-87ca-7dc7b8f34372,twitter,3 - Wednesday,,B,False


## Task 4

For each group (a_clicks and b_clicks), calculate the percent of users who clicked on the ad by day.

In [56]:
a_clicks_cnt = a_clicks.groupby(['is_click','day']).user_id.count().reset_index()

In [57]:
a_clicks_cnt

Unnamed: 0,is_click,day,user_id
0,False,1 - Monday,70
1,False,2 - Tuesday,76
2,False,3 - Wednesday,86
3,False,4 - Thursday,69
4,False,5 - Friday,77
5,False,6 - Saturday,73
6,False,7 - Sunday,66
7,True,1 - Monday,43
8,True,2 - Tuesday,43
9,True,3 - Wednesday,38


In [58]:
a_clicks_cnt_pivot = a_clicks_cnt.pivot(
    index = 'day', 
    columns = 'is_click',
    values = 'user_id'
).reset_index()

In [59]:
a_clicks_cnt_pivot

is_click,day,False,True
0,1 - Monday,70,43
1,2 - Tuesday,76,43
2,3 - Wednesday,86,38
3,4 - Thursday,69,47
4,5 - Friday,77,51
5,6 - Saturday,73,45
6,7 - Sunday,66,43


In [62]:
a_clicks_cnt_pivot['percent_clicked'] = a_clicks_cnt_pivot[True] / ( a_clicks_cnt_pivot[True] + a_clicks_cnt_pivot[False] )

In [63]:
a_clicks_cnt_pivot

is_click,day,False,True,percent_clicked
0,1 - Monday,70,43,0.380531
1,2 - Tuesday,76,43,0.361345
2,3 - Wednesday,86,38,0.306452
3,4 - Thursday,69,47,0.405172
4,5 - Friday,77,51,0.398438
5,6 - Saturday,73,45,0.381356
6,7 - Sunday,66,43,0.394495


In [74]:
b_clicks_cnt = b_clicks.groupby(['is_click','day']).user_id.count().reset_index()

In [75]:
b_clicks_cnt

Unnamed: 0,is_click,day,user_id
0,False,1 - Monday,81
1,False,2 - Tuesday,74
2,False,3 - Wednesday,89
3,False,4 - Thursday,87
4,False,5 - Friday,90
5,False,6 - Saturday,76
6,False,7 - Sunday,75
7,True,1 - Monday,32
8,True,2 - Tuesday,45
9,True,3 - Wednesday,35


In [79]:
b_clicks_cnt_pivot = b_clicks_cnt.pivot(
    index = 'day', 
    columns = 'is_click',
    values = 'user_id'
).reset_index()

In [80]:
b_clicks_cnt_pivot

is_click,day,False,True
0,1 - Monday,81,32
1,2 - Tuesday,74,45
2,3 - Wednesday,89,35
3,4 - Thursday,87,29
4,5 - Friday,90,38
5,6 - Saturday,76,42
6,7 - Sunday,75,34


In [81]:
b_clicks_cnt_pivot['percent_clicked'] = b_clicks_cnt_pivot[True] / ( b_clicks_cnt_pivot[True] + b_clicks_cnt_pivot[False] )

In [82]:
b_clicks_cnt_pivot

is_click,day,False,True,percent_clicked
0,1 - Monday,81,32,0.283186
1,2 - Tuesday,74,45,0.378151
2,3 - Wednesday,89,35,0.282258
3,4 - Thursday,87,29,0.25
4,5 - Friday,90,38,0.296875
5,6 - Saturday,76,42,0.355932
6,7 - Sunday,75,34,0.311927
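
To compare the two ads day by day, it helps to put both click rates in one table. The sketch below assumes only the per-day counts from the two pivot outputs; the names click_rate, comparison, and a_minus_b are introduced here for illustration.

```python
import pandas as pd

days = ['1 - Monday', '2 - Tuesday', '3 - Wednesday', '4 - Thursday',
        '5 - Friday', '6 - Saturday', '7 - Sunday']

# (not clicked, clicked) counts per day, copied from the two pivot outputs.
a_counts = [(70, 43), (76, 43), (86, 38), (69, 47), (77, 51), (73, 45), (66, 43)]
b_counts = [(81, 32), (74, 45), (89, 35), (87, 29), (90, 38), (76, 42), (75, 34)]

def click_rate(counts):
    """Return clicked / (clicked + not clicked) for each (not_clicked, clicked) pair."""
    return [clicked / (clicked + not_clicked) for not_clicked, clicked in counts]

comparison = pd.DataFrame({
    'day': days,
    'percent_clicked_a': click_rate(a_counts),
    'percent_clicked_b': click_rate(b_counts),
})

# Positive values mean Ad A had the higher click rate on that day.
comparison['a_minus_b'] = comparison['percent_clicked_a'] - comparison['percent_clicked_b']
print(comparison.round(6))
```

With these counts, Ad A has the higher click rate on every day except Tuesday.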
