# A/B Testing

A/B testing, also known as split testing, is a quantitative research method that compares two or more versions of a product, design, or 
message to see which one performs best. 

The goal is to learn how people behave on a page and make decisions based on that data. 


## Requirement

Our favorite online shoe store, ShoeFly.com, is performing an A/B test. They have two different versions of an ad, which they have placed in emails as well as in banner ads on Facebook, Twitter, and Google.

They want to know how the two ads are performing on each of the different platforms on each day of the week. 

Help them analyze the data using aggregate measures.
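Once every view is flagged as a click or not, comparing the two ads reduces to comparing click rates per ad version, platform, and day. A minimal sketch, assuming a toy DataFrame whose column names mirror `ad_clicks.csv` plus an `is_click` flag like the one built in Task 3 (the rows are made up for illustration, not taken from the real data):

```python
import pandas as pd

# Hypothetical toy rows; column names mirror ad_clicks.csv plus the is_click flag from Task 3
views = pd.DataFrame({
    "day": ["1 - Monday", "1 - Monday", "1 - Monday", "2 - Tuesday", "2 - Tuesday", "2 - Tuesday"],
    "experimental_group": ["A", "B", "A", "B", "A", "B"],
    "is_click": [True, False, True, True, False, False],
})

# The mean of a boolean column is the share of views that were clicks
click_rate = (
    views.groupby(["experimental_group", "day"])["is_click"]
    .mean()
    .unstack("day")  # one row per ad version, one column per day
)
print(click_rate)
```

Grouping by `utm_source` instead of (or alongside) `day` gives the per-platform comparison.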

In [3]:
import pandas as pd

In [4]:
ad_clicks_df = pd.read_csv(r'D:\GIT_Repositories\pandas\ad_clicks.csv')

## Task 1

Examine the first few rows of ad_clicks.

In [6]:
ad_clicks_df.head(10)

| | user_id | utm_source | day | ad_click_timestamp | experimental_group |
|---|---|---|---|---|---|
| 0 | 008b7c6c-7272-471e-b90e-930d548bd8d7 | google | 6 - Saturday | 07:18 | A |
| 1 | 009abb94-5e14-4b6c-bb1c-4f4df7aa7557 | facebook | 7 - Sunday | NaN | B |
| 2 | 00f5d532-ed58-4570-b6d2-768df5f41aed | twitter | 2 - Tuesday | NaN | A |
| 3 | 011adc64-0f44-4fd9-a0bb-f1506d2ad439 | google | 2 - Tuesday | NaN | B |
| 4 | 012137e6-7ae7-4649-af68-205b4702169c | facebook | 7 - Sunday | NaN | B |
| 5 | 013b0072-7b72-40e7-b698-98b4d0c9967f | facebook | 1 - Monday | NaN | A |
| 6 | 0153d85b-7660-4c39-92eb-1e1acd023280 | google | 4 - Thursday | NaN | A |
| 7 | 01555297-d6e6-49ae-aeba-1b196fdbb09f | google | 3 - Wednesday | NaN | A |
| 8 | 018cea61-19ea-4119-895b-1a4309ccb148 | email | 1 - Monday | 18:33 | A |
| 9 | 01a210c3-fde0-4e6f-8efd-4f0e38730ae6 | email | 2 - Tuesday | 15:21 | B |


## Task 2

Your manager wants to know which ad platform is getting you the most views.

How many views (i.e., rows of the table) came from each utm_source?

In [7]:
ad_clicks_df.groupby('utm_source').user_id.count().reset_index()

| | utm_source | user_id |
|---|---|---|
| 0 | email | 255 |
| 1 | facebook | 504 |
| 2 | google | 680 |
| 3 | twitter | 215 |


Google (680 views) and Facebook (504 views) bring in the most views; each accounts for more than 500.
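`Series.value_counts()` is a shorter route to the same per-source totals, since each row is one view. A minimal sketch on hypothetical toy rows (not the real data); the two approaches agree as long as `user_id` has no missing values, because `count()` skips nulls:

```python
import pandas as pd

# Hypothetical toy data standing in for ad_clicks_df
views = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4", "u5"],
    "utm_source": ["google", "facebook", "google", "email", "google"],
})

# Views per source, sorted from most to fewest
views_by_source = views["utm_source"].value_counts()
print(views_by_source)

# Same numbers as the groupby/count approach
by_groupby = views.groupby("utm_source")["user_id"].count()
assert views_by_source.sort_index().tolist() == by_groupby.sort_index().tolist()
```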

## Task 3

Hint: the tricky part is `.isnull()`.

If the column ad_click_timestamp is not null, then someone actually clicked on the ad that was displayed.

Create a new column called is_click, which is True if ad_click_timestamp is not null and False otherwise.

In [39]:
# is_click is True whenever a click timestamp was recorded
ad_clicks_df['is_click'] = ~ad_clicks_df.ad_click_timestamp.isnull()

In [40]:
ad_clicks_df

| | user_id | utm_source | day | ad_click_timestamp | experimental_group | is_click |
|---|---|---|---|---|---|---|
| 0 | 008b7c6c-7272-471e-b90e-930d548bd8d7 | google | 6 - Saturday | 07:18 | A | True |
| 1 | 009abb94-5e14-4b6c-bb1c-4f4df7aa7557 | facebook | 7 - Sunday | NaN | B | False |
| 2 | 00f5d532-ed58-4570-b6d2-768df5f41aed | twitter | 2 - Tuesday | NaN | A | False |
| 3 | 011adc64-0f44-4fd9-a0bb-f1506d2ad439 | google | 2 - Tuesday | NaN | B | False |
| 4 | 012137e6-7ae7-4649-af68-205b4702169c | facebook | 7 - Sunday | NaN | B | False |
| ... | ... | ... | ... | ... | ... | ... |
| 1649 | fe8b5236-78f6-4192-9da6-a76bba67cfe6 | twitter | 7 - Sunday | NaN | A | False |
| 1650 | fed3db6d-8c92-40e3-a4fb-1fb9d7337eb1 | facebook | 5 - Friday | NaN | B | False |
| 1651 | ff3a22ff-521c-478c-87ca-7dc7b8f34372 | twitter | 3 - Wednesday | NaN | B | False |
| 1652 | ff3af0d6-b092-4c4d-9f2e-2bdd8f7c0732 | google | 1 - Monday | 22:57 | A | True |
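`Series.notnull()` (alias `notna()`) returns the negation of `isnull()` directly, so the `~` is optional. A small sketch on hypothetical timestamps, where `NaN` marks an ad that was shown but not clicked:

```python
import numpy as np
import pandas as pd

# Hypothetical toy column: NaN means no click was recorded
views = pd.DataFrame({"ad_click_timestamp": ["07:18", np.nan, "18:33", np.nan]})

# Both expressions flag rows that have a click timestamp
via_invert = ~views["ad_click_timestamp"].isnull()
via_notnull = views["ad_click_timestamp"].notnull()

views["is_click"] = via_notnull
print(views)
```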


## Task 4

We want to know the percent of people who clicked on ads from each utm_source.

Start by grouping by utm_source and is_click and counting the number of `user_id`s in each of those groups.

Save your answer to the variable clicks_by_source.

In [41]:
ad_clicks_df.columns

Index(['user_id', 'utm_source', 'day', 'ad_click_timestamp',
       'experimental_group', 'is_click'],
      dtype='object')

In [43]:
clicks_by_source = ad_clicks_df.groupby(['utm_source', 'is_click']).user_id.count().reset_index()

In [44]:
clicks_by_source

| | utm_source | is_click | user_id |
|---|---|---|---|
| 0 | email | False | 175 |
| 1 | email | True | 80 |
| 2 | facebook | False | 324 |
| 3 | facebook | True | 180 |
| 4 | google | False | 441 |
| 5 | google | True | 239 |
| 6 | twitter | False | 149 |
| 7 | twitter | True | 66 |
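`pd.crosstab` counts the same (utm_source, is_click) combinations but returns them already in wide form, one column per `is_click` value. A minimal sketch on hypothetical rows (not the real data):

```python
import pandas as pd

# Hypothetical toy rows shaped like ad_clicks_df after Task 3
views = pd.DataFrame({
    "utm_source": ["email", "email", "google", "google", "google"],
    "is_click": [True, False, False, False, True],
})

# Rows: utm_source; columns: False/True; cells: number of views
counts = pd.crosstab(views["utm_source"], views["is_click"])
print(counts)
```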


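Pivot so that each `utm_source` becomes one row with separate `False` and `True` count columns. Passing scalar column names (rather than one-element lists) keeps the column labels single-level; with lists, pandas builds MultiIndex columns such as `('user_id', True)`, and `clicks_pivot[True]` then raises `KeyError: True`.

In [53]:
clicks_pivot = clicks_by_source.pivot(
    columns='is_click',
    index='utm_source',
    values='user_id'
).reset_index()

In [54]:
clicks_pivot

| is_click | utm_source | False | True |
|---|---|---|---|
| 0 | email | 175 | 80 |
| 1 | facebook | 324 | 180 |
| 2 | google | 441 | 239 |
| 3 | twitter | 149 | 66 |


In [58]:
# Fraction of each source's views that resulted in a click
clicks_pivot['percent_clicked'] = clicks_pivot[True] / (clicks_pivot[True] + clicks_pivot[False])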
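Because `is_click` is boolean (`True` counts as 1, `False` as 0), its mean within each group is already the fraction of views that were clicks, which avoids the pivot step entirely. A sketch, assuming a DataFrame shaped like `ad_clicks_df` after Task 3 (the toy rows are hypothetical):

```python
import pandas as pd

# Hypothetical toy rows
views = pd.DataFrame({
    "utm_source": ["email", "email", "email", "email", "twitter", "twitter"],
    "is_click": [True, False, False, False, True, True],
})

# Fraction of views per source that were clicks (0 to 1, not multiplied by 100)
share_clicked = views.groupby("utm_source")["is_click"].mean()
print(share_clicked)
```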