# A/B Testing for ShoeFly.com
Our favorite online shoe store, ShoeFly.com, is performing an A/B test. They have two different versions of an ad, which they have placed in emails as well as in banner ads on Facebook, Twitter, and Google. They want to know how the two ads are performing on each platform on each day of the week. Help them analyze the data using aggregate measures.

## Analyzing Ad Sources

### Task 1
Examine the first few rows of `ad_clicks`.

In [1]:
import pandas as pd
import numpy as np

ad_clicks = pd.read_csv('ad_clicks.csv')
print(ad_clicks.head(10))

                                user_id utm_source            day  \
0  008b7c6c-7272-471e-b90e-930d548bd8d7     google   6 - Saturday   
1  009abb94-5e14-4b6c-bb1c-4f4df7aa7557   facebook     7 - Sunday   
2  00f5d532-ed58-4570-b6d2-768df5f41aed    twitter    2 - Tuesday   
3  011adc64-0f44-4fd9-a0bb-f1506d2ad439     google    2 - Tuesday   
4  012137e6-7ae7-4649-af68-205b4702169c   facebook     7 - Sunday   
5  013b0072-7b72-40e7-b698-98b4d0c9967f   facebook     1 - Monday   
6  0153d85b-7660-4c39-92eb-1e1acd023280     google   4 - Thursday   
7  01555297-d6e6-49ae-aeba-1b196fdbb09f     google  3 - Wednesday   
8  018cea61-19ea-4119-895b-1a4309ccb148      email     1 - Monday   
9  01a210c3-fde0-4e6f-8efd-4f0e38730ae6      email    2 - Tuesday   

  ad_click_timestamp experimental_group  
0               7:18                  A  
1                NaN                  B  
2                NaN                  A  
3                NaN                  B  
4                NaN          

### Task 2
Your manager wants to know which ad platform is getting you the most views.

How many views (i.e., rows of the table) came from each `utm_source`?

In [4]:
most_ad_clicks = ad_clicks.groupby('utm_source').user_id.count().reset_index()
print(most_ad_clicks)

  utm_source  user_id
0      email      255
1   facebook      504
2     google      680
3    twitter      215

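For a single-column tally like this one, `Series.value_counts()` is a common shorthand. The sketch below runs on a made-up five-row frame (`toy` is hypothetical, not the real `ad_clicks` data) and checks that it agrees with the `groupby` count.

```python
import pandas as pd

# Hypothetical miniature of ad_clicks, invented for illustration only.
toy = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4", "u5"],
    "utm_source": ["google", "email", "google", "twitter", "google"],
})

# value_counts() tallies rows per distinct value, largest count first.
views_by_source = toy["utm_source"].value_counts()
print(views_by_source)

# The groupby/count approach gives the same numbers, ordered by source name.
grouped = toy.groupby("utm_source").user_id.count()
assert views_by_source.sort_index().tolist() == grouped.tolist()
```

The `groupby` form is still the better fit when the result needs to be a DataFrame with named columns, as in the later pivot steps.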

### Task 3
If the column `ad_click_timestamp` is not null, then someone actually clicked on the ad that was displayed.

Create a new column called `is_click`, which is True if `ad_click_timestamp` is not null and False otherwise.

In [6]:
# notnull() is True wherever a timestamp is present, i.e. the ad was clicked
ad_clicks['is_click'] = ad_clicks.ad_click_timestamp.notnull()
print(ad_clicks)

                                   user_id utm_source            day  \
0     008b7c6c-7272-471e-b90e-930d548bd8d7     google   6 - Saturday   
1     009abb94-5e14-4b6c-bb1c-4f4df7aa7557   facebook     7 - Sunday   
2     00f5d532-ed58-4570-b6d2-768df5f41aed    twitter    2 - Tuesday   
3     011adc64-0f44-4fd9-a0bb-f1506d2ad439     google    2 - Tuesday   
4     012137e6-7ae7-4649-af68-205b4702169c   facebook     7 - Sunday   
...                                    ...        ...            ...   
1649  fe8b5236-78f6-4192-9da6-a76bba67cfe6    twitter     7 - Sunday   
1650  fed3db6d-8c92-40e3-a4fb-1fb9d7337eb1   facebook     5 - Friday   
1651  ff3a22ff-521c-478c-87ca-7dc7b8f34372    twitter  3 - Wednesday   
1652  ff3af0d6-b092-4c4d-9f2e-2bdd8f7c0732     google     1 - Monday   
1653  ffdfe7ec-0c74-4623-8d90-d95d80f1ba34   facebook   6 - Saturday   

     ad_click_timestamp experimental_group  is_click  
0                  7:18                  A      True  
1                   NaN  

### Task 4

We want to know the percent of people who clicked on ads from each `utm_source`.

Start by grouping by `utm_source` and `is_click` and counting the number of `user_id` values in each of those groups. Save your answer to the variable `clicks_by_source`.

In [8]:
clicks_by_source = ad_clicks.groupby(['utm_source','is_click']).user_id.count().reset_index()
print(clicks_by_source)

  utm_source  is_click  user_id
0      email     False      175
1      email      True       80
2   facebook     False      324
3   facebook      True      180
4     google     False      441
5     google      True      239
6    twitter     False      149
7    twitter      True       66


### Task 5
Now let’s pivot the data so that the columns are `is_click` (either `True` or `False`), the index is `utm_source`, and the values are `user_id`.

Save your results to the variable `clicks_pivot`.

In [10]:
clicks_pivot = clicks_by_source.pivot(columns= 'is_click',
                                      index= 'utm_source',
                                      values= 'user_id')
print(clicks_pivot)

is_click    False  True 
utm_source              
email         175     80
facebook      324    180
google        441    239
twitter       149     66


### Task 6
Create a new column in `clicks_pivot` called `percent_clicked` which is equal to the percent of users who clicked on the ad from each `utm_source`.

Did click rates differ between sources?

In [12]:
clicks_pivot['percent_clicked'] = (100*clicks_pivot[True])/(clicks_pivot[False]+clicks_pivot[True])
print(clicks_pivot)

is_click    False  True  percent_clicked
utm_source                              
email         175    80        31.372549
facebook      324   180        35.714286
google        441   239        35.147059
twitter       149    66        30.697674

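Because `is_click` is boolean, its mean within a group is the fraction of `True` values, so a click rate can also be computed without counting and pivoting first. A minimal sketch on a hypothetical six-row frame (`toy` and its values are invented for illustration):

```python
import pandas as pd

# Hypothetical sample: 2 email views (1 click), 4 google views (1 click).
toy = pd.DataFrame({
    "utm_source": ["email", "email", "google", "google", "google", "google"],
    "is_click": [True, False, True, False, False, False],
})

# The mean of a boolean column is the share of True values in each group.
percent_clicked = toy.groupby("utm_source")["is_click"].mean() * 100
print(percent_clicked)
```

The count-then-pivot route used in the tasks has the advantage of keeping the raw counts visible next to the percentage.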

## Analyzing an A/B Test

### Task 7
The column `experimental_group` tells us whether the user was shown Ad A or Ad B.

Were approximately the same number of people shown both ads?

In [14]:
adA_adB = ad_clicks.groupby(['utm_source', 'experimental_group']).user_id.count().reset_index()
adA_adB_pivot = adA_adB.pivot(columns= 'experimental_group',
                              index= 'utm_source',
                              values= 'user_id')
print(adA_adB_pivot)
print(f'\nNumber of subjects for ad A: {adA_adB_pivot["A"].sum()}\nNumber of subjects for ad B: {adA_adB_pivot["B"].sum()}')

experimental_group    A    B
utm_source                  
email               121  134
facebook            254  250
google              349  331
twitter             103  112

Number of subjects for ad A: 827
Number of subjects for ad B: 827

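A cross-tabulation with totals is another way to get this table. The sketch below uses `pd.crosstab` with `margins=True`, which appends an `All` row and column. The six-row `toy` frame is hypothetical and only stands in for `ad_clicks`.

```python
import pandas as pd

# Hypothetical sample standing in for ad_clicks.
toy = pd.DataFrame({
    "utm_source": ["email", "email", "google", "google", "google", "twitter"],
    "experimental_group": ["A", "B", "A", "A", "B", "B"],
})

# Rows are sources, columns are groups; margins=True adds "All" totals.
table = pd.crosstab(toy["utm_source"], toy["experimental_group"], margins=True)
print(table)
```

Reading the `All` row gives the per-ad totals directly, without a separate `sum` step.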

### Task 8

Using the column `is_click` that we defined earlier, check to see if a greater percentage of users clicked on Ad A or Ad B.

In [16]:
is_click_adA_adB = ad_clicks.groupby(['experimental_group','is_click']).user_id.count().reset_index()
is_click_adA_adB_pivot = is_click_adA_adB.pivot(columns= 'is_click',
                                                index= 'experimental_group',
                                                values= 'user_id')
print(is_click_adA_adB_pivot)
# Ad A drew more clicks than Ad B (310 vs. 255, out of 827 users each)

is_click            False  True 
experimental_group              
A                     517    310
B                     572    255

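The pivot shows counts, while the task asks for percentages. The sketch below re-enters the four counts from the printed table by hand (rather than reloading `ad_clicks.csv`) and applies the same formula as Task 6. The column names `no_click` and `click` are placeholders chosen for this example.

```python
import pandas as pd

# Counts copied from the printed pivot: rows are Ad A and Ad B.
counts = pd.DataFrame(
    {"no_click": [517, 572], "click": [310, 255]},
    index=pd.Index(["A", "B"], name="experimental_group"),
)

# Percent clicked = clicks / (clicks + non-clicks) * 100
counts["percent_clicked"] = 100 * counts["click"] / (counts["no_click"] + counts["click"])
print(counts.round(1))
```

On these counts Ad A comes out at roughly 37.5% and Ad B at roughly 30.8%.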

### Task 9
The Product Manager for the A/B test thinks that the clicks might have changed by day of the week.

Start by creating two DataFrames: `a_clicks` and `b_clicks`, which contain only the results for `A` group and `B` group, respectively.

In [18]:
a_clicks = ad_clicks.loc[ad_clicks.experimental_group == 'A']
b_clicks = ad_clicks.loc[ad_clicks.experimental_group == 'B']

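Splitting into two frames is what the task asks for. As a hedged alternative, a single `groupby` over both `experimental_group` and `day` keeps the two ads in one table, with `unstack` moving the group level into columns. The `toy` frame below is hypothetical.

```python
import pandas as pd

# Hypothetical sample: two days, both ads.
toy = pd.DataFrame({
    "experimental_group": ["A", "A", "B", "B", "A", "B"],
    "day": ["1 - Monday", "1 - Monday", "1 - Monday", "1 - Monday",
            "2 - Tuesday", "2 - Tuesday"],
    "is_click": [True, False, False, False, True, True],
})

# Click rate per (day, group), then pivot the group level into columns.
rates = (
    toy.groupby(["day", "experimental_group"])["is_click"].mean() * 100
).unstack("experimental_group")
print(rates)
```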
### Task 10
For each group (`a_clicks` and `b_clicks`), calculate the percent of users who clicked on the ad by `day`.

In [20]:
#Number of clicks
a_clicks_day= a_clicks.groupby(['day','is_click']).user_id.count().reset_index()
b_clicks_day= b_clicks.groupby(['day','is_click']).user_id.count().reset_index()

a_clicks_day_pivot= a_clicks_day.pivot(columns= 'day',
                                       index= 'is_click',
                                       values= 'user_id')
b_clicks_day_pivot= b_clicks_day.pivot(columns= 'day',
                                       index= 'is_click',
                                       values= 'user_id')
print(f'A clicks\n{a_clicks_day_pivot}')
print(f'\nB clicks\n{b_clicks_day_pivot}')

A clicks
day       1 - Monday  2 - Tuesday  3 - Wednesday  4 - Thursday  5 - Friday  \
is_click                                                                     
False             70           76             86            69          77   
True              43           43             38            47          51   

day       6 - Saturday  7 - Sunday  
is_click                            
False               73          66  
True                45          43  

B clicks
day       1 - Monday  2 - Tuesday  3 - Wednesday  4 - Thursday  5 - Friday  \
is_click                                                                     
False             81           74             89            87          90   
True              32           45             35            29          38   

day       6 - Saturday  7 - Sunday  
is_click                            
False               76          75  
True                42          34  


In [22]:
a_clicks_day_pivot.loc['percent_of_clicks'] = (a_clicks_day_pivot.loc[True]*100)/(a_clicks_day_pivot.loc[False] + a_clicks_day_pivot.loc[True])
a_clicks_day_pivot.drop(index=[False,True], inplace=True)

b_clicks_day_pivot.loc['percent_of_clicks'] = (b_clicks_day_pivot.loc[True]*100)/(b_clicks_day_pivot.loc[False] + b_clicks_day_pivot.loc[True])
b_clicks_day_pivot.drop(index=[False,True], inplace=True)
print(f'A clicks\n{a_clicks_day_pivot}')
print(f'\nB clicks\n{b_clicks_day_pivot}')

A clicks
day                1 - Monday  2 - Tuesday  3 - Wednesday  4 - Thursday  \
is_click                                                                  
percent_of_clicks   38.053097    36.134454      30.645161     40.517241   

day                5 - Friday  6 - Saturday  7 - Sunday  
is_click                                                 
percent_of_clicks    39.84375     38.135593   39.449541  

B clicks
day                1 - Monday  2 - Tuesday  3 - Wednesday  4 - Thursday  \
is_click                                                                  
percent_of_clicks   28.318584    37.815126      28.225806          25.0   

day                5 - Friday  6 - Saturday  7 - Sunday  
is_click                                                 
percent_of_clicks     29.6875      35.59322   31.192661  

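The two printed tables are easier to compare side by side. The sketch below re-enters the daily click and non-click counts from the output above by hand and computes the A minus B gap for each day. The variable names are placeholders for this example.

```python
import pandas as pd

days = ["1 - Monday", "2 - Tuesday", "3 - Wednesday", "4 - Thursday",
        "5 - Friday", "6 - Saturday", "7 - Sunday"]

# Daily counts copied from the printed A and B pivots (is_click False / True).
a_no, a_yes = [70, 76, 86, 69, 77, 73, 66], [43, 43, 38, 47, 51, 45, 43]
b_no, b_yes = [81, 74, 89, 87, 90, 76, 75], [32, 45, 35, 29, 38, 42, 34]

comparison = pd.DataFrame({
    "A": [100 * y / (y + n) for y, n in zip(a_yes, a_no)],
    "B": [100 * y / (y + n) for y, n in zip(b_yes, b_no)],
}, index=days)
comparison["A_minus_B"] = comparison["A"] - comparison["B"]
print(comparison.round(1))

# Days on which Ad B had the higher click rate.
b_wins = comparison[comparison["A_minus_B"] < 0].index.tolist()
print(b_wins)
```

With these counts, Tuesday is the only day on which Ad B's click rate exceeds Ad A's.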

### Task 11
Compare the results for `A` and `B`. What happened over the course of the week?

Do you recommend that your company use Ad A or Ad B?

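_Click rates fluctuated through the week. Ad A ranged from about 31% (Wednesday) to about 41% (Thursday), while Ad B ranged from 25% (Thursday) to about 38% (Tuesday). Ad A had the higher click rate on every day except Tuesday, and the higher rate overall, so I'd recommend that the company use Ad A._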