# A/B Testing for ShoeFly.com

ShoeFly.com, an online shoe store, is running an A/B test. They have two different versions of an ad, which they have placed in emails and in banner ads on Facebook, Twitter, and Google. For this exercise, they wanted to know how the two ads are performing on each platform and on each day of the week.

## Useful information

- `user_id`: _unique number_ that identifies each user
- `utm_source`: the UTM (Urchin Tracking Module) source the user came from (Google, Facebook, Twitter/X, or email)
- `day`: The day the user clicked the ad
- `ad_click_timestamp`: the time the ad was clicked (empty if the user did not click)
- `experimental_group`: The two ads being tested - whether the user was shown ad A or ad B

### Step 1

I started by loading the data and examining the first rows.

In [23]:
import pandas as pd

ad_clicks = pd.read_csv('shoefly_data.csv')
styled_underline = ad_clicks.style.set_table_styles(
    [{'selector': ' thead th', 'props': [('border-bottom', '2px solid black')]}]
)
styled_underline

,user_id,utm_source,day,ad_click_timestamp,experimental_group
0,008b7c6c-7272-471e-b90e-930d548bd8d7,google,6 - Saturday,7:18,A
1,009abb94-5e14-4b6c-bb1c-4f4df7aa7557,facebook,7 - Sunday,,B
2,00f5d532-ed58-4570-b6d2-768df5f41aed,twitter,2 - Tuesday,,A
3,011adc64-0f44-4fd9-a0bb-f1506d2ad439,google,2 - Tuesday,,B
4,012137e6-7ae7-4649-af68-205b4702169c,facebook,7 - Sunday,,B
5,013b0072-7b72-40e7-b698-98b4d0c9967f,facebook,1 - Monday,,A
6,0153d85b-7660-4c39-92eb-1e1acd023280,google,4 - Thursday,,A
7,01555297-d6e6-49ae-aeba-1b196fdbb09f,google,3 - Wednesday,,A
8,018cea61-19ea-4119-895b-1a4309ccb148,email,1 - Monday,18:33,A
9,01a210c3-fde0-4e6f-8efd-4f0e38730ae6,email,2 - Tuesday,15:21,B


### Step 2

The manager wanted to know which ad platform was getting the most views, so I counted the views from each UTM source, which produces a Series.

In [24]:
counting_views = ad_clicks.groupby('utm_source').user_id.count()
counting_views

utm_source
email       255
facebook    504
google      680
twitter     215
Name: user_id, dtype: int64

_Google brought in the most views (680)._

### Step 3

Checking whether each ad click timestamp is non-null. A non-null timestamp means the user clicked on the ad that was displayed.
`True` means that a user clicked on the ad; `False` means they did not.

In [29]:
new_column_is_click = ad_clicks['is_click'] = ad_clicks['ad_click_timestamp'].notnull()
new_column_is_click

0        True
1       False
2       False
3       False
4       False
        ...  
1649    False
1650    False
1651    False
1652     True
1653    False
Name: ad_click_timestamp, Length: 1654, dtype: bool
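
To see why `notnull()` works as a click flag, here is a minimal sketch on a three-row toy frame. The values and the name `toy` are made up for illustration and are not taken from `shoefly_data.csv`. An empty CSV field is read by pandas as `NaN`, and `notnull()` returns `False` for exactly those rows.

```python
import pandas as pd

# Toy data (hypothetical): NaN stands in for an empty timestamp field in the CSV
toy = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3'],
    'ad_click_timestamp': ['7:18', float('nan'), '18:33'],
})

# True where a timestamp exists (the user clicked), False where it is missing
toy['is_click'] = toy['ad_click_timestamp'].notnull()
print(toy['is_click'].tolist())  # [True, False, True]
```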

### Step 4

Now I wanted to know the **number of people** who clicked on ads from each UTM source.

In [31]:
clicks_by_source = ad_clicks.groupby(['utm_source', 'is_click']).user_id.count().reset_index()
styled_underline = clicks_by_source.style.set_table_styles(
    [{'selector': ' thead th', 'props': [('border-bottom', '2px solid black')]}]
)
styled_underline

,utm_source,is_click,user_id
0,email,False,175
1,email,True,80
2,facebook,False,324
3,facebook,True,180
4,google,False,441
5,google,True,239
6,twitter,False,149
7,twitter,True,66


_Google had the greatest number of people who clicked on an ad, with 239 clicks. Twitter had the fewest, with 66 clicks._

### Step 5

Pivoting the table for better readability.

In [32]:
clicks_pivot = clicks_by_source.pivot(
    columns = 'is_click',
    index='utm_source',
    values='user_id'
)
styled_underline = clicks_pivot.style.set_table_styles(
    [{'selector': ' thead th', 'props': [('border-bottom', '2px solid black')]}]
)
styled_underline

utm_source,False,True
email,175,80
facebook,324,180
google,441,239
twitter,149,66
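
As a possible shortcut, `pd.crosstab` can build the same table straight from the raw columns, without the separate `groupby` and `pivot` steps. The sketch below is an alternative and not the method used in this notebook. Since the CSV is not reproduced here, it rebuilds an equivalent frame (the hypothetical `rebuilt`) from the counts in Step 4.

```python
import pandas as pd

# Counts from the Step 4 table: (utm_source, is_click) -> number of users
counts = {
    ('email', False): 175, ('email', True): 80,
    ('facebook', False): 324, ('facebook', True): 180,
    ('google', False): 441, ('google', True): 239,
    ('twitter', False): 149, ('twitter', True): 66,
}

# Expand the counts back into one row per user (stand-in for ad_clicks)
rows = [
    {'utm_source': source, 'is_click': clicked}
    for (source, clicked), n in counts.items()
    for _ in range(n)
]
rebuilt = pd.DataFrame(rows)

# One row per source, one column per is_click value
clicks_crosstab = pd.crosstab(rebuilt['utm_source'], rebuilt['is_click'])
print(clicks_crosstab)
```

On the real data, the equivalent call would be `pd.crosstab(ad_clicks['utm_source'], ad_clicks['is_click'])`.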


### Step 6

Creating a column to calculate the **percentage of users** who clicked on the ad from each UTM source.

In [37]:
clicks_pivot['percentage_clicked'] = clicks_pivot[True] / (clicks_pivot[True] + clicks_pivot[False]) * 100
# round to 2 decimal places
clicks_pivot['percentage_clicked'] = clicks_pivot['percentage_clicked'].round(2)
clicks_pivot

utm_source,False,True,percentage_clicked
email,175,80,31.37
facebook,324,180,35.71
google,441,239,35.15
twitter,149,66,30.7
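
Because `is_click` is boolean, its mean is the fraction of `True` values, so the same percentages can also be computed without pivoting. This is a minimal sketch of that alternative, not the approach taken above. It assumes a frame with `utm_source` and `is_click` columns, rebuilt here from the Step 5 counts under the hypothetical name `rebuilt`.

```python
import pandas as pd

# (clicked, not clicked) per source, copied from the Step 5 table
counts = {
    'email': (80, 175),
    'facebook': (180, 324),
    'google': (239, 441),
    'twitter': (66, 149),
}

# Expand the counts into one row per user (stand-in for ad_clicks)
rows = []
for source, (clicked, not_clicked) in counts.items():
    rows += [(source, True)] * clicked + [(source, False)] * not_clicked
rebuilt = pd.DataFrame(rows, columns=['utm_source', 'is_click'])

# Mean of a boolean column = share of True values; scale to a percentage
percentage_clicked = (rebuilt.groupby('utm_source')['is_click'].mean() * 100).round(2)
print(percentage_clicked)
```

The result matches the `percentage_clicked` column above: 31.37 for email, 35.71 for Facebook, 35.15 for Google, and 30.7 for Twitter.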


### Step 7: Analyzing an A/B Test

The column `experimental_group` tells us whether the user was shown Ad A or Ad B.

Checking the number of people who were shown each ad.

In [8]:
check_ads = ad_clicks.groupby('experimental_group').user_id.count()
check_ads

experimental_group
A    827
B    827
Name: user_id, dtype: int64

_Each ad was shown to the same number of users (827)._

### Step 8

Checking whether more users clicked on ad A or ad B.

In [9]:
check_ads_percentage = ad_clicks.groupby(['experimental_group', 'is_click']).user_id.count().reset_index()
#print(check_ads_percentage)

#Pivoting
pivoted_check_ads_percentage = check_ads_percentage.pivot(columns='experimental_group',
                                                          index='is_click',
                                                          values='user_id')
print(pivoted_check_ads_percentage)

experimental_group    A    B
is_click                    
False               517  572
True                310  255


_At this point, we can see that 310 of the 827 users who were shown ad A clicked on it, while 255 of the 827 users who were shown ad B clicked on it. Because both groups are the same size, the counts are directly comparable: more users clicked on ad A._
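
The counts can also be expressed as click rates, which stay comparable even when group sizes differ. A minimal sketch using only the numbers from the pivoted table above; the dictionary names are illustrative.

```python
# Counts copied from the pivoted table (is_click by experimental_group)
clicked = {'A': 310, 'B': 255}
not_clicked = {'A': 517, 'B': 572}

rates = {}
for group in clicked:
    shown = clicked[group] + not_clicked[group]   # users shown this ad
    rates[group] = clicked[group] / shown * 100   # click rate in percent
    print(f"Ad {group}: {clicked[group]} of {shown} clicked ({rates[group]:.2f}%)")

# Ad A: 310 of 827 clicked (37.48%)
# Ad B: 255 of 827 clicked (30.83%)
```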

### Step 9

Checking whether the clicks for each `experimental_group` vary by day of the week.

In [10]:
a_clicks = ad_clicks[ad_clicks.experimental_group == 'A']
b_clicks = ad_clicks[ad_clicks.experimental_group == 'B']
#print(a_clicks)

group_A = a_clicks.groupby(['is_click', 'day']).user_id.count().reset_index()
#print(group_A)

group_B = b_clicks.groupby(['is_click', 'day']).user_id.count().reset_index()
#print(group_B)

#Pivoting both table groups
pivoted_A = group_A.pivot(columns='day',
index='is_click',
values='user_id')

pivoted_B = group_B.pivot(columns='day',
index='is_click',
values='user_id')

print(pivoted_A)
print('\n')
print(pivoted_B)

day       1 - Monday  2 - Tuesday  3 - Wednesday  4 - Thursday  5 - Friday  \
is_click                                                                     
False             70           76             86            69          77   
True              43           43             38            47          51   

day       6 - Saturday  7 - Sunday  
is_click                            
False               73          66  
True                45          43  


day       1 - Monday  2 - Tuesday  3 - Wednesday  4 - Thursday  5 - Friday  \
is_click                                                                     
False             81           74             89            87          90   
True              32           45             35            29          38   

day       6 - Saturday  7 - Sunday  
is_click                            
False               76          75  
True                42          34  


### Step 10

Calculating the percentage of people, by group, who clicked on the ad by day.

In [11]:
pivoted_A.loc['percentage_clicked_A_by_day'] = pivoted_A.loc[True] / (pivoted_A.loc[True] + pivoted_A.loc[False]) * 100

print(pivoted_A.loc['percentage_clicked_A_by_day'].round(2))

day
1 - Monday       38.05
2 - Tuesday      36.13
3 - Wednesday    30.65
4 - Thursday     40.52
5 - Friday       39.84
6 - Saturday     38.14
7 - Sunday       39.45
Name: percentage_clicked_A_by_day, dtype: float64


_Thursday was the day with the highest percentage of group A users clicking on the ad (40.52%)._

In [12]:
pivoted_B.loc['percentage_clicked_B_by_day'] = pivoted_B.loc[True] / (pivoted_B.loc[True] + pivoted_B.loc[False]) * 100

print(pivoted_B.loc['percentage_clicked_B_by_day'].round(2))

day
1 - Monday       28.32
2 - Tuesday      37.82
3 - Wednesday    28.23
4 - Thursday     25.00
5 - Friday       29.69
6 - Saturday     35.59
7 - Sunday       31.19
Name: percentage_clicked_B_by_day, dtype: float64


_Tuesday was the day with the highest percentage of group B users clicking on the ad (37.82%)._
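
To compare the two ads day by day, the percentages can be placed side by side. The following is a sketch under the assumption that the Step 9 counts are typed in by hand; the frame and column names (`counts`, `comparison`, `A_minus_B`) are illustrative and not part of the original notebook.

```python
import pandas as pd

days = ['1 - Monday', '2 - Tuesday', '3 - Wednesday', '4 - Thursday',
        '5 - Friday', '6 - Saturday', '7 - Sunday']

# Click / no-click counts per day, copied from the Step 9 tables
counts = pd.DataFrame({
    'A_clicked':     [43, 43, 38, 47, 51, 45, 43],
    'A_not_clicked': [70, 76, 86, 69, 77, 73, 66],
    'B_clicked':     [32, 45, 35, 29, 38, 42, 34],
    'B_not_clicked': [81, 74, 89, 87, 90, 76, 75],
}, index=days)

# Percentage of users who clicked, per group and day
comparison = pd.DataFrame({
    'A': counts['A_clicked'] / (counts['A_clicked'] + counts['A_not_clicked']) * 100,
    'B': counts['B_clicked'] / (counts['B_clicked'] + counts['B_not_clicked']) * 100,
}).round(2)

# Positive values mean ad A had the higher click percentage that day
comparison['A_minus_B'] = (comparison['A'] - comparison['B']).round(2)
print(comparison)
```

With these counts, ad A has the higher click percentage on every day except Tuesday.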