### A/B Testing for ShoeFly.com

The data for this project was provided by Codecademy.

In this project we analyze an A/B test of two versions of ShoeFly.com's ad, which the company has placed in emails and in banner ads on Facebook, Twitter, and Google. The company wants to know how the two ads perform on each platform and on each day of the week.

In [2]:
# Importing our modules and loading in data
import numpy as np
import pandas as pd

data = pd.read_csv('ShoeFly AB Data.csv')

In [3]:
# Examining the data
data.head()

Unnamed: 0,user_id,utm_source,day,ad_click_timestamp,experimental_group
0,008b7c6c-7272-471e-b90e-930d548bd8d7,google,6 - Saturday,7:18,A
1,009abb94-5e14-4b6c-bb1c-4f4df7aa7557,facebook,7 - Sunday,,B
2,00f5d532-ed58-4570-b6d2-768df5f41aed,twitter,2 - Tuesday,,A
3,011adc64-0f44-4fd9-a0bb-f1506d2ad439,google,2 - Tuesday,,B
4,012137e6-7ae7-4649-af68-205b4702169c,facebook,7 - Sunday,,B


In [4]:
# Which platform is getting the most views?
data.groupby('utm_source').user_id.count().reset_index()

Unnamed: 0,utm_source,user_id
0,email,255
1,facebook,504
2,google,680
3,twitter,215


In [5]:
# Flagging users who clicked: a non-null ad_click_timestamp means the ad was clicked
data['is_click'] = ~data.ad_click_timestamp.isnull()

# Finding out the percentage of people who clicked on ads for each utm_source
# Starting by grouping utm_source and is_click
clicks_by_source = data.groupby(['utm_source', 'is_click']).user_id.count().reset_index()


# Pivoting the data
data_pivot = clicks_by_source.pivot(
    index = 'utm_source',
    columns = 'is_click',
    values = 'user_id'
).reset_index()

# New percent_clicked column: the fraction of users who clicked (between 0 and 1)
data_pivot['percent_clicked'] = data_pivot[True] / (data_pivot[True] + data_pivot[False])
data_pivot.head()

is_click,utm_source,False,True,percent_clicked
0,email,175,80,0.313725
1,facebook,324,180,0.357143
2,google,441,239,0.351471
3,twitter,149,66,0.306977
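
The click rate per source can also be computed without the pivot step, because the mean of a boolean column is the fraction of `True` values. The sketch below is only an illustration: it rebuilds a stand-in frame (`demo`, a hypothetical name) from the counts in the table above rather than reading the original CSV, and then takes the group-wise mean of `is_click`.

```python
import pandas as pd

# (no_click, click) counts per source, copied from the pivot output above
counts = {
    'email': (175, 80),
    'facebook': (324, 180),
    'google': (441, 239),
    'twitter': (149, 66),
}

# Rebuild one row per user so the frame has the same shape as the real data
demo = pd.DataFrame(
    [(source, clicked)
     for source, (no_click, click) in counts.items()
     for clicked, n in ((False, no_click), (True, click))
     for _ in range(n)],
    columns=['utm_source', 'is_click'],
)

# The mean of a boolean column is the share of True values
click_rate = demo.groupby('utm_source')['is_click'].mean()
print(click_rate.round(6))
```

On the real frame, the equivalent call would be `data.groupby('utm_source').is_click.mean()`.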


In [6]:
# Analysing the A/B Test
# How many people in groups A and B
data.groupby('experimental_group').user_id.count().reset_index()

Unnamed: 0,experimental_group,user_id
0,A,827
1,B,827


In [7]:
# Clicks per experimental group
clicks_by_group = data.groupby(['experimental_group', 'is_click']).user_id.count().reset_index()

# Pivoting the data
group_pivot = clicks_by_group.pivot(
    index = 'experimental_group',
    columns = 'is_click',
    values = 'user_id'
).reset_index()

# New percent_clicked column: the fraction of users who clicked (between 0 and 1)
group_pivot['percent_clicked'] = group_pivot[True] / (group_pivot[True] + group_pivot[False])
group_pivot.head()

# Group A had the higher click rate: about 37.5% versus 30.8% for group B.

is_click,experimental_group,False,True,percent_clicked
0,A,517,310,0.374849
1,B,572,255,0.308343
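
A difference in click rates on its own does not show whether the gap could be due to chance. One common check, which the original analysis does not include, is a chi-square test of independence on the 2x2 table of group versus click. The sketch below is a minimal pure-Python version under stated assumptions: `chi_square_2x2` is a hypothetical helper, it uses no continuity correction, and it relies on the fact that with one degree of freedom the p-value equals `erfc(sqrt(chi2 / 2))`.

```python
import math

def chi_square_2x2(a_click, a_no_click, b_click, b_no_click):
    """Pearson chi-square test for a 2x2 table (no continuity correction)."""
    observed = [[a_click, a_no_click], [b_click, b_no_click]]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if group and click were independent
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Counts copied from the group pivot above: A clicked 310 of 827, B clicked 255 of 827
chi2, p_value = chi_square_2x2(310, 517, 255, 572)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

A small p-value suggests the gap between the two ads is unlikely to come from chance alone. Library routines that apply a continuity correction by default will report a slightly different statistic for the same table.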


In [18]:
# Looking at the clicks on particular days

# DataFrames for groups A and B (each holds every user in the group, whether or not they clicked)
a_clicks = data[data['experimental_group'] == 'A']
b_clicks = data[data['experimental_group'] == 'B']

a_grouped = a_clicks.groupby(['is_click', 'day']).user_id.count().reset_index()
b_grouped = b_clicks.groupby(['is_click', 'day']).user_id.count().reset_index()

# Pivoting the data
a_pivot = a_grouped.pivot(
    index = 'day',
    columns = 'is_click',
    values = 'user_id'
).reset_index()

b_pivot = b_grouped.pivot(
    index = 'day',
    columns = 'is_click',
    values = 'user_id'
).reset_index()

# New percent_clicked column: the fraction of users who clicked (between 0 and 1)
a_pivot['percent_clicked'] = a_pivot[True] / (a_pivot[True] + a_pivot[False])
b_pivot['percent_clicked'] = b_pivot[True] / (b_pivot[True] + b_pivot[False])

print('Experimental Group A:')
print(a_pivot)
print('\n')
print('Experimental Group B:')
print(b_pivot)

# Ad A has the higher click rate on every day except Tuesday, where Ad B leads slightly (37.8% vs 36.1%).


Experimental Group A:
is_click            day  False  True  percent_clicked
0            1 - Monday     70    43         0.380531
1           2 - Tuesday     76    43         0.361345
2         3 - Wednesday     86    38         0.306452
3          4 - Thursday     69    47         0.405172
4            5 - Friday     77    51         0.398438
5          6 - Saturday     73    45         0.381356
6            7 - Sunday     66    43         0.394495


Experimental Group B:
is_click            day  False  True  percent_clicked
0            1 - Monday     81    32         0.283186
1           2 - Tuesday     74    45         0.378151
2         3 - Wednesday     89    35         0.282258
3          4 - Thursday     87    29         0.250000
4            5 - Friday     90    38         0.296875
5          6 - Saturday     76    42         0.355932
6            7 - Sunday     75    34         0.311927
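
Reading two separate printouts makes the day-by-day comparison harder than it needs to be. The sketch below puts both groups' click rates side by side and lists the days on which Ad B comes out ahead. It is an illustration only: the counts are copied from the two tables above, and `by_day`, `A_minus_B`, and `b_leads` are hypothetical names.

```python
import pandas as pd

# (no_click, click) counts per day, copied from the two tables above
a_counts = {
    '1 - Monday': (70, 43), '2 - Tuesday': (76, 43), '3 - Wednesday': (86, 38),
    '4 - Thursday': (69, 47), '5 - Friday': (77, 51), '6 - Saturday': (73, 45),
    '7 - Sunday': (66, 43),
}
b_counts = {
    '1 - Monday': (81, 32), '2 - Tuesday': (74, 45), '3 - Wednesday': (89, 35),
    '4 - Thursday': (87, 29), '5 - Friday': (90, 38), '6 - Saturday': (76, 42),
    '7 - Sunday': (75, 34),
}

def click_rates(counts):
    """Convert (no_click, click) counts into click rates per day."""
    return {day: click / (no_click + click)
            for day, (no_click, click) in counts.items()}

# One row per day, one column per experimental group
by_day = pd.DataFrame({'A': click_rates(a_counts), 'B': click_rates(b_counts)})
by_day['A_minus_B'] = by_day['A'] - by_day['B']

# Days on which Ad B has the higher click rate
b_leads = by_day.index[by_day['A_minus_B'] < 0].tolist()

print(by_day.round(3))
print('Days where B leads:', b_leads)
```

On the real frame, `data.groupby(['day', 'experimental_group']).is_click.mean().unstack()` would produce the same side-by-side layout in one step.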
