## Analyze A/B Test Results

<a id='intro'></a>
### Introduction

A/B tests are very commonly performed by data analysts and data scientists.

For this project, our goal is to work through this notebook to help the company decide whether to implement the new page, keep the old page, or run the experiment longer before making a decision.

### Importing needed libraries

In [1]:
import pandas as pd
import numpy as np
import random
import matplotlib.pyplot as plt
%matplotlib inline
random.seed(42)

### Wrangling the data

In [2]:
# reading the data and storing it in a dataframe
df = pd.read_csv('Raw_Data/ab_data.csv')
df['timestamp'] = pd.to_datetime(df['timestamp'])
df.head()

Unnamed: 0,user_id,timestamp,group,landing_page,converted
0,851104,2017-01-21 22:11:48.556739,control,old_page,0
1,804228,2017-01-12 08:01:45.159739,control,old_page,0
2,661590,2017-01-11 16:55:06.154213,treatment,new_page,0
3,853541,2017-01-08 18:28:03.143765,treatment,new_page,0
4,864975,2017-01-21 01:52:26.210827,control,old_page,1


In [3]:
# number of rows in the dataset
df.shape[0]

294478

In [4]:
# the number of unique users in the dataset
len(df.user_id.unique())

290584

In [5]:
# the proportion of all users that converted, regardless of page = converted / total
df[df['converted'] == 1]['converted'].count() / df['converted'].count()

0.11965919355605512

In [6]:
# the number of rows where group and landing_page don't line up (treatment with old_page, or control with new_page)
df[((df['group'] == 'treatment') == (df['landing_page'] == 'new_page')) == False]['user_id'].count()

3893

In [7]:
# checking for missing values
df.isnull().sum()

user_id         0
timestamp       0
group           0
landing_page    0
converted       0
dtype: int64

### Note
For the rows where **treatment** is not aligned with **new_page** or **control** is not aligned with **old_page**, we cannot be sure whether the user truly received the new or the old page, so these rows are removed before the analysis.
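A cross-tabulation can make the misalignment described above easier to see. The sketch below is only an illustration on a small made-up frame (`toy` is hypothetical data, not rows from `ab_data.csv`); it assumes the same `group` and `landing_page` column names used in this notebook.

```python
import pandas as pd

# Hypothetical toy data for illustration only, not taken from ab_data.csv
toy = pd.DataFrame({
    "group": ["control", "control", "treatment", "treatment", "treatment"],
    "landing_page": ["old_page", "new_page", "new_page", "old_page", "new_page"],
})

# A row is aligned when treatment pairs with new_page and control with old_page
aligned = (toy["group"] == "treatment") == (toy["landing_page"] == "new_page")

# Count every group/page combination; off-diagonal cells are the mismatches
counts = pd.crosstab(toy["group"], toy["landing_page"])
print(counts)
print("mismatched rows:", (~aligned).sum())
```

Applied to the real `df`, the two off-diagonal cells of the table would be expected to add up to the mismatch count reported above.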

In [8]:
# extracting the rows where the group doesn't line up with the landing page
df1 = df[((df['group'] == 'treatment') == (df['landing_page'] == 'new_page')) == False]
df1.head()

Unnamed: 0,user_id,timestamp,group,landing_page,converted
22,767017,2017-01-12 22:58:14.991443,control,new_page,0
240,733976,2017-01-11 15:11:16.407599,control,new_page,0
308,857184,2017-01-20 07:34:59.832626,treatment,old_page,0
327,686623,2017-01-09 14:26:40.734775,treatment,old_page,0
357,856078,2017-01-12 12:29:30.354835,treatment,old_page,0


In [9]:
# removing the rows where the group doesn't align with the landing page
df2 = df.drop(index=df1.index)
df2

Unnamed: 0,user_id,timestamp,group,landing_page,converted
0,851104,2017-01-21 22:11:48.556739,control,old_page,0
1,804228,2017-01-12 08:01:45.159739,control,old_page,0
2,661590,2017-01-11 16:55:06.154213,treatment,new_page,0
3,853541,2017-01-08 18:28:03.143765,treatment,new_page,0
4,864975,2017-01-21 01:52:26.210827,control,old_page,1
...,...,...,...,...,...
294473,751197,2017-01-03 22:28:38.630509,control,old_page,0
294474,945152,2017-01-12 00:51:57.078372,control,old_page,0
294475,734608,2017-01-22 11:45:03.439544,control,old_page,0
294476,697314,2017-01-15 01:20:28.957438,control,old_page,0


In [10]:
# Double Check all of the correct rows were removed - this should be 0
df2[((df2['group'] == 'treatment') == (df2['landing_page'] == 'new_page')) == False].shape[0]

0

In [11]:
# number of unique user_ids in df2
len(df2.user_id.unique())

290584

In [12]:
# checking for duplicates
df2[df2['user_id'].duplicated(keep=False)]

Unnamed: 0,user_id,timestamp,group,landing_page,converted
1899,773192,2017-01-09 05:37:58.781806,treatment,new_page,0
2893,773192,2017-01-14 02:55:59.590927,treatment,new_page,0


In [13]:
# removing one of the duplicates
df2.drop_duplicates('user_id', inplace=True)

<a id='probability'></a>
### Part I - Probability

In [14]:
# the probability of an individual converting regardless of the page they receive after cleaning the dataset
p_converted = df2[df2['converted'] == 1]['converted'].count() / df2['converted'].count()
p_converted

0.11959708724499628

In [15]:
# the probability of an individual converting given that they were in the control group
p_control_converted = df2[(df2['converted'] == 1) & (df2['group'] == 'control')]['converted'].count() / df2[df2['group'] == 'control']['group'].count()
p_control_converted

0.1203863045004612

In [16]:
# the probability of an individual converting given that they were in the treatment group
p_treat_converted = df2[(df2['converted'] == 1) & (df2['group'] == 'treatment')]['converted'].count() / df2[df2['group'] == 'treatment']['group'].count()
p_treat_converted

0.11880806551510564
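Because `converted` is a 0/1 column, its mean equals the conversion rate, so the overall and per-group probabilities above could also be obtained with `mean()` and `groupby`. This is a minimal sketch on made-up data (`toy` is hypothetical), offered as an alternative way to write the same calculation rather than as the notebook's own method.

```python
import pandas as pd

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "group": ["control"] * 4 + ["treatment"] * 4,
    "converted": [1, 0, 0, 0, 1, 1, 0, 0],
})

# Mean of a 0/1 column = share of rows equal to 1
overall = toy["converted"].mean()

# Conversion rate conditional on group, computed in one pass
rates = toy.groupby("group")["converted"].mean()
print(overall)
print(rates)
```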

In [17]:
# the probability that an individual received the new page
p_newpage = df2[df2['landing_page'] == 'new_page']['landing_page'].count() / df2['landing_page'].count()
p_newpage

0.5000619442226688

In [18]:
# calculating the actual difference observed in the dataset
obs_diff = p_treat_converted - p_control_converted

**Probability Conclusion**

- The probability that an individual received the new page is about 0.5, so the two groups are almost the same size.
- `p_control_converted` = 0.1204 and `p_treat_converted` = 0.1188.
- The probability that an individual in the control group converted is almost equal to the probability that an individual in the treatment group converted.
- That is not sufficient evidence to say that the new treatment page leads to more conversions.
- If anything, the treatment group converted slightly less often (a difference of about -0.0016).
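One way to put a number on "almost equal" is a pooled two-proportion z-test. The sketch below is an assumption-laden illustration, not a step taken in this part of the notebook: it reuses the probabilities printed above, and it recovers the group sizes by rounding `p_newpage` times the 290,584 remaining users, since the raw counts are not printed here.

```python
import math

# Values printed earlier in this notebook
n_users = 290584                      # rows left after cleaning
p_newpage = 0.5000619442226688
p_control = 0.1203863045004612
p_treat = 0.11880806551510564
p_pooled = 0.11959708724499628        # overall conversion rate

# Assumption: group sizes recovered by rounding, not read from the data
n_treat = round(p_newpage * n_users)
n_control = n_users - n_treat

# Standard error of the difference under the null of equal rates
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_treat + 1 / n_control))
z = (p_treat - p_control) / se

# Upper-tail p-value for the alternative "new page converts better"
p_value = 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))
print(f"z = {z:.2f}, one-sided p-value = {p_value:.2f}")
```

A negative `z` with a large one-sided p-value would be consistent with the conclusion above that the data do not show the new page converting better.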