## Analyze AB Test Result
- For this project, I've analyzed a test result ran by an e-commerce website. The company has developed a new web page in order to try and increase the number of users who "convert," meaning the number of users who decide to pay for the company's product. Your goal is to help the company understand if they should implement this new page, keep the old page, or perhaps run the experiment longer to make their decision.

In [1]:
# import tools
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random
random.seed(42)

In [2]:
# import data
df = pd.read_csv('ab_data.csv')
df.head()

Unnamed: 0,user_id,timestamp,group,landing_page,converted
0,851104,2017-01-21 22:11:48.556739,control,old_page,0
1,804228,2017-01-12 08:01:45.159739,control,old_page,0
2,661590,2017-01-11 16:55:06.154213,treatment,new_page,0
3,853541,2017-01-08 18:28:03.143765,treatment,new_page,0
4,864975,2017-01-21 01:52:26.210827,control,old_page,1


### Assess

- Number of rows

In [4]:
df.shape[0]

294478

- Number of unique users in the dataset

In [5]:
df.user_id.nunique()

290584

- The proportion of users converted

In [6]:
df.converted.mean()

0.11965919355605512

e. The number of times the `new_page` and `treatment` don't line up.

In [7]:
df_treat_nomatch = df[(df.group == 'treatment') & (df.landing_page != 'new_page')]
df_contr_nomatch = df[(df.group == 'control') & (df.landing_page != 'old_page')]

# create dataframe and index incase of dropping later
df_nomatch = pd.concat([df_treat_nomatch, df_contr_nomatch])
df_nomatch_index = df_nomatch.index

# grab lengh of the no match list
len(df_nomatch)

3893

- Do any of the rows have missing values?

In [8]:
df.isnull().any().sum()

0

`2.` For the rows where **treatment** is not aligned with **new_page** or **control** is not aligned with **old_page**, we cannot be sure if this row truly received the new or old page.  Use **Quiz 2** in the classroom to provide how we should handle these rows.  

a. Now use the answer to the quiz to create a new dataset that meets the specifications from the quiz.  Store your new dataframe in **df2**.

In [9]:
# drop no match values list for inaccuracy and create new dataframe
df2 = df.drop(df_nomatch_index)

In [10]:
# Double Check all of the correct rows were removed - this should be 0
df2[((df2['group'] == 'treatment') == (df2['landing_page'] == 'new_page')) == False].shape[0]

0

How many unique **user_id**s are in **df2**?

In [11]:
df2.user_id.nunique()

290584

There is one **user_id** repeated in **df2**.

In [12]:
df2[df2.user_id.duplicated()]

Unnamed: 0,user_id,timestamp,group,landing_page,converted
2893,773192,2017-01-14 02:55:59.590927,treatment,new_page,0


Remove **one** of the rows with a duplicate **user_id**, but keep your dataframe as **df2**.

In [13]:
# drop duplicates
df2.drop_duplicates(inplace=True)

What is the probability of an individual converting regardless of the page they receive?

In [14]:
df2.converted.mean()

0.11959667567149027

Given that an individual was in the `control` group, what is the probability they converted?

In [15]:
df2.query('group == "control"')['converted'].mean()

0.1203863045004612

Given that an individual was in the `treatment` group, what is the probability they converted?

In [16]:
df2.query('group == "treatment"')['converted'].mean()

0.11880724790277405

What is the probability that an individual received the new page?

In [17]:
df.query('landing_page == "new_page"').shape[0] / df2.shape[0]

0.5066985563604454

Is there evidence that one page leads to more convertions?
__Yes, data shows that receiving old page leads to more convertions.__