# Which version of the website should you use?

## 📖 Background
You work for an early-stage startup in Germany. Your team has been working on a redesign of the landing page. The team believes a new design will increase the number of people who click through and join your site. 

They have been testing the changes for a few weeks. Now they want to measure the impact of the change, and they need you to determine whether the increase could be due to random chance or is statistically significant.

## 💾 The data
The team assembled the following file:

#### Redesign test data
- "treatment" - "yes" if the user saw the new version of the landing page, "no" otherwise.
- "new_images" - "yes" if the page used a new set of images, "no" otherwise.
- "converted" - 1 if the user joined the site, 0 otherwise.

The control group is those users with "no" in both columns: the old version with the old set of images.
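To make the four groups explicit, one option is to attach a readable label to each row. The sketch below runs on a few hypothetical toy rows (not the real file); the `labels` mapping and the `group` column name are assumptions introduced for illustration.

```python
import pandas as pd

# Hypothetical toy rows in the same shape as redesign.csv (not real data)
toy = pd.DataFrame({
    "treatment":  ["no", "yes", "no", "yes"],
    "new_images": ["no", "no", "yes", "yes"],
    "converted":  [0, 1, 0, 1],
})

# Map each (treatment, new_images) combination to a readable group name;
# "no"/"no" is the control group described above
labels = {
    ("no", "no"): "control (old design, old images)",
    ("yes", "no"): "new design, old images",
    ("no", "yes"): "old design, new images",
    ("yes", "yes"): "new design, new images",
}
toy["group"] = [labels[(t, n)] for t, n in zip(toy["treatment"], toy["new_images"])]
print(toy[["treatment", "new_images", "group"]])
```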

```python
import numpy as np
import scipy.stats
import pandas as pd
```

```python
df = pd.read_csv('./data/redesign.csv')
df.head()
```

| | treatment | new_images | converted |
|---|---|---|---|
| 0 | yes | yes | 0 |
| 1 | yes | yes | 0 |
| 2 | yes | yes | 0 |
| 3 | yes | no | 0 |
| 4 | no | yes | 0 |


1. Analyze the conversion rates for each of the four groups, i.e. each combination of the new/old landing-page design and the new/old pictures.

```python
# The mean of the 0/1 "converted" column is each group's conversion rate
convrate = df.groupby(['new_images', 'treatment'])[['converted']].mean()
convrate
```

| new_images | treatment | converted |
|---|---|---|
| no | no | 0.107104 |
| no | yes | 0.120047 |
| yes | no | 0.112538 |
| yes | yes | 0.113724 |

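To read this table as effect sizes, each rate can be compared with the control. This sketch hard-codes the rounded rates from the output above; the group names are descriptive labels added here, not columns in the data.

```python
# Conversion rates copied from the groupby output above (rounded to 6 decimals)
rates = {
    "control (old design, old images)": 0.107104,
    "new design, old images": 0.120047,
    "old design, new images": 0.112538,
    "new design, new images": 0.113724,
}
control_rate = rates["control (old design, old images)"]

for group, rate in rates.items():
    abs_lift = rate - control_rate          # difference in conversion rate
    rel_lift = rate / control_rate - 1      # relative change vs. control
    print(f"{group:34s} {rate:.4f}  {abs_lift:+.4f}  {rel_lift:+.1%}")
```

The new design with old images shows the largest lift, about 1.3 percentage points or roughly 12% relative to the control.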

2. Can the increases observed be explained by randomness? (Hint: Think A/B test)
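The three hypothesis tests below all use the same statistic. Written out, with $\hat p_c, n_c$ for the control group and $\hat p_v, n_v$ for the variant being tested (notation introduced here, not in the original code), the unpooled standard error, test statistic and one-sided p-value are:

$$
\mathrm{SE} = \sqrt{\frac{\hat p_c(1-\hat p_c)}{n_c} + \frac{\hat p_v(1-\hat p_v)}{n_v}}, \qquad
t = \frac{\hat p_v - \hat p_c}{\mathrm{SE}}, \qquad
p = P(T_\nu > t)
$$

where $T_\nu$ follows a t distribution with $\nu = n_c$ degrees of freedom, as in the code. For the new design with old pictures:

$$
\mathrm{SE} = \sqrt{\frac{0.107104 \cdot 0.892896 + 0.120047 \cdot 0.879953}{10121}} \approx 0.00446, \qquad
t \approx \frac{0.012943}{0.00446} \approx 2.90
$$

which corresponds to a one-sided p-value of about 0.0019.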

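```python
# Look up each group's rate by its (new_images, treatment) label
control = convrate.loc[('no', 'no'), 'converted']           # old design, old images
oldpic_newsite = convrate.loc[('no', 'yes'), 'converted']   # new design, old images
newpic_oldsite = convrate.loc[('yes', 'no'), 'converted']   # old design, new images
newpic_newsite = convrate.loc[('yes', 'yes'), 'converted']  # new design, new images
```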

```python
# Number of users in each (new_images, treatment) group
pd.crosstab(index=df['new_images'], columns=df['treatment'])
```

| new_images / treatment | no | yes |
|---|---|---|
| no | 10121 | 10121 |
| yes | 10121 | 10121 |

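The tests below hard-code the group size 10121 taken from this crosstab. A sketch of deriving the sizes from the data instead, shown on hypothetical toy rows; `group_sizes` is a helper name introduced here. On the real `df` it should report 10121 for every cell, matching the crosstab.

```python
import pandas as pd

def group_sizes(df: pd.DataFrame) -> pd.Series:
    """Number of users in each (new_images, treatment) cell."""
    return df.groupby(["new_images", "treatment"]).size()

# Hypothetical toy data standing in for redesign.csv
toy = pd.DataFrame({
    "treatment":  ["no", "no", "yes", "yes", "no", "yes"],
    "new_images": ["no", "no", "no", "yes", "yes", "yes"],
    "converted":  [0, 1, 1, 0, 0, 1],
})
sizes = group_sizes(toy)
n_control = sizes.loc[("no", "no")]   # old design, old images
print(sizes)
```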

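```python
## Hypothesis 1: new site vs control
# H1: the new design with the old images converts better than the control
n_control = 10121
n_oldpic_newsite = 10121
p_control = control
p_oldpic_newsite = oldpic_newsite
# Unpooled standard error of the difference between two proportions
SE = np.sqrt(p_control*(1-p_control)/n_control + p_oldpic_newsite*(1-p_oldpic_newsite)/n_oldpic_newsite)
t = (p_oldpic_newsite - p_control) / SE
# sf gives the upper-tail (one-sided) p-value; with ~10,000 users per group
# the t distribution is practically the standard normal
print('p_value of differences between new site and control:', scipy.stats.t.sf(t, df=n_control))
```

```
p_value of differences between new site and control: 0.0018550158413607619
```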

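The three tests repeat the same arithmetic, so they could be wrapped in one helper. This sketch (`one_sided_prop_test` is a hypothetical name) uses the standard normal instead of a t distribution, which makes a negligible difference at this sample size; with the rates above it gives p ≈ 0.00185, in line with the output of the first test.

```python
import numpy as np
from scipy import stats

def one_sided_prop_test(p_control, n_control, p_variant, n_variant):
    """Unpooled two-proportion z-test of H1: p_variant > p_control.

    Returns (z, one-sided p-value) from the standard normal distribution.
    """
    se = np.sqrt(p_control * (1 - p_control) / n_control
                 + p_variant * (1 - p_variant) / n_variant)
    z = (p_variant - p_control) / se
    return z, stats.norm.sf(z)

# Rates and group size taken from the outputs above
z, p = one_sided_prop_test(0.107104, 10121, 0.120047, 10121)
print(f"new design, old images: z = {z:.3f}, one-sided p = {p:.5f}")
```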

```python
## Hypothesis 2: new pic vs control
# H1: the old design with the new images converts better than the control
n_control = 10121
n_newpic_oldsite = 10121
p_control = control
p_newpic_oldsite = newpic_oldsite
# Unpooled standard error of the difference between two proportions
SE = np.sqrt(p_control*(1-p_control)/n_control + p_newpic_oldsite*(1-p_newpic_oldsite)/n_newpic_oldsite)
t = (p_newpic_oldsite - p_control) / SE
# One-sided (upper-tail) p-value
print('p_value of differences between new picture and control:', scipy.stats.t.sf(t, df=n_control))
```

```
p_value of differences between new picture and control: 0.10816310122876607
```

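As a cross-check with a different test, a chi-square test on each 2×2 table of converted vs. not converted gives two-sided p-values. The converted counts here are back-calculated from the rounded rates (rate × 10121, e.g. 0.107104 × 10121 ≈ 1084), so they are approximations rather than counts read from the file. The two-sided p-values come out roughly double the one-sided ones, and only the new design with old images stays below 0.05.

```python
from scipy.stats import chi2_contingency

n = 10121
# Converted counts back-calculated as rate * n from the rounded rates above;
# treat them as approximations, not values read from redesign.csv
converted = {
    "control": 1084,
    "new design, old images": 1215,
    "old design, new images": 1139,
    "new design, new images": 1151,
}

results = {}
for variant in ["new design, old images", "old design, new images", "new design, new images"]:
    # Rows: control vs. variant; columns: converted vs. not converted
    table = [
        [converted["control"], n - converted["control"]],
        [converted[variant], n - converted[variant]],
    ]
    # correction=False gives the plain Pearson chi-square, which on a 2x2
    # table is equivalent to a two-sided pooled two-proportion z-test
    chi2, p_two_sided, dof, expected = chi2_contingency(table, correction=False)
    results[variant] = p_two_sided
    print(f"{variant:24s} chi2 = {chi2:.3f}, two-sided p = {p_two_sided:.4f}")
```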

```python
## Hypothesis 3: new pic new site vs control
# H1: the new design with the new images converts better than the control
n_control = 10121
n_newpic_newsite = 10121
p_control = control
p_newpic_newsite = newpic_newsite
# Unpooled standard error of the difference between two proportions
SE = np.sqrt(p_control*(1-p_control)/n_control + p_newpic_newsite*(1-p_newpic_newsite)/n_newpic_newsite)
t = (p_newpic_newsite - p_control) / SE
# One-sided (upper-tail) p-value
print('p_value of differences between new picture new sites and control:', scipy.stats.t.sf(t, df=n_control))
```

```
p_value of differences between new picture new sites and control: 0.0664764878532551
```

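Three variants are each compared with the same control, which raises the chance of a false positive somewhere among the three tests. A Bonferroni correction is one simple, conservative way to account for this (it is not part of the original analysis): each p-value is compared with 0.05 / 3 ≈ 0.0167. Using the p-values printed above, the conclusion does not change.

```python
# One-sided p-values printed by the three test cells above
p_values = {
    "new design, old images": 0.0018550158413607619,
    "old design, new images": 0.10816310122876607,
    "new design, new images": 0.0664764878532551,
}
alpha = 0.05                              # assumed family-wise significance level
alpha_per_test = alpha / len(p_values)    # Bonferroni: split alpha across 3 tests

for variant, p in p_values.items():
    verdict = "significant" if p < alpha_per_test else "not significant"
    print(f"{variant:24s} p = {p:.4f} -> {verdict} at alpha/3 = {alpha_per_test:.4f}")
```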

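At the 5% significance level, only the new design with the old pictures converts significantly better than the control (one-sided p ≈ 0.002). The old design with the new pictures (p ≈ 0.108) and the new design with the new pictures (p ≈ 0.066) do not. Therefore, only the increase from the new design with the old pictures is unlikely to be explained by random chance alone.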

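A p-value says the lift is unlikely to be chance, but not how large it is. A 95% Wald confidence interval for the difference in conversion rates (a technique added here, using the same unpooled standard error as the tests) gives a rough range for the new design with old images: about 0.4 to 2.2 percentage points, an interval that excludes zero.

```python
import numpy as np
from scipy import stats

p_c, n_c = 0.107104, 10121    # control (old design, old images)
p_v, n_v = 0.120047, 10121    # new design, old images

diff = p_v - p_c
# Unpooled standard error, as in the hypothesis tests
se = np.sqrt(p_c * (1 - p_c) / n_c + p_v * (1 - p_v) / n_v)
z_crit = stats.norm.ppf(0.975)    # about 1.96 for a two-sided 95% interval
lower, upper = diff - z_crit * se, diff + z_crit * se
print(f"lift = {diff:.4f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```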
3. Which version of the website should they use?

I would recommend the new design with the old pictures. It is the only variant whose conversion-rate increase over the control (12.0% vs. 10.7%) is statistically significant.