# Which version of the website should you use?

## 📖 Background
You work for an early-stage startup in Germany. Your team has been working on a redesign of the landing page. The team believes a new design will increase the number of people who click through and join your site. 

They have been testing the changes for a few weeks. Now they want to measure the impact of the redesign, and they need you to determine whether the increase in conversions could be due to random chance or is statistically significant.

## 💾 The data
The team assembled the following file:

#### Redesign test data
- "treatment" - "yes" if the user saw the new version of the landing page, "no" otherwise.
- "new_images" - "yes" if the page used a new set of images, "no" otherwise.
- "converted" - 1 if the user joined the site, 0 otherwise.

The control group consists of the users with "no" in both columns: the old version of the landing page with the old set of images.

```python
import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency, chi2
```

```python
df = pd.read_csv('./data/redesign.csv')
df.head()
```

|   | treatment | new_images | converted |
|---|-----------|------------|-----------|
| 0 | yes | yes | 0 |
| 1 | yes | yes | 0 |
| 2 | yes | yes | 0 |
| 3 | yes | no  | 0 |
| 4 | no  | yes | 0 |


## 💪 Challenge
Complete the following tasks:

1. Analyze the conversion rates for each of the four groups: the new/old design of the landing page and the new/old pictures.
2. Can the increases observed be explained by randomness? (Hint: Think A/B test)
3. Which version of the website should they use?

---
# Explore the data

```python
# size of dataset
len(df)
```

```
40484
```

```python
# are there any missing entries?
df.isna().sum()
```

```
treatment     0
new_images    0
converted     0
dtype: int64
```

```python
# check converted / not converted proportions
pd.concat([df['converted'].value_counts(), df['converted'].value_counts(normalize=True)], axis=1)
```

| converted | count | proportion |
|-----------|-------|------------|
| 0 | 35895 | 0.886647 |
| 1 | 4589  | 0.113353 |


```python
# observed counts per group: rows = (treatment, new_images), columns = converted (0 / 1)
df_summary = pd.crosstab(index=[df['treatment'], df['new_images']], columns=df['converted'])
df_summary['Total'] = df_summary[0] + df_summary[1]
df_summary
```

| treatment | new_images | 0 (not converted) | 1 (converted) | Total |
|-----------|------------|-------------------|---------------|-------|
| no  | no  | 9037 | 1084 | 10121 |
| no  | yes | 8982 | 1139 | 10121 |
| yes | no  | 8906 | 1215 | 10121 |
| yes | yes | 8970 | 1151 | 10121 |
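Task 1 asks for the conversion rate of each group, which the counts above already determine: rate = converted / Total. A minimal sketch, re-entering the counts from the summary table so it runs without the CSV; the `counts` frame and the `lift_pp` column (lift over the control group, in percentage points) are illustrative names, not part of the notebook. With the real data, `df.groupby(['treatment', 'new_images'])['converted'].mean()` should give the same rates.

```python
import pandas as pd

# Counts copied from the summary table above (converted = 0 / 1 per group)
counts = pd.DataFrame(
    {
        "treatment": ["no", "no", "yes", "yes"],
        "new_images": ["no", "yes", "no", "yes"],
        "not_converted": [9037, 8982, 8906, 8970],
        "converted": [1084, 1139, 1215, 1151],
    }
).set_index(["treatment", "new_images"])

counts["total"] = counts["not_converted"] + counts["converted"]
counts["conversion_rate"] = counts["converted"] / counts["total"]

# Absolute lift in percentage points relative to the control group (old page, old images)
control_rate = counts.loc[("no", "no"), "conversion_rate"]
counts["lift_pp"] = 100 * (counts["conversion_rate"] - control_rate)

print(counts[["conversion_rate", "lift_pp"]].round(4))
```

The control group converts at about 10.7%; the new landing page with the old images has the highest rate (about 12.0%, a lift of roughly 1.3 percentage points), while the two groups with new images sit in between (about 11.3% and 11.4%).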


---
# Hypothesis testing

The dataset contains information on two types of changes:
- whether the user saw the new version of the landing page;
- whether the page used a new set of images.

To check the effect of the two types of changes, I analyse them separately using three hypothesis tests, each comparing one treatment group against the control group. This gives four groups:
- **Control group**: old version / old images
- **Treatment group 1**: old version / new images
- **Treatment group 2**: new version / old images
- **Treatment group 3**: new version / new images
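The four groups above follow directly from the two yes/no columns. A minimal sketch of labelling each user with a group, using the first five rows shown by `df.head()`; the `group` column and its labels are hypothetical names for illustration, not part of the notebook:

```python
import numpy as np
import pandas as pd

# First five rows of the redesign data, as shown by df.head() above
df_example = pd.DataFrame({
    "treatment":  ["yes", "yes", "yes", "yes", "no"],
    "new_images": ["yes", "yes", "yes", "no", "yes"],
    "converted":  [0, 0, 0, 0, 0],
})

# One boolean mask per group, in the same order as the list above
conditions = [
    (df_example.treatment == "no") & (df_example.new_images == "no"),
    (df_example.treatment == "no") & (df_example.new_images == "yes"),
    (df_example.treatment == "yes") & (df_example.new_images == "no"),
    (df_example.treatment == "yes") & (df_example.new_images == "yes"),
]
labels = ["control", "treatment 1", "treatment 2", "treatment 3"]

# np.select picks the label of the first matching condition for each row
df_example["group"] = np.select(conditions, labels, default="unknown")
print(df_example)
```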

*Hypothesis testing 1 -> Control group vs Treatment group 1*<br>
- **Null hypothesis (H0):** provided that the user saw the old version of the landing page, using a new set of images has no effect on the conversion rate.
- **Alternative hypothesis (H1):** provided that the user saw the old version of the landing page, using a new set of images has an effect on the conversion rate.

*Hypothesis testing 2 -> Control group vs Treatment group 2*<br>
- **Null hypothesis (H0):** provided that the page used the old set of images, seeing the new version of the landing page has no effect on the conversion rate.
- **Alternative hypothesis (H1):** provided that the page used the old set of images, seeing the new version of the landing page has an effect on the conversion rate.

*Hypothesis testing 3 -> Control group vs Treatment group 3*<br>
- **Null hypothesis (H0):** seeing the new version of the landing page together with the new set of images has no effect on the conversion rate.
- **Alternative hypothesis (H1):** seeing the new version of the landing page together with the new set of images has an effect on the conversion rate.

## Hypothesis testing using Pearson's Chi-Squared Test

All tests below use a 95% confidence level, i.e. a significance level of α = 0.05.
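For reference, with observed counts $O_{ij}$ in a table with $r$ rows (groups) and $c$ columns (not converted / converted), row totals $R_i$, column totals $C_j$ and grand total $N$, Pearson's test compares the observed counts with the counts expected if conversion were independent of the group:

$$
E_{ij} = \frac{R_i\,C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{\left(O_{ij} - E_{ij}\right)^2}{E_{ij}}, \qquad
\text{dof} = (r-1)(c-1).
$$

Each comparison here is a $2 \times 2$ table, so $\text{dof} = 1$ and H0 is rejected when $\chi^2 \ge \chi^2_{0.95,\,1} \approx 3.841$, or equivalently when $p \le \alpha = 0.05$. For $\text{dof} = 1$, `scipy.stats.chi2_contingency` applies Yates' continuity correction by default (`correction=True`), which replaces $\left|O_{ij} - E_{ij}\right|$ with $\max\left(\left|O_{ij} - E_{ij}\right| - 0.5,\ 0\right)$ in the numerator.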

```python
# confidence level shared by all tests (alpha = 1 - prob = 0.05)
prob = 0.95
```

```python
# utility functions
from typing import Optional


def get_contingency_table(treatment: Optional[str] = None, new_images: Optional[str] = None) -> np.ndarray:
    """Return observed [not converted, converted] counts from df_summary.

    - both arguments given: a single row (1-D array) for that group
    - only one argument given: a 2x2 table of the two groups sharing that level
    """
    if treatment and new_images:
        return df_summary.xs((treatment, new_images), axis=0, drop_level=True)[[0, 1]].values
    elif treatment:
        return df_summary.xs(treatment, axis=0, level='treatment', drop_level=True)[[0, 1]].values
    elif new_images:
        return df_summary.xs(new_images, axis=0, level='new_images', drop_level=True)[[0, 1]].values
    else:
        return np.array([])


def get_chi_squared_interpretations(contingency_table: np.ndarray) -> None:
    """Run Pearson's chi-squared test of independence and print both decision rules."""
    # chi2_contingency applies Yates' continuity correction by default when dof == 1
    stat, p, dof, expected = chi2_contingency(contingency_table)

    # interpret test statistic: reject H0 if it reaches the critical value of chi2(dof)
    critical = chi2.ppf(prob, dof)
    print(f'probability={prob:.3f}, critical={critical:.3f}, stat={stat:.3f}')
    if stat >= critical:
        print('Groups are dependent (reject H0)')
    else:
        print('No evidence for the groups being dependent (fail to reject H0)')

    print()
    # interpret p-value: reject H0 if p <= alpha
    alpha = 1.0 - prob
    print(f'significance={alpha:.3f}, p={p:.3f}')
    if p <= alpha:
        print('Groups are dependent (reject H0)')
    else:
        print('No evidence for the groups being dependent (fail to reject H0)')
```
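To show what `chi2_contingency` computes, here is a minimal hand-rolled sketch for a 2×2 table; `pearson_chi2_2x2` is a hypothetical helper, not part of the notebook or of SciPy. Because SciPy applies Yates' continuity correction when dof = 1, the sketch does the same by default, so it should reproduce the Test 1 statistic (1.474) from the counts in the summary table.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency


def pearson_chi2_2x2(table, yates: bool = True) -> float:
    """Pearson's chi-squared statistic for a 2x2 table, optionally Yates-corrected."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # Expected counts under independence: row total * column total / grand total
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    deviation = np.abs(table - expected)
    if yates:
        # Shrink each absolute deviation by 0.5, never below 0
        deviation = np.maximum(deviation - 0.5, 0.0)
    return float((deviation ** 2 / expected).sum())


# Test 1 counts (rows: old images, new images; columns: not converted, converted)
table_1 = np.array([[9037, 1084], [8982, 1139]])

stat = pearson_chi2_2x2(table_1)
p_value = chi2.sf(stat, df=1)  # upper-tail probability of the chi-squared(1) distribution
scipy_stat, scipy_p, _, _ = chi2_contingency(table_1)

print(f"by hand: stat={stat:.3f}, p={p_value:.3f}")
print(f"scipy:   stat={scipy_stat:.3f}, p={scipy_p:.3f}")
```

Passing `yates=False` gives the uncorrected statistic, which matches `chi2_contingency(table_1, correction=False)`.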

### Hypothesis testing 1 -> Control group vs Treatment group 1

```python
get_chi_squared_interpretations(get_contingency_table(treatment='no'))
```

```
probability=0.950, critical=3.841, stat=1.474
No evidence for the groups being dependent (fail to reject H0)

significance=0.050, p=0.225
No evidence for the groups being dependent (fail to reject H0)
```


*Based on the chi-squared test, provided that the user saw the old version of the landing page, using a new set of images has **no** significant effect on the conversion rate.*

### Hypothesis testing 2 -> Control group vs Treatment group 2

```python
get_chi_squared_interpretations(get_contingency_table(new_images='no'))
```

```
probability=0.950, critical=3.841, stat=8.293
Groups are dependent (reject H0)

significance=0.050, p=0.004
Groups are dependent (reject H0)
```


*Based on the chi-squared test, provided that the page used the old set of images, seeing the new version of the landing page has a significant effect on the conversion rate.*

### Hypothesis testing 3 -> Control group vs Treatment group 3

```python
# stack the control row (old page, old images) and the treatment 3 row (new page, new images)
contingency_table = np.vstack([
    get_contingency_table(treatment='no', new_images='no'),
    get_contingency_table(treatment='yes', new_images='yes'),
])
get_chi_squared_interpretations(contingency_table)
```

```
probability=0.950, critical=3.841, stat=2.191
No evidence for the groups being dependent (fail to reject H0)

significance=0.050, p=0.139
No evidence for the groups being dependent (fail to reject H0)
```


*Based on the chi-squared test, seeing the new version of the landing page together with the new set of images has **no** significant effect on the conversion rate.*
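Because three hypotheses are tested against the same control group, the chance of at least one false positive across the family is higher than 5%. A Bonferroni correction (a technique not used in the original analysis, added here only as a sanity check) compares each p-value with α/3 ≈ 0.0167 instead of 0.05. A minimal sketch using the p-values printed above, which are rounded to three decimals:

```python
# p-values as printed (rounded to 3 decimals) by the chi-squared cells above
p_values = {
    "Test 1 (old page, new images)": 0.225,
    "Test 2 (new page, old images)": 0.004,
    "Test 3 (new page, new images)": 0.139,
}

alpha = 0.05
m = len(p_values)
bonferroni_alpha = alpha / m  # per-test threshold keeping the family-wise error rate <= alpha

decisions = {name: p <= bonferroni_alpha for name, p in p_values.items()}
for name, reject in decisions.items():
    print(f"{name}: p={p_values[name]:.3f}, reject H0 at alpha/m={bonferroni_alpha:.4f}? {reject}")
```

Only Test 2 is rejected after the correction, so the chi-squared decisions do not change.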

## Hypothesis testing using a permutation test (Monte Carlo simulation)

```python
# utility functions
from typing import Callable, Tuple


def diff_of_conversions(data_h0: np.ndarray, data_h1: np.ndarray) -> int:
    """Return the number of conversions in data_h1 minus those in data_h0 (test statistic).

    data_h0 and data_h1 are arrays of 0s and 1s. Comparing raw counts is equivalent to
    comparing conversion rates here only because all groups have the same size (10121).
    """
    return data_h1.sum() - data_h0.sum()


def permutation_sample(data_h0: np.ndarray, data_h1: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Generate a permutation sample from two data sets."""
    data = np.concatenate((data_h0, data_h1))
    permuted_data = np.random.permutation(data)
    # split at len(data_h0) so each permuted group keeps its original size
    perm_sample_h0 = permuted_data[:len(data_h0)]
    perm_sample_h1 = permuted_data[len(data_h0):]
    return perm_sample_h0, perm_sample_h1


def draw_permutation_replicates(data_h0: np.ndarray, data_h1: np.ndarray,
                                func: Callable, size: int = 1) -> np.ndarray:
    """Generate multiple permutation replicates of the test statistic `func`."""
    perm_replicates = np.empty(size)
    for i in range(size):
        perm_sample_h0, perm_sample_h1 = permutation_sample(data_h0, data_h1)
        perm_replicates[i] = func(perm_sample_h0, perm_sample_h1)
    return perm_replicates


def get_permutation_test_interpretations(data_h0: np.ndarray, data_h1: np.ndarray, size: int = 10000) -> None:
    # observed test statistic: difference in conversions
    empirical_diff_of_conversions = diff_of_conversions(data_h0, data_h1)

    # null distribution of the statistic, obtained by shuffling the group labels
    perm_replicates = draw_permutation_replicates(data_h0, data_h1, diff_of_conversions, size=size)

    # one-sided p-value: share of replicates at least as large as the observed difference
    p = np.sum(perm_replicates >= empirical_diff_of_conversions) / len(perm_replicates)

    # interpret p-value
    alpha = 1.0 - prob
    print(f'observed difference={empirical_diff_of_conversions}')
    print(f'significance={alpha:.3f}, p={p:.3f}')
    if p <= alpha:
        print(f"""Assuming H0 is true, the probability of obtaining a test statistic that is at least as extreme as observed ({empirical_diff_of_conversions})
                was less than {alpha:.3f} (reject H0)""")
    else:
        print(f"""Assuming H0 is true, the probability of obtaining a test statistic that is at least as extreme as observed ({empirical_diff_of_conversions})
                was more than {alpha:.3f} (fail to reject H0)""")
```
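The p-value in `get_permutation_test_interpretations` is one-sided: it counts only replicates at least as *large* as the observed difference, so it tests whether the change *increases* conversions, whereas the hypotheses were stated as "has an effect" and the chi-squared test is two-sided. A minimal sketch that reports both, assuming equal group sizes (true here, 10121 users per group) so that the permutation distribution of the count difference is centred on 0. `permutation_p_values` is a hypothetical helper; it uses NumPy's `default_rng` and fewer replicates instead of the global seed, so its numbers will not match the cells below exactly.

```python
from typing import Tuple

import numpy as np


def permutation_p_values(
    data_h0: np.ndarray, data_h1: np.ndarray, size: int = 2000, seed: int = 47
) -> Tuple[float, float, int]:
    """Return (one-sided p, two-sided p, observed difference in conversions)."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([data_h0, data_h1])
    n0 = len(data_h0)
    observed = int(data_h1.sum() - data_h0.sum())

    replicates = np.empty(size)
    for i in range(size):
        shuffled = rng.permutation(pooled)
        # Split at len(data_h0) so the two groups keep their original sizes
        replicates[i] = shuffled[n0:].sum() - shuffled[:n0].sum()

    one_sided = np.mean(replicates >= observed)               # H1: the change increases conversions
    two_sided = np.mean(np.abs(replicates) >= abs(observed))  # H1: conversions differ (equal group sizes assumed)
    return float(one_sided), float(two_sided), observed


# Rebuild the Test 1 arrays from the summary-table counts (0 = not converted, 1 = converted)
control = np.repeat([0, 1], [9037, 1084])
treatment_1 = np.repeat([0, 1], [8982, 1139])

p_one, p_two, diff = permutation_p_values(control, treatment_1)
print(f"observed difference={diff}, one-sided p={p_one:.3f}, two-sided p={p_two:.3f}")
```

With a symmetric null distribution the two-sided p-value is roughly twice the one-sided one, which is the quantity directly comparable to the chi-squared p-values.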

```python
# prepare datasets: conversion outcomes (0/1) of each group
data_h0 = df[(df.treatment == 'no') & (df.new_images == 'no')]['converted']
data_h1_treatment1 = df[(df.treatment == 'no') & (df.new_images == 'yes')]['converted']
data_h1_treatment2 = df[(df.treatment == 'yes') & (df.new_images == 'no')]['converted']
data_h1_treatment3 = df[(df.treatment == 'yes') & (df.new_images == 'yes')]['converted']
```

### Hypothesis testing 1 -> Control group vs Treatment group 1

```python
np.random.seed(47)
get_permutation_test_interpretations(data_h0, data_h1_treatment1)
```

```
observed difference=55
significance=0.050, p=0.116
Assuming H0 is true, the probability of obtaining a test statistic that is at least as extreme as observed (55)
                was more than 0.050 (fail to reject H0)
```


*Based on the permutation test, provided that the user saw the old version of the landing page, using a new set of images has **no** significant effect on the conversion rate. (Under the null hypothesis, the probability of observing a difference of 55 or more conversions was 11.6%.)*

### Hypothesis testing 2 -> Control group vs Treatment group 2

```python
np.random.seed(47)
get_permutation_test_interpretations(data_h0, data_h1_treatment2)
```

```
observed difference=131
significance=0.050, p=0.003
Assuming H0 is true, the probability of obtaining a test statistic that is at least as extreme as observed (131)
                was less than 0.050 (reject H0)
```


*Based on the permutation test, provided that the page used the old set of images, seeing the new version of the landing page has a significant effect on the conversion rate. (Under the null hypothesis, the probability of observing a difference of 131 or more conversions was 0.3%.)*

### Hypothesis testing 3 -> Control group vs Treatment group 3

```python
np.random.seed(47)
get_permutation_test_interpretations(data_h0, data_h1_treatment3)
```

```
observed difference=67
significance=0.050, p=0.066
Assuming H0 is true, the probability of obtaining a test statistic that is at least as extreme as observed (67)
                was more than 0.050 (fail to reject H0)
```


*Based on the permutation test, seeing the new version of the landing page together with the new set of images has **no** significant effect on the conversion rate. (Under the null hypothesis, the probability of observing a difference of 67 or more conversions was 6.6%.)*
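Significance alone does not say how large the gain is. A minimal sketch of a normal-approximation (Wald) 95% confidence interval for the difference in conversion rates between Treatment group 2 and the control group, computed from the summary-table counts; this interval is an addition for context, not part of the original analysis, and `diff_in_rates_ci` is a hypothetical helper.

```python
import math

from scipy.stats import norm


def diff_in_rates_ci(conv_t: int, n_t: int, conv_c: int, n_c: int, confidence: float = 0.95):
    """Wald confidence interval for p_treatment - p_control (large-sample approximation)."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    # Unpooled standard error of the difference of two independent proportions
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    z = norm.ppf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    diff = p_t - p_c
    return diff, diff - z * se, diff + z * se


# Treatment group 2 (new page, old images) vs control (old page, old images)
diff, low, high = diff_in_rates_ci(1215, 10121, 1084, 10121)
print(f"difference={diff:.4f}, 95% CI=({low:.4f}, {high:.4f})")
```

The interval (roughly 0.4 to 2.2 percentage points) excludes 0, consistent with rejecting H0 in Test 2, and gives a sense of the plausible size of the gain.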

---
# Final conclusion

The chi-squared test and the permutation test lead to the same decisions. Both reject H0 only for Treatment group 2 (new landing page with old images), with p = 0.004 (chi-squared) and p = 0.003 (permutation, one-sided). For Treatment group 1 (old landing page with new images) and Treatment group 3 (new landing page with new images), neither test finds a significant effect at the 5% level (chi-squared p = 0.225 and 0.139; permutation p = 0.116 and 0.066), so those observed increases can plausibly be explained by random chance.

**In conclusion, the website should go on with the *new* version of the landing page, *without* the new images.**