# A/B Test Project

This project looks at a data from an e-commerce website after a recent A/B test. The company has changed their web page to increase the nummber of paid users who buy their product or 'convert'. This dataset is provided by Kaggle from the user PUTDEJUDOMTHAI. https://www.kaggle.com/datasets/putdejudomthai/ecommerce-ab-testing-2022-dataset1/data 

## Project Goals
Objective is to show how successful the new page is when converting users to paid users, and whether there is enough evidence to show that the new site is more effective than the old. 

## Loading Data and Previewing

Beginning the project by importing the libaries I'm going to use. The dataset contains columns named:
user_id - Customer Identifier
timestamp - Time of interaction
group - Control or treatment
landing_page - New or old page
converted - Binary conversion indicator (1= converted, 0= not converted)

## Hypotheses tested
H0: Conversion rate (new) = Conversion rate (old)
H1: Conversion rate (new) â‰  Conversion rate (old)

In [12]:
# Import Libraries
import pandas as pd
import numpy as np 
import os
import statsmodels.api as sm
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

In [5]:
# load dataset
ab_data = pd.read_csv('ab_data.csv')
ab_data.head()

Unnamed: 0,user_id,timestamp,group,landing_page,converted
0,851104,11:48.6,control,old_page,0
1,804228,01:45.2,control,old_page,0
2,661590,55:06.2,treatment,new_page,0
3,853541,28:03.1,treatment,new_page,0
4,864975,52:26.2,control,old_page,1


In [14]:
# Load dataset
country_data = pd.read_csv('countries.csv')
country_data.tail()

Unnamed: 0,user_id,country
290581,799368,UK
290582,655535,CA
290583,934996,UK
290584,759899,US
290585,643532,US


### Handling the Data

Checking for unique users.
Identify users with the wrong page and group combinations
Remove duplicate users

In [20]:
print(ab_data.nunique())

user_id         290585
country              3
timestamp        35993
group                2
landing_page         2
converted            2
dtype: int64


In [24]:
page_group = pd.crosstab(ab_data['landing_page'], ab_data['group'])
print(page_group)

group         control  treatment
landing_page                    
new_page         1928     145315
old_page       145274       1965


In [86]:
# Remove missassigned users
clean_data = ab_data[((ab_data['group'] == 'control') & (ab_data['landing_page'] == 'old_page')) &
                     ((ab_data['group'] == 'treatment') & (ab_data['landing_page'] == 'new_page'))]

In [94]:
# Removing Duplicate users 
clean_data = clean_data.drop_duplicates(subset=['user_id'])
print(f"\nCleaned dataset shape: {clean_data.shape}")
print("\nValidated group/page distribution:")
display(pd.crosstab(clean_data['group'], clean_data['landing_page']))


Cleaned dataset shape: (290585, 5)

Validated group/page distribution:


landing_page,new_page,old_page
group,Unnamed: 1_level_1,Unnamed: 2_level_1
control,922,144341
treatment,144395,927
