
An online bookstore is looking to optimize its website design to improve user engagement and ultimately increase book purchases. The website currently offers two themes for its users: “Light Theme” and “Dark Theme.” The bookstore’s data science team wants to conduct an A/B testing experiment to determine which theme leads to better user engagement and higher conversion rates for book purchases.

The data collected by the bookstore contains user interactions and engagement metrics for both the Light Theme and Dark Theme. The dataset includes the following key features:

- Theme: dark or light
- Click Through Rate: The proportion of the users who click on links or buttons on the website.
- Conversion Rate: The percentage of users who signed up on the platform after visiting for the first time.
- Bounce Rate: The percentage of users who leave the website without further interaction after visiting a single page.
- Scroll Depth: The depth to which users scroll through the website pages.
- Age: The age of the user.
- Location: The location of the user.
- Session Duration: The duration of the user’s session on the website.
- Purchases: Whether the user purchased the book (Yes/No).
- Added_to_Cart: Whether the user added books to the cart (Yes/No).

Your task is to identify which theme, Light Theme or Dark Theme, yields better user engagement, purchases and conversion rates. You need to determine if there is a statistically significant difference in the key metrics between the two themes.


In [1]:
import pandas as pd
from scipy.stats import ttest_ind

In [3]:
df = pd.read_csv("website_ab_test.csv")

In [6]:
df.head()

| | Theme | Click Through Rate | Conversion Rate | Bounce Rate | Scroll_Depth | Age | Location | Session_Duration | Purchases | Added_to_Cart |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Light Theme | 0.05492 | 0.282367 | 0.405085 | 72.489458 | 25 | Chennai | 1535 | No | Yes |
| 1 | Light Theme | 0.113932 | 0.032973 | 0.732759 | 61.858568 | 19 | Pune | 303 | No | Yes |
| 2 | Dark Theme | 0.323352 | 0.178763 | 0.296543 | 45.737376 | 47 | Chennai | 563 | Yes | Yes |
| 3 | Light Theme | 0.485836 | 0.325225 | 0.245001 | 76.305298 | 58 | Pune | 385 | Yes | No |
| 4 | Light Theme | 0.034783 | 0.196766 | 0.7651 | 48.927407 | 25 | New Delhi | 1437 | No | No |


In [8]:
df.shape

(1000, 10)

In [9]:
df.isnull().sum()

Theme                 0
Click Through Rate    0
Conversion Rate       0
Bounce Rate           0
Scroll_Depth          0
Age                   0
Location              0
Session_Duration      0
Purchases             0
Added_to_Cart         0
dtype: int64

In [12]:
df.describe()

| | Click Through Rate | Conversion Rate | Bounce Rate | Scroll_Depth | Age | Session_Duration |
|---|---|---|---|---|---|---|
| count | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 |
| mean | 0.256048 | 0.253312 | 0.505758 | 50.319494 | 41.528 | 924.999 |
| std | 0.139265 | 0.139092 | 0.172195 | 16.895269 | 14.114334 | 508.231723 |
| min | 0.010767 | 0.010881 | 0.20072 | 20.011738 | 18.0 | 38.0 |
| 25% | 0.140794 | 0.131564 | 0.353609 | 35.655167 | 29.0 | 466.5 |
| 50% | 0.253715 | 0.252823 | 0.514049 | 51.130712 | 42.0 | 931.0 |
| 75% | 0.370674 | 0.37304 | 0.648557 | 64.666258 | 54.0 | 1375.25 |
| max | 0.499989 | 0.498916 | 0.799658 | 79.997108 | 65.0 | 1797.0 |


In [31]:
# Drop the non-numeric columns; the analysis below uses only the numeric metrics
df = df.drop(['Location', 'Purchases', 'Added_to_Cart'], axis=1)

In [32]:
df.head()

| | Theme | Click Through Rate | Conversion Rate | Bounce Rate | Scroll_Depth | Age | Session_Duration |
|---|---|---|---|---|---|---|---|
| 0 | Light Theme | 0.05492 | 0.282367 | 0.405085 | 72.489458 | 25 | 1535 |
| 1 | Light Theme | 0.113932 | 0.032973 | 0.732759 | 61.858568 | 19 | 303 |
| 2 | Dark Theme | 0.323352 | 0.178763 | 0.296543 | 45.737376 | 47 | 563 |
| 3 | Light Theme | 0.485836 | 0.325225 | 0.245001 | 76.305298 | 58 | 385 |
| 4 | Light Theme | 0.034783 | 0.196766 | 0.7651 | 48.927407 | 25 | 1437 |


In [33]:
# Grouping the data by theme and calculating mean value for all metrics
theme_performance = df.groupby('Theme').mean()

In [35]:
# Sort the themes by mean conversion rate, highest first
theme_performance_sorted = theme_performance.sort_values(by="Conversion Rate", ascending=False)

In [36]:
theme_performance_sorted

| Theme | Click Through Rate | Conversion Rate | Bounce Rate | Scroll_Depth | Age | Session_Duration |
|---|---|---|---|---|---|---|
| Light Theme | 0.247109 | 0.255459 | 0.499035 | 50.735232 | 41.734568 | 930.833333 |
| Dark Theme | 0.264501 | 0.251282 | 0.512115 | 49.926404 | 41.332685 | 919.48249 |


H0: μ(conv_rt_light) - μ(conv_rt_dark) = 0  
Ha: μ(conv_rt_light) - μ(conv_rt_dark) ≠ 0  
Significance level: α = 0.05

In [37]:
conversion_rate_light = df[df['Theme']=='Light Theme']['Conversion Rate']
conversion_rate_dark = df[df['Theme']=='Dark Theme']['Conversion Rate']

In [50]:
# Welch's t-test: equal_var=False does not assume equal variances in the two groups
t_stat_conv, p_value_conv = ttest_ind(conversion_rate_light, conversion_rate_dark, equal_var=False)
t_stat_conv, p_value_conv

(0.4748494462782632, 0.6349982678451778)

The p-value (0.635) is greater than alpha (0.05), so we fail to reject the null hypothesis: the data show no statistically significant difference in mean Conversion Rate between the two themes.
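
For readers who want to see what `ttest_ind(..., equal_var=False)` computes, here is a minimal sketch of Welch's t-statistic and its Welch-Satterthwaite degrees of freedom using only the standard library. The function name `welch_t` and the toy samples are illustrative assumptions, not part of the notebook, and the sketch stops short of the p-value, which needs the t-distribution.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t-statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # statistics.variance uses the n - 1 (sample) denominator.
    var_a, var_b = variance(sample_a), variance(sample_b)
    # Squared standard error of each group mean.
    se2_a, se2_b = var_a / n_a, var_b / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, dof


# Made-up toy samples, not rows from website_ab_test.csv.
light = [1.0, 2.0, 3.0]
dark = [2.0, 3.0, 4.0]
t_stat, dof = welch_t(light, dark)
print(t_stat, dof)
```

The sign of the statistic follows the argument order: a negative value means the first sample has the lower mean.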

H0: μ(ctr_light) - μ(ctr_dark) = 0  
Ha: μ(ctr_light) - μ(ctr_dark) ≠ 0  
Significance level: α = 0.05

In [40]:
ctr_light = df[df['Theme']=='Light Theme']['Click Through Rate']
ctr_dark = df[df['Theme']=='Dark Theme']['Click Through Rate']

In [51]:
t_stat_ctr, p_value_ctr = ttest_ind(ctr_light, ctr_dark, equal_var=False)
t_stat_ctr, p_value_ctr

(-1.9781708664172253, 0.04818435371010704)

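The p-value (0.048) is less than alpha (0.05), so we reject the null hypothesis: mean Click Through Rate differs between the two themes. The negative t-statistic indicates that the Light Theme mean is lower than the Dark Theme mean.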
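
To put the size of this difference in context, the group means reported in the `theme_performance_sorted` output can be turned into an absolute and a relative difference. This is a small arithmetic sketch; the variable names are illustrative, and the two means are copied from that output rather than recomputed from the data.

```python
# Group means of Click Through Rate, copied from the theme_performance_sorted output.
ctr_mean_light = 0.247109
ctr_mean_dark = 0.264501

# Absolute gap between the two group means.
absolute_diff = ctr_mean_dark - ctr_mean_light
# Gap expressed relative to the Light Theme mean.
relative_lift = absolute_diff / ctr_mean_light

print(f"absolute difference: {absolute_diff:.6f}")
print(f"relative lift of Dark over Light: {relative_lift:.2%}")
```

The Dark Theme mean is roughly 7% higher in relative terms, which is a modest gap given how close the p-value is to the 0.05 threshold.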

H0: μ(bounce_rate_light) - μ(bounce_rate_dark) = 0  
Ha: μ(bounce_rate_light) - μ(bounce_rate_dark) ≠ 0  
Significance level: α = 0.05

In [44]:
bounce_rate_light = df[df['Theme']=='Light Theme']['Bounce Rate']
bounce_rate_dark = df[df['Theme']=='Dark Theme']['Bounce Rate']

In [52]:
t_stat_brate, p_value_brate = ttest_ind(bounce_rate_light, bounce_rate_dark, equal_var=False)
t_stat_brate, p_value_brate

(-1.2018883310494073, 0.229692077505148)

The p-value (0.23) is greater than alpha (0.05), so we fail to reject the null hypothesis: the data show no statistically significant difference in mean Bounce Rate between the two themes.

H0: μ(scroll_depth_light) - μ(scroll_depth_dark) = 0  
Ha: μ(scroll_depth_light) - μ(scroll_depth_dark) ≠ 0  
Significance level: α = 0.05

In [47]:
scroll_depth_light = df[df['Theme']=='Light Theme']['Scroll_Depth']
scroll_depth_dark = df[df['Theme']=='Dark Theme']['Scroll_Depth']

In [53]:
t_stat_scrldep, p_value_scrldep = ttest_ind(scroll_depth_light, scroll_depth_dark, equal_var=False)
t_stat_scrldep, p_value_scrldep

(0.7562277864140986, 0.4496919249484911)

The p-value (0.45) is greater than alpha (0.05), so we fail to reject the null hypothesis: the data show no statistically significant difference in mean Scroll Depth between the two themes.
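
The four tests above repeat the same three steps: select each theme's values, run Welch's t-test, and read off the statistic and p-value. One way to avoid the repetition is a small helper that loops over the metrics. This is only a sketch: `compare_themes` and the toy DataFrame are hypothetical names and data introduced for illustration, not part of the notebook.

```python
import pandas as pd
from scipy.stats import ttest_ind


def compare_themes(data, metrics, group_col="Theme", a="Light Theme", b="Dark Theme"):
    """Run Welch's t-test (group a vs. group b) on each metric and collect the results."""
    rows = []
    for metric in metrics:
        sample_a = data.loc[data[group_col] == a, metric]
        sample_b = data.loc[data[group_col] == b, metric]
        t_stat, p_value = ttest_ind(sample_a, sample_b, equal_var=False)
        rows.append({"Metric": metric, "T-stat": t_stat, "P-value": p_value})
    return pd.DataFrame(rows)


# Tiny made-up frame so the sketch runs on its own; not the real dataset.
toy = pd.DataFrame({
    "Theme": ["Light Theme"] * 4 + ["Dark Theme"] * 4,
    "Click Through Rate": [0.10, 0.20, 0.30, 0.25, 0.15, 0.35, 0.40, 0.30],
    "Bounce Rate": [0.50, 0.60, 0.55, 0.45, 0.50, 0.65, 0.45, 0.60],
})
results = compare_themes(toy, ["Click Through Rate", "Bounce Rate"])
print(results)
```

With the notebook's `df`, the equivalent call would be `compare_themes(df, ['Conversion Rate', 'Click Through Rate', 'Bounce Rate', 'Scroll_Depth'])`.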

In [58]:
comparison_table = pd.DataFrame({
    'Metrics': ['Conversion Rate', 'Click Through Rate', 'Bounce Rate', 'Scroll Depth'],
    'T-stat': [t_stat_conv, t_stat_ctr, t_stat_brate, t_stat_scrldep],
    'P-values': [p_value_conv, p_value_ctr, p_value_brate, p_value_scrldep]
})
comparison_table

| | Metrics | T-stat | P-values |
|---|---|---|---|
| 0 | Conversion Rate | 0.474849 | 0.634998 |
| 1 | Click Through Rate | -1.978171 | 0.048184 |
| 2 | Bounce Rate | -1.201888 | 0.229692 |
| 3 | Scroll Depth | 0.756228 | 0.449692 |
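
Because four metrics are each tested at alpha = 0.05, the chance of at least one false positive across the set is higher than 5%. The notebook does not apply a correction; as a hedged illustration, the sketch below applies a Bonferroni adjustment (one conservative option among several) to the four p-values copied from the comparison table. The variable names are illustrative.

```python
# p-values copied from the comparison table.
p_values = {
    "Conversion Rate": 0.634998,
    "Click Through Rate": 0.048184,
    "Bounce Rate": 0.229692,
    "Scroll Depth": 0.449692,
}
alpha = 0.05
# Bonferroni: divide alpha by the number of tests (0.05 / 4 = 0.0125).
bonferroni_alpha = alpha / len(p_values)

for metric, p in p_values.items():
    verdict = "reject H0" if p < bonferroni_alpha else "fail to reject H0"
    print(f"{metric}: p = {p:.3f} -> {verdict} at alpha = {bonferroni_alpha}")

# Metrics that remain significant after the adjustment.
significant = [m for m, p in p_values.items() if p < bonferroni_alpha]
print(significant)
```

Under this stricter threshold the Click Through Rate result (p = 0.048) no longer counts as significant, so the difference is best read as suggestive rather than conclusive.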


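Both themes perform similarly on most metrics: the differences in Conversion Rate, Bounce Rate, and Scroll Depth are not statistically significant at alpha = 0.05. The Dark Theme, however, performs better on Click Through Rate (mean 0.2645 versus 0.2471 for the Light Theme, p = 0.048).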