
In [1]:
import pandas as pd
from scipy.stats import ttest_ind

In [2]:
df = pd.read_csv('website_ab_test.csv')

In [3]:
df.head(5)

Unnamed: 0,Theme,Click Through Rate,Conversion Rate,Bounce Rate,Scroll_Depth,Age,Location,Session_Duration,Purchases,Added_to_Cart
0,Light Theme,0.05492,0.282367,0.405085,72.489458,25,Chennai,1535,No,Yes
1,Light Theme,0.113932,0.032973,0.732759,61.858568,19,Pune,303,No,Yes
2,Dark Theme,0.323352,0.178763,0.296543,45.737376,47,Chennai,563,Yes,Yes
3,Light Theme,0.485836,0.325225,0.245001,76.305298,58,Pune,385,Yes,No
4,Light Theme,0.034783,0.196766,0.7651,48.927407,25,New Delhi,1437,No,No


In [4]:
df.tail(5)

Unnamed: 0,Theme,Click Through Rate,Conversion Rate,Bounce Rate,Scroll_Depth,Age,Location,Session_Duration,Purchases,Added_to_Cart
995,Dark Theme,0.282792,0.401605,0.20072,68.478822,25,Kolkata,321,Yes,Yes
996,Dark Theme,0.299917,0.026372,0.762641,73.019821,38,Chennai,1635,Yes,Yes
997,Light Theme,0.370254,0.019838,0.607136,33.963298,32,Bangalore,1237,No,Yes
998,Light Theme,0.095815,0.137953,0.458898,37.429284,24,Chennai,893,Yes,No
999,Dark Theme,0.342588,0.061315,0.45241,31.613326,33,Chennai,129,Yes,Yes


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Theme               1000 non-null   object 
 1   Click Through Rate  1000 non-null   float64
 2   Conversion Rate     1000 non-null   float64
 3   Bounce Rate         1000 non-null   float64
 4   Scroll_Depth        1000 non-null   float64
 5   Age                 1000 non-null   int64  
 6   Location            1000 non-null   object 
 7   Session_Duration    1000 non-null   int64  
 8   Purchases           1000 non-null   object 
 9   Added_to_Cart       1000 non-null   object 
dtypes: float64(4), int64(2), object(4)
memory usage: 78.2+ KB


In [11]:
df.isnull().sum()

Column,Null Count
Theme,0
Click Through Rate,0
Conversion Rate,0
Bounce Rate,0
Scroll_Depth,0
Age,0
Location,0
Session_Duration,0
Purchases,0
Added_to_Cart,0


In [12]:
df.describe()

Statistic,Click Through Rate,Conversion Rate,Bounce Rate,Scroll_Depth,Age,Session_Duration
count,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,0.256048,0.253312,0.505758,50.319494,41.528,924.999
std,0.139265,0.139092,0.172195,16.895269,14.114334,508.231723
min,0.010767,0.010881,0.20072,20.011738,18.0,38.0
25%,0.140794,0.131564,0.353609,35.655167,29.0,466.5
50%,0.253715,0.252823,0.514049,51.130712,42.0,931.0
75%,0.370674,0.37304,0.648557,64.666258,54.0,1375.25
max,0.499989,0.498916,0.799658,79.997108,65.0,1797.0


The dataset contains 1,000 records across 10 columns, with no missing values.

* **Click Through Rate:** Ranges from about 0.01 to 0.50 with a mean of approximately 0.26.
* **Conversion Rate:** Also ranges from about 0.01 to 0.50 with a mean close to the Click Through Rate, approximately 0.25.
* **Bounce Rate:** Varies between 0.20 and 0.80, with a mean around 0.51.
* **Scroll Depth:** Shows a spread from 20.01 to nearly 80, with a mean of 50.32.
* **Age:** The age of users ranges from 18 to 65 years, with a mean age of about 41.5 years.
* **Session Duration:** This varies widely from 38 seconds to nearly 1800 seconds (30 minutes), with a mean session duration of approximately 925 seconds (about 15 minutes).





In [17]:
# grouping data by theme and calculating mean values for the metrics
theme_performance = df.groupby('Theme').agg({
    'Click Through Rate': 'mean',
    'Conversion Rate': 'mean',
    'Bounce Rate': 'mean',
    'Scroll_Depth': 'mean',
    'Age': 'mean',
    'Session_Duration': 'mean'
})
# sorting the data by conversion rate for a better comparison
theme_performance_sorted = theme_performance.sort_values(by='Conversion Rate', ascending=False)
display(theme_performance_sorted)

Theme,Click Through Rate,Conversion Rate,Bounce Rate,Scroll_Depth,Age,Session_Duration
Light Theme,0.247109,0.255459,0.499035,50.735232,41.734568,930.833333
Dark Theme,0.264501,0.251282,0.512115,49.926404,41.332685,919.48249


* **Click Through Rate (CTR):** The Dark Theme has a slightly higher average CTR (0.2645) compared to the Light Theme (0.2471).
* **Conversion Rate:** The Light Theme leads with a marginally higher average Conversion Rate (0.2555) compared to the Dark Theme (0.2513).
* **Bounce Rate:** The Bounce Rate is slightly higher for the Dark Theme (0.5121) than for the Light Theme (0.4990).
* **Scroll Depth:** Users on the Light Theme scroll slightly further on average (50.74%) compared to those on the Dark Theme (49.93%).
* **Age:** The average age of users is similar across themes, with the Light Theme at approximately 41.73 years and the Dark Theme at 41.33 years.
* **Session Duration:** The average session duration is slightly longer for users on the Light Theme (930.83 seconds) than for those on the Dark Theme (919.48 seconds).

From these averages, the Light Theme is slightly ahead on Conversion Rate, Bounce Rate (where lower is better), Scroll Depth, and Session Duration, while the Dark Theme leads on Click Through Rate. The gaps are small, so they need to be tested before they can be treated as real differences.
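
To put these gaps on a common scale, the following minimal sketch expresses each Dark Theme mean as a relative difference from the Light Theme mean. The group means are copied from the output above; the dictionary names and `relative_diff` are illustrative and not part of the original notebook.

```python
# Group means copied from the theme_performance_sorted output above
light = {
    'Click Through Rate': 0.247109,
    'Conversion Rate': 0.255459,
    'Bounce Rate': 0.499035,
    'Scroll_Depth': 50.735232,
    'Session_Duration': 930.833333,
}
dark = {
    'Click Through Rate': 0.264501,
    'Conversion Rate': 0.251282,
    'Bounce Rate': 0.512115,
    'Scroll_Depth': 49.926404,
    'Session_Duration': 919.48249,
}

# relative difference of the Dark Theme mean versus the Light Theme mean
relative_diff = {
    metric: (dark[metric] - light[metric]) / light[metric]
    for metric in light
}

for metric, diff in relative_diff.items():
    print(f"{metric}: {diff:+.2%}")
```

A positive value means the Dark Theme mean is higher. For Bounce Rate a positive value is the less favourable direction.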

### **Hypothesis Testing**
* **Null Hypothesis ($H_0$):** For each metric tested (Conversion Rate, Click Through Rate, Bounce Rate, and Scroll_Depth), the mean is the same for the Light Theme and the Dark Theme.
* **Alternative Hypothesis ($H_a$):** For that metric, the mean differs between the Light Theme and the Dark Theme.
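
The tests in this notebook call `ttest_ind` with `equal_var=False`, which is Welch's t-test: it compares two means without assuming the groups share a variance. As a rough illustration of the statistic behind that call, the sketch below computes the t-statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. The `welch_t` helper and the toy samples are assumptions made for illustration; they are not taken from `website_ab_test.csv`, and the sketch does not compute a p-value.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # squared standard error of each group mean (sample variance / n)
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    # difference in means divided by its estimated standard error
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # approximate degrees of freedom when variances are not pooled
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, dof


# Toy samples for illustration only (hypothetical values)
light = [0.20, 0.25, 0.30, 0.35]
dark = [0.30, 0.35, 0.40, 0.45, 0.50]

t_stat, dof = welch_t(light, dark)
print(f"t = {t_stat:.3f}, dof = {dof:.2f}")
```

The sign of the statistic follows the argument order: with the first sample's mean below the second's, t is negative.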


In [21]:
# extracting conversion rates for both themes
conversion_rates_light = df[df['Theme'] == 'Light Theme']['Conversion Rate']
conversion_rates_dark = df[df['Theme'] == 'Dark Theme']['Conversion Rate']

# performing a two-sample t-test
t_stat, p_value = ttest_ind(conversion_rates_light, conversion_rates_dark, equal_var=False)

# extracting click through rates for both themes
ctr_light = df[df['Theme'] == 'Light Theme']['Click Through Rate']
ctr_dark = df[df['Theme'] == 'Dark Theme']['Click Through Rate']

# performing a two-sample t-test
t_stat_ctr, p_value_ctr = ttest_ind(ctr_light, ctr_dark, equal_var=False)

# extracting bounce rates for both themes
bounce_rates_light = df[df['Theme'] == 'Light Theme']['Bounce Rate']
bounce_rates_dark = df[df['Theme'] == 'Dark Theme']['Bounce Rate']

# performing a two-sample t-test for bounce rate
t_stat_bounce, p_value_bounce = ttest_ind(bounce_rates_light, bounce_rates_dark, equal_var=False)

# extracting scroll depths for both themes
scroll_depth_light = df[df['Theme'] == 'Light Theme']['Scroll_Depth']
scroll_depth_dark = df[df['Theme'] == 'Dark Theme']['Scroll_Depth']

# performing a two-sample t-test for scroll depth
t_stat_scroll, p_value_scroll = ttest_ind(scroll_depth_light, scroll_depth_dark, equal_var=False)

# creating a table for comparison
comparison_table = pd.DataFrame({
    'Metric': ['Click Through Rate', 'Conversion Rate', 'Bounce Rate', 'Scroll Depth'],
    'T-Statistic': [t_stat_ctr, t_stat, t_stat_bounce, t_stat_scroll],
    'P-Value': [p_value_ctr, p_value, p_value_bounce, p_value_scroll]
})

comparison_table

Unnamed: 0,Metric,T-Statistic,P-Value
0,Click Through Rate,-1.978171,0.048184
1,Conversion Rate,0.474849,0.634998
2,Bounce Rate,-1.201888,0.229692
3,Scroll Depth,0.756228,0.449692


* **Click Through Rate:** The difference is statistically significant at the 0.05 level, though only narrowly (p = 0.048). Because the Light Theme sample is passed to `ttest_ind` first, the negative t-statistic (-1.98) means the Dark Theme has the higher mean Click Through Rate.
* **Conversion Rate:** There is no statistically significant difference between the two themes (p = 0.635).
* **Bounce Rate:** There is no statistically significant difference between the two themes (p = 0.230).
* **Scroll Depth:** There is no statistically significant difference between the two themes (p = 0.450).
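
Each of the four tests above is judged against 0.05 on its own. One optional follow-up, which the notebook itself does not perform, is a Bonferroni correction: each p-value is compared with 0.05 divided by the number of tests. The sketch below applies it to the p-values from the comparison table; treating the four metrics as one family of tests is an assumption, and the variable names are illustrative.

```python
# p-values copied from the comparison table above
p_values = {
    'Click Through Rate': 0.048184,
    'Conversion Rate': 0.634998,
    'Bounce Rate': 0.229692,
    'Scroll Depth': 0.449692,
}

alpha = 0.05
# Bonferroni: divide the significance level by the number of tests
threshold = alpha / len(p_values)

# flag each metric whose p-value falls below the corrected threshold
significant = {metric: p < threshold for metric, p in p_values.items()}

for metric, flag in significant.items():
    print(f"{metric}: p = {p_values[metric]:.6f}, below {threshold:.4f}: {flag}")
```

Bonferroni is a conservative choice; whether any correction is appropriate depends on whether the metrics were planned as a single family of comparisons.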

### **Summary**
While the two themes perform similarly across most metrics, the Dark Theme has a slight edge in Click Through Rate. For the other key performance indicators (Conversion Rate, Bounce Rate, and Scroll Depth), the choice between a Light Theme and a Dark Theme does not significantly affect user behaviour in this dataset.