In [1]:
import pandas as pd

In [2]:
# raw string so the Windows backslashes are not read as escape sequences
df = pd.read_csv(r"E:\Python Projects\Hypothesis Testing\data.csv")
df.head()

         Theme  Click Through Rate  Conversion Rate  Bounce Rate  Scroll_Depth  Age   Location  Session_Duration  Purchases  Added_to_Cart
0  Light Theme            0.054920         0.282367     0.405085     72.489458   25    Chennai              1535         No            Yes
1  Light Theme            0.113932         0.032973     0.732759     61.858568   19       Pune               303         No            Yes
2   Dark Theme            0.323352         0.178763     0.296543     45.737376   47    Chennai               563        Yes            Yes
3  Light Theme            0.485836         0.325225     0.245001     76.305298   58       Pune               385        Yes             No
4  Light Theme            0.034783         0.196766     0.765100     48.927407   25  New Delhi              1437         No             No


In [5]:
# dataset summary
summary = {"Number of Records": df.shape[0],
           "Number of Columns": df.shape[1],
           "Null Values": df.isnull().sum(),
           "Descriptive Statistics": df.describe()}
summary

{'Number of Records': 1000,
 'Number of Columns': 10,
 'Null Values': Theme                 0
 Click Through Rate    0
 Conversion Rate       0
 Bounce Rate           0
 Scroll_Depth          0
 Age                   0
 Location              0
 Session_Duration      0
 Purchases             0
 Added_to_Cart         0
 dtype: int64,
 'Descriptive Statistics':        Click Through Rate  Conversion Rate  Bounce Rate  Scroll_Depth  \
 count         1000.000000      1000.000000  1000.000000   1000.000000   
 mean             0.256048         0.253312     0.505758     50.319494   
 std              0.139265         0.139092     0.172195     16.895269   
 min              0.010767         0.010881     0.200720     20.011738   
 25%              0.140794         0.131564     0.353609     35.655167   
 50%              0.253715         0.252823     0.514049     51.130712   
 75%              0.370674         0.373040     0.648557     64.666258   
 max              0.499989         0.498916     0.79

In [13]:
# grouping data by theme and calculating mean values for the metrics
theme_performance = df.groupby('Theme').mean(numeric_only=True)

# sorting the data by conversion rate for a better comparison
theme_performance_sorted = theme_performance.sort_values(by='Conversion Rate', ascending=False)

print(theme_performance_sorted)

             Click Through Rate  Conversion Rate  Bounce Rate  Scroll_Depth  \
Theme                                                                         
Light Theme            0.247109         0.255459     0.499035     50.735232   
Dark Theme             0.264501         0.251282     0.512115     49.926404   

                   Age  Session_Duration  
Theme                                     
Light Theme  41.734568        930.833333  
Dark Theme   41.332685        919.482490  


Click Through Rate (CTR): The Dark Theme has a slightly higher average CTR (0.2645) compared to the Light Theme (0.2471).

Conversion Rate: The Light Theme leads with a marginally higher average Conversion Rate (0.2555) compared to the Dark Theme (0.2513).

Bounce Rate: The Bounce Rate is slightly higher for the Dark Theme (0.5121) than for the Light Theme (0.4990).

Scroll Depth: Users on the Light Theme scroll slightly further on average (50.74%) compared to those on the Dark Theme (49.93%).

Age: The average age of users is similar across themes, with the Light Theme at approximately 41.73 years and the Dark Theme at 41.33 years.

Session Duration: The average session duration is slightly longer for users on the Light Theme (930.83 seconds) than for those on the Dark Theme (919.48 seconds).

From these sample means, the Light Theme appears to perform slightly better on Conversion Rate, Bounce Rate (where lower is better), Scroll Depth, and Session Duration, while the Dark Theme leads on Click Through Rate. However, the differences are small across all metrics, so the hypothesis tests below check whether any of them is statistically significant.
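To make the side-by-side comparison explicit, the per-metric gap can be computed directly from the grouped means. The sketch below uses a hypothetical four-row DataFrame with made-up values in place of data.csv; `toy_df`, `theme_means`, and `difference` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for data.csv (same column names, made-up values)
toy_df = pd.DataFrame({
    "Theme": ["Light Theme", "Light Theme", "Dark Theme", "Dark Theme"],
    "Click Through Rate": [0.20, 0.30, 0.25, 0.35],
    "Conversion Rate": [0.30, 0.20, 0.20, 0.24],
})

# Mean of every numeric metric per theme, as in the cell above
theme_means = toy_df.groupby("Theme").mean(numeric_only=True)

# Positive values mean the Light Theme average is higher for that metric
difference = theme_means.loc["Light Theme"] - theme_means.loc["Dark Theme"]
print(difference)
```

On the real data, the same subtraction applied to `theme_performance` gives the gaps quoted in the bullet points above.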

# Getting Started with Hypothesis Testing

Significance level: alpha = 0.05. This means we’ll consider a result statistically significant if the p-value from our test is less than 0.05.
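A minimal sketch of this decision rule, assuming a hypothetical helper named `is_significant` (not part of the original notebook). The example p-values are the ones reported for Conversion Rate and Click Through Rate in this notebook.

```python
ALPHA = 0.05  # significance level used for every test in this notebook


def is_significant(p_value: float, alpha: float = ALPHA) -> bool:
    """Return True when the p-value is below alpha, i.e. we reject H0."""
    return p_value < alpha


print(is_significant(0.635))  # Conversion Rate p-value
print(is_significant(0.048))  # Click Through Rate p-value
```

Note the strict inequality: a p-value of exactly 0.05 is not treated as significant under this rule.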

Let’s start with hypothesis testing based on the Conversion Rate of the Light Theme and the Dark Theme.

Null Hypothesis (H0): There is no difference in mean Conversion Rate between the Light Theme and Dark Theme.

Alternative Hypothesis (Ha): There is a difference in mean Conversion Rate between the Light Theme and Dark Theme.
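The tests in this notebook call `ttest_ind` with `equal_var=False`, which is Welch's t-test: it does not assume that the two themes have equal variances. Writing $\mu_L$ and $\mu_D$ for the population means of a metric under the Light and Dark Themes (notation introduced here for illustration), with sample means $\bar{x}$, sample variances $s^2$, and sample sizes $n$, the hypotheses and test statistic are:

```latex
\begin{aligned}
H_0 &: \mu_L = \mu_D \qquad H_a : \mu_L \neq \mu_D \\[4pt]
t &= \frac{\bar{x}_L - \bar{x}_D}{\sqrt{\dfrac{s_L^2}{n_L} + \dfrac{s_D^2}{n_D}}} \\[4pt]
\nu &\approx \frac{\left(\dfrac{s_L^2}{n_L} + \dfrac{s_D^2}{n_D}\right)^2}
{\dfrac{\left(s_L^2/n_L\right)^2}{n_L - 1} + \dfrac{\left(s_D^2/n_D\right)^2}{n_D - 1}}
\end{aligned}
```

The two-sided p-value is $2\,P(T_\nu > |t|)$, where $T_\nu$ follows a t-distribution with $\nu$ degrees of freedom (the Welch-Satterthwaite approximation).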

# Hypothesis Testing based on the Conversion Rate

In [16]:
# extracting conversion rates for both themes
conversion_rates_light = df[df["Theme"] == "Light Theme"]["Conversion Rate"]
conversion_rates_dark = df[df["Theme"] == "Dark Theme"]["Conversion Rate"]

In [23]:
# performing Welch's two-sample t-test (equal_var=False: the themes' variances are not assumed equal)
from scipy.stats import ttest_ind

t_stat, p_value = ttest_ind(conversion_rates_light, conversion_rates_dark, equal_var=False)

t_stat, p_value

(0.4748494462782632, 0.6349982678451778)

The two-sample t-test gives a p-value of approximately 0.635. Since this p-value is much greater than our significance level of 0.05, we do not have enough evidence to reject the null hypothesis. In other words, the data provide no statistically significant evidence of a difference in mean Conversion Rate between the Light Theme and the Dark Theme.
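To see what `equal_var=False` computes, the sketch below implements Welch's t-test by hand and compares it with SciPy on synthetic uniform samples (made-up data, not data.csv). `welch_t_test`, `light`, and `dark` are hypothetical names introduced for illustration.

```python
import numpy as np
from scipy.stats import t as t_dist, ttest_ind

rng = np.random.default_rng(0)
# Synthetic samples standing in for the two themes' conversion rates
light = rng.uniform(0.01, 0.50, size=500)
dark = rng.uniform(0.01, 0.50, size=480)


def welch_t_test(a, b):
    """Two-sided Welch's t-test computed by hand."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Squared standard errors of each sample mean
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    t_stat = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    # Two-sided p-value from the t-distribution's survival function
    p_value = 2 * t_dist.sf(abs(t_stat), dof)
    return t_stat, p_value


manual = welch_t_test(light, dark)
library = ttest_ind(light, dark, equal_var=False)
print(manual)
print(library.statistic, library.pvalue)
```

Both lines should print the same statistic and p-value, which is a quick way to confirm that the notebook's tests are Welch's tests rather than the pooled-variance version.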

# Hypothesis Testing based on the Click Through Rate (CTR)

Null Hypothesis (H0): There is no difference in mean Click Through Rate between the Light Theme and Dark Theme.

Alternative Hypothesis (Ha): There is a difference in mean Click Through Rate between the Light Theme and Dark Theme.

In [20]:
ctr_light = df[df["Theme"]=="Light Theme"]["Click Through Rate"]
ctr_dark = df[df["Theme"]=="Dark Theme"]["Click Through Rate"]

In [22]:
# Welch's two-sample t-test for Click Through Rate
t_stat_ctr, p_value_ctr = ttest_ind(ctr_light, ctr_dark, equal_var=False)
t_stat_ctr, p_value_ctr

(-1.9781708664172253, 0.04818435371010704)

The two-sample t-test for the Click Through Rate (CTR) between the Light Theme and Dark Theme yields a p-value of approximately 0.048. This p-value is slightly below our significance level of 0.05, so we reject the null hypothesis: there is a statistically significant difference in mean Click Through Rate between the two themes. The test statistic is negative (about -1.98) because the Light Theme sample was passed first and has the lower mean, so the Dark Theme has the higher CTR (0.2645 vs 0.2471). Since the p-value is only just under the threshold, this evidence is marginal.
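`ttest_ind(a, b)` builds its statistic from `mean(a) - mean(b)`, so with the Light Theme passed first, a negative t means the Light Theme sample mean is the lower one. The sketch below checks this with made-up CTR values (not from data.csv); the `_demo` arrays, `higher`, and `swapped` are illustrative names.

```python
import numpy as np
from scipy.stats import ttest_ind

# Made-up CTR samples; the second group has the larger mean
ctr_light_demo = np.array([0.20, 0.22, 0.25, 0.27, 0.21])
ctr_dark_demo = np.array([0.26, 0.29, 0.31, 0.28, 0.30])

result = ttest_ind(ctr_light_demo, ctr_dark_demo, equal_var=False)
mean_diff = ctr_light_demo.mean() - ctr_dark_demo.mean()

# The sign of the statistic matches the sign of mean(light) - mean(dark)
print(result.statistic < 0, mean_diff < 0)
higher = "Dark Theme" if result.statistic < 0 else "Light Theme"
print(f"Higher mean CTR: {higher}")

# Swapping the argument order flips the sign but leaves the p-value unchanged
swapped = ttest_ind(ctr_dark_demo, ctr_light_demo, equal_var=False)
print(swapped.statistic, swapped.pvalue)
```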



# Hypothesis Testing based on two other metrics: bounce rate and scroll depth

In [25]:
# extracting bounce rates for both themes
bounce_rates_light = df[df["Theme"]=="Light Theme"]["Bounce Rate"]
bounce_rates_dark = df[df["Theme"]=="Dark Theme"]["Bounce Rate"]

In [27]:
# performing a two-sample t-test for bounce rate
t_stat_bounce, p_value_bounce = ttest_ind(bounce_rates_light, bounce_rates_dark, equal_var=False)
t_stat_bounce, p_value_bounce

(-1.2018883310494073, 0.229692077505148)

In [29]:
# extracting scroll depths for both themes
scroll_depth_light = df[df["Theme"]=="Light Theme"]["Scroll_Depth"]
scroll_depth_dark = df[df["Theme"]=="Dark Theme"]["Scroll_Depth"]

In [30]:
# performing a two-sample t-test for scroll depth
t_stat_scroll, p_value_scroll = ttest_ind(scroll_depth_light, scroll_depth_dark, equal_var=False)
t_stat_scroll, p_value_scroll

(0.7562277864140986, 0.4496919249484911)
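The extract-then-test pattern is repeated for every metric, so it can be folded into one function. This is a sketch assuming a hypothetical helper `compare_themes` and a made-up toy frame `demo`; on the notebook's data you would pass `df` and the real column names instead.

```python
import pandas as pd
from scipy.stats import ttest_ind


def compare_themes(data, metric, group_col="Theme",
                   first="Light Theme", second="Dark Theme"):
    """Welch's t-test of `metric` between two theme groups (hypothetical helper)."""
    a = data.loc[data[group_col] == first, metric]
    b = data.loc[data[group_col] == second, metric]
    result = ttest_ind(a, b, equal_var=False)
    return result.statistic, result.pvalue


# Hypothetical toy frame with the notebook's column names and made-up values
demo = pd.DataFrame({
    "Theme": ["Light Theme"] * 4 + ["Dark Theme"] * 4,
    "Bounce Rate": [0.40, 0.45, 0.50, 0.55, 0.42, 0.48, 0.52, 0.60],
    "Scroll_Depth": [60.0, 55.0, 50.0, 45.0, 52.0, 49.0, 47.0, 44.0],
})

# One (t, p) pair per metric; loop names avoid clobbering t_stat / p_value
results = {m: compare_themes(demo, m) for m in ["Bounce Rate", "Scroll_Depth"]}
for metric, (t_val, p_val) in results.items():
    print(f"{metric}: t={t_val:.3f}, p={p_val:.3f}")
```

Keeping the Light Theme as the default first group preserves the notebook's sign convention for t.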

In [31]:
# creating a table for comparison
comparison_table = pd.DataFrame({
    'Metric': ['Click Through Rate', 'Conversion Rate', 'Bounce Rate', 'Scroll Depth'],
    'T-Statistic': [t_stat_ctr, t_stat, t_stat_bounce, t_stat_scroll],
    'P-Value': [p_value_ctr, p_value, p_value_bounce, p_value_scroll]
})

comparison_table

               Metric  T-Statistic   P-Value
0  Click Through Rate    -1.978171  0.048184
1     Conversion Rate     0.474849  0.634998
2         Bounce Rate    -1.201888  0.229692
3        Scroll Depth     0.756228  0.449692
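To read the table against alpha at a glance, one can add a boolean significance flag and a column naming the theme with the higher sample mean (from the sign of t, with the Light Theme passed first to `ttest_ind`). A sketch using the rounded values from the table above; `flagged_table` and the "Significant" and "Higher Mean" columns are hypothetical additions, and in the notebook the same two lines could be applied to `comparison_table` directly.

```python
import pandas as pd

ALPHA = 0.05  # significance level chosen earlier in the notebook

# T-statistics and p-values copied (rounded) from the comparison table above
flagged_table = pd.DataFrame({
    "Metric": ["Click Through Rate", "Conversion Rate", "Bounce Rate", "Scroll Depth"],
    "T-Statistic": [-1.978171, 0.474849, -1.201888, 0.756228],
    "P-Value": [0.048184, 0.634998, 0.229692, 0.449692],
})

# Flag metrics whose p-value falls below the significance level
flagged_table["Significant"] = flagged_table["P-Value"] < ALPHA
# Positive t: Light Theme mean is higher; negative t: Dark Theme mean is higher
flagged_table["Higher Mean"] = flagged_table["T-Statistic"].map(
    lambda t: "Light Theme" if t > 0 else "Dark Theme"
)
print(flagged_table)
```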


In summary, the two themes perform similarly on most metrics. The Dark Theme has a slight edge in Click Through Rate (p ≈ 0.048, just below the 0.05 threshold). For the other key performance indicators (Conversion Rate, Bounce Rate, and Scroll Depth), the data show no statistically significant difference in user behaviour between the Light Theme and the Dark Theme.