## Problem Statement

An e-commerce company is evaluating two different website designs to see which one results in higher customer engagement. Design A is the current design, while Design B incorporates new features aimed at improving user experience. The company hypothesizes that Design B will lead to a higher average time spent on the website by users.

**Datasets:**
- current_design.csv: Contains data for user interactions with the current website design (Design A), with columns user_id and time_spent_minutes.
- new_design.csv: Contains data for user interactions with the new website design (Design B), with columns user_id and time_spent_minutes.

**Objective:**
- To determine whether Design B results in a higher average time spent on the website compared to Design A. 

**Steps to perform:**
- Set the null and alternative hypotheses for this analysis.
- Load the datasets current_design.csv and new_design.csv.
- Calculate the mean and standard deviation of the time spent for both designs.
- Determine the sizes of both groups.
- Calculate the z-score to compare the means of both groups.
- Set the significance level (alpha) at 5% for a right-tailed test.
- Calculate the critical z-value for the right-tailed test at the 5% significance level.
- Compare the calculated z-score with the critical z-value to decide whether to reject the null hypothesis.
- Write down your observations at the end.
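
The steps above can be sketched as one small helper. This is only an illustration under stated assumptions: `right_tailed_z_test` is a hypothetical name that does not appear in the notebook, and the demo arrays are made-up numbers, not the contents of the CSV files.

```python
import numpy as np
from scipy import stats as st


def right_tailed_z_test(sample_a, sample_b, alpha=0.05):
    """Hypothetical helper: test whether mean(sample_b) > mean(sample_a)."""
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)
    # Standard error of the difference in means, from sample variances (ddof=1)
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    z_score = (b.mean() - a.mean()) / se
    # Critical value that cuts off the upper alpha tail of the standard normal
    z_critical = st.norm.ppf(1 - alpha)
    return z_score, z_critical, bool(z_score > z_critical)


# Made-up data for illustration only (not the real datasets)
rng = np.random.default_rng(0)
demo_a = rng.normal(6.0, 0.6, size=100)
demo_b = rng.normal(8.0, 0.9, size=100)
z, z_crit, reject = right_tailed_z_test(demo_a, demo_b)
print(z, z_crit, reject)
```

The cells below perform the same calculation step by step on the real data.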

**Import Necessary Libraries**

In [1]:
import pandas as pd
import numpy as np 
import seaborn as sns 
from matplotlib import pyplot as plt 
from scipy import stats as st 

**Define hypothesis**



In [12]:
# H0: design_B_mean <= design_A_mean (Design B does not increase the average time spent)
# Ha: design_B_mean > design_A_mean (Design B increases the average time spent; right-tailed test)

**1: Load the datasets**

In [8]:
design_A_group = pd.read_csv("current_design.csv")
design_A_group.head()

Unnamed: 0,user_id,time_spent_minutes
0,C001,5.93
1,C002,5.21
2,C003,5.07
3,C004,5.06
4,C005,6.33


In [9]:
design_B_group = pd.read_csv("new_design.csv")
design_B_group.head()

Unnamed: 0,user_id,time_spent_minutes
0,T001,7.49
1,T002,7.37
2,T003,7.32
3,T004,6.85
4,T005,7.1
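
The `Unnamed: 0` header in the previews suggests that the CSV files may carry a saved row index as their first column, although this could also be only a display artifact. If it is a real column, one option is to read it as the index. The sketch below assumes that layout and uses an in-memory CSV built from the first rows shown above rather than the real file.

```python
import io

import pandas as pd

# Assumed file layout: an unnamed first column holding the old row index.
# The rows are copied from the preview above; the real file is not read here.
sample_csv = io.StringIO(
    ",user_id,time_spent_minutes\n"
    "0,C001,5.93\n"
    "1,C002,5.21\n"
    "2,C003,5.07\n"
)

# index_col=0 turns the first column into the index instead of a data column
preview = pd.read_csv(sample_csv, index_col=0)
print(preview.columns.tolist())
```

Either way, the analysis below only uses `time_spent_minutes`, so the extra column does not affect the results.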


**2: Calculate the mean and standard deviation of the time spent for both designs.**

In [25]:
# Design A (control group) summary statistics
design_A_mean = design_A_group.time_spent_minutes.mean()
design_A_std = design_A_group.time_spent_minutes.std()
design_A_size = design_A_group.shape[0]

design_A_mean,design_A_std,design_A_size


(6.015199999999998, 0.6182550877553322, 100)

In [26]:
# Design B (treatment group) summary statistics
design_B_mean = design_B_group.time_spent_minutes.mean()
design_B_std = design_B_group.time_spent_minutes.std()
design_B_size = design_B_group.shape[0]

design_B_mean,design_B_std, design_B_size

(8.062599999999998, 0.9025257711981236, 100)

**3: Test using rejection region (i.e. critical z value)**

In [27]:
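# Variance of each sample mean (squared standard error)
a = design_A_std**2 / design_A_size
b = design_B_std**2 / design_B_size

# Two-sample z-score for the difference in means (B minus A)
z_score = (design_B_mean - design_A_mean) / np.sqrt(a + b)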

a,b,z_score

(0.0038223935353535353, 0.008145527676767678, 18.715151117476786)
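
For reference, the quantity computed in this cell is the two-sample z statistic with unpooled variances. The symbols $\bar{x}$, $s$ and $n$ (sample mean, sample standard deviation and sample size) are notation introduced here, not names from the notebook:

$$
z = \frac{\bar{x}_B - \bar{x}_A}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}
  \approx \frac{8.0626 - 6.0152}{\sqrt{0.003822 + 0.008146}}
  \approx \frac{2.0474}{0.1094}
  \approx 18.72
$$

Using the sample standard deviations in place of the unknown population values is generally considered reasonable here because both groups have 100 observations.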

In [22]:
alpha = 0.05  # significance level for the right-tailed test

z_critical_value = st.norm.ppf(1 - alpha)

z_critical_value 

1.6448536269514722

In [23]:
z_score > z_critical_value

True
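
The same decision can also be reached through a p-value instead of a critical value. The sketch below is not part of the original steps; it hard-codes the z-score printed above so that it runs on its own, and uses `st.norm.sf` (the survival function, 1 - CDF) to get the right-tail probability.

```python
from scipy import stats as st

# z-score copied from the output above
z_score = 18.715151117476786
alpha = 0.05

# Right-tailed p-value: probability of a z at least this large under H0
p_value = st.norm.sf(z_score)
reject_null = bool(p_value < alpha)
print(p_value, reject_null)
```

A p-value below `alpha` leads to the same conclusion as `z_score > z_critical_value`.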

### Observations and Conclusion



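The calculated z-score (about 18.72) is far greater than the critical z-value (about 1.645), so it falls in the rejection region and we reject the null hypothesis at the 5% significance level.

The data therefore support the claim that Design B results in a higher average time spent on the website than Design A (about 8.06 minutes versus 6.02 minutes).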
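
As an optional cross-check, the comparison can be repeated with Welch's t-test, a different technique from the z-test used in this notebook that does not assume equal variances. The sketch below uses the summary statistics printed earlier and assumes SciPy 1.6 or later, where `ttest_ind_from_stats` accepts the `alternative` argument.

```python
from scipy import stats as st

# Summary statistics copied from the outputs above (std uses ddof=1)
design_A_mean, design_A_std, design_A_size = 6.015199999999998, 0.6182550877553322, 100
design_B_mean, design_B_std, design_B_size = 8.062599999999998, 0.9025257711981236, 100

# Welch's t-test, right-tailed: is the Design B mean greater than the Design A mean?
result = st.ttest_ind_from_stats(
    design_B_mean, design_B_std, design_B_size,
    design_A_mean, design_A_std, design_A_size,
    equal_var=False,
    alternative="greater",
)
t_stat, p_value = result.statistic, result.pvalue
print(t_stat, p_value)
```

The Welch statistic uses the same formula as the z-score, so it comes out at the same value; only the reference distribution differs.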