## Problem Statement

An e-commerce company is evaluating two different website designs to see which one results in higher customer engagement. Design A is the current design, while Design B incorporates new features aimed at improving user experience. The company hypothesizes that Design B will lead to a higher average time spent on the website by users.

**Datasets:**
- current_design.csv: Contains data for user interactions with the current website design (Design A), with columns user_id and time_spent_minutes.
- new_design.csv: Contains data for user interactions with the new website design (Design B), with columns user_id and time_spent_minutes.

**Objective:**
- To determine whether Design B results in a higher average time spent on the website compared to Design A.

**Steps to perform:**
- State the null and alternative hypotheses for this analysis.
- Load the datasets current_design.csv and new_design.csv.
- Calculate the mean and standard deviation of the time spent for both designs.
- Determine the sizes of both groups.
- Calculate the z-score to compare the means of both groups.
- Set the significance level (alpha) at 5% for a right-tailed test.
- Calculate the critical z-value for the right-tailed test at the 5% significance level.
- Compare the calculated z-score with the critical z-value to decide whether to reject the null hypothesis.
- Write down your observations at the end.
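
The steps above can be condensed into one helper, shown here as a minimal sketch rather than the notebook's own implementation. The function name `right_tailed_z_test` and the synthetic arrays are assumptions made for illustration; the real analysis below works on the two CSV files.

```python
import numpy as np
from scipy import stats


def right_tailed_z_test(control, test, alpha=0.05):
    """Two-sample z-test of H0: mean(test) <= mean(control) against Ha: mean(test) > mean(control)."""
    control = np.asarray(control, dtype=float)
    test = np.asarray(test, dtype=float)
    # Standard error of the difference in means, using sample variances (ddof=1)
    se = np.sqrt(control.var(ddof=1) / control.size + test.var(ddof=1) / test.size)
    # Keep the sign: a positive z means the test group has the larger mean
    z_score = (test.mean() - control.mean()) / se
    # Critical value that leaves alpha in the right tail of the standard normal
    z_critical = stats.norm.ppf(1 - alpha)
    return z_score, z_critical, bool(z_score > z_critical)


# Synthetic data for illustration only; these are NOT the values in the CSV files.
rng = np.random.default_rng(0)
demo_control = rng.normal(loc=6.0, scale=0.6, size=100)
demo_test = rng.normal(loc=8.0, scale=0.9, size=100)

z_score, z_critical, reject_null = right_tailed_z_test(demo_control, demo_test)
print(f"z = {z_score:.2f}, critical z = {z_critical:.3f}, reject H0: {reject_null}")
```

With real data, the two `time_spent_minutes` columns would be passed in place of the synthetic arrays.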

**Import Necessary Libraries**

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

**Define hypotheses**

#### Null hypothesis (H0): Design B does not lead to a higher average time spent than Design A.
#### Alternative hypothesis (Ha): Design B leads to a higher average time spent than Design A.
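
In symbols, the same pair of hypotheses can be written as follows. The notation $\mu_A$ and $\mu_B$ (the population mean time spent under Design A and Design B) is introduced here for convenience and is not used elsewhere in the notebook.

$$
H_0:\ \mu_B \le \mu_A
\qquad \text{versus} \qquad
H_a:\ \mu_B > \mu_A
$$

Because the alternative points in only one direction, the rejection region lies entirely in the right tail of the standard normal distribution.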

**1: Load the datasets**

In [2]:
current_design = pd.read_csv('current_design.csv')
print(current_design.shape)
current_design.head()

(100, 2)


|   | user_id | time_spent_minutes |
|---|---------|--------------------|
| 0 | C001 | 5.93 |
| 1 | C002 | 5.21 |
| 2 | C003 | 5.07 |
| 3 | C004 | 5.06 |
| 4 | C005 | 6.33 |


In [3]:
new_design = pd.read_csv('new_design.csv')
print(new_design.shape)
new_design.head()

(100, 2)


|   | user_id | time_spent_minutes |
|---|---------|--------------------|
| 0 | T001 | 7.49 |
| 1 | T002 | 7.37 |
| 2 | T003 | 7.32 |
| 3 | T004 | 6.85 |
| 4 | T005 | 7.1 |


**2: Calculate the mean, standard deviation, and size of both groups.**

In [7]:
#control statistics
control_mean=current_design.time_spent_minutes.mean()
print("Control Mean: ",control_mean)
control_std=current_design.time_spent_minutes.std()
print("Control Std Dev: ",control_std)
control_size=current_design.shape[0]
print("Control Size: ",control_size)

Control Mean:  6.015199999999998
Control Std Dev:  0.6182550877553322
Control Size:  100


In [8]:
#test statistics
test_mean=new_design.time_spent_minutes.mean()
print("Test Mean: ",test_mean)
test_std=new_design.time_spent_minutes.std()
print("Test Std Dev: ",test_std)
test_size=new_design.shape[0]
print("Test Size: ",test_size)


Test Mean:  8.062599999999998
Test Std Dev:  0.9025257711981236
Test Size:  100


**3: Test using rejection region (i.e. critical z value)**

In [11]:
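#variance of each sample mean (squared standard error)
a=control_std**2/control_size
b=test_std**2/test_size
#right-tailed test: keep the sign so that a positive z favours Design B
z_score=(test_mean-control_mean)/np.sqrt(a+b)
print("Z Score: ",z_score)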

Z Score:  18.715151117476786
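
For reference, the statistic computed in this cell is the two-sample z-score. It is written here with Design B first, so that a positive value favours the alternative hypothesis. The symbols $\bar{x}$, $s$ and $n$ (sample mean, sample standard deviation and group size) are notation assumed for this note. Plugging in the rounded summary statistics printed above gives approximately the same value:

$$
z = \frac{\bar{x}_B - \bar{x}_A}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}
  \approx \frac{8.0626 - 6.0152}{\sqrt{\dfrac{0.6183^2}{100} + \dfrac{0.9025^2}{100}}}
  \approx \frac{2.0474}{0.1094}
  \approx 18.7
$$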


In [16]:
alpha=0.05
z_critical=stats.norm.ppf(1-alpha)
z_critical

1.644853626951472

In [15]:
z_score>z_critical

True
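
The same decision can be cross-checked with a p-value instead of a critical value. The sketch below is not part of the original notebook. It re-enters the summary statistics printed above as literals so that it runs on its own, and it uses `scipy.stats.norm.sf` (the survival function, `1 - cdf`) for the right-tail probability.

```python
from math import sqrt

from scipy import stats

# Summary statistics copied from the outputs printed earlier in this notebook
control_mean, control_std, control_size = 6.015199999999998, 0.6182550877553322, 100
test_mean, test_std, test_size = 8.062599999999998, 0.9025257711981236, 100

# Standard error of the difference between the two sample means
standard_error = sqrt(control_std**2 / control_size + test_std**2 / test_size)
z_score = (test_mean - control_mean) / standard_error

# Right-tailed p-value: P(Z >= z_score) under the standard normal distribution
p_value = stats.norm.sf(z_score)

alpha = 0.05
print("Z Score: ", z_score)
print("p-value: ", p_value)
print("Reject H0: ", p_value < alpha)
```

Rejecting when `p_value < alpha` is equivalent to rejecting when `z_score > z_critical`, so both approaches should always agree.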

### Observations and Conclusion



#### Since the calculated z-score (about 18.72) is greater than the critical z-value (about 1.645), we reject the null hypothesis at the 5% significance level. The data provide strong evidence that Design B has a higher average time spent than Design A (about 8.06 minutes versus 6.02 minutes).