## Problem Statement

An e-commerce company is evaluating two different website designs to see which one results in higher customer engagement. Design A is the current design, while Design B incorporates new features aimed at improving user experience. The company hypothesizes that Design B will lead to a higher average time spent on the website by users.

**Datasets:**
- current_design.csv: Contains data for user interactions with the current website design (Design A), with columns user_id and time_spent_minutes.
- new_design.csv: Contains data for user interactions with the new website design (Design B), with columns user_id and time_spent_minutes.

**Objective:**
- To determine whether Design B results in a higher average time spent on the website compared to Design A.

### Null hypothesis: The average time spent is the same for both designs (current and new).
### Alternate hypothesis: Design B leads to a higher average time spent on the website by users.

**Steps to perform:**
- Set the null and alternate hypothesis for this analysis.
- Load the datasets current_design.csv and new_design.csv.
- Calculate the mean and standard deviation of the time spent for both designs.
- Determine the sizes of both groups.
- Calculate the z-score to compare the means of both groups.
- Set the significance level (alpha) at 5% for a right-tailed test.
- Calculate the critical z-value for the right-tailed test at the 5% significance level.
- Compare the calculated z-score with the critical z-value to decide whether to reject the null hypothesis.
- Write down your observations at the end.
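The steps above can be sketched as a single helper that works from summary statistics alone. This is a minimal sketch, not the notebook's own code: the function name `two_sample_z_test`, its parameters, and the example numbers are hypothetical, and it uses the standard library's `statistics.NormalDist` for the critical value so it runs without the CSV files.

```python
import math
from statistics import NormalDist


def two_sample_z_test(mean_a, std_a, n_a, mean_b, std_b, n_b, alpha=0.05):
    """Right-tailed z-test for H1: mean_b > mean_a, from summary statistics."""
    # Variance of each sample mean (the squared standard error).
    var_mean_a = std_a ** 2 / n_a
    var_mean_b = std_b ** 2 / n_b
    # Under the null hypothesis the true difference in means is 0.
    z_score = (mean_b - mean_a) / math.sqrt(var_mean_a + var_mean_b)
    # Critical value that leaves `alpha` in the right tail.
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return z_score, z_crit, z_score > z_crit


# Hypothetical placeholder statistics, not values from the CSV files.
z_score, z_crit, reject = two_sample_z_test(
    mean_a=5.0, std_a=1.0, n_a=50, mean_b=5.5, std_b=1.2, n_b=50
)
print(z_score, z_crit, reject)
```

Passing the means, standard deviations, and group sizes computed later in the notebook should reproduce the same decision as the step-by-step cells.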

**Import Necessary Libraries**

### Current Website

In [1]:
import pandas as pd

df_current = pd.read_csv("current_design.csv")
df_current.head()

Unnamed: 0,user_id,time_spent_minutes
0,C001,5.93
1,C002,5.21
2,C003,5.07
3,C004,5.06
4,C005,6.33


In [8]:
current_SIZE = df_current.shape[0]
current_SIZE

100

In [3]:
df_current_mean = df_current.time_spent_minutes.mean()
df_current_std = df_current.time_spent_minutes.std()

df_current_mean, df_current_std

(6.015199999999998, 0.6182550877553322)

### New Website

In [4]:
df_new = pd.read_csv("new_design.csv")
df_new.head()

Unnamed: 0,user_id,time_spent_minutes
0,T001,7.49
1,T002,7.37
2,T003,7.32
3,T004,6.85
4,T005,7.1


In [5]:
new_SIZE = df_new.shape[0]
new_SIZE

100

In [6]:
df_new_mean = df_new.time_spent_minutes.mean()
df_new_std = df_new.time_spent_minutes.std()

df_new_mean, df_new_std

(8.062599999999998, 0.9025257711981236)

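**Define hypothesis**

- Null hypothesis (H0): the mean time spent is the same for Design A and Design B.
- Alternate hypothesis (H1): the mean time spent is higher for Design B than for Design A.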

**1: Load the datasets**

**2: Calculate the mean and standard deviation of the time spent for both designs.**

In [9]:
# control group (Design A): variance of the sample mean, i.e. the squared standard error

control_variance = df_current_std**2 / current_SIZE
control_variance

0.0038223935353535353

In [12]:
# test group (Design B): variance of the sample mean, i.e. the squared standard error

test_variance = df_new_std**2 / new_SIZE
test_variance

0.008145527676767678

In [None]:
# finding the z-score for the difference between the two sample means

In [18]:
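# variance of the difference in means (the two groups are independent, so the variances add)
collective_variance = control_variance + test_variance
# observed difference in means: new design minus current design
collective_mean = df_new_mean - df_current_mean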

collective_mean, collective_variance

(2.0473999999999997, 0.011967921212121212)

In [15]:
# calculating the collective z-score

In [20]:
import numpy as np

population_mean = 0  # difference in means expected under the null hypothesis

z_score = (collective_mean - population_mean) / np.sqrt(collective_variance)
z_score

18.715151117476786
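The calculation above can be written as a single formula. The symbols are notation introduced here for illustration and do not appear in the notebook: $\bar{x}_A, \bar{x}_B$ are the sample means, $s_A, s_B$ the sample standard deviations, and $n_A, n_B$ the group sizes. The numbers are the rounded outputs of the cells above.

```latex
z = \frac{(\bar{x}_B - \bar{x}_A) - 0}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}
  = \frac{2.0474}{\sqrt{0.0038224 + 0.0081455}}
  \approx 18.72
```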

In [None]:
# finding z_critical and comparing it with z_score

In [21]:
from scipy import stats as st

# For a significance level of 5% (0.05) in a right-tailed test, the critical Z-value is approximately 1.645

alpha = 0.05 # significance level of 5%

z_crit = st.norm.ppf(1 - alpha)
z_crit

1.644853626951472

**3: Test using rejection region (i.e. critical z value)**

In [22]:
z_score > z_crit

True
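The same decision can also be reached through a p-value instead of a critical value. The sketch below is an optional cross-check, not part of the original steps: it assumes the z-score printed above (copied in as a rounded literal so the cell is self-contained) and uses `scipy.stats.norm.sf`, the survival function, to get the right-tail probability.

```python
from scipy import stats as st

alpha = 0.05  # significance level of 5%

# z-score copied from the notebook output above (rounded).
z_score = 18.715

# Right-tailed p-value: P(Z >= z_score) under the standard normal.
# sf is 1 - cdf, but stays accurate far out in the tail.
p_value = st.norm.sf(z_score)

# Reject the null hypothesis when the p-value is below alpha.
reject_null = p_value < alpha
print(p_value, reject_null)
```

Comparing `p_value` with `alpha` is equivalent to comparing `z_score` with `z_crit`, so both approaches should always agree.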

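### Observations and Conclusion

- Both groups contain 100 users. The mean time spent is about 6.02 minutes (standard deviation about 0.62) for Design A and about 8.06 minutes (standard deviation about 0.90) for Design B.
- The difference in means is about 2.05 minutes, which gives a z-score of about 18.72.
- The z-score is far above the critical value of about 1.645 for a right-tailed test at the 5% significance level, so we reject the null hypothesis.
- The data support the hypothesis that Design B leads to a higher average time spent on the website than Design A.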

