## Problem Statement

An e-commerce company is evaluating two different website designs to see which one results in higher customer engagement. Design A is the current design, while Design B incorporates new features aimed at improving user experience. The company hypothesizes that Design B will lead to a higher average time spent on the website by users.

**Datasets:**
- current_design.csv: Contains data for user interactions with the current website design (Design A), with columns user_id and time_spent_minutes.
- new_design.csv: Contains data for user interactions with the new website design (Design B), with columns user_id and time_spent_minutes.

**Objective:**
- To determine whether Design B results in a higher average time spent on the website compared to Design A.

**Steps to perform:**
- State the null and alternative hypotheses for this analysis.
- Load the datasets current_design.csv and new_design.csv.
- Calculate the mean and standard deviation of the time spent for both designs.
- Determine the sizes of both groups.
- Calculate the z-score to compare the means of both groups.
- Set the significance level (alpha) at 5% for a right-tailed test.
- Calculate the critical z-value for the right-tailed test at the 5% significance level.
- Compare the calculated z-score with the critical z-value to decide whether to reject the null hypothesis.
- Write down your observations at the end.

**Import Necessary Libraries**

In [1]:
import pandas as pd
import numpy as np
import scipy.stats as st

**Define hypothesis**



1. Null hypothesis -> The average time spent on the website is the same for Design A and Design B.
2. Alternative hypothesis -> Design B results in a higher average time spent on the website than Design A.
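
The two hypotheses can also be written in symbols. The notation below is introduced here for illustration and is not part of the original notebook: $\mu_A$ and $\mu_B$ stand for the population mean time spent (in minutes) under Design A and Design B.

```latex
H_0:\ \mu_B = \mu_A
\qquad \text{versus} \qquad
H_1:\ \mu_B > \mu_A
```

Because the alternative is one-sided ($\mu_B > \mu_A$), only large positive differences in favour of Design B count as evidence against $H_0$, which is why a right-tailed test is used.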

**1: Load the datasets**

In [2]:
control = pd.read_csv('current_design.csv')
test = pd.read_csv('new_design.csv')

In [3]:
control.head()


| Unnamed: 0 | user_id | time_spent_minutes |
|---|---|---|
| 0 | C001 | 5.93 |
| 1 | C002 | 5.21 |
| 2 | C003 | 5.07 |
| 3 | C004 | 5.06 |
| 4 | C005 | 6.33 |


**2: Calculate the mean and standard deviation of the time spent for both designs.**

In [5]:
#control statistics

control_mean = control.time_spent_minutes.mean().round(2)
control_std = control.time_spent_minutes.std().round(2)
control_size = control.shape[0]

control_mean, control_std, control_size

(np.float64(6.02), np.float64(0.62), 100)

In [6]:
#test statistics

test_mean = test.time_spent_minutes.mean().round(2)
test_std = test.time_spent_minutes.std().round(2)
test_size = test.shape[0]

test_mean, test_std, test_size

(np.float64(8.06), np.float64(0.9), 100)

**3: Test using rejection region (i.e. critical z value)**

In [7]:
# variance of each sample mean (squared standard error per group)
a = control_std**2/control_size
b = test_std**2/test_size

In [16]:
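# right-tailed test: use (test - control) and keep the sign,
# so that only a positive z-score supports the alternative hypothesis
z_score = (test_mean - control_mean)/np.sqrt(a+b)
z_score = z_score.round(2)
z_score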

np.float64(18.67)
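
As a check on the value above, the two-sample z-statistic can be worked out by hand from the rounded summary statistics. The symbols ($\bar{x}$, $s$, $n$ with subscripts $A$ and $B$) are notation assumed here for the sample means, standard deviations, and group sizes of the two designs.

```latex
z = \frac{\bar{x}_B - \bar{x}_A}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}
  = \frac{8.06 - 6.02}{\sqrt{\dfrac{0.62^2}{100} + \dfrac{0.90^2}{100}}}
  = \frac{2.04}{\sqrt{0.003844 + 0.0081}}
  \approx \frac{2.04}{0.1093}
  \approx 18.67
```

The difference is taken as Design B minus Design A so that a positive $z$ points in the direction of the alternative hypothesis.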

In [17]:
alpha = 0.05

In [18]:
critical_zscore = st.norm.ppf(1 - alpha).round(2)
critical_zscore

np.float64(1.64)

In [20]:
print(z_score > critical_zscore)

True


### Observations and Conclusion



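Since the calculated z-score (18.67) is greater than the critical z-value (1.64), we reject the null hypothesis at the 5% significance level and conclude that Design B results in a higher average time spent on the website than Design A.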

In [22]:
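# right-tailed p-value; for a z-score this large, cdf rounds to 1.0,
# so the subtraction returns 0.0 rather than the true (tiny) probability
p_value = 1 - st.norm.cdf(z_score)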

In [23]:
p_value

np.float64(0.0)

In [24]:
print(p_value < alpha)

True
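
The p-value prints as exactly 0.0 because `st.norm.cdf` returns 1.0 in floating point for a z-score this far in the tail. The sketch below gathers the steps of this analysis into one function and uses `st.norm.sf` (the survival function, equal to 1 - cdf) so the tail probability is not lost to rounding. The function name `right_tailed_z_test` and its parameters are hypothetical, introduced only for illustration, and it takes the rounded summary statistics reported above rather than the CSV files.

```python
import numpy as np
import scipy.stats as st


def right_tailed_z_test(mean_a, std_a, n_a, mean_b, std_b, n_b, alpha=0.05):
    """Two-sample z-test of H1: mean_b > mean_a (hypothetical helper)."""
    # Standard error of the difference between the two sample means
    se = np.sqrt(std_a**2 / n_a + std_b**2 / n_b)
    # Signed statistic: positive values favour group B
    z = (mean_b - mean_a) / se
    # Critical value for a right-tailed test at level alpha
    z_crit = st.norm.ppf(1 - alpha)
    # Survival function: same quantity as 1 - cdf, but accurate in the far tail
    p_value = st.norm.sf(z)
    return z, z_crit, p_value, bool(z > z_crit)


# Summary statistics reported above for Design A (control) and Design B (test)
z, z_crit, p_value, reject = right_tailed_z_test(6.02, 0.62, 100, 8.06, 0.90, 100)
print(f"z = {z:.2f}, critical z = {z_crit:.2f}, reject H0: {reject}")
print(f"p-value = {p_value:.3e}")
```

Both decision rules agree here: the z-score exceeds the critical value, and the p-value is far below alpha. Keeping the sign of the statistic, rather than taking its absolute value, means the same function would correctly fail to reject if Design B had performed worse.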
