# Hypothesis testing

### **1. Hypothesis: Completion Rate**

**Objective:** Determine if the difference in completion rates between the Test and Control groups is statistically significant.

**Hypotheses:**

•	**Null Hypothesis (H0):** The completion rate is the same in the Test and Control groups.

•	**Alternative Hypothesis (H1):** The completion rate is higher in the Test group than in the Control group.

In [10]:
```python
# Import libraries
import pandas as pd
from scipy.stats import chi2_contingency

# Load the Test and Control datasets
test_data = pd.read_csv('/Users/alexandreribeiro/Desktop/Ironhacks Booty/5th week/Project/Datasets/df_test_random.csv')
control_data = pd.read_csv('/Users/alexandreribeiro/Desktop/Ironhacks Booty/5th week/Project/Datasets/df_control_random.csv')
```


### Explanation:

We chose the Chi-Square test for the following reasons:

1.	Categorical Data: Completion is a binary categorical outcome: each record either reaches the `confirm` step or it does not.

2.	Comparison of Proportions: The Chi-Square test is suitable for comparing the proportions of categorical outcomes between two groups (Test and Control).

In [11]:
```python
# Calculate completion rates
completion_rate_test = (test_data['process_step'] == 'confirm').mean()
completion_rate_control = (control_data['process_step'] == 'confirm').mean()

print(f"Test Group Completion Rate: {completion_rate_test * 100}")
print(f"Control Group Completion Rate: {completion_rate_control * 100}")

# Create a contingency table for the Chi-Square test
contingency_table = pd.crosstab(pd.Series(['Test']*len(test_data) + ['Control']*len(control_data)),
                                pd.Series((test_data['process_step'] == 'confirm').tolist() +
                                          (control_data['process_step'] == 'confirm').tolist()))

contingency_table
```

Test Group Completion Rate: 14.358
Control Group Completion Rate: 12.272


| Group | Not completed (False) | Completed (True) |
|---|---|---|
| Control | 87728 | 12272 |
| Test | 85642 | 14358 |


In [12]:
```python
# Perform the Chi-Square test
# chi2_contingency returns (statistic, p-value, degrees of freedom, expected frequencies);
# the 3rd and 4th outputs are not needed here.
chi2, p, _, _ = chi2_contingency(contingency_table)

print(f"Chi-Square Statistic: {chi2}, P-value: {p}")
```

Chi-Square Statistic: 188.32023986260222, P-value: 7.395963406364342e-43


### Conclusions:

1.	Chi-Square Statistic:

The Chi-Square statistic of 188.32 is far above 3.84, the critical value at the 0.05 level for a 2×2 table (1 degree of freedom). The observed completion counts therefore deviate strongly from the counts expected if completion were independent of group.

2.	P-value:

The p-value (7.40e-43) is far below the conventional significance level of 0.05, so we reject the null hypothesis.
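The reported statistic can be reproduced from the four counts in the contingency table alone. The sketch below is a minimal check, not the notebook's own code: the array name `observed` is introduced here for illustration, and the counts are copied from the table output. It relies on the fact that `chi2_contingency` applies Yates' continuity correction to 2×2 tables by default, and also runs the uncorrected test for comparison.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Counts copied from the contingency table output
# (rows: Control, Test; columns: not completed, completed).
observed = np.array([[87728, 12272],
                     [85642, 14358]])

# Default behaviour: Yates' continuity correction is applied for 2x2 tables.
chi2_corrected, p_corrected, dof, expected = chi2_contingency(observed)

# Same test without the continuity correction, for comparison.
chi2_plain, p_plain, _, _ = chi2_contingency(observed, correction=False)

print(f"degrees of freedom: {dof}")
print(f"expected counts under independence:\n{expected}")
print(f"corrected:   chi2 = {chi2_corrected:.2f}, p = {p_corrected:.3g}")
print(f"uncorrected: chi2 = {chi2_plain:.2f}, p = {p_plain:.3g}")
```

With samples this large the correction barely moves the statistic, so the conclusion does not depend on it.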

Conclusion:

- Significant Difference: There is a statistically significant difference in completion rates between the Test and Control groups.

- Higher Completion Rate in Test Group: The Chi-Square test shows that the rates differ, but not in which direction; the direction comes from the observed rates. Because the Test group's rate (14.358%) is above the Control group's (12.272%), we conclude that the new design (Test group) has a significantly higher completion rate than the old design (Control group).
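Because H1 is directional while the Chi-Square test is not, a one-sided two-proportion z-test is a natural cross-check. The following is a sketch under stated assumptions rather than part of the original analysis: the function name `one_sided_two_proportion_ztest` is hypothetical, the counts are taken from the contingency table, and the test uses the usual pooled standard error under H0.

```python
import math
from scipy.stats import norm

def one_sided_two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Pooled z-test of H0: rate_a == rate_b against H1: rate_a > rate_b."""
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    # Under H0 both groups share one rate, so pool the two samples.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Upper-tail probability: only a higher rate in group A counts as evidence for H1.
    p_value = norm.sf(z)
    return z, p_value

# Completed / total counts from the contingency table (Test first, then Control).
z_stat, p_one_sided = one_sided_two_proportion_ztest(14358, 100000, 12272, 100000)
print(f"z = {z_stat:.2f}, one-sided p = {p_one_sided:.3g}")
```

The square of this z statistic equals the uncorrected Chi-Square statistic, so the two tests agree on significance; the z-test adds the direction.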

Summary of Hypothesis 1: Completion Rate

Objective: Determine if the difference in completion rates between the Test and Control groups is statistically significant.

Hypotheses:

- Null Hypothesis (H0): The completion rate is the same in the Test and Control groups.
- Alternative Hypothesis (H1): The completion rate is higher in the Test group than in the Control group.

Results:

- Test Group Completion Rate: 0.14358 (14.358%)
- Control Group Completion Rate: 0.12272 (12.272%)
- Chi-Square Statistic: 188.32
- P-value: 7.40e-43

Conclusion:

- Reject H0: The null hypothesis is rejected.

- Significant Increase: The completion rate in the Test group is significantly higher than in the Control group.

### **2. Hypothesis: Completion Rate with Cost-Effectiveness Threshold**

**Objective:** Determine whether the observed increase in completion rate from the A/B test meets or exceeds the 5% threshold for cost-effectiveness.

**Hypotheses:**

•	**Null Hypothesis (H0):** The increase in completion rate is less than 5%.

•	**Alternative Hypothesis (H1):** The increase in completion rate is at least 5%.

**Threshold:** Minimum increase of 5% in completion rate to justify the costs of the new design.
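One way to test against a threshold is a one-sided z-test on the difference in rates, shifted by the threshold. The sketch below is illustrative only and rests on assumptions not stated in the text: it reads "5%" as an absolute difference of 5 percentage points, the function name `lift_exceeds_threshold_test` is hypothetical, and the counts are those from the contingency table in section 1. It also prints the relative lift, since the threshold could instead be meant relative to the Control rate.

```python
import math
from scipy.stats import norm

def lift_exceeds_threshold_test(successes_test, n_test,
                                successes_control, n_control, threshold):
    """One-sided z-test of H1: (rate_test - rate_control) > threshold.

    ASSUMPTION: `threshold` is an absolute difference in rates
    (0.05 means 5 percentage points), not a relative increase.
    """
    rate_test = successes_test / n_test
    rate_control = successes_control / n_control
    diff = rate_test - rate_control
    # Unpooled standard error: the hypothesised difference is not zero,
    # so the two groups are not assumed to share a common rate.
    std_err = math.sqrt(rate_test * (1 - rate_test) / n_test
                        + rate_control * (1 - rate_control) / n_control)
    z = (diff - threshold) / std_err
    p_value = norm.sf(z)  # small only if the lift is clearly above the threshold
    return diff, z, p_value

# Counts from the section 1 contingency table; threshold read as 5 percentage points.
abs_lift, z_threshold, p_threshold = lift_exceeds_threshold_test(
    14358, 100000, 12272, 100000, threshold=0.05)
relative_lift = abs_lift / (12272 / 100000)

print(f"absolute lift: {abs_lift:.5f}, relative lift: {relative_lift:.3%}")
print(f"z = {z_threshold:.2f}, one-sided p = {p_threshold:.3g}")
```

With these counts the absolute lift is about 2.09 percentage points and the relative lift about 17%, so the outcome depends on which reading of the 5% threshold is intended.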