In [None]:
from snowflake.snowpark.context import get_active_session

In [None]:
import pandas as pd

In [None]:
session = get_active_session()

In [None]:
query = "SELECT * FROM DBT_DB.DBT_SCHEMA.FACT_AB_TEST_DATA"
df = session.sql(query).to_pandas()

df.head()

In [None]:
df.info()
df.describe()

In [None]:
df.isnull().sum()

An A/B test typically frames two hypotheses:
- Null Hypothesis (H₀): There is no significant difference between Group A and Group B.
- Alternative Hypothesis (H₁): There is a significant difference between Group A and Group B.

For this analysis:
- H₀: The mean Total Paid Amount is the same for Group A and Group B.
- H₁: The mean Total Paid Amount is different between Group A and Group B.

In [None]:
# Check the count of observations in each test group
df["TEST_GROUP"].value_counts()

In [None]:
from scipy.stats import shapiro

# Perform normality test for both groups
stat_A, p_A = shapiro(df[df["TEST_GROUP"] == "A"]["TOTAL_PAID_AMOUNT"])
stat_B, p_B = shapiro(df[df["TEST_GROUP"] == "B"]["TOTAL_PAID_AMOUNT"])

print(f"Group A - p-value: {p_A}")
print(f"Group B - p-value: {p_B}")

In [None]:
from scipy.stats import mannwhitneyu

# Extract values for each group
group_A = df[df["TEST_GROUP"] == "A"]["TOTAL_PAID_AMOUNT"]
group_B = df[df["TEST_GROUP"] == "B"]["TOTAL_PAID_AMOUNT"]

# Perform Mann-Whitney U test
stat, p_value = mannwhitneyu(group_A, group_B, alternative="two-sided")

print(f"Mann-Whitney U test statistic: {stat}")
print(f"P-value: {p_value}")

### What does this mean?
- Since the p-value is greater than 0.05, we fail to reject the null hypothesis (H₀).
- This means the test found no statistically significant difference in Total Paid Amount between Group A and Group B.

### Conclusion

Based on this test, the change tested in Group B did not significantly impact the Total Paid Amount compared to Group A.


In [None]:
import seaborn as sns
import matplotlib.pyplot as plt



In [None]:
plt.figure(figsize=(8,5))
sns.boxplot(x="TEST_GROUP", y="TOTAL_PAID_AMOUNT", data=df)
plt.title("Total Paid Amount Distribution by Test Group")
plt.xlabel("Test Group")
plt.ylabel("Total Paid Amount")
plt.show()

In [None]:
plt.figure(figsize=(8,5))
sns.histplot(df[df["TEST_GROUP"]=="A"]["TOTAL_PAID_AMOUNT"], color="blue", label="Group A", kde=True, bins=30)
sns.histplot(df[df["TEST_GROUP"]=="B"]["TOTAL_PAID_AMOUNT"], color="red", label="Group B", kde=True, bins=30)
plt.title("Total Paid Amount Distribution")
plt.xlabel("Total Paid Amount")
plt.ylabel("Frequency")
plt.legend()
plt.show()

## A/B Test Summary

### Objective

The A/B test aimed to determine whether there was a significant difference in Total Paid Amount between Test Group A and Test Group B.

### Hypotheses
- Null Hypothesis (H₀): There is no significant difference in the mean Total Paid Amount between Group A and Group B.
- Alternative Hypothesis (H₁): There is a significant difference in the mean Total Paid Amount between Group A and Group B.

### Key Steps
1. Data Exploration & Cleaning
   - Checked for missing values (none found).
   - Verified the distribution of samples across the test groups (Group A: 106, Group B: 113).
2. Normality Test (Shapiro-Wilk)
   - Both groups had p-values < 0.05, indicating that the data is not normally distributed.
   - This justified the use of a non-parametric test.
3. Statistical Testing (Mann-Whitney U Test)
   - p-value = 0.709, meaning no significant difference between the two groups in terms of Total Paid Amount.
4. Data Visualization
   - Boxplot confirmed the presence of outliers.
   - Histogram with KDE plot showed skewed distributions in both groups.
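
The decision logic in steps 2 and 3 can be sketched as a single helper. This is a minimal sketch, not the code the notebook ran: the function name `compare_groups`, the `alpha` default, and the fallback to Welch's t-test when both groups look normal are assumptions added for illustration. The sample arrays are synthetic and stand in for the two `TOTAL_PAID_AMOUNT` series.

```python
import numpy as np
from scipy.stats import mannwhitneyu, shapiro, ttest_ind


def compare_groups(a, b, alpha=0.05):
    """Hypothetical helper: choose a two-sample test from Shapiro-Wilk results."""
    _, p_a = shapiro(a)
    _, p_b = shapiro(b)
    # Treat the data as normal only if neither group rejects normality.
    if p_a >= alpha and p_b >= alpha:
        # Assumption: Welch's t-test as the parametric option (not used in the notebook).
        stat, p = ttest_ind(a, b, equal_var=False)
        test = "welch_t"
    else:
        stat, p = mannwhitneyu(a, b, alternative="two-sided")
        test = "mann_whitney_u"
    return {
        "test": test,
        "statistic": float(stat),
        "p_value": float(p),
        "reject_h0": bool(p < alpha),
    }


# Synthetic, strongly skewed samples (illustrative only, not the real data).
rng = np.random.default_rng(0)
sample_a = rng.exponential(scale=100.0, size=200)
sample_b = rng.exponential(scale=100.0, size=200)
result = compare_groups(sample_a, sample_b)
print(result["test"], result["p_value"])
```

With the notebook's data, the same call would be `compare_groups(group_A, group_B)`, using the series defined in the Mann-Whitney cell.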

### Conclusion
- Since p > 0.05, we fail to reject the null hypothesis.
- The results suggest that there is no statistically significant difference in Total Paid Amount between Group A and Group B.
- Effect size was not calculated due to package limitations, but based on the p-value, there is no strong evidence of an impact.
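
If an effect size is wanted later, one option that needs only NumPy and SciPy's `rankdata` is the rank-biserial correlation, derived from the Mann-Whitney U statistic. The sketch below is an assumption about how it could be done, not part of the original analysis; `rank_biserial` is a hypothetical helper name. The value ranges from -1 (every value in the first group is below every value in the second) to +1 (the reverse), with 0 meaning no tendency either way.

```python
import numpy as np
from scipy.stats import rankdata


def rank_biserial(a, b):
    """Hypothetical helper: rank-biserial correlation for two independent samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    # Rank the pooled values; ties receive their average rank.
    ranks = rankdata(np.concatenate([a, b]))
    # U for the first sample: pairs where a > b, counting ties as one half.
    u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2
    return 2 * u1 / (n1 * n2) - 1


# Tiny illustrative inputs (not the real data).
print(rank_biserial([1, 2, 3], [4, 5, 6]))  # first group entirely lower
print(rank_biserial([4, 5, 6], [1, 2, 3]))  # first group entirely higher
```

On the notebook's data this would be called as `rank_biserial(group_A, group_B)`; a value near 0 would be consistent with the non-significant p-value reported above.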