# A/B Testing Project

In [5]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import missingno as msno

In [6]:
Control = pd.read_csv('/Users/noah/Desktop/Work on Resume/A:B Testing/archive (3)/control_group.csv', delimiter=";")
test = pd.read_csv('/Users/noah/Desktop/Work on Resume/A:B Testing/archive (3)/test_group.csv', delimiter=";")

# EDA

In [None]:
# Check the dimensions of the dataset (number of rows and columns)
print(f"Control Group Shape: {Control.shape}")
print(f"Test Group Shape: {test.shape}")

# Display the first few rows of the dataset
print("Control Group Sample:")
print(Control.head())

print("Test Group Sample:")
print(test.head())


In [None]:
# Check for missing values in both the control and test datasets
print("Control Group Missing Values:")
print(Control.isnull().sum())

print("Test Group Missing Values:")
print(test.isnull().sum())

# Visualize missing values using missingno
msno.matrix(Control)
msno.matrix(test)

In [None]:
# Drop rows with missing values from the control group
Control.dropna(inplace=True)

# Summary statistics for control group
print("Control Group Summary:")
print(Control.describe())

# Summary statistics for test group
print("Test Group Summary:")
print(test.describe())

# Evaluating Metrics

In [None]:
# Summary statistics for all key metrics
metrics = ['# of Impressions', '# of Website Clicks', '# of Searches', '# of View Content', '# of Add to Cart', '# of Purchase']

# Display summary statistics for control and test groups
print("Control Group Summary for Key Metrics:")
print(Control[metrics].describe())

print("\nTest Group Summary for Key Metrics:")
print(test[metrics].describe())
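Reading direction off two separate `describe()` tables is error-prone, so a one-table comparison of group means can help. The sketch below is a hypothetical helper (`relative_lift` is not part of the notebook) that reports each metric's control mean, test mean, and relative lift, assuming lift is defined as (test - control) / control. The toy frames are made up purely to show the output shape.

```python
import pandas as pd

def relative_lift(control: pd.DataFrame, test: pd.DataFrame, metrics: list[str]) -> pd.DataFrame:
    """Per-metric group means and relative lift, defined here as (test - control) / control."""
    summary = pd.DataFrame({
        "control_mean": control[metrics].mean(),
        "test_mean": test[metrics].mean(),
    })
    # Positive lift means the test group averaged higher than the control group
    summary["lift_pct"] = 100 * (summary["test_mean"] - summary["control_mean"]) / summary["control_mean"]
    return summary

# Hypothetical toy data, not the project files
control_demo = pd.DataFrame({"# of Purchase": [500, 600, 700]})
test_demo = pd.DataFrame({"# of Purchase": [550, 660, 770]})
demo = relative_lift(control_demo, test_demo, ["# of Purchase"])
print(demo)
```

On the real data this would be called as `relative_lift(Control, test, metrics)`; a negative `lift_pct` flags metrics where the test group did worse on average.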


In [None]:
# Visualize metrics comparison for each key metric
for metric in metrics:
    plt.figure(figsize=(10,6))
    sns.histplot(Control[metric], color='blue', label='Control Group', kde=True)
    sns.histplot(test[metric], color='red', label='Test Group', kde=True)
    plt.legend()
    plt.title(f'{metric} - Control vs Test')
    plt.show()

In [None]:
# Side-by-side boxplots comparing the two groups for each key metric
for metric in metrics:
    plt.figure(figsize=(10,6))
    sns.boxplot(data=[Control[metric], test[metric]])
    plt.xticks([0, 1], ['Control Group', 'Test Group'])  # The two boxes sit at positions 0 and 1
    plt.title(f'{metric} - Control vs Test')
    plt.show()

In [27]:
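# Student's two-sample t-test for each key metric (equal variances assumed).
# Control is passed first, so a positive t-statistic means the control mean is higher.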
for metric in metrics:
    t_stat, p_value = stats.ttest_ind(Control[metric], test[metric])
    print(f"T-test for {metric}:")
    print(f"T-statistic: {t_stat}, P-value: {p_value}")
    
    # Interpret the result
    if p_value < 0.05:
        print(f"Reject the null hypothesis: There is a significant difference in {metric} between the groups.\n")
    else:
        print(f"Fail to reject the null hypothesis: There is no significant difference in {metric} between the groups.\n")


T-test for # of Impressions:
T-statistic: 4.884544325740239, P-value: 8.774394114329264e-06
Reject the null hypothesis: There is a significant difference in # of Impressions between the groups.

T-test for # of Website Clicks:
T-statistic: -1.576909404840952, P-value: 0.12035072366063823
Fail to reject the null hypothesis: There is no significant difference in # of Website Clicks between the groups.

T-test for # of Searches:
T-statistic: -1.1373340684043094, P-value: 0.26015715752487034
Fail to reject the null hypothesis: There is no significant difference in # of Searches between the groups.

T-test for # of View Content:
T-statistic: 0.47615455602474466, P-value: 0.6357843704297139
Fail to reject the null hypothesis: There is no significant difference in # of View Content between the groups.

T-test for # of Add to Cart:
T-statistic: 4.24906420944249, P-value: 8.032960071149043e-05
Reject the null hypothesis: There is a significant difference in # of Add to Cart between the groups.
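`stats.ttest_ind` assumes equal variances by default. Where the two groups' spreads differ, Welch's t-test (`equal_var=False`) is a common, more conservative alternative. The sketch below is a hypothetical helper, `compare_means`, that runs Welch's test and also states which group has the higher mean, since the sign of t depends on argument order. The sample values are invented for illustration only.

```python
import numpy as np
from scipy import stats

def compare_means(control, test, alpha=0.05):
    """Welch's t-test (no equal-variance assumption) plus the direction of the difference."""
    control = np.asarray(control, dtype=float)
    test = np.asarray(test, dtype=float)
    # Control first, matching the notebook: t > 0 means the control mean is higher
    t_stat, p_value = stats.ttest_ind(control, test, equal_var=False)
    if control.mean() > test.mean():
        higher = "control"
    elif test.mean() > control.mean():
        higher = "test"
    else:
        higher = "neither"
    return t_stat, p_value, higher, bool(p_value < alpha)

# Hypothetical daily values, not the project data
ctrl = [10, 12, 11, 13, 12]
trt = [14, 15, 13, 16, 15]
t_stat, p_value, higher, significant = compare_means(ctrl, trt)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, higher mean: {higher}, significant: {significant}")
```

Inside the notebook's loop this would be `compare_means(Control[metric], test[metric])`. With equal group sizes and similar variances, Welch's and Student's tests give nearly the same answer.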


In [28]:
# Calculate Cohen's d (test minus control) for every key metric
for metric in metrics:
    mean_control = Control[metric].mean()
    mean_test = test[metric].mean()
    std_control = Control[metric].std()
    std_test = test[metric].std()
    
    # Approximation: divides by the mean of the two SDs rather than the pooled SD
    cohens_d = (mean_test - mean_control) / ((std_control + std_test) / 2)
    print(f"Cohen's d for {metric}: {cohens_d}")


Cohen's d for # of Impressions: -1.299935454973749
Cohen's d for # of Website Clicks: 0.4105904871209517
Cohen's d for # of Searches: 0.31503240638487723
Cohen's d for # of View Content: -0.12477180753825634
Cohen's d for # of Add to Cart: -1.1084589843445156
Cohen's d for # of Purchase: -0.007876107565613272
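The cell above divides by the average of the two standard deviations. The textbook form of Cohen's d uses the pooled standard deviation, which weights each group's variance by its degrees of freedom. A minimal sketch of that version follows; `cohens_d` is a hypothetical helper and the inputs are invented. Results will match the approximation closely when the two SDs are similar and drift apart when they are not.

```python
import numpy as np

def cohens_d(control, test):
    """Cohen's d = (mean_test - mean_control) / pooled SD, using sample (ddof=1) variances."""
    control = np.asarray(control, dtype=float)
    test = np.asarray(test, dtype=float)
    n1, n2 = len(control), len(test)
    var1, var2 = control.var(ddof=1), test.var(ddof=1)
    pooled_sd = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (test.mean() - control.mean()) / pooled_sd

# Hypothetical values, not the project data
d = cohens_d([10, 12, 11, 13, 12], [14, 15, 13, 16, 15])
print(round(d, 3))
```

Pandas Series pass straight through `np.asarray`, so `cohens_d(Control[metric], test[metric])` works in the existing loop.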


# A/B Testing Results Summary

## Statistical Significance

### 1. **# of Impressions**
- **T-statistic**: 4.88
- **P-value**: 8.77e-06
- **Conclusion**: I found a **significant difference** between the control and test groups. Because the control group was passed first to `ttest_ind`, the positive t-statistic means the control group had more impressions, so the test design was associated with fewer impressions.

### 2. **# of Website Clicks**
- **T-statistic**: -1.58
- **P-value**: 0.12
- **Conclusion**: There was **no significant difference** in website clicks. The test group averaged slightly more clicks (negative t-statistic), but the difference is not statistically significant, so I cannot conclude that the changes increased clicks.

### 3. **# of Searches**
- **T-statistic**: -1.14
- **P-value**: 0.26
- **Conclusion**: There was **no significant difference** in the number of searches. The test design did not significantly change how many searches users performed on the website.

### 4. **# of View Content**
- **T-statistic**: 0.48
- **P-value**: 0.64
- **Conclusion**: I found **no significant difference** in the number of content views. The test group did not lead to a substantial change in content interaction.

### 5. **# of Add to Cart**
- **T-statistic**: 4.25
- **P-value**: 8.03e-05
- **Conclusion**: There was a **significant difference** in the number of items added to the cart. The positive t-statistic means the control group had more add-to-cart events, so the test design was associated with fewer cart additions, not more.

### 6. **# of Purchases**
- **T-statistic**: 0.03
- **P-value**: 0.98
- **Conclusion**: There was **no significant difference** in the number of purchases. The changes made to the test group did not produce a detectable increase in purchases.
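Six metrics were each tested at α = 0.05, which raises the chance of at least one false positive across the set. One common safeguard is the Holm-Bonferroni step-down correction. The sketch below implements it in plain Python (the `holm_adjust` helper is hypothetical; `statsmodels.stats.multitest.multipletests(pvals, method="holm")` is an existing library alternative) and applies it to the rounded p-values reported in the summary above.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest p-value is multiplied by m, the next by m - 1, and so on
        adj = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, adj)  # adjusted values must not decrease
        adjusted[i] = running_max
    return adjusted

# Rounded p-values from the summary above
p_values = {
    "# of Impressions": 8.77e-06,
    "# of Website Clicks": 0.12,
    "# of Searches": 0.26,
    "# of View Content": 0.64,
    "# of Add to Cart": 8.03e-05,
    "# of Purchase": 0.98,
}
adjusted = holm_adjust(list(p_values.values()))
for (metric, p), p_adj in zip(p_values.items(), adjusted):
    print(f"{metric}: p = {p:.3g}, Holm-adjusted p = {p_adj:.3g}, reject at 0.05: {p_adj < 0.05}")
```

Under this correction, Impressions and Add to Cart remain significant and the other four metrics do not, so the conclusions above hold up.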

---

## Effect Size (Cohen’s d)

### 1. **# of Impressions**
- **Cohen’s d**: -1.30 (Large effect)
  - The negative sign means the control group had more impressions. The gap is large in practical terms, so the test design had a strong negative effect on impressions.

### 2. **# of Website Clicks**
- **Cohen’s d**: 0.41 (Small effect)
  - The test group averaged more website clicks, but with this sample size the difference was not large enough to reach statistical significance.

### 3. **# of Searches**
- **Cohen’s d**: 0.32 (Small effect)
  - A small practical effect, suggesting the test didn’t influence searches in any major way.

### 4. **# of View Content**
- **Cohen’s d**: -0.12 (Very small effect)
  - The difference is negligible, indicating that the changes did not significantly affect the number of content views.

### 5. **# of Add to Cart**
- **Cohen’s d**: -1.11 (Large effect)
  - A substantial effect in the control group’s favor: the test group recorded noticeably fewer add-to-cart events.

### 6. **# of Purchases**
- **Cohen’s d**: -0.01 (Very small effect)
  - There was no meaningful practical difference in purchases, consistent with the changes not leading to more conversions.
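The labels above follow Cohen's conventional benchmarks (about 0.2 small, 0.5 medium, 0.8 large). A minimal sketch of one common way to bin |d| with those thresholds, plus the direction implied by the sign (test minus control, as in the notebook), is below. `describe_effect` is a hypothetical helper, and the cut-offs are conventions rather than hard rules.

```python
def describe_effect(d: float) -> str:
    """Label |d| using Cohen's conventional benchmarks and report which group is higher."""
    size = abs(d)
    if size < 0.2:
        label = "negligible"
    elif size < 0.5:
        label = "small"
    elif size < 0.8:
        label = "medium"
    else:
        label = "large"
    # d was computed as test minus control, so a negative d favors the control group
    if d > 0:
        direction = "test higher"
    elif d < 0:
        direction = "control higher"
    else:
        direction = "no difference"
    return f"{label} ({direction})"

for metric, d in [("# of Impressions", -1.30), ("# of Website Clicks", 0.41), ("# of Purchase", -0.01)]:
    print(f"{metric}: d = {d:+.2f} -> {describe_effect(d)}")
```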

---

## Key Insights

1. **Significant Differences**:
   - **Impressions** and **Add to Cart** differed significantly, but in the control group’s favor: the test group recorded fewer impressions and fewer add-to-cart events. These are the metrics most worth investigating further.

2. **No Significant Differences**:
   - **Website Clicks**, **Searches**, **View Content**, and **Purchases** didn’t show significant differences, so I could not detect an effect of the test design on these behaviors with this sample.

3. **Effect Size**:
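   - Effect sizes help separate practical from statistical significance. **Website Clicks** (d = 0.41) and **Searches** (d = 0.32) show small effects favoring the test group that did not reach significance, so a larger sample would be needed to tell whether they are real. **Purchases** (d = -0.01) show essentially no difference, while **Add to Cart** (d = -1.11) is both significant and large, in the control group’s favor.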
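To get a rough sense of how much data a small effect such as the one in Website Clicks would need, a standard normal-approximation formula gives the observations per group for a two-sided, two-sample test: n ≈ 2(z₁₋α/₂ + z_power)² / d². The sketch below (`n_per_group` is a hypothetical helper) assumes α = 0.05 and 80% power; the approximation slightly underestimates the exact t-test requirement for small samples. The result can be compared against `len(Control)` and `len(test)`.

```python
import math
from scipy.stats import norm

def n_per_group(d: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate observations per group to detect effect size d with a two-sided test."""
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the significance level
    z_power = norm.ppf(power)          # quantile matching the desired power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)

print(n_per_group(0.41))  # effect size reported for # of Website Clicks
print(n_per_group(0.32))  # effect size reported for # of Searches
```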

---

## Recommendations

1. **For Improving Impressions and Add to Cart**:
   - Since **Impressions** and **Add to Cart** were both significantly *lower* in the test group, I would not scale the test design as is. A better next step is to identify which part of the change reduced impressions and cart additions before running a follow-up test.

2. **For Website Clicks and Purchases**:
   - Given that **Website Clicks** and **Purchases** didn’t show significant improvements, I could consider testing other changes in design, UI, or features to improve conversion rates.
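The notebook compares raw purchase counts, whereas a follow-up focused on conversion would need a rate. The sketch below assumes conversion rate means purchases per website click and computes it as a ratio of column totals over the whole period (rather than an average of daily ratios, which would weight low-traffic days as heavily as busy ones). `conversion_rate` is a hypothetical helper and the toy rows are invented.

```python
import pandas as pd

def conversion_rate(df: pd.DataFrame, numerator: str, denominator: str) -> float:
    """Ratio of column totals, e.g. purchases per website click over the whole test period."""
    return df[numerator].sum() / df[denominator].sum()

# Hypothetical daily rows, only to illustrate the calculation
demo = pd.DataFrame({"# of Website Clicks": [1000, 3000], "# of Purchase": [50, 110]})
rate = conversion_rate(demo, "# of Purchase", "# of Website Clicks")
print(f"Purchases per click: {rate:.2%}")
```

On the project data, `conversion_rate(Control, '# of Purchase', '# of Website Clicks')` and the same call on `test` would give the two rates to compare.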
