##  Step 1: Understand the A/B Test

###  Hypothesis Being Tested

**Null Hypothesis (H₀):**  
There is no difference in the conversion rate between the control group (existing website) and the treatment group (new feature).

**Alternative Hypothesis (H₁):**  
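There is a difference in the conversion rate between the two groups. An improvement in the treatment group is expected, but the test used later is two-sided, so the direction is read from the sign of the observed difference.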

---

###  Key Metric(s) of Interest

- **Conversion Rate**:  
  The main metric is the **conversion rate**, calculated as:  
  Conversion Rate = Number of Conversions / Number of Users

- Other useful metrics:
  - **Clicks**
  - **Page Views**

---

###  Group Assignment

- Users are **randomly assigned** to one of two groups:
  - **Control Group (A)**: Uses the original version of the website.
  - **Treatment Group (B)**: Uses the version with the new feature.
- Each group contains **5000 unique users**.
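
One simple way to produce a random, equal-sized split like the one described above is to shuffle the user pool with a fixed seed and cut it in half. This is only a sketch under stated assumptions: `assign_groups` and the `User_<n>` IDs are hypothetical names for illustration, and the mock data generated later in this notebook labels users by group directly rather than calling a function like this.

```python
import random

def assign_groups(user_ids, seed=42):
    """Shuffle users with a fixed seed and split them into two equal halves."""
    shuffled = list(user_ids)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return {"Control": shuffled[:half], "Treatment": shuffled[half:]}

# Hypothetical pool of 10,000 user IDs (not the IDs used later in this notebook)
users = [f"User_{i}" for i in range(10_000)]
groups = assign_groups(users)
print({name: len(members) for name, members in groups.items()})
```

Because each user appears in exactly one half of the shuffled list, no user can end up in both groups.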

---

###  Duration of the Test

- The A/B test ran for **14 days** (2 weeks).
- Start Date: **November 1, 2023**
- End Date: **November 14, 2023**


In [3]:
# Step 2: Load and Clean Data

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

# Generate mock A/B test data
np.random.seed(42)
num_users_per_group = 5000
start_date = datetime(2023, 11, 1)
test_duration_days = 14  # 2 week test

# Control Group (A)
control_user_ids = [f'UserA_{10000+i}' for i in range(num_users_per_group)]
control_group_assignment = ['Control'] * num_users_per_group
control_conversions = np.random.binomial(1, 0.10, num_users_per_group)
control_clicks = np.random.randint(0, 15, num_users_per_group)
control_page_views = control_clicks + np.random.randint(1, 5, num_users_per_group)
control_page_views = np.maximum(1, control_page_views)

# Treatment Group (B)
treatment_user_ids = [f'UserB_{20000+i}' for i in range(num_users_per_group)]
treatment_group_assignment = ['Treatment'] * num_users_per_group
treatment_conversions = np.random.binomial(1, 0.115, num_users_per_group)
treatment_clicks = np.random.randint(0, 17, num_users_per_group)
treatment_page_views = treatment_clicks + np.random.randint(1, 5, num_users_per_group)
treatment_page_views = np.maximum(1, treatment_page_views)

# Combine data
all_user_ids = control_user_ids + treatment_user_ids
all_groups = control_group_assignment + treatment_group_assignment
all_conversions = np.concatenate([control_conversions, treatment_conversions])
all_clicks = np.concatenate([control_clicks, treatment_clicks])
all_page_views = np.concatenate([control_page_views, treatment_page_views])
all_dates = [(start_date + timedelta(days=np.random.randint(0, test_duration_days))).strftime('%Y-%m-%d')
             for _ in range(num_users_per_group * 2)]

# Create DataFrame
df_ab = pd.DataFrame({
    'UserID': all_user_ids,
    'Group': all_groups,
    'Date': all_dates,
    'PageViews': all_page_views,
    'Clicks': all_clicks,
    'Converted': all_conversions
})

# Clean Data: Clicks should not exceed Page Views
# Vectorized row-wise minimum (same result as a row-by-row apply, but faster)
df_ab['Clicks'] = df_ab[['Clicks', 'PageViews']].min(axis=1)

# Show first few rows
df_ab.head()


        UserID    Group        Date  PageViews  Clicks  Converted
0  UserA_10000  Control  2023-11-08          5       2          0
1  UserA_10001  Control  2023-11-14          8       7          1
2  UserA_10002  Control  2023-11-03          7       6          0
3  UserA_10003  Control  2023-11-06          6       2          0
4  UserA_10004  Control  2023-11-03          6       5          0


In [4]:
# Check for missing values
print("Missing values:\n", df_ab.isnull().sum())

# Check for duplicate UserIDs
duplicate_users = df_ab[df_ab.duplicated('UserID')]
print(f"\nNumber of users in both groups (should be 0): {len(duplicate_users)}")

# Check basic group counts
print("\nUser count per group:\n", df_ab['Group'].value_counts())

# Clicks greater than page views?
clicks_issue = df_ab[df_ab['Clicks'] > df_ab['PageViews']]
print(f"\nUsers where clicks > pageviews (should be 0): {len(clicks_issue)}")


Missing values:
 UserID       0
Group        0
Date         0
PageViews    0
Clicks       0
Converted    0
dtype: int64

Number of users in both groups (should be 0): 0

User count per group:
 Group
Control      5000
Treatment    5000
Name: count, dtype: int64

Users where clicks > pageviews (should be 0): 0


In [5]:
# Step 3: Calculate Key Metrics

# Group by control and treatment
summary = df_ab.groupby('Group').agg(
    Total_Users=('UserID', 'count'),
    Total_Conversions=('Converted', 'sum')
).reset_index()

# Calculate conversion rate
summary['Conversion_Rate'] = summary['Total_Conversions'] / summary['Total_Users']

# Display conversion rates
print("Conversion Rates by Group:\n")
print(summary)

# Calculate observed difference
control_rate = summary[summary['Group'] == 'Control']['Conversion_Rate'].values[0]
treatment_rate = summary[summary['Group'] == 'Treatment']['Conversion_Rate'].values[0]
observed_diff = treatment_rate - control_rate

print(f"\nObserved Difference in Conversion Rates (Treatment - Control): {observed_diff:.4f}")


Conversion Rates by Group:

       Group  Total_Users  Total_Conversions  Conversion_Rate
0    Control         5000                479           0.0958
1  Treatment         5000                590           0.1180

Observed Difference in Conversion Rates (Treatment - Control): 0.0222


In [9]:
from statsmodels.stats.proportion import proportions_ztest
import numpy as np

# Conversion counts
conversions = np.array([
    df_ab[df_ab['Group'] == 'Control']['Converted'].sum(),
    df_ab[df_ab['Group'] == 'Treatment']['Converted'].sum()
])

# Sample sizes
sample_sizes = np.array([
    df_ab[df_ab['Group'] == 'Control'].shape[0],
    df_ab[df_ab['Group'] == 'Treatment'].shape[0]
])

# Perform Z-test (two-proportion test)
z_stat, p_value = proportions_ztest(count=conversions, nobs=sample_sizes, alternative='two-sided')

# Print results
print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_value:.4f}")

# Interpretation
alpha = 0.05
if p_value < alpha:
    print(" Result: Statistically Significant (Reject H₀)")
else:
    print(" Result: Not Statistically Significant (Fail to Reject H₀)")


Z-statistic: -3.5924
P-value: 0.0003
 Result: Statistically Significant (Reject H₀)
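
As a cross-check on the library call, the same pooled two-proportion z-test can be computed by hand with the standard library. This is a minimal sketch: `two_proportion_ztest` is a hypothetical helper name, and the counts (479 and 590 conversions out of 5,000 users each) are taken from the Step 3 output.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # Standard error under H0, where both groups share the pooled rate
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Control first, treatment second (same order as the statsmodels call)
z, p = two_proportion_ztest(479, 5000, 590, 5000)
print(f"z = {z:.4f}, p = {p:.4f}")
```

The z-statistic comes out negative because the control rate is subtracted first; swapping the argument order flips the sign but leaves the two-sided p-value unchanged.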


### 🔍 Step 4: Practical Significance and Power Analysis

---

####  1. Practical Significance:

While our statistical test shows a **significant difference** (p-value = 0.0003), we now ask:

> “Is this difference large enough to matter in a real business context?”

- The absolute uplift is **about 2.22 percentage points** in conversion rate (from 9.58% to 11.80%), a relative lift of roughly 23% over the control.
- Even a 1-percentage-point increase in conversion rate can lead to **significant business impact** (e.g., increased revenue, more customers).
- **Conclusion:** This uplift is **likely practically significant**.
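
To judge practical significance it helps to look at a range for the uplift rather than a single number. The sketch below computes a 95% Wald confidence interval for the difference in conversion rates and the relative lift, using the counts printed in Step 3 (479 of 5,000 and 590 of 5,000). `diff_confidence_interval` is a hypothetical helper, and the Wald interval is one of several reasonable choices, not necessarily the one a production analysis would use.

```python
from math import sqrt
from statistics import NormalDist

def diff_confidence_interval(x_c, n_c, x_t, n_t, confidence=0.95):
    """Wald interval for (treatment rate - control rate), unpooled standard error."""
    p_c, p_t = x_c / n_c, x_t / n_t
    diff = p_t - p_c
    se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff, diff - z_crit * se, diff + z_crit * se

diff, low, high = diff_confidence_interval(479, 5000, 590, 5000)
relative_lift = diff / (479 / 5000)
print(f"Absolute difference: {diff:.4f} (95% CI {low:.4f} to {high:.4f})")
print(f"Relative lift: {relative_lift:.1%}")
```

If the lower end of the interval is still above the smallest uplift the business cares about, the case for practical significance is stronger.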

---

####  2. Power Analysis (Optional):

Power analysis is normally done **before** a test to choose the sample size. After the fact, it is mainly useful when results are **not statistically significant**, to answer:

> “Was our sample size large enough to detect a true effect?”

In our case:

- The result **was statistically significant**.
- Each group had **5000 users**, a solid sample size.
- Therefore, **power analysis is not needed here**.
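
Although it is not required for the conclusion here, a forward-looking power calculation is useful when planning a follow-up test. The sketch below uses the standard normal-approximation formula for a two-sided two-proportion test. The helper names are hypothetical, and the assumed true rates (10% and 11.5%) are the ones used to simulate the mock data, which a real test would not know in advance.

```python
from math import ceil, sqrt
from statistics import NormalDist

norm = NormalDist()

def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Users per group for a two-sided two-proportion z-test (normal approximation)."""
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    z_beta = norm.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

def approximate_power(p1, p2, n, alpha=0.05):
    """Approximate power of the same test with n users per group."""
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    shift = abs(p2 - p1) * sqrt(n) - z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    return norm.cdf(shift / sqrt(p1 * (1 - p1) + p2 * (1 - p2)))

# Rates used to simulate the mock data: 10% control, 11.5% treatment
n_needed = sample_size_per_group(0.10, 0.115)
power_at_5000 = approximate_power(0.10, 0.115, 5000)
print(f"Users per group for 80% power: {n_needed}")
print(f"Approximate power with 5,000 per group: {power_at_5000:.2f}")
```

Under these assumptions the formula asks for more than the 5,000 users per group used here, so a significant result was likely but not guaranteed at this sample size.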

---

####  Summary:

| Checkpoint                | Status         |
|--------------------------|----------------|
| Statistically Significant |  Yes         |
| Practically Significant   |  Likely Yes  |
| Power Analysis Needed     |  No          |


## Step 5: Interpretation and Recommendations

### 🔍 Interpretation of Statistical Test Results

- **Z-statistic:** -3.5924 (negative only because the control group is listed first; the treatment rate is the higher one)  
- **P-value:** 0.0003  
- **Alpha level (significance threshold):** 0.05  
-  Since the **p-value (0.0003) < 0.05**, we **reject the null hypothesis (H₀)**.
- This means there is **statistical evidence** that the **conversion rate is different** between the control and treatment groups.

---

###  Conclusion

- The treatment group (new feature) had a **higher conversion rate** than the control group.
- The uplift is not only statistically significant, but it also showed **practical significance** (an increase of about 2.2 percentage points in conversion rate over the control, from 9.58% to 11.80%).

---

###  Recommendations

- **Roll Out the New Feature**: Since both statistical and practical significance are present, the new feature should be implemented across the website.
- **Monitor Performance Post-Launch**: Keep tracking the conversion rate after full deployment to ensure consistent improvement.
- **Segment the Data (Optional)**:
  - If data is available, analyze performance by **new vs. returning users**, **device types**, **traffic sources**, etc.
  - This can uncover more insights and potential optimizations.
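
A segmented view could be sketched as follows. The mock dataset in this notebook has no segment column, so `Device` and the six sample rows are purely hypothetical, and `conversion_by_segment` is an illustrative helper rather than part of the analysis above.

```python
from collections import defaultdict

def conversion_by_segment(rows, segment_key):
    """Return {(segment, group): conversion rate} from an iterable of dict rows."""
    totals = defaultdict(lambda: [0, 0])  # [conversions, users]
    for row in rows:
        key = (row[segment_key], row["Group"])
        totals[key][0] += row["Converted"]
        totals[key][1] += 1
    return {key: conv / users for key, (conv, users) in totals.items()}

# Tiny hypothetical sample; "Device" is an assumed column, not part of df_ab
rows = [
    {"Group": "Control", "Device": "mobile", "Converted": 0},
    {"Group": "Control", "Device": "mobile", "Converted": 1},
    {"Group": "Treatment", "Device": "mobile", "Converted": 1},
    {"Group": "Treatment", "Device": "mobile", "Converted": 1},
    {"Group": "Control", "Device": "desktop", "Converted": 0},
    {"Group": "Treatment", "Device": "desktop", "Converted": 1},
]
rates = conversion_by_segment(rows, "Device")
for (segment, group), rate in sorted(rates.items()):
    print(f"{segment:8s} {group:10s} {rate:.2f}")
```

With a real segment column in a DataFrame, `df.groupby(['Device', 'Group'])['Converted'].mean()` gives the same table. Segment-level differences rest on smaller samples, so they are best treated as exploratory.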

---

 **End of A/B Test Analysis**


# A/B Test Analysis: Conversion Rate Improvement

**Objective:**  
Determine if a new website feature (treatment group) led to a significant increase in conversion rate compared to the current version (control group).


## Step 1: Understand the A/B Test

### Hypothesis Being Tested
- **Null Hypothesis (H₀):** No difference in conversion rates.
- **Alternative Hypothesis (H₁):** Treatment group has a higher conversion rate.

### Key Metrics
- Conversion Rate = Conversions / Total Users

### Group Assignment
- Control: 5,000 users
- Treatment: 5,000 users (randomized)

### Test Duration
- Start: November 1, 2023  
- End: November 14, 2023 (14 days)


In [12]:
import pandas as pd
import numpy as np

np.random.seed(42)

# Sample size
n_control = 5000
n_treatment = 5000

# Simulate conversions (10% for control, 11.36% for treatment)
control_conversions = np.random.binomial(1, 0.10, n_control)
treatment_conversions = np.random.binomial(1, 0.1136, n_treatment)

# Create DataFrame
df_control = pd.DataFrame({
    'UserID': range(1, n_control + 1),
    'Group': 'Control',
    'Converted': control_conversions
})

df_treatment = pd.DataFrame({
    'UserID': range(n_control + 1, n_control + n_treatment + 1),
    'Group': 'Treatment',
    'Converted': treatment_conversions
})

# Combine both
df_ab_test = pd.concat([df_control, df_treatment], ignore_index=True)


In [13]:
# Check basic issues
print("Total users:", df_ab_test['UserID'].nunique())
print(df_ab_test.isnull().sum())

# Ensure no user is in both groups
print("Users in both groups:", df_ab_test[df_ab_test.duplicated('UserID')].shape[0])


Total users: 10000
UserID       0
Group        0
Converted    0
dtype: int64
Users in both groups: 0


# A/B Test Analysis: Final Report

## 1. Introduction
- Purpose of the test
- Hypothesis

## 2. Data Overview and Cleaning
- Load or simulate dataset
- Check for missing/duplicate users

## 3. Metric Calculation
- Conversion rates
- Observed difference

## 4. Statistical Significance Test
- Z-test
- p-value
- Confidence interval
- Conclusion

## 5. Practical Significance
- Uplift discussion
- Business impact

## 6. Interpretation & Recommendations
- Final decision
- Action plan

## 7. Appendix (Optional)
- Power analysis
- Segmented results (e.g., by user type)


# A/B Test Summary Report

## Test Overview
- **Test Goal**: Evaluate the impact of a new feature on conversion rate.
- **Control Group (A)**: Existing version of website.
- **Treatment Group (B)**: Website with new feature.
- **Test Duration**: 14 days (Nov 1–14, 2023)
- **Users per Group**: 5,000

---

## Hypotheses
- **Null Hypothesis (H₀)**: No difference in conversion rates between groups.
- **Alternative Hypothesis (H₁)**: The new feature changes conversion rates.

---

## Key Metrics
| Metric                  | Control (A) | Treatment (B) |
|-------------------------|-------------|---------------|
| Users                   | 5,000       | 5,000         |
| Conversions             | 479         | 590           |
| Conversion Rate         | 9.58%       | 11.80%        |
| Absolute Uplift (B - A) |             | +2.22 pp      |

---

## Statistical Test
- **Test Used**: Z-test for proportions
- **Z-score**: `-3.5924`
- **p-value**: `0.0003`
- **Confidence Level**: 95%
-  **Statistically Significant** (Reject H₀)

---

##  Interpretation & Recommendations
- The treatment group showed a **+2.22 percentage point uplift** in conversion rate (9.58% to 11.80%).
- Result is **statistically** and **practically significant**.
-  **Recommendation**: Roll out the new feature to all users.

---

## Data Validity Checks
-  No missing values
-  No user appeared in both groups
-  Equal group sizes (5,000 each)
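
The equal-group-size check can be made quantitative with a sample ratio mismatch (SRM) test, which asks whether the observed split is consistent with the intended 50/50 allocation. This is a minimal sketch using a chi-square goodness-of-fit test with one degree of freedom; `srm_check` is a hypothetical helper name.

```python
from math import erfc, sqrt

def srm_check(n_control, n_treatment, expected_share=0.5):
    """Chi-square goodness-of-fit test (1 degree of freedom) for the group split."""
    total = n_control + n_treatment
    expected = (total * expected_share, total * (1 - expected_share))
    chi2 = sum((obs - exp) ** 2 / exp
               for obs, exp in zip((n_control, n_treatment), expected))
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = srm_check(5000, 5000)
print(f"chi2 = {chi2:.4f}, p = {p_value:.4f}")
```

A perfectly balanced split gives a statistic of 0 and a p-value of 1. A very small p-value would suggest a problem in assignment or logging that should be investigated before trusting the conversion results.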

---

##  Notes
- Further segmentation (e.g., new vs. returning users) is possible.
- For full reproducibility, refer to the accompanying Jupyter Notebook.



In [14]:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

# Set seed for reproducibility
np.random.seed(42)

# Parameters
num_users_per_group = 5000
start_date = datetime(2023, 11, 1)
test_duration_days = 14  # 2-week test

# ----------- Control Group (A) -----------
control_user_ids = [f'UserA_{10000+i}' for i in range(num_users_per_group)]
control_group_assignment = ['Control'] * num_users_per_group
control_conversions = np.random.binomial(1, 0.10, num_users_per_group)
control_clicks = np.random.randint(0, 15, num_users_per_group)
control_page_views = control_clicks + np.random.randint(1, 5, num_users_per_group)
control_page_views = np.maximum(1, control_page_views)

# ----------- Treatment Group (B) -----------
treatment_user_ids = [f'UserB_{20000+i}' for i in range(num_users_per_group)]
treatment_group_assignment = ['Treatment'] * num_users_per_group
treatment_conversions = np.random.binomial(1, 0.115, num_users_per_group)
treatment_clicks = np.random.randint(0, 17, num_users_per_group)
treatment_page_views = treatment_clicks + np.random.randint(1, 5, num_users_per_group)
treatment_page_views = np.maximum(1, treatment_page_views)

# ----------- Combine Both Groups -----------
all_user_ids = control_user_ids + treatment_user_ids
all_groups = control_group_assignment + treatment_group_assignment
all_conversions = np.concatenate([control_conversions, treatment_conversions])
all_clicks = np.concatenate([control_clicks, treatment_clicks])
all_page_views = np.concatenate([control_page_views, treatment_page_views])

# Assign random test dates
all_dates = [
    (start_date + timedelta(days=np.random.randint(0, test_duration_days))).strftime('%Y-%m-%d')
    for _ in range(num_users_per_group * 2)
]

# Create DataFrame
df_ab_test = pd.DataFrame({
    'UserID': all_user_ids,
    'Group': all_groups,
    'Date': all_dates,
    'PageViews': all_page_views,
    'Clicks': all_clicks,
    'Converted': all_conversions
})

# Ensure Clicks <= PageViews
df_ab_test['Clicks'] = df_ab_test[['Clicks', 'PageViews']].min(axis=1)

# Optional: View summary and save
print("✅ Mock A/B test data created.")
print(df_ab_test.sample(5, random_state=42))
print("\n🔍 Conversion Rates by Group:")
print(df_ab_test.groupby('Group')['Converted'].mean())

# Optional: Save to CSV
# df_ab_test.to_csv('ab_test_results_mock_data.csv', index=False)


✅ Mock A/B test data created.
           UserID      Group        Date  PageViews  Clicks  Converted
6252  UserB_21252  Treatment  2023-11-07          8       4          0
4684  UserA_14684    Control  2023-11-10          5       1          0
1731  UserA_11731    Control  2023-11-04          8       6          0
4742  UserA_14742    Control  2023-11-12          3       2          0
4521  UserA_14521    Control  2023-11-12         12       8          0

🔍 Conversion Rates by Group:
Group
Control      0.0958
Treatment    0.1180
Name: Converted, dtype: float64
