# **A/B Test Analysis – Impact of New Feature on Conversion**

<!-- Author Section -->


<link href="https://fonts.googleapis.com/css2?family=Rubik:wght@700&family=Fira+Sans:wght@600&family=Quicksand:wght@600&display=swap" rel="stylesheet">

<h1 style="font-family: 'Rubik', sans-serif; font-weight: 700; color: #2E8B57;">
  👨‍💻 Author: Taskeen Hussain
</h1>

<p>
  <a href="https://github.com/TaskeenHussain" target="_blank">
    <img src="https://img.shields.io/badge/GitHub-Visit-333?style=for-the-badge&logo=github&logoColor=white" alt="GitHub Badge">
  </a>

  <a href="https://www.kaggle.com/taskeenhkbbeechtree" target="_blank">
    <img src="https://img.shields.io/badge/Kaggle-Explore-1FA2FF?style=for-the-badge&logo=kaggle&logoColor=white" alt="Kaggle Badge">
  </a>

  <a href="mailto:taskeenuaf@gmail.com">
    <img src="https://img.shields.io/badge/Email-Send-DC3545?style=for-the-badge&logo=gmail&logoColor=white" alt="Email Badge">
  </a>
</p>


## **Mock Data Generation**

In [None]:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

# Generate mock A/B test data
np.random.seed(42)

num_users_per_group = 5000
start_date = datetime(2023, 11, 1)
test_duration_days = 14  # 2-week test

# Control Group (A)
control_user_ids = [f'UserA_{10000+i}' for i in range(num_users_per_group)]
control_group_assignment = ['Control'] * num_users_per_group
control_conversions = np.random.binomial(1, 0.10, num_users_per_group)
control_clicks = np.random.randint(0, 15, num_users_per_group)
control_page_views = control_clicks + np.random.randint(1, 5, num_users_per_group)
control_page_views = np.maximum(1, control_page_views)

# Treatment Group (B)
treatment_user_ids = [f'UserB_{20000+i}' for i in range(num_users_per_group)]
treatment_group_assignment = ['Treatment'] * num_users_per_group
treatment_conversions = np.random.binomial(1, 0.115, num_users_per_group)
treatment_clicks = np.random.randint(0, 17, num_users_per_group)
treatment_page_views = treatment_clicks + np.random.randint(1, 5, num_users_per_group)
treatment_page_views = np.maximum(1, treatment_page_views)

# Combine data
all_user_ids = control_user_ids + treatment_user_ids
all_groups = control_group_assignment + treatment_group_assignment
all_conversions = np.concatenate([control_conversions, treatment_conversions])
all_clicks = np.concatenate([control_clicks, treatment_clicks])
all_page_views = np.concatenate([control_page_views, treatment_page_views])
all_dates = [
    (start_date + timedelta(days=np.random.randint(0, test_duration_days))).strftime('%Y-%m-%d')
    for _ in range(num_users_per_group * 2)
]

# Create DataFrame
df_ab_test = pd.DataFrame({
    'UserID': all_user_ids,
    'Group': all_groups,
    'Date': all_dates,
    'PageViews': all_page_views,
    'Clicks': all_clicks,
    'Converted': all_conversions
})

# Ensure Clicks ≤ PageViews (already true by construction; kept as a safeguard)
df_ab_test['Clicks'] = df_ab_test[['Clicks', 'PageViews']].min(axis=1)


## **Step 1: Understanding the A/B Test**

**Objective:** Evaluate whether the new feature (treatment) improves the key metric – **conversion rate**.

- Hypothesis:
  - Null (H₀): Conversion rates are equal across groups.
  - Alternative (H₁): Conversion rate is higher in the treatment group.

- Key Metric: `Conversion Rate = Converted / Total Users`
- Random Assignment: Each user is assigned to Control or Treatment.
- Duration: 14 days (Nov 1–14, 2023)
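
Stated formally, with $p_C$ and $p_T$ denoting the true conversion rates of Control and Treatment (symbols introduced here for illustration; they do not appear in the notebook code), the hypotheses above read:

$$
H_0:\; p_T = p_C \qquad \text{vs.} \qquad H_1:\; p_T > p_C
$$

Each rate is estimated by $\hat{p}_g = x_g / n_g$, where $x_g$ is the number of converted users and $n_g$ the number of users in group $g$, and the lift is $\hat{p}_T - \hat{p}_C$. Note that $H_1$ as written is one-sided.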


## **Step 2: Data Loading & Cleaning**

In [None]:
import pandas as pd
import numpy as np

# Load mock data
df = df_ab_test.copy()

# Basic checks
print(df.info())
print(df.duplicated('UserID').sum())  # Ensure no user in both groups

# Group sanity check
print(df['Group'].value_counts())

# Ensure logical consistency
assert (df['Clicks'] <= df['PageViews']).all()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   UserID     10000 non-null  object
 1   Group      10000 non-null  object
 2   Date       10000 non-null  object
 3   PageViews  10000 non-null  int64 
 4   Clicks     10000 non-null  int64 
 5   Converted  10000 non-null  int64 
dtypes: int64(3), object(3)
memory usage: 468.9+ KB
None
0
Group
Control      5000
Treatment    5000
Name: count, dtype: int64
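
Beyond eyeballing the group counts, a sample ratio mismatch (SRM) check can flag a broken assignment. The sketch below is not part of the original notebook: it assumes an intended 50/50 split, and the helper name `srm_p_value` is hypothetical. It runs a chi-square goodness-of-fit test with one degree of freedom using only the standard library.

```python
import math

def srm_p_value(n_control: int, n_treatment: int, expected_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for the group split.

    expected_share is the intended share of users in Control (assumed 0.5 here).
    """
    total = n_control + n_treatment
    exp_control = total * expected_share
    exp_treatment = total * (1 - expected_share)
    chi2 = ((n_control - exp_control) ** 2 / exp_control
            + (n_treatment - exp_treatment) ** 2 / exp_treatment)
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

# Group sizes taken from the value_counts() output above.
p_srm = srm_p_value(5000, 5000)
print(f"SRM p-value: {p_srm:.4f}")
```

A very small p-value would suggest the split deviates from the intended ratio; with a perfectly even 5,000/5,000 split the statistic is zero and the p-value is 1.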


## **Step 3: Metric Calculation**

In [None]:
# Conversion rates per group
group_metrics = df.groupby('Group')['Converted'].agg(['sum', 'count'])
group_metrics['ConversionRate'] = group_metrics['sum'] / group_metrics['count']

# Difference in conversion rates
lift = group_metrics.loc['Treatment', 'ConversionRate'] - group_metrics.loc['Control', 'ConversionRate']
print(group_metrics)
print(f"Observed Lift: {lift:.4f} ({lift * 100:.2f} percentage points)")


           sum  count  ConversionRate
Group                                
Control    479   5000          0.0958
Treatment  590   5000          0.1180
Observed Lift: 0.0222 (2.22 percentage points)
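
The lift above is an absolute difference in rates. Stakeholders often also ask for the relative lift, i.e. the difference expressed as a fraction of the Control rate. A small sketch using the counts from the table above (the variable names are illustrative, not from the notebook):

```python
# Counts copied from the group_metrics table above.
control_conv, control_n = 479, 5000
treatment_conv, treatment_n = 590, 5000

control_rate = control_conv / control_n
treatment_rate = treatment_conv / treatment_n

# Absolute lift: difference in rates, reported in percentage points.
absolute_lift = treatment_rate - control_rate
# Relative lift: absolute lift as a fraction of the Control rate.
relative_lift = absolute_lift / control_rate

print(f"Absolute lift: {absolute_lift * 100:.2f} percentage points")
print(f"Relative lift: {relative_lift * 100:.1f}%")
```

Keeping the two apart avoids reading "2.22%" as a relative improvement when it is a difference of 2.22 percentage points.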


## **Step 4: Statistical Significance Testing**

In [None]:
from statsmodels.stats.proportion import proportions_ztest

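# Successes and observations
# (groupby sorts the index alphabetically: position 0 = Control, 1 = Treatment)
conversions = group_metrics['sum'].values
n_obs = group_metrics['count'].values

# Two-sided Z-test. The statistic is computed as Control minus Treatment,
# so a negative z means the Treatment rate is higher.
z_stat, p_val = proportions_ztest(conversions, n_obs)
print(f"Z-statistic: {z_stat:.4f}, P-value: {p_val:.4f}")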

Z-statistic: -3.5924, P-value: 0.0003
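
The hypothesis in Step 1 is one-sided, while the test above is two-sided. To make the calculation explicit, here is a minimal standard-library sketch of the pooled two-proportion z-test, with the statistic oriented as Treatment minus Control. The function name `two_proportion_z` is hypothetical; the counts are those from Step 3.

```python
import math
from statistics import NormalDist

def two_proportion_z(x_t: int, n_t: int, x_c: int, n_c: int) -> float:
    """Pooled two-proportion z statistic, computed as Treatment minus Control."""
    p_t, p_c = x_t / n_t, x_c / n_c
    # Under H0 both groups share one rate, estimated from the pooled data.
    p_pool = (x_t + x_c) / (n_t + n_c)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c))
    return (p_t - p_c) / se

z = two_proportion_z(590, 5000, 479, 5000)
# One-sided p-value for H1: Treatment rate is higher than Control rate.
p_one_sided = 1 - NormalDist().cdf(z)
# Two-sided p-value for H1: the rates differ in either direction.
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.4f}, one-sided p = {p_one_sided:.4f}, two-sided p = {p_two_sided:.4f}")
```

The magnitude of `z` should agree with the statsmodels statistic above; only the sign differs because of the subtraction order. When the effect is in the hypothesized direction, the one-sided p-value is half the two-sided one.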


## **Step 5: Confidence Interval for Difference**

In [None]:
from statsmodels.stats.proportion import confint_proportions_2indep

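# Group 1 = Treatment (index 1), group 2 = Control (index 0), so the interval
# is for Treatment minus Control; the default alpha of 0.05 gives a 95% interval.
ci_low, ci_high = confint_proportions_2indep(count1=conversions[1], nobs1=n_obs[1],
                                              count2=conversions[0], nobs2=n_obs[0],
                                              method='wald')
print(f"95% Confidence Interval for Lift: [{ci_low:.4f}, {ci_high:.4f}]")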


95% Confidence Interval for Lift: [0.0101, 0.0343]
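
For reference, the Wald interval can be sketched by hand: it is the observed difference plus or minus a normal critical value times the unpooled standard error. The helper `wald_ci_diff` below is a hypothetical name, and the code is an illustration of the textbook formula rather than a copy of the statsmodels internals.

```python
import math
from statistics import NormalDist

def wald_ci_diff(x_t: int, n_t: int, x_c: int, n_c: int, confidence: float = 0.95):
    """Wald confidence interval for the difference in proportions (Treatment minus Control)."""
    p_t, p_c = x_t / n_t, x_c / n_c
    # Unpooled standard error: each group contributes its own variance.
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_t - p_c
    return diff - z_crit * se, diff + z_crit * se

ci_low_manual, ci_high_manual = wald_ci_diff(590, 5000, 479, 5000)
print(f"Manual 95% CI for lift: [{ci_low_manual:.4f}, {ci_high_manual:.4f}]")
```

Unlike the z-test, which pools the two groups under the null hypothesis, the interval uses each group's own variance, which is why the two standard errors differ slightly.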


## **Step 6 (Optional): Power Analysis**

In [None]:
from statsmodels.stats.power import NormalIndPower

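# NOTE: NormalIndPower expects a standardized effect size (e.g., Cohen's h for
# proportions). Passing the raw lift, as done here, understates the effect, so
# the power printed below should not be read as the true power of this test.
effect_size = lift
alpha = 0.05
power_analysis = NormalIndPower()
power = power_analysis.power(effect_size=effect_size, nobs1=n_obs[0], ratio=1.0, alpha=alpha)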
print(f"Power of the test: {power:.3f}")

Power of the test: 0.199
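
The cell above passes the raw lift (0.0222) as `effect_size`, but a normal-approximation power calculation for proportions is usually fed a standardized effect size such as Cohen's h. The sketch below, using only the standard library, compares the two inputs. The helper names `cohens_h` and `two_sample_power` are hypothetical, and the power formula is the standard two-sided normal approximation for equal group sizes, not a claim about statsmodels internals (statsmodels provides `proportion_effectsize` for the same h transformation).

```python
import math
from statistics import NormalDist

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

def two_sample_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided z-test with equal group sizes."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # With equal groups the effective sample size is n_per_group / 2.
    shift = abs(effect_size) * math.sqrt(n_per_group / 2)
    return (1 - norm.cdf(z_crit - shift)) + norm.cdf(-z_crit - shift)

h = cohens_h(0.1180, 0.0958)                  # observed rates from Step 3
power_h = two_sample_power(h, 5000)           # standardized effect size
power_raw = two_sample_power(0.0222, 5000)    # raw lift, as in the cell above

print(f"Cohen's h: {h:.4f}")
print(f"Power with Cohen's h: {power_h:.3f}")
print(f"Power with raw lift:  {power_raw:.3f}")
```

Power computed from the observed effect is a post-hoc quantity and tends to be optimistic; for planning, the same function can be evaluated at the design rates (10% and 11.5%) instead.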


## **Step 7: Interpretation**

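- Treatment Conversion Rate: 11.80% observed (590 of 5,000); simulated true rate 11.5%
- Control Conversion Rate: 9.58% observed (479 of 5,000); simulated true rate 10.0%
- Lift: +2.22 percentage points (observed)
- P-value = 0.0003 (two-sided) < 0.05: **Statistically significant**
- 95% Confidence Interval for the lift, [0.0101, 0.0343], does not include 0
- Power: the 0.199 printed in Step 6 was computed from the raw lift, whereas `NormalIndPower` expects a standardized effect size (e.g., Cohen's h), so it does not support a claim that the study is well-powered; recompute before reporting power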
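
The mock data were generated with true rates of 10% and 11.5% (see Mock Data Generation), while Step 3 observed 9.58% and 11.80%. A rough sketch of how far each observed rate sits from its true rate, in binomial standard-error units, helps explain why the observed lift differs from the designed 1.5 points. The function name `se_units` is hypothetical.

```python
import math

def se_units(observed_rate: float, true_rate: float, n: int) -> float:
    """Distance between observed and true rate, in binomial standard errors."""
    se = math.sqrt(true_rate * (1 - true_rate) / n)
    return (observed_rate - true_rate) / se

# Observed rates from Step 3; true rates from the data-generation cell.
z_control = se_units(0.0958, 0.100, 5000)
z_treatment = se_units(0.1180, 0.115, 5000)

print(f"Control:   {z_control:+.2f} standard errors from the true rate")
print(f"Treatment: {z_treatment:+.2f} standard errors from the true rate")
```

Both deviations are within about one standard error and point in opposite directions, so ordinary sampling noise is enough to account for an observed lift larger than the simulated one.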


# **📄 2. Report Summary**


## A/B Test Summary Report

**Objective:**
Assess whether a new feature improves the website's conversion rate.

**Test Design:**

**Groups:** Control (existing feature) vs. Treatment (new feature)

**Duration:** 14 days

**Users per group:** 5,000




**Key Metric:** Conversion Rate = Converted Users / Total Users

| Group     | Conversion Rate | Users | Converted Users |
| --------- | --------------- | ----- | --------------- |
| Control   | 9.58%           | 5,000 | 479             |
| Treatment | 11.80%          | 5,000 | 590             |
| **Lift**  | **+2.22 pp**    |       |                 |


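**Statistical Test:**

*   Test: Two-sample Z-test for proportions (two-sided)
*   Z = 3.59 (Treatment minus Control), P-value = 0.0003
*   95% CI for lift: [0.0101, 0.0343]
*   Power: not reported here; the Step 6 value (0.199) was computed from the raw lift rather than a standardized effect size and needs to be recomputed
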
## **Conclusion:**
✅ Statistically and practically significant improvement.

📈 Treatment increases conversions by 2.22 percentage points (95% CI: 1.01 to 3.43).

## **Recommendation:**
✅ Roll out the new feature.

📊 Consider segment analysis (e.g., by device or user type) for deeper insights.

