### 1. Project Overview


**Objective**: Compare Average Order Value (AOV) between standard checkout (control) and optimized checkout (treatment)

**Key Metrics**:
- Primary: Average Order Value (continuous)
- Secondary: Conversion Rate (optional)

**Hypotheses**:
- H_0: mu_control = mu_treatment (no difference in AOV)
- H_1: mu_control < = mu_treatment (significant difference exists)

**Test Design**:
- Random assignment (50/50 split)
- Minimum sample: 1000 per group
- Duration: 14 days

### 2. Data Preparation

In [3]:
import pandas as pd
import numpy as np

#set random seed for reproducibility
np.random.seed(42)

#generate synthetic data
num_samples = 2000 # 1000 per group

#control group (standard checkout)
control_aov =np.random.normal(loc=90,scale=22,size=num_samples//2)
control_aov=np.clip(control_aov,10,200) #realistic bounds for order values

#treatment group(optimized checkout) - slightly higher mean
treatment_aov =np.random.normal(loc=90,scale=22,size=num_samples//2)
treatment_aov= np.clip(treatment_aov,10,200)

#create Dataframe
data = {
    'user_id':range(1,num_samples+1),
    'group':['control']*(num_samples//2)+['treatment']*(num_samples//2), #Creates a list with 1000 'control' strings + 1000 'treatment' strings
    'order_value':np.concatenate([control_aov,treatment_aov]) #Merges the two arrays First controlled 1000 values and treatment Next 1000 values
}

df = pd.DataFrame(data)

#add same missing values and zero-dollar orders to make it realistic
df.loc[df.sample(frac=0.02).index,'order_value'] = np.nan #2% missing
df.loc[df.sample(frac=0.01).index,'order_value']= 0 #1% zero-dollar orders

#save to  csv
df.to_csv('checkout_ab_test_data.csv',index=False)

print("Dataset generated and saved as 'checkout_ab_test_data.csv'")
print(df.head())
print(f"\nGroup distribution: \n {df['group'].value_counts()}")

Dataset generated and saved as 'checkout_ab_test_data.csv'
   user_id    group  order_value
0        1  control   100.927711
1        2  control    86.958185
2        3  control   104.249148
3        4  control   123.506657
4        5  control    84.848626

Group distribution: 
 group
control      1000
treatment    1000
Name: count, dtype: int64


### 3. Exploratory Data Analysis(EDA)

In [5]:
#total missing value
missing_per_column=df.isna().sum()
print('Missing values per column:')
print(missing_per_column)


Missing values per column:
user_id         0
group           0
order_value    40
dtype: int64


In [8]:
#check percentage missing
print(df.isna().mean().round(4)*100)

user_id        0.0
group          0.0
order_value    2.0
dtype: float64


In [10]:
#drop rows missing only in 'order_value'
df_clean=df.dropna(subset=['order_value'])

print(f"Original: {len(df)} rows  | Cleaned: {len(df_clean)} rows")

Original: 2000 rows  | Cleaned: 1960 rows


In [12]:
# Summary statistics
print(df_clean.groupby('group')['order_value'].describe())

           count       mean        std  min        25%        50%         75%  \
group                                                                           
control    982.0  89.122119  23.496614  0.0  75.022512  90.170244  103.923675   
treatment  978.0  90.650873  23.371924  0.0  76.247486  90.792956  105.873536   

                  max  
group                  
control    174.760093  
treatment  160.248366  


### 4 Statistical Analysis

> **Hypothesis Setup**

-  h_0: control sample variance = treatment sample variance
-  h_1: control sample variance != treatment sample variance

In [16]:
#levene's test for variance equality
from scipy.stats import levene

control = df_clean[df_clean['group'] == 'control']['order_value']
control_clean= control >0

treatment=df_clean[df_clean['group'] == 'treatment']['order_value']
treatment_clean= treatment>0


#perform levene's test
levene_stat, levene_p = levene(control_clean,treatment_clean)

print(f"Levene's test statistic: {levene_stat:.4f}")
print(f"p-value: {levene_p:.4f}")

#Interpretation
alpha = 0.05
if levene_p > alpha:
    print("Fail to reject null hypothesis - variances are equal")
else:
    print("Reject null hypothesis - variances are unequal")

Levene's test statistic: 0.7914
p-value: 0.3738
Fail to reject null hypothesis - variances are equal


In [18]:
from scipy import stats

# Welch's t-test
t_stat, p_value = stats.ttest_ind(treatment_clean, control_clean, equal_var=False,alternative="greater")

# Effect size (Cohen's d)
def cohens_d(treatment, control):
    diff = treatment.mean() - control.mean()
    pooled_std = np.sqrt((treatment.std()**2 + control.std()**2)/2)
    return diff / pooled_std

d = cohens_d(treatment_clean, control_clean)

# Results
print(f"Control AOV: ${control.mean():.2f}")
print(f"Treatment AOV: ${treatment.mean():.2f}")
print(f"Absolute difference: ${treatment.mean()-control.mean():.2f}")
print(f"Relative difference: {(treatment.mean()-control.mean())/control.mean()*100:.1f}%")
print(f"\nWelch's t-test p-value: {p_value:.4f}")
print(f"Cohen's d effect size: {d:.3f}")

Control AOV: $89.12
Treatment AOV: $90.65
Absolute difference: $1.53
Relative difference: 1.7%

Welch's t-test p-value: 0.1868
Cohen's d effect size: 0.040


#### Interpretations

- **p-value (0.1868) > 0.05**
  > - The observed difference is not statistically significant
  > - We cannot reject the null hypothesis (that both checkouts perform equally)
  > - There's a 18.68% probability of seeing at least a $1.53 difference by random chance alone, even if the optimization had no real effect.

- **Cohen's d (0.040)**
  - 0.2 = Small effect
  - 0.5 = Medium effect
  - 0.8 = Large effect
    > 0.040 cohen's d indicates a negligible effect size

- **Business Impact**
  - While the treatment shows a nominal 1.7% lift, this tiny effect:
    > - Could disappear in future tests (regression to mean)
    > - May not justify implementation costs
    > - Is smaller than typical daily/weekly fluctuations

- **Conclusion**
> There is no difference  between **Controlled:Continuous Standard checkout and Treatment: Optimized checkout flow** of E-commerce Average Order Value.
  