# Week 2 Day 5 Assignment: Customer Retention

**Dataset:** `storedata_total1.csv`

Tasks:
- Compare retained and non-retained customers
- Run a t-test to check whether the differences are statistically significant
- Report the findings in plain English

```python
import pandas as pd
import numpy as np
from scipy import stats

# Load the dataset (relative path, resolved against the working directory)
path = "storedata_total1.csv"
df = pd.read_csv(path)

# Preview the first five rows
df.head()
```

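```python
# Basic overview: (rows, columns) and the column names
print(df.shape)
print(df.columns.tolist())

# Number of customers in each retention group (missing values included)
retention_counts = df["retained"].value_counts(dropna=False)
retention_counts
```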
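Counts are easier to compare as shares of all customers. The following is a minimal sketch, assuming `retained` is coded 1 (retained) and 0 (not retained), as the group filters in this notebook expect. `retention_summary` is a hypothetical helper name, and the small `demo` frame is made up only to show the call. It is not drawn from `storedata_total1.csv`.

```python
import pandas as pd


def retention_summary(frame: pd.DataFrame, col: str = "retained") -> pd.DataFrame:
    """Return the count and share of each value in `col`, keeping missing values."""
    counts = frame[col].value_counts(dropna=False)
    shares = frame[col].value_counts(normalize=True, dropna=False)
    # Both Series share the same index (the group codes), so they align by value
    return pd.DataFrame({"count": counts, "share": shares})


# Hypothetical toy data, for illustration only
demo = pd.DataFrame({"retained": [1, 1, 1, 0]})
summary = retention_summary(demo)
print(summary)
```

On the real data the call would be `retention_summary(df)`. Any codes other than 0 and 1, including NaN, show up as extra rows, and those customers are silently excluded by the `== 1` / `== 0` split in the next cell.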

```python
# Split into the two groups (retained is coded 1 = retained, 0 = not retained)
retained = df[df["retained"] == 1]
not_retained = df[df["retained"] == 0]

# Compare average order value (avgorder), dropping missing values
retained_avg = retained["avgorder"].dropna()
not_retained_avg = not_retained["avgorder"].dropna()

retained_avg.mean(), not_retained_avg.mean()
```
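The split and the pair of means can also be produced with a single `groupby`. This version also reports the count and standard deviation of each group, which give useful context for the t-test. It is a sketch under the same 0/1 coding assumption. `group_summary` is a hypothetical helper, and the toy frame is made up for illustration.

```python
import pandas as pd


def group_summary(frame, metrics=("avgorder", "ordfreq"), group_col="retained"):
    """Count, mean and standard deviation of each metric per retention group.

    `count` excludes missing values, matching the .dropna() calls in the notebook.
    """
    return frame.groupby(group_col)[list(metrics)].agg(["count", "mean", "std"])


# Hypothetical toy data, for illustration only
demo = pd.DataFrame({
    "retained": [1, 1, 0, 0],
    "avgorder": [50.0, 70.0, 30.0, None],
    "ordfreq": [2.0, 4.0, 1.0, 1.0],
})
table = group_summary(demo)
print(table)
```

On the notebook's data this would be `group_summary(df)`. The result has one row per `retained` value and a (metric, statistic) column pair for each entry.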

```python
# Welch's t-test for avgorder (equal_var=False: the groups may have different variances)
avg_ttest = stats.ttest_ind(retained_avg, not_retained_avg, equal_var=False, nan_policy="omit")

avg_ttest
```
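Setting `equal_var=False` selects Welch's t-test, which does not assume that the two groups share a variance. Its statistic is t = (mean₁ − mean₂) / sqrt(s₁²/n₁ + s₂²/n₂), where s² is each group's sample variance (ddof=1). The degrees of freedom come from the Welch-Satterthwaite approximation. The sketch below computes these quantities by hand and checks them against `scipy.stats.ttest_ind`. `welch_t` is a hypothetical helper name, and the two samples are made-up numbers, not values from the dataset.

```python
import numpy as np
from scipy import stats


def welch_t(a, b):
    """Welch's t statistic, degrees of freedom and two-sided p-value."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va = a.var(ddof=1) / a.size  # squared standard error of mean(a)
    vb = b.var(ddof=1) / b.size  # squared standard error of mean(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    dof = (va + vb) ** 2 / (va**2 / (a.size - 1) + vb**2 / (b.size - 1))
    p = 2 * stats.t.sf(abs(t), dof)  # two-sided p-value from the t distribution
    return t, dof, p


# Illustrative samples only, not taken from storedata_total1.csv
group_a = [52.0, 61.5, 48.2, 70.1, 66.3]
group_b = [40.4, 45.9, 38.7, 50.2]

t, dof, p = welch_t(group_a, group_b)
ref = stats.ttest_ind(group_a, group_b, equal_var=False)
print(t, dof, p)
print(ref.statistic, ref.pvalue)
```

The hand-computed statistic and p-value should match SciPy's to floating-point precision. This is a way to confirm what `avg_ttest` is reporting.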

```python
# Optional: compare order frequency (ordfreq) the same way
retained_freq = retained["ordfreq"].dropna()
not_retained_freq = not_retained["ordfreq"].dropna()

freq_ttest = stats.ttest_ind(retained_freq, not_retained_freq, equal_var=False, nan_policy="omit")

retained_freq.mean(), not_retained_freq.mean(), freq_ttest
```
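The split-and-test steps are the same for every metric, so they can be wrapped in one helper that returns a row per metric. The following is a sketch that assumes the 0/1 `retained` coding used above. `compare_groups` is a hypothetical name, and the toy frame is made up for illustration.

```python
import pandas as pd
from scipy import stats


def compare_groups(frame, metrics, group_col="retained"):
    """Welch's t-test on each metric, retained (1) vs not retained (0)."""
    rows = []
    for metric in metrics:
        # Same filtering as the notebook: split on the 0/1 code, drop missing values
        yes = frame.loc[frame[group_col] == 1, metric].dropna()
        no = frame.loc[frame[group_col] == 0, metric].dropna()
        res = stats.ttest_ind(yes, no, equal_var=False)
        rows.append({
            "metric": metric,
            "mean_retained": yes.mean(),
            "mean_not_retained": no.mean(),
            "t": res.statistic,
            "p_value": res.pvalue,
        })
    return pd.DataFrame(rows).set_index("metric")


# Hypothetical toy data, for illustration only
demo = pd.DataFrame({
    "retained": [1, 1, 1, 0, 0, 0],
    "avgorder": [60.0, 65.0, 70.0, 40.0, 45.0, 50.0],
    "ordfreq": [3.0, 4.0, 5.0, 3.0, 4.0, 5.0],
})
results = compare_groups(demo, ["avgorder", "ordfreq"])
print(results)
```

On the notebook's data, `compare_groups(df, ["avgorder", "ordfreq"])` would reproduce the two tests above in one table.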

## Findings (Plain English)

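- We compared **retained** and **not retained** customers on average order value (`avgorder`) and order frequency (`ordfreq`) using Welch's t-test, which does not assume the two groups have equal variances.
- For each metric, the p-value is the probability of seeing a difference in group means at least this large if the two groups actually had the same mean.
- If the p-value is below 0.05, the difference is statistically significant at the 5% level. This means the difference is unlikely to be due to chance alone. It does not mean the difference is large, or that it causes retention.
- Use the group means and p-values printed above to state the final conclusion (significant or not) for each metric.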
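Turning each test result into a sentence keeps the write-up tied to the actual numbers. The following is a minimal sketch. The 0.05 threshold comes from the findings above, while `describe_result` and the example numbers passed to it are hypothetical and are not outputs from this dataset.

```python
def describe_result(metric, mean_retained, mean_not_retained, p_value, alpha=0.05):
    """Return a plain-English sentence for one Welch's t-test result."""
    if p_value < alpha:
        direction = "higher" if mean_retained > mean_not_retained else "lower"
        return (f"Retained customers have a {direction} average {metric} "
                f"({mean_retained:.2f} vs {mean_not_retained:.2f}), and the difference "
                f"is statistically significant (p = {p_value:.3f} < {alpha}).")
    return (f"The difference in average {metric} ({mean_retained:.2f} vs "
            f"{mean_not_retained:.2f}) is not statistically significant "
            f"(p = {p_value:.3f} >= {alpha}).")


# Hypothetical numbers, for illustration only
print(describe_result("avgorder", 65.0, 45.0, 0.008))
print(describe_result("ordfreq", 4.0, 4.0, 1.0))
```

With the notebook's variables, the call for order value would be `describe_result("avgorder", retained_avg.mean(), not_retained_avg.mean(), avg_ttest.pvalue)`.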