# Customer Churn Prediction (AWS Mini-Project)

Sections: Data Loading → EDA → Feature Engineering → Baseline Models → Improved Model → Evaluation → Threshold Selection → Feature Importance → (Optional) AWS SageMaker Deploy → Conclusions

Links:
- Project brief: https://github.com/springboard-curriculum/mec2-projects/blob/main/Student_MLE_MiniProject_Churn_Prediction_AWS.md
- AWS reference: https://aws.amazon.com/blogs/machine-learning/build-tune-and-deploy-an-end-to-end-churn-prediction-model-using-amazon-sagemaker-pipelines/


In [None]:
# Data Loading
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

pd.set_option('display.max_columns', 200)

# Path to the churn dataset; edit this to point at your own CSV.
CSV_PATH = Path('data/churn.csv')
if CSV_PATH.exists():
    df = pd.read_csv(CSV_PATH)
else:
    # Fallback: a tiny 10-row demo frame so the notebook still runs end to end.
    print(f"{CSV_PATH} not found; using the built-in demo data.")
    df = pd.DataFrame({
        'customer_id': range(1, 11),
        'tenure_months': [1,3,6,12,24,36,48,60,72,84],
        'monthly_spend': [35,40,38,45,50,60,65,70,80,90],
        'churn': [1,0,1,0,0,0,1,0,0,0],
        'plan': ['basic','basic','plus','plus','pro','pro','pro','plus','pro','basic']
    })

df.head()
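
Before moving on to EDA, it can help to confirm that the loaded frame has the shape the later cells expect. The sketch below is one possible check, not part of the project brief: `check_churn_frame` is a hypothetical helper name, and it assumes the target column is called `churn` and is encoded as 0/1, as the EDA cell asserts.

```python
import pandas as pd


def check_churn_frame(df: pd.DataFrame, target: str = "churn") -> pd.DataFrame:
    """Hypothetical helper: validate the target column and summarize each column."""
    if target not in df.columns:
        raise ValueError(f"Dataset must include '{target}' column (0/1).")

    # Any non-missing target value other than 0 or 1 is treated as an error.
    unexpected = set(df[target].dropna().unique()) - {0, 1}
    if unexpected:
        raise ValueError(
            f"Unexpected values in '{target}': {sorted(unexpected, key=str)}"
        )

    # One row per column: dtype, number of missing values, number of distinct values.
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "n_unique": df.nunique(),
    })


# Small illustrative frame (assumed data, mirroring the demo columns above).
demo = pd.DataFrame({
    "tenure_months": [1, 3, 6, 12],
    "monthly_spend": [35, 40, None, 45],
    "churn": [1, 0, 1, 0],
})
summary = check_churn_frame(demo)
print(summary)
```

In the notebook itself this would be called as `check_churn_frame(df)`. Columns with a high `n_missing` count are worth a closer look before feature engineering.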


In [None]:
# EDA — churn rate & class balance
assert 'churn' in df.columns, "Dataset must include 'churn' column (0/1)."
churn_rate = df['churn'].mean()
print(f"Churn rate: {churn_rate:.3f}")

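# Class counts plot; reindex so both classes appear even if one is absent,
# which keeps the two tick labels and colors aligned with the bars.
counts = df['churn'].value_counts().reindex([0, 1], fill_value=0)
ax = counts.plot(kind='bar', color=['#4daf4a','#e41a1c'])
ax.set_xticklabels(['No churn (0)','Churn (1)'], rotation=0)
ax.set_ylabel('Count')
ax.set_title('Class Balance')
plt.show()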

# Basic stats
display(df.describe(include='all'))
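
The overall churn rate can hide large differences between customer segments. As a rough sketch of one way to break it down, the snippet below computes the churn rate and row count per category. `churn_rate_by` is a hypothetical helper, and the `plan` column is an assumption taken from the demo frame; a real dataset may use a different categorical column.

```python
import pandas as pd


def churn_rate_by(df: pd.DataFrame, column: str, target: str = "churn") -> pd.DataFrame:
    """Hypothetical helper: churn rate and row count for each category of `column`."""
    return (
        df.groupby(column)[target]
          .agg(churn_rate="mean", n="count")   # mean of a 0/1 target is the churn rate
          .sort_values("churn_rate", ascending=False)
    )


# Same 'plan' and 'churn' values as the demo frame in the data loading cell.
demo = pd.DataFrame({
    "plan": ["basic", "basic", "plus", "plus", "pro", "pro", "pro", "plus", "pro", "basic"],
    "churn": [1, 0, 1, 0, 0, 0, 1, 0, 0, 0],
})
by_plan = churn_rate_by(demo, "plan")
print(by_plan)
```

With only a handful of rows per segment, as in the demo data, these rates are very noisy; the `n` column is included so small groups are easy to spot.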
