# Telco Customer Churn Growth Analytics
This notebook walks through:
1. **EDA** of the Telco Customer Churn dataset
2. **Cohort & Retention Analysis**
3. **Funnel Analysis**
4. **Simulated A/B Test**
5. **Churn Prediction Model**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, classification_report, confusion_matrix
from statsmodels.stats.proportion import proportions_ztest  # not available in scipy.stats
import warnings
warnings.filterwarnings('ignore')

# Set plotting style
sns.set(style='whitegrid')

In [None]:
# 1. Load Telco Customer Churn dataset
df = pd.read_csv('https://raw.githubusercontent.com/blastchar/telco-customer-churn/master/WA_Fn-UseC_-Telco-Customer-Churn.csv')
df.head()

## 1. Exploratory Data Analysis (EDA)
Inspect the dataset's shape, column types, missing values, and basic distributions.

In [None]:
# Dataset shape and info
print('Shape:', df.shape)
df.info()

# Check for missing values
df.isnull().sum()
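`isnull()` only counts true nulls. In the commonly distributed copy of this dataset, `TotalCharges` is stored as text (object dtype), and the few customers with a `tenure` of 0 have a blank string there, so the check above reports no missing values for that column. A minimal sketch on an invented toy frame (not the real data) shows how coercing to numeric exposes such blanks:

```python
import pandas as pd

# Toy frame mimicking the Telco export: TotalCharges is text, and a new
# customer (tenure 0) has a blank string instead of a number (values invented)
toy = pd.DataFrame({
    'tenure': [0, 12, 34],
    'TotalCharges': [' ', '845.5', '1889.5'],
})

# isnull() finds nothing, because ' ' is a valid (non-null) string
print('Nulls before coercion:', toy.isnull().sum().sum())

# Coerce to numeric: strings that cannot be parsed, such as ' ', become NaN
toy['TotalCharges'] = pd.to_numeric(toy['TotalCharges'], errors='coerce')
print('Nulls after coercion:', toy['TotalCharges'].isnull().sum())
```

On the real data, the same `pd.to_numeric(df['TotalCharges'], errors='coerce')` call could be applied before any analysis that uses `TotalCharges`.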

### 1.1 Churn Distribution and Feature Relationships

In [None]:
# Churn count and percentage
churn_counts = df['Churn'].value_counts()
print(churn_counts)
print('\nChurn Percentage:')
print(churn_counts / df.shape[0] * 100)

# Visualize churn by contract type
plt.figure(figsize=(8, 4))
sns.countplot(x='Contract', hue='Churn', data=df)
plt.title('Churn by Contract Type')
plt.show()

# Tenure distribution
plt.figure(figsize=(8, 4))
sns.histplot(df['tenure'], bins=30, kde=False)
plt.title('Distribution of Tenure (Months)')
plt.show()
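The count plot shows absolute numbers, which makes contract types of very different sizes hard to compare. Normalizing within each contract type gives a churn rate per group instead. The sketch below uses invented toy rows with the dataset's column names; on the real data, `df` would replace `toy`.

```python
import pandas as pd

# Toy rows with the same column names as the Telco dataset (values invented)
toy = pd.DataFrame({
    'Contract': ['Month-to-month', 'Month-to-month', 'Month-to-month',
                 'One year', 'One year', 'Two year'],
    'Churn': ['Yes', 'Yes', 'No', 'No', 'Yes', 'No'],
})

# Overall churn share, equivalent to churn_counts / df.shape[0]
overall = toy['Churn'].value_counts(normalize=True)

# Churn share within each contract type (each row sums to 1)
by_contract = pd.crosstab(toy['Contract'], toy['Churn'], normalize='index')
print(overall)
print(by_contract)
```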

## 2. Cohort & Retention Analysis
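Group customers into monthly signup cohorts and compare retention across them. The dataset has no signup date, so one is approximated by counting `tenure` months back from an assumed reference date. Because the data is a single snapshot, a cohort's "retention rate" here is the share of its customers who have not churned, not a month-by-month retention curve.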

In [None]:
# Convert 'Churn' to a numeric flag (1 = churned, 0 = active)
df['Churn_flag'] = df['Churn'].map({'Yes': 1, 'No': 0})

# Simulate 'signup_date' by counting 'tenure' months back from an assumed
# reference date of 2020-02-01 (an approximation for illustrative purposes).
# Recent pandas versions reject pd.to_timedelta(..., unit='M') because a month
# has no fixed length, so calendar months are subtracted with pd.DateOffset.
reference_date = pd.to_datetime('2020-02-01')
df['signup_date'] = df['tenure'].apply(lambda m: reference_date - pd.DateOffset(months=int(m)))

# Extract signup month
df['signup_month'] = df['signup_date'].dt.to_period('M')

# Per cohort: number of customers and number retained (Churn_flag == 0)
cohort_data = df.groupby('signup_month').agg(
    total_customers=('customerID', 'count'),
    retained=('Churn_flag', lambda x: (x == 0).sum())
).reset_index()
cohort_data['retention_rate'] = cohort_data['retained'] / cohort_data['total_customers']

cohort_data

In [None]:
# Plot cohort retention rate
plt.figure(figsize=(10, 5))
plt.plot(cohort_data['signup_month'].astype(str), cohort_data['retention_rate'], marker='o')
plt.xticks(rotation=45)
plt.title('Monthly Cohort Retention Rate')
plt.xlabel('Signup Month')
plt.ylabel('Retention Rate')
plt.tight_layout()
plt.show()
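A true retention curve follows each cohort over months since signup, which needs per-customer activity history that this snapshot lacks. The sketch below assumes a hypothetical `history` table with `signup_month` and `last_active_month` columns and an assumed `observation_end` month, then builds the cohort-by-month retention matrix. Cells a cohort has not yet reached are left as NaN rather than counted as churn.

```python
import numpy as np
import pandas as pd

# Hypothetical per-customer activity history; the Telco snapshot has no such data
history = pd.DataFrame({
    'customer_id': ['a', 'b', 'c', 'd', 'e'],
    'signup_month': pd.PeriodIndex(
        ['2020-01', '2020-01', '2020-01', '2020-02', '2020-02'], freq='M'),
    'last_active_month': pd.PeriodIndex(
        ['2020-03', '2020-01', '2020-02', '2020-04', '2020-02'], freq='M'),
})
observation_end = pd.Period('2020-04', freq='M')  # assumed last observed month


def month_number(periods):
    """Map monthly periods to consecutive integers so they can be subtracted."""
    return periods.dt.year * 12 + periods.dt.month


end_num = observation_end.year * 12 + observation_end.month
signup_num = month_number(history['signup_month'])
# Months between signup and last activity, and months observable so far
history['lifetime'] = month_number(history['last_active_month']) - signup_num
history['observable'] = end_num - signup_num

max_k = int(history['observable'].max())
rows = {}
for cohort, grp in history.groupby('signup_month'):
    horizon = int(grp['observable'].iloc[0])  # identical within a cohort
    rows[str(cohort)] = [
        (grp['lifetime'] >= k).mean() if k <= horizon else np.nan
        for k in range(max_k + 1)
    ]

# Rows: signup cohort; columns: months since signup; NaN = not yet observable
retention = pd.DataFrame.from_dict(rows, orient='index', columns=list(range(max_k + 1)))
print(retention)
```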

## 3. Funnel Analysis
Simulate a user funnel:
1. Signed Up
2. Has Internet Service
3. Uses Streaming TV or Movies
4. Still Active

Compute stepwise conversion rates.
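Funnel conversion can be read with two different denominators: overall conversion divides each step by the first step, while stepwise conversion divides by the previous step. A small helper sketch, with a hypothetical name `funnel_table` and invented counts, computes both, assuming each step is a subset of the one before it:

```python
import pandas as pd


def funnel_table(steps, counts):
    """Overall and step-to-step conversion for an ordered funnel.

    Assumes each step's customers are a subset of the previous step's.
    """
    table = pd.DataFrame({'Step': steps, 'Count': counts})
    # Share of all sign-ups that reached this step
    table['Overall'] = table['Count'] / table['Count'].iloc[0]
    # Share of the previous step that reached this step
    table['Step_Conversion'] = table['Count'] / table['Count'].shift(1)
    table.loc[0, 'Step_Conversion'] = 1.0  # first step has no predecessor
    return table


# Invented counts for illustration only
example = funnel_table(
    ['Signed Up', 'Has Internet', 'Uses Streaming', 'Still Active'],
    [1000, 800, 500, 400],
)
print(example)
```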

In [None]:
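# Step 1: All customers who signed up
total_signed = df.shape[0]

# Step 2: Customers with internet service (InternetService != 'No')
internet_mask = df['InternetService'] != 'No'
has_internet = internet_mask.sum()

# Step 3: Internet customers who use StreamingTV or StreamingMovies
streaming_mask = internet_mask & ((df['StreamingTV'] == 'Yes') | (df['StreamingMovies'] == 'Yes'))
streaming = streaming_mask.sum()

# Step 4: Streaming users who are still active (Churn_flag == 0),
# so that each step is a subset of the one before it
still_active = (streaming_mask & (df['Churn_flag'] == 0)).sum()

funnel = pd.DataFrame({
    'Step': ['Signed Up', 'Has Internet', 'Uses Streaming', 'Still Active'],
    'Count': [total_signed, has_internet, streaming, still_active],
})
# Overall conversion relative to all sign-ups
funnel['Conversion'] = funnel['Count'] / total_signed
# Stepwise conversion relative to the previous step
funnel['Step_Conversion'] = funnel['Count'] / funnel['Count'].shift(1)
funnel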

In [None]:
# Plot funnel conversion rates
plt.figure(figsize=(8, 5))
sns.barplot(x='Step', y='Conversion', data=funnel)
plt.ylim(0, 1)
plt.title('User Funnel Conversion Rates')
plt.ylabel('Conversion Rate')
plt.show()

## 4. Simulated A/B Test
Randomly assign month-to-month customers (`Contract == 'Month-to-month'`) to Control (A) or Treatment (B), then simulate a retention campaign that reduces churn in Treatment.
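Setting churn to zero for every Treatment customer, as the simulation in this section does, guarantees a significant result. A more realistic sketch assumes a partial effect: each customer in B who would have churned is retained with some probability. The group size, baseline churn rate, and `relative_reduction` below are assumptions chosen for illustration, not estimates from the dataset.

```python
import numpy as np

rng = np.random.default_rng(7)  # seeded so the simulation is reproducible

n_per_group = 1000          # assumed group size
baseline_churn = 0.40       # assumed baseline churn rate, not measured
relative_reduction = 0.25   # assumed campaign effect: 25% of churners retained

# Baseline churn outcomes for both arms, drawn from the same rate
churn_a = rng.random(n_per_group) < baseline_churn
churn_b = rng.random(n_per_group) < baseline_churn

# Treatment: each would-be churner in B is saved with probability relative_reduction
saved = rng.random(n_per_group) < relative_reduction
churn_b_treated = churn_b & ~saved

print('Control churn rate:', churn_a.mean())
print('Treatment churn rate:', churn_b_treated.mean())
```

The same idea could be applied to `mtm` by drawing one random number per group B row and setting `Churn_flag` to 0 only where that number falls below `relative_reduction`.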

In [None]:
# Filter month-to-month customers and assign each to A or B with a seeded RNG
mtm = df[df['Contract'] == 'Month-to-month'].copy()
rng = np.random.default_rng(42)
mtm['group'] = rng.choice(['A', 'B'], size=mtm.shape[0])

# Baseline churn counts for each group
grouped = mtm.groupby('group')['Churn_flag'].agg(
    total='count',
    churned='sum'
).reset_index()
grouped['rate'] = grouped['churned'] / grouped['total']
grouped

In [None]:
# Simulate an extreme treatment effect: set churn to zero for every group B customer.
# This guarantees a large difference and is for demonstration only.
mtm_effect = mtm.copy()
mtm_effect.loc[mtm_effect['group'] == 'B', 'Churn_flag'] = 0

# Recompute counts after treatment
grouped_effect = mtm_effect.groupby('group')['Churn_flag'].agg(
    total='count',
    churned='sum'
).reset_index()
grouped_effect['rate'] = grouped_effect['churned'] / grouped_effect['total']
grouped_effect

In [None]:
# Perform two-sample z-test on churn proportions
count = grouped_effect['churned'].values
nobs = grouped_effect['total'].values
stat, pval = proportions_ztest(count, nobs)
print(f'z-statistic: {stat:.3f}')
print(f'p-value: {pval:.4f}')
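With its default arguments, `proportions_ztest` runs a two-sided test that uses the pooled proportion of both groups for the standard error. The hand-written sketch below (the function name `two_proportion_ztest` is hypothetical) reproduces that calculation so the statistic can be checked by hand; the counts are invented.

```python
import math


def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided z-test of p1 == p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented counts: 400 of 1000 churned in A, 300 of 1000 in B
z, p = two_proportion_ztest(400, 1000, 300, 1000)
print(f'z = {z:.3f}, p = {p:.2e}')
```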

## 5. Churn Prediction Model
Train a logistic regression model to predict churn and evaluate it on a held-out test set.
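The model below uses only three numeric columns. Categorical columns such as `Contract`, which the EDA plot relates to churn, would need to be encoded before a logistic regression can use them. A minimal sketch with `pd.get_dummies` on invented toy rows:

```python
import pandas as pd

# Toy rows using a few Telco column names (values invented)
toy = pd.DataFrame({
    'tenure': [1, 24, 60],
    'MonthlyCharges': [70.7, 56.2, 105.5],
    'Contract': ['Month-to-month', 'One year', 'Two year'],
    'InternetService': ['Fiber optic', 'DSL', 'No'],
})

categorical = ['Contract', 'InternetService']
# One 0/1 column per category level; drop_first=True drops the first
# (alphabetical) level of each column, which is implied by the others
X_toy = pd.get_dummies(toy, columns=categorical, drop_first=True, dtype=int)
print(X_toy.columns.tolist())
```

On the real data, the same call would be applied to `df` (excluding identifier, target, and derived date columns such as `customerID`, `Churn`, `Churn_flag`, `signup_date`, and `signup_month`) before `train_test_split`.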

In [None]:
# Select three numeric features (no categorical encoding is needed for these)
features = ['SeniorCitizen', 'tenure', 'MonthlyCharges']
X = df[features]
y = df['Churn_flag']

# Train/test split, stratified so both sets keep the same churn rate
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Train Logistic Regression
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Predictions & evaluation
y_pred_proba = clf.predict_proba(X_test)[:, 1]
y_pred = clf.predict(X_test)
print('ROC AUC:', roc_auc_score(y_test, y_pred_proba))
print('\nClassification Report:\n', classification_report(y_test, y_pred))
print('\nConfusion Matrix:\n', confusion_matrix(y_test, y_pred))
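For a binary problem, `clf.predict` labels a customer as a churner when the predicted probability exceeds 0.5. For a retention campaign, a lower threshold may be worth considering if missing a churner costs more than contacting a loyal customer; that is a business assumption, not something the data decides. A sketch with a hypothetical helper `precision_recall_at` and invented labels and scores (standing in for `y_test` and `y_pred_proba`) shows the trade-off:

```python
import numpy as np


def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when predicting churn for scores at or above threshold."""
    y_true = np.asarray(y_true)
    y_hat = np.asarray(scores) >= threshold
    tp = np.sum(y_hat & (y_true == 1))   # churners correctly flagged
    fp = np.sum(y_hat & (y_true == 0))   # loyal customers flagged
    fn = np.sum(~y_hat & (y_true == 1))  # churners missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented labels and scores for illustration only
y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.6, 0.45, 0.3, 0.35, 0.1]
for t in (0.5, 0.3):
    p, r = precision_recall_at(y_true, scores, t)
    print(f'threshold={t}: precision={p:.2f}, recall={r:.2f}')
```

Lowering the threshold here raises recall from one third to 1.0 while precision moves from 0.50 to 0.60; on real data precision often falls instead, so the choice should be checked on the held-out set.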