# Telecom Customer Churn Prediction – End-to-End Notebook

- Executive summary and business context
- Data loading/generation, preprocessing, and feature engineering
- Models: RandomForest, XGBoost, LogisticRegression with GridSearchCV
- Evaluation: classification report, confusion matrix, ROC-AUC
- Explainability: SHAP and ELI5 (feature importances, force plots, summaries)
- Customer segmentation and dashboards
- Save best model and metrics

In [None]:
# Ensure project root is on sys.path for local imports
import sys
from pathlib import Path
nb_cwd = Path.cwd()
proj_root = nb_cwd if (nb_cwd / 'src').exists() else nb_cwd.parent
if str(proj_root) not in sys.path:
    sys.path.insert(0, str(proj_root))
print('Using project root for imports:', proj_root)
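
The check above only looks at the current directory and its immediate parent. If the notebook might be run from a deeper folder, a more general variant can walk up every ancestor until it finds the marker directory. This is a minimal sketch, not part of the project: `find_project_root` and its `marker` parameter are hypothetical names introduced for illustration.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "src") -> Optional[Path]:
    """Return the nearest ancestor of `start` (inclusive) that contains a
    directory named `marker`, or None if no ancestor does.

    Hypothetical helper: the marker name 'src' mirrors the check used above.
    """
    start = start.resolve()
    # Check the starting directory first, then each parent up to the filesystem root.
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    return None
```

Calling `find_project_root(Path.cwd())` would then replace the two-level check, with a `None` result signalling that the notebook was started outside the project tree.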

In [None]:
# Setup
import os, json, warnings
# Silence all warnings to keep notebook output readable (this also hides deprecation notices)
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from pathlib import Path
from typing import Tuple

from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report, roc_auc_score, f1_score, accuracy_score, confusion_matrix, roc_curve
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

from xgboost import XGBClassifier
import shap, eli5
from eli5.sklearn import PermutationImportance
import joblib

# Local modules
from src.data_generation import get_dataset

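# Resolve the project base: step up one level when running from notebooks/.
# Path('.').name is always '', so compare against the resolved working directory instead.
BASE = Path.cwd().parent if Path.cwd().name == 'notebooks' else Path.cwd()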
OUTPUTS = BASE / 'outputs'
OUTPUTS.mkdir(parents=True, exist_ok=True)
SEED = 42
np.random.seed(SEED)
print('Outputs dir:', OUTPUTS)

## Data loading / generation
Generate a synthetic dataset covering 10,000 customers over 6 months.
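
To make the shape of such a dataset concrete, here is a minimal sketch of a customer-by-month panel. It is not the implementation behind `src.data_generation.get_dataset`: the function `make_toy_churn_panel`, its column names, and the distributions are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def make_toy_churn_panel(n_customers: int = 10_000,
                         n_months: int = 6,
                         seed: int = 42) -> pd.DataFrame:
    """Build a toy long-format panel with one row per customer per month.

    Hypothetical stand-in for a real generator; every column is an assumption.
    """
    rng = np.random.default_rng(seed)

    # Customer-level attributes, drawn once per customer.
    monthly_charges = rng.normal(65.0, 20.0, size=n_customers).clip(min=10.0)
    # Assumed relationship: higher charges raise the churn probability (logistic link).
    churn_prob = 1.0 / (1.0 + np.exp(-((monthly_charges - 65.0) / 20.0 - 1.5)))
    churned = rng.random(n_customers) < churn_prob

    # Month-level usage, drawn once per customer-month.
    usage_minutes = rng.gamma(shape=2.0, scale=150.0, size=n_customers * n_months)

    return pd.DataFrame({
        "customer_id": np.repeat(np.arange(n_customers), n_months),
        "month": np.tile(np.arange(1, n_months + 1), n_customers),
        "monthly_charges": np.repeat(monthly_charges, n_months),
        "usage_minutes": usage_minutes,
        "churned": np.repeat(churned, n_months),
    })
```

With the defaults this yields 60,000 rows (10,000 customers times 6 months), and the fixed seed makes repeated calls return identical frames.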