---
library_name: scikit-learn
tags: []
datasets: []
language:
- en
metrics:
- accuracy
- f1
base_model: sklearn-pipeline-random-forest
---
This model is a trained Scikit-learn pipeline designed to predict whether a telecom customer is likely to churn based on account, service, and billing attributes.
This model acts as a churn-risk scoring engine for retention workflows. It combines preprocessing (imputation, scaling, one-hot encoding) and classification in a single serialized pipeline artifact for consistent training and inference behavior.
- Developed by: Aashir Hameed
- Model type: Scikit-learn Tabular Classification Pipeline
- Language(s): English (`en`) for feature labels/documentation
- License: Apache 2.0
- Trained from: Telco customer churn tabular dataset
- Repository: GitHub: aashir92/Customer_Churn_Prediction
- Model: Hugging Face: Aashir92/Customer-Churn-Prediction
- Demo: Hugging Face Spaces Live UI
This model is intended for churn risk scoring in:
- CRM prioritization and retention campaigns
- Proactive outreach workflows for high-risk customers
- Batch scoring of customer cohorts
Binary output mapping:
- `0`: No Churn
- `1`: Churn
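For CRM prioritization or cohort scoring, the same pipeline can be applied to a whole DataFrame at once and the binary outputs mapped to the labels above. The sketch below is illustrative only: it fits a tiny stand-in pipeline on toy data rather than loading the released artifact, and the feature subset and model settings are assumptions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Stand-in for the released artifact: a tiny pipeline fitted on toy data,
# only to demonstrate the batch-scoring and label-mapping pattern.
toy = pd.DataFrame({
    "Contract": ["Month-to-month", "Two year", "Month-to-month", "One year",
                 "Month-to-month", "Two year"],
    "tenure": [2, 60, 5, 30, 1, 48],
    "Churn": [1, 0, 1, 0, 1, 0],
})
features = ["Contract", "tenure"]
pipe = Pipeline([
    ("prep", ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), ["Contract"])],
        remainder="passthrough")),
    ("clf", RandomForestClassifier(n_estimators=50, random_state=0)),
])
pipe.fit(toy[features], toy["Churn"])

LABELS = {0: "No Churn", 1: "Churn"}
cohort = toy[features].copy()
probs = pipe.predict_proba(cohort)[:, 1]   # churn probability per customer
preds = pipe.predict(cohort)               # hard 0/1 labels
cohort["churn_prob"] = probs
cohort["churn_label"] = [LABELS[int(p)] for p in preds]

# Highest-risk customers first, for retention outreach
top_risk = cohort.sort_values("churn_prob", ascending=False)
```

Sorting by `churn_prob` rather than the hard label lets a retention team cap outreach at a budgeted number of customers.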
This model is not intended for:
- Causal inference on churn drivers
- Fairness-critical automated decisions without human review
- Data distributions that significantly differ from the Telco training data
Like all supervised models, this pipeline may reflect historical biases and collection artifacts present in source data. Prediction confidence can degrade under distribution shift (for example new plans, pricing structures, or service bundles not represented in training data). The model should be monitored for drift and recalibrated/retrained on a schedule.
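One lightweight way to monitor the drift described above is the population stability index (PSI) on key numeric features such as `MonthlyCharges`. The sketch below is a generic illustration, not part of this model's release; the synthetic distributions and the usual 0.1/0.25 thresholds are assumptions.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a training (expected) and live (actual) numeric feature.
    Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 large shift."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor empty bins to avoid log(0)
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
train_charges = rng.normal(70, 20, 5000)  # stand-in for training MonthlyCharges
live_charges = rng.normal(85, 20, 5000)   # shifted live distribution
psi = population_stability_index(train_charges, live_charges)
```

A PSI above the chosen threshold would be the trigger for the recalibration/retraining schedule mentioned above.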
Use the code below for inference with joblib:

```python
from pathlib import Path

import joblib
import pandas as pd

# Load the serialized pipeline (preprocessing + classifier in one artifact)
model = joblib.load(Path("churn_model_v1.pkl"))

# A single customer record with the features expected by the pipeline
sample = pd.DataFrame(
    [
        {
            "gender": "Female",
            "SeniorCitizen": "0",
            "Partner": "Yes",
            "Dependents": "No",
            "tenure": 12,
            "PhoneService": "Yes",
            "MultipleLines": "No",
            "InternetService": "Fiber optic",
            "OnlineSecurity": "No",
            "OnlineBackup": "Yes",
            "DeviceProtection": "No",
            "TechSupport": "No",
            "StreamingTV": "Yes",
            "StreamingMovies": "Yes",
            "Contract": "Month-to-month",
            "PaperlessBilling": "Yes",
            "PaymentMethod": "Electronic check",
            "MonthlyCharges": 89.1,
            "TotalCharges": 1069.2,
        }
    ]
)

prediction = model.predict(sample)[0]            # 0 = No Churn, 1 = Churn
probability = model.predict_proba(sample)[0][1]  # probability of churn
print(prediction, probability)
```

The model was trained on WA_Fn-UseC_-Telco-Customer-Churn.csv with the standard churn target column (Churn).
- Dropped non-predictive `customerID`
- Coerced `TotalCharges` to numeric and removed rows with invalid target/critical numeric values
- Numeric preprocessing: median imputation + standard scaling
- Categorical preprocessing: most-frequent imputation + one-hot encoding (`handle_unknown='ignore'`)
- Validation: Stratified K-Fold cross-validation (`n_splits=5`)
- Model search: `GridSearchCV` with scoring = `f1`
- Candidates: Logistic Regression and Random Forest
- Winning model: Random Forest
- Best params (winner): `class_weight='balanced'`, `max_depth=8`, `min_samples_leaf=4`, `min_samples_split=2`, `n_estimators=200`
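Putting those steps together, a pipeline of this shape can be sketched as follows. This is not the actual training code: the toy data, the reduced column lists, and the small parameter grid are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data standing in for the cleaned Telco table
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "tenure": rng.integers(0, 72, n),
    "MonthlyCharges": rng.uniform(20, 120, n),
    "Contract": rng.choice(["Month-to-month", "One year", "Two year"], n),
})
# Synthetic target loosely tied to contract type, with 10% label noise
y = ((X["Contract"] == "Month-to-month").to_numpy() ^ (rng.random(n) < 0.1)).astype(int)

numeric = ["tenure", "MonthlyCharges"]
categorical = ["Contract"]
prep = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])
pipe = Pipeline([
    ("prep", prep),
    ("clf", RandomForestClassifier(class_weight="balanced", random_state=0)),
])

# F1-scored grid search over a small illustrative grid
grid = GridSearchCV(
    pipe,
    param_grid={"clf__n_estimators": [50, 200], "clf__max_depth": [4, 8]},
    scoring="f1",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
grid.fit(X, y)
```

Because preprocessing lives inside the pipeline, each CV fold fits its imputers/scalers on training folds only, avoiding leakage into the validation fold.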
Held-out split from the Telco dataset with stratified train/test partitioning.
- Accuracy
- F1-score
- Final Test Accuracy: 75.05%
- Final Test F1-Score: 62.38%
- Best CV F1-score: 63.96%
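Metrics like those reported above can be computed on any labeled hold-out split with `sklearn.metrics`; the tiny label arrays below are placeholders, not the actual test set.

```python
from sklearn.metrics import accuracy_score, f1_score

# Placeholder labels standing in for the held-out test split
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0]

acc = accuracy_score(y_true, y_pred)  # fraction of correct predictions -> 0.75
f1 = f1_score(y_true, y_pred)         # harmonic mean of precision and recall
```

On an imbalanced target like churn, F1 is the more informative of the two, which is why it was also used as the grid-search scoring metric.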
Carbon emissions can be estimated using the Machine Learning Impact calculator.
- Hardware Type: Standard local CPU training environment
- Training profile: Classical ML grid-search over two model families
Aashir Hameed
- 🌐 Website: aashir92.github.io
- 💼 LinkedIn: Aashir Hameed
- 🐙 GitHub: @aashir92