
aashir92/Customer_Churn_Prediction


library_name: scikit-learn
tags: tabular-classification, customer-churn, random-forest, gradio
datasets: WA_Fn-UseC_-Telco-Customer-Churn
language: en
metrics: accuracy, f1
base_model: sklearn-pipeline-random-forest

Model Card for Customer Churn Prediction Pipeline

This model is a trained Scikit-learn pipeline designed to predict whether a telecom customer is likely to churn based on account, service, and billing attributes.

Model Details

Model Description

This model acts as a churn-risk scoring engine for retention workflows. It combines preprocessing (imputation, scaling, one-hot encoding) and classification in a single serialized pipeline artifact for consistent training and inference behavior.
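The composition described here can be sketched with scikit-learn's `ColumnTransformer` and `Pipeline`. This is an illustrative sketch, not the repository's training script: the column subsets, hyperparameters, and synthetic fit below are assumptions chosen to show the structure.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative subsets of the Telco columns.
numeric_cols = ["tenure", "MonthlyCharges", "TotalCharges"]
categorical_cols = ["Contract", "InternetService", "PaymentMethod"]

preprocess = ColumnTransformer(
    [
        # Numeric branch: median imputation, then standardization.
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), numeric_cols),
        # Categorical branch: mode imputation, then one-hot encoding.
        ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                          ("ohe", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
    ]
)

# Preprocessing and classifier live in one serializable artifact.
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", RandomForestClassifier(n_estimators=200, random_state=42)),
])

# Tiny synthetic fit just to show the pipeline trains end to end.
X = pd.DataFrame({
    "tenure": [1, 24, 60, 5],
    "MonthlyCharges": [70.0, 55.5, 90.2, 20.1],
    "TotalCharges": [70.0, 1332.0, 5412.0, np.nan],
    "Contract": ["Month-to-month", "Two year", "One year", "Month-to-month"],
    "InternetService": ["Fiber optic", "DSL", "Fiber optic", np.nan],
    "PaymentMethod": ["Electronic check"] * 4,
})
y = [1, 0, 0, 1]
pipeline.fit(X, y)
```

Because preprocessing is fitted inside the pipeline, the same imputation statistics, scaling parameters, and category vocabulary apply at inference time, which is what keeps training and serving behavior consistent.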

  • Developed by: Aashir Hameed
  • Model type: Scikit-learn Tabular Classification Pipeline
  • Language(s): English (en) for feature labels/documentation
  • License: Apache 2.0
  • Trained from: Telco customer churn tabular dataset

Uses

Direct Use

This model is intended for churn risk scoring in:

  • CRM prioritization and retention campaigns
  • Proactive outreach workflows for high-risk customers
  • Batch scoring of customer cohorts

Binary output mapping:

  • 0: No Churn
  • 1: Churn
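For campaign prioritization, `predict_proba` gives a churn probability that can be ranked or thresholded rather than relying on the default 0.5 cut-off. A minimal sketch with a stand-in classifier and synthetic data (in practice, load the serialized pipeline with joblib as in the "How to Get Started" section; the 0.7 threshold is an illustrative assumption, not a calibrated choice):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in model and data; replace with the loaded pipeline and real features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + rng.normal(scale=0.5, size=100) > 0).astype(int)
model = LogisticRegression().fit(X, y)

churn_prob = model.predict_proba(X)[:, 1]       # probability of class 1 (Churn)
high_risk = np.argsort(churn_prob)[::-1][:10]   # top-10 highest-risk customers
flagged = churn_prob > 0.7                      # hypothetical outreach threshold
```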

Out-of-Scope Use

This model is not intended for:

  • Causal inference on churn drivers
  • Fairness-critical automated decisions without human review
  • Data distributions that significantly differ from the Telco training data

Bias, Risks, and Limitations

Like all supervised models, this pipeline may reflect historical biases and collection artifacts present in source data. Prediction confidence can degrade under distribution shift (for example new plans, pricing structures, or service bundles not represented in training data). The model should be monitored for drift and recalibrated/retrained on a schedule.
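A lightweight drift check can sit directly on the scoring path. The sketch below uses a simple standardized mean shift on one numeric feature; this is a stand-in for fuller monitoring such as PSI or KS tests, and the data and the 2.0 threshold are illustrative assumptions.

```python
import pandas as pd

# Reference (training-time) and live (scoring-time) samples; illustrative values.
reference = pd.DataFrame({"MonthlyCharges": [70.0, 55.5, 90.2, 20.1]})
live = pd.DataFrame({"MonthlyCharges": [150.0, 160.5, 140.2, 155.1]})

def mean_shift(ref: pd.Series, cur: pd.Series) -> float:
    """Absolute mean shift, in units of the reference standard deviation."""
    return abs(cur.mean() - ref.mean()) / ref.std()

shift = mean_shift(reference["MonthlyCharges"], live["MonthlyCharges"])
drifted = shift > 2.0  # flag the feature for recalibration/retraining review
```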

How to Get Started with the Model

Use the code below for inference with joblib:

from pathlib import Path
import joblib
import pandas as pd

model = joblib.load(Path("churn_model_v1.pkl"))

sample = pd.DataFrame(
    [
        {
            "gender": "Female",
            "SeniorCitizen": "0",
            "Partner": "Yes",
            "Dependents": "No",
            "tenure": 12,
            "PhoneService": "Yes",
            "MultipleLines": "No",
            "InternetService": "Fiber optic",
            "OnlineSecurity": "No",
            "OnlineBackup": "Yes",
            "DeviceProtection": "No",
            "TechSupport": "No",
            "StreamingTV": "Yes",
            "StreamingMovies": "Yes",
            "Contract": "Month-to-month",
            "PaperlessBilling": "Yes",
            "PaymentMethod": "Electronic check",
            "MonthlyCharges": 89.1,
            "TotalCharges": 1069.2,
        }
    ]
)

prediction = model.predict(sample)[0]            # 0 = No Churn, 1 = Churn
probability = model.predict_proba(sample)[0][1]  # estimated churn probability
print(prediction, probability)

Training Details

Training Data

The model was trained on WA_Fn-UseC_-Telco-Customer-Churn.csv with the standard churn target column (Churn).

Training Procedure

Preprocessing

  • Dropped non-predictive customerID
  • Coerced TotalCharges to numeric and removed rows with invalid target/critical numeric values
  • Numeric preprocessing: median imputation + standard scaling
  • Categorical preprocessing: most-frequent imputation + one-hot encoding (handle_unknown='ignore')
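The cleaning steps above can be sketched as follows. The miniature DataFrame is synthetic, and mapping `Churn` to 0/1 is an assumption about how the target was encoded; in the raw Telco CSV, `TotalCharges` contains blank strings for new customers, which is why the numeric coercion is needed.

```python
import pandas as pd

# Synthetic rows mimicking the raw Telco CSV.
df = pd.DataFrame({
    "customerID": ["0001-A", "0002-B", "0003-C"],
    "tenure": [0, 12, 24],
    "TotalCharges": [" ", "1069.2", "2310.7"],  # blank string = new customer
    "Churn": ["No", "Yes", "No"],
})

df = df.drop(columns=["customerID"])                              # non-predictive identifier
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
df = df.dropna(subset=["TotalCharges", "Churn"])                  # drop invalid rows
df["Churn"] = df["Churn"].map({"No": 0, "Yes": 1})                # binary target
```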

Training Hyperparameters

  • Validation: Stratified K-Fold cross-validation (n_splits=5)
  • Model search: GridSearchCV with scoring = f1
  • Candidates: Logistic Regression and Random Forest
  • Winning model: Random Forest
  • Best params (winner):
    • class_weight=balanced
    • max_depth=8
    • min_samples_leaf=4
    • min_samples_split=2
    • n_estimators=200
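The search described above can be sketched like this, with a synthetic dataset standing in for the Telco data and the grid reduced to the reported winning values (the full search also covered Logistic Regression and wider grids):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic binary-classification stand-in for the Telco features.
X, y = make_classification(n_samples=200, n_features=8, random_state=42)

param_grid = {
    "n_estimators": [200],
    "max_depth": [8],
    "min_samples_leaf": [4],
    "min_samples_split": [2],
    "class_weight": ["balanced"],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="f1",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
)
search.fit(X, y)
```

`search.best_estimator_` is the refitted winner; in the repository's setup, the whole preprocessing-plus-model pipeline would be the estimator passed to `GridSearchCV` so that imputation and encoding are fitted inside each fold.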

Evaluation

Testing Data, Factors & Metrics

Testing Data

Held-out split from the Telco dataset with stratified train/test partitioning.

Metrics

  • Accuracy
  • F1-score
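Both metrics correspond to the standard scikit-learn implementations; a small worked example with made-up labels:

```python
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

acc = accuracy_score(y_true, y_pred)  # fraction of correct predictions: 4/6
f1 = f1_score(y_true, y_pred)         # harmonic mean of precision and recall: 2/3
```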

Results

  • Final Test Accuracy: 75.05%
  • Final Test F1-Score: 62.38%
  • Best CV F1-score: 63.96%

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator.

  • Hardware Type: Standard local CPU training environment
  • Training profile: Classical ML grid-search over two model families

Author & Contact

Aashir Hameed

About

Telco churn prediction built with scikit-learn pipelines: a Random Forest tuned via GridSearchCV with StratifiedKFold cross-validation, serialized as a joblib artifact and served through a Gradio UI on Hugging Face.
