<a href="https://colab.research.google.com/github/team-ben-okri/mtn-churn-prediction/blob/main/mtn_churn_prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# MTN Churn Prediction

<p align="center">
  <img src="https://upload.wikimedia.org/wikipedia/fr/thumb/e/e9/Mtn-logo-svg.svg/623px-Mtn-logo-svg.svg.png?20220727031824"
       alt="mtn logo" width="1000"/>
</p>

### 📌 Introduction

In the telecom sector, customer churn is a pressing concern, particularly in highly competitive markets like Nigeria. By studying patterns in customer demographics, satisfaction, and usage, companies can better anticipate churn and strengthen retention strategies.

### 🎯 Problem Statement

The goal of this project is to explore MTN Nigeria’s customer data to uncover insights into churn behavior, identify influencing factors, and build predictive models that can flag customers likely to leave.

### 📊 Dataset

This dataset captures synthetic records of MTN Nigeria customers in Q1 2025, covering age, gender, state, tenure, subscription plans, device types, satisfaction scores, usage behavior, revenue, and churn reasons. It contains 974 entries and is suitable for EDA, churn prediction, and customer segmentation.

Link to dataset: [Kaggle: mtn-nigeria-customer-churn](https://www.kaggle.com/datasets/oluwademiladeadeniyi/mtn-nigeria-customer-churn)

### 🎯 Objectives

- Perform Exploratory Data Analysis (EDA) to understand customer usage and churn patterns.
- Build and evaluate machine learning models to predict churn.
- Identify the key drivers of churn for actionable business insights.
- Segment customers to support targeted retention strategies.

## Import necessary libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Load dataset

In [2]:
data_url = "https://raw.githubusercontent.com/team-ben-okri/" \
           "mtn-churn-prediction/refs/heads/main/data/" \
           "mtn_customer_churn.csv"
churn_df = pd.read_csv(data_url)
churn_df.head()

Unnamed: 0,Customer ID,Full Name,Date of Purchase,Age,State,MTN Device,Gender,Satisfaction Rate,Customer Review,Customer Tenure in months,Subscription Plan,Unit Price,Number of Times Purchased,Total Revenue,Data Usage,Customer Churn Status,Reasons for Churn
0,CUST0001,Ngozi Berry,Jan-25,27,Kwara,4G Router,Male,2,Fair,2,165GB Monthly Plan,35000,19,665000,44.48,Yes,Relocation
1,CUST0002,Zainab Baker,Mar-25,16,Abuja (FCT),Mobile SIM Card,Female,2,Fair,22,12.5GB Monthly Plan,5500,12,66000,19.79,Yes,Better Offers from Competitors
2,CUST0003,Saidu Evans,Mar-25,21,Sokoto,5G Broadband Router,Male,1,Poor,60,150GB FUP Monthly Unlimited,20000,8,160000,9.64,No,
3,CUST0003,Saidu Evans,Mar-25,21,Sokoto,Mobile SIM Card,Male,1,Poor,60,1GB+1.5mins Daily Plan,500,8,4000,197.05,No,
4,CUST0003,Saidu Evans,Mar-25,21,Sokoto,Broadband MiFi,Male,1,Poor,60,30GB Monthly Broadband Plan,9000,15,135000,76.34,No,


## Exploratory Data Analysis (EDA)

Print data summary.

In [None]:
churn_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 974 entries, 0 to 973
Data columns (total 17 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Customer ID                974 non-null    object 
 1   Full Name                  974 non-null    object 
 2   Date of Purchase           974 non-null    object 
 3   Age                        974 non-null    int64  
 4   State                      974 non-null    object 
 5   MTN Device                 974 non-null    object 
 6   Gender                     974 non-null    object 
 7   Satisfaction Rate          974 non-null    int64  
 8   Customer Review            974 non-null    object 
 9   Customer Tenure in months  974 non-null    int64  
 10  Subscription Plan          974 non-null    object 
 11  Unit Price                 974 non-null    int64  
 12  Number of Times Purchased  974 non-null    int64  
 13  Total Revenue              974 non-null    int64  

In [5]:
churn_df.shape

(974, 17)

Check missing values.

In [4]:
churn_df.isnull().sum()

Unnamed: 0,0
Customer ID,0
Full Name,0
Date of Purchase,0
Age,0
State,0
MTN Device,0
Gender,0
Satisfaction Rate,0
Customer Review,0
Customer Tenure in months,0


No feature or target variable has missing values. The only column with nulls is `Reasons for Churn`, which is expected: a reason is recorded only for customers who have churned.

In [6]:
# Total rows minus rows with churn status "No" = rows for churned customers
974 - 690

284
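
The subtraction above relies on counts read off by hand. A minimal sketch of the same check done directly on the columns follows. It uses a small hypothetical frame (`toy_df`, with made-up rows) in place of `churn_df`, so the printed numbers are illustrative only; the column names are the ones used in this notebook.

```python
import pandas as pd

# Hypothetical toy frame mimicking two columns of churn_df (values are made up).
toy_df = pd.DataFrame({
    "Customer Churn Status": ["Yes", "No", "No", "Yes", "No"],
    "Reasons for Churn": ["Relocation", None, None, "Relocation", None],
})

churned = toy_df["Customer Churn Status"].eq("Yes")
has_reason = toy_df["Reasons for Churn"].notna()

# True when a reason is present for exactly the churned rows and no others.
reasons_match_churn = bool((churned == has_reason).all())

# Share of rows flagged as churned.
churn_rate = float(churned.mean())

print("Reasons align with churn status:", reasons_match_churn)
print(f"Churn rate: {churn_rate:.1%}")
```

Swapping `toy_df` for `churn_df` would run the same check on the full dataset, assuming the churn status column holds only `Yes` and `No`.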

#### Descriptive statistics

In [3]:
churn_df.describe(include="all")

Unnamed: 0,Customer ID,Full Name,Date of Purchase,Age,State,MTN Device,Gender,Satisfaction Rate,Customer Review,Customer Tenure in months,Subscription Plan,Unit Price,Number of Times Purchased,Total Revenue,Data Usage,Customer Churn Status,Reasons for Churn
count,974,974,974,974.0,974,974,974,974.0,974,974.0,974,974.0,974.0,974.0,974.0,974,284
unique,496,484,3,,35,4,2,,5,,21,,,,,2,7
top,CUST0003,Halima Walker,Feb-25,,Osun,Mobile SIM Card,Female,,Very Good,,60GB Monthly Broadband Plan,,,,,No,High Call Tarriffs
freq,3,5,450,,43,301,495,,212,,81,,,,,690,54
mean,,,,48.043121,,,,2.947639,,31.422998,,19196.663244,10.564682,204669.6,99.304764,,
std,,,,17.764307,,,,1.384219,,17.191256,,25586.726985,5.709427,324785.5,57.739511,,
min,,,,16.0,,,,1.0,,1.0,,350.0,1.0,350.0,0.82,,
25%,,,,32.0,,,,2.0,,17.0,,5500.0,5.0,33000.0,47.6375,,
50%,,,,49.0,,,,3.0,,31.0,,14500.0,11.0,108000.0,103.33,,
75%,,,,63.75,,,,4.0,,47.0,,24000.0,15.0,261000.0,149.6975,,


`Customer ID` can appear multiple times because some customers own more than one MTN device (SIM card, MiFi, router, etc.). Each row therefore describes one customer-device pair, so the raw dataset is **device-level** rather than one row per customer: its 974 rows cover 496 unique customers.

For this reason, we will keep two views of the dataset:
- Device level - `device_df`
- Customer level - `customer_df`
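
Before collapsing rows to one per customer, it may help to confirm which customers have several rows and whether their profile fields agree across those rows, since taking the first value only makes sense if they do. The sketch below is one way to check this. `toy_df` is a hypothetical stand-in for `churn_df`, built from the rows shown in the `head()` output, and `profile_cols` is an assumed choice of columns.

```python
import pandas as pd

# Hypothetical toy rows: CUST0003 owns three devices, as in the head() output.
toy_df = pd.DataFrame({
    "Customer ID": ["CUST0001", "CUST0003", "CUST0003", "CUST0003"],
    "Age": [27, 21, 21, 21],
    "Gender": ["Male", "Male", "Male", "Male"],
    "State": ["Kwara", "Sokoto", "Sokoto", "Sokoto"],
    "MTN Device": ["4G Router", "5G Broadband Router",
                   "Mobile SIM Card", "Broadband MiFi"],
})

# Customers that appear on more than one row.
rows_per_customer = toy_df.groupby("Customer ID").size()
multi_device_customers = rows_per_customer[rows_per_customer > 1]

# Profile columns should hold a single distinct value per customer.
profile_cols = ["Age", "Gender", "State"]  # assumed set of per-customer fields
distinct_per_customer = toy_df.groupby("Customer ID")[profile_cols].nunique()
profile_is_constant = bool((distinct_per_customer == 1).all().all())

print(multi_device_customers)
print("Profile fields constant per customer:", profile_is_constant)
```

If `profile_is_constant` came back `False` on the real data, the rows for the affected customers would need inspection before aggregating.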

In [7]:
# Device-level dataframe (no changes, just the raw dataset)
device_df = churn_df.copy()

# Customer-level aggregation
customer_df = (
    churn_df
    .groupby("Customer ID")
    .agg({
        "Full Name": "first",
        "Age": "first",
        "Gender": "first",
        "State": "first",
        "Customer Tenure in months": "max",
        "Satisfaction Rate": "mean",
        "Data Usage": "sum",
        "Total Revenue": "sum",
        "MTN Device": "nunique",
        "Subscription Plan": "count",
        "Customer Churn Status": lambda x: "Yes" if "Yes" in x.values else "No",
        "Reasons for Churn": lambda x: ', '.join(x.dropna().unique())
    })
    .reset_index()
)

# Rename customer-level dataframe columns
customer_df = customer_df.rename(columns={
    "MTN Device": "Num_Device_Types",
    "Subscription Plan": "Num_Subscriptions"
})

print("Device-level shape:", device_df.shape)
print("Customer-level shape:", customer_df.shape)

customer_df.head()

Device-level shape: (974, 17)
Customer-level shape: (496, 13)


Unnamed: 0,Customer ID,Full Name,Age,Gender,State,Customer Tenure in months,Satisfaction Rate,Data Usage,Total Revenue,Num_Device_Types,Num_Subscriptions,Customer Churn Status,Reasons for Churn
0,CUST0001,Ngozi Berry,27,Male,Kwara,2,2.0,44.48,665000,1,1,Yes,Relocation
1,CUST0002,Zainab Baker,16,Female,Abuja (FCT),22,2.0,19.79,66000,1,1,Yes,Better Offers from Competitors
2,CUST0003,Saidu Evans,21,Male,Sokoto,60,1.0,283.03,299000,3,3,No,
3,CUST0004,Ejiro Walker,36,Female,Gombe,14,1.0,92.72,40500,1,1,No,
4,CUST0005,Nura Mann,57,Male,Oyo,53,3.0,42.92,144000,1,1,No,
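
A quick sanity check on the aggregation is that summed columns keep the same grand total and that each customer appears once. The sketch below shows the idea on a hypothetical `toy_df` built from the `head()` rows above; `toy_customer_df` is an illustrative name, not part of the notebook.

```python
import math

import pandas as pd

# Hypothetical toy rows copied from the head() output of churn_df.
toy_df = pd.DataFrame({
    "Customer ID": ["CUST0001", "CUST0003", "CUST0003", "CUST0003"],
    "Total Revenue": [665000, 160000, 4000, 135000],
    "Data Usage": [44.48, 9.64, 197.05, 76.34],
})

# Same sum-based aggregation as the customer-level step, limited to two columns.
toy_customer_df = (
    toy_df
    .groupby("Customer ID")
    .agg({"Total Revenue": "sum", "Data Usage": "sum"})
    .reset_index()
)

# One row per customer after aggregation.
ids_unique = bool(toy_customer_df["Customer ID"].is_unique)

# Grand totals should be unchanged by the grouping.
revenue_conserved = bool(
    toy_customer_df["Total Revenue"].sum() == toy_df["Total Revenue"].sum()
)
usage_conserved = math.isclose(
    toy_customer_df["Data Usage"].sum(), toy_df["Data Usage"].sum()
)

print(toy_customer_df)
print(ids_unique, revenue_conserved, usage_conserved)
```

For `CUST0003` the three device rows add up to 299000 in revenue and 283.03 in data usage, matching the customer-level output shown above.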


#### Target Variable Distribution (**`Customer Churn Status`**)

#### Numerical Features Analysis

#### Categorical Features Analysis

#### Correlation Analysis

#### State Insights

## Feature Engineering & Modeling

## Model Evaluation

## Conclusion
