# <span style='color:#2E86C1'>SyriaTel Customer Churn</span>

<h2><b><span style='font-family:Georgia'> 1. Business Understanding </span></b></h2>

<div style="background-color:#FFD100; padding:10px; border-radius:5px;">
<b>1.1 Business Overview</b> 
</div>

<span style='font-family:Georgia'>The telecommunications industry is one of the most competitive sectors worldwide, and customer retention plays a critical role in profitability and long-term sustainability. Acquiring new customers is costly, whereas retaining existing ones is significantly more profitable (Propello, 2024). Globally, annual churn rates in telecom average around <b>31%</b>, with mobile churn around 20%, highlighting the magnitude of the challenge.

Within Syria, the mobile telecom market is dominated by two key players: <b>Syriatel</b> and <b>MTN-Syria</b>. Syriatel, founded in 2000, currently holds about 71% of the market share and reported 20% revenue growth in 2019, equivalent to SYP 221bn (~US$242m) (The Syria Report, 2020). Despite its dominance, Syriatel faces challenges from economic sanctions, instability, and increasing customer expectations for service quality, pricing, and personalization. For this reason, understanding which factors drive churn and how to reduce it is a key business priority.

<b>Churn</b> in this context refers to the proportion of customers who stop using Syriatel’s services, either by terminating contracts or switching to competitors. Understanding and reducing churn is crucial for Syriatel’s financial health and market leadership.</span>

<div style="background-color:#FFD100; padding:10px; border-radius:5px;">
<b>1.2 Problem Statement</b> 
</div>

<span style='font-family:Georgia'>Syriatel is experiencing customer churn that directly threatens its revenues and market position. While it has historically maintained a strong market share, rising competition, service quality issues, evolving customer demands, and broader political-economic instability increase the likelihood of customer attrition. Without effective churn prediction and retention strategies, Syriatel risks losing valuable customers, resulting in revenue loss, reduced market share, and diminished competitiveness.</span>

<span style='font-family:Georgia'><div style="background-color:#FFD100; padding:10px; border-radius:5px;">
<b>1.3 Business Objective</b> 
</div></span>

<span style='font-family:Georgia'><h7><b>1.3.1 Main objective:</b></h7>

To develop a <b>machine learning classifier</b> that predicts whether a Syriatel customer is likely to churn, enabling data-driven strategies for proactive retention.</span>

<span style='font-family:Georgia'><h7><b>1.3.2 Specific objectives:</b></h7>

1. To explore customer demographics and usage behaviour influencing churn.
2. To determine how charges influence customer churn.
3. To develop and evaluate machine learning models that classify whether a customer is likely to churn.
4. To optimize the models for best performance.
5. To provide actionable insights that support Syriatel in designing targeted retention campaigns (e.g., loyalty programs, personalized offers).</span>

<span style='font-family:Georgia'><div style="background-color:#FFD100; padding:5px; border-radius:3px;">
<b>1.4 Research Questions</b> 
</div>

1. How do customer demographics and usage behaviour influence churn?
2. How do charges influence customer churn?
3. Which machine learning models best predict whether a customer is likely to churn?
4. Which optimization techniques and modeling approaches most effectively improve the predictive performance of the machine learning models?
5. How can predictive insights be applied to practical retention strategies to minimize churn?</span>

<span style='font-family:Georgia'><div style="background-color:#FFD100; padding:5px; border-radius:5px;">
<b>1.5 Success Criteria</b> 
</div>

* <b>Model Performance</b>: Achieve at least 85% accuracy and a high AUC score (>0.85) in predicting churn.
* <b>Business Impact</b>: Provide insights that reduce churn rates by enabling proactive retention strategies, targeting high-risk customers before they leave.</span>
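
The model-performance criterion above can be checked mechanically once a classifier exists. The sketch below is one possible way to do it, not the project's actual evaluation code: the helper name `meets_success_criteria` and the toy labels are assumptions made for illustration, while `accuracy_score` and `roc_auc_score` are standard scikit-learn metrics.

```python
from sklearn.metrics import accuracy_score, roc_auc_score

# Thresholds taken from the success criteria above.
MIN_ACCURACY = 0.85
MIN_AUC = 0.85


def meets_success_criteria(y_true, y_pred, y_score):
    """Return accuracy, AUC, and whether both clear the thresholds.

    y_pred holds hard class labels; y_score holds predicted churn
    probabilities (e.g. model.predict_proba(X)[:, 1]).
    Hypothetical helper, not part of the original notebook.
    """
    accuracy = accuracy_score(y_true, y_pred)
    auc = roc_auc_score(y_true, y_score)
    passed = bool(accuracy >= MIN_ACCURACY and auc > MIN_AUC)
    return {"accuracy": accuracy, "auc": auc, "passed": passed}


# Toy labels and scores for illustration only (not project results).
y_true = [0, 0, 0, 0, 1, 1, 0, 1, 0, 0]
y_score = [0.1, 0.2, 0.05, 0.3, 0.9, 0.8, 0.6, 0.4, 0.15, 0.25]
y_pred = [int(s >= 0.5) for s in y_score]

result = meets_success_criteria(y_true, y_pred, y_score)
print(result)
```

Accuracy depends on the 0.5 decision threshold used to produce `y_pred`, whereas AUC is computed from the raw scores, so the two criteria can disagree. If churners turn out to be a minority class, accuracy alone may look good for a model that rarely flags churn, which is why both numbers are reported together here.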

<h2><b><span style='font-family:Georgia'> 2. Data Understanding </span></b></h2>

<div style="background-color:#FFD100; padding:10px; border-radius:5px;">
<b> Data overview</b> 
</div>

<span style='font-family:Georgia'>The dataset is from [Kaggle](https://www.kaggle.com/datasets/becksddf/churn-in-telecoms-dataset) and has 3333 rows and 21 columns.</span>

<div style="background-color:#FFD100; padding:10px; border-radius:5px;">
<b> Key Attributes</b> 
</div>

* <span style='font-family:Georgia'>`state`: The state of the customer.
* `account length`: The length of the account in days.
* `area code`: The area code of the customer's phone number.
* `phone number`: The phone number of the customer.
* `international plan`: Whether the customer has an international plan or not.
* `voice mail plan`: Whether the customer has a voicemail plan or not.
* `number vmail messages`: The number of voicemail messages the customer has.
* `total day minutes`: Total minutes of day calls.
* `total day calls`: Total number of day calls.
* `total day charge`: Total charge for the day calls.
* `total eve minutes`: Total minutes of evening calls.
* `total eve calls`: Total number of evening calls.
* `total eve charge`: Total charge for the evening calls.
* `total night minutes`: Total minutes of night calls.
* `total night calls`: Total number of night calls.
* `total night charge`: Total charge for the night calls.
* `total intl minutes`: Total minutes of international calls.
* `total intl calls`: Total number of international calls.
* `total intl charge`: Total charge for the international calls.
* `customer service calls`: Number of times the customer called customer service.
* `churn`: Whether the customer churned or not (True/False).</span>

<div style="background-color:#FFD100; padding:10px; border-radius:5px;">
<b> Data Quality</b> 
</div>

<span style='font-family:Georgia'>

* <b>Missing values</b>: none detected.
* <b>Duplicates</b>:
* <b>Data types</b>: Variables are a mix of categorical (e.g. plans, churn) and numerical (e.g. minutes, charges, calls).</span>
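
The three data-quality checks listed above (missing values, duplicates, data types) can each be expressed in a line or two of pandas. The sketch below bundles them into one helper. The function name `data_quality_report` and the toy frame are assumptions for illustration; on the real data the helper would be called on the loaded `df`.

```python
import pandas as pd


def data_quality_report(df: pd.DataFrame) -> dict:
    """Summarise missing values, duplicate rows, and column dtypes.

    Hypothetical helper, not part of the original notebook.
    """
    return {
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }


# Toy frame for illustration only; column names mirror the dataset.
toy = pd.DataFrame(
    {
        "international plan": ["no", "yes", "no", "no"],
        "total day minutes": [265.1, 299.4, 265.1, None],
        "churn": [False, True, False, False],
    }
)

report = data_quality_report(toy)
print(report)
```

`DataFrame.duplicated()` flags fully identical rows only. A stricter check on a key column would pass `subset=[...]`, for example on `phone number`, assuming it is meant to be unique per customer.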

<h2><b><span style='font-family:Georgia'> 3. Data Preparation </span></b></h2>


* <span style='font-family:Georgia'> <b>Import libraries</b> - for Exploratory Data Analysis and for modelling later.
* <b>Load dataset.</b>
* <b>Inspect data</b> - check the shape, types, missing values.
* <b>Data cleaning</b> - checking for duplicates, missing values, data types, outliers.
* <b>Univariate analysis</b> - distributions of features.
* <b>Bivariate analysis</b> - relationships with churn.
* <b>Multivariate analysis</b> - correlations and interactions.</span>
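
As an illustration of the bivariate step in the list above, the churn rate can be compared across the levels of a categorical feature. This is a minimal sketch under stated assumptions: the helper `churn_rate_by` and the toy data are hypothetical, and only the column names `international plan` and `churn` come from the dataset.

```python
import pandas as pd


def churn_rate_by(df: pd.DataFrame, feature: str, target: str = "churn") -> pd.DataFrame:
    """Churn rate and customer count for each level of `feature`.

    Assumes `target` is boolean, so its mean is the share of churners.
    Hypothetical helper, not part of the original notebook.
    """
    return (
        df.groupby(feature)[target]
        .agg(churn_rate="mean", customers="size")
        .sort_values("churn_rate", ascending=False)
    )


# Toy data for illustration only (not the real dataset).
toy = pd.DataFrame(
    {
        "international plan": ["no", "no", "yes", "yes", "no", "yes"],
        "churn": [False, False, True, False, False, True],
    }
)

rates = churn_rate_by(toy, "international plan")
print(rates)
```

Reporting the group size next to the rate helps avoid over-reading a high churn rate that comes from only a handful of customers.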

In [43]:
# 1. Import Libraries

import pandas as pd
import numpy as np
from skimpy import skim

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Stats
from scipy import stats

# For later modeling (import now so kernel is ready)
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report


In [44]:
df = pd.read_csv("bigml_59c28831336c6604c800002a.csv")

In [45]:
df.head()

Unnamed: 0,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,...,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
0,KS,128,415,382-4657,no,yes,25,265.1,110,45.07,...,99,16.78,244.7,91,11.01,10.0,3,2.7,1,False
1,OH,107,415,371-7191,no,yes,26,161.6,123,27.47,...,103,16.62,254.4,103,11.45,13.7,3,3.7,1,False
2,NJ,137,415,358-1921,no,no,0,243.4,114,41.38,...,110,10.3,162.6,104,7.32,12.2,5,3.29,0,False
3,OH,84,408,375-9999,yes,no,0,299.4,71,50.9,...,88,5.26,196.9,89,8.86,6.6,7,1.78,2,False
4,OK,75,415,330-6626,yes,no,0,166.7,113,28.34,...,122,12.61,186.9,121,8.41,10.1,3,2.73,3,False


In [46]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3333 entries, 0 to 3332
Data columns (total 21 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   state                   3333 non-null   object 
 1   account length          3333 non-null   int64  
 2   area code               3333 non-null   int64  
 3   phone number            3333 non-null   object 
 4   international plan      3333 non-null   object 
 5   voice mail plan         3333 non-null   object 
 6   number vmail messages   3333 non-null   int64  
 7   total day minutes       3333 non-null   float64
 8   total day calls         3333 non-null   int64  
 9   total day charge        3333 non-null   float64
 10  total eve minutes       3333 non-null   float64
 11  total eve calls         3333 non-null   int64  
 12  total eve charge        3333 non-null   float64
 13  total night minutes     3333 non-null   float64
 14  total night calls       3333 non-null   

In [47]:
df.columns

Index(['state', 'account length', 'area code', 'phone number',
       'international plan', 'voice mail plan', 'number vmail messages',
       'total day minutes', 'total day calls', 'total day charge',
       'total eve minutes', 'total eve calls', 'total eve charge',
       'total night minutes', 'total night calls', 'total night charge',
       'total intl minutes', 'total intl calls', 'total intl charge',
       'customer service calls', 'churn'],
      dtype='object')

In [48]:
df.shape

(3333, 21)

In [49]:
df.describe()

Unnamed: 0,account length,area code,number vmail messages,total day minutes,total day calls,total day charge,total eve minutes,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls
count,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0,3333.0
mean,101.064806,437.182418,8.09901,179.775098,100.435644,30.562307,200.980348,100.114311,17.08354,200.872037,100.107711,9.039325,10.237294,4.479448,2.764581,1.562856
std,39.822106,42.37129,13.688365,54.467389,20.069084,9.259435,50.713844,19.922625,4.310668,50.573847,19.568609,2.275873,2.79184,2.461214,0.753773,1.315491
min,1.0,408.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,23.2,33.0,1.04,0.0,0.0,0.0,0.0
25%,74.0,408.0,0.0,143.7,87.0,24.43,166.6,87.0,14.16,167.0,87.0,7.52,8.5,3.0,2.3,1.0
50%,101.0,415.0,0.0,179.4,101.0,30.5,201.4,100.0,17.12,201.2,100.0,9.05,10.3,4.0,2.78,1.0
75%,127.0,510.0,20.0,216.4,114.0,36.79,235.3,114.0,20.0,235.3,113.0,10.59,12.1,6.0,3.27,2.0
max,243.0,510.0,51.0,350.8,165.0,59.64,363.7,170.0,30.91,395.0,175.0,17.77,20.0,20.0,5.4,9.0
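
The quartiles in the summary above can be turned into a simple outlier screen for the data-cleaning step. The sketch below uses the 1.5 × IQR rule, which is one common convention and an assumption here, since the notebook does not state which rule it applies. The helper name `iqr_outlier_counts` and the toy values are likewise hypothetical.

```python
import pandas as pd


def iqr_outlier_counts(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] per numeric column.

    Hypothetical helper, not part of the original notebook.
    """
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Comparisons align the per-column bounds with the frame's columns.
    outside = (numeric < lower) | (numeric > upper)
    return outside.sum()


# Toy values for illustration only (not the real dataset).
toy = pd.DataFrame(
    {
        "customer service calls": [1, 1, 2, 0, 1, 2, 1, 9],
        "total day minutes": [180.0, 175.5, 190.2, 165.0, 185.3, 170.1, 178.8, 182.4],
    }
)

counts = iqr_outlier_counts(toy)
print(counts)
```

A flagged value is not necessarily an error. For a count such as `customer service calls`, a high value may be exactly the behaviour that signals churn, so flagged rows are better inspected than dropped automatically.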


In [50]:
skim(df)