In [None]:
import re

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestClassifier,
    RandomForestRegressor,
    VotingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import classification_report, mean_squared_error, roc_auc_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC, SVR

# **Telecommunication company customer churn prediction using different machine learning algorithms for classification**

## **ABSTRACT**
In th

## **1. Introduction**

Customer churn, the act of customers discontinuing their service [1], is a significant challenge for the telecommunications industry, where retaining existing customers is often more cost-effective than acquiring new ones [2]. The churn rate measures how many customers leave a service provider over a given period [3], and predicting which customers are likely to churn is critical to minimizing revenue losses [4]. The goal of churn prediction is to identify potential churners early, so that businesses can take proactive steps, such as offering personalized incentives and retention strategies, to keep them.

Telecom companies face high acquisition costs compared to retention costs [2], so accurate churn prediction helps prevent the misallocation of resources. When churn predictions are wrong, companies may waste money on customers who would not have left while missing the chance to retain those who were actually at risk. Acquisition costs have been reported to be up to five times higher than retention costs [5], which makes it crucial to target retention efforts at the right customers.
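The trade-off between the two kinds of misprediction can be made concrete with a small worked example. This is only an illustrative sketch: the function name, the customer counts, the per-customer retention cost, and the assumption that every missed churner must be replaced at five times the retention cost are all hypothetical and are not taken from any dataset.

```python
def misprediction_cost(false_positives, false_negatives,
                       retention_cost, acquisition_multiplier=5.0):
    """Rough cost of churn mispredictions (hypothetical cost model).

    false_positives: loyal customers wrongly flagged as churners, who
        receive a retention incentive they did not need.
    false_negatives: churners the model missed, each assumed to be
        replaced by acquiring a new customer.
    acquisition_multiplier: acquisition cost as a multiple of the
        retention cost (5x is the upper figure quoted in the text).
    """
    wasted_incentives = false_positives * retention_cost
    replacement_cost = false_negatives * retention_cost * acquisition_multiplier
    return wasted_incentives + replacement_cost


# Two hypothetical models with different error profiles.
cost_a = misprediction_cost(false_positives=100, false_negatives=20, retention_cost=10.0)
cost_b = misprediction_cost(false_positives=40, false_negatives=50, retention_cost=10.0)
print(cost_a, cost_b)
```

Under these assumed numbers, the model that misses more churners is the more expensive one even though it makes fewer errors overall, which is why recall on the churn class matters alongside accuracy.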

With the rising popularity of data-driven solutions, machine learning has emerged as a powerful tool for predicting customer churn from historical data [6]. By analyzing patterns in customer behavior, service usage, billing information, and other relevant factors such as demographic features, machine learning models can provide insights that were previously difficult to capture with traditional methods. These predictive models allow businesses to target at-risk customers effectively, offering retention campaigns before churn occurs [7]. Using algorithms such as Random Forests, Support Vector Machines (SVM), and Decision Trees, companies can gain valuable insights into customer behavior and identify the key factors that drive churn. Accurate churn prediction models can enable telecom companies to reduce churn rates, increase customer loyalty, and enhance profitability [8].
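As a rough illustration of how the algorithms named above can be compared, the sketch below cross-validates a Random Forest, a Decision Tree, and an SVM on the same data. It assumes a synthetic dataset from `make_classification` as a stand-in for real churn records (about 25% churners, ten numeric features); the hyperparameters are arbitrary defaults, not tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a churn table (assumption: ~25% positive class).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.75, 0.25], random_state=42,
)

candidates = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    # SVMs are scale-sensitive, so the scaler is fitted inside each CV fold.
    "svm": Pipeline([("scale", StandardScaler()), ("clf", SVC(random_state=42))]),
}

# Mean 5-fold ROC AUC per model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}
for name, auc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean ROC AUC = {auc:.3f}")
```

ROC AUC is used here rather than accuracy because churners are the minority class, so a model that never predicts churn could still look accurate.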

Beyond improving prediction, feature selection and the identification of the factors that influence churn may be critical for building effective models [9]. Telecom customers churn for a variety of reasons, and these should be taken into account when developing personalized retention strategies. Machine learning models allow a more realistic and targeted approach: customers can be segmented by churn risk, enabling companies to develop customized solutions that prevent them from leaving.

This project has two aims. The first is to explore the factors that lead to customer churn, so that resources can be directed at improving them or spared where they have little effect. The second is to develop a churn prediction model using machine learning algorithms that helps telecom companies anticipate churn and improve their retention strategies. To that end, we will try different ML algorithms and tune the best-performing model for our goal: identifying which customers are likely to churn, so that resources and strategies can be built around retaining them, and which are not, so that resources are not misallocated to retaining non-churners.
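The tuning step described above could look roughly like the following sketch, which grid-searches a Random Forest and then checks it on held-out data. The synthetic data, the parameter grid, and the `stay`/`churn` label names are assumptions made for illustration; the actual models and search spaces used in this project may differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the churn data (assumption: ~25% churners).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4,
    weights=[0.75, 0.25], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Small illustrative grid; a real search would usually be wider.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=5,
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on data it has never seen.
churn_proba = search.predict_proba(X_test)[:, 1]
test_auc = roc_auc_score(y_test, churn_proba)
print("best parameters:", search.best_params_)
print(f"held-out ROC AUC: {test_auc:.3f}")
print(classification_report(y_test, search.predict(X_test),
                            target_names=["stay", "churn"]))
```

The per-class precision and recall in the report map onto the two goals stated above: recall on `churn` reflects how many at-risk customers are caught, while precision on `churn` reflects how often retention resources would go to customers who were not going to leave.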

## **2. Related work**

1. [9] Ullah et al. (2019) - This paper used data preprocessing with noise removal and feature selection, applied multiple classifiers (including Random Forest, Decision Tree, Naive Bayes, Multilayer Perceptron (MLP), and Logistic Regression), employed k-means clustering for customer segmentation, used rule-based algorithms for churn factor identification, and evaluated the models with standard metrics such as accuracy and ROC area.

2. [10] Wagh et al. (2023) - This paper used data preprocessing to clean data, Pearson correlation for feature selection, applied Random Forest and Decision Tree for classification, utilized Cox Proportional Hazard Model and Kaplan-Meier analysis for survival prediction, and used SMOTE and ENN for handling imbalanced datasets.

3. [11] Ahmed and Maheswari (2017) - The paper used a hybrid Firefly algorithm, enhanced with Simulated Annealing to optimize comparisons between fireflies, for more efficient classification on large and sparse telecom datasets, and evaluated the model using metrics such as ROC, Precision-Recall, F-Measure, and accuracy.

4. [12] Jain et al. (2020) - The paper applied the Logistic Regression and Logit Boost machine learning algorithms, after data cleaning and preparation, to predict customer churn, and evaluated their performance using metrics such as the Kappa statistic, Mean Absolute Error, Root Mean Square Error, and accuracy.

5. [13] Poudel et al. (2024) - The paper used data preprocessing to clean telecom data, implemented feature engineering and one-hot encoding for categorical variables, applied various machine learning models such as SVM, Logistic Regression, Random Forest, Gradient Boosting Machine (GBM), and Neural Networks for churn prediction, evaluated the models using 10-fold cross-validation, and employed SHAP (SHapley Additive Explanations) plots for global and local interpretability of predictions.
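Two of the techniques that recur in the works above, one-hot encoding of categorical variables and 10-fold cross-validation, can be sketched with scikit-learn as follows. The column names, the randomly generated customers, and the rule used to generate the `churn` label are all hypothetical; this is not the pipeline from any of the cited papers, and the SHAP analysis is not covered.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical customer table with two categorical and two numeric columns.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "contract": rng.choice(["month-to-month", "one-year", "two-year"], size=n),
    "payment_method": rng.choice(["card", "bank_transfer", "e-check"], size=n),
    "tenure_months": rng.integers(1, 73, size=n),
    "monthly_charges": rng.uniform(20, 120, size=n).round(2),
})
# Assumed label rule: month-to-month and short-tenure customers churn more.
churn_prob = (0.1
              + 0.4 * (df["contract"] == "month-to-month")
              + 0.3 * (df["tenure_months"] < 12))
df["churn"] = (rng.random(n) < churn_prob).astype(int)

# One-hot encode the categorical columns and scale the numeric ones.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["contract", "payment_method"]),
    ("num", StandardScaler(), ["tenure_months", "monthly_charges"]),
])
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression(max_iter=1000))])

# Stratified 10-fold CV keeps the churn ratio similar in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
fold_auc = cross_val_score(model, df.drop(columns="churn"), df["churn"],
                           cv=cv, scoring="roc_auc")
print(f"10-fold ROC AUC: {fold_auc.mean():.3f} +/- {fold_auc.std():.3f}")
```

Keeping the encoder and scaler inside the pipeline means they are refitted on the training portion of each fold, which avoids leaking information from the validation fold into preprocessing.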

## **3. Data acquisition and cleaning**

## **References**

[1] Hitachi Solutions. (2024, April 2). Churn Management Basics: How to reduce customer churn. Hitachi Solutions. https://global.hitachi-solutions.com/blog/reduce-customer-churn/#:~:text=Churn%20%E2%80%94%20also%20known%20as%20customer%20churn%2C%20customer%20attrition%2C%20and%20customer%20turnover%20%E2%80%94%20is%20what%20happens%20when%20a%20customer%20ceases%20to%20use%20your%20product%20or%20service%20and%20terminates%20their%20relationship%20with%20your%20company.

[2] El-Abidin, R. (2024, September 25). 50 Customer Retention Statistics to Know. Hubspot. https://blog.hubspot.com/service/statistics-on-customer-retention#:~:text=Acquiring%20new%20customers%20is%20five%20times%20more%20expensive%20than%20retaining%20existing%20customers.

[3] Investopedia Team. (2024, March 21). Churn Rate: What it means, examples, and calculations. Investopedia. https://www.investopedia.com/terms/c/churnrate.asp#:~:text=Churn%20rate%20in%20business,in%20a%20given%20period.

[4] Dzou, C. (2024, August 5). How to predict churn early and increase customer retention. Gong. https://www.gong.io/blog/predicting-churn/#:~:text=Predicting%20customer%20churn%20is%20crucial%20for%20businesses%20because%20it%20allows%20them%20to%20take%20proactive%20measures%20to%20retain%20their%20customer%20base%20and%20improve%20their%20bottom%20line.%C2%A0

[5] Understanding Churn Prediction with Machine Learning | Churned. (n.d.). https://www.churned.io/knowledge-base/understanding-churn-prediction-with-machine-learning#:~:text=Churn%20prediction%20is%20crucial%20for%20businesses%20as%20it%20enables%20them%20to%20identify%20potential%20churn%20risks%20before%20they%20occur.

[6] Prabadevi, B., Shalini, R., & Kavitha, B. (2023). Customer churning analysis using machine learning algorithms. International Journal of Intelligent Networks, 4, 145–154. https://doi.org/10.1016/j.ijin.2023.05.005

[7] Understanding Churn Prediction with Machine Learning | Churned. (n.d.). https://www.churned.io/knowledge-base/understanding-churn-prediction-with-machine-learning#:~:text=Churn%20prediction%20is%20crucial%20for%20businesses%20as%20it%20enables%20them%20to%20identify%20potential%20churn%20risks%20before%20they%20occur.

[8] Wagh, S. K., Andhale, A. A., Wagh, K. S., Pansare, J. R., Ambadekar, S. P., & Gawande, S. (2023). Customer churn prediction in telecom sector using machine learning techniques. Results in Control and Optimization, 14, 100342. https://doi.org/10.1016/j.rico.2023.100342

[9] Ullah, I., Raza, B., Malik, A. K., Imran, M., Islam, S. U., & Kim, S. W. (2019). A churn prediction model using random forest: Analysis of machine learning techniques for churn prediction and factor identification in telecom sector. IEEE Access. https://ieeexplore.ieee.org/abstract/document/8706988

[10] Wagh, S. K., Andhale, A. A., Wagh, K. S., Pansare, J. R., Ambadekar, S. P., & Gawande, S. (2023). Customer churn prediction in telecom sector using machine learning techniques. Results in Control and Optimization, 14, 100342. https://doi.org/10.1016/j.rico.2023.100342

[11] Ahmed, A. A., & Maheswari, D. (2017). Churn prediction on huge telecom data using hybrid firefly based classification. Egyptian Informatics Journal, 18(3), 215–220. https://doi.org/10.1016/j.eij.2017.02.002

[12] Jain, H., Khunteta, A., & Srivastava, S. (2020). Churn Prediction in Telecommunication using Logistic Regression and Logit Boost. Procedia Computer Science, 167, 101–112. https://doi.org/10.1016/j.procs.2020.03.187

[13] Poudel, S. S., Pokharel, S., & Timilsina, M. (2024). Explaining customer churn prediction in telecom industry using tabular machine learning models. Machine Learning With Applications, 17, 100567. https://doi.org/10.1016/j.mlwa.2024.100567