# PREDICTING CHURN IN TELECOM'S DATASET

* **AUTHOR** : PETER MAINA  
* **TMs**    : Anthony Muiko / Diana Mongina  
* **DATE**   : 27th August 2024  
* **COHORT** : DSFT-09

![alt text](Images/Churn.jpg)



## **1. BUSINESS UNDERSTANDING**

#### **PROJECT OVERVIEW**

> * Churn occurs when customers leave a company's services for those of another network provider.  
> * It is usually driven by dissatisfaction with the company's services or by competitors offering better prices.  
> * Churn reduces the company's revenue and makes it harder to retain the remaining customers.  
> * Identifying likely churners early helps the company retain them and improve customer satisfaction.

**BUSINESS PROBLEM**

> 1. Identify customers who are highly likely to churn and develop effective strategies to retain them.  
> 2. Identify the factors that drive customer dissatisfaction and churn, such as network quality, customer service issues, or pricing concerns.  
> 3. Segment customers by behavior and likelihood of churn in order to tailor marketing and retention strategies to each group's needs and preferences.

**PROJECT OBJECTIVE**

> 1. **Churn Prediction**: Build machine learning models that use customer features to predict which customers are likely to churn.
> 2. **Model Performance Assessment**: Compare the machine learning models and determine which one predicts churn most accurately.
> 3. **Increase Revenue**: Retain more customers, which increases revenue and market share.
> 4. **Feature Insights**: Examine individual features to gain insight into the causes of customer churn within the telecommunication company.
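
When comparing models on churn data, accuracy alone can be misleading because churners are usually a minority class. The sketch below is a minimal illustration, not the project's evaluation code: the helper `classification_metrics` and all confusion-matrix counts are hypothetical, chosen only to show how precision, recall, and F1 are derived and why they matter alongside accuracy.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a model never predicts the positive class.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for 1,000 customers, 150 of whom churned (illustrative only).
# Model A predicts "no churn" for everyone; model B flags some customers as churners.
model_a = classification_metrics(tp=0, fp=0, fn=150, tn=850)
model_b = classification_metrics(tp=100, fp=60, fn=50, tn=790)

for name, scores in (("A", model_a), ("B", model_b)):
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

In this made-up case, model A reaches 85% accuracy without finding a single churner, while model B is only slightly more accurate yet recovers two thirds of them. For that reason, recall and F1 on the churn class are worth reporting next to accuracy.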

**DATA SOURCE**

> This project uses a customer churn dataset for a telecommunication company, obtained from [Kaggle](https://www.kaggle.com/datasets/becksddf/churn-in-telecoms-dataset/data).

**STAKEHOLDERS**

> The stakeholders are telecommunication companies.  
> They can use this dataset and the resulting models to predict which customers are likely to churn.

**METHODOLOGY**

* The project follows the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology, which has the following stages:

   >> Business Understanding  
   >> Data Understanding  
   >> Data Preparation  
   >> Modeling  
   >> Evaluation  
   >> Deployment  

## **2. DATA UNDERSTANDING**

In [4]:
# Import relevant libraries
import csv
import pandas as pd

# Data visualization
import seaborn as sns 
import matplotlib.pyplot as plt 
%matplotlib inline

# Modeling and resampling
import sklearn
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from imblearn.over_sampling import SMOTE, SMOTENC

# Performance metrics
from sklearn.metrics import (
    f1_score, recall_score, precision_score,
    confusion_matrix, roc_curve, roc_auc_score, classification_report,
)

# Statistics and preprocessing
from scipy import stats
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Algorithms for supervised learning methods
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Suppress all warnings (not only FutureWarning) to keep the notebook output clean
import warnings
warnings.filterwarnings('ignore')
