# TELECOMMUNICATION ANALYSIS  

---

## 1. BUSINESS UNDERSTANDING  

SyriaTel is a telecommunications company that provides data and voice services. Like its competitors, it aims to maximize profit. One of its major challenges is customer churn, which occurs when subscribers cancel their services and switch to competitors. This project addresses that challenge with data-driven methods.  

---

### 1.2 Problem Statement  

SyriaTel is losing a significant amount of revenue because many customers are canceling their services. At present, the company does not have a reliable system to predict which customers are most likely to leave. Without such a system, it is difficult to intervene in time to retain customers, which ultimately leads to reduced profits and loss of market share.  

---

### 1.3 Business Objective  

The main business objective is to reduce customer churn by predicting which customers are at risk of leaving. Achieving this will allow SyriaTel to take timely action and improve customer retention.  

---

### 1.4 Project Objectives  

**Main Objective**  
The main objective of this project is to develop a machine learning classifier that can accurately predict whether a SyriaTel customer is likely to churn.  

**Specific Objectives**  
The specific objectives of the project are:  
1. To analyze customer attributes and usage patterns in order to identify the key drivers of churn.  
2. To develop, train, and evaluate predictive models that classify customers as churners or non-churners.  
3. To generate actionable business insights and recommendations, based on the model outputs, that can support strategies to reduce customer churn.  

---

### 1.5 Research Questions  

The project is guided by the following research questions:  
1. What characteristics are most strongly associated with customers who churn?  
2. Which machine learning algorithm provides the best predictive performance for churn classification?  
3. How can the insights from the churn model be applied to design effective customer retention strategies?  

---

### 1.6 Success Criteria  

The success of this project will be assessed in three ways. First, it should generate actionable insights that SyriaTel can use to reduce churn rates in the future. Second, the predictive model should achieve acceptable levels of performance, with high accuracy and a strong ability to correctly identify customers who are likely to leave (that is, high recall on the churn class). Finally, the results should be presented in a way that is clear and interpretable, so that they can be easily understood and applied by business managers and decision-makers.  
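The two performance criteria above measure different things, and the difference matters when churners are a minority of customers. The sketch below uses hypothetical confusion-matrix counts (illustrative numbers only, not results from this project) to show how a model can score high on accuracy while still missing most churners.

```python
# Hypothetical confusion-matrix counts for a churn classifier.
# These numbers are illustrative only; they are not results from this project.
true_positives = 30    # churners correctly flagged
false_negatives = 70   # churners the model missed
true_negatives = 880   # non-churners correctly left alone
false_positives = 20   # non-churners wrongly flagged

total = true_positives + false_negatives + true_negatives + false_positives

# Accuracy: share of all customers classified correctly.
accuracy = (true_positives + true_negatives) / total
# Recall: share of actual churners that the model caught.
recall = true_positives / (true_positives + false_negatives)
# Precision: share of flagged customers who really churned.
precision = true_positives / (true_positives + false_positives)

print(f"accuracy:  {accuracy:.2f}")
print(f"recall:    {recall:.2f}")
print(f"precision: {precision:.2f}")
```

In this made-up case the accuracy is 0.91 while the recall is only 0.30, so reporting both metrics gives a fuller picture than accuracy alone.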

---


## 2. DATA UNDERSTANDING  

The dataset used in this project is the Churn in Telecoms dataset obtained from Kaggle. It contains customer account and usage information for a telecommunications company.  

The dataset does not specify the time frame; it is a cross-sectional snapshot of customer behavior.  

It consists of 3,333 rows and 21 columns.  

The target variable is churn. This is a binary variable that indicates whether a customer has churned (True) or not (False). Since the target is categorical, it will be encoded during data preparation to allow machine learning models to process it.  
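One possible way to encode the boolean target is a plain integer cast, sketched below on a toy column (the values are made up for illustration and are not taken from the dataset). The same snippet also computes the share of each class, which is a quick way to check for class imbalance before modeling.

```python
import pandas as pd

# Toy stand-in for the real target column; values are illustrative only.
toy = pd.DataFrame({"churn": [False, False, True, False, True, False]})

# Cast the boolean target to integers: False -> 0, True -> 1.
toy["churn"] = toy["churn"].astype(int)

# Share of each class, useful for spotting class imbalance.
class_share = toy["churn"].value_counts(normalize=True)
print(class_share)
```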


## 3. DATA PREPARATION
In this section, the SyriaTel dataset is prepared for analysis by cleaning, transforming, and standardizing the data. The goal is to ensure accuracy, handle missing values, and make the dataset ready for further exploration and modeling.

---
## 3.1 Importing Relevant Libraries
We import pandas for data manipulation, NumPy for numerical operations, and seaborn and matplotlib for visualizing patterns and trends.

In [8]:
#importing relevant libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

## 3.2 Loading the data 
Load the CSV file into a DataFrame called Syriatel_data for analysis.

In [10]:
# Load the data into a DataFrame
Syriatel_data = pd.read_csv('bigml_59c28831336c6604c800002a.csv')
# Preview the first 5 rows
Syriatel_data.head()


|   | state | account length | area code | phone number | international plan | voice mail plan | number vmail messages | total day minutes | total day calls | total day charge | ... | total eve calls | total eve charge | total night minutes | total night calls | total night charge | total intl minutes | total intl calls | total intl charge | customer service calls | churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | KS | 128 | 415 | 382-4657 | no | yes | 25 | 265.1 | 110 | 45.07 | ... | 99 | 16.78 | 244.7 | 91 | 11.01 | 10.0 | 3 | 2.7 | 1 | False |
| 1 | OH | 107 | 415 | 371-7191 | no | yes | 26 | 161.6 | 123 | 27.47 | ... | 103 | 16.62 | 254.4 | 103 | 11.45 | 13.7 | 3 | 3.7 | 1 | False |
| 2 | NJ | 137 | 415 | 358-1921 | no | no | 0 | 243.4 | 114 | 41.38 | ... | 110 | 10.3 | 162.6 | 104 | 7.32 | 12.2 | 5 | 3.29 | 0 | False |
| 3 | OH | 84 | 408 | 375-9999 | yes | no | 0 | 299.4 | 71 | 50.9 | ... | 88 | 5.26 | 196.9 | 89 | 8.86 | 6.6 | 7 | 1.78 | 2 | False |
| 4 | OK | 75 | 415 | 330-6626 | yes | no | 0 | 166.7 | 113 | 28.34 | ... | 122 | 12.61 | 186.9 | 121 | 8.41 | 10.1 | 3 | 2.73 | 3 | False |


In [11]:
#familiarizing with our data set
Syriatel_data.shape

(3333, 21)

## 3.3 Data Inspection and Data Cleaning
In this step, we preview the dataset to understand its structure, identify missing values, detect duplicates, and check data types. This helps ensure the data is ready for cleaning and preparation.

In [13]:
#checking the data types.
Syriatel_data.dtypes

state                      object
account length              int64
area code                   int64
phone number               object
international plan         object
voice mail plan            object
number vmail messages       int64
total day minutes         float64
total day calls             int64
total day charge          float64
total eve minutes         float64
total eve calls             int64
total eve charge          float64
total night minutes       float64
total night calls           int64
total night charge        float64
total intl minutes        float64
total intl calls            int64
total intl charge         float64
customer service calls      int64
churn                        bool
dtype: object

In [16]:
# Convert area code from int to object because it is a categorical label, not a quantity.
Syriatel_data['area code'] = Syriatel_data['area code'].astype('object')

In [22]:
# Preview column names, non-null counts, and data types
Syriatel_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3333 entries, 0 to 3332
Data columns (total 21 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   state                   3333 non-null   object 
 1   account length          3333 non-null   int64  
 2   area code               3333 non-null   object 
 3   phone number            3333 non-null   object 
 4   international plan      3333 non-null   object 
 5   voice mail plan         3333 non-null   object 
 6   number vmail messages   3333 non-null   int64  
 7   total day minutes       3333 non-null   float64
 8   total day calls         3333 non-null   int64  
 9   total day charge        3333 non-null   float64
 10  total eve minutes       3333 non-null   float64
 11  total eve calls         3333 non-null   int64  
 12  total eve charge        3333 non-null   float64
 13  total night minutes     3333 non-null   float64
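 14  total night calls       3333 non-null   int64  
 15  total night charge      3333 non-null   float64
 16  total intl minutes      3333 non-null   float64
 17  total intl calls        3333 non-null   int64  
 18  total intl charge       3333 non-null   float64
 19  customer service calls  3333 non-null   int64  
 20  churn                   3333 non-null   bool   
dtypes: bool(1), float64(8), int64(7), object(5)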

In [26]:
Syriatel_data.describe()  # summary statistics for the numeric columns

|   | account length | number vmail messages | total day minutes | total day calls | total day charge | total eve minutes | total eve calls | total eve charge | total night minutes | total night calls | total night charge | total intl minutes | total intl calls | total intl charge | customer service calls |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 |
| mean | 101.064806 | 8.09901 | 179.775098 | 100.435644 | 30.562307 | 200.980348 | 100.114311 | 17.08354 | 200.872037 | 100.107711 | 9.039325 | 10.237294 | 4.479448 | 2.764581 | 1.562856 |
| std | 39.822106 | 13.688365 | 54.467389 | 20.069084 | 9.259435 | 50.713844 | 19.922625 | 4.310668 | 50.573847 | 19.568609 | 2.275873 | 2.79184 | 2.461214 | 0.753773 | 1.315491 |
| min | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 23.2 | 33.0 | 1.04 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 74.0 | 0.0 | 143.7 | 87.0 | 24.43 | 166.6 | 87.0 | 14.16 | 167.0 | 87.0 | 7.52 | 8.5 | 3.0 | 2.3 | 1.0 |
| 50% | 101.0 | 0.0 | 179.4 | 101.0 | 30.5 | 201.4 | 100.0 | 17.12 | 201.2 | 100.0 | 9.05 | 10.3 | 4.0 | 2.78 | 1.0 |
| 75% | 127.0 | 20.0 | 216.4 | 114.0 | 36.79 | 235.3 | 114.0 | 20.0 | 235.3 | 113.0 | 10.59 | 12.1 | 6.0 | 3.27 | 2.0 |
| max | 243.0 | 51.0 | 350.8 | 165.0 | 59.64 | 363.7 | 170.0 | 30.91 | 395.0 | 175.0 | 17.77 | 20.0 | 20.0 | 5.4 | 9.0 |


In [28]:
# Check for missing values in each column
Syriatel_data.isna().sum()

state                     0
account length            0
area code                 0
phone number              0
international plan        0
voice mail plan           0
number vmail messages     0
total day minutes         0
total day calls           0
total day charge          0
total eve minutes         0
total eve calls           0
total eve charge          0
total night minutes       0
total night calls         0
total night charge        0
total intl minutes        0
total intl calls          0
total intl charge         0
customer service calls    0
churn                     0
dtype: int64

In [32]:
#checking for duplicates
Syriatel_data.duplicated().sum()

0
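`duplicated()` with no arguments only flags rows that match on every column. Since the phone number acts as a customer identifier, it may also be worth checking that no identifier repeats before the column is dropped. The sketch below illustrates the difference on a toy frame; the first three phone numbers come from the preview above, while the fourth row is invented to create a repeated identifier.

```python
import pandas as pd

# Toy frame: the last row is made up so that one phone number repeats
# while the rest of the row differs.
toy = pd.DataFrame({
    "phone number": ["382-4657", "371-7191", "358-1921", "382-4657"],
    "state": ["KS", "OH", "NJ", "KS"],
    "total day minutes": [265.1, 161.6, 243.4, 120.0],
})

# Rows identical across every column.
full_row_duplicates = toy.duplicated().sum()

# Rows whose identifier has already appeared, regardless of other columns.
repeated_ids = toy.duplicated(subset=["phone number"]).sum()

print("full-row duplicates:", full_row_duplicates)
print("repeated phone numbers:", repeated_ids)
```

On the real data, the equivalent call would be `Syriatel_data.duplicated(subset=['phone number']).sum()`, run before the column is removed.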

In [38]:
#dropping the phone number column since it is irrelevant.
Syriatel_data=Syriatel_data.drop('phone number',axis=1)


Data inspection and cleaning showed that the SyriaTel dataset contains no missing values and no duplicate rows.

The area code column was converted to object since it is a categorical variable.

The phone number column was dropped because it acts as an identifier and does not affect whether a customer churns.
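The cleaning steps summarized above could also be collected into a single function so they can be rerun on a fresh copy of the data. This is a minimal sketch, not the notebook's own code: the function name `clean_churn_data` is hypothetical, and the sample rows reuse a few columns from the preview shown earlier.

```python
import pandas as pd


def clean_churn_data(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps described above and return a new DataFrame."""
    cleaned = df.copy()
    # Treat area code as a categorical label rather than a quantity.
    cleaned["area code"] = cleaned["area code"].astype("object")
    # Drop the identifier column.
    cleaned = cleaned.drop(columns=["phone number"])
    # Fail loudly if the findings from the inspection step stop holding.
    assert cleaned.isna().sum().sum() == 0, "unexpected missing values"
    assert cleaned.duplicated().sum() == 0, "unexpected duplicate rows"
    return cleaned


# Small sample built from the first rows of the preview (subset of columns).
sample = pd.DataFrame({
    "state": ["KS", "OH", "NJ"],
    "area code": [415, 415, 415],
    "phone number": ["382-4657", "371-7191", "358-1921"],
    "churn": [False, False, False],
})

cleaned_sample = clean_churn_data(sample)
print(cleaned_sample.dtypes)
```

Working on a copy leaves the input frame untouched, which makes it easier to rerun individual notebook cells without side effects.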

In [42]:
#saving the cleaned data set
Syriatel_data.to_csv('Syriatel_cleaned.csv', index=False)
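One caveat when saving to CSV: the format stores no column types, so pandas infers them again on reload and `area code` comes back as an integer. The sketch below demonstrates this with an in-memory buffer standing in for `Syriatel_cleaned.csv`; the three rows are taken from the preview above, and the re-cast at the end is one possible way to restore the type.

```python
import io

import pandas as pd

# Small stand-in for the cleaned data, with area code stored as object.
cleaned = pd.DataFrame({
    "state": ["KS", "OH", "OH"],
    "area code": [415, 415, 408],
    "churn": [False, False, False],
})
cleaned["area code"] = cleaned["area code"].astype("object")

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)

# Reload: type inference runs again, so area code is numeric once more.
buffer.seek(0)
reloaded = pd.read_csv(buffer)
inferred_dtype = reloaded["area code"].dtype
print("dtype after reload:", inferred_dtype)

# Re-apply the conversion after loading the saved file.
reloaded["area code"] = reloaded["area code"].astype("object")
print("dtype after re-cast:", reloaded["area code"].dtype)
```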