# SyriaTel Telecom Customer Churn Prediction 
## 1. Business Understanding
### Business Problem
In the highly competitive telecommunications industry, customer retention is one of the most important drivers of profitability. Acquiring new customers is often more expensive than retaining existing ones, and high churn (customer dropout) can significantly impact revenue and long-term growth.
SyriaTel, a telecommunications provider, is losing revenue due to customer churn, which occurs when customers cancel their subscriptions or leave the service. Reducing churn is crucial for maintaining a stable customer base and profitability.
The current system lacks a predictive mechanism to identify customers at risk of churning, meaning the business is reactive rather than proactive in handling customer dissatisfaction. This limits the ability of the Customer Retention and Marketing teams to design timely and targeted interventions (e.g., special offers, upgrades, or customer service outreach).

### Business Objective
The goal is to develop a **machine learning classification model** that predicts whether a customer will churn using available customer data.

To solve this problem, I will:
- Use logistic regression as a baseline model.
- Build a decision tree classifier as an improved, non-linear model with tuned hyperparameters.
- Focus on maximising recall for churners, so that as few at-risk customers as possible are missed.
- Present actionable insights.
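
Since recall for churners is the metric the plan prioritises, a minimal sketch of how it is computed may help. The `recall` helper and the eight labels below are hypothetical and are not taken from the dataset; scikit-learn's `recall_score` computes the same quantity.

```python
def recall(y_true, y_pred, positive=True):
    """Fraction of actual positives that were also predicted positive."""
    pairs = list(zip(y_true, y_pred))
    # Churners the model correctly flagged
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    # Churners the model missed
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    actual_pos = true_pos + false_neg
    if actual_pos == 0:
        raise ValueError("y_true contains no positive examples")
    return true_pos / actual_pos


# Hypothetical labels for eight customers (True = churned); illustrative only.
y_true = [True, True, True, True, False, False, False, False]
y_pred = [True, True, True, False, True, False, False, False]

churn_recall = recall(y_true, y_pred)
print(f"Recall for churners: {churn_recall:.2f}")
```

In this toy case three of the four actual churners are flagged, so recall is 0.75. The one false alarm among non-churners does not affect recall, which is why the metric suits a setting where missing a churner is costlier than contacting a loyal customer.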

### Stakeholder
The primary stakeholders are SyriaTel's Marketing and Customer Retention teams, who will use the model's predictions to design targeted campaigns to reduce churn.

## 2. Data Understanding
The dataset used for this project was obtained from Kaggle. It is reviewed here to assess the structure and characteristics of the data.

In [7]:
# import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn import tree

In [8]:
df = pd.read_csv('customer-churn.csv')
df.head()

| | state | account length | area code | phone number | international plan | voice mail plan | number vmail messages | total day minutes | total day calls | total day charge | ... | total eve calls | total eve charge | total night minutes | total night calls | total night charge | total intl minutes | total intl calls | total intl charge | customer service calls | churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | KS | 128 | 415 | 382-4657 | no | yes | 25 | 265.1 | 110 | 45.07 | ... | 99 | 16.78 | 244.7 | 91 | 11.01 | 10.0 | 3 | 2.7 | 1 | False |
| 1 | OH | 107 | 415 | 371-7191 | no | yes | 26 | 161.6 | 123 | 27.47 | ... | 103 | 16.62 | 254.4 | 103 | 11.45 | 13.7 | 3 | 3.7 | 1 | False |
| 2 | NJ | 137 | 415 | 358-1921 | no | no | 0 | 243.4 | 114 | 41.38 | ... | 110 | 10.3 | 162.6 | 104 | 7.32 | 12.2 | 5 | 3.29 | 0 | False |
| 3 | OH | 84 | 408 | 375-9999 | yes | no | 0 | 299.4 | 71 | 50.9 | ... | 88 | 5.26 | 196.9 | 89 | 8.86 | 6.6 | 7 | 1.78 | 2 | False |
| 4 | OK | 75 | 415 | 330-6626 | yes | no | 0 | 166.7 | 113 | 28.34 | ... | 122 | 12.61 | 186.9 | 121 | 8.41 | 10.1 | 3 | 2.73 | 3 | False |


In [3]:
df.columns

Index(['state', 'account length', 'area code', 'phone number',
       'international plan', 'voice mail plan', 'number vmail messages',
       'total day minutes', 'total day calls', 'total day charge',
       'total eve minutes', 'total eve calls', 'total eve charge',
       'total night minutes', 'total night calls', 'total night charge',
       'total intl minutes', 'total intl calls', 'total intl charge',
       'customer service calls', 'churn'],
      dtype='object')

In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3333 entries, 0 to 3332
Data columns (total 21 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   state                   3333 non-null   object 
 1   account length          3333 non-null   int64  
 2   area code               3333 non-null   int64  
 3   phone number            3333 non-null   object 
 4   international plan      3333 non-null   object 
 5   voice mail plan         3333 non-null   object 
 6   number vmail messages   3333 non-null   int64  
 7   total day minutes       3333 non-null   float64
 8   total day calls         3333 non-null   int64  
 9   total day charge        3333 non-null   float64
 10  total eve minutes       3333 non-null   float64
 11  total eve calls         3333 non-null   int64  
 12  total eve charge        3333 non-null   float64
 13  total night minutes     3333 non-null   float64
 14  total night calls       3333 non-null   