## SyriaTel Customer Churn

### Problem Statement: Predicting Customer Churn for SyriaTel
`Customer Churn` refers to the phenomenon where customers stop using a company's products or services. In the telecommunications industry, churn occurs when a subscriber cancels their service, switches to a competitor, or stops engaging with the company altogether.

For SyriaTel, a telecom provider, high churn rates lead to significant revenue losses, increased customer acquisition costs, and a weakened market position. Retaining existing customers is generally more cost-effective than acquiring new ones, making churn prediction a critical business priority.

### Disadvantages of Customer Churn:
1. Revenue Loss - Losing customers reduces recurring revenue, impacting overall profitability.
2. Higher Acquisition Costs - Acquiring new customers is often more expensive than retaining existing ones.
3. Reputational Damage - High churn rates may indicate poor service quality, leading to negative word-of-mouth.
4. Reduced Customer Lifetime Value (CLV) - Frequent customer exits lower the long-term revenue a company can generate from each user.
5. Operational Inefficiencies - Constantly replacing lost customers requires continuous marketing and sales efforts, which increases costs.

### Objective
The goal is to build a predictive model that identifies customers who are likely to churn in the near future. By analyzing patterns in customer behaviour, the company can implement targeted retention strategies, such as personalized offers, improved customer support, or proactive engagement, to reduce churn and enhance customer loyalty.

### 1.0 Import Libraries


In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split

from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.metrics import (
    classification_report,
    roc_auc_score,
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    roc_curve,
)
from sklearn.linear_model import LogisticRegression
from sklearn import svm
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline as ImbPipeline

### 2.0 Understanding the dataset

In [3]:
# Read the CSV file into a DataFrame and display the first 5 rows
df = pd.read_csv('CustomerChurnData.csv')
df.head()

| | state | account length | area code | phone number | international plan | voice mail plan | number vmail messages | total day minutes | total day calls | total day charge | ... | total eve calls | total eve charge | total night minutes | total night calls | total night charge | total intl minutes | total intl calls | total intl charge | customer service calls | churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | KS | 128 | 415 | 382-4657 | no | yes | 25 | 265.1 | 110 | 45.07 | ... | 99 | 16.78 | 244.7 | 91 | 11.01 | 10.0 | 3 | 2.7 | 1 | False |
| 1 | OH | 107 | 415 | 371-7191 | no | yes | 26 | 161.6 | 123 | 27.47 | ... | 103 | 16.62 | 254.4 | 103 | 11.45 | 13.7 | 3 | 3.7 | 1 | False |
| 2 | NJ | 137 | 415 | 358-1921 | no | no | 0 | 243.4 | 114 | 41.38 | ... | 110 | 10.3 | 162.6 | 104 | 7.32 | 12.2 | 5 | 3.29 | 0 | False |
| 3 | OH | 84 | 408 | 375-9999 | yes | no | 0 | 299.4 | 71 | 50.9 | ... | 88 | 5.26 | 196.9 | 89 | 8.86 | 6.6 | 7 | 1.78 | 2 | False |
| 4 | OK | 75 | 415 | 330-6626 | yes | no | 0 | 166.7 | 113 | 28.34 | ... | 122 | 12.61 | 186.9 | 121 | 8.41 | 10.1 | 3 | 2.73 | 3 | False |


### Data Description / Features in the Dataset
These features help determine whether there is a pattern that separates customers who have churned from customers who have not.

- `state` : the state the customer lives in
- `account length` : the number of days the customer has had the account
- `area code` : the area code of the customer
- `phone number` : the phone number of the customer
- `international plan` : `yes` if the customer has the international plan, otherwise `no`
- `voice mail plan` : `yes` if the customer has the voice mail plan, otherwise `no`
- `number vmail messages` : the number of voicemail messages the customer has sent
- `total day minutes` : total number of minutes the customer has used in calls made during the day
- `total day calls` : total number of calls the customer has made during the day
- `total day charge` : total amount of money the customer was charged by the telecom company for calls made during the day
- `total eve minutes` : total number of minutes the customer has used in calls made in the evening
- `total eve calls` : total number of calls the customer has made in the evening
- `total eve charge` : total amount of money the customer was charged by the telecom company for calls made in the evening
- `total night minutes` : total number of minutes the customer has used in calls made during the night
- `total night calls` : total number of calls the customer has made during the night
- `total night charge` : total amount of money the customer was charged by the telecom company for calls made at night
- `total intl minutes` : total number of minutes the customer has spent on international calls
- `total intl calls` : total number of international calls the customer has made
- `total intl charge` : total amount of money the customer was charged by the telecom company for international calls
- `customer service calls` : number of calls the customer has made to customer service
- `churn` : `True` if the customer terminated their contract, otherwise `False`
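
One simple way to look for such a pattern is to compare feature averages between churned and retained customers. The sketch below is only an illustration: `compare_by_churn` is a hypothetical helper, not part of the notebook, and the `toy` rows are made-up values rather than records from the dataset.

```python
import pandas as pd


def compare_by_churn(frame, features, target="churn"):
    """Return the mean of each feature for retained (False) and churned (True) customers."""
    return frame.groupby(target)[features].mean()


# Toy rows with made-up values, for illustration only (not taken from the dataset)
toy = pd.DataFrame({
    "customer service calls": [1, 0, 2, 5, 4],
    "total day minutes": [180.0, 200.0, 160.0, 300.0, 280.0],
    "churn": [False, False, False, True, True],
})

summary = compare_by_churn(toy, ["customer service calls", "total day minutes"])
print(summary)
```

On the real data, the same call would be `compare_by_churn(df, [...])` with whichever numeric columns are of interest. A large gap between the two rows of the result suggests a feature worth examining further, although a difference in means alone does not establish that the feature drives churn.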

In [8]:
# Check the number of records and features using the .shape attribute
print(f'The dataset has {df.shape[0]} rows')
print(f'The dataset has {df.shape[1]} columns')

The dataset has 3333 rows
The dataset has 21 columns
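
Alongside the shape, it can be useful to check for missing values and the share of churned customers before modelling, since an imbalanced target affects how a classifier should be trained and evaluated. The following is a minimal sketch: `dataset_overview` is a hypothetical helper, and `sample` holds only a few columns of the five rows shown by `df.head()` above, so its numbers say nothing about the full dataset.

```python
import pandas as pd


def dataset_overview(frame, target="churn"):
    """Summarise size, missing values, and the share of churned customers."""
    return {
        "rows": frame.shape[0],
        "columns": frame.shape[1],
        "missing_values": int(frame.isna().sum().sum()),
        "churn_rate": float(frame[target].mean()),
    }


# A few columns from the five rows displayed by df.head() above
sample = pd.DataFrame({
    "state": ["KS", "OH", "NJ", "OH", "OK"],
    "account length": [128, 107, 137, 84, 75],
    "customer service calls": [1, 1, 0, 2, 3],
    "churn": [False, False, False, False, False],
})

overview = dataset_overview(sample)
print(overview)
```

Calling `dataset_overview(df)` on the full DataFrame would report the actual churn rate, which indicates whether resampling or class weighting is worth considering.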
