Understanding the Business  

Stakeholder SyriaTel

In this case study, the stakeholder is SyriaTel, a telecommunications company that wants to retain its customers. SyriaTel is concerned with customer churn (customers discontinuing their service), which reduces revenue and raises the cost of acquiring new customers. SyriaTel wants to identify the profiles of customers who are likely to churn so that it can intervene in time, increasing retention and profitability.  

Business Problem  

SyriaTel's business problem is revenue loss from customers discontinuing their service. The task is to predict, from account and usage metrics, whether a customer is likely to stop doing business with the company in the near future. This is a binary classification problem: the target variable "churn" is categorical, either True (the customer churns) or False (the customer stays). By studying customer behavioral data, SyriaTel will be able to take preventative measures against potential churn.
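
To make the target concrete, here is a minimal sketch of treating churn as a binary label and checking how the two classes are distributed. The labels below are made-up toy values, not SyriaTel data; only the column name "churn" comes from the dataset.

```python
import pandas as pd

# Toy labels standing in for the "churn" column (made-up values).
churn = pd.Series(
    [False, False, True, False, False, False, True, False], name="churn"
)

# Map the boolean target to 1 (churned) / 0 (stayed) for modeling.
y = churn.astype(int)

# Share of each class; a small share of 1s signals class imbalance.
class_share = y.value_counts(normalize=True)
print(class_share)
```

A skewed split between the two classes is what motivates a rebalancing step before training.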

Data Preparation

A. Data Preparation  
B. Preprocessing Steps  
C. Dropping Irrelevant Features: We’ll drop phone number, state, and area code, which are not used as predictors.  
D. Train-Test Split: We’ll divide the data into 70% for training and 30% for testing.  
E. Encoding Categorical Variables:  
F. Scaling: We’ll use StandardScaler on numeric features like minutes and charges so that they share a common scale, which helps regularized models like logistic regression converge and weigh features evenly. We will fit the scaler only on the training data to avoid any data leakage.  
G. Handling Imbalance: To balance the churn classes in our training set, we’ll apply SMOTE (Synthetic Minority Oversampling Technique).
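
The split, encoding, and scaling steps above can be sketched as follows. This is an illustrative sketch on a made-up toy frame, not the notebook's actual cells: the helper `encode_plans`, the yes/no to 1/0 mapping, `stratify=y`, and `random_state=42` are our assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the churn data (values are made up for illustration).
rng = np.random.default_rng(42)
n = 200
toy = pd.DataFrame({
    "international plan": rng.choice(["yes", "no"], size=n, p=[0.1, 0.9]),
    "total day minutes": rng.normal(180, 50, size=n),
    "total day charge": rng.normal(30, 9, size=n),
    "churn": rng.random(n) < 0.15,
})

X = toy.drop(columns="churn")
y = toy["churn"].astype(int)

# D. 70/30 split. stratify=y (an assumption) keeps the churn rate
#    similar in the training and test parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)


def encode_plans(frame):
    """Hypothetical helper: return a copy with yes/no mapped to 1/0."""
    out = frame.copy()
    out["international plan"] = out["international plan"].map({"no": 0, "yes": 1})
    return out


# E. Encode the categorical plan column in both parts.
X_train = encode_plans(X_train)
X_test = encode_plans(X_test)

# F. Fit the scaler on the training data only, then reuse its
#    training mean and standard deviation on the test data.
numeric_cols = ["total day minutes", "total day charge"]
scaler = StandardScaler()
X_train[numeric_cols] = scaler.fit_transform(X_train[numeric_cols])
X_test[numeric_cols] = scaler.transform(X_test[numeric_cols])

print(X_train.shape, X_test.shape)
```

Step G would then be applied to the scaled training data only, for example through SMOTE's `fit_resample` method, so that the test set keeps its original class balance.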

Importing Libraries and Loading the Dataset

In [12]:
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from imblearn.over_sampling import SMOTE

In [15]:
df = pd.read_csv("bigml_59c28831336c6604c800002a.csv")
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3333 entries, 0 to 3332
Data columns (total 21 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   state                   3333 non-null   object 
 1   account length          3333 non-null   int64  
 2   area code               3333 non-null   int64  
 3   phone number            3333 non-null   object 
 4   international plan      3333 non-null   object 
 5   voice mail plan         3333 non-null   object 
 6   number vmail messages   3333 non-null   int64  
 7   total day minutes       3333 non-null   float64
 8   total day calls         3333 non-null   int64  
 9   total day charge        3333 non-null   float64
 10  total eve minutes       3333 non-null   float64
 11  total eve calls         3333 non-null   int64  
 12  total eve charge        3333 non-null   float64
 13  total night minutes     3333 non-null   float64
 14  total night calls       3333 non-null   

Dropping Irrelevant Features

In [19]:
# Drop identifier and location columns that are not used as predictors.
df = df.drop(["phone number", "state", "area code"], axis=1)
df

Unnamed: 0,account length,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,total eve minutes,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,customer service calls,churn
0,128,no,yes,25,265.1,110,45.07,197.4,99,16.78,244.7,91,11.01,10.0,3,2.70,1,False
1,107,no,yes,26,161.6,123,27.47,195.5,103,16.62,254.4,103,11.45,13.7,3,3.70,1,False
2,137,no,no,0,243.4,114,41.38,121.2,110,10.30,162.6,104,7.32,12.2,5,3.29,0,False
3,84,yes,no,0,299.4,71,50.90,61.9,88,5.26,196.9,89,8.86,6.6,7,1.78,2,False
4,75,yes,no,0,166.7,113,28.34,148.3,122,12.61,186.9,121,8.41,10.1,3,2.73,3,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3328,192,no,yes,36,156.2,77,26.55,215.5,126,18.32,279.1,83,12.56,9.9,6,2.67,2,False
3329,68,no,no,0,231.1,57,39.29,153.4,55,13.04,191.3,123,8.61,9.6,4,2.59,3,False
3330,28,no,no,0,180.8,109,30.74,288.8,58,24.55,191.9,91,8.64,14.1,6,3.81,2,False
3331,184,yes,no,0,213.8,105,36.35,159.6,84,13.57,139.2,137,6.26,5.0,10,1.35,2,False
