# **Phase 3 Project**

# **<u>Project Title</u>**
## **Dialing into Retention -**  ***Predicting Customer Churn at SyriaTel Using Machine Learning***

# **Project Objective**

This project aims to support SyriaTel, a leading telecommunications provider, in proactively reducing customer churn through predictive modeling. Churn remains a critical concern for the company, as each lost customer directly impacts recurring revenue and increases acquisition costs. By identifying which customers are at high risk of leaving, SyriaTel can target interventions more effectively, strengthen customer loyalty, and optimize resource allocation.


# **1. Understanding the Business Context**

### **<u>1.1. Brief Description of the Business</u>**

Customer churn can be defined as the phenomenon in which clients discontinue their service. It is a major concern for subscription-based businesses like SyriaTel: every lost customer represents not only a direct loss of monthly revenue but also the additional cost of acquiring new users to replace them.

In today’s competitive telecom environment, customer expectations are high, and dissatisfaction with pricing, service quality, or support can quickly lead to attrition. SyriaTel currently lacks an efficient, data-driven method to anticipate customer churn. The existing approach is largely reactive, identifying churn only after it occurs, when it is too late to intervene.

The key business question that needs to be answered is: *Can customer behavior and service usage patterns be used to accurately predict churn, enabling proactive engagement to retain at-risk customers?*

By answering this question, SyriaTel can move from a reactive to a proactive posture, intervening before a customer leaves rather than responding after the fact. Such predictive capability would allow for the creation of targeted offers, personalized outreach, and strategic improvements in service delivery. The ultimate objective is not only to reduce churn but also to enhance customer lifetime value, improve satisfaction, and gain a competitive edge through data-informed decision-making.

![Customer churn illustration](image.png)
 
 **Source:** *medium.com/@islamhasabo/predicting-customer-churn-bc76f7760377*
  
 

### **<u>1.2. Stakeholders and Business Value</u>**

In any business endeavor to enhance operational efficiency and profitability, it is vital first to identify the primary decision-makers and users of the insights generated. For this project, the primary stakeholders are **the Customer Retention** and **Marketing teams** at SyriaTel, a leading telecommunications provider. These teams are directly responsible for maintaining and growing the subscriber base, making them the most affected by customer churn issues. 

Secondary stakeholders include the **Call Center Operations unit**, which handles customer grievances and service interactions (both factors that influence churn), and **Product Managers**, who design the telecom plans that shape customer satisfaction.

Finally, **Executive Leadership** benefits from strategic insights that guide customer-centric policies and long-term planning. Understanding the goals and concerns of these stakeholder groups ensures that the data science solution is aligned with the company’s strategic priorities.

### **<u>1.3. The Business Question</u>**

### 1.3.1 Key Business Question
The main business question that this project aims to address is:

- Can SyriaTel identify patterns in customer behavior and service usage that reliably predict churn, enabling proactive retention strategies?

### 1.3.2 The Specific Business Questions

Based on the defined business problem (predicting customer churn to enable proactive retention), this project aims to answer the following specific questions:

**1.	Which customers are most likely to churn?**

*→ Will involve identifying high-risk individuals based on their service usage, behavior, and support interactions.*

**2.	What are the main drivers of customer churn?**
        
*→ Will entail understanding which variables (e.g., international plan, total charges, customer service calls) are most strongly associated with churn.*

**3.	How accurate is our churn prediction model?**
        
*→ Will evaluate the model’s performance using appropriate classification metrics such as accuracy, precision, recall, and F1-score.*

**4.	How can SyriaTel use these predictions to retain customers?**
        
*→ Will provide business recommendations on targeted interventions (e.g., offers, support improvements) for high-risk customers.*

**5.	What trade-offs exist between model complexity and interpretability?**
       
*→ Will balance predictive power with the ability to explain model results to non-technical stakeholders.*
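For reference, the metrics named in question 3 can be written in terms of confusion-matrix counts, treating churners as the positive class. Here TP is churners correctly flagged, FP is loyal customers wrongly flagged, FN is churners missed, and TN is loyal customers correctly left unflagged:

$$
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall} &= \frac{TP}{TP + FN} \\
F_1 &= 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{aligned}
$$

If churners are a small minority of customers, accuracy alone can look high even when many churners are missed. This is why recall and F1 are reported alongside it.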


### **<u>1.4 A Brief Description of the Chosen Dataset</u>**

To address the stated business problem, this project uses a well-known, publicly available telecom customer churn dataset, bigml_59c28831336c6604c800002a.csv (referred to in this project as *bigml_59.csv*), available at https://www.kaggle.com/datasets/becksddf/churn-in-telecoms-dataset.
While not proprietary to SyriaTel, the dataset closely mirrors the type of operational and behavioral data that a typical telecom provider, such as SyriaTel, collects on its subscribers. 

The dataset contains 3,333 customer records and 21 variables. These include account information (e.g., account length, state), service plan details (e.g., international and voicemail plans), usage statistics (e.g., total minutes and charges by time of day), and customer interaction data (e.g., number of customer service calls). The target variable, *churn*, is a binary field indicating whether the customer discontinued their service during the observation period.
This rich feature set supports both exploratory and predictive analysis. It provides a robust foundation for building classification models that can distinguish between customers likely to remain loyal and those at risk of leaving.

The structure of this dataset is consistent with the types of data typically captured by telecom providers, making it an appropriate and realistic foundation for model development. It enables the application of classification algorithms to detect early warning signs of churn, and ultimately offers a blueprint for a predictive system that SyriaTel could implement with its internal data.
Moreover, the dataset’s structure supports the use of interpretable models, which are essential when making business decisions that require stakeholder trust and accountability.


### **<u>1.5 The Project Approach Taken</u>**

This section outlines the methodology used in the project, following a standard data science pipeline:
 
**Step 1: Defining the Business Context and Problem**

•	Frame the business problem and define the project scope.

•	Identify primary and secondary stakeholders.

•	Formulate key business questions and success criteria.

 
**Step 2: Data Exploration and Understanding**

•	Load the dataset and examine structure, types, and value distributions.

•	Perform univariate and bivariate analysis to uncover patterns.

•	Identify data quality issues (missing values, outliers, class imbalance).
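The Step 2 checks could be sketched roughly as follows. This assumes pandas and a target column named `churn`, as in the Kaggle CSV. `summarize_dataset` is a hypothetical helper. The 1.5 × IQR outlier rule is one common convention, not a requirement of this project.

```python
import pandas as pd


def summarize_dataset(df: pd.DataFrame, target: str = "churn") -> dict:
    """Quick structural summary covering the Step 2 checks (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    # IQR rule: values more than 1.5 * IQR beyond the quartiles count as outliers
    outlier_mask = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)

    missing = df.isna().sum()
    return {
        "shape": df.shape,
        "dtypes": {str(k): int(v) for k, v in df.dtypes.value_counts().items()},
        "missing_values": missing[missing > 0].to_dict(),  # only columns with gaps
        "duplicate_rows": int(df.duplicated().sum()),
        "outliers": outlier_mask.sum().to_dict(),  # outlier count per numeric column
        # Share of each target class; a small minority share signals class imbalance
        "class_balance": df[target].value_counts(normalize=True).round(3).to_dict(),
    }


# Usage (file name from Section 1.4; adjust the path as needed):
# churn_df = pd.read_csv("bigml_59.csv")
# summarize_dataset(churn_df)
```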

 
**Step 3: Data Cleaning and Preprocessing**

•	Drop irrelevant features (e.g., phone number).

•	Encode categorical variables using appropriate methods (label or one-hot encoding).

•	Convert Boolean fields to numeric (e.g., True → 1, False → 0).

•	Normalize or scale numerical features if needed.
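The cleaning steps above could look roughly like the sketch below. It makes two assumptions:

* the column names follow the Kaggle CSV (`phone number`, `international plan`, `voice mail plan`, `churn`);
* the plan columns hold `yes`/`no` strings.

`preprocess` is a hypothetical helper. Scaling is deliberately left out. A scaler should be fitted on the training split only (for example inside a scikit-learn `Pipeline`) so that test-set statistics do not leak into training.

```python
import pandas as pd

# Column names assumed from the Kaggle CSV; adjust if the local copy differs.
ID_COLUMNS = ["phone number"]
YES_NO_COLUMNS = ["international plan", "voice mail plan"]


def preprocess(df: pd.DataFrame, target: str = "churn"):
    """Drop identifiers, encode categoricals, and return (X, y). Hypothetical helper."""
    df = df.drop(columns=ID_COLUMNS, errors="ignore")  # returns a copy; input is untouched

    # "yes"/"no" plan flags -> 1/0
    for col in YES_NO_COLUMNS:
        if col in df.columns:
            df[col] = df[col].str.strip().str.lower().eq("yes").astype(int)

    # Boolean target (True/False) -> 1/0
    y = df.pop(target).astype(int)

    # One-hot encode remaining text columns (e.g. state); drop_first avoids a redundant dummy
    X = pd.get_dummies(df, drop_first=True, dtype=int)
    return X, y
```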

 
**Step 4: Feature Selection and Engineering**

•	Analyze correlations and importance scores to select useful features.

•	Create derived variables (e.g., total charges per call, high usage flags).

•	Reduce dimensionality if needed (e.g., via PCA or feature pruning).
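The sketch below illustrates the correlation check and the derived variables mentioned above. It assumes the Kaggle column naming pattern `total <period> minutes|calls|charge` for the day, evening, night and international periods.

The helper names, the 0.95 correlation cut-off and the 75th-percentile high-usage threshold are illustrative choices, not values established by this project. If charges are billed at a fixed rate per minute, the minutes and charge columns will show up as near-perfectly correlated pairs, which makes them candidates for pruning.

```python
import numpy as np
import pandas as pd

# Assumed Kaggle naming pattern: "total <period> minutes" / "calls" / "charge"
PERIODS = ["day", "eve", "night", "intl"]


def highly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.95) -> list:
    """Return (feature_a, feature_b, |r|) for numeric pairs at or above the threshold."""
    corr = df.select_dtypes(include="number").corr().abs()
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only, so each pair appears once
            r = corr.iloc[i, j]
            if r >= threshold:
                pairs.append((cols[i], cols[j], round(float(r), 3)))
    return pairs


def add_derived_features(df: pd.DataFrame, high_usage_quantile: float = 0.75) -> pd.DataFrame:
    """Add aggregate usage, charge-per-call and high-usage features (hypothetical helper)."""
    out = df.copy()
    out["total charge"] = out[[f"total {p} charge" for p in PERIODS]].sum(axis=1)
    out["total calls"] = out[[f"total {p} calls" for p in PERIODS]].sum(axis=1)
    out["total minutes"] = out[[f"total {p} minutes" for p in PERIODS]].sum(axis=1)

    # Customers with zero recorded calls get 0 instead of a division-by-zero result
    calls = out["total calls"].replace(0, np.nan)
    out["charge per call"] = (out["total charge"] / calls).fillna(0.0)

    # Flag customers above the chosen usage percentile (illustrative threshold)
    threshold = out["total minutes"].quantile(high_usage_quantile)
    out["high usage"] = (out["total minutes"] > threshold).astype(int)
    return out
```

In a full workflow, the high-usage threshold would be computed on the training split and reused unchanged for the test split.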

 
**Step 5: Model Building**

•	Start with a simple baseline model (e.g., logistic regression or decision tree).

•	Split data into training and test sets so that performance can be measured on unseen data and overfitting can be detected.

•	Train multiple models including tuned versions (e.g., Random Forest, Gradient Boosting).
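One possible shape for Step 5 is sketched below with scikit-learn. It assumes `X` and `y` are the encoded features and a 0/1 target. `build_models` is a hypothetical helper. The following are illustrative assumptions rather than choices made in this project:

* the stratified split;
* `class_weight="balanced"`;
* the F1 scoring choice;
* the small hyperparameter grid.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_models(X, y, random_state: int = 42):
    """Split the data, fit a baseline and a tuned model (hypothetical helper)."""
    # Stratify so the churn rate is the same in the training and test splits
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )

    # Baseline: scaled logistic regression; the scaler is fitted on training data only
    baseline = make_pipeline(
        StandardScaler(),
        LogisticRegression(max_iter=1000, class_weight="balanced"),
    )
    baseline.fit(X_train, y_train)

    # Tuned model: small illustrative grid searched with 5-fold cross-validation
    grid = GridSearchCV(
        RandomForestClassifier(random_state=random_state, class_weight="balanced"),
        param_grid={"n_estimators": [100, 200], "max_depth": [None, 10]},
        scoring="f1",
        cv=5,
    )
    grid.fit(X_train, y_train)

    models = {"logistic_regression": baseline, "random_forest": grid.best_estimator_}
    return models, (X_train, X_test, y_train, y_test)
```

Stratifying the split keeps the churn rate comparable between training and test sets, which matters when churners are a minority of customers.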

 
**Step 6: Model Evaluation**

•	Use classification metrics (accuracy, precision, recall, F1-score, ROC-AUC).

•	Evaluate models on both training and test data; a large gap between the two scores signals overfitting.

•	Compare performance to select the most balanced model.
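The Step 6 metrics could be computed as in the sketch below. It assumes fitted scikit-learn classifiers that expose `predict_proba`, and a 0/1 target where 1 means churn. Both helper functions are hypothetical.

```python
import pandas as pd
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate(model, X_train, y_train, X_test, y_test) -> dict:
    """Compute accuracy, precision, recall, F1 and ROC-AUC on both splits."""
    results = {}
    for split, X, y in (("train", X_train, y_train), ("test", X_test, y_test)):
        pred = model.predict(X)
        proba = model.predict_proba(X)[:, 1]  # probability of class 1 (churn)
        results[split] = {
            "accuracy": accuracy_score(y, pred),
            "precision": precision_score(y, pred, zero_division=0),
            "recall": recall_score(y, pred, zero_division=0),
            "f1": f1_score(y, pred, zero_division=0),
            "roc_auc": roc_auc_score(y, proba),
        }
    return results


def compare_models(models: dict, X_train, y_train, X_test, y_test) -> pd.DataFrame:
    """One row of test-set metrics per model, best F1 first."""
    names = list(models)
    rows = [evaluate(models[n], X_train, y_train, X_test, y_test)["test"] for n in names]
    return pd.DataFrame(rows, index=names).sort_values("f1", ascending=False)
```

Comparing the `train` and `test` entries returned by `evaluate` shows how much each model overfits. A model that scores far better on training data than on test data is unlikely to generalize to new customers.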
 

**Step 7: Interpretation and Insights**

•	Analyze feature importance to understand what drives churn.

•	Visualize decision boundaries or trees for model transparency.

•	Link data patterns to customer behaviors and business policies.
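A sketch of the interpretation step, assuming scikit-learn models, uses two hypothetical helpers:

* `feature_importance_table` reads `feature_importances_` from tree ensembles, or the absolute `coef_` values from a linear model.
* `shallow_tree_rules` fits a depth-limited decision tree as a transparent surrogate and prints its rules with `export_text`.

For a `Pipeline`, pass the final estimator (e.g. `pipeline[-1]`).

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text


def feature_importance_table(model, feature_names, top_n: int = 10) -> pd.Series:
    """Rank features by importance for a fitted tree ensemble or linear model."""
    if hasattr(model, "feature_importances_"):
        scores = model.feature_importances_
    elif hasattr(model, "coef_"):
        # Absolute coefficients; only comparable across features that were scaled
        scores = np.abs(model.coef_[0])
    else:
        raise ValueError("model exposes neither feature_importances_ nor coef_")
    ranked = pd.Series(scores, index=list(feature_names)).sort_values(ascending=False)
    return ranked.head(top_n)


def shallow_tree_rules(X, y, feature_names, max_depth: int = 3) -> str:
    """Fit a depth-limited decision tree and return its rules as plain text."""
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(X, y)
    return export_text(tree, feature_names=list(feature_names))
```

Impurity-based importances can favour high-cardinality features. scikit-learn's `permutation_importance` is a common cross-check.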

 
**Step 8: Recommendations and Next Steps**

•	Propose targeted retention actions for high-risk customer segments.

•	Suggest data collection improvements for future modeling.

•	Recommend integration of the model into SyriaTel’s CRM or customer service platform.
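To turn predictions into the targeted actions listed above, customers could be grouped into risk tiers by predicted churn probability. The sketch below is hypothetical. The 0.4 and 0.7 cut-offs are placeholders that SyriaTel would calibrate against its retention budget. The output table is the kind of ranked list that could be handed to the retention team or loaded into a CRM.

```python
import numpy as np
import pandas as pd


def assign_risk_tiers(customer_ids, churn_probabilities,
                      high: float = 0.7, medium: float = 0.4) -> pd.DataFrame:
    """Label each customer low/medium/high risk and sort by churn probability."""
    scores = pd.Series(np.asarray(churn_probabilities, dtype=float), index=customer_ids)
    tier = pd.Series("low", index=scores.index)
    tier.loc[scores >= medium] = "medium"
    tier.loc[scores >= high] = "high"  # applied last so it overrides "medium"
    return (
        pd.DataFrame({"churn_probability": scores, "risk_tier": tier})
        .sort_values("churn_probability", ascending=False)
    )
```

For example, `assign_risk_tiers(X_test.index, model.predict_proba(X_test)[:, 1])` would rank test-set customers from most to least likely to churn. In that call, `model` and `X_test` stand for a fitted classifier and a pandas test set.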
