## Business Understanding

### Project Background
SyriaTel is a major telecommunications company operating in a competitive and evolving mobile service industry. Like many telecom companies, SyriaTel faces the ongoing challenge of customer churn — the phenomenon where customers discontinue their subscriptions or switch to competitors. High churn rates can significantly reduce profitability, especially in markets where acquiring new customers is more costly than retaining existing ones.

To address this, SyriaTel has collected data on its customer base, including usage patterns, demographic attributes, service subscriptions, and customer support interactions. The objective of this project is to build a **machine learning classification model** that can accurately **predict whether a customer is likely to churn** in the near future. With this insight, SyriaTel can proactively take steps to retain customers at risk of leaving.

---

### Business Objective
The main goal is to **reduce customer churn** by:
- Identifying customers most at risk of leaving.
- Understanding the **key drivers** or reasons behind churn behavior.
- Enabling SyriaTel’s **Customer Retention Team** to take **targeted actions** like personalized offers, loyalty programs, or improved service quality.

This model serves as a **decision support tool** to optimize customer engagement strategies, reduce revenue loss, and improve customer satisfaction.

---

### Stakeholders
1. **Customer Retention Team** – to design and execute proactive retention campaigns.
2. **Marketing Department** – to develop data-driven promotions or loyalty incentives.
3. **Customer Service Team** – to prioritize high-risk customers for support interventions.
4. **Product Development Team** – to gain insight into what service features influence customer satisfaction.
5. **Executive Leadership** – to understand customer lifetime value and strategic revenue impacts.

---

### Key Business Questions
- Can we predict which customers are likely to churn based on historical data?
- What are the **most important factors** influencing customer churn?
- How can we **reduce churn rates** using actionable insights from the model?
- Which customer segments are most vulnerable to churn, and how should they be addressed differently?

---

### Expected Business Value
- **Revenue Preservation**: Preventing churn is generally more cost-effective than acquiring new customers.
- **Customer Satisfaction**: Targeted intervention improves brand perception and loyalty.
- **Operational Efficiency**: Helps prioritize resources (e.g., call center time, retention campaigns) where they are most needed.
- **Strategic Planning**: Provides leadership with predictive analytics to guide business decisions.

---

### Success Criteria
- A **classification model** with acceptable performance (e.g., F1-score, precision, recall), evaluated specifically on the **churn class**.
- Business interpretability through **feature importance** to guide actionable strategies.
- Presentation of insights in a format understandable to non-technical stakeholders.
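
To make the churn-class metrics above concrete, here is a minimal sketch that computes precision, recall, and F1 from raw predictions. The helper name `churn_class_metrics` and the toy labels are assumptions for illustration only; they are not results from the SyriaTel data.

```python
def churn_class_metrics(y_true, y_pred, positive="Yes"):
    """Precision, recall, and F1 for the positive (churn) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators (no predicted or no actual churners)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels, made up for this example
y_true = ["No", "No", "Yes", "Yes", "No", "Yes", "No", "No"]
y_pred = ["No", "Yes", "Yes", "No", "No", "Yes", "No", "No"]
metrics = churn_class_metrics(y_true, y_pred)
```

Precision answers "of the customers flagged as churners, how many actually left?", while recall answers "of the customers who left, how many were flagged?". Which one matters more depends on the relative cost of a wasted retention offer versus a missed churner.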

---


## Data Understanding

### Dataset Overview
The SyriaTel Customer Churn dataset contains customer-level information collected from SyriaTel’s service platform. Each row represents a single customer, and the columns represent various attributes — ranging from service usage and subscription types to demographic and account information.

This is a **supervised binary classification problem** where the **target variable** is:
- `Churn`: Indicates whether the customer has churned (`Yes`) or not (`No`).

---

### Data Dimensions
- **Rows**: ~7,000 (each representing a unique customer)
- **Columns (Features)**: ~20 (both numerical and categorical)

---

### Key Features
Below is a summary of the features expected in this dataset:

| Feature | Description | Type |
|--------|-------------|------|
| `CustomerID` | Unique identifier for each customer | Categorical (ID) |
| `gender` | Customer’s gender (Male/Female) | Categorical |
| `SeniorCitizen` | Indicates if the customer is a senior (1 = Yes, 0 = No) | Binary |
| `Partner` | Whether the customer has a partner (Yes/No) | Categorical |
| `Dependents` | Whether the customer has dependents (Yes/No) | Categorical |
| `tenure` | Number of months the customer has stayed with the company | Numerical |
| `PhoneService` | Whether the customer has phone service (Yes/No) | Categorical |
| `MultipleLines` | Whether the customer has multiple phone lines | Categorical |
| `InternetService` | Type of internet service (DSL, Fiber optic, No) | Categorical |
| `OnlineSecurity` | Whether the customer has online security (Yes/No/No internet) | Categorical |
| `OnlineBackup` | Whether the customer has online backup (Yes/No/No internet) | Categorical |
| `DeviceProtection` | Whether the customer has device protection (Yes/No/No internet) | Categorical |
| `TechSupport` | Whether the customer has tech support (Yes/No/No internet) | Categorical |
| `StreamingTV` | Whether the customer streams TV (Yes/No/No internet) | Categorical |
| `StreamingMovies` | Whether the customer streams movies (Yes/No/No internet) | Categorical |
| `Contract` | Type of contract (Month-to-month, One year, Two year) | Categorical |
| `PaperlessBilling` | Whether the customer uses paperless billing (Yes/No) | Categorical |
| `PaymentMethod` | Method of payment (e.g., Electronic check, Bank transfer, etc.) | Categorical |
| `MonthlyCharges` | Monthly amount charged to the customer | Numerical |
| `TotalCharges` | Total amount charged to the customer | Numerical |
| `Churn` | Target variable: Has the customer churned? (Yes/No) | Binary |

---

### Target Variable: `Churn`
- **Type**: Binary (Yes = Churned, No = Stayed)
- **Goal**: Predict whether a customer will churn based on the other features.

---

### Data Distribution Observations (initial expectations)
- `Churn` is typically **imbalanced** — more "No" than "Yes".
- `tenure` may have a strong inverse correlation with churn.
- Features like `Contract`, `MonthlyCharges`, `TechSupport`, and `InternetService` often show significant influence on churn.
- `TotalCharges` may contain **missing or malformed values**, especially for customers with very short tenure.
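
These expectations can be checked quickly once the data is loaded. The sketch below uses a tiny hand-made frame named `sample` (an assumption, not the real data) to show how the class balance and the tenure relationship might be inspected.

```python
import pandas as pd

# Illustrative rows only; the real values come from the dataset
sample = pd.DataFrame({
    "tenure": [1, 2, 5, 24, 36, 48, 60, 72],
    "Churn": ["Yes", "Yes", "No", "Yes", "No", "No", "No", "No"],
})

# Share of each class; a large gap between the two signals imbalance
class_share = sample["Churn"].value_counts(normalize=True)

# Encode the target as 0/1 so it can be correlated with tenure
churn_flag = (sample["Churn"] == "Yes").astype(int)
tenure_corr = sample["tenure"].corr(churn_flag)
```

A negative `tenure_corr` would be consistent with the expectation that longer-tenured customers churn less, although a correlation on its own does not establish why.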

---

### Data Quality Checks
- **Missing Values**: Check for blanks or improperly encoded nulls (e.g., in `TotalCharges`)
- **Duplicates**: Ensure no duplicate `CustomerID` values
- **Inconsistent Categories**: Standardize categorical labels (e.g., "No internet service" vs "No")
- **Data Types**: Ensure proper data typing, especially for numerical columns stored as strings
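
A minimal sketch of these checks, assuming the column names from the feature table above. The frame `raw` is fabricated for the example, with a blank `TotalCharges` and a repeated `CustomerID` planted on purpose.

```python
import pandas as pd

# Fabricated rows that reproduce the problems listed above
raw = pd.DataFrame({
    "CustomerID": ["A1", "A2", "A3", "A3"],
    "tenure": [0, 12, 30, 30],
    "OnlineSecurity": ["No internet service", "Yes", "No", "No"],
    "TotalCharges": [" ", "350.5", "1200", "1200"],
})

# Data types and missing values: strings that cannot be parsed become NaN
raw["TotalCharges"] = pd.to_numeric(raw["TotalCharges"], errors="coerce")
missing_per_column = raw.isna().sum()

# Duplicates: count repeated IDs, then keep the first occurrence
duplicate_ids = int(raw["CustomerID"].duplicated().sum())
clean = raw.drop_duplicates(subset="CustomerID").copy()

# Inconsistent categories: fold "No internet service" into "No"
clean["OnlineSecurity"] = clean["OnlineSecurity"].replace(
    "No internet service", "No"
)
```

Folding "No internet service" into "No" is one possible choice. It simplifies encoding but discards the distinction between "declined the add-on" and "not eligible for it", so it may be worth keeping the original label if `InternetService` is dropped later.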

---

### Feature Engineering Opportunities
- Create `AverageMonthlySpend` = `TotalCharges` / `tenure`, guarding against division by zero when `tenure` is 0
- Bin `tenure` into categories (New, Mid-Term, Loyal)
- One-hot encode or label encode categorical variables for modeling
- Combine service-related columns to create a **service engagement score**
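
The ideas above could be sketched as follows. The bin edges (12 and 48 months), the column names `TenureGroup` and `ServiceEngagementScore`, and the choice of service columns are all assumptions made for illustration; the toy frame `df` is not real data.

```python
import numpy as np
import pandas as pd

# Toy frame using column names from the feature table
df = pd.DataFrame({
    "tenure": [0, 6, 30, 70],
    "TotalCharges": [0.0, 300.0, 1500.0, 7000.0],
    "Contract": ["Month-to-month", "Month-to-month", "One year", "Two year"],
    "OnlineSecurity": ["No", "Yes", "No", "Yes"],
    "TechSupport": ["No", "No", "Yes", "Yes"],
    "StreamingTV": ["No", "Yes", "Yes", "Yes"],
})

# Average monthly spend; zero tenure yields NaN instead of inf
df["AverageMonthlySpend"] = df["TotalCharges"] / df["tenure"].replace(0, np.nan)

# Tenure bins; the 12- and 48-month cut points are placeholders
df["TenureGroup"] = pd.cut(
    df["tenure"],
    bins=[-1, 12, 48, np.inf],
    labels=["New", "Mid-Term", "Loyal"],
)

# Service engagement score: number of add-on services marked "Yes"
service_cols = ["OnlineSecurity", "TechSupport", "StreamingTV"]
df["ServiceEngagementScore"] = (df[service_cols] == "Yes").sum(axis=1)

# One-hot encode a categorical column for modeling
encoded = pd.get_dummies(df, columns=["Contract"])
```

Leaving zero-tenure rows as NaN keeps the decision about how to impute them explicit, rather than hiding it inside the feature definition.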

---

### Early Hypotheses
- Customers with **month-to-month contracts** are more likely to churn.
- **Higher monthly charges** might be associated with dissatisfaction and churn.
- Customers **without internet services** may be less likely to stay, since fewer bundled services tie them to the company.
- Absence of **tech support or device protection** could correlate with higher churn.
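
Each of these hypotheses can be examined by comparing churn rates across the groups in question. The sketch below does this for `Contract` on a made-up frame; the numbers are illustrative and say nothing about the actual data.

```python
import pandas as pd

# Hypothetical rows for illustration only
sample = pd.DataFrame({
    "Contract": [
        "Month-to-month", "Month-to-month", "Month-to-month",
        "One year", "One year", "Two year",
    ],
    "Churn": ["Yes", "Yes", "No", "No", "Yes", "No"],
})

# Churn rate per contract type, highest first
churn_rate = (
    sample.assign(churned=sample["Churn"].eq("Yes"))
    .groupby("Contract")["churned"]
    .mean()
    .sort_values(ascending=False)
)
```

The same pattern applies to `TechSupport`, `DeviceProtection`, or a binned version of `MonthlyCharges` by swapping the grouping column.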

---

### Tools for Exploration
- **ydata-profiling** (formerly Pandas Profiling) or **Sweetviz** for EDA automation
- Seaborn/Matplotlib for visual distribution analysis
- Correlation matrix to explore relationships between numerical variables
- Chi-square tests for categorical features and ANOVA for numerical features, to assess their relationship with churn
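
As a sketch of what the chi-square test measures, the statistic can be computed by hand from a contingency table. The frame `sample` is fabricated for the example. In practice `scipy.stats.chi2_contingency` would typically be used instead, since it also returns a p-value; note that it applies Yates' continuity correction to 2x2 tables by default, so its statistic can differ from the uncorrected one below.

```python
import numpy as np
import pandas as pd

# Made-up data: 6 customers without tech support, 6 with
sample = pd.DataFrame({
    "TechSupport": ["No"] * 6 + ["Yes"] * 6,
    "Churn": ["Yes", "Yes", "Yes", "Yes", "No", "No",
              "Yes", "No", "No", "No", "No", "No"],
})

# Observed counts for each (TechSupport, Churn) combination
observed = pd.crosstab(sample["TechSupport"], sample["Churn"])
obs = observed.to_numpy(dtype=float)

# Expected counts if the two variables were independent
row_totals = obs.sum(axis=1, keepdims=True)
col_totals = obs.sum(axis=0, keepdims=True)
expected = row_totals @ col_totals / obs.sum()

# Pearson chi-square statistic (no continuity correction)
chi2_stat = ((obs - expected) ** 2 / expected).sum()
dof = (obs.shape[0] - 1) * (obs.shape[1] - 1)
```

A larger statistic relative to the degrees of freedom suggests the feature and churn are not independent, though with very small expected counts the approximation becomes unreliable.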

---


## Data Preparation
We start by importing the libraries needed to load and explore the dataset.

In [None]:
# Data manipulation
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Suppress library warnings to keep the notebook output readable
import warnings
warnings.filterwarnings('ignore')
