# Telecom Churn Intelligence: A Data-Driven Retention and Risk Analysis

## Project Overview

This project analyzes customer churn data from a telecommunications provider to identify key drivers of attrition. The insights help reduce customer loss, improve retention strategies, and guide resource allocation toward high-value customers.

---

## Business Objectives

1. **Evaluate Churn Risk by Demographics and Services**  
   Identify customer groups (e.g., by area code, service plan) most prone to churn.

2. **Analyze Contributing Factors to Customer Churn**  
   Explore how account features, call activity, and service experience influence churn.

3. **Assess Geographical and Contractual Risk**  
   Compare churn rates across area codes and service types to highlight regional or plan-specific risks.

---

## Data Source

- **Dataset**: Customer churn dataset (CSV file)  
- **Records**: 3,333 customers  
- **Scope**: Includes demographics, service subscriptions, account details, and churn status  

---

## Data Cleaning & Preparation

- Mapped churn to a binary numeric variable (`churn_num`)  
- Checked and removed redundant columns  
- Encoded categorical variables like `International plan` and `Voice mail plan`  
- Conducted hypothesis testing on churn across **area codes**  
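
The preparation steps above could look roughly like the following sketch. The toy frame stands in for the real CSV, and the column names (`Churn`, `International plan`, `Voice mail plan`, `Area code`) and the `Yes`/`No` coding are assumptions based on the names used in this notebook, not a copy of the actual cleaning code.

```python
import pandas as pd

# Toy stand-in for the real CSV; names and values are illustrative assumptions.
df = pd.DataFrame({
    "International plan": ["No", "Yes", "No", "Yes"],
    "Voice mail plan": ["Yes", "No", "No", "Yes"],
    "Area code": [415, 408, 510, 415],
    "Churn": [False, True, False, True],
})

# Map the boolean churn flag to a binary numeric column.
df["churn_num"] = df["Churn"].astype(int)

# Encode the Yes/No plan columns as 1/0.
for col in ["International plan", "Voice mail plan"]:
    df[col] = df[col].map({"Yes": 1, "No": 0})

# Store area code as a string so it is treated as a category, not a quantity.
df["Area code"] = df["Area code"].astype(str)
```

If the raw file codes churn as text rather than booleans, the `astype(int)` step would need to be replaced with an explicit `map`.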

---

## Key Findings & Visualizations

### Objective 1: Churn Profiles
- Overall churn rate: **14.5%**  
- Customers with **International Plans** churned at **42%**, compared to ~11% without.  
- Churn was significantly higher among customers with **4+ customer service calls**.
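
Segment-level churn rates like these are typically the mean of the binary churn flag within each group. The helper below is a hypothetical sketch (`churn_rate_by` is not a function from the original analysis) and runs on made-up toy data, so its numbers are not the project's results.

```python
import pandas as pd

def churn_rate_by(df, column, target="churn_num"):
    """Return churn rate and customer count for each category of `column`."""
    return (
        df.groupby(column)[target]
          .agg(churn_rate="mean", customers="size")
          .sort_values("churn_rate", ascending=False)
    )

# Toy data for illustration only.
toy = pd.DataFrame({
    "International plan": ["Yes", "Yes", "No", "No", "No", "No"],
    "Customer service calls": [5, 1, 0, 2, 4, 1],
    "churn_num": [1, 0, 0, 0, 1, 0],
})

rates = churn_rate_by(toy, "International plan")

# The "4+ customer service calls" segment can be built as a boolean flag first.
toy["frequent_caller"] = toy["Customer service calls"] >= 4
caller_rates = churn_rate_by(toy, "frequent_caller")
```

Reporting the group size alongside the rate helps flag segments whose rate rests on very few customers.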

### Objective 2: Contributing Factors
- Statistical test results showed **area code is not a significant factor** (p > 0.05).  
- **Customer service calls** had the strongest correlation with churn.  
- High usage (day minutes) was also linked to higher churn, likely due to higher charges.
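
One way to rank numeric features by their association with churn is the Pearson correlation against the binary flag (equivalent to a point-biserial correlation). This is a sketch on invented toy values; the column names are assumptions and the output does not reproduce the findings above.

```python
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame({
    "Customer service calls": [0, 1, 1, 2, 4, 5, 6, 1],
    "Total day minutes": [120.0, 180.5, 150.2, 200.1, 260.3, 140.0, 300.7, 175.4],
    "churn_num": [0, 0, 0, 0, 1, 1, 1, 0],
})

# Correlate every numeric column with the churn flag, then rank by magnitude.
corr_with_churn = (
    toy.corr(numeric_only=True)["churn_num"]
       .drop("churn_num")
       .sort_values(key=lambda s: s.abs(), ascending=False)
)
```

Correlation only captures linear association, so a threshold effect such as "4+ calls" may be better examined with grouped churn rates.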

### Objective 3: Geographic & Contractual Risk
- Churn rates across different **area codes** were largely similar.  
- **Monthly plans** saw the highest churn compared to long-term contracts.  

---

## Tools & Technologies

- **Python 3.11**  
- **Pandas, NumPy** – Data processing  
- **Seaborn, Matplotlib** – Visualization  
- **Scikit-learn, Statsmodels** – Hypothesis testing and analysis  
- **Jupyter Notebook** – Interactive exploration  

---

## Strategic Recommendations

### 1. Focus Retention on At-Risk Segments
- Prioritize customers with **International Plans** and those making **frequent service calls**.  
- Offer proactive outreach, discounts, or loyalty rewards.

### 2. Improve Service Experience
- Optimize customer service handling to reduce repeat calls.  
- Monitor heavy users with high charges to prevent dissatisfaction-driven churn.

### 3. Promote Stable Contract Plans
- Encourage migration from monthly contracts to **long-term plans** with incentives.  

---

## Conclusion

The analysis shows that churn is driven more by **service experience** and **plan type** than by geography. International plan customers, heavy users, and frequent service callers are most at risk. Addressing these segments offers the best opportunity to reduce churn and increase retention.

---

## Contact

**Author**: Brian Kiprop Kibor  
**Email**: kipropbrian26@gmail.com  
**Location**: Nairobi, Kenya  


# 4: Exploratory Data Analysis & Insights

## 4.1: Univariate Analysis
- **Churn Distribution:** About 85% of customers did not churn and about 15% did (14.5% overall), indicating class imbalance.  
- **Customer Support Calls:** The majority of customers made zero support calls. Frequent calls are rare; the highest observed was 8 calls.  
- **International Plan:** Only about 10% of customers subscribed to an international plan.  
- **Numerical Features:** Charges correlate directly with minutes (day, evening, night, international). Most numeric variables show reasonable ranges and roughly normal distributions.  
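
A class-balance check like the one in the churn distribution bullet can be done with normalized value counts. The series below is toy data, and `churn_num` is assumed to be the binary churn flag.

```python
import pandas as pd

# Toy churn flags for illustration only (1 = churned, 0 = retained).
churn = pd.Series([0, 0, 0, 0, 0, 1], name="churn_num")

# Share of each class; the proportions sum to 1.
share = churn.value_counts(normalize=True)

# Ratio of majority to minority class as a quick imbalance indicator.
imbalance_ratio = share.max() / share.min()
```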

## 4.2: Bivariate Analysis
- **Churn vs International Plan:** Customers with an international plan exhibit higher churn rates, suggesting higher service expectations.  
- **Churn vs Customer Service Calls:** Churned customers made more support calls on average, indicating dissatisfaction or unresolved issues.  
- **Churn by Area Code:** Area code 415 has the most customers and the highest churn count. Area codes 408 and 510 have volumes similar to each other and lower churn counts. (Area code is stored as a string so that it is treated as categorical, not numeric.)  
- **Churn by State:** New Jersey has the highest number of churned customers, followed by Texas, Maryland, and Minnesota.  
- **Numerical Feature Correlations:** Strong correlations exist between minutes and corresponding charges (day, evening, night, international). Total minutes strongly correlate with total charges.

## 4.3: Multivariate Analysis
- **Total Calls by Churn:** Day, evening, and night calls show similar usage patterns for churned and non-churned customers. International calls are minimal.  
- **Minutes & Customer Service Calls vs Churn:** Higher spread in customer service calls among churned customers; day and international minutes show some separation.  
- **Total Minutes vs Churn by Voicemail Plan:** Churned customers without voicemail show higher spread and outliers, suggesting voicemail may improve satisfaction.  
- **Total Charge vs Churn by International Plan:** Churned customers tend to have higher charges; lack of international plan combined with high charges may increase churn likelihood.
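
A comparison of total charge by churn status and international plan can be summarized with a pivot table. The sketch below assumes total charge is the sum of four per-period charge columns; those column names and all values are illustrative assumptions, not the notebook's data.

```python
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame({
    "Total day charge": [30.0, 45.0, 25.0, 50.0],
    "Total eve charge": [15.0, 18.0, 12.0, 20.0],
    "Total night charge": [9.0, 10.0, 8.0, 11.0],
    "Total intl charge": [2.0, 3.0, 2.5, 4.0],
    "International plan": ["No", "No", "Yes", "Yes"],
    "churn_num": [0, 1, 0, 1],
})

# Assumed definition: total charge is the sum of the four charge columns.
charge_cols = [
    "Total day charge", "Total eve charge",
    "Total night charge", "Total intl charge",
]
toy["Total charge"] = toy[charge_cols].sum(axis=1)

# Median total charge for each (plan, churn) combination.
summary = toy.pivot_table(
    index="International plan",
    columns="churn_num",
    values="Total charge",
    aggfunc="median",
)
```

The median is used here because the bullets above mention outliers; the mean is an equally valid choice if the distributions are well behaved.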

# 5: Inferential Analysis
- **Chi-Square Test (Voicemail Plan & Churn):** A significant association exists; churn rates differ between customers with and without a voicemail plan.  
- **Independent Samples T-test (Charges & Churn):** Churned customers have significantly higher mean total charges than non-churned customers.  
- **One-Way ANOVA (Churn & Area Codes):** No significant difference in mean churn rate across area codes; area code alone does not predict churn.
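
The three tests above can be sketched with SciPy as follows. The data are randomly generated, so the p-values say nothing about the real dataset; the column names are assumptions, and the Welch variant of the t-test (`equal_var=False`) is a choice made here, not necessarily the one used in the analysis.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Random toy data for illustration only.
rng = np.random.default_rng(0)
n = 300
toy = pd.DataFrame({
    "Voice mail plan": rng.choice(["Yes", "No"], size=n),
    "Area code": rng.choice(["408", "415", "510"], size=n),
    "Total charge": rng.normal(60, 10, size=n),
    "churn_num": rng.integers(0, 2, size=n),
})

# Chi-square test of independence: voicemail plan vs churn.
table = pd.crosstab(toy["Voice mail plan"], toy["churn_num"])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

# Independent-samples t-test: total charge for churned vs retained customers.
churned = toy.loc[toy["churn_num"] == 1, "Total charge"]
retained = toy.loc[toy["churn_num"] == 0, "Total charge"]
t_stat, p_t = stats.ttest_ind(churned, retained, equal_var=False)

# One-way ANOVA: mean churn flag across area codes.
groups = [g["churn_num"].to_numpy() for _, g in toy.groupby("Area code")]
f_stat, p_anova = stats.f_oneway(*groups)
```

Because churn is binary, a chi-square test on the area code by churn contingency table is a common alternative to ANOVA for the third comparison.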
