# Telco Customer Churn - Exploratory Data Analysis

## Objective
Perform exploratory data analysis (EDA) on the Telco Customer Churn dataset to identify the patterns and relationships that will guide our modeling approach.

## Key Questions
1. What is the overall churn rate?
2. Which features are most correlated with churn?
3. Are there clear customer segments with different churn patterns?
4. What is the relationship between tenure, charges, and churn?
5. How do service subscriptions affect churn probability?


In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')

# Set style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 12

print('Libraries imported successfully!')


In [None]:
# Load processed data
df = pd.read_csv('../data/processed/processed_telco_data.csv')
print(f'Dataset shape: {df.shape}')
print(f'Memory usage: {df.memory_usage(deep=True).sum() / 1024**2:.2f} MB')


## 1. Dataset Overview


In [None]:
# Display basic information
print('Dataset Info:')
print('=' * 50)
df.info()


In [None]:
# Display first few rows
df.head()


In [None]:
# Statistical summary
df.describe()


## 2. Target Variable Analysis


In [None]:
# Churn distribution
churn_counts = df['Churn'].value_counts()
churn_percentage = df['Churn'].value_counts(normalize=True) * 100

fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=('Churn Count', 'Churn Percentage'),
    specs=[[{'type': 'bar'}, {'type': 'pie'}]]
)

# Bar chart
fig.add_trace(
    go.Bar(x=churn_counts.index, y=churn_counts.values,
           marker_color=['#2ecc71', '#e74c3c']),
    row=1, col=1
)

# Pie chart
fig.add_trace(
    go.Pie(labels=churn_counts.index, values=churn_counts.values,
           marker_colors=['#2ecc71', '#e74c3c']),
    row=1, col=2
)

fig.update_layout(height=400, title_text='Customer Churn Distribution')
fig.show()

print(f"Churn Rate: {churn_percentage['Yes']:.2f}%")
print(f"Retention Rate: {churn_percentage['No']:.2f}%")
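
Indexing `churn_percentage['Yes']` raises a `KeyError` whenever a filtered subset happens to contain no churners. A minimal sketch of a more defensive calculation, assuming the `Yes`/`No` labels used above; the helper name `churn_rate` and the toy data are illustrative and not part of the original notebook:

```python
import pandas as pd

def churn_rate(labels: pd.Series, positive: str = 'Yes') -> float:
    """Return the share of rows equal to `positive`, as a percentage."""
    if labels.empty:
        return float('nan')
    # The mean of a boolean mask is the fraction of True values
    return float((labels == positive).mean() * 100)

toy = pd.Series(['No', 'Yes', 'No', 'No'])
toy_rate = churn_rate(toy)                            # one churner out of four
no_churn_rate = churn_rate(pd.Series(['No', 'No']))   # label absent, no KeyError
print(f'{toy_rate:.1f}% / {no_churn_rate:.1f}%')
```

The same helper could be applied to any segment, for example `churn_rate(df.loc[mask, 'Churn'])` for a boolean `mask`.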


## 3. Feature Correlations


In [None]:
# Convert categorical variables for correlation analysis
df_encoded = df.copy()
df_encoded['Churn_Binary'] = (df_encoded['Churn'] == 'Yes').astype(int)

# Select numeric columns
numeric_cols = df_encoded.select_dtypes(include=[np.number]).columns.tolist()

# Calculate correlation with churn, dropping the target's trivial self-correlation of 1.0
churn_correlations = (
    df_encoded[numeric_cols].corr()['Churn_Binary']
    .drop('Churn_Binary')
    .sort_values(ascending=False)
)

# Plot correlations (numeric features only)
plt.figure(figsize=(10, 8))
churn_correlations.plot(kind='barh')
plt.title('Feature Correlation with Churn')
plt.xlabel('Correlation Coefficient')
plt.tight_layout()
plt.show()
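
The correlation above only covers numeric columns, so categorical features such as `Contract` are left out. One possible way to include them is to one-hot encode each category and correlate the indicator columns with the binary target. The sketch below uses a made-up toy frame rather than the processed file, and the variable names are illustrative:

```python
import pandas as pd

# Toy stand-in for the processed data; values are made up for illustration
toy = pd.DataFrame({
    'Contract': ['Month-to-month', 'Month-to-month', 'Two year',
                 'Two year', 'Month-to-month', 'One year'],
    'Churn': ['Yes', 'Yes', 'No', 'No', 'No', 'No'],
})

target = (toy['Churn'] == 'Yes').astype(int)

# One indicator column per category, e.g. 'Contract_Two year'
dummies = pd.get_dummies(toy[['Contract']], dtype=int)

# Pearson correlation of each indicator with the binary target
categorical_corr = dummies.corrwith(target).sort_values(ascending=False)
print(categorical_corr)
```

With a binary indicator and a binary target, Pearson correlation reduces to the phi coefficient, so the values remain comparable with the numeric correlations in the chart.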


## 4. Customer Segmentation Analysis


In [None]:
# Churn by Contract Type
contract_churn = pd.crosstab(df['Contract'], df['Churn'], normalize='index') * 100

# Keep contract types on the x-axis so bars are colored by churn status
# and the 'No'/'Yes' color map actually applies
fig = px.bar(contract_churn,
             barmode='group',
             title='Churn Rate by Contract Type',
             labels={'value': 'Percentage', 'Contract': 'Contract Type'},
             color_discrete_map={'No': '#2ecc71', 'Yes': '#e74c3c'})
fig.show()

print("Churn Rate by Contract Type:")
print(contract_churn['Yes'].sort_values(ascending=False))


In [None]:
# Churn by Tenure Group
tenure_churn = pd.crosstab(df['tenure_group'], df['Churn'], normalize='index') * 100

fig = px.bar(tenure_churn['Yes'].sort_index(), 
             title='Churn Rate by Tenure Group',
             labels={'value': 'Churn Rate (%)', 'tenure_group': 'Tenure Group (months)'},
             color_discrete_sequence=['#3498db'])
fig.show()
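
The `tenure_group` column comes from the preprocessing step, which this notebook does not show. A minimal sketch of how such a column might be built with `pd.cut`; the bin edges and labels are assumptions and may not match the processed file:

```python
import pandas as pd

# Assumed bin edges; the actual grouping in the processed file may differ
bins = [0, 12, 24, 48, 72]
labels = ['0-12', '13-24', '25-48', '49-72']

tenure = pd.Series([0, 5, 12, 13, 30, 72], name='tenure')
tenure_group = pd.cut(tenure, bins=bins, labels=labels, include_lowest=True)

# pd.cut returns an ordered categorical, so sorting follows bin order, not text order
print(tenure_group.tolist())
print(tenure_group.cat.categories.tolist())
```

An ordered categorical keeps the shortest-tenure group first in `pd.crosstab` output, which matters for any code that selects the first row by position.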


## 5. Service Usage Patterns


In [None]:
# Churn by number of services
service_churn = df.groupby('total_services')['Churn'].value_counts(normalize=True).unstack() * 100

fig = px.bar(service_churn['Yes'], 
             title='Churn Rate by Number of Services',
             labels={'value': 'Churn Rate (%)', 'total_services': 'Number of Services'},
             color_discrete_sequence=['#e74c3c'])
fig.show()

# Print the rates so the pattern can be checked against the chart
print("Churn Rate (%) by Number of Services:")
print(service_churn['Yes'].round(2))
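
`total_services` is also produced during preprocessing. A sketch of one way it could be derived, assuming the service columns follow the raw Telco schema with `Yes`/`No` style values; the column list here is a shortened, hypothetical selection:

```python
import pandas as pd

# Assumed service columns, following the raw Telco schema
service_cols = ['PhoneService', 'OnlineSecurity', 'TechSupport', 'StreamingTV']

toy = pd.DataFrame({
    'PhoneService':   ['Yes', 'Yes', 'Yes'],
    'OnlineSecurity': ['No', 'Yes', 'No internet service'],
    'TechSupport':    ['No', 'Yes', 'No internet service'],
    'StreamingTV':    ['Yes', 'Yes', 'No internet service'],
})

# Count the columns whose value is exactly 'Yes' in each row
toy['total_services'] = (toy[service_cols] == 'Yes').sum(axis=1)
print(toy['total_services'].tolist())
```

Comparing against the exact string `'Yes'` treats values such as `'No internet service'` as not subscribed, which is one reasonable reading but not the only one.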


## 6. Financial Analysis


In [None]:
# Monthly charges distribution by churn
fig = px.box(df, x='Churn', y='MonthlyCharges', 
             title='Monthly Charges Distribution by Churn Status',
             color='Churn',
             color_discrete_map={'No': '#2ecc71', 'Yes': '#e74c3c'})
fig.show()

# Statistics
print("Average Monthly Charges:")
print(df.groupby('Churn')['MonthlyCharges'].mean())


In [None]:
# Customer Lifetime Value Analysis
# Simple CLV estimate: revenue billed to date plus 12 more months at the current monthly rate
df['estimated_clv'] = df['TotalCharges'] + (df['MonthlyCharges'] * 12)

churned_clv = df[df['Churn'] == 'Yes']['estimated_clv'].sum()
total_clv = df['estimated_clv'].sum()

print(f"Estimated CLV at risk from churn: ${churned_clv:,.2f}")
print(f"Percentage of total CLV at risk: {(churned_clv/total_clv)*100:.2f}%")
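
To make the CLV arithmetic concrete, here is a small worked example on made-up customers. It applies the same formula as the cell above (charges to date plus twelve months at the current monthly rate); the numbers are illustrative only:

```python
import pandas as pd

# Made-up customers for illustration only
toy = pd.DataFrame({
    'Churn': ['Yes', 'No', 'No'],
    'TotalCharges': [600.0, 2400.0, 600.0],
    'MonthlyCharges': [50.0, 100.0, 50.0],
})

# Billed revenue to date plus a 12-month projection at the current rate
toy['estimated_clv'] = toy['TotalCharges'] + toy['MonthlyCharges'] * 12

churned = toy.loc[toy['Churn'] == 'Yes', 'estimated_clv'].sum()
share_at_risk = churned / toy['estimated_clv'].sum() * 100
print(f'{churned:,.2f} at risk ({share_at_risk:.1f}% of total)')
```

The first customer contributes 600 + 50 × 12 = 1,200 of a 6,000 total, so 20% of the estimated CLV sits with churned customers in this toy set.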


## 7. Key Insights & Recommendations

### Key Findings:
1. **High-Risk Segments:**
   - Month-to-month contracts have roughly 3x the churn rate of longer-term contracts
   - New customers (<12 months tenure) are most likely to churn
   - Electronic check payment users have elevated churn risk

2. **Service Adoption Impact:**
   - Customers with fewer services churn more
   - Online security and tech support reduce churn probability
   
3. **Financial Impact:**
   - Churners have higher monthly charges on average
   - ~26% of customers churn, which represents a significant revenue risk
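
A multiplier such as "3x" can be checked directly by dividing the churn rate of one segment by the rate of the rest. The sketch below shows the calculation on made-up rows, so the resulting ratio is not the dataset's actual figure:

```python
import pandas as pd

# Made-up rows for illustration; real rates come from the full dataset
toy = pd.DataFrame({
    'Contract': ['Month-to-month'] * 4 + ['One year'] * 4 + ['Two year'] * 2,
    'Churn': ['Yes', 'Yes', 'Yes', 'No',   # 3 of 4 churn
              'Yes', 'No', 'No', 'No',     # 1 of 4 churn
              'No', 'No'],                 # 0 of 2 churn
})

is_churn = toy['Churn'] == 'Yes'
is_m2m = toy['Contract'] == 'Month-to-month'

m2m_rate = is_churn[is_m2m].mean()      # churn rate for month-to-month
other_rate = is_churn[~is_m2m].mean()   # churn rate for all other contracts
ratio = m2m_rate / other_rate
print(f'Month-to-month churn is {ratio:.1f}x the rate of other contracts')
```

Running the same comparison on `df` would show which baseline (one-year, two-year, or both combined) a stated multiplier refers to.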

### Business Recommendations:
1. **Retention Focus:** Target month-to-month customers for contract upgrades
2. **Early Intervention:** Implement onboarding programs for new customers
3. **Service Bundling:** Promote multiservice packages to increase stickiness
4. **Payment Optimization:** Incentivize automatic payment methods


In [None]:
# Save insights for reporting
import json

insights = {
    'churn_rate': df['Churn'].value_counts(normalize=True)['Yes'] * 100,
    'high_risk_segments': {
        'month_to_month_churn': contract_churn.loc['Month-to-month', 'Yes'],
        # Assumes the first row of tenure_churn is the shortest-tenure group
        'new_customer_churn': tenure_churn.iloc[0]['Yes'],
    },
    'financial_impact': {
        'clv_at_risk': churned_clv,
        'avg_monthly_charges_churners': df[df['Churn']=='Yes']['MonthlyCharges'].mean()
    }
}

# default=float converts NumPy scalars that json cannot serialize natively
with open('../data/processed/eda_insights.json', 'w') as f:
    json.dump(insights, f, indent=2, default=float)
    
print("✅ EDA Complete! Insights saved to data/processed/eda_insights.json")
