# **Exploratory Data Analysis of Telecom Customer Churn**



## 📝 **Problem Statement**

The telecom industry experiences **high customer churn**, making it crucial to understand why **customers leave**. This EDA aims to identify key **patterns, trends, and factors** influencing churn by analyzing customer **demographics, services, billing details, and account** information. The goal is to uncover **actionable insights** that help improve **customer retention** and support **future churn prediction** models.

## 🎯 **Objectives**



- Identify the overall **churn rate** and compare it with **active customers**.

- Analyze key features such as **demographics, services, billing, and account** details that influence customer churn.

- Uncover important **patterns, correlations, and trends** using EDA.

- Provide **actionable insights** to improve **customer retention** strategies.

## **Import Libraries**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

## **Load data**

In [None]:
data = pd.read_csv('telecom_churn_data.csv')

In [None]:
data

## **Data Description**

The dataset contains **customer information** from a **telecom company**, including **demographics, service subscriptions, account** details, billing information, and a **churn flag** indicating whether a **customer has left the service**. It helps analyze **customer behavior** and identify key factors contributing to **customer churn**.

**customerID** – Unique identifier assigned to each customer.

**gender** – Customer's gender (Male/Female).

**SeniorCitizen** – Indicates if the customer is a senior citizen (0 = No, 1 = Yes).

**Partner** – Whether the customer has a partner (Yes/No).

**Dependents** – Whether the customer has dependents (Yes/No).

**tenure** – Number of months the customer has stayed with the company.

**PhoneService** – Indicates if the customer has phone service (Yes/No).

**MultipleLines** – Availability of multiple phone lines.

**InternetService** – Type of internet service (DSL/Fiber optic/No).

**OnlineSecurity** – Online security service status.

**OnlineBackup** – Online backup service status.

**DeviceProtection** – Device protection plan status.

**TechSupport** – Technical support service status.

**StreamingTV** – Access to streaming TV service.

**StreamingMovies** – Access to streaming movie service.

**Contract** – Customer's contract type (Month-to-month/One year/Two year).

**PaperlessBilling** – Indicates if customer uses paperless billing (Yes/No).

**PaymentMethod** – Customer's mode of payment.

**MonthlyCharges** – Monthly charges billed to the customer.

**TotalCharges** – Total charges incurred by the customer.

**Churn** – Indicates whether the customer has discontinued the service (Yes/No).

## **Basic checks**

In [None]:
# shape
data.shape

In [None]:
# head()
data.head()

In [None]:
# tail()
data.tail()

In [None]:
# columns
data.columns

In [None]:
# dtypes
data.dtypes

In [None]:
# info
data.info()

In [None]:
# num_cols
num_col = data.select_dtypes(include=['float64','int64'])
num_col

In [None]:
# cat_cols
cat_col = data.select_dtypes(include=['object'])
cat_col.columns

In [None]:
# describe
data.describe()

In [None]:
# describe (categorical columns)
data.describe(include='object')

In [None]:
# unique
lst =['gender', 'Partner', 'Dependents', 'PhoneService',
       'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup',
       'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies',
       'Contract', 'PaperlessBilling', 'PaymentMethod', 'TotalCharges',
       'Churn']
for x in lst:
  print(x)
  print(data[x].unique())
  print()

In [None]:
# replace blank strings (a single space) with NaN so they are counted as missing
data.replace({" ":np.nan},inplace=True)

**Insights:** TotalCharges is read as text (object) instead of a number, so it needs to be converted to float64 before it can be analyzed.

In [None]:
data.loc[:,'Total_Charges'] = data.loc[:,'TotalCharges'].astype('float64')

In [None]:
data.drop('TotalCharges',axis=1,inplace=True)

In [None]:
data.loc[:,'Total_Charges'].dtypes
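
The cast above only works because the blank strings were replaced with NaN first. A minimal sketch of a more defensive variant uses `pd.to_numeric` with `errors='coerce'`, which turns any non-numeric text into NaN in one step. The `sample` frame and its values are made up for illustration and stand in for the real dataset.

```python
import pandas as pd

# Hypothetical sample standing in for the real data: one blank TotalCharges value.
sample = pd.DataFrame({"TotalCharges": ["29.85", " ", "1889.5"]})

# errors='coerce' converts anything that cannot be parsed as a number to NaN.
sample["Total_Charges"] = pd.to_numeric(sample["TotalCharges"], errors="coerce")

# Count how many values became missing as a result of the conversion.
n_missing = int(sample["Total_Charges"].isna().sum())
print(sample["Total_Charges"].dtype, n_missing)
```

Checking `n_missing` right after the conversion shows how many rows will need imputation later.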

In [None]:
# check duplicates
data.duplicated().sum()

In [None]:
# check missing values
data.isnull().sum()

In [None]:
# check distribution of Total charges
sns.histplot(data['Total_Charges'], bins=30, kde=True)
plt.show()

In [None]:
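# Since the distribution is skewed, fill missing values with the median.
# Assign the result back: an in-place fillna on a .loc selection may not update `data`.
data['Total_Charges'] = data['Total_Charges'].fillna(data['Total_Charges'].median())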

**Note:** The mean is an alternative fill value, but for a skewed column such as Total_Charges the median is the safer choice because a few very large values pull the mean away from the typical customer.
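
To see why the choice of fill value matters on skewed data, here is a small sketch on made-up numbers (not taken from the dataset). The `charges` series and its values are assumptions chosen only to exaggerate the skew.

```python
import pandas as pd

# Made-up, right-skewed values with one missing entry (illustration only).
charges = pd.Series([20.0, 25.0, 30.0, 35.0, 4000.0, None])

mean_value = charges.mean()      # pulled upward by the single large value
median_value = charges.median()  # stays close to the bulk of the data

# Fill the gap with the median, assigning the result to a new series.
filled = charges.fillna(median_value)
print(mean_value, median_value)
```

Here the mean is far above four of the five observed values, while the median sits among them.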

In [None]:
data.isnull().sum()

## **Exploratory data analysis**

**What is the distribution of Churn (Yes/No)?**

In [None]:
plt.figure(figsize=(4,3))
sns.countplot(data, x='Churn',color='red')
plt.title("Churn Distribution")
plt.show()


**Insights**


**Customer Retention:** A large proportion of customers continue using the service, indicating relatively strong retention.

**Churn Rate:** A significant portion of customers have churned, which suggests areas for improvement in customer satisfaction or service quality.

**Opportunity for Improvement:** Reducing the churn rate could lead to better customer loyalty and more stable revenue streams.

**Actionable Insight:** The company could benefit from targeted efforts to understand and address the reasons behind the churn.
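
The bar chart shows counts; the split is easier to quote as percentages. A minimal sketch, using a made-up `churn` series in place of `data['Churn']`:

```python
import pandas as pd

# Made-up labels standing in for data['Churn'] (illustration only).
churn = pd.Series(["No", "No", "No", "Yes"], name="Churn")

# normalize=True returns shares instead of raw counts; scale to percent.
churn_share = churn.value_counts(normalize=True).mul(100).round(1)
print(churn_share)
```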

**How many customers fall under each Contract type?**

In [None]:
plt.figure(figsize=(5,2))
sns.countplot(data, x='Contract',color='green')
plt.title("Contract Type Distribution")
plt.show()


**Insights:**

**Most customers are on Month-to-Month contracts** – This is the largest category, with over 3,000 customers.

**Fewer customers have One-Year and Two-Year contracts** – Each of these contract types accounts for a smaller share of customers.

**Opportunity:** The company might focus on converting Month-to-Month customers into longer-term contracts to improve retention and reduce churn.

**What is the distribution of Internet Service types?**

In [None]:
plt.figure(figsize=(5,2))
sns.countplot(data, x='InternetService',color='orange')
plt.title("Internet Service Distribution")
plt.show()

**Insights:**

**Most customers use Fiber Optic** – The highest number of customers have chosen Fiber Optic internet.

**DSL is also popular** – DSL is the second most common service type among customers.

**Few customers do not have internet** – A relatively small number of customers have no internet service.

**How is the tenure distributed among customers?**

In [None]:
plt.figure(figsize=(4,3))
sns.histplot(data['tenure'], bins=30, kde=True,color='cyan')
plt.title("Tenure Distribution")
plt.show()


**Insights:**

**Many customers are new** – There is a sharp peak in the 0–5 month range. The histogram shows how long customers have been with the company, not who left, so on its own it cannot confirm early churn.

**More even counts among longer-tenure customers** – After the initial peak, customer counts are more consistent across the 20–60 month range.

**Potential focus on early retention strategies** – Because so many customers are in their first months, retention programs aimed at new customers are worth considering; the churn-by-tenure breakdown later in the notebook tests this directly.

**What is the distribution of Monthly Charges?**

In [None]:
plt.figure(figsize=(4,3))
sns.histplot(data['MonthlyCharges'], bins=30, kde=True, color="orange")
plt.title("Monthly Charges Distribution")
plt.show()


**Insights:**

**A large group of customers is charged around $20 per month** – There is a pronounced peak near the $20 mark, indicating that many customers use low-cost plans.

**The distribution is slightly right-skewed** – Although many customers are in the lower monthly charge range, there are customers with higher charges as well.

**Opportunities for segmentation** – The company could consider segmenting customers by pricing tiers for tailored retention strategies.

**What is the distribution of Total Charges?**

In [None]:
plt.figure(figsize=(4,3))
sns.histplot(data['Total_Charges'], bins=30, kde=True, color="green")
plt.title("Total Charges Distribution")
plt.show()


**Insights**

**High concentration of low total charges** – Most customers have low total charges, as indicated by the peak near 0 to 1,000.

**Right-skewed distribution** – The total charges distribution is skewed right, meaning a few customers have significantly higher charges, but most customers fall into lower total charges.

**Potential for tiered pricing strategies** – The company could consider focusing on higher charge tiers for upselling or retention offers to high-value customers.

**How does churn vary across different contract types?**

In [None]:
plt.figure(figsize=(4,3))
sns.countplot(data, x='Contract', hue='Churn',palette='viridis')
plt.title("Churn vs Contract Type")
plt.show()


**Insights:**

**High churn for Month-to-Month contracts** – Customers with Month-to-Month contracts have the highest churn rate, indicating that customers with flexible, short-term contracts are more likely to leave.

**Lower churn for One-Year and Two-Year contracts** – Customers with One-Year and Two-Year contracts show significantly lower churn, suggesting that longer commitment contracts help in retention.

**Opportunity for retention strategies** – Focusing on converting Month-to-Month customers to longer-term contracts could reduce churn.
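
A count plot compares raw counts, so a large category can look worse simply because it has more customers. A hedged sketch of computing the churn rate within each contract type uses `pd.crosstab` with `normalize='index'`. The `toy` frame is made up for illustration; in the notebook the same call would take `data['Contract']` and `data['Churn']`.

```python
import pandas as pd

# Made-up rows for illustration; not taken from the real dataset.
toy = pd.DataFrame({
    "Contract": ["Month-to-month"] * 4 + ["Two year"] * 4,
    "Churn":    ["Yes", "Yes", "No", "No", "No", "No", "No", "Yes"],
})

# normalize='index' makes each row sum to 1, giving a churn rate per contract type.
churn_rate = pd.crosstab(toy["Contract"], toy["Churn"], normalize="index")
print(churn_rate)
```

The same pattern applies to the other categorical comparisons in this notebook, such as InternetService, PaymentMethod, Dependents, and TechSupport.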

**How does churn differ among Internet Service types?**

In [None]:
plt.figure(figsize=(4,3))
sns.countplot(data, x='InternetService', hue='Churn',palette='rainbow')
plt.title("Churn vs Internet Service")
plt.show()


**Insights:**

**Higher churn for Fiber Optic users** – Customers using Fiber Optic internet have the highest churn rate, suggesting potential dissatisfaction with this service.

**DSL users show lower churn** – Customers using DSL internet service exhibit lower churn, possibly due to greater satisfaction or lower expectations.

**Opportunity to improve Fiber Optic service** – The company could explore customer feedback and issues related to Fiber Optic service to reduce churn in this segment.

**How does churn vary across different payment methods?**

In [None]:
plt.figure(figsize=(10,3))
sns.countplot(data, x='PaymentMethod', hue='Churn',palette='rainbow')
plt.title("Churn vs Payment Method")
plt.xticks(rotation=45)
plt.show()


**Insights:**

**Highest churn with Electronic Check** – Customers who use Electronic Check as a payment method have the highest churn rate.

**Lower churn with Automatic Bank Transfers and Credit Cards** – Customers using Bank Transfer (automatic) or Credit Card (automatic) methods show significantly lower churn.

**Opportunity for retention** – The company could focus on Electronic Check users by offering incentives or improving the payment process to reduce churn.

**Do Monthly Charges differ between churned and non-churned customers?**

In [None]:
plt.figure(figsize=(4,3))
sns.boxplot(data, x='Churn', y='MonthlyCharges',palette='viridis')
plt.title("Monthly Charges vs Churn")
plt.show()


**Insights:**

**Higher Monthly Charges for Churned Customers** – Customers who have churned tend to have higher monthly charges compared to those who stayed.

**Wider range of charges for churned customers** – The box plot shows a larger spread for churned customers, indicating variability in monthly charges for those who left.

**Opportunity for targeted retention strategies** – The company could explore offering lower-cost plans or incentives to high-paying customers to reduce churn.
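
The box plot comparison can be backed by numbers. A small sketch, on made-up values rather than the real data, that summarizes MonthlyCharges for each churn group:

```python
import pandas as pd

# Made-up charges for illustration only.
toy = pd.DataFrame({
    "Churn": ["No", "No", "No", "Yes", "Yes", "Yes"],
    "MonthlyCharges": [20.0, 50.0, 80.0, 70.0, 80.0, 90.0],
})

# One row per churn group with the median and the observed range.
summary = toy.groupby("Churn")["MonthlyCharges"].agg(["median", "min", "max"])
print(summary)
```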

**Are customers with dependents less likely to churn?**

In [None]:
plt.figure(figsize=(4,3))
sns.countplot(data, x='Dependents', hue='Churn',palette='viridis')
plt.title("Churn vs Dependents")
plt.show()


**Insights:**

**Higher churn among customers without dependents** – Customers without dependents show higher churn compared to those with dependents.

**Lower churn for customers with dependents** – Customers with dependents tend to have a lower churn rate, possibly due to family-related commitments.

**Opportunity for targeted retention** – The company could focus more on retaining non-dependent customers with personalized offers or services.

**Are customers without tech support more likely to churn?**

In [None]:
plt.figure(figsize=(4,3))
sns.countplot(data, x='TechSupport', hue='Churn',palette='viridis')
plt.title("Churn vs Tech Support")
plt.show()


**Insights:**

**Higher churn for customers without Tech Support** – Customers who do not have Tech Support are more likely to churn.

**Lower churn for customers with Tech Support** – Customers with Tech Support tend to stay longer, showing a lower churn rate.

**Opportunity for improving Tech Support** – Offering or improving Tech Support could be an effective strategy to reduce churn, especially among customers who currently do not have it.


**How do Internet Service, Monthly Charges, and Churn interact?**

In [None]:
plt.figure(figsize=(5,3))
sns.boxplot(data, x='InternetService', y='MonthlyCharges', hue='Churn',palette='viridis')
plt.title("Internet Service vs Monthly Charges vs Churn")
plt.show()


**Insights:**

**Fiber Optic customers with higher monthly charges have higher churn** – Customers with Fiber Optic service and higher monthly charges tend to churn more, indicating a potential dissatisfaction with the service or pricing.

**DSL users show lower churn** – DSL customers tend to have lower monthly charges and exhibit a lower churn rate, suggesting satisfaction with the service.

**No internet service customers** – Customers without internet service have minimal churn, but their monthly charges are very low.



**What percentage of customers have churned vs stayed?**

In [None]:
# Pie Chart – Churn
plt.figure(figsize=(3,3))
data['Churn'].value_counts().plot.pie(autopct='%1.1f%%', colors=['skyblue','salmon'], startangle=90)
plt.title("Churn Distribution – Pie Chart")
plt.ylabel("")
plt.show()


**Insights:**

**73.5% of customers have not churned** – A majority of customers are staying with the service.

**26.5% of customers have churned** – Roughly one in four customers has left the service.

**Focus on retention** – The company should work on reducing the 26.5% churn rate to improve customer loyalty and revenue stability.

**What is the proportion of customers subscribed to each Internet service type (DSL, Fiber Optic, or No Internet service)?**

In [None]:
plt.figure(figsize=(3,3))
data['InternetService'].value_counts().plot.pie(
    autopct='%1.1f%%',
    startangle=90,
    colors=['gold','lightgreen','lightcoral']
)
plt.title("Internet Service Distribution – Pie Chart")
plt.ylabel("")
plt.show()


**Insights:**

**Fiber Optic is the most popular service** – 44% of customers are using Fiber Optic internet.

**DSL is the second most popular service** – 34.4% of customers are using DSL internet.

**21.7% of customers have no internet service** – A smaller portion of customers have no internet service.

**Correlation Heatmap:**

Visualizes relationships between numerical features

In [None]:
plt.figure(figsize=(10, 8))
numeric_data = data.select_dtypes(include=[np.number])
sns.heatmap(numeric_data.corr(), annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Heatmap of Numerical Features')
plt.show()

**Insight:**

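- Expect a very strong correlation between tenure and Total_Charges. Total charges accumulate month by month, so this is unsurprising, but it is a useful sanity check on data quality.
- MonthlyCharges shows a positive correlation (0.65), most plausibly with Total_Charges. Churn is still a text column at this point and does not appear in the heatmap, so this figure alone does not show that higher bills drive customers to leave.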
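
The heatmap above covers only numeric columns, so the text column Churn is absent from it. A minimal sketch of bringing churn into the correlation encodes it with the same Yes/No to 1/0 mapping that this notebook applies later as `Churn_Num`. The `toy` values are made up for illustration.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "MonthlyCharges": [20.0, 30.0, 70.0, 90.0, 100.0, 25.0],
    "tenure": [60, 48, 5, 2, 1, 70],
    "Churn": ["No", "No", "Yes", "Yes", "Yes", "No"],
})

# Encode churn numerically so it can take part in the correlation matrix.
toy["Churn_Num"] = toy["Churn"].map({"Yes": 1, "No": 0})

# Keep only each feature's correlation with churn.
corr_with_churn = (
    toy[["MonthlyCharges", "tenure", "Churn_Num"]]
    .corr()["Churn_Num"]
    .drop("Churn_Num")
)
print(corr_with_churn)
```

A correlation with a 0/1 column is only a rough screen; it describes association, not the reason customers leave.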

**Demographics Analysis (Senior Citizens & Partners)**

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(15, 5))
sns.countplot(x='SeniorCitizen', hue='Churn', data=data, ax=axes[0], palette='pastel')
axes[0].set_title('Churn Distribution: Senior Citizens (1) vs Non-Seniors (0)')
sns.countplot(x='Partner', hue='Churn', data=data, ax=axes[1], palette='pastel')
axes[1].set_title('Churn Distribution: With Partner vs Without')
plt.show()

**Insight**

- Senior citizens churn at a higher rate than non-senior customers, which suggests a need for senior-specific retention plans.
- Partners: customers without a partner churn at roughly 32.9%, while those with a partner churn at roughly 19.6%. Single-person accounts are the higher-risk group.

**Service Count Analysis (Feature Engineering)**

Count how many value-added services a customer has

In [None]:
services = ['OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies']
data['ServiceCount'] = (data[services] == 'Yes').sum(axis=1)

plt.figure(figsize=(10, 6))
sns.countplot(x='ServiceCount', hue='Churn', data=data, palette='rocket')
plt.title('Number of Additional Services vs Churn')
plt.xlabel('Number of Services Subscribed')
plt.show()

**Insight**

Typically, customers with 3 or more additional services have significantly lower churn rates.
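
The count plot shows raw counts per service count, so the claim above is easier to check as a churn rate per group. A sketch under stated assumptions: the `toy` frame is made up, the service list is shortened to two columns, and the `'No internet service'` label is an assumed third category.

```python
import pandas as pd

# Shortened, hypothetical service list and made-up rows (illustration only).
services = ["OnlineSecurity", "TechSupport"]
toy = pd.DataFrame({
    "OnlineSecurity": ["Yes", "No", "No internet service", "Yes"],
    "TechSupport":    ["Yes", "No", "No internet service", "No"],
    "Churn":          ["No", "Yes", "No", "Yes"],
})

# Count the services each customer subscribes to (only 'Yes' counts).
toy["ServiceCount"] = (toy[services] == "Yes").sum(axis=1)

# The mean of a True/False churn flag per group is that group's churn rate.
rate_by_count = toy["Churn"].eq("Yes").groupby(toy["ServiceCount"]).mean()
print(rate_by_count)
```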

**Scatter Plot: MonthlyCharges vs Total_Charges**

In [None]:
plt.figure(figsize=(10, 6))
sns.scatterplot(x='MonthlyCharges', y='Total_Charges', hue='Churn', data=data, alpha=0.4)
plt.title('Monthly Charges vs Total Charges (Churn Clusters)')
plt.show()

In [None]:
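plt.figure(figsize=(8, 5))
# pointplot averages y, so encode Churn as 1 (Yes) / 0 (No); the mean is then the churn rate
sns.pointplot(x=data['PaperlessBilling'], y=data['Churn'].map({'Yes': 1, 'No': 0}), color='darkred')
plt.title('The Impact of Paperless Billing on Churn Probability')
plt.ylabel('Churn Probability')
plt.show()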

**Insight**

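Paperless-billing customers tend to show a higher churn rate than customers on paper bills in this kind of telecom data; confirm the direction against the plotted values. The relationship is an association, not evidence that the billing format itself causes churn.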

In [None]:
data

In [None]:
# 1. Convert Churn to a numeric column (1 for Yes, 0 for No)
data['Churn_Num'] = data['Churn'].map({'Yes': 1, 'No': 0})

# 2. Create Tenure Groups (prevents the heatmap from being 72 rows tall)
def get_tenure_group(t):
    if t <= 12: return '0-1 Year'
    elif t <= 24: return '1-2 Years'
    elif t <= 36: return '2-3 Years'
    elif t <= 48: return '3-4 Years'
    elif t <= 60: return '4-5 Years'
    else: return '5+ Years'

data['Tenure_Group'] = data['tenure'].apply(get_tenure_group)
order = ['0-1 Year', '1-2 Years', '2-3 Years', '3-4 Years', '4-5 Years', '5+ Years']

# 3. Create the Pivot Table using 'mean' to get the Churn Rate
tenure_pivot = data.pivot_table(index='Tenure_Group', columns='Contract', values='Churn_Num', aggfunc='mean')
tenure_pivot = tenure_pivot.reindex(order) # Ensure years are in correct order

# 4. Plot the Heatmap
plt.figure(figsize=(10, 6))
sns.heatmap(tenure_pivot, annot=True, cmap='YlOrRd', fmt='.2%')
plt.title('Churn Risk Matrix: Tenure vs Contract Type')
plt.xlabel('Contract Type')
plt.ylabel('Tenure Group')
plt.show()

**Insight:**

- This graph is the most critical for business strategy as it identifies exactly when and where customers leave.
- The "Danger Zone" (51.4% Churn): Customers in their first year (0-1 Year) on a Month-to-month contract have a churn rate of over 50%. This is the highest risk segment in the entire company.
- The Power of Commitment: Moving a customer from a Month-to-month to a One-year contract in their first year drops the churn risk from 51.4% to 10.5%.
- Zero-Churn Lock-in: Customers on Two-year contracts show 0.0% churn during their first two years. This suggests that long-term contracts are the most effective retention tool, likely due to exit fees or higher initial satisfaction.
- Loyalty Decay: Even for long-term customers (5+ Years), if they remain on a Month-to-month plan, they still churn at 22.2%, whereas their peers on Two-year contracts churn at only 3.1%.
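
The tenure bands behind this matrix come from a hand-written `if`/`elif` function. As an alternative sketch (not the author's method), `pd.cut` produces the same bands as an ordered categorical, so the rows stay in order without a separate `reindex`. The `tenure` values below are made up to hit each boundary.

```python
import pandas as pd

# Made-up tenure values chosen to sit on the band boundaries (illustration only).
tenure = pd.Series([0, 12, 13, 24, 36, 48, 60, 72])

# Right-closed bins: (-1, 12], (12, 24], ..., (60, inf), matching the <= checks.
bins = [-1, 12, 24, 36, 48, 60, float("inf")]
labels = ["0-1 Year", "1-2 Years", "2-3 Years", "3-4 Years", "4-5 Years", "5+ Years"]

tenure_group = pd.cut(tenure, bins=bins, labels=labels)
print(tenure_group.tolist())
```

Because the result is an ordered categorical, grouping or pivoting on it keeps the bands in chronological rather than alphabetical order.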