# Customer Churn Analysis using Python (EDA Project)

### Dataset Overview  
- **Rows:** 3738  
- **Columns:** 21  
- **Churn Distribution:** No: 1869, Yes: 1869 (balanced 50/50)  

This project analyzes telecom customer churn to understand which factors influence whether a customer leaves the service. The analysis includes data cleaning, exploratory data analysis (EDA), visualizations, insights, and a final conclusion.


## Project Objectives
- Understand customer demographics, subscription details, and service usage.
- Identify the strongest factors contributing to churn.
- Visualize relationships between features and churn.
- Provide insights for customer retention strategies.


## Data Dictionary
- **customerID**: Unique customer identifier
- **gender**: Customer gender (Male/Female)
- **SeniorCitizen**: Whether the customer is a senior citizen (Yes/No)
- **Partner**: Whether the customer has a partner
- **Dependents**: Whether the customer has dependents
- **tenure**: Number of months the customer has stayed with the company
- **PhoneService**: Has phone service (Yes/No)
- **MultipleLines**: Multiple phone lines (Yes/No/No phone service)
- **InternetService**: DSL/Fiber optic/No internet
- **OnlineSecurity**: Security service addon
- **OnlineBackup**: Backup addon
- **DeviceProtection**: Device protection addon
- **TechSupport**: Technical support addon
- **StreamingTV**: Streaming TV service
- **StreamingMovies**: Streaming Movies service
- **Contract**: Contract type (Month-to-month/One year/Two year)
- **PaperlessBilling**: Paperless billing (Yes/No)
- **PaymentMethod**: Payment method
- **MonthlyCharges**: Monthly fee
- **TotalCharges**: Total charges paid
- **Churn**: Whether customer churned (Yes/No)


In [None]:
# Load dataset (pandas needs the openpyxl package to read .xlsx files)
import pandas as pd
df = pd.read_excel("Customer Churn.xlsx")
df.head()

In [None]:
# Dataset Overview
df.info()  # info() prints its own summary and returns None
df.describe(include='all')

In [None]:
# Convert TotalCharges to numeric if needed
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
df.isnull().sum()
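
The conversion step above relies on `errors='coerce'`, which turns any value that cannot be parsed as a number into `NaN` rather than raising an error. A minimal sketch of that behavior, using a made-up Series (the values are illustrative, not taken from the dataset):

```python
import pandas as pd

# Hypothetical raw values: two valid numbers and one blank string
raw = pd.Series(["29.85", " ", "108.15"])

# errors='coerce' replaces unparseable entries with NaN instead of raising
converted = pd.to_numeric(raw, errors="coerce")

# Count how many entries could not be converted
n_missing = int(converted.isna().sum())
print(converted)
print("Missing after conversion:", n_missing)
```

Checking `isna().sum()` right after the conversion shows how many rows the cleaning step will affect.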

In [None]:
# Drop rows with missing TotalCharges
df.dropna(subset=['TotalCharges'], inplace=True)
df.shape

In [None]:
# Churn class balance as a percentage of all customers
df['Churn'].value_counts(normalize=True)*100

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Contract vs Churn
plt.figure(figsize=(8,4))
sns.countplot(data=df, x='Contract', hue='Churn')
plt.title('Contract Type vs Churn')
plt.show()

# Monthly Charges vs Churn
plt.figure(figsize=(8,4))
sns.boxplot(data=df, x='Churn', y='MonthlyCharges')
plt.title('Monthly Charges vs Churn')
plt.show()

# Tenure vs Churn
plt.figure(figsize=(8,4))
sns.histplot(data=df, x='tenure', hue='Churn', bins=30, multiple='stack')
plt.title('Tenure Distribution by Churn')
plt.show()

# Correlation heatmap
num_cols = df.select_dtypes(include='number').columns
plt.figure(figsize=(8,6))
sns.heatmap(df[num_cols].corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
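
The heatmap above only covers numeric columns, so `Churn` (stored as Yes/No) does not appear in it. One possible way to include it is to encode the label as 1/0 first. The sketch below does this on a small made-up frame; the `ChurnFlag` column name and all values are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical miniature frame reusing the dataset's column names
toy = pd.DataFrame({
    "tenure": [1, 3, 24, 60, 72, 2],
    "MonthlyCharges": [70.0, 85.5, 55.0, 20.0, 25.0, 95.0],
    "Churn": ["Yes", "Yes", "No", "No", "No", "Yes"],
})

# Map the Yes/No label to 1/0 so it can take part in the correlation
toy["ChurnFlag"] = toy["Churn"].map({"Yes": 1, "No": 0})

# Correlation of each numeric feature with the churn flag
corr = toy[["tenure", "MonthlyCharges", "ChurnFlag"]].corr()
churn_corr = corr["ChurnFlag"].drop("ChurnFlag")
print(churn_corr)
```

Applied to `df`, the same mapping would add a churn row and column to the heatmap.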


## Insights
- Month-to-month contract customers show significantly higher churn.
- Higher monthly charges are associated with increased churn likelihood.
- Customers with long tenure churn far less.
- Electronic check payment users tend to churn more.
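
The categorical insights above, such as the one on payment method, can be checked by computing the churn rate per category. A minimal sketch, assuming a hypothetical helper named `churn_rate_by` and a toy frame with made-up values (not results from the dataset):

```python
import pandas as pd

# Hypothetical sample rows; the real analysis would pass df instead
toy = pd.DataFrame({
    "PaymentMethod": ["Electronic check", "Electronic check", "Mailed check",
                      "Mailed check", "Credit card (automatic)", "Electronic check"],
    "Churn": ["Yes", "Yes", "No", "Yes", "No", "No"],
})

def churn_rate_by(frame, column):
    """Return the percentage of churned customers per category of `column`."""
    # normalize='index' makes each row (category) sum to 1
    rates = pd.crosstab(frame[column], frame["Churn"], normalize="index") * 100
    return rates["Yes"].sort_values(ascending=False)

rates = churn_rate_by(toy, "PaymentMethod")
print(rates.round(1))
```

On the real data, `churn_rate_by(df, 'PaymentMethod')` or `churn_rate_by(df, 'Contract')` would give the per-category percentages behind these bullets.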

## Conclusion
In this dataset, customer churn is strongly associated with contract type, monthly charges, payment method, and tenure.  
Businesses should incentivize long-term contracts, reduce friction in billing, and offer retention programs to high-risk customers, such as new subscribers on month-to-month contracts.
