# Customer Churn Prediction – Exploratory Data Analysis (EDA)

## Objective
The goal of this notebook is to explore the Telco Customer Churn dataset and identify key patterns and factors that influence customer churn.  
This analysis will guide feature engineering and model selection in later stages.



In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Visualization settings
sns.set(style="whitegrid")
plt.rcParams["figure.figsize"] = (8, 5)

df = pd.read_csv("../data/raw/Telco-Customer-Churn.csv")
df.head()


In [None]:
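# info() prints directly; shape is printed explicitly because a notebook
# cell only renders the value of its last expression.
df.info()
print(df.shape)
df.describe()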

In [None]:
df.isnull().sum()
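
`isnull()` only flags values that pandas already treats as missing. A numeric column that was read as text can hide blanks that it does not count. The sketch below shows one way to check for that. The helper name `hidden_missing`, the toy frame, and the use of `TotalCharges` as the column to inspect are assumptions for illustration, not something established earlier in this notebook.

```python
import pandas as pd


def hidden_missing(frame: pd.DataFrame, column: str) -> int:
    """Count entries that are not NaN but cannot be parsed as numbers."""
    as_numeric = pd.to_numeric(frame[column], errors="coerce")
    # Present in the raw column, yet NaN after coercion: a hidden missing value.
    return int((as_numeric.isna() & frame[column].notna()).sum())


# Hypothetical toy data: the blank string is not flagged by isnull().
toy = pd.DataFrame({"TotalCharges": ["29.85", " ", "108.15", None]})
explicit = int(toy["TotalCharges"].isnull().sum())
hidden = hidden_missing(toy, "TotalCharges")
print(explicit, hidden)
```

If the real column shows hidden blanks, converting it with `pd.to_numeric(..., errors="coerce")` before further analysis makes them visible to `isnull()`.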

## Target Variable Analysis: Churn

Understanding the distribution of churned vs non-churned customers helps identify class imbalance.


In [None]:
sns.countplot(x="Churn", data=df)
plt.title("Churn Distribution")
plt.show()


### Observation
- The dataset is slightly imbalanced.
- The majority of customers did not churn.
- This imbalance should be considered during model evaluation.
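
A count plot shows the imbalance visually, but the exact class shares are easier to quote and compare. The following is a minimal sketch, assuming the target column holds the labels `"Yes"` and `"No"`. The helper `class_shares` and the toy frame are hypothetical.

```python
import pandas as pd


def class_shares(labels: pd.Series) -> pd.Series:
    """Return the fraction of rows that fall in each class."""
    return labels.value_counts(normalize=True)


# Hypothetical toy data, not the real dataset.
toy = pd.DataFrame({"Churn": ["No", "No", "No", "Yes"]})
shares = class_shares(toy["Churn"])
print(shares)
```

In this notebook the same call would be `class_shares(df["Churn"])`.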


## Contract Type vs Churn

Contract duration is expected to strongly influence customer churn.


In [None]:
sns.countplot(x="Contract", hue="Churn", data=df)
plt.title("Churn by Contract Type")
plt.xticks(rotation=30)
plt.show()


### Observation
- Customers on month-to-month contracts churn noticeably more often than customers on longer contracts.
- Long-term contracts are associated with better retention.
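
Raw counts can mislead when the groups differ in size, so it helps to compare the churn rate within each contract type. The sketch below uses a row-normalized cross-tabulation. The helper `churn_rate_by`, its default arguments, and the toy values are assumptions for illustration.

```python
import pandas as pd


def churn_rate_by(frame: pd.DataFrame, column: str,
                  target: str = "Churn", positive: str = "Yes") -> pd.Series:
    """Share of rows labeled `positive` within each category of `column`."""
    # normalize="index" makes every row of the table sum to 1.
    table = pd.crosstab(frame[column], frame[target], normalize="index")
    return table[positive]


# Hypothetical toy data, not the real dataset.
toy = pd.DataFrame({
    "Contract": ["Month-to-month"] * 4 + ["Two year"] * 4,
    "Churn": ["Yes", "Yes", "No", "No", "No", "No", "No", "Yes"],
})
rates = churn_rate_by(toy, "Contract")
print(rates)
```

The same helper works for any categorical column, for example `churn_rate_by(df, "Contract")`.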


## Tenure vs Churn


In [None]:
sns.boxplot(x="Churn", y="tenure", data=df)
plt.title("Tenure by Churn Status")
plt.show()

### Observation
- Customers with shorter tenure are more likely to churn.
- Long-standing customers are more stable.
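
To see where along the tenure range churn concentrates, tenure can be grouped into bands and the churn rate computed per band. This is a rough sketch: the band edges are arbitrary choices, and the helper `churn_rate_by_tenure_band` and the toy data are hypothetical.

```python
import pandas as pd


def churn_rate_by_tenure_band(frame: pd.DataFrame, edges: list,
                              target: str = "Churn",
                              positive: str = "Yes") -> pd.Series:
    """Churn rate within each tenure band defined by `edges`."""
    bands = pd.cut(frame["tenure"], bins=edges, include_lowest=True)
    is_churn = frame[target].eq(positive)
    # The mean of a boolean series is the share of True values in each band.
    return is_churn.groupby(bands, observed=False).mean()


# Hypothetical toy data, not the real dataset.
toy = pd.DataFrame({
    "tenure": [1, 5, 10, 30, 40, 60],
    "Churn": ["Yes", "Yes", "No", "No", "Yes", "No"],
})
band_rates = churn_rate_by_tenure_band(toy, edges=[0, 12, 48, 72])
print(band_rates)
```

Bands that contain no customers come back as `NaN`, which is a hint to widen or merge the edges.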


## Monthly Charges vs Churn

Higher monthly charges may increase the likelihood of churn.


In [None]:
sns.boxplot(x="Churn", y="MonthlyCharges", data=df)
plt.title("Monthly Charges by Churn Status")
plt.show()

### Observation
- Churned customers tend to have higher monthly charges.
- Pricing appears to be associated with retention, although this plot alone does not show that higher charges cause churn.
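
The box plot can be backed by the quartiles it draws, which makes the gap between the two groups concrete. The sketch below is one way to tabulate them. The helper `charge_summary` and the toy values are assumptions for illustration.

```python
import pandas as pd


def charge_summary(frame: pd.DataFrame, value: str = "MonthlyCharges",
                   target: str = "Churn") -> pd.DataFrame:
    """Quartiles of `value` for each class of `target`, one row per class."""
    quartiles = frame.groupby(target)[value].quantile([0.25, 0.5, 0.75])
    # Move the quantile level into columns: rows are classes, columns are quantiles.
    return quartiles.unstack()


# Hypothetical toy data, not the real dataset.
toy = pd.DataFrame({
    "Churn": ["No", "No", "No", "Yes", "Yes", "Yes"],
    "MonthlyCharges": [20.0, 40.0, 60.0, 70.0, 80.0, 90.0],
})
summary = charge_summary(toy)
print(summary)
```

The column labeled `0.5` is the median, which corresponds to the center line of each box in the plot.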


## Internet Service vs Churn


In [None]:
# Churn is categorical, so compare counts per service type instead of a box plot
sns.countplot(x="InternetService", hue="Churn", data=df)
plt.title("Churn by Internet Service Type")
plt.show()