# Customer Churn Prediction: Exploratory Data Analysis

This notebook performs exploratory data analysis on our preprocessed customer dataset. We plot the distribution of churn and compare age, tenure, total purchases, login frequency, support interactions, and loyalty score across churn groups to identify potential predictors of churn.

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set style for better-looking plots
# (the 'seaborn' style name was deprecated in matplotlib 3.6, so use seaborn's own theme)
sns.set_theme(style="darkgrid", palette="deep")

# Load the preprocessed data
df = pd.read_csv('../data/processed/processed_customer_data.csv')

print(df.head())
# df.info() prints its report directly and returns None, so it is not wrapped in print()
df.info()
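
Every plot below depends on a specific column being present in the loaded file, so it can help to check for them up front. The following is a minimal sketch, not part of the original notebook: the helper `check_columns` and the toy frame are hypothetical, and the column list is simply the set of names used in the cells of this notebook.

```python
import pandas as pd

# Columns referenced by the plots in this notebook.
EXPECTED_COLUMNS = [
    "Churn", "Age", "Tenure", "TotalPurchases",
    "LoginFrequency", "SupportInteractions", "LoyaltyScore",
]

def check_columns(frame: pd.DataFrame, expected=EXPECTED_COLUMNS) -> pd.DataFrame:
    """Report presence, dtype, and missing-value count for each expected column."""
    rows = []
    for col in expected:
        present = col in frame.columns
        rows.append({
            "column": col,
            "present": present,
            "dtype": str(frame[col].dtype) if present else None,
            "n_missing": int(frame[col].isna().sum()) if present else None,
        })
    return pd.DataFrame(rows).set_index("column")

# Toy frame standing in for df (hypothetical values, not real data).
toy = pd.DataFrame({"Churn": ["Yes", "No", "No"], "Age": [34, None, 51]})
report = check_columns(toy)
print(report)
```

In the notebook itself, `check_columns(df)` would be called on the loaded DataFrame instead of the toy frame.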

## 1. Distribution of Target Variable (Churn)

In [None]:
plt.figure(figsize=(10, 6))
sns.countplot(x='Churn', data=df)
plt.title('Distribution of Churn')
plt.show()

churn_rate = df['Churn'].value_counts(normalize=True)
print(f"Churn Rate: {churn_rate['Yes']:.2%}")
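
The lookup `churn_rate['Yes']` above only works if `Churn` is stored as the strings `'Yes'`/`'No'`. Preprocessing steps sometimes re-encode such a column as 0/1 or booleans, so the sketch below computes the rate in a way that tolerates those encodings. The function name `churn_fraction` and the set of labels treated as positive are assumptions made for illustration.

```python
import pandas as pd

def churn_fraction(series: pd.Series,
                   positive_labels=("yes", "true", "1", "1.0")) -> float:
    """Fraction of non-missing rows whose label counts as 'churned'.

    Labels are compared as lower-cased strings, so 'Yes', True, 1 and 1.0
    are all treated as churned (an assumption; adjust to the real encoding).
    """
    labels = series.dropna().astype(str).str.strip().str.lower()
    return float(labels.isin(positive_labels).mean())

# Toy examples (hypothetical values, not real data).
as_text = pd.Series(["Yes", "No", "No", "Yes"])
as_int = pd.Series([1, 0, 0, 0])
print(f"Churn Rate (text labels): {churn_fraction(as_text):.2%}")
print(f"Churn Rate (0/1 labels): {churn_fraction(as_int):.2%}")
```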

## 2. Age Distribution

In [None]:
plt.figure(figsize=(12, 6))
sns.histplot(data=df, x='Age', hue='Churn', multiple='stack', bins=30)
plt.title('Age Distribution by Churn Status')
plt.show()

## 3. Tenure Analysis

In [None]:
plt.figure(figsize=(12, 6))
sns.boxplot(x='Churn', y='Tenure', data=df)
plt.title('Customer Tenure by Churn Status')
plt.show()
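
A boxplot shows medians and quartiles visually, and it is often useful to have the same numbers in a table when writing up the findings. The sketch below does this with `groupby(...).describe()`. The helper name `summarize_by_churn` and the toy values are hypothetical. The same helper could be reused for `TotalPurchases`, `SupportInteractions`, and `LoyaltyScore`.

```python
import pandas as pd

def summarize_by_churn(frame: pd.DataFrame, column: str,
                       target: str = "Churn") -> pd.DataFrame:
    """Count and quartiles of `column` for each value of `target`."""
    stats = frame.groupby(target)[column].describe()
    # Keep the statistics that a boxplot draws: group size, Q1, median, Q3.
    return stats[["count", "25%", "50%", "75%"]]

# Toy frame standing in for df (hypothetical values, not real data).
toy = pd.DataFrame({
    "Churn": ["No", "No", "No", "Yes", "Yes"],
    "Tenure": [10, 20, 30, 2, 4],
})
tenure_summary = summarize_by_churn(toy, "Tenure")
print(tenure_summary)
```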

## 4. Total Purchases vs Churn

In [None]:
plt.figure(figsize=(12, 6))
sns.boxplot(x='Churn', y='TotalPurchases', data=df)
plt.title('Total Purchases by Churn Status')
plt.show()

## 5. Login Frequency vs Churn

In [None]:
plt.figure(figsize=(12, 6))
sns.countplot(x='LoginFrequency', hue='Churn', data=df)
plt.title('Login Frequency by Churn Status')
plt.show()
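
The count plot above shows absolute counts, which can hide differences in churn rate when some login-frequency levels have far more customers than others. As a complementary view, the sketch below uses `pd.crosstab` with `normalize="index"` to get the share of churned and retained customers within each level. It assumes `Churn` holds `'Yes'`/`'No'` as in the churn-rate cell; the helper name and toy values are hypothetical.

```python
import pandas as pd

def churn_share_by_level(frame: pd.DataFrame, column: str,
                         target: str = "Churn") -> pd.DataFrame:
    """Row-normalized cross-tabulation: each row sums to 1."""
    return pd.crosstab(frame[column], frame[target], normalize="index")

# Toy frame standing in for df (hypothetical values, not real data).
toy = pd.DataFrame({
    "LoginFrequency": [1, 1, 2, 2, 2],
    "Churn": ["Yes", "No", "No", "No", "Yes"],
})
share = churn_share_by_level(toy, "LoginFrequency")
print(share)
```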

## 6. Correlation Heatmap

In [None]:
plt.figure(figsize=(16, 12))
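# Restrict to numeric columns: pandas 2.0+ raises on string columns such as a 'Yes'/'No' Churn
correlation_matrix = df.corr(numeric_only=True)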
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Heatmap')
plt.show()
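
If `Churn` is stored as `'Yes'`/`'No'` strings, it is not a numeric column and so cannot appear in a numeric correlation matrix. One way to see how each numeric feature relates to churn is to encode the target as 0/1 first and rank the features by the absolute value of their Pearson correlation with it. This is a minimal sketch under that assumption; `correlation_with_churn`, the temporary `_churn` column, and the toy values are hypothetical.

```python
import pandas as pd

def correlation_with_churn(frame: pd.DataFrame, target: str = "Churn",
                           positive: str = "Yes") -> pd.Series:
    """Pearson correlation of each numeric column with a 0/1 churn flag."""
    numeric = frame.select_dtypes(include="number").copy()
    # Assumes the target is a string column where `positive` marks churned rows.
    numeric["_churn"] = frame[target].eq(positive).astype(int)
    corr = numeric.corr()["_churn"].drop("_churn")
    # Strongest relationships first, regardless of sign.
    return corr.sort_values(key=abs, ascending=False)

# Toy frame standing in for df (hypothetical values, not real data).
toy = pd.DataFrame({
    "Tenure": [1, 2, 3, 4],
    "SupportInteractions": [4, 3, 2, 1],
    "Churn": ["Yes", "Yes", "No", "No"],
})
ranked = correlation_with_churn(toy)
print(ranked)
```

Correlation with a binary target only captures linear association, so a weak value here does not rule a feature out.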

## 7. Support Interactions vs Churn

In [None]:
plt.figure(figsize=(12, 6))
sns.boxplot(x='Churn', y='SupportInteractions', data=df)
plt.title('Support Interactions by Churn Status')
plt.show()

## 8. Loyalty Score Analysis

In [None]:
plt.figure(figsize=(12, 6))
sns.boxplot(x='Churn', y='LoyaltyScore', data=df)
plt.title('Loyalty Score by Churn Status')
plt.show()

## 9. Summary of EDA Findings

1. Churn Rate: [Fill in after running the notebook]
2. Age Distribution: [Observations about age and churn]
3. Tenure: [Relationship between tenure and churn]
4. Total Purchases: [How total purchases relate to churn]
5. Login Frequency: [Impact of login frequency on churn]
6. Correlations: [Key correlations observed]
7. Support Interactions: [Relationship between support interactions and churn]
8. Loyalty Score: [How loyalty score relates to churn]

These insights will guide our feature selection and model building process in the next notebook.