# Exploratory Data Analysis (EDA) - E-commerce Churn Prediction

## Objective
Analyze customer behavior to identify patterns related to churn.

## 1. Data Loading and Inspection

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style="whitegrid")

df = pd.read_csv('../datos/data_ecommerce_customer_churn.csv')
df.info()
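Before imputing, it helps to see how many values each column is missing and what share of rows that represents. The sketch below runs on a small hypothetical DataFrame (column names reused from this notebook, values invented). In the notebook, the same summary lines would run on `df` directly.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the churn CSV; values are made up.
df = pd.DataFrame({
    "Tenure": [4.0, np.nan, 10.0, 1.0],
    "DaySinceLastOrder": [5.0, 2.0, np.nan, np.nan],
    "Complain": [1, 0, 0, 1],
})

# Count and percentage of missing values per column, largest first.
missing = df.isna().sum()
missing_report = pd.DataFrame({
    "n_missing": missing,
    "pct_missing": (missing / len(df) * 100).round(1),
}).sort_values("n_missing", ascending=False)

print(missing_report)
```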

## 2. Data Cleaning
Missing values are imputed with the mode for categorical columns and the median for numerical columns.

In [None]:
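# Impute every column that has at least one missing value
missing_cols = df.columns[df.isna().any()].tolist()
for col in missing_cols:
    if pd.api.types.is_numeric_dtype(df[col]):
        # Numerical columns: fill with the median
        df[col] = df[col].fillna(df[col].median())
    else:
        # Categorical columns (object, string, category): fill with the most frequent value
        df[col] = df[col].fillna(df[col].mode()[0])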

print("Missing values after imputation:", df.isnull().sum().sum())
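The median is the usual choice for numerical columns because, unlike the mean, it is not pulled toward extreme values. A tiny made-up column with one outlier and one missing entry illustrates the difference between the two fill values:

```python
import pandas as pd

# Hypothetical numeric column: one extreme value (100) and one missing entry.
s = pd.Series([2.0, 3.0, 4.0, 100.0, None])

# pandas skips NaN when computing both statistics.
mean_fill = s.fillna(s.mean())      # fills with 27.25, dominated by the outlier
median_fill = s.fillna(s.median())  # fills with 3.5, close to the typical value

print("mean used:  ", s.mean())
print("median used:", s.median())
```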

## 3. Univariate Analysis
### Churn Distribution

In [None]:
plt.figure(figsize=(6, 4))
sns.countplot(x='Churn', data=df)
plt.title('Churn Distribution')
plt.show()

print(df['Churn'].value_counts(normalize=True))

**Insight:** The dataset has a churn rate of approximately 17%.
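Assuming `Churn` is coded 0/1 (consistent with it appearing in the numeric correlation matrix), its mean is the churn rate. A rate near 17% also means the classes are imbalanced: always predicting "no churn" would already be about 83% accurate, which is a useful baseline for later models. The sketch below computes both numbers on made-up labels.

```python
import pandas as pd

# Hypothetical binary churn labels (1 = churned); 2 of 12 customers churned.
churn = pd.Series([0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0], name="Churn")

# For a 0/1 column, the mean is the fraction of churned customers.
churn_rate = churn.mean()

# Accuracy obtained by always predicting the majority class.
majority_baseline = max(churn_rate, 1 - churn_rate)

print(f"churn rate: {churn_rate:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
```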

## 4. Bivariate Analysis
### Correlation Matrix

In [None]:
plt.figure(figsize=(12, 8))
numeric_df = df.select_dtypes(include=[np.number])
sns.heatmap(numeric_df.corr(), annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation Matrix')
plt.show()
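The coefficients discussed below come from the `Churn` column of this matrix. Extracting and sorting that single column is easier to scan than the full heatmap; any identifier-like numeric column is worth dropping first, since its correlation carries no meaning. The sketch uses a small hypothetical frame in place of `numeric_df`.

```python
import pandas as pd

# Hypothetical numeric data standing in for numeric_df; values are made up.
numeric_df = pd.DataFrame({
    "Tenure":   [1, 2, 3, 10, 12, 15, 20, 25],
    "Complain": [1, 1, 0, 1, 0, 0, 0, 0],
    "Churn":    [1, 1, 1, 0, 0, 0, 0, 0],
})

# Pearson correlation of each column with Churn, excluding Churn itself,
# sorted from most negative to most positive.
churn_corr = numeric_df.corr()["Churn"].drop("Churn").sort_values()
print(churn_corr)
```

For two 0/1 columns such as `Complain` and `Churn`, the Pearson coefficient reduces to the phi coefficient (7/15 here).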

### Key Correlations with Churn
- **Tenure (-0.35)**: Longer tenure is associated with lower churn.
- **Complain (0.26)**: Customers who complained are more likely to churn.
- **DaySinceLastOrder (-0.16)**: The negative sign means customers who ordered more recently are more likely to churn. This is counter-intuitive and may reflect how churn is defined in this dataset, so it needs further investigation.
- **CashbackAmount (-0.16)**: Higher cashback is associated with lower churn.
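Correlation coefficients capture only linear association. For a binary feature such as `Complain`, or to check the direction of the `DaySinceLastOrder` relationship, comparing churn rates across groups is often easier to interpret. The sketch below uses made-up rows, and the day bands are arbitrary choices for illustration, not values taken from the dataset.

```python
import pandas as pd

# Hypothetical rows; column names follow the notebook, values are made up.
df = pd.DataFrame({
    "Complain":          [1, 1, 0, 0, 0, 1, 0, 0],
    "DaySinceLastOrder": [0, 1, 2, 3, 8, 10, 15, 20],
    "Churn":             [1, 1, 1, 0, 0, 1, 0, 0],
})

# Churn rate for customers with (1) and without (0) a complaint.
rate_by_complain = df.groupby("Complain")["Churn"].mean()

# Churn rate within illustrative bands of days since the last order.
bins = [-1, 2, 7, 14, float("inf")]
labels = ["0-2", "3-7", "8-14", "15+"]
day_band = pd.cut(df["DaySinceLastOrder"], bins=bins, labels=labels)
rate_by_recency = df.groupby(day_band, observed=False)["Churn"].mean()

print(rate_by_complain)
print(rate_by_recency)
```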

## 5. Conclusion
Tenure and complaints show the strongest correlations with churn. Retention strategies should therefore focus on resolving complaints effectively and building loyalty early in the customer lifecycle, keeping in mind that these correlations alone do not establish causation.