# **Task Overview**
> **Objective:** Build a machine learning model to predict customer churn using historical data.

### **Deliverables:**

• Exploratory Data Analysis (EDA)

• Feature engineering

• Train/test split and model selection (Logistic Regression, XGBoost, etc.)

• Performance metrics (confusion matrix, AUC-ROC)

• Final report with visualizations


### **Mock Data (Python):**

In [1]:
import pandas as pd
import numpy as np

# Fix the seed so the mock data is reproducible
np.random.seed(42)
n = 10000

data = pd.DataFrame({
  'CustomerID': np.arange(n),
  'Gender': np.random.choice(['Male', 'Female'], size=n),
  'SeniorCitizen': np.random.choice([0, 1], size=n),
  # randint's upper bound is exclusive, so tenure ranges from 1 to 71 months
  'Tenure': np.random.randint(1, 72, size=n),
  'MonthlyCharges': np.round(np.random.uniform(20, 120, size=n), 2),
  'Contract': np.random.choice(['Month-to-month', 'One year', 'Two year'], size=n),
  'PaymentMethod': np.random.choice(['Electronic check', 'Mailed check', 'Bank transfer', 'Credit card'], size=n),
  # Churn is drawn independently of every feature, with a 27% positive rate
  'Churn': np.random.choice([0, 1], size=n, p=[0.73, 0.27])
})

# The DataFrame constructor does not evaluate callables, so TotalCharges is
# computed after construction and inserted right after MonthlyCharges
data.insert(
  data.columns.get_loc('MonthlyCharges') + 1,
  'TotalCharges',
  (data['Tenure'] * data['MonthlyCharges']).round(2)
)

In [None]:
# Exploratory Data Analysis (EDA)

# Display basic information about the dataset
print("Dataset shape:", data.shape)

# Display descriptive statistics
print("\nDescriptive statistics:")
display(data.describe())

# Check for missing values
print("\nMissing values in each column:")
print(data.isnull().sum())

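# Explore the target variable distribution (sorted by class label: 0, then 1)
print("\nChurn distribution:")
churn_counts = data['Churn'].value_counts(normalize=True).sort_index() * 100
# Name the column explicitly; the Series name depends on the pandas version
print(churn_counts.to_frame(name='Percentage (%)'))

# Visualize the churn distribution
import matplotlib.pyplot as plt
plt.figure(figsize=(8, 5))
# Derive the labels from the index so they always match the plotted values
labels = churn_counts.index.map({0: 'No Churn (0)', 1: 'Churn (1)'})
plt.pie(churn_counts, labels=labels, autopct='%1.1f%%', startangle=90)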
plt.title('Customer Churn Distribution')
plt.axis('equal')
plt.show()
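
The remaining deliverables (feature engineering, train/test split, model selection, performance metrics) could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation. It assumes scikit-learn is installed, which the notebook above does not confirm, and it uses Logistic Regression only as the simplest of the candidate models listed. The mock data is rebuilt at a smaller size with its own seed so the cell runs on its own; `numeric_cols`, `categorical_cols`, `preprocess`, and `model` are illustrative names. Because `Churn` is drawn independently of every feature in the mock data, the AUC-ROC should be expected to land near 0.5.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Rebuild a smaller copy of the mock data (same column names as above;
# the size and seed here are assumptions made for a self-contained example)
rng = np.random.default_rng(0)
n = 2000
data = pd.DataFrame({
    'Gender': rng.choice(['Male', 'Female'], size=n),
    'SeniorCitizen': rng.choice([0, 1], size=n),
    'Tenure': rng.integers(1, 72, size=n),
    'MonthlyCharges': np.round(rng.uniform(20, 120, size=n), 2),
    'Contract': rng.choice(['Month-to-month', 'One year', 'Two year'], size=n),
    'PaymentMethod': rng.choice(
        ['Electronic check', 'Mailed check', 'Bank transfer', 'Credit card'], size=n
    ),
    'Churn': rng.choice([0, 1], size=n, p=[0.73, 0.27]),
})
data['TotalCharges'] = (data['Tenure'] * data['MonthlyCharges']).round(2)

# Feature engineering: scale numeric columns, one-hot encode categorical ones.
# CustomerID is omitted on purpose because it carries no predictive signal.
numeric_cols = ['SeniorCitizen', 'Tenure', 'MonthlyCharges', 'TotalCharges']
categorical_cols = ['Gender', 'Contract', 'PaymentMethod']
X = data[numeric_cols + categorical_cols]
y = data['Churn']

# Stratify so the train and test sets keep roughly the same churn rate
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

preprocess = ColumnTransformer([
    ('num', StandardScaler(), numeric_cols),
    ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_cols),
])
model = Pipeline([
    ('preprocess', preprocess),
    ('clf', LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

# Performance metrics: AUC-ROC uses probabilities, the confusion matrix uses labels
churn_proba = model.predict_proba(X_test)[:, 1]
churn_pred = model.predict(X_test)
cm = confusion_matrix(y_test, churn_pred)
auc = roc_auc_score(y_test, churn_proba)

print("Confusion matrix (rows = actual, columns = predicted):")
print(cm)
print(f"AUC-ROC: {auc:.3f}")
```

Wrapping the preprocessing and the classifier in one `Pipeline` means the scaler and encoder are fit on the training split only, which avoids leaking test-set statistics into the model. Swapping in another estimator for the `clf` step would leave the rest of the sketch unchanged.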