# Notebook 1: ETL Pipeline

## ETL Pipeline for Credit Card Churn Analysis

- **Extract:** Load the credit card customer data from the dataset.
- **Transform:** Clean the data by removing duplicates, checking for and handling missing values, standardizing categorical variables, encoding categories numerically, and creating useful features such as a credit utilization ratio.
- **Load:** Save the cleaned and processed dataset for further analysis and predictive modeling to identify customers likely to churn.

This pipeline produces a clean, consistent dataset that is ready for building churn prediction models.
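The three steps can be sketched end to end in plain pandas. This is a minimal sketch under several assumptions, not the notebook's actual implementation. The function names `extract`, `transform` and `load`, the output columns `churned` and `credit_utilization`, and the constant `CHURN_LABEL` are all hypothetical. The churn label `"Attrited Customer"`, the lowercase standardization, the median/`"unknown"` imputation, and the definition utilization = `Total_Revolving_Bal / Credit_Limit` are also assumptions. The sample rows are made up.

```python
import io
import os
import tempfile

import numpy as np
import pandas as pd

TARGET = "Attrition_Flag"
# Assumption: churned customers are labelled "Attrited Customer" (lowercased below).
CHURN_LABEL = "attrited customer"


def extract(source) -> pd.DataFrame:
    """Extract: read raw CSV data from a path or file-like object."""
    return pd.read_csv(source)


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Transform: standardize, dedupe, impute, encode, engineer features."""
    out = df.copy()

    # 1. Standardize categorical text (trim whitespace, lowercase) so that
    #    " Married " and "Married" become the same category.
    cat_cols = list(out.select_dtypes(exclude="number").columns)
    for col in cat_cols:
        out[col] = out[col].str.strip().str.lower()

    # 2. Remove duplicate rows (after standardizing, so near-duplicates collapse).
    out = out.drop_duplicates().reset_index(drop=True)

    # 3. Handle missing values (one possible strategy, assumed here):
    #    median for numeric columns, an explicit "unknown" for categoricals.
    num_cols = out.select_dtypes(include="number").columns
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    out[cat_cols] = out[cat_cols].fillna("unknown")

    # 4. Encode the target as 0/1, then one-hot encode the other categoricals.
    out["churned"] = (out[TARGET] == CHURN_LABEL).astype(int)
    out = out.drop(columns=[TARGET])
    feature_cats = [c for c in cat_cols if c != TARGET]
    out = pd.get_dummies(out, columns=feature_cats, dtype=int)

    # 5. Feature: credit utilization = revolving balance / credit limit
    #    (assumed definition); a zero limit yields NaN instead of inf.
    out["credit_utilization"] = out["Total_Revolving_Bal"] / out["Credit_Limit"].replace(0, np.nan)
    return out


def load(df: pd.DataFrame, path: str) -> None:
    """Load: write the processed data to CSV without the index."""
    df.to_csv(path, index=False)


# Usage on a tiny in-memory sample (made-up rows, not the real dataset).
raw_csv = io.StringIO(
    "CLIENTNUM,Attrition_Flag,Gender,Marital_Status,Credit_Limit,Total_Revolving_Bal\n"
    "1,Existing Customer,M,Married,1000,250\n"
    "1,Existing Customer,M, Married ,1000,250\n"  # duplicate once whitespace is trimmed
    "2,Attrited Customer,F,,2000,0\n"             # missing marital status
    "3,Existing Customer,F,Single,4000,1000\n"
)
processed = transform(extract(raw_csv))
out_path = os.path.join(tempfile.gettempdir(), "processed_sample.csv")  # hypothetical path
load(processed, out_path)
print(processed)
```

Standardizing before deduplicating is a deliberate ordering choice: rows that differ only in stray whitespace or letter case would otherwise survive `drop_duplicates()`.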

### Column Descriptions

| Column                       | Description                                                  |
| ---------------------------- | ------------------------------------------------------------ |
| **CLIENTNUM**                | Unique customer ID number                                    |
| **Attrition_Flag**           | Whether the customer left or stayed (**target variable**)    |
| **Customer_Age**             | Age of the customer in years                                 |
| **Gender**                   | Male or Female                                               |
| **Dependent_count**          | Number of dependents (family members)                        |
| **Education_Level**          | Customer's education level                                   |
| **Marital_Status**           | Single, Married, or Divorced                                 |
| **Income_Category**          | Annual income range category                                 |
| **Card_Category**            | Type of credit card (Blue, Silver, Gold, Platinum)           |
| **Months_on_book**           | Length of the customer's relationship with the company, in months |
| **Total_Relationship_Count** | Number of products the customer holds with the company       |
| **Months_Inactive_12_mon**   | Number of months the customer was inactive in the last 12 months |
| **Contacts_Count_12_mon**    | Number of times the customer contacted the company in the last 12 months |
| **Credit_Limit**             | Credit limit on the card                                     |
| **Total_Revolving_Bal**      | Outstanding balance on the card                              |
| **Avg_Open_To_Buy**          | Average available credit remaining                           |
| **Total_Amt_Chng_Q4_Q1**     | Change in transaction amount from Q4 to Q1                   |
| **Total_Trans_Amt**          | Total transaction amount in last 12 months                   |
| **Total_Trans_Ct**           | Total transaction count in last 12 months                    |
| **Total_Ct_Chng_Q4_Q1**      | Change in transaction count from Q4 to Q1                    |
| **Avg_Utilization_Ratio**    | Average fraction of the credit limit being used              |
| **NB_Stay_Probabilty**       | Probability customer will stay                               |
| **NB_Churn_Probality**       | Probability customer will leave                              |
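Later steps refer to these columns by name, so a quick check after loading can confirm that the raw file matches this table. The following is a minimal sketch. `EXPECTED_COLUMNS` and `check_schema` are hypothetical names. The column names are copied verbatim from the table, including the spellings of the last two, so any mismatch between the table and the raw file shows up in the report.

```python
import pandas as pd

# Column names exactly as listed in the table above.
EXPECTED_COLUMNS = [
    "CLIENTNUM", "Attrition_Flag", "Customer_Age", "Gender", "Dependent_count",
    "Education_Level", "Marital_Status", "Income_Category", "Card_Category",
    "Months_on_book", "Total_Relationship_Count", "Months_Inactive_12_mon",
    "Contacts_Count_12_mon", "Credit_Limit", "Total_Revolving_Bal",
    "Avg_Open_To_Buy", "Total_Amt_Chng_Q4_Q1", "Total_Trans_Amt",
    "Total_Trans_Ct", "Total_Ct_Chng_Q4_Q1", "Avg_Utilization_Ratio",
    "NB_Stay_Probabilty", "NB_Churn_Probality",
]


def check_schema(df: pd.DataFrame, expected=EXPECTED_COLUMNS) -> dict:
    """Report documented columns that are absent and columns not in the table."""
    actual = set(df.columns)
    expected_set = set(expected)
    return {
        "missing": [c for c in expected if c not in actual],
        "unexpected": [c for c in df.columns if c not in expected_set],
    }


# Usage with a made-up, mostly empty frame: two documented columns plus one extra.
sample = pd.DataFrame(columns=["CLIENTNUM", "Attrition_Flag", "Extra_Col"])
report = check_schema(sample)
print(report["unexpected"])  # ['Extra_Col']
```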

# 📥 Step 1: Extract

In this step, we load (extract) the raw CSV file into a pandas DataFrame for further processing.

---

### ✅ 1️⃣ Import Required Libraries

We start by importing the necessary libraries for data handling.


```python
import pandas as pd
import numpy as np
```

```python
# Extract: load the raw CSV into a DataFrame (path is relative to the working directory)
df = pd.read_csv("../data/raw_data/BankChurners.csv")
```
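Before transforming anything, it helps to look at what was actually extracted. The following is a minimal sketch of a first-look summary. The helper `summarize_raw` and its `placeholder` parameter are hypothetical. `isna()` only counts real NaN cells, so the helper also counts a placeholder string. Its default of `"Unknown"` is an assumption about how the raw file might mark missing categories, not something this notebook has verified.

```python
import numpy as np
import pandas as pd


def summarize_raw(df: pd.DataFrame, target: str = "Attrition_Flag",
                  placeholder: str = "Unknown") -> dict:
    """First-look checks on freshly extracted data (hypothetical helper)."""
    missing = df.isna().sum()                   # real NaN cells per column
    text = df.select_dtypes(exclude="number")   # non-numeric (categorical) columns
    placeholders = (text == placeholder).sum()  # cells holding the placeholder string
    return {
        "shape": df.shape,
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_per_column": missing[missing > 0].to_dict(),
        "placeholder_per_column": placeholders[placeholders > 0].to_dict(),
        "target_counts": df[target].value_counts().to_dict() if target in df else {},
    }


# In the notebook you would call summarize_raw(df); a made-up frame stands in here.
toy = pd.DataFrame({
    "CLIENTNUM": [1, 1, 2],
    "Attrition_Flag": ["Existing Customer", "Existing Customer", "Attrited Customer"],
    "Education_Level": ["Graduate", "Graduate", "Unknown"],
    "Credit_Limit": [1000.0, 1000.0, np.nan],
})
summary = summarize_raw(toy)
print(summary)
```

The target counts also give an early view of class balance, which matters when choosing evaluation metrics for the churn model later.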
