# Notebook 2: Feature Engineering

## Objectives
- Load data and apply preprocessing
- Create new features based on business logic
- Analyze feature importance and correlations
- Prepare dataset for model training

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sys
import os

sys.path.append(os.path.abspath('../src'))

from data_preprocessing import load_data, clean_data
from feature_engineering import engineer_all_features

pd.set_option('display.max_columns', None)

## 1. Load and Clean Data

In [None]:
filepath = '../data/WA_Fn-UseC_-Telco-Customer-Churn.csv'
if os.path.exists(filepath):
    df = load_data(filepath)
    df = clean_data(df)
else:
    print("Data file not found.")

## 2. Apply Feature Engineering
We will apply the transformations defined in `src/feature_engineering.py`:
- Tenure Groups
- CLV (Customer Lifetime Value)
- Service Count
- Interaction Features
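
The module itself is not shown in this notebook, so the following is only a minimal sketch of what transformations like these could look like. The function name `sketch_features`, the bin edges, the CLV formula (monthly charge times tenure), the `+ 1` in the ratio, and the `SERVICE_COLS` list are all assumptions for illustration, not the contents of `src/feature_engineering.py`.

```python
import numpy as np
import pandas as pd

# Hypothetical list of service columns; the real module may use a different set.
SERVICE_COLS = ['PhoneService', 'OnlineSecurity', 'OnlineBackup', 'TechSupport']

def sketch_features(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative versions of the four feature families listed above."""
    out = df.copy()
    # Tenure groups: right-closed bins, so tenure == 12 falls in '0-12'.
    out['tenure_group'] = pd.cut(
        out['tenure'],
        bins=[-1, 12, 24, 48, 60, np.inf],
        labels=['0-12', '12-24', '24-48', '48-60', '60+'],
    )
    # CLV approximated as monthly charge times months of tenure (assumption).
    out['CLV'] = out['MonthlyCharges'] * out['tenure']
    # Service count: number of service columns flagged 'Yes'.
    out['ServiceCount'] = (out[SERVICE_COLS] == 'Yes').sum(axis=1)
    # Interaction feature: charge relative to tenure; + 1 avoids division by zero.
    out['MonthlyCharges_per_Tenure'] = out['MonthlyCharges'] / (out['tenure'] + 1)
    return out

# Toy data for illustration only, not taken from the Telco file.
toy = pd.DataFrame({
    'tenure': [0, 12, 30, 72],
    'MonthlyCharges': [20.0, 50.0, 80.0, 100.0],
    'PhoneService': ['Yes', 'Yes', 'No', 'Yes'],
    'OnlineSecurity': ['No', 'Yes', 'No', 'Yes'],
    'OnlineBackup': ['No', 'No', 'No', 'Yes'],
    'TechSupport': ['No', 'No', 'Yes', 'Yes'],
})
toy_fe = sketch_features(toy)
print(toy_fe[['tenure_group', 'CLV', 'ServiceCount', 'MonthlyCharges_per_Tenure']])
```

Working on a copy keeps the input frame unchanged, which makes it easy to compare columns before and after the transformation.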

In [None]:
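if 'df' in locals():
    original_cols = set(df.columns)  # captured first in case features are added in place
    df_fe = engineer_all_features(df)
    new_features = [c for c in df_fe.columns if c not in original_cols]
    print("New features created. Columns:", new_features)
    # A bare expression inside an if-block is not rendered, so display it explicitly
    display(df_fe.head())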

## 3. Visualize New Features

In [None]:
if 'df_fe' in locals():
    plt.figure(figsize=(12, 5))
    sns.countplot(data=df_fe, x='tenure_group', hue='Churn', order=['0-12', '12-24', '24-48', '48-60', '60+'])
    plt.title('Churn by Tenure Group')
    plt.show()

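**Observation**: The 0-12 month tenure group contains the most churned customers, which points to elevated churn risk early in the customer lifecycle. Because the plot shows counts rather than rates, group size also contributes to this picture.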
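
Since a count plot shows absolute counts, a per-group churn rate is a useful cross-check. The sketch below assumes `Churn` still holds 'Yes'/'No' strings, as in the raw Telco file; if `clean_data` already encodes it as 0/1, the comparison would need adjusting. The helper name `churn_rate_by_group` and the toy data are hypothetical.

```python
import pandas as pd

def churn_rate_by_group(df, group_col='tenure_group', target_col='Churn'):
    """Return the share of churned customers within each group."""
    churned = df[target_col].eq('Yes')  # boolean mask, True where the customer churned
    # The mean of a boolean mask is the fraction of True values in each group.
    return churned.groupby(df[group_col], observed=True).mean()

# Toy data for illustration only.
toy = pd.DataFrame({
    'tenure_group': ['0-12', '0-12', '0-12', '0-12', '60+', '60+'],
    'Churn': ['Yes', 'Yes', 'No', 'No', 'No', 'No'],
})
rates = churn_rate_by_group(toy)
print(rates)
```

On the real data this could be called as `churn_rate_by_group(df_fe)`, provided the column names match.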

In [None]:
if 'df_fe' in locals():
    plt.figure(figsize=(8, 5))
    sns.boxplot(data=df_fe, x='Churn', y='CLV')
    plt.title('Customer Lifetime Value (CLV) by Churn')
    plt.show()

## 4. Feature Selection Check
Compute the Pearson correlation between the new numerical features and `Churn`.

In [None]:
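if 'df_fe' in locals():
    new_cols = ['CLV', 'ServiceCount', 'MonthlyCharges_per_Tenure']
    # Append Churn for correlation
    corr_df = df_fe[new_cols + ['Churn']].copy()
    # .corr() needs numeric columns; encode Churn if it is still 'Yes'/'No'
    if not pd.api.types.is_numeric_dtype(corr_df['Churn']):
        corr_df['Churn'] = corr_df['Churn'].map({'Yes': 1, 'No': 0}).astype(float)
    plt.figure(figsize=(6, 5))
    sns.heatmap(corr_df.corr(), annot=True, cmap='coolwarm', fmt='.2f')
    plt.title('Correlation of New Features with Churn')
    plt.show()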
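
A heatmap can be hard to read when only one column matters, so a ranked list of correlations with `Churn` may be a helpful complement. This is a minimal sketch: the helper name `rank_by_churn_correlation` and the toy values are hypothetical, and it assumes `Churn` is either numeric or 'Yes'/'No'.

```python
import pandas as pd

def rank_by_churn_correlation(df, feature_cols, target_col='Churn'):
    """Pearson correlation of each feature with the target, strongest first."""
    target = df[target_col]
    if not pd.api.types.is_numeric_dtype(target):
        # Assumes the two labels are exactly 'Yes' and 'No'.
        target = target.map({'Yes': 1, 'No': 0}).astype(float)
    corr = df[feature_cols].corrwith(target)
    # Order by absolute strength while keeping the sign of each value.
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)

# Toy data for illustration only.
toy = pd.DataFrame({
    'CLV': [100.0, 200.0, 300.0, 400.0],
    'ServiceCount': [1, 1, 2, 2],
    'Churn': ['Yes', 'Yes', 'No', 'No'],
})
ranked = rank_by_churn_correlation(toy, ['CLV', 'ServiceCount'])
print(ranked)
```

Correlation against a binary target only captures linear association, so a low value here does not rule out a feature being useful to a non-linear model.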

## 5. Summary
The engineered features add signal beyond the raw columns. `tenure_group` bins tenure into discrete ranges, which lets a model capture the non-linear relationship between tenure and churn. `CLV` and `ServiceCount` act as proxies for customer engagement.