## **4. Feature Engineering**

### **4.1 Overview**
This notebook demonstrates feature engineering for customer segmentation using RFM (Recency, Frequency, Monetary) analysis. The reusable logic lives in the `src/feature_engineering.py` module, and each cell below imports the corresponding function from it, so the notebook and production code share one implementation.

**Key Features Created:**
1. **RFM Metrics**: Recency, Frequency, Monetary values
2. **Behavioral Metrics**: TotalItems, UniqueProducts, AvgOrderValue, ItemsPerOrder
3. **Customer-Level Aggregation**: Transaction data → Customer features
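To illustrate the transaction → customer step, the sketch below collapses line items into one row per customer with pandas named aggregation. It uses toy data and assumes UCI Online Retail-style columns (`InvoiceNo`, `StockCode`, `Quantity`, `CustomerID`). `NumOrders` is a hypothetical column used only in this example, and the module may compute these metrics differently.

```python
import pandas as pd

# Toy transaction table: one row per invoice line (column names assumed).
transactions = pd.DataFrame({
    "InvoiceNo":  ["536365", "536365", "536366", "536367"],
    "StockCode":  ["85123A", "71053", "85123A", "22752"],
    "Quantity":   [6, 6, 2, 8],
    "CustomerID": [17850, 17850, 17850, 13047],
})

# Collapse line items into one row per customer using named aggregation.
customer_level = (
    transactions
    .groupby("CustomerID")
    .agg(
        TotalItems=("Quantity", "sum"),           # units bought across all orders
        UniqueProducts=("StockCode", "nunique"),  # distinct products purchased
        NumOrders=("InvoiceNo", "nunique"),       # hypothetical: distinct invoices
    )
    .reset_index()
)
print(customer_level)
```

The result has one row per `CustomerID`, which is the granularity the clustering step needs.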

**Production Usage:**
```python
from src.feature_engineering import create_customer_features
customer_features = create_customer_features(df_processed)
```

### **4.2 Load Processed Data**

```python
# Import libraries and load the processed data
import os
import sys

import pandas as pd

# Add the project's src/ directory (a sibling of the current working directory) to the import path
sys.path.append(os.path.join(os.path.dirname(os.getcwd()), 'src'))

# Load the cleaned transaction data
df = pd.read_csv('../data/processed/Online_Retail_Cleaned.csv')
print(f"Processed data shape: {df.shape}")
df.head()
```
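If the cleaned CSV stores `InvoiceDate` as text, pandas can parse it while loading, and explicit `dtype`s keep ID-like columns as strings. A minimal sketch, using an in-memory CSV with assumed columns in place of the real file:

```python
import io

import pandas as pd

# Stand-in for ../data/processed/Online_Retail_Cleaned.csv (columns assumed).
csv_text = """InvoiceNo,StockCode,Quantity,InvoiceDate,UnitPrice,CustomerID
536365,85123A,6,2010-12-01 08:26:00,2.55,17850
536366,22633,6,2010-12-01 08:28:00,1.85,17850
"""

df_demo = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["InvoiceDate"],                  # datetime64 instead of str
    dtype={"InvoiceNo": str, "StockCode": str},   # keep codes as text labels
)
print(df_demo.dtypes)
```

Reading `StockCode` as `str` avoids a column that mixes integers (`22633`) and strings (`85123A`).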

### **4.3 Calculate Total Price**

```python
# Prepare the date column and calculate the total price of each transaction line
from feature_engineering import calculate_total_price, prepare_date_column

df = prepare_date_column(df)
df = calculate_total_price(df)

print("Sample data with TotalPrice:")
print(df[['InvoiceDate', 'UnitPrice', 'Quantity', 'TotalPrice']].head())

# Both steps are implemented in src/feature_engineering.py:
# prepare_date_column(df), calculate_total_price(df)
```
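The notebook does not show the bodies of these two functions. A plausible minimal version, assuming `prepare_date_column` converts `InvoiceDate` to datetime and `calculate_total_price` computes `Quantity * UnitPrice`, is sketched below; the `_sketch` functions are hypothetical stand-ins, not the module's code.

```python
import pandas as pd


def prepare_date_column_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: convert InvoiceDate to datetime64 on a copy."""
    out = df.copy()
    out["InvoiceDate"] = pd.to_datetime(out["InvoiceDate"])
    return out


def calculate_total_price_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: line revenue = Quantity * UnitPrice."""
    out = df.copy()
    out["TotalPrice"] = out["Quantity"] * out["UnitPrice"]
    return out


demo = pd.DataFrame({
    "InvoiceDate": ["2010-12-01 08:26:00", "2010-12-01 08:28:00"],
    "Quantity": [6, 3],
    "UnitPrice": [2.55, 1.85],
})
demo = calculate_total_price_sketch(prepare_date_column_sketch(demo))
print(demo)
```

Working on a copy keeps the caller's DataFrame unchanged, which makes re-running notebook cells safer.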

### **4.4 RFM Feature Calculation**

```python
# Aggregate transactions into per-customer RFM features
from feature_engineering import calculate_rfm_features

customer_data = calculate_rfm_features(df)
print(f"Customer data shape: {customer_data.shape}")
print("\nRFM Features:")
print(customer_data[['CustomerID', 'Recency', 'Frequency', 'Monetary']].head())

# Implemented in src/feature_engineering.py as calculate_rfm_features(df)
```
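A common way to compute RFM from line-level data is to pick a snapshot date one day after the last transaction, then per customer take days since the last purchase (Recency), the number of distinct invoices (Frequency), and total spend (Monetary). Whether `calculate_rfm_features` uses the same snapshot convention and counts invoices the same way is an assumption; the sketch uses toy data.

```python
import pandas as pd

tx = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2],
    "InvoiceNo": ["A1", "A1", "A2", "B1"],
    "InvoiceDate": pd.to_datetime(
        ["2011-12-01", "2011-12-01", "2011-12-08", "2011-11-09"]
    ),
    "TotalPrice": [10.0, 5.0, 20.0, 7.5],
})

# Reference point: one day after the last transaction in the data,
# so the most recent buyer gets Recency = 1 rather than 0.
snapshot_date = tx["InvoiceDate"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("CustomerID").agg(
    LastPurchase=("InvoiceDate", "max"),
    Frequency=("InvoiceNo", "nunique"),   # distinct invoices, not line items
    Monetary=("TotalPrice", "sum"),
)
rfm["Recency"] = (snapshot_date - rfm["LastPurchase"]).dt.days
rfm = rfm.drop(columns="LastPurchase").reset_index()
print(rfm[["CustomerID", "Recency", "Frequency", "Monetary"]])
```

Counting distinct `InvoiceNo` values matters: customer 1 has three line items but only two orders.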

### **4.5 Additional Behavioral Features**

```python
# Derive behavioral ratio features from the customer-level data
from feature_engineering import calculate_additional_features

customer_data = calculate_additional_features(customer_data)
print("Additional Features:")
print(customer_data[['CustomerID', 'AvgOrderValue', 'ItemsPerOrder']].head())

# Implemented in src/feature_engineering.py as calculate_additional_features(df)
```
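The notebook does not define these ratios explicitly. A common definition, assumed here rather than confirmed, is AvgOrderValue = Monetary / Frequency and ItemsPerOrder = TotalItems / Frequency:

```python
import pandas as pd

customers = pd.DataFrame({
    "CustomerID": [1, 2],
    "Frequency": [2, 1],       # distinct invoices
    "Monetary": [35.0, 7.5],   # total spend
    "TotalItems": [12, 3],     # total units
})

out = customers.copy()
# Guard against division by zero: a customer with no invoices gets NaN, not inf.
orders = out["Frequency"].where(out["Frequency"] > 0)
out["AvgOrderValue"] = out["Monetary"] / orders
out["ItemsPerOrder"] = out["TotalItems"] / orders
print(out)
```

After aggregation every customer should have at least one invoice, so the guard is defensive; it only matters if upstream filtering leaves a zero count.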

### **4.6 Feature Scaling**

```python
# Scale features for clustering
from feature_engineering import scale_features, get_default_feature_columns

# Get the default feature columns for scaling
feature_columns = get_default_feature_columns()
print(f"Features to scale: {feature_columns}")

# Scale the features; the fitted scaler is returned for reuse on new data
scaled_features, scaler = scale_features(customer_data, feature_columns)
print(f"\nScaled features shape: {scaled_features.shape}")
print("Sample scaled features:")
print(scaled_features.head())

# Implemented in src/feature_engineering.py:
# scale_features(df, feature_columns), get_default_feature_columns()
```
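Distance-based clustering is sensitive to feature scale: without scaling, Monetary (hundreds or thousands) would dominate Frequency (single digits). The sketch below assumes z-score standardization, the same arithmetic as scikit-learn's `StandardScaler`; whether `scale_features` uses that scaler or another is not stated here, and `standardize_sketch` is a hypothetical helper.

```python
import pandas as pd


def standardize_sketch(df: pd.DataFrame, columns: list):
    """Hypothetical helper: z-score each column and return the fitted params."""
    X = df[columns].astype(float)
    mean = X.mean()
    # Population std (ddof=0), matching StandardScaler; a constant column
    # gets std 1 so it maps to 0 instead of NaN.
    std = X.std(ddof=0)
    std = std.mask(std == 0, 1.0)
    scaled = (X - mean) / std
    return scaled, {"mean": mean, "std": std}


customers = pd.DataFrame({
    "Recency": [1, 30, 10],
    "Frequency": [2, 1, 5],
    "Monetary": [35.0, 7.5, 120.0],
})
scaled, params = standardize_sketch(customers, ["Recency", "Frequency", "Monetary"])
print(scaled.round(3))

# Reuse the fitted parameters on a new customer instead of refitting.
new_customer = pd.DataFrame({"Recency": [5], "Frequency": [3], "Monetary": [50.0]})
print(((new_customer - params["mean"]) / params["std"]).round(3))
```

Returning the fitted parameters mirrors the `(scaled_features, scaler)` pair above. RFM columns are often right-skewed, so applying `numpy.log1p` before standardizing is a common variant.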

### **4.7 Complete Feature Engineering Pipeline**

```python
# Run the complete feature engineering pipeline in one call
from feature_engineering import create_customer_features, save_customer_features

# create_customer_features() repeats the steps from sections 4.3 through 4.5:
# 1. Prepare the date column
# 2. Calculate total price
# 3. Calculate RFM features
# 4. Calculate additional features
# Scaling (section 4.6) is not among these steps.

customer_features = create_customer_features(df)
print(f"Final customer features shape: {customer_features.shape}")
print("\nFeature columns:")
print(customer_features.columns.tolist())

# Display feature summary
print("\nFeature Summary:")
print(customer_features.describe())

# Save customer features
output_path = '../data/processed/Customer_RFM_Features.csv'
save_customer_features(customer_features, output_path)
print(f"\nCustomer features saved to: {output_path}")
```
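If `create_customer_features` chains the step functions, one idiomatic way to express that in pandas is `DataFrame.pipe`. The sketch below is a hypothetical two-step version (`add_total_price` and `summarize_customers` are illustrative names, not the module's functions) and also shows a save/reload round trip with `index=False`; whether `save_customer_features` writes the index is an assumption to check.

```python
import os
import tempfile

import pandas as pd


def add_total_price(df: pd.DataFrame) -> pd.DataFrame:
    # Line revenue; assign() returns a new frame and leaves the input untouched
    return df.assign(TotalPrice=df["Quantity"] * df["UnitPrice"])


def summarize_customers(df: pd.DataFrame) -> pd.DataFrame:
    # Reduced customer-level step (Frequency and Monetary only, for brevity)
    return df.groupby("CustomerID", as_index=False).agg(
        Frequency=("InvoiceNo", "nunique"),
        Monetary=("TotalPrice", "sum"),
    )


raw = pd.DataFrame({
    "CustomerID": [1, 1, 2],
    "InvoiceNo": ["A1", "A2", "B1"],
    "Quantity": [2, 1, 3],
    "UnitPrice": [5.0, 20.0, 2.5],
})

# Each step takes and returns a DataFrame, so the pipeline is a .pipe() chain
features = raw.pipe(add_total_price).pipe(summarize_customers)

# Save without the RangeIndex so reloading does not add an "Unnamed: 0" column
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Customer_RFM_Features.csv")
    features.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

pd.testing.assert_frame_equal(features, reloaded)
print(reloaded)
```

Keeping each step a pure DataFrame-in, DataFrame-out function lets the notebook call steps individually while production code runs the whole chain.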