# Phase 3: Feature Engineering for Business Signals

## 1. Business Problem Statement
Raw transaction data does not directly capture customer value, demand patterns,
or operational risk. To enable forecasting, segmentation, churn prediction, and
profit optimization, we must engineer features that represent meaningful business
behavior rather than individual transactions.

## 2. Why This Matters to the Business
Well-designed features allow the business to anticipate customer needs, forecast
demand accurately, identify high-value customers, and make proactive decisions.
Poor feature design leads to fragile models and misleading insights.

In [3]:
import sys
import os

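# Add the project root (one level above the notebook directory) to sys.path
# so that the local `src` package can be imported.
PROJECT_ROOT = os.path.abspath("..")
if PROJECT_ROOT not in sys.path:
    sys.path.append(PROJECT_ROOT)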


In [4]:
import pandas as pd

df = pd.read_csv("../data/processed/cleaned_data.csv")
df.head()

|   | Row ID | Order ID | Order Date | Ship Date | Ship Mode | Customer ID | Customer Name | Segment | Country | City | ... | Postal Code | Region | Product ID | Category | Sub-Category | Product Name | Sales | Quantity | Discount | Profit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | ... | 42420 | South | FUR-BO-10001798 | Furniture | Bookcases | Bush Somerset Collection Bookcase | 261.96 | 2 | 0.0 | 41.9136 |
| 1 | 2 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | ... | 42420 | South | FUR-CH-10000454 | Furniture | Chairs | Hon Deluxe Fabric Upholstered Stacking Chairs,... | 731.94 | 3 | 0.0 | 219.582 |
| 2 | 3 | CA-2016-138688 | 2016-06-12 | 2016-06-16 | Second Class | DV-13045 | Darrin Van Huff | Corporate | United States | Los Angeles | ... | 90036 | West | OFF-LA-10000240 | Office Supplies | Labels | Self-Adhesive Address Labels for Typewriters b... | 14.62 | 2 | 0.0 | 6.8714 |
| 3 | 4 | US-2015-108966 | 2015-10-11 | 2015-10-18 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | ... | 33311 | South | FUR-TA-10000577 | Furniture | Tables | Bretford CR4500 Series Slim Rectangular Table | 957.5775 | 5 | 0.45 | -383.031 |
| 4 | 5 | US-2015-108966 | 2015-10-11 | 2015-10-18 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | ... | 33311 | South | OFF-ST-10000760 | Office Supplies | Storage | Eldon Fold 'N Roll Cart System | 22.368 | 2 | 0.2 | 2.5164 |


In [5]:
from src.feature_engineering import (
    add_time_features,
    add_profit_discount_features,
    create_customer_aggregates,
    create_rfm_features,
    build_feature_dataset
)

In [6]:
df = build_feature_dataset(df)
df.head()

|   | Row ID | Order ID | Order Date | Ship Date | Ship Mode | Customer ID | Customer Name | Segment | Country | City | ... | Sales | Quantity | Discount | Profit | order_year | order_month | order_quarter | order_dayofweek | profit_margin | is_discounted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | ... | 261.96 | 2 | 0.0 | 41.9136 | 2016 | 11 | 4 | 1 | 0.16 | 0 |
| 1 | 2 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | ... | 731.94 | 3 | 0.0 | 219.582 | 2016 | 11 | 4 | 1 | 0.3 | 0 |
| 2 | 3 | CA-2016-138688 | 2016-06-12 | 2016-06-16 | Second Class | DV-13045 | Darrin Van Huff | Corporate | United States | Los Angeles | ... | 14.62 | 2 | 0.0 | 6.8714 | 2016 | 6 | 2 | 6 | 0.47 | 0 |
| 3 | 4 | US-2015-108966 | 2015-10-11 | 2015-10-18 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | ... | 957.5775 | 5 | 0.45 | -383.031 | 2015 | 10 | 4 | 6 | -0.4 | 1 |
| 4 | 5 | US-2015-108966 | 2015-10-11 | 2015-10-18 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | ... | 22.368 | 2 | 0.2 | 2.5164 | 2015 | 10 | 4 | 6 | 0.1125 | 1 |
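
The source of `build_feature_dataset` is not shown in this notebook, but the new columns in the output above are consistent with simple row-level transforms: calendar parts of `Order Date` (with Monday as day 0), `profit_margin` as `Profit / Sales`, and `is_discounted` as a flag for `Discount > 0`. The following is a minimal sketch under those assumptions; the `*_sketch` function names are hypothetical and may not match what `src.feature_engineering` actually does.

```python
import pandas as pd


def add_time_features_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for add_time_features: calendar parts of Order Date."""
    out = df.copy()
    order_date = pd.to_datetime(out["Order Date"])
    out["order_year"] = order_date.dt.year
    out["order_month"] = order_date.dt.month
    out["order_quarter"] = order_date.dt.quarter
    out["order_dayofweek"] = order_date.dt.dayofweek  # Monday=0 ... Sunday=6
    return out


def add_profit_discount_features_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for add_profit_discount_features."""
    out = df.copy()
    # Assumption: rows with zero sales get a missing margin instead of +/-inf.
    nonzero_sales = out["Sales"].where(out["Sales"] != 0)
    out["profit_margin"] = out["Profit"] / nonzero_sales
    out["is_discounted"] = (out["Discount"] > 0).astype(int)
    return out


# Two rows copied from the output above, restricted to the columns used here.
sample = pd.DataFrame(
    {
        "Order Date": ["2016-11-08", "2015-10-11"],
        "Sales": [261.96, 957.5775],
        "Profit": [41.9136, -383.031],
        "Discount": [0.0, 0.45],
    }
)
featured = add_profit_discount_features_sketch(add_time_features_sketch(sample))
print(featured[["order_quarter", "order_dayofweek", "profit_margin", "is_discounted"]])
```

Guarding against zero sales is a design choice of this sketch rather than something visible in the output; an unguarded division would produce infinite margins for such rows.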


## Feature Engineering Themes

In [7]:
customer_agg_df = create_customer_aggregates(df)
customer_agg_df.head()

|   | Customer ID | total_sales | total_profit | avg_discount | num_orders |
|---|---|---|---|---|---|
| 0 | AA-10315 | 5563.56 | -362.8825 | 0.090909 | 5 |
| 1 | AA-10375 | 1056.39 | 277.3824 | 0.08 | 9 |
| 2 | AA-10480 | 1790.512 | 435.8274 | 0.016667 | 4 |
| 3 | AA-10645 | 5086.935 | 857.8033 | 0.063889 | 6 |
| 4 | AB-10015 | 886.156 | 129.3465 | 0.066667 | 3 |
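
The implementation of `create_customer_aggregates` is not shown here. A plausible reading of the output above is a single `groupby` on `Customer ID` that sums `Sales` and `Profit`, averages `Discount` over line items, and counts distinct `Order ID` values. The sketch below assumes exactly that; the function name and the choice of `nunique` for `num_orders` are assumptions.

```python
import pandas as pd


def create_customer_aggregates_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: one row per customer with value and discount summaries."""
    return (
        df.groupby("Customer ID")
        .agg(
            total_sales=("Sales", "sum"),
            total_profit=("Profit", "sum"),
            avg_discount=("Discount", "mean"),  # mean over line items, not orders
            num_orders=("Order ID", "nunique"),  # assumption: distinct orders
        )
        .reset_index()
    )


# Toy data (not from the dataset): customer A has two orders, one with two line items.
orders = pd.DataFrame(
    {
        "Customer ID": ["A", "A", "A", "B"],
        "Order ID": ["o1", "o1", "o2", "o3"],
        "Sales": [100.0, 50.0, 25.0, 10.0],
        "Profit": [10.0, -5.0, 2.5, 1.0],
        "Discount": [0.0, 0.2, 0.1, 0.0],
    }
)
customer_summary = create_customer_aggregates_sketch(orders)
print(customer_summary)
```

Counting distinct order IDs rather than rows matters because one order can span several line items, as the first two rows of the transaction table show.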


In [8]:
rfm_df = create_rfm_features(df)
rfm_df.head()

|   | Customer ID | Recency | Frequency | Monetary |
|---|---|---|---|---|
| 0 | AA-10315 | 185 | 5 | 5563.56 |
| 1 | AA-10375 | 20 | 9 | 1056.39 |
| 2 | AA-10480 | 260 | 4 | 1790.512 |
| 3 | AA-10645 | 56 | 6 | 5086.935 |
| 4 | AB-10015 | 416 | 3 | 886.156 |
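
In the output above, `Frequency` matches `num_orders` and `Monetary` matches `total_sales` from the customer aggregates, which suggests a conventional RFM definition. The sketch below assumes Recency is the number of days between a customer's last order and a snapshot date, and that the snapshot defaults to the latest order date in the data. Both the function name and the snapshot choice are assumptions, since `create_rfm_features` itself is not shown.

```python
import pandas as pd


def create_rfm_features_sketch(df: pd.DataFrame, snapshot_date=None) -> pd.DataFrame:
    """Hypothetical stand-in: Recency (days), Frequency (orders), Monetary (sales)."""
    order_date = pd.to_datetime(df["Order Date"])
    # Assumption: measure recency against the most recent order in the data.
    snapshot = pd.Timestamp(snapshot_date) if snapshot_date is not None else order_date.max()

    rfm = (
        df.assign(_order_date=order_date)
        .groupby("Customer ID")
        .agg(
            last_order=("_order_date", "max"),
            Frequency=("Order ID", "nunique"),
            Monetary=("Sales", "sum"),
        )
        .reset_index()
    )
    rfm["Recency"] = (snapshot - rfm["last_order"]).dt.days
    return rfm[["Customer ID", "Recency", "Frequency", "Monetary"]]


# Toy data (not from the dataset).
toy_orders = pd.DataFrame(
    {
        "Customer ID": ["A", "A", "B"],
        "Order ID": ["o1", "o2", "o3"],
        "Order Date": ["2017-01-01", "2017-03-01", "2017-03-31"],
        "Sales": [100.0, 25.0, 10.0],
    }
)
toy_rfm = create_rfm_features_sketch(toy_orders)
print(toy_rfm)
```

Passing an explicit `snapshot_date` keeps Recency reproducible when the same customers are rescored on refreshed data.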


In [9]:
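# Save the transaction-level features.
df.to_csv("../data/processed/featured_data.csv", index=False)
# Save one row per customer: aggregates joined with the RFM features.
customer_agg_df.merge(rfm_df, on="Customer ID").to_csv("../data/processed/customer_rfm.csv", index=False)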

## 3. Executive Summary

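This phase converted cleaned transactional data into business-level signals:
calendar features (order year, month, quarter, and day of week) for seasonality,
profit margin and a discount flag for profitability, and per-customer aggregates
and RFM (Recency, Frequency, Monetary) features for customer value and purchasing
behavior. These features serve as the foundation for the forecasting, segmentation,
churn prediction, and pricing optimization models in subsequent phases.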
