# GabeDA Features (Product-Month Performance)

This notebook computes product lifecycle metrics at monthly granularity.
It aggregates transaction-level data into per-product monthly performance figures.

**Input:** Preprocessed transactions from the `01_transactions` notebook  
**Output:** Product-month performance metrics (1 row per product per month)  
**Group By:** `product_id`, `dt_year`, `dt_month`
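
As a rough illustration of the target shape, the sketch below builds one row per product per month with plain pandas. The toy values are invented, and the notebook itself runs the aggregation through the GabeDA executor rather than this direct `groupby`; only the column names are borrowed from the notebook.

```python
import pandas as pd

# Toy transaction rows; column names mirror the notebook, values are made up.
transactions = pd.DataFrame({
    "product_id":  ["prod1", "prod1", "prod2", "prod1"],
    "dt_year":     [2025, 2025, 2025, 2025],
    "dt_month":    [10, 10, 10, 11],
    "quantity":    [2, 3, 4, 1],
    "price_total": [200.0, 300.0, 800.0, 100.0],
})

# Named aggregation: one output row per (product_id, dt_year, dt_month).
product_month = (
    transactions
    .groupby(["product_id", "dt_year", "dt_month"], as_index=False)
    .agg(
        total_units_sold=("quantity", "sum"),
        total_revenue=("price_total", "sum"),
    )
)
print(product_month)
```

Four transaction rows collapse to three product-month rows here, because `prod1` has sales in two different months.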

## 0. Project Root Setup (Auto-generated)

In [1]:
# Auto-detect project root and add to Python path
import os
import sys
from pathlib import Path

# Get the project root (2 levels up from notebooks/development or notebooks/from_store)
notebook_dir = Path.cwd() if '__file__' not in globals() else Path(__file__).parent
project_root = notebook_dir.parent.parent

# Change to project root
os.chdir(project_root)

# Add project root to Python path if not already there
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

print(f"Working directory: {os.getcwd()}")
print(f"Project root: {project_root}")

Working directory: c:\Projects\play\khujta_ai_business
Project root: c:\Projects\play\khujta_ai_business


## 1. Setup: Imports, Context Loading, Logging

In [2]:
import pandas as pd
import numpy as np

# v2.0 Refactored imports
from src.utils.logger import setup_logging, get_logger
from src.core.context import GabedaContext
from src.core.persistence import load_context_state, get_latest_state, save_context_state
from src.core.constants import *  # supplies DEFAULT_FLOAT, used by the feature functions below
from src.features.store import FeatureStore
from src.features.resolver import DependencyResolver
from src.features.detector import FeatureTypeDetector
from src.features.analyzer import FeatureAnalyzer
from src.execution.calculator import FeatureCalculator
from src.execution.groupby import GroupByProcessor
from src.execution.executor import ModelExecutor
from src.export.excel import ExcelExporter

# Load latest context state
client_name = 'test_client'
latest_state = get_latest_state(client_name, base_dir='data/context_states')

if latest_state:
    ctx, base_cfg = load_context_state(latest_state)
    print(f"✓ Loaded latest state: {latest_state}")
else:
    raise FileNotFoundError(f"No context state found for client '{client_name}'")

# Setup logging
setup_logging(log_level=base_cfg.get('log_level', 'INFO'), 
              config={'client': base_cfg.get('client', 'unknown_client')})
logger = get_logger(__name__)

print(f"\n✓ Context loaded successfully!")
print(f"  - Original run_id: {ctx.original_run_id}")
print(f"  - New run_id: {ctx.run_id}")
print(f"  - Available datasets: {len(ctx.list_datasets())} datasets")

✓ Loaded latest state: data\context_states\test_client_20251022_150907
📝 Run instance ID: test_client_20251022_151119 - Logging [INFO] to: logs\test_client_20251022_151119.log

✓ Context loaded successfully!
  - Original run_id: test_client_20251022_151105
  - New run_id: test_client_20251022_151119
  - Available datasets: 17 datasets


## 2. Load Input Data

In [3]:
# Get input dataset
input_df = ctx.get_dataset('transactions_filters')

print(f"✓ Input dataset loaded")
print(f"  - Shape: {input_df.shape}")
print(f"  - Date range: {input_df['dt_date'].min()} to {input_df['dt_date'].max()}")
print(f"  - Unique products: {input_df['product_id'].nunique()}")
print(f"\nFirst few rows:")
input_df.head()

✓ Input dataset loaded
  - Shape: (609, 59)
  - Date range: 20251001 to 20251030
  - Unique products: 10

First few rows:


| | in_dt | in_product_id | in_quantity | in_price_total | in_trans_type | in_customer_id | in_description | in_category | in_unit_type | in_stock | ... | cost_unit | cost_total | price_unit | price_total | margin_unit | margin_unit_pct | margin_unit_valid | margin_total | margin_total_pct | margin_total_valid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-10-01 01:02:00 | prod8 | 2.0 | 52964.0 | return | client13 | product 8 | category B | pack | 61.0 | ... | 18792.0 | 37585.0 | 26482.0 | 52964.0 | 7690.0 | 29.04 | True | 15379.0 | 29.04 | True |
| 1 | 2025-10-01 06:24:00 | prod4 | 6.0 | 177195.0 | sale | client6 | product 4 | category B | unit | 30.0 | ... | 21526.0 | 129155.0 | 29533.0 | 177195.0 | 8007.0 | 27.11 | True | 48040.0 | 27.11 | True |
| 2 | 2025-10-01 08:38:00 | prod7 | 2.0 | 70492.0 | return | client12 | product 7 | category A | unit | 78.0 | ... | 25754.0 | 51509.0 | 35246.0 | 70492.0 | 9492.0 | 26.93 | True | 18983.0 | 26.93 | True |
| 3 | 2025-10-01 09:59:00 | prod2 | 4.0 | 86751.0 | sale | client3 | product 2 | category A | unit | 80.0 | ... | 12947.0 | 51786.0 | 21688.0 | 86751.0 | 8741.0 | 40.3 | True | 34965.0 | 40.31 | True |
| 4 | 2025-10-01 10:07:00 | prod3 | 3.0 | 76465.0 | sale | client12 | product 3 | category B | unit | 47.0 | ... | 16943.0 | 5083.0 | 25488.0 | 76465.0 | 8545.0 | 33.53 | True | 71382.0 | 93.35 | True |


## 3. Define Features

Product lifecycle metrics for monthly performance:  
- Sales metrics (units sold, revenue, velocity)
- Lifecycle stage classification
- Inventory turnover rate
- Cross-sell frequency

In [4]:
# ===== Product Lifecycle Metrics =====
# Based on specs: docs/specs/model/aggr_arch.md - Dataset 2.3

def total_units_sold(quantity):
    """
    Total units of product sold in the month.
    Formula: SUM(quantity)
    """
    return np.sum(quantity)

def total_revenue(price_total):
    """
    Total revenue from product sales in the month.
    Formula: SUM(price_total)
    """
    return np.sum(price_total)

def sales_velocity(quantity, dt_date):
    """
    Average units sold per day in the month.
    Formula: total_units_sold / COUNT(DISTINCT operating_days)
    
    Note: Uses unique dates with sales (excludes days without sales)
    """
    operating_days = len(np.unique(dt_date))
    if operating_days == 0:
        return DEFAULT_FLOAT
    
    total_units = np.sum(quantity)
    return round(total_units / operating_days, 2)

def product_lifecycle_stage(quantity):
    """
    Product lifecycle classification based on sales velocity trend.
    Formula: Categorize based on sales patterns
      - 'intro': Low velocity, recent introduction
      - 'growth': Increasing velocity
      - 'maturity': Stable high velocity
      - 'decline': Decreasing velocity
    
    Returns: 'unknown' (requires historical trend analysis - not implemented in v1)
    
    Note: Proper lifecycle stage detection requires comparing current month
    to previous months to identify trends. Current implementation lacks
    historical comparison capability.
    Future enhancement: Implement trend analysis with window functions.
    """
    return 'unknown'

def inventory_turnover_rate(quantity, stock):
    """
    How many times inventory turned over in the month.
    Formula: total_units_sold / average_inventory_level
    
    Note: Uses the mean of the stock values recorded on the month's
    transaction rows as a proxy for average inventory.
    A more accurate calculation would need beginning and ending inventory.
    """
    avg_stock = np.mean(stock)
    
    if avg_stock == 0:
        return DEFAULT_FLOAT
    
    total_sold = np.sum(quantity)
    return round(total_sold / avg_stock, 2)

def cross_sell_frequency(trans_id):
    """
    How often this product is purchased with other products.
    Formula: COUNT(transactions with multiple products) / COUNT(total transactions)
    
    Returns: DEFAULT_FLOAT (requires transaction-level basket analysis - not implemented in v1)
    
    Note: This requires analyzing full transaction baskets to identify when
    this product was purchased alongside other products. Current row-level
    aggregation doesn't have access to full basket composition.
    
    We're already grouped by product_id, so we can't use it as a parameter.
    The proper implementation would need to join with full transaction data
    to see what other products were in the same basket.
    
    Future enhancement: Implement basket analysis with transaction-level joins.
    """
    return DEFAULT_FLOAT

print("✓ Feature functions defined: 6 attributes")

✓ Feature functions defined: 6 attributes
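
The `product_lifecycle_stage` docstring defers trend analysis to a future enhancement. The following is a minimal sketch of what such a window-style comparison could look like once several months of `sales_velocity` exist per product. The `tol` threshold, the labelling rules, and the choice to tag a product's first observed month as `'intro'` are all assumptions made for illustration, not part of the GabeDA specs.

```python
import numpy as np
import pandas as pd

# Hypothetical multi-month output; values are invented for the example.
monthly = pd.DataFrame({
    "product_id":     ["prod1"] * 4 + ["prod2"] * 2,
    "dt_year":        [2025] * 6,
    "dt_month":       [7, 8, 9, 10, 9, 10],
    "sales_velocity": [2.0, 4.0, 4.1, 3.0, 5.0, 5.2],
})
monthly = monthly.sort_values(["product_id", "dt_year", "dt_month"]).reset_index(drop=True)

tol = 0.10  # assumed threshold: changes within +/-10% count as stable

# Previous month's velocity for the same product (NaN for the first month).
prev = monthly.groupby("product_id")["sales_velocity"].shift(1)
change = (monthly["sales_velocity"] - prev) / prev

monthly["product_lifecycle_stage"] = np.select(
    [change.isna(), change > tol, change < -tol],
    ["intro", "growth", "decline"],
    default="maturity",
)
print(monthly)
```

Because the comparison needs the previous month's row, it has to run on the aggregated product-month table rather than inside a single group, which is why the per-group function above cannot compute it on its own.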

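The `cross_sell_frequency` docstring notes that basket composition is lost once rows are grouped by `product_id`. One possible approach, sketched here under the assumption that a shared `trans_id` identifies a basket containing several product rows, is to compute the basket size before grouping and carry it along as a row-level flag. The toy data and the `is_multi_product` column name are hypothetical.

```python
import pandas as pd

# Toy line items: t1 and t3 are multi-product baskets, t2 and t4 are not.
tx = pd.DataFrame({
    "trans_id":   ["t1", "t1", "t2", "t3", "t3", "t4"],
    "product_id": ["prod1", "prod2", "prod1", "prod1", "prod3", "prod2"],
})

# Distinct products per basket, broadcast back onto every line item.
basket_size = tx.groupby("trans_id")["product_id"].transform("nunique")
tx["is_multi_product"] = basket_size > 1

# Count each (product, transaction) pair once, then take the share of
# a product's transactions that contained at least one other product.
per_basket = tx.drop_duplicates(["product_id", "trans_id"])
cross_sell_frequency = (
    per_basket.groupby("product_id")["is_multi_product"].mean().round(2)
)
print(cross_sell_frequency)
```

For the monthly model the final `groupby` would also include `dt_year` and `dt_month`; they are left out here to keep the example short.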

## 4. Configure Model

In [5]:
# Collect features into dictionary
features = {
    'total_units_sold': total_units_sold,
    'total_revenue': total_revenue,
    'sales_velocity': sales_velocity,
    'product_lifecycle_stage': product_lifecycle_stage,
    'inventory_turnover_rate': inventory_turnover_rate,
    'cross_sell_frequency': cross_sell_frequency,
}

# Model configuration
cfg_model = {
    'model_name': 'product_month',
    'input_dataset_name': 'transactions_filters',
    'group_by': ['product_id', 'dt_year', 'dt_month'],
    'row_id': 'in_trans_id',
    'output_cols': list(features.keys()),
    'features': features,
}

print(f"✓ Model configured: '{cfg_model['model_name']}'")
print(f"  - Group by: {cfg_model['group_by']}")
print(f"  - Output features: {len(cfg_model['output_cols'])}")

✓ Model configured: 'product_month'
  - Group by: ['product_id', 'dt_year', 'dt_month']
  - Output features: 6


## 5. Prepare Features (Store, Resolve Dependencies, Save Config)

In [6]:
# Initialize feature store and store features
feature_store = FeatureStore()
feature_store.store_features(features, model_name=cfg_model['model_name'], auto_save=True)

# Resolve dependencies
resolver = DependencyResolver(feature_store)
in_cols, exec_seq, ext_cols = resolver.resolve_dependencies(
    output_cols=cfg_model['output_cols'],
    available_cols=input_df.columns.tolist(),
    group_by=cfg_model.get('group_by'),
    model=cfg_model['model_name']
)

# Update model config with resolved dependencies
cfg_model['in_cols'] = in_cols
cfg_model['exec_seq'] = exec_seq
cfg_model['ext_cols'] = ext_cols

# Save master configuration
feature_store.save_master_config(
    model_name=cfg_model['model_name'],
    model_config=cfg_model
)

print("✓ Features prepared and dependencies resolved")
print(f"  - Input columns needed: {len(in_cols)}")
print(f"  - Execution sequence: {exec_seq}")
print(f"  - Master config saved: feature_store/{cfg_model['model_name']}/master_cfg.json")

✓ Features prepared and dependencies resolved
  - Input columns needed: 5
  - Execution sequence: ['total_units_sold', 'total_revenue', 'sales_velocity', 'product_lifecycle_stage', 'inventory_turnover_rate', 'cross_sell_frequency']
  - Master config saved: feature_store/product_month/master_cfg.json
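
The five input columns reported above match the five distinct parameter names used across the six feature functions (`quantity`, `price_total`, `dt_date`, `stock`, `trans_id`). One plausible way to derive such a list is to read each function's signature. The sketch below illustrates that idea only; `required_columns` is a hypothetical helper and is not claimed to be how `DependencyResolver` works internally.

```python
import inspect

def total_units_sold(quantity):
    return sum(quantity)

def inventory_turnover_rate(quantity, stock):
    return sum(quantity) / (sum(stock) / len(stock))

features = {
    "total_units_sold": total_units_sold,
    "inventory_turnover_rate": inventory_turnover_rate,
}

def required_columns(features, available_cols):
    """Collect parameter names in first-seen order; fail fast on unknown columns."""
    needed = []
    for name, func in features.items():
        for param in inspect.signature(func).parameters:
            if param not in available_cols:
                raise KeyError(f"feature '{name}' needs missing column '{param}'")
            if param not in needed:
                needed.append(param)
    return needed

in_cols_sketch = required_columns(features, ["quantity", "stock", "dt_date"])
print(in_cols_sketch)
```

Naming function parameters after columns keeps each feature declarative: adding a feature automatically extends the set of required inputs.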


## 6. Execute Model (Initialize Components + Execute + Store Results)

In [7]:
# Initialize execution components
detector = FeatureTypeDetector()
analyzer = FeatureAnalyzer(feature_store, detector)
calculator = FeatureCalculator()
groupby_processor = GroupByProcessor(calculator, detector)
executor = ModelExecutor(analyzer, groupby_processor, context=ctx)

# Execute model
output = executor.execute_model(
    cfg_model=cfg_model,
    input_dataset_name=cfg_model['input_dataset_name']
)

# Store results in context
ctx.set_model_output(cfg_model['model_name'], output, cfg_model)

print("✓ Model executed successfully!")
print(f"  - Filters: {output['filters'].shape if output['filters'] is not None else 'None'}")
print(f"  - Attributes: {output['attrs'].shape if output['attrs'] is not None else 'None'}")
print(f"  - Product-months analyzed: {output['attrs'].shape[0] if output['attrs'] is not None else 0}")

✓ Model executed successfully!
  - Filters: (609, 61)
  - Attributes: (10, 7)
  - Product-months analyzed: 10
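
To make the aggregated output more concrete, here is a simplified sketch of how group-level feature functions could be applied to each product-month group, passing each function the columns its parameters name. `run_group_features` is a hypothetical stand-in written for this example; the real `ModelExecutor` and `GroupByProcessor` may differ substantially.

```python
import inspect
import numpy as np
import pandas as pd

def total_units_sold(quantity):
    return np.sum(quantity)

def sales_velocity(quantity, dt_date):
    # Units per distinct day with at least one sale.
    return round(np.sum(quantity) / len(np.unique(dt_date)), 2)

def run_group_features(df, group_by, features):
    """Apply every feature function to every group; one output row per group."""
    rows = []
    for keys, group in df.groupby(group_by):
        row = dict(zip(group_by, keys))
        for name, func in features.items():
            params = inspect.signature(func).parameters
            row[name] = func(*(group[p].to_numpy() for p in params))
        rows.append(row)
    return pd.DataFrame(rows)

# Toy input; values are invented.
df = pd.DataFrame({
    "product_id": ["prod1", "prod1", "prod2"],
    "dt_year":    [2025, 2025, 2025],
    "dt_month":   [10, 10, 10],
    "dt_date":    [20251001, 20251002, 20251001],
    "quantity":   [2, 3, 4],
})
result = run_group_features(
    df,
    ["product_id", "dt_year", "dt_month"],
    {"total_units_sold": total_units_sold, "sales_velocity": sales_velocity},
)
print(result)
```

The output has the group-by keys followed by one column per feature, the same layout as the `attrs` table shown in the next section.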


## 7. View Results

In [8]:
# View product-month performance (aggregated attributes)
attrs = ctx.get_model_attrs(cfg_model['model_name'])
print(f"Product-Month Performance (n={len(attrs)}):")
attrs.head(10)

Product-Month Performance (n=10):


| | product_id | dt_year | dt_month | total_units_sold | total_revenue | sales_velocity | inventory_turnover_rate |
|---|---|---|---|---|---|---|---|
| 0 | PROD1 | 2025 | 10 | 199 | 2606098.0 | 7.37 | 2.57 |
| 1 | PROD10 | 2025 | 10 | 279 | 9887466.0 | 9.96 | 3.51 |
| 2 | PROD2 | 2025 | 10 | 227 | 4181925.0 | 8.11 | 3.27 |
| 3 | PROD3 | 2025 | 10 | 190 | 4276070.0 | 6.79 | 2.56 |
| 4 | PROD4 | 2025 | 10 | 192 | 5075329.0 | 8.0 | 2.57 |
| 5 | PROD5 | 2025 | 10 | 250 | 3975302.0 | 9.62 | 3.36 |
| 6 | PROD6 | 2025 | 10 | 217 | 4342449.0 | 8.35 | 2.83 |
| 7 | PROD7 | 2025 | 10 | 248 | 8161219.0 | 9.92 | 3.3 |
| 8 | PROD8 | 2025 | 10 | 291 | 7487707.0 | 10.78 | 4.05 |
| 9 | PROD9 | 2025 | 10 | 254 | 3502255.0 | 9.41 | 3.2 |


In [9]:
# View summary statistics
print("Sales Performance Summary:")
attrs[['total_units_sold', 'total_revenue', 'sales_velocity']].describe()

Sales Performance Summary:


| | total_units_sold | total_revenue | sales_velocity |
|---|---|---|---|
| count | 10.0 | 10.0 | 10.0 |
| mean | 234.7 | 5349582.0 | 8.831 |
| std | 35.565433 | 2345270.0 | 1.289224 |
| min | 190.0 | 2606098.0 | 6.79 |
| 25% | 203.5 | 4026958.0 | 8.0275 |
| 50% | 237.5 | 4309260.0 | 8.88 |
| 75% | 253.0 | 6884612.0 | 9.845 |
| max | 291.0 | 9887466.0 | 10.78 |


In [10]:
# View inventory metrics
print("Inventory Turnover Summary:")
print(attrs[['inventory_turnover_rate']].describe())
print("\nTop 5 Products by Sales Velocity:")
print(attrs.nlargest(5, 'sales_velocity')[['product_id', 'dt_year', 'dt_month', 'sales_velocity', 'total_revenue']])

Inventory Turnover Summary:
       inventory_turnover_rate
count                10.000000
mean                  3.122000
std                   0.487461
min                   2.560000
25%                   2.635000
50%                   3.235000
75%                   3.345000
max                   4.050000

Top 5 Products by Sales Velocity:
  product_id  dt_year  dt_month  sales_velocity  total_revenue
8      PROD8     2025        10           10.78      7487707.0
1     PROD10     2025        10            9.96      9887466.0
7      PROD7     2025        10            9.92      8161219.0
5      PROD5     2025        10            9.62      3975302.0
9      PROD9     2025        10            9.41      3502255.0


## 8. Export to Excel

In [11]:
# Export model results to Excel
exporter = ExcelExporter(ctx)
output_file = f'outputs/{cfg_model["model_name"]}_export.xlsx'
exporter.export_model(cfg_model['model_name'], output_file, include_input=True)

print(f"✓ Export complete: {output_file}")
print("\nExcel tabs:")
print(f"  1. {cfg_model['input_dataset_name']} (input)")
print(f"  2. {cfg_model['model_name']}_filters")
print(f"  3. {cfg_model['model_name']}_attrs")

✓ Export complete: outputs/product_month_export.xlsx

Excel tabs:
  1. transactions_filters (input)
  2. product_month_filters
  3. product_month_attrs


## 9. Save Context State

Save the complete context state for use in downstream notebooks:

In [12]:
# Save context state (datasets, config, metadata)
state_dir = save_context_state(ctx=ctx, base_cfg=base_cfg)

print(f"✓ Context state saved: {state_dir}")
print(f"  - Total datasets: {len(ctx.datasets)}")
print(f"\nTo load this state in another notebook:")
print(f"  from src.core.persistence import load_context_state")
# Emit a raw-string literal so Windows backslashes (e.g. "\t") are not read as escapes
print(f"  ctx, base_cfg = load_context_state(r'{state_dir}')")

✓ Context state saved: data\context_states\test_client_20251022_150907
  - Total datasets: 17

To load this state in another notebook:
  from src.core.persistence import load_context_state
  ctx, base_cfg = load_context_state(r'data\context_states\test_client_20251022_150907')
