# GabeDA Features (Weekly Business Metrics)

This notebook builds weekly business metrics by aggregating transaction data by week.
It joins `daily_attrs` through the model's `external_data` setting to compute week-level insights from daily aggregates.

**Input:** Preprocessed transactions from 01_transactions notebook  
**Output:** Weekly metrics (1 row per week)  
**Group By:** `dt_year`, `dt_weekofyear`  
**External Data:** `daily_attrs` (joined on `dt_date` for daily aggregates)

## 0. Project Root Setup (Auto-generated)

In [1]:
# Auto-detect project root and add to Python path
import os
import sys
from pathlib import Path

# Get the project root (2 levels up from notebooks/development or notebooks/from_store)
notebook_dir = Path.cwd() if '__file__' not in globals() else Path(__file__).parent
project_root = notebook_dir.parent.parent

# Change to project root
os.chdir(project_root)

# Add project root to Python path if not already there
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

print(f"Working directory: {os.getcwd()}")
print(f"Project root: {project_root}")

Working directory: c:\Projects\play\khujta_ai_business
Project root: c:\Projects\play\khujta_ai_business


## 1. Setup: Imports, Context Loading, Logging

In [2]:
import pandas as pd
import numpy as np

# v2.0 Refactored imports
from src.utils.logger import setup_logging, get_logger
from src.core.context import GabedaContext
from src.core.persistence import load_context_state, get_latest_state, save_context_state
from src.core.constants import *
from src.features.store import FeatureStore
from src.features.resolver import DependencyResolver
from src.features.detector import FeatureTypeDetector
from src.features.analyzer import FeatureAnalyzer
from src.execution.calculator import FeatureCalculator
from src.execution.groupby import GroupByProcessor
from src.execution.executor import ModelExecutor
from src.export.excel import ExcelExporter

# Load latest context state
client_name = 'test_client'
latest_state = get_latest_state(client_name, base_dir='data/context_states')

if latest_state:
    ctx, base_cfg = load_context_state(latest_state)
    print(f"✓ Loaded latest state: {latest_state}")
else:
    raise FileNotFoundError(f"No context state found for client '{client_name}'")

# Setup logging
setup_logging(log_level=base_cfg.get('log_level', 'INFO'), 
              config={'client': base_cfg.get('client', 'unknown_client')})
logger = get_logger(__name__)

print(f"\n✓ Context loaded successfully!")
print(f"  - Original run_id: {ctx.original_run_id}")
print(f"  - New run_id: {ctx.run_id}")
print(f"  - Available datasets: {len(ctx.list_datasets())} datasets")

✓ Loaded latest state: data\context_states\test_client_20251022_150907
📝 Run instance ID: test_client_20251022_151027 - Logging [INFO] to: logs\test_client_20251022_151027.log

✓ Context loaded successfully!
  - Original run_id: test_client_20251022_151013
  - New run_id: test_client_20251022_151027
  - Available datasets: 11 datasets
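
The state directories above are named `<client>_<YYYYMMDD>_<HHMMSS>`. As a rough illustration of how a helper like `get_latest_state` could choose the newest one, here is a minimal sketch. `find_latest_state` is a hypothetical stand-in, not the project's implementation, and it assumes that the timestamp suffix is always fixed-width so that plain name ordering is chronological.

```python
from pathlib import Path
from typing import Optional
import tempfile


def find_latest_state(client_name: str, base_dir: str) -> Optional[Path]:
    """Hypothetical helper: return the newest state directory, or None."""
    base = Path(base_dir)
    if not base.is_dir():
        return None
    # Fixed-width YYYYMMDD_HHMMSS suffixes sort chronologically as plain text.
    candidates = sorted(p for p in base.glob(f"{client_name}_*") if p.is_dir())
    return candidates[-1] if candidates else None


# Usage with throwaway directories instead of data/context_states
with tempfile.TemporaryDirectory() as tmp:
    for stamp in ("20251022_150907", "20251021_235959", "20251022_090000"):
        (Path(tmp) / f"test_client_{stamp}").mkdir()
    latest_name = find_latest_state("test_client", tmp).name
    missing = find_latest_state("other_client", tmp)

print(latest_name)
```

Returning `None` when nothing matches mirrors the `if latest_state:` check in the cell above, which raises `FileNotFoundError` in that case.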


## 2. Load Input Data

In [3]:
# Get input dataset
input_df = ctx.get_dataset('transactions_filters')

print(f"✓ Input dataset loaded")
print(f"  - Shape: {input_df.shape}")
print(f"  - Date range: {input_df['dt_date'].min()} to {input_df['dt_date'].max()}")
print(f"  - Week range: Week {input_df['dt_weekofyear'].min()} to Week {input_df['dt_weekofyear'].max()}")
print(f"\nFirst few rows:")
input_df.head()

✓ Input dataset loaded
  - Shape: (609, 59)
  - Date range: 20251001 to 20251030
  - Week range: Week 40 to Week 44

First few rows:


| | in_dt | in_product_id | in_quantity | in_price_total | in_trans_type | in_customer_id | in_description | in_category | in_unit_type | in_stock | ... | cost_unit | cost_total | price_unit | price_total | margin_unit | margin_unit_pct | margin_unit_valid | margin_total | margin_total_pct | margin_total_valid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-10-01 01:02:00 | prod8 | 2.0 | 52964.0 | return | client13 | product 8 | category B | pack | 61.0 | ... | 18792.0 | 37585.0 | 26482.0 | 52964.0 | 7690.0 | 29.04 | True | 15379.0 | 29.04 | True |
| 1 | 2025-10-01 06:24:00 | prod4 | 6.0 | 177195.0 | sale | client6 | product 4 | category B | unit | 30.0 | ... | 21526.0 | 129155.0 | 29533.0 | 177195.0 | 8007.0 | 27.11 | True | 48040.0 | 27.11 | True |
| 2 | 2025-10-01 08:38:00 | prod7 | 2.0 | 70492.0 | return | client12 | product 7 | category A | unit | 78.0 | ... | 25754.0 | 51509.0 | 35246.0 | 70492.0 | 9492.0 | 26.93 | True | 18983.0 | 26.93 | True |
| 3 | 2025-10-01 09:59:00 | prod2 | 4.0 | 86751.0 | sale | client3 | product 2 | category A | unit | 80.0 | ... | 12947.0 | 51786.0 | 21688.0 | 86751.0 | 8741.0 | 40.3 | True | 34965.0 | 40.31 | True |
| 4 | 2025-10-01 10:07:00 | prod3 | 3.0 | 76465.0 | sale | client12 | product 3 | category B | unit | 47.0 | ... | 16943.0 | 5083.0 | 25488.0 | 76465.0 | 8545.0 | 33.53 | True | 71382.0 | 93.35 | True |
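
The reported week range (40 to 44) is consistent with ISO week numbers for the dates 20251001 to 20251030. The sketch below shows one way such a week column could be derived from integer `YYYYMMDD` dates with pandas. It is an assumption that `dt_weekofyear` is an ISO week; the preprocessing notebook is not shown here. The column names `calendar_year`, `iso_year` and `iso_week` are illustrative only.

```python
import pandas as pd

# Integer dates in the same YYYYMMDD format as dt_date
dates = pd.Series([20251001, 20251030, 20251229])
parsed = pd.to_datetime(dates.astype(str), format="%Y%m%d")

# isocalendar() returns a DataFrame with 'year', 'week' and 'day' columns
iso = parsed.dt.isocalendar()

calendar = pd.DataFrame({
    "dt_date": dates,
    "calendar_year": parsed.dt.year,
    "iso_year": iso["year"].astype(int),
    "iso_week": iso["week"].astype(int),
})
print(calendar)
```

The third date illustrates a boundary case: 2025-12-29 belongs to ISO week 1 of ISO year 2026. If a week number is ISO-based while the year column is the calendar year, grouping by both would place such days in the same group as the first week of January 2025, so pairing the ISO week with the ISO year is the safer choice.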


## 3. Define Features

Weekly business metrics with external data from daily_attrs:  
- Revenue aggregates (total revenue, transaction count, unique customers)  
- Daily averages (average daily sales from external data)  
- Peak analysis (peak sales day using external data)  
- Behavioral ratios (weekday vs weekend sales)  
- Growth metrics (week-over-week growth, customer retention); both are placeholders that return `DEFAULT_FLOAT` in this version

In [4]:
# ===== Weekly Business Metrics =====
# Based on specs: docs/specs/model/tech_specs.md - Dataset 2.1

def weekly_revenue(price_total):
    """
    Sum of transaction sales amounts for the entire week.
    Formula: SUM(price_total) from transaction-level data
    """
    return np.sum(price_total)

def weekly_transaction_count(trans_id):
    """
    Total number of unique transactions in the week.
    Formula: COUNT(DISTINCT trans_id)
    """
    return len(np.unique(trans_id))

def weekly_unique_customers(customer_id):
    """
    Distinct count of unique customers in the week.
    Formula: COUNT(DISTINCT customer_id)
    """
    return len(np.unique(customer_id))

def average_daily_sales(daily_attrs_price_total_sum):
    """
    Average revenue per day in the week.
    Formula: AVG(price_total_sum) from daily_attrs external data

    Note: daily_attrs_price_total_sum comes from daily_attrs joined via external_data
          External columns are prefixed with dataset name: daily_attrs_*
    """
    return round(np.mean(daily_attrs_price_total_sum), 2)

def peak_sales_day(dt_date, daily_attrs_price_total_sum):
    """
    The date with highest sales amount in the week.
    Formula: dt_date[ARGMAX(price_total_sum)] from daily_attrs

    Returns: Date in YYYYMMDD format (integer)
    
    Note: dt_date is the JOIN KEY, so it's NOT prefixed (see groupby.py:239)
          Other external columns ARE prefixed: daily_attrs_*
    """
    # Index by position: on a pandas Series, dt_date[peak_idx] would be a label lookup
    peak_idx = np.argmax(daily_attrs_price_total_sum)
    return np.asarray(dt_date)[peak_idx]

def weekday_vs_weekend_ratio(price_total, is_weekend):
    """
    Ratio comparing weekend sales to weekday sales.
    Formula: SUM(price_total WHERE is_weekend=True) / SUM(price_total WHERE is_weekend=False)

    Returns: Ratio value, or DEFAULT_FLOAT if weekday sales = 0
    """
    weekend_sales = np.sum(price_total[is_weekend == True])
    weekday_sales = np.sum(price_total[is_weekend == False])

    if weekday_sales == 0:
        return DEFAULT_FLOAT

    return round(weekend_sales / weekday_sales, 2)

def week_over_week_growth(price_total):
    """
    Revenue growth percentage from the previous week.
    Formula: ((current_week_revenue - previous_week_revenue) / previous_week_revenue) * 100

    Returns: DEFAULT_FLOAT (requires historical/window data - not implemented in v1)

    Note: This feature requires access to previous week's data via window functions
    or external historical context. Currently returns DEFAULT_FLOAT for all weeks.
    Future enhancement: Implement window function support or pass historical data.
    """
    return DEFAULT_FLOAT

def customer_retention_rate(customer_id):
    """
    Percentage of customers from previous week who made purchases in current week.
    Formula: (retained_customers / previous_week_customers) * 100

    Returns: DEFAULT_FLOAT (requires historical/window data - not implemented in v1)

    Note: This feature requires access to previous week's customer set via window
    functions or external historical context. Currently returns DEFAULT_FLOAT.
    Future enhancement: Implement customer tracking across weeks.
    """
    return DEFAULT_FLOAT

print("✓ Feature functions defined: 8 attributes")

✓ Feature functions defined: 8 attributes
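
Each function above takes one or more columns, named by its arguments, and returns a single value per week. As a minimal sketch of that calling convention, the code below applies two of the functions per group with plain pandas. `apply_features` is a hypothetical helper written for illustration; it is not how `GroupByProcessor` is implemented, and the toy data is made up.

```python
import inspect

import numpy as np
import pandas as pd


def weekly_revenue(price_total):
    return np.sum(price_total)


def weekly_unique_customers(customer_id):
    return len(np.unique(customer_id))


def apply_features(df, group_by, features):
    """Hypothetical helper: call each feature once per group.

    Argument names of each function select the columns passed to it.
    """
    rows = []
    for keys, group in df.groupby(group_by):
        row = dict(zip(group_by, keys))
        for name, func in features.items():
            # Pass NumPy arrays so positional indexing inside a feature is safe
            args = [group[col].to_numpy() for col in inspect.signature(func).parameters]
            row[name] = func(*args)
        rows.append(row)
    return pd.DataFrame(rows)


toy = pd.DataFrame({
    "dt_year": [2025, 2025, 2025, 2025],
    "dt_weekofyear": [40, 40, 41, 41],
    "price_total": [100.0, 50.0, 30.0, 20.0],
    "customer_id": ["c1", "c2", "c1", "c1"],
})
weekly = apply_features(
    toy,
    ["dt_year", "dt_weekofyear"],
    {"weekly_revenue": weekly_revenue, "weekly_unique_customers": weekly_unique_customers},
)
print(weekly)
```

The result has one row per `(dt_year, dt_weekofyear)` pair, matching the "1 row per week" output described at the top of the notebook.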


## 4. Configure Model

In [5]:
# Collect features into dictionary
features = {
    'weekly_revenue': weekly_revenue,
    'weekly_transaction_count': weekly_transaction_count,
    'weekly_unique_customers': weekly_unique_customers,
    'average_daily_sales': average_daily_sales,
    'peak_sales_day': peak_sales_day,
    'weekday_vs_weekend_ratio': weekday_vs_weekend_ratio,
    'week_over_week_growth': week_over_week_growth,
    'customer_retention_rate': customer_retention_rate,
}

# Model configuration with external data
cfg_model = {
    'model_name': 'weekly',
    'input_dataset_name': 'transactions_filters',
    'group_by': ['dt_year', 'dt_weekofyear'],
    'row_id': 'in_trans_id',
    'output_cols': list(features.keys()),
    'features': features,
    'external_data': {
        'daily_attrs': {
            'source': 'daily_attrs',           # Dataset name in context
            'join_on': ['dt_date'],            # Join on date
            'columns': None                    # None = bring ALL columns, or specify list
        }
    }
}

print(f"✓ Model configured: '{cfg_model['model_name']}'")
print(f"  - Group by: {cfg_model['group_by']}")
print(f"  - Output features: {len(cfg_model['output_cols'])}")
print(f"  - External data sources: {list(cfg_model['external_data'].keys())}")

✓ Model configured: 'weekly'
  - Group by: ['dt_year', 'dt_weekofyear']
  - Output features: 8
  - External data sources: ['daily_attrs']
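
The `external_data` entry describes a join on `dt_date`, and the feature docstrings state that external columns arrive prefixed with the dataset name while the join key keeps its name. The following is a minimal pandas sketch of that behaviour under those assumptions. `join_external` is a hypothetical function, the left join is an assumption, and the data is made up.

```python
import pandas as pd


def join_external(df, external, name, join_on, columns=None):
    """Hypothetical helper: left-join `external`, prefixing non-key columns."""
    # columns=None keeps every external column, as in the config comment
    cols = [c for c in external.columns
            if columns is None or c in columns or c in join_on]
    renamed = external[cols].rename(
        columns={c: f"{name}_{c}" for c in cols if c not in join_on}
    )
    return df.merge(renamed, on=join_on, how="left")


transactions = pd.DataFrame({
    "dt_date": [20251001, 20251001, 20251002],
    "price_total": [100.0, 50.0, 30.0],
})
daily_attrs = pd.DataFrame({
    "dt_date": [20251001, 20251002],
    "price_total_sum": [150.0, 30.0],
})

joined = join_external(transactions, daily_attrs, "daily_attrs", ["dt_date"])

# Mean over transaction rows vs. mean over distinct days
weighted_mean = joined["daily_attrs_price_total_sum"].mean()
per_day_mean = joined.drop_duplicates("dt_date")["daily_attrs_price_total_sum"].mean()
print(weighted_mean, per_day_mean)
```

One thing to watch with a join like this: each day's total is repeated on every transaction row of that day, so a plain mean over the rows weights busy days more heavily than a mean over distinct days. Whether the real pipeline deduplicates before calling the feature functions is not shown in this notebook.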


## 5. Prepare Features (Store, Resolve Dependencies, Save Config)

In [6]:
# Initialize feature store and store features
feature_store = FeatureStore()
feature_store.store_features(features, model_name=cfg_model['model_name'], auto_save=True)

# Resolve dependencies
resolver = DependencyResolver(feature_store)
in_cols, exec_seq, ext_cols = resolver.resolve_dependencies(
    output_cols=cfg_model['output_cols'],
    available_cols=input_df.columns.tolist(),
    group_by=cfg_model.get('group_by'),
    model=cfg_model['model_name']
)

# Update model config with resolved dependencies
cfg_model['in_cols'] = in_cols
cfg_model['exec_seq'] = exec_seq
cfg_model['ext_cols'] = ext_cols

# Save master configuration
feature_store.save_master_config(
    model_name=cfg_model['model_name'],
    model_config=cfg_model
)

print("✓ Features prepared and dependencies resolved")
print(f"  - Input columns needed: {len(in_cols)}")
print(f"  - Execution sequence: {exec_seq}")
print(f"  - Master config saved: feature_store/{cfg_model['model_name']}/master_cfg.json")

✓ Features prepared and dependencies resolved
  - Input columns needed: 6
  - Execution sequence: ['weekly_revenue', 'weekly_transaction_count', 'weekly_unique_customers', 'average_daily_sales', 'peak_sales_day', 'weekday_vs_weekend_ratio', 'week_over_week_growth', 'customer_retention_rate']
  - Master config saved: feature_store/weekly/master_cfg.json
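
The resolver returns input columns, an execution sequence, and external columns. One plausible way to derive these is to read each function's argument names and sort features so that dependencies run first. The sketch below illustrates that idea with the standard library only. It is an assumption about the approach, not the actual `DependencyResolver` code, and the `resolve` function and toy features are hypothetical.

```python
import inspect
from graphlib import TopologicalSorter


def resolve(features, available_cols):
    """Hypothetical resolver: (input columns, execution order, unresolved names)."""
    graph, in_cols, unresolved = {}, set(), set()
    for name, func in features.items():
        deps = set()
        for arg in inspect.signature(func).parameters:
            if arg in features:
                deps.add(arg)        # argument is another feature's output
            elif arg in available_cols:
                in_cols.add(arg)     # argument is a column of the input dataset
            else:
                unresolved.add(arg)  # e.g. a column expected from external data
        graph[name] = deps
    # static_order() yields each feature after all of its dependencies
    order = list(TopologicalSorter(graph).static_order())
    return sorted(in_cols), order, sorted(unresolved)


def revenue_per_customer(revenue, n_customers):
    return revenue / n_customers


def revenue(price_total):
    return sum(price_total)


def n_customers(customer_id):
    return len(set(customer_id))


def avg_daily(daily_attrs_price_total_sum):
    return sum(daily_attrs_price_total_sum) / len(daily_attrs_price_total_sum)


toy_features = {
    "revenue_per_customer": revenue_per_customer,
    "revenue": revenue,
    "n_customers": n_customers,
    "avg_daily": avg_daily,
}
in_cols, order, unresolved = resolve(toy_features, ["price_total", "customer_id"])
print(in_cols, order, unresolved)
```

In this notebook none of the eight features depends on another, which is consistent with the execution sequence above simply following the order of `output_cols`.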


## 6. Execute Model (Initialize Components + Execute + Store Results)

In [7]:
# Initialize execution components
detector = FeatureTypeDetector()
analyzer = FeatureAnalyzer(feature_store, detector)
calculator = FeatureCalculator()
groupby_processor = GroupByProcessor(calculator, detector)
executor = ModelExecutor(analyzer, groupby_processor, context=ctx)

# Execute model
output = executor.execute_model(
    cfg_model=cfg_model,
    input_dataset_name=cfg_model['input_dataset_name']
)

# Store results in context
ctx.set_model_output(cfg_model['model_name'], output, cfg_model)

print("✓ Model executed successfully!")
print(f"  - Filters: {output['filters'].shape if output['filters'] is not None else 'None'}")
print(f"  - Attributes: {output['attrs'].shape if output['attrs'] is not None else 'None'}")
print(f"  - Weeks analyzed: {output['attrs'].shape[0] if output['attrs'] is not None else 0}")

✓ Model executed successfully!
  - Filters: (609, 85)
  - Attributes: (5, 7)
  - Weeks analyzed: 5


## 7. View Results

In [8]:
# View weekly attributes (aggregated results)
attrs = ctx.get_model_attrs(cfg_model['model_name'])
print(f"Weekly Metrics (n={len(attrs)}):")
attrs.head()

Weekly Metrics (n=5):


| | dt_year | dt_weekofyear | weekly_revenue | weekly_transaction_count | weekly_unique_customers | average_daily_sales | weekday_vs_weekend_ratio |
|---|---|---|---|---|---|---|---|
| 0 | 2025 | 40 | 10589225.0 | 118 | 15 | 2211191.61 | 0.53 |
| 1 | 2025 | 41 | 9706455.0 | 128 | 15 | 1508764.95 | 0.39 |
| 2 | 2025 | 42 | 12029169.0 | 135 | 15 | 1965979.04 | 0.28 |
| 3 | 2025 | 43 | 11855919.0 | 136 | 15 | 1736761.74 | 0.2 |
| 4 | 2025 | 44 | 9315052.0 | 92 | 15 | 2553517.33 | 0.0 |
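
The table above has no `week_over_week_growth` column, and the feature itself is a placeholder because a per-group function cannot see the previous week. One possible workaround is to compute growth after aggregation, on the weekly table. The sketch below uses the `weekly_revenue` values shown above and pandas `pct_change`; it is a post-processing idea, not part of the model as configured.

```python
import pandas as pd

# weekly_revenue values copied from the results table
weekly = pd.DataFrame({
    "dt_year": [2025] * 5,
    "dt_weekofyear": [40, 41, 42, 43, 44],
    "weekly_revenue": [10589225.0, 9706455.0, 12029169.0, 11855919.0, 9315052.0],
})

# Sort chronologically so each row is compared with the week before it
weekly = weekly.sort_values(["dt_year", "dt_weekofyear"]).reset_index(drop=True)

# ((current - previous) / previous) * 100; the first week has no predecessor (NaN)
weekly["week_over_week_growth"] = (weekly["weekly_revenue"].pct_change() * 100).round(2)
print(weekly)
```

Weeks 40 and 44 are partial in this dataset, since the data starts on Wednesday 2025-10-01 and ends on Thursday 2025-10-30, so growth figures that involve them compare weeks of different lengths.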


In [9]:
# View summary statistics
print("Weekly Revenue Summary:")
attrs[['weekly_revenue', 'weekly_transaction_count', 'weekly_unique_customers']].describe()

Weekly Revenue Summary:


| | weekly_revenue | weekly_transaction_count | weekly_unique_customers |
|---|---|---|---|
| count | 5.0 | 5.0 | 5.0 |
| mean | 10699160.0 | 121.8 | 15.0 |
| std | 1226817.0 | 18.143869 | 0.0 |
| min | 9315052.0 | 92.0 | 15.0 |
| 25% | 9706455.0 | 118.0 | 15.0 |
| 50% | 10589220.0 | 128.0 | 15.0 |
| 75% | 11855920.0 | 135.0 | 15.0 |
| max | 12029170.0 | 136.0 | 15.0 |
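
A constant count of unique customers per week does not by itself show whether the same customers return. The `customer_retention_rate` feature is a placeholder, so the sketch below shows one way the documented formula, `(retained_customers / previous_week_customers) * 100`, could be computed from transaction-level data with plain pandas. The toy data is made up, and the code assumes every consecutive week is present in the data.

```python
import pandas as pd

tx = pd.DataFrame({
    "dt_year": [2025] * 7,
    "dt_weekofyear": [40, 40, 41, 41, 42, 42, 42],
    "customer_id": ["c1", "c2", "c1", "c3", "c1", "c3", "c4"],
})

# One set of customer ids per week, in chronological order
customers = (
    tx.groupby(["dt_year", "dt_weekofyear"])["customer_id"]
    .apply(set)
    .sort_index()
)

sets = list(customers)
rates = [float("nan")]  # the first week has no previous week
for prev, cur in zip(sets, sets[1:]):
    # Share of last week's customers who bought again this week
    rates.append(round(100 * len(prev & cur) / len(prev), 2))

retention = pd.Series(rates, index=customers.index, name="customer_retention_rate")
print(retention)
```

If a week had no transactions it would be missing from the index, and the loop would silently compare non-adjacent weeks; reindexing onto a complete week calendar first would avoid that.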


In [10]:
# View daily averages and ratios
print("Daily Averages and Behavioral Ratios:")
attrs[['average_daily_sales', 'weekday_vs_weekend_ratio']].head(10)

Daily Averages and Behavioral Ratios:


| | average_daily_sales | weekday_vs_weekend_ratio |
|---|---|---|
| 0 | 2211191.61 | 0.53 |
| 1 | 1508764.95 | 0.39 |
| 2 | 1965979.04 | 0.28 |
| 3 | 1736761.74 | 0.2 |
| 4 | 2553517.33 | 0.0 |


## 8. Export to Excel

In [11]:
# Export model results to Excel
exporter = ExcelExporter(ctx)
output_file = f'outputs/{cfg_model["model_name"]}_export.xlsx'
exporter.export_model(cfg_model['model_name'], output_file, include_input=True)

print(f"✓ Export complete: {output_file}")
print("\nExcel tabs:")
print(f"  1. {cfg_model['input_dataset_name']} (input)")
print(f"  2. {cfg_model['model_name']}_filters")
print(f"  3. {cfg_model['model_name']}_attrs")

✓ Export complete: outputs/weekly_export.xlsx

Excel tabs:
  1. transactions_filters (input)
  2. weekly_filters
  3. weekly_attrs


## 9. Save Context State

Save the complete context state for use in downstream notebooks:

In [12]:
# Save context state (datasets, config, metadata)
state_dir = save_context_state(ctx=ctx, base_cfg=base_cfg)

print(f"✓ Context state saved: {state_dir}")
print(f"  - Total datasets: {len(ctx.datasets)}")
print(f"\nTo load this state in another notebook:")
print(f"  from src.core.persistence import load_context_state")
print(f"  ctx, base_cfg = load_context_state('{state_dir}')")

✓ Context state saved: data\context_states\test_client_20251022_150907
  - Total datasets: 13

To load this state in another notebook:
  from src.core.persistence import load_context_state
  ctx, base_cfg = load_context_state('data\context_states\test_client_20251022_150907')
