# GabeDA Features (Customer Profile - Loading from Feature Store)

This notebook demonstrates the **intended feature_store workflow**:
- Features are pre-defined and saved in `feature_store/customer_profile/`
- A **single method call** loads the complete model from the feature store
- The returned `master_cfg` is execution-ready once `in_cols` is added

- **Input:** Preprocessed transactions from the `01_transactions` notebook
- **Output:** Customer profiles (1 row per customer)
- **Group By:** `customer_id`
- **Features:** Loaded from `feature_store/customer_profile/`

## 0. Project Root Setup (Auto-generated)

In [1]:
# Auto-detect project root and add to Python path
import os
import sys
from pathlib import Path

# Get the project root (2 levels up from notebooks/development or notebooks/from_store)
notebook_dir = Path.cwd() if '__file__' not in globals() else Path(__file__).parent
project_root = notebook_dir.parent.parent

# Change to project root
os.chdir(project_root)

# Add project root to Python path if not already there
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

print(f"Working directory: {os.getcwd()}")
print(f"Project root: {project_root}")

Working directory: c:\Projects\play\khujta_ai_business
Project root: c:\Projects\play\khujta_ai_business


## 1. Setup: Imports, Context Loading, Logging

In [2]:
import pandas as pd
import numpy as np
from collections import Counter

# v2.0 Refactored imports
from src.utils.logger import setup_logging, get_logger
from src.core.context import GabedaContext
from src.core.persistence import load_context_state, get_latest_state, save_context_state
from src.core.constants import *
from src.features.store import FeatureStore
from src.features.resolver import DependencyResolver
from src.features.detector import FeatureTypeDetector
from src.features.analyzer import FeatureAnalyzer
from src.execution.calculator import FeatureCalculator
from src.execution.groupby import GroupByProcessor
from src.execution.executor import ModelExecutor
from src.export.excel import ExcelExporter

# Load latest context state
client_name = 'test_client'
latest_state = get_latest_state(client_name, base_dir='data/context_states')

if latest_state:
    ctx, base_cfg = load_context_state(latest_state)
    print(f"✓ Loaded latest state: {latest_state}")
else:
    raise FileNotFoundError(f"No context state found for client '{client_name}'")

# Setup logging
setup_logging(log_level=base_cfg.get('log_level', 'INFO'),
              config={'client': base_cfg.get('client', 'unknown_client')})
logger = get_logger(__name__)

print(f"\n✓ Context loaded successfully!")
print(f"  - Original run_id: {ctx.original_run_id}")
print(f"  - New run_id: {ctx.run_id}")
print(f"  - Available datasets: {len(ctx.list_datasets())} datasets")

✓ Loaded latest state: data\context_states\test_client_20251022_121534
📝 Run instance ID: test_client_20251022_121746 - Logging [INFO] to: logs\test_client_20251022_121746.log

✓ Context loaded successfully!
  - Original run_id: test_client_20251022_121737
  - New run_id: test_client_20251022_121746
  - Available datasets: 19 datasets
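
The state directories above follow a `<client>_<YYYYMMDD>_<HHMMSS>` naming pattern, so "latest" can in principle be resolved from the name alone. The sketch below is not the project's `get_latest_state`; `pick_latest_state` is a hypothetical helper that assumes this naming pattern holds for every saved state.

```python
from datetime import datetime
from typing import Iterable, Optional


def pick_latest_state(names: Iterable[str], client_name: str) -> Optional[str]:
    """Return the state name with the newest timestamp suffix, or None.

    Hypothetical helper: assumes names look like '<client>_<YYYYMMDD>_<HHMMSS>'.
    """
    prefix = f"{client_name}_"
    dated = []
    for name in names:
        if not name.startswith(prefix):
            continue  # belongs to another client
        try:
            stamp = datetime.strptime(name[len(prefix):], "%Y%m%d_%H%M%S")
        except ValueError:
            continue  # suffix is not a timestamp, skip it
        dated.append((stamp, name))
    return max(dated)[1] if dated else None


states = [
    "test_client_20251021_090000",   # invented older state
    "test_client_20251022_121534",   # state name from the output above
    "other_client_20251023_000000",  # invented state for another client
]
latest = pick_latest_state(states, "test_client")
print(latest)
```

Returning `None` rather than raising leaves the "no state found" decision to the caller, which matches how the cell above raises `FileNotFoundError` itself.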


## 2. Load Input Data

In [3]:
# Get input dataset
input_df = ctx.get_dataset('transactions_filters')

print(f"✓ Input dataset loaded")
print(f"  - Shape: {input_df.shape}")
print(f"  - Date range: {input_df['dt_date'].min()} to {input_df['dt_date'].max()}")
print(f"  - Unique customers: {input_df['customer_id'].nunique()}")
print(f"\nFirst few rows:")
input_df.head()

✓ Input dataset loaded
  - Shape: (609, 59)
  - Date range: 20251001 to 20251030
  - Unique customers: 15

First few rows:


|  | in_dt | in_product_id | in_quantity | in_price_total | in_trans_type | in_customer_id | in_description | in_category | in_unit_type | in_stock | ... | cost_unit | cost_total | price_unit | price_total | margin_unit | margin_unit_pct | margin_unit_valid | margin_total | margin_total_pct | margin_total_valid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-10-01 01:02:00 | prod8 | 2.0 | 52964.0 | return | client13 | product 8 | category B | pack | 61.0 | ... | 18792.0 | 37585.0 | 26482.0 | 52964.0 | 7690.0 | 29.04 | True | 15379.0 | 29.04 | True |
| 1 | 2025-10-01 06:24:00 | prod4 | 6.0 | 177195.0 | sale | client6 | product 4 | category B | unit | 30.0 | ... | 21526.0 | 129155.0 | 29533.0 | 177195.0 | 8007.0 | 27.11 | True | 48040.0 | 27.11 | True |
| 2 | 2025-10-01 08:38:00 | prod7 | 2.0 | 70492.0 | return | client12 | product 7 | category A | unit | 78.0 | ... | 25754.0 | 51509.0 | 35246.0 | 70492.0 | 9492.0 | 26.93 | True | 18983.0 | 26.93 | True |
| 3 | 2025-10-01 09:59:00 | prod2 | 4.0 | 86751.0 | sale | client3 | product 2 | category A | unit | 80.0 | ... | 12947.0 | 51786.0 | 21688.0 | 86751.0 | 8741.0 | 40.3 | True | 34965.0 | 40.31 | True |
| 4 | 2025-10-01 10:07:00 | prod3 | 3.0 | 76465.0 | sale | client12 | product 3 | category B | unit | 47.0 | ... | 16943.0 | 5083.0 | 25488.0 | 76465.0 | 8545.0 | 33.53 | True | 71382.0 | 93.35 | True |
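
Before aggregating, it can help to check that derived columns agree with each other. In the preview above, row 4 reports a `cost_total` of 5083.0 although `cost_unit × in_quantity` is about 50829, which would also inflate its `margin_total`; whether that is a data error or deliberate test data is not stated. The sketch below flags such rows. The 1% tolerance and the helper name `inconsistent_cost_rows` are assumptions; the values are copied from the preview.

```python
from typing import Dict, List

# in_quantity, cost_unit and cost_total for rows 0-4 of the preview above
preview_rows: List[Dict[str, float]] = [
    {"in_quantity": 2.0, "cost_unit": 18792.0, "cost_total": 37585.0},
    {"in_quantity": 6.0, "cost_unit": 21526.0, "cost_total": 129155.0},
    {"in_quantity": 2.0, "cost_unit": 25754.0, "cost_total": 51509.0},
    {"in_quantity": 4.0, "cost_unit": 12947.0, "cost_total": 51786.0},
    {"in_quantity": 3.0, "cost_unit": 16943.0, "cost_total": 5083.0},
]


def inconsistent_cost_rows(rows: List[Dict[str, float]], rel_tol: float = 0.01) -> List[int]:
    """Return indices where cost_total deviates from cost_unit * in_quantity.

    Hypothetical check: rel_tol absorbs the small rounding differences
    visible in rows 0-3.
    """
    flagged = []
    for idx, row in enumerate(rows):
        expected = row["cost_unit"] * row["in_quantity"]
        if abs(row["cost_total"] - expected) > rel_tol * expected:
            flagged.append(idx)
    return flagged


flagged = inconsistent_cost_rows(preview_rows)
print(flagged)  # → [4]
```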


## 3. Load Model from Feature Store

**KEY DIFFERENCE:** A single `load_model` call returns a `master_cfg` that is execution-ready except for `in_cols`, which is added in the next step.

In [4]:
# Load model - returns execution-ready master_cfg
feature_store = FeatureStore()
model = feature_store.load_model('customer_profile')

# Extract execution-ready config (already has compiled features!)
cfg_model = model['master_cfg']

print(f"✓ Model loaded from feature_store")
print(f"  - Features: {len(cfg_model['features'])}")
print(f"  - Execution sequence: {cfg_model['exec_seq']}")
print(f"  - Group by: {cfg_model['group_by']}")
print(f"  - Input dataset: {cfg_model['input_dataset_name']}")

✓ Model loaded from feature_store
  - Features: 7
  - Execution sequence: ['month_total_spent', 'month_transaction_count', 'month_items_purchased', 'average_order_value', 'favorite_products', 'preferred_shopping_day', 'preferred_shopping_time']
  - Group by: ['customer_id']
  - Input dataset: transactions_filters
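
The cells in this notebook read a handful of keys from `master_cfg`. Below is a minimal sketch of a pre-flight check, assuming the config behaves like a plain dict. The key list is taken from how the notebook uses `cfg_model`; `missing_cfg_keys` is a hypothetical helper and not part of `FeatureStore`, and the example values for `features` and `output_cols` are placeholders.

```python
from typing import Any, Dict, List

# Keys this notebook reads from cfg_model before or during execution
REQUIRED_KEYS = (
    "model_name",
    "features",
    "exec_seq",
    "group_by",
    "input_dataset_name",
    "output_cols",
)


def missing_cfg_keys(cfg: Dict[str, Any]) -> List[str]:
    """Return the required keys that are absent from cfg (hypothetical check)."""
    return [key for key in REQUIRED_KEYS if key not in cfg]


exec_seq = [
    "month_total_spent", "month_transaction_count", "month_items_purchased",
    "average_order_value", "favorite_products", "preferred_shopping_day",
    "preferred_shopping_time",
]
example_cfg = {
    "model_name": "customer_profile",
    "features": {name: None for name in exec_seq},  # placeholder feature bodies
    "exec_seq": exec_seq,
    "group_by": ["customer_id"],
    "input_dataset_name": "transactions_filters",
    "output_cols": list(exec_seq),  # assumption: outputs equal the executed features
}

print(missing_cfg_keys(example_cfg))             # → []
print(missing_cfg_keys({"model_name": "demo"}))  # every key except model_name
```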


## 4. Resolve Dependencies and Add in_cols

The only missing piece is `in_cols`: resolve which input columns the features need.

In [5]:
# Resolve dependencies to determine input columns needed
resolver = DependencyResolver(feature_store)
in_cols, _, _ = resolver.resolve_dependencies(
    output_cols=cfg_model['output_cols'],
    available_cols=input_df.columns.tolist(),
    group_by=cfg_model.get('group_by'),
    model=cfg_model['model_name']
)

# Add in_cols to cfg_model - now it's complete!
cfg_model['in_cols'] = in_cols

print(f"✓ Dependencies resolved")
print(f"  - Input columns needed: {len(in_cols)}")
print(f"  - cfg_model is now execution-ready!")

✓ Dependencies resolved
  - Input columns needed: 6
  - cfg_model is now execution-ready!
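
Dependency resolution of this kind is commonly a graph walk: start from the requested outputs, follow each feature's declared inputs, and collect whatever bottoms out in the input dataset. The sketch below illustrates the idea with the standard library (Python 3.9+ for `graphlib`). It is not the `DependencyResolver` implementation; the `DEPENDS_ON` mapping and the function `resolve` are invented for illustration.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical dependency declarations: feature -> names it reads
DEPENDS_ON: Dict[str, List[str]] = {
    "month_total_spent": ["price_total"],
    "month_transaction_count": ["in_dt"],
    "average_order_value": ["month_total_spent", "month_transaction_count"],
}


def resolve(output_cols: Iterable[str], available_cols: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (input columns needed, features in a valid execution order)."""
    available = set(available_cols)
    needed_inputs: Set[str] = set()
    graph: Dict[str, Set[str]] = {}  # feature -> features it depends on
    stack = list(output_cols)
    while stack:
        name = stack.pop()
        if name in graph:
            continue  # feature already expanded
        if name in DEPENDS_ON:
            deps = DEPENDS_ON[name]
            graph[name] = {d for d in deps if d in DEPENDS_ON}
            stack.extend(deps)
        elif name in available:
            needed_inputs.add(name)  # raw column from the input dataset
        else:
            raise KeyError(f"'{name}' is neither a feature nor an input column")
    # static_order() yields dependencies before the features that use them
    exec_seq = list(TopologicalSorter(graph).static_order())
    return sorted(needed_inputs), exec_seq


in_cols_demo, exec_seq_demo = resolve(
    ["average_order_value"], ["price_total", "in_dt", "in_quantity"]
)
print(in_cols_demo)   # → ['in_dt', 'price_total']
print(exec_seq_demo)  # average_order_value comes last
```

Note that `in_quantity` is available but not returned: only columns reachable from the requested outputs end up in the input list.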


## 5. Execute Model

In [6]:
# Initialize execution components
detector = FeatureTypeDetector()
analyzer = FeatureAnalyzer(feature_store, detector)
calculator = FeatureCalculator()
groupby_processor = GroupByProcessor(calculator, detector)
executor = ModelExecutor(analyzer, groupby_processor, context=ctx)

# Execute model using cfg_model (which is the enhanced master_cfg)
output = executor.execute_model(
    cfg_model=cfg_model,
    input_dataset_name=cfg_model['input_dataset_name']
)

# Store results in context
ctx.set_model_output(cfg_model['model_name'], output)

print("✓ Model executed successfully!")
print(f"  - Filters: {output['filters'].shape if output['filters'] is not None else 'None'}")
print(f"  - Attributes: {output['attrs'].shape if output['attrs'] is not None else 'None'}")
print(f"  - Customers profiled: {output['attrs'].shape[0] if output['attrs'] is not None else 0}")

✓ Model executed successfully!
  - Filters: (609, 59)
  - Attributes: (15, 8)
  - Customers profiled: 15
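
Conceptually, the executor reduces the transaction rows to one row per `group_by` key. The following sketch reproduces that shape for a few of the features in plain Python. It is not the `GroupByProcessor` logic: the sample rows, the `product` field name, the top-3 rule for `favorite_products`, and the most-frequent-weekday rule are all assumptions for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Any, Dict, List

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical transactions (not taken from the dataset)
sample_rows: List[Dict[str, Any]] = [
    {"customer_id": "CLIENT1", "in_dt": "2025-10-03 14:05:00", "product": "PROD2", "price_total": 100.0},
    {"customer_id": "CLIENT1", "in_dt": "2025-10-03 16:30:00", "product": "PROD2", "price_total": 50.0},
    {"customer_id": "CLIENT1", "in_dt": "2025-10-02 11:00:00", "product": "PROD9", "price_total": 30.0},
    {"customer_id": "CLIENT2", "in_dt": "2025-10-02 15:00:00", "product": "PROD6", "price_total": 80.0},
]


def build_profiles(rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Aggregate transaction rows into one profile dict per customer_id."""
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        grouped[row["customer_id"]].append(row)

    profiles = {}
    for customer_id, items in grouped.items():
        total = sum(r["price_total"] for r in items)
        products = Counter(r["product"] for r in items)
        days = Counter(
            DAY_NAMES[datetime.strptime(r["in_dt"], "%Y-%m-%d %H:%M:%S").weekday()]
            for r in items
        )
        profiles[customer_id] = {
            "month_total_spent": total,
            "month_transaction_count": len(items),
            "average_order_value": round(total / len(items), 2),
            "favorite_products": ",".join(p for p, _ in products.most_common(3)),
            "preferred_shopping_day": days.most_common(1)[0][0],
        }
    return profiles


profiles = build_profiles(sample_rows)
print(profiles["CLIENT1"])
```

The result has one entry per customer, mirroring the `(15, 8)` attributes table above: one key column plus one column per feature.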


## 6. View Results

In [7]:
# View customer profiles (aggregated attributes)
attrs = ctx.get_model_attrs(cfg_model['model_name'])
print(f"Customer Profiles (n={len(attrs)}):")
attrs.head(10)

Customer Profiles (n=15):


|  | customer_id | month_total_spent | month_transaction_count | month_items_purchased | average_order_value | favorite_products | preferred_shopping_day | preferred_shopping_time |
|---|---|---|---|---|---|---|---|---|
| 0 | CLIENT1 | 3835713.0 | 43 | 176 | 89202.63 | PROD2,PROD9,PROD8 | Friday | Afternoon |
| 1 | CLIENT10 | 2642342.0 | 33 | 116 | 80070.97 | PROD9,PROD7,PROD1 | Saturday | Afternoon |
| 2 | CLIENT11 | 3263459.0 | 42 | 175 | 77701.4 | PROD9,PROD8,PROD10 | Thursday | Afternoon |
| 3 | CLIENT12 | 4057118.0 | 59 | 184 | 68764.71 | PROD1,PROD3,PROD10 | Thursday | Afternoon |
| 4 | CLIENT13 | 3407417.0 | 31 | 144 | 109916.68 | PROD8,PROD3,PROD6 | Friday | Afternoon |
| 5 | CLIENT14 | 4637137.0 | 50 | 198 | 92742.74 | PROD9,PROD4,PROD10 | Wednesday | Afternoon |
| 6 | CLIENT15 | 3423471.0 | 41 | 166 | 83499.29 | PROD10,PROD7,PROD1 | Thursday | Afternoon |
| 7 | CLIENT2 | 3425998.0 | 44 | 151 | 77863.59 | PROD2,PROD6,PROD9 | Thursday | Afternoon |
| 8 | CLIENT3 | 4731427.0 | 40 | 188 | 118285.68 | PROD7,PROD4,PROD8 | Thursday | Afternoon |
| 9 | CLIENT4 | 2597804.0 | 34 | 106 | 76406.0 | PROD10,PROD3,PROD1 | Wednesday | Afternoon |
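
One relationship can be checked directly from the table: `average_order_value` matches `month_total_spent / month_transaction_count` rounded to two decimals. The sketch below verifies this for the first three rows. That the feature is defined this way is inferred from the output, not confirmed by the feature store definition.

```python
# (customer_id, month_total_spent, month_transaction_count, average_order_value)
# copied from the first three rows of the table above
profile_rows = [
    ("CLIENT1", 3835713.0, 43, 89202.63),
    ("CLIENT10", 2642342.0, 33, 80070.97),
    ("CLIENT11", 3263459.0, 42, 77701.4),
]

# Half a cent of slack covers the two-decimal rounding of the stored value
mismatches = [
    customer_id
    for customer_id, total, count, aov in profile_rows
    if abs(total / count - aov) > 0.005
]
print(mismatches)  # → []
```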


In [8]:
# View summary statistics
print("Customer Spending Summary:")
attrs[['month_total_spent', 'month_transaction_count', 'average_order_value']].describe()

Customer Spending Summary:


|  | month_total_spent | month_transaction_count | average_order_value |
|---|---|---|---|
| count | 15.0 | 15.0 | 15.0 |
| mean | 3566388.0 | 40.6 | 88377.004667 |
| std | 730578.3 | 7.414272 | 13843.930085 |
| min | 2504940.0 | 30.0 | 68764.71 |
| 25% | 3287843.0 | 36.5 | 78967.28 |
| 50% | 3423471.0 | 40.0 | 84928.9 |
| 75% | 3946416.0 | 43.5 | 90972.685 |
| max | 4782574.0 | 59.0 | 118285.68 |


In [9]:
# View preferred shopping patterns
print("Shopping Day Preferences:")
print(attrs['preferred_shopping_day'].value_counts())
print("\nShopping Time Preferences:")
print(attrs['preferred_shopping_time'].value_counts())

Shopping Day Preferences:
preferred_shopping_day
Thursday     7
Friday       3
Saturday     2
Wednesday    2
Tuesday      1
Name: count, dtype: int64

Shopping Time Preferences:
preferred_shopping_time
Afternoon    15
Name: count, dtype: int64
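
Raw counts are easier to compare as shares. The small sketch below converts the day counts printed above into percentages; in pandas the equivalent would be `value_counts(normalize=True)`. It also makes explicit that `preferred_shopping_time` takes a single value for all 15 customers, so it does not distinguish customers in this dataset.

```python
# Counts copied from the value_counts() output above
day_counts = {"Thursday": 7, "Friday": 3, "Saturday": 2, "Wednesday": 2, "Tuesday": 1}
time_counts = {"Afternoon": 15}

total_customers = sum(day_counts.values())
day_shares = {day: round(100 * n / total_customers, 1) for day, n in day_counts.items()}

# A feature with one distinct value carries no signal for segmentation
time_is_constant = len(time_counts) == 1

print(day_shares)        # Thursday accounts for 46.7% of customers
print(time_is_constant)  # → True
```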


## 7. Export to Excel

In [10]:
# Export model results to Excel
exporter = ExcelExporter(ctx)
output_file = f'outputs/{cfg_model["model_name"]}_from_store_export.xlsx'
exporter.export_model(cfg_model['model_name'], output_file, include_input=True)

print(f"✓ Export complete: {output_file}")
print("\nExcel tabs:")
print(f"  1. {cfg_model['input_dataset_name']} (input)")
print(f"  2. {cfg_model['model_name']}_filters")
print(f"  3. {cfg_model['model_name']}_attrs")

✓ Export complete: outputs/customer_profile_from_store_export.xlsx

Excel tabs:
  1. transactions_filters (input)
  2. customer_profile_filters
  3. customer_profile_attrs
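
Excel limits worksheet names to 31 characters, so derived tab names of the form `<model_name>_filters` can overflow for longer model names. Below is a sketch of building the three tab names with truncation that keeps the suffix intact. `sheet_names` is a hypothetical helper and says nothing about how `ExcelExporter` handles this case.

```python
from typing import List

EXCEL_SHEET_NAME_MAX = 31  # Excel's limit on worksheet name length


def sheet_names(model_name: str, input_dataset_name: str) -> List[str]:
    """Build tab names, shortening the model name so each suffix survives."""
    names = [input_dataset_name[:EXCEL_SHEET_NAME_MAX]]
    for suffix in ("_filters", "_attrs"):
        keep = EXCEL_SHEET_NAME_MAX - len(suffix)
        names.append(f"{model_name[:keep]}{suffix}")
    return names


tabs = sheet_names("customer_profile", "transactions_filters")
print(tabs)

long_tabs = sheet_names("customer_profile_with_a_very_long_model_name", "transactions_filters")
print(long_tabs)  # each name is at most 31 characters and the two model tabs stay distinct
```

Truncating the model name rather than the whole string avoids the case where `..._filters` and `..._attrs` collapse to the same 31-character prefix.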


## 8. Save Context State

In [11]:
# Save context state (datasets, config, metadata)
state_dir = save_context_state(ctx=ctx, base_cfg=base_cfg)

print(f"✓ Context state saved: {state_dir}")
print(f"  - Total datasets: {len(ctx.datasets)}")
print(f"\nTo load this state in another notebook:")
print(f"  from src.core.persistence import load_context_state")
# repr() doubles the backslashes so the hint can be pasted as-is on Windows
print(f"  ctx, base_cfg = load_context_state({str(state_dir)!r})")

✓ Context state saved: data\context_states\test_client_20251022_121534
  - Total datasets: 19

To load this state in another notebook:
  from src.core.persistence import load_context_state
  ctx, base_cfg = load_context_state('data\\context_states\\test_client_20251022_121534')
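
When a Windows state path is pasted into an ordinary Python string literal, backslash sequences such as `\t` are interpreted as escapes unless the backslashes are doubled or the literal is raw. The sketch below shows the pitfall and one portable alternative using `pathlib`; the path is the state directory from the output above.

```python
from pathlib import PureWindowsPath

# Pitfall: "\t" inside a normal literal silently becomes a tab character
pasted = "states\test_client"
has_tab = "\t" in pasted

# Safe form 1: a raw string keeps the backslashes as written
raw = r"data\context_states\test_client_20251022_121534"

# Safe form 2: forward slashes, which Python accepts on Windows and elsewhere
portable = PureWindowsPath(raw).as_posix()

print(has_tab)   # → True
print(portable)  # → data/context_states/test_client_20251022_121534
```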
