# Transaction Features (Initial Data Loading)

This notebook is the **FIRST** in the analytics pipeline. It:
- Loads raw transaction data from CSV
- Validates data quality and schema
- Processes transaction-level features (43 filters, 0 attributes)
- Creates the initial context state for downstream notebooks

**Key Characteristics:**
- Does NOT load from context (creates the initial context)
- Processes transaction-level data (no aggregation/group_by)
- Saves context state for product, customer, and other models to reuse

## 0. Project Root Setup (Auto-generated)

In [1]:
# Auto-detect project root and add to Python path
import os
import sys
from pathlib import Path

# Get the project root (2 levels up from notebooks/development or notebooks/from_store)
notebook_dir = Path.cwd() if '__file__' not in globals() else Path(__file__).parent
project_root = notebook_dir.parent.parent

# Change to project root
os.chdir(project_root)

# Add project root to Python path if not already there
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

print(f"Working directory: {os.getcwd()}")
print(f"Project root: {project_root}")

Working directory: c:\Projects\play\khujta_ai_business
Project root: c:\Projects\play\khujta_ai_business


## 1. Configuration

In [2]:
# Base configuration
base_cfg = {
    # 'input_file': 'data/tests/transactions/alsur_transacciones_202510_1.csv',
    # 'input_file': 'data/tests/raw/quick_test_7days.csv',
    'input_file': 'data/tests/raw/test_transactions_30days.csv',
    'client': 'test_client',
    'analysis_dt': '2025-11-11',
    'log_level': 'INFO',
    'fidx_config': {'type': 'local', 'path': 'feature_store'},
    
    # Default formats applied to all columns of each dtype (unless overridden)
    'default_formats': {
        'date': '%m/%d/%Y %H:%M',  # Month/day/year with 24-hour time (month-first, not European day-first)
        'float': {'thousands': '.', 'decimal': ','},  # European numeric format
        'int': {'thousands': '.', 'decimal': ','},
    },
    
    # Data schema: column mapping + types
    'data_schema': {
        'in_dt': {
            'source_column': 'in_dt',
            'dtype': 'date',
            # 'format': '%d-%m-%Y %H:%M'
        },
        'in_trans_id': {
            'source_column': 'in_trans_id',
            'dtype': 'str'
        },
        'in_trans_type': {
            'source_column': 'in_trans_type',
            'dtype': 'str'
        },
        'in_customer_id': {
            'source_column': 'in_customer_id',
            'dtype': 'str'
        },
        'in_product_id': {
            'source_column': 'in_product_id',
            'dtype': 'str'
        },
        'in_description': {
            'source_column': 'in_description',
            'dtype': 'str'
        },
        'in_category': {
            'source_column': 'in_category',
            'dtype': 'str'
        },
        'in_unit_type': {
            'source_column': 'in_unit_type',
            'dtype': 'str'
        },
        'in_stock': {
            'source_column': 'in_stock',
            'dtype': 'float'
        },
        'in_quantity': {
            'source_column': 'in_quantity',
            'dtype': 'float'
        },
        'in_cost_unit': {
            'source_column': 'in_cost_unit',
            'dtype': 'float'
        },
        'in_cost_total': {
            'source_column': 'in_cost_total',
            'dtype': 'float'
        },
        'in_price_unit': {
            'source_column': 'in_price_unit',
            'dtype': 'float'
        },
        'in_price_total': {
            'source_column': 'in_price_total',
            'dtype': 'float'
        },
        'in_discount_total': {
            'source_column': 'in_discount_total',
            'dtype': 'float'
        },
        'in_commission_total': {
            'source_column': 'in_commission_total',
            'dtype': 'float'
        },
        'in_margin': {
            'source_column': 'in_margin',
            'dtype': 'float'
        },
    
    # # Data schema: column mapping + types
    # 'data_schema': {
    #     'in_dt': {
    #         'source_column': 'Fecha venta',
    #         'dtype': 'date',
    #         # 'format': '%d-%m-%Y %H:%M'
    #     },
    #     'in_trans_id': {
    #         'source_column': 'N° doc. venta',
    #         'dtype': 'str'
    #     },
    #     'in_trans_type': {
    #         'source_column': 'Método pago',
    #         'dtype': 'str'
    #     },
    #     'in_customer_id': {
    #         'source_column': 'Nombre cliente',
    #         'dtype': 'str'
    #     },
    #     'in_product_id': {
    #         'source_column': 'SKU',
    #         'dtype': 'str'
    #     },
    #     'in_description': {
    #         'source_column': 'Nombre',
    #         'dtype': 'str'
    #     },
    #     'in_category': {
    #         'source_column': 'Categoría',
    #         'dtype': 'str'
    #     },
    #     'in_unit_type': {
    #         'source_column': 'Tipo unidad',
    #         'dtype': 'str'
    #     },
    #     'in_stock': {
    #         'source_column': 'Stock actual',
    #         'dtype': 'float'
    #     },
    #     'in_quantity': {
    #         'source_column': 'Cantidad',
    #         'dtype': 'float'
    #     },
    #     'in_cost_unit': {
    #         'source_column': 'Costo unitario',
    #         'dtype': 'float'
    #     },
    #     'in_cost_total': {
    #         'source_column': 'Total costo',
    #         'dtype': 'float'
    #     },
    #     'in_price_unit': {
    #         'source_column': 'Precio unitario',
    #         'dtype': 'float'
    #     },
    #     'in_price_total': {
    #         'source_column': 'Total neto',
    #         'dtype': 'float'
    #     },
    #     'in_discount_total': {
    #         'source_column': 'Total descuento',
    #         'dtype': 'float'
    #     },
    #     'in_commission_total': {
    #         'source_column': 'Total comisión',
    #         'dtype': 'float'
    #     },
    #     'in_margin': {
    #         'source_column': 'Utilidad aprox.',
    #         'dtype': 'float'
    #     },   
    }
}
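
The `default_formats` entries describe how raw strings are meant to be interpreted. As a rough illustration only, the same separators and date pattern can be applied with the standard library. The helpers `parse_number` and `parse_date` below are hypothetical and are not the project's `SchemaProcessor` logic:

```python
from datetime import datetime

# Values copied from base_cfg['default_formats'] above.
DATE_FORMAT = "%m/%d/%Y %H:%M"
NUMBER_FORMAT = {"thousands": ".", "decimal": ","}


def parse_number(text: str, thousands: str = ".", decimal: str = ",") -> float:
    """Hypothetical helper: drop thousands separators, then normalize the decimal mark."""
    cleaned = text.strip().replace(thousands, "").replace(decimal, ".")
    return float(cleaned)


def parse_date(text: str, fmt: str = DATE_FORMAT) -> datetime:
    """Hypothetical helper: parse a date string with the configured strptime pattern."""
    return datetime.strptime(text.strip(), fmt)


amount = parse_number("1.234,56", **NUMBER_FORMAT)
when = parse_date("10/01/2025 01:02")
print(amount)  # 1234.56
print(when)    # 2025-10-01 01:02:00
```

With these separators, this sketch reads a string such as `52964.0` as 529640.0, because the dot is treated as a thousands separator. Whether the real loader behaves the same way is not shown in this notebook.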

## 2. Imports + Initialize Context + Setup Logging

In [3]:
import pandas as pd
import numpy as np

# v2.0 Refactored imports
from src.utils.logger import setup_logging, get_logger
from src.core.context import GabedaContext
from src.preprocessing.loaders import DataLoader
from src.preprocessing.validators import DataValidator
from src.preprocessing.schema import SchemaProcessor
from src.preprocessing.synthetic import SyntheticEnricher
from src.features.store import FeatureStore
from src.features.resolver import DependencyResolver
from src.features.detector import FeatureTypeDetector
from src.features.analyzer import FeatureAnalyzer
from src.execution.calculator import FeatureCalculator
from src.execution.groupby import GroupByProcessor
from src.execution.executor import ModelExecutor
from src.export.excel import ExcelExporter

# Setup logging
setup_logging(log_level=base_cfg.get('log_level', 'INFO'), config={'client': base_cfg.get('client', 'unknown_client')})
logger = get_logger(__name__)

# Initialize context (creates NEW context)
ctx = GabedaContext(base_cfg)

print(f"✓ Imports successful")
print(f"✓ Logging initialized")
print(f"✓ Context initialized: {ctx.run_id}")

📝 Run instance ID: test_client_20251022_121117 - Logging [INFO] to: logs\test_client_20251022_121117.log
✓ Imports successful
✓ Logging initialized
✓ Context initialized: test_client_20251022_121117


## 3. Load & Validate Data

Complete data loading pipeline:
1. Load raw CSV
2. Validate schema and data quality
3. Process schema (column mapping + type conversion)
4. Synthetic enrichment (auto-inference of missing fields)
5. Row-level validation (reject invalid rows)
6. Store datasets in context

In [4]:
# ============================================================
# SECTION 1: Load raw data
# ============================================================
loader = DataLoader()
raw_data = loader.load_csv(base_cfg['input_file'])
print(f"✓ Loaded raw data: {raw_data.shape}")

# ============================================================
# SECTION 2: Comprehensive validation (schema + data quality)
# ============================================================
validator = DataValidator()
required_cols = [spec['source_column'] for spec in base_cfg['data_schema'].values()]

validation = validator.validate_all(
    df=raw_data,
    required_cols=required_cols,
    data_schema=base_cfg['data_schema'],
    default_formats=base_cfg['default_formats']
)

if not validation.is_valid:
    print("❌ Validation FAILED!")
    print(f"\nErrors ({len(validation.errors)}):")
    for error in validation.errors:
        print(f"  • {error}")
    raise ValueError("Fix validation errors before proceeding")

if validation.warnings:
    print(f"⚠️  Warnings ({len(validation.warnings)}):")
    for warning in validation.warnings[:5]:  # Show first 5
        print(f"  • {warning}")
    if len(validation.warnings) > 5:
        print(f"  ... and {len(validation.warnings) - 5} more")

print("✓ Validation passed")

# ============================================================
# SECTION 3: Process schema (column mapping + type conversion)
# ============================================================
schema_processor = SchemaProcessor()
preprocessed_df = schema_processor.process_schema(raw_data, base_cfg).df
print(f"✓ Schema processed: {preprocessed_df.shape}")

# ============================================================
# SECTION 4: Synthetic enrichment (auto-inference)
# ============================================================
enricher = SyntheticEnricher(synthetic_model_name='synthetic')
preprocessed_df = enricher.enrich(data=preprocessed_df)
print(f"✓ Data enriched: {preprocessed_df.shape}")

# ============================================================
# SECTION 5: Row-level validation (reject nulls in required fields)
# ============================================================
transactions_enriched, reject_result = validator.validate_row_level_required_fields(
    df=preprocessed_df,
    data_schema=base_cfg['data_schema'],
    save_to_file=False  # Use context instead of files
)

# ============================================================
# SECTION 6: Store datasets in context
# ============================================================
ctx.set_dataset('transactions_raw', raw_data)
ctx.set_dataset('transactions_enriched', transactions_enriched)
if reject_result.rejected_rows is not None:
    ctx.set_dataset('transactions_rejected', reject_result.rejected_rows)

# ============================================================
# SECTION 7: Report data quality summary
# ============================================================
if reject_result.rejected_rows is not None:
    print(f"\n📊 Data Quality Summary:")
    print(f"  Total rows: {len(preprocessed_df)}")
    print(f"  Clean rows: {len(transactions_enriched)}")
    print(f"  Rejected rows: {len(reject_result.rejected_rows)}")
    print(f"  Rejection rate: {len(reject_result.rejected_rows) / len(preprocessed_df) * 100:.2f}%")
    
    # Show top rejection reasons
    print(f"\n❌ Top Rejection Reasons:")
    reasons_count = {}
    for reason in reject_result.rejection_reasons.values():
        reasons_count[reason] = reasons_count.get(reason, 0) + 1
    
    for reason, count in sorted(reasons_count.items(), key=lambda x: x[1], reverse=True)[:5]:
        print(f"  • {reason}: {count} rows")
else:
    print(f"\n✓ All {len(transactions_enriched)} rows passed validation")

print(f"\n✓ Data loading complete!")
print(f"  Datasets: {ctx.list_datasets()}")

✓ Loaded raw data: (609, 17)
✓ Validation passed
✓ Schema processed: (609, 17)
✓ Data enriched: (609, 17)

✓ All 609 rows passed validation

✓ Data loading complete!
  Datasets: ['transactions_raw', 'transactions_enriched']
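
The rejection-reason tally in Section 7 of the cell above counts dictionary values by hand. A minimal equivalent using `collections.Counter` from the standard library is sketched below. The `rejection_reasons` mapping is made-up sample data, assuming (as the loop above implies) that it maps a row identifier to a reason string:

```python
from collections import Counter

# Hypothetical sample: row index -> rejection reason (not real output from this run).
rejection_reasons = {
    12: "in_product_id is null",
    40: "in_dt is null",
    41: "in_product_id is null",
}

# most_common(5) returns up to five reasons, highest count first.
top_reasons = Counter(rejection_reasons.values()).most_common(5)
for reason, count in top_reasons:
    print(f"  • {reason}: {count} rows")
```

This removes the manual `dict.get(..., 0) + 1` bookkeeping and the explicit sort.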


## 4. View Datasets

In [5]:
print("Available Datasets:")
for name in ctx.list_datasets():
    df = ctx.get_dataset(name)
    print(f"  - {name}: {df.shape}")

print("\nSample of enriched transactions:")
display(transactions_enriched.head())

# Optionally view rejected rows
if 'transactions_rejected' in ctx.list_datasets():
    rejected = ctx.get_dataset('transactions_rejected')
    print(f"\nRejected Rows: {len(rejected)}")
    display(rejected[['in_dt', 'in_trans_id', 'in_product_id', 'rejection_reason']].head(10))

Available Datasets:
  - transactions_raw: (609, 17)
  - transactions_enriched: (609, 17)

Sample of enriched transactions:


Unnamed: 0,in_dt,in_trans_id,in_product_id,in_quantity,in_price_total,in_trans_type,in_customer_id,in_description,in_category,in_unit_type,in_stock,in_cost_unit,in_cost_total,in_price_unit,in_discount_total,in_commission_total,in_margin
0,2025-10-01 01:02:00,trans000002,prod8,2.0,52964.0,return,client13,product 8,category B,pack,61.0,18792.0,37585.0,26482.0,0.0,2791.0,12587.0
1,2025-10-01 06:24:00,trans000011,prod4,6.0,177195.0,sale,client6,product 4,category B,unit,30.0,21526.0,129155.0,29533.0,28405.0,8102.0,11533.0
2,2025-10-01 08:38:00,trans000004,prod7,2.0,70492.0,return,client12,product 7,category A,unit,78.0,25754.0,51509.0,35246.0,5192.0,3843.0,9947.0
3,2025-10-01 09:59:00,trans000021,prod2,4.0,86751.0,sale,client3,product 2,category A,unit,80.0,12947.0,51786.0,21688.0,0.0,4656.0,30309.0
4,2025-10-01 10:07:00,trans000003,prod3,3.0,76465.0,sale,client12,product 3,category B,unit,47.0,16943.0,5083.0,25488.0,0.0,2877.0,22758.0


## 5. Define Feature Functions

Transaction-level features (43 filters, 0 attributes):

In [6]:
DEFAULT_FLOAT = -16.0  # Sentinel returned by the margin features when the computed margin is negative

# ============================================================
# Time conversion utility
# ============================================================
def timestamp(in_dt) -> pd.Timestamp:
    """Convert datetime input to pd.Timestamp to ensure all datetime attributes are accessible."""
    if isinstance(in_dt, pd.Timestamp):
        return in_dt
    return pd.Timestamp(in_dt)

# ============================================================
# Time features (hour/minute extraction)
# ============================================================
def hour(timestamp: pd.Timestamp) -> int:
    return pd.Timestamp(timestamp).hour

def minute(timestamp: pd.Timestamp) -> int:
    return pd.Timestamp(timestamp).minute

def is_morning(timestamp: pd.Timestamp) -> bool:
    hour = pd.Timestamp(timestamp).hour
    return 5 <= hour < 12  # Ends where is_afternoon starts, so 11:00-11:59 is not left unassigned

def is_afternoon(timestamp: pd.Timestamp) -> bool:
    hour = pd.Timestamp(timestamp).hour
    return 12 <= hour < 17

def is_evening(timestamp: pd.Timestamp) -> bool:
    hour = pd.Timestamp(timestamp).hour
    return 17 <= hour < 22

def is_night(timestamp: pd.Timestamp) -> bool:
    hour = pd.Timestamp(timestamp).hour
    return hour >= 22 or hour < 5

# ============================================================
# Date features
# ============================================================
def dt_year(timestamp: pd.Timestamp) -> int:
    return pd.Timestamp(timestamp).year

def dt_month(timestamp: pd.Timestamp) -> int:
    return pd.Timestamp(timestamp).month

def dt_day(timestamp: pd.Timestamp) -> int:
    return pd.Timestamp(timestamp).day

def dt_date(dt_year: int, dt_month: int, dt_day: int) -> str:
    """Build a YYYYMMDD date string from year, month, and day."""
    return str(int(dt_year * 10000 + dt_month * 100 + dt_day))

def dt_weekday(timestamp: pd.Timestamp) -> int:
    return pd.Timestamp(timestamp).dayofweek

def dt_weekday_name(timestamp: pd.Timestamp) -> str:
    return pd.Timestamp(timestamp).day_name()

def dt_weekofyear(timestamp: pd.Timestamp) -> int:
    return pd.Timestamp(timestamp).isocalendar().week

def dt_quarter(timestamp: pd.Timestamp) -> int:
    return pd.Timestamp(timestamp).quarter

def dayofyear(timestamp: pd.Timestamp) -> int:
    return pd.Timestamp(timestamp).dayofyear

def is_leap_year(timestamp: pd.Timestamp) -> bool:
    return pd.Timestamp(timestamp).is_leap_year

def is_month_start(timestamp: pd.Timestamp) -> bool:
    return pd.Timestamp(timestamp).is_month_start

def is_month_end(timestamp: pd.Timestamp) -> bool:
    return pd.Timestamp(timestamp).is_month_end

def is_quarter_start(timestamp: pd.Timestamp) -> bool:
    return pd.Timestamp(timestamp).is_quarter_start

def is_quarter_end(timestamp: pd.Timestamp) -> bool:
    return pd.Timestamp(timestamp).is_quarter_end

def is_year_start(timestamp: pd.Timestamp) -> bool:
    return pd.Timestamp(timestamp).is_year_start

def is_year_end(timestamp: pd.Timestamp) -> bool:
    return pd.Timestamp(timestamp).is_year_end

def is_weekend(timestamp: pd.Timestamp) -> bool:
    return pd.Timestamp(timestamp).dayofweek >= 5

# ============================================================
# Transaction/Product/Customer ID features
# ============================================================
def trans_id(in_trans_id: str) -> str:
    return str(in_trans_id).upper()

def trans_type(in_trans_type: str) -> str:
    return str(in_trans_type).upper()

def customer_id(in_customer_id: str) -> str:
    return str(in_customer_id).upper()

def product_id(in_product_id: str) -> str:
    return str(in_product_id).upper()

def description(in_description: str) -> str:
    return str(in_description).upper()

def category(in_category: str) -> str:
    return str(in_category).upper()

def unit_type(in_unit_type: str) -> str:
    return str(in_unit_type).upper()

# ============================================================
# Numeric features (stock, quantity, cost, price)
# ============================================================
def stock(in_stock: float) -> float:
    return float(in_stock)

def quantity(in_quantity: int) -> int:
    return int(in_quantity)

def cost_unit(in_cost_unit: float) -> float:
    return float(in_cost_unit)

def cost_total(in_cost_total: float) -> float:
    return float(in_cost_total)

def price_unit(in_price_unit: float) -> float:
    return float(in_price_unit)

def price_total(in_price_total: float) -> float:
    return float(in_price_total)

# ============================================================
# Margin calculations
# ============================================================
def margin_unit(in_price_unit, in_cost_unit):
    margin_unit = in_price_unit - in_cost_unit
    return margin_unit if margin_unit >= 0.0 else DEFAULT_FLOAT

def margin_unit_pct(in_price_unit, in_cost_unit):
    if pd.isna(in_price_unit) or pd.isna(in_cost_unit):
        return np.nan
    if in_price_unit == 0:  # Guard against division by zero (abs(x) < 0.0 is never true)
        return 0.0
    margin_unit_pct = round(((in_price_unit - in_cost_unit) / in_price_unit) * 100.0, 2)
    return margin_unit_pct if margin_unit_pct >= 0.0 else DEFAULT_FLOAT

def margin_unit_valid(margin_unit_pct, in_cost_unit, in_price_unit):
    if pd.isna(margin_unit_pct) or pd.isna(in_cost_unit) or pd.isna(in_price_unit):
        return False
    return margin_unit_pct >= 0.0 and in_cost_unit >= 0.0 and in_price_unit >= 0.0

def margin_total(in_price_total, in_cost_total):
    margin_total = in_price_total - in_cost_total
    return margin_total if margin_total >= 0.0 else DEFAULT_FLOAT

def margin_total_pct(in_price_total, in_cost_total):
    if pd.isna(in_price_total) or pd.isna(in_cost_total):
        return np.nan
    if in_price_total == 0:  # Guard against division by zero (abs(x) < 0.0 is never true)
        return 0.0
    margin_total_pct = round(((in_price_total - in_cost_total) / in_price_total) * 100.0, 2)
    return margin_total_pct if margin_total_pct >= 0.0 else DEFAULT_FLOAT

def margin_total_valid(margin_total_pct, in_cost_total, in_price_total):
    if pd.isna(margin_total_pct) or pd.isna(in_cost_total) or pd.isna(in_price_total):
        return False
    return margin_total_pct >= 0.0 and in_cost_total >= 0.0 and in_price_total >= 0.0

print("✓ Feature functions defined (43 transaction-level features)")

✓ Feature functions defined (43 transaction-level features)
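
The four time-of-day flags are presumably meant to split the day so that each hour belongs to exactly one bucket. A small check for that property is sketched below. The `BUCKETS` table and the `uncovered_hours` helper are hypothetical and use illustrative boundaries, not values read from the feature store:

```python
# Hypothetical bucket table: name -> (start_hour, end_hour), end exclusive.
# A start greater than the end means the bucket wraps past midnight.
BUCKETS = {
    "morning": (5, 12),
    "afternoon": (12, 17),
    "evening": (17, 22),
    "night": (22, 5),
}


def in_bucket(hour: int, start: int, end: int) -> bool:
    """True if hour falls in [start, end), allowing wrap-around past midnight."""
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end


def uncovered_hours(buckets: dict) -> list:
    """Return the hours 0-23 that do not fall in exactly one bucket."""
    bad = []
    for hour in range(24):
        hits = sum(in_bucket(hour, start, end) for start, end in buckets.values())
        if hits != 1:
            bad.append(hour)
    return bad


print(uncovered_hours(BUCKETS))  # []

# Ending the morning bucket at 11 instead of 12 leaves hour 11 unassigned.
gappy = dict(BUCKETS, morning=(5, 11))
print(uncovered_hours(gappy))  # [11]
```

A check like this is cheap to run whenever the bucket boundaries change, since a transaction that matches none of the flags would otherwise go unnoticed.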


## 6. Features Dict + Model Config

In [7]:
# Features dictionary
features = {
    'timestamp': timestamp,
    'hour': hour,
    'minute': minute,
    'is_morning': is_morning,
    'is_afternoon': is_afternoon,
    'is_evening': is_evening,
    'is_night': is_night,
    'dt_date': dt_date,
    'dt_year': dt_year,
    'dt_month': dt_month,
    'dt_day': dt_day,
    'dt_weekday': dt_weekday,
    'dt_weekday_name': dt_weekday_name,
    'dt_weekofyear': dt_weekofyear,
    'dt_quarter': dt_quarter,
    'dayofyear': dayofyear,
    'is_leap_year': is_leap_year,
    'is_month_start': is_month_start,
    'is_month_end': is_month_end,
    'is_quarter_start': is_quarter_start,
    'is_quarter_end': is_quarter_end,
    'is_year_start': is_year_start,
    'is_year_end': is_year_end,
    'is_weekend': is_weekend,
    'trans_id': trans_id,
    'trans_type': trans_type,
    'customer_id': customer_id,
    'product_id': product_id,
    'description': description,
    'category': category,
    'unit_type': unit_type,
    'stock': stock,
    'quantity': quantity,
    'cost_unit': cost_unit,
    'cost_total': cost_total,
    'price_unit': price_unit,
    'price_total': price_total,
    'margin_unit': margin_unit,
    'margin_unit_pct': margin_unit_pct,
    'margin_unit_valid': margin_unit_valid,
    'margin_total': margin_total,
    'margin_total_pct': margin_total_pct,
    'margin_total_valid': margin_total_valid,
}

# Model configuration (transaction-level, no group_by)
cfg_product = {
    'model_name': 'transactions',
    'input_dataset_name': 'transactions_enriched',
    'row_id': 'in_trans_id',
    'output_cols': list(features.keys()),
    'features': features
}

print("✓ Model configured")
print(f"  - Model: {cfg_product['model_name']}")
print(f"  - Row ID: {cfg_product['row_id']}")
print(f"  - Output columns: {len(cfg_product['output_cols'])}")
print(f"  - Group by: None (transaction-level)")

✓ Model configured
  - Model: transactions
  - Row ID: in_trans_id
  - Output columns: 43
  - Group by: None (transaction-level)


## 7. Feature Store + Dependency Resolution + Save Config

In [8]:
# ============================================================
# SECTION 1: Initialize feature store and store features
# ============================================================
feature_store = FeatureStore()
feature_store.store_features(features, model_name=cfg_product['model_name'], auto_save=True)
print(f"✓ Stored {len(feature_store.features)} features in store")

# ============================================================
# SECTION 2: Resolve dependencies (DFS traversal)
# ============================================================
resolver = DependencyResolver(feature_store)
in_cols, exec_seq, ext_cols = resolver.resolve_dependencies(
    output_cols=cfg_product['output_cols'],
    available_cols=transactions_enriched.columns.tolist(),
    group_by=cfg_product.get('group_by'),  # None for transaction-level
    model=cfg_product['model_name']
)

# Update config with dependency resolution results
cfg_product['in_cols'] = in_cols
cfg_product['exec_seq'] = exec_seq
cfg_product['ext_cols'] = ext_cols

print("✓ Dependencies resolved")
print(f"  - Input columns needed: {len(in_cols)}")
print(f"  - Execution sequence: {len(exec_seq)} features")

# ============================================================
# SECTION 3: Save master configuration
# ============================================================
feature_store.save_master_config(
    model_name=cfg_product['model_name'],
    model_config=cfg_product
)

print(f"✓ master_cfg.json saved for model '{cfg_product['model_name']}'")
print(f"  Location: feature_store/{cfg_product['model_name']}/master_cfg.json")

✓ Stored 43 features in store
✓ Dependencies resolved
  - Input columns needed: 14
  - Execution sequence: 43 features
✓ master_cfg.json saved for model 'transactions'
  Location: feature_store/transactions/master_cfg.json
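
Each feature function names its inputs through its parameters: `dt_date(dt_year, dt_month, dt_day)` depends on three other features, which depend on `timestamp`, which depends on the raw column `in_dt`. The sketch below shows one way a depth-first traversal could turn that into an execution order. It illustrates the general technique under that assumption and is not the actual `DependencyResolver` implementation; the `resolve` function and the toy features are hypothetical:

```python
import inspect


# Toy features: only their parameter names matter for resolution.
def timestamp(in_dt):
    return in_dt


def dt_year(timestamp):
    return timestamp.year


def dt_month(timestamp):
    return timestamp.month


def dt_day(timestamp):
    return timestamp.day


def dt_date(dt_year, dt_month, dt_day):
    return str(dt_year * 10000 + dt_month * 100 + dt_day)


def resolve(output_cols, features, available_cols):
    """Return (input columns, execution order) via depth-first traversal."""
    in_cols, exec_seq = [], []
    visiting, done = set(), set()

    def visit(name):
        if name in done:
            return
        if name not in features:
            # Not a feature, so it must be a raw input column.
            if name not in available_cols:
                raise KeyError(f"'{name}' is neither a feature nor an input column")
            if name not in in_cols:
                in_cols.append(name)
            return
        if name in visiting:
            raise ValueError(f"circular dependency at '{name}'")
        visiting.add(name)
        for param in inspect.signature(features[name]).parameters:
            visit(param)  # dependencies are scheduled before the feature itself
        visiting.discard(name)
        done.add(name)
        exec_seq.append(name)

    for col in output_cols:
        visit(col)
    return in_cols, exec_seq


toy_features = {f.__name__: f for f in (timestamp, dt_year, dt_month, dt_day, dt_date)}
in_cols, exec_seq = resolve(["dt_date"], toy_features, available_cols=["in_dt"])
print(in_cols)   # ['in_dt']
print(exec_seq)  # ['timestamp', 'dt_year', 'dt_month', 'dt_day', 'dt_date']
```

Read the same way, the 43 functions defined earlier reference 14 distinct `in_*` parameters, which matches the count reported above; `in_discount_total`, `in_commission_total`, and `in_margin` are not used by any feature.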


## 8. Initialize Components + Execute + Store

In [9]:
# ============================================================
# SECTION 1: Initialize execution components
# ============================================================
detector = FeatureTypeDetector()
analyzer = FeatureAnalyzer(feature_store, detector)
calculator = FeatureCalculator()
groupby_processor = GroupByProcessor(calculator, detector)
executor = ModelExecutor(analyzer, groupby_processor)

print("✓ Execution pipeline initialized")

# ============================================================
# SECTION 2: Execute model
# ============================================================
output = executor.execute_model(
    data_in=transactions_enriched,
    cfg_model=cfg_product,
    input_dataset_name=cfg_product['input_dataset_name']
)

# ============================================================
# SECTION 3: Store results in context
# ============================================================
ctx.set_model_output(cfg_product['model_name'], output, cfg_product)

print("\n✓ Model executed successfully!")
print(f"  - Filters: {output['filters'].shape if output['filters'] is not None else 'None'}")
print(f"  - Attributes: {output['attrs'].shape if output['attrs'] is not None else 'None'}")
print(f"  - Filter columns: {len(output['exec_fltrs'])}")
print(f"  - Attribute columns: {len(output['exec_attrs'])}")

✓ Execution pipeline initialized

✓ Model executed successfully!
  - Filters: (609, 60)
  - Attributes: None
  - Filter columns: 59
  - Attribute columns: 0
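
Once an execution sequence is known, each feature can be computed from values that are already available for the row. The sketch below illustrates that idea on a single row held as a dictionary. `run_features` is a hypothetical helper, the two features are simplified copies of the ones defined earlier, and the real `ModelExecutor` may well operate on whole DataFrames rather than row by row:

```python
import inspect


def margin_unit_pct(in_price_unit, in_cost_unit):
    # Simplified: no NaN or zero-price handling in this sketch.
    return round((in_price_unit - in_cost_unit) / in_price_unit * 100.0, 2)


def margin_unit_valid(margin_unit_pct, in_cost_unit, in_price_unit):
    return margin_unit_pct >= 0.0 and in_cost_unit >= 0.0 and in_price_unit >= 0.0


features = {"margin_unit_pct": margin_unit_pct, "margin_unit_valid": margin_unit_valid}
exec_seq = ["margin_unit_pct", "margin_unit_valid"]  # dependency before dependent


def run_features(row: dict, features: dict, exec_seq: list) -> dict:
    """Evaluate features in order, feeding each one by parameter name."""
    values = dict(row)
    for name in exec_seq:
        func = features[name]
        args = [values[param] for param in inspect.signature(func).parameters]
        values[name] = func(*args)  # result becomes available to later features
    return values


# Unit price and cost taken from the first sample row shown earlier.
row = {"in_price_unit": 26482.0, "in_cost_unit": 18792.0}
result = run_features(row, features, exec_seq)
print(result["margin_unit_pct"])    # 29.04
print(result["margin_unit_valid"])  # True
```

Because `margin_unit_valid` takes `margin_unit_pct` as a parameter, it only works if the percentage has been computed first, which is what the execution sequence guarantees.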


## 9. View Results

In [10]:
print("Available Datasets:")
for name in ctx.list_datasets():
    df = ctx.get_dataset(name)
    print(f"  - {name}: {df.shape}")

# View filters (transaction-level results)
filters = ctx.get_model_filters(cfg_product['model_name'])
print(f"\n{cfg_product['model_name'].capitalize()} - Filters:")
display(filters.head())

# Attributes should be None for transaction-level
attrs = ctx.get_model_attrs(cfg_product['model_name'])
print(f"\n{cfg_product['model_name'].capitalize()} - Attributes:")
if attrs is not None:
    display(attrs.head())
else:
    print("None (transaction-level model has no aggregations)")

Available Datasets:
  - transactions_raw: (609, 17)
  - transactions_enriched: (609, 17)
  - transactions_filters: (609, 59)

Transactions - Filters:


Unnamed: 0,in_dt,in_product_id,in_quantity,in_price_total,in_trans_type,in_customer_id,in_description,in_category,in_unit_type,in_stock,...,cost_unit,cost_total,price_unit,price_total,margin_unit,margin_unit_pct,margin_unit_valid,margin_total,margin_total_pct,margin_total_valid
0,2025-10-01 01:02:00,prod8,2.0,52964.0,return,client13,product 8,category B,pack,61.0,...,18792.0,37585.0,26482.0,52964.0,7690.0,29.040001,True,15379.0,29.040001,True
1,2025-10-01 06:24:00,prod4,6.0,177195.0,sale,client6,product 4,category B,unit,30.0,...,21526.0,129155.0,29533.0,177195.0,8007.0,27.110001,True,48040.0,27.110001,True
2,2025-10-01 08:38:00,prod7,2.0,70492.0,return,client12,product 7,category A,unit,78.0,...,25754.0,51509.0,35246.0,70492.0,9492.0,26.93,True,18983.0,26.93,True
3,2025-10-01 09:59:00,prod2,4.0,86751.0,sale,client3,product 2,category A,unit,80.0,...,12947.0,51786.0,21688.0,86751.0,8741.0,40.299999,True,34965.0,40.310001,True
4,2025-10-01 10:07:00,prod3,3.0,76465.0,sale,client12,product 3,category B,unit,47.0,...,16943.0,5083.0,25488.0,76465.0,8545.0,33.529999,True,71382.0,93.349998,True



Transactions - Attributes:
None (transaction-level model has no aggregations)
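
The margin columns in the sample output can be re-derived by hand, which also surfaces a possible data issue. The sketch below recomputes the percentages for rows 0 and 4 from the values printed above and compares `in_cost_total` against `in_cost_unit * in_quantity`. The `pct` and `cost_mismatch` helpers and the 1% tolerance are illustrative assumptions:

```python
def pct(price: float, cost: float) -> float:
    """Margin as a percentage of price, rounded to 2 decimals."""
    return round((price - cost) / price * 100.0, 2)


def cost_mismatch(cost_unit: float, quantity: float, cost_total: float, tol: float = 0.01) -> bool:
    """True if cost_total differs from cost_unit * quantity by more than tol (relative)."""
    expected = cost_unit * quantity
    return abs(expected - cost_total) > tol * expected


# Values copied from rows 0 and 4 of the sample output.
rows = {
    0: dict(quantity=2.0, cost_unit=18792.0, cost_total=37585.0,
            price_unit=26482.0, price_total=52964.0),
    4: dict(quantity=3.0, cost_unit=16943.0, cost_total=5083.0,
            price_unit=25488.0, price_total=76465.0),
}

report = {}
for idx, r in rows.items():
    unit_pct = pct(r["price_unit"], r["cost_unit"])
    total_pct = pct(r["price_total"], r["cost_total"])
    flagged = cost_mismatch(r["cost_unit"], r["quantity"], r["cost_total"])
    report[idx] = (unit_pct, total_pct, flagged)
    print(idx, unit_pct, total_pct, flagged)
# 0 29.04 29.04 False
# 4 33.53 93.35 True
```

Row 4's total margin of 93.35% is far above its unit margin of 33.53% because `in_cost_total` (5083.0) is roughly a tenth of `in_cost_unit * in_quantity` (50829.0), which may indicate a truncated value in the test data. Both `margin_*_valid` flags are still `True` for that row, since they only check for non-negative values.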


## 10. Export to Excel

In [11]:
# Initialize exporter
exporter = ExcelExporter(ctx)

# Export model
output_file = f'outputs/{cfg_product["model_name"]}_export.xlsx'
exporter.export_model(cfg_product['model_name'], output_file, include_input=True)

print(f"✓ Export complete!")
print(f"  File saved: {output_file}")
print("\n  Excel Structure:")
print("    Tab 1: transactions_enriched (input dataset)")
print(f"    Tab 2: {cfg_product['model_name']}_filters")
print(f"    Tab 3: {cfg_product['model_name']}_attrs (empty for transaction-level)")

✓ Export complete!
  File saved: outputs/transactions_export.xlsx

  Excel Structure:
    Tab 1: transactions_enriched (input dataset)
    Tab 2: transactions_filters
    Tab 3: transactions_attrs (empty for transaction-level)


## 11. Save Context State

Save the complete context state for downstream notebooks (product, customer models):

In [12]:
from src.core.persistence import save_context_state

# Save context state (creates initial state for downstream notebooks)
state_dir = save_context_state(
    ctx=ctx,
    base_cfg=base_cfg,
    reuse_existing=False  # Always create fresh state for initial notebook
)

print(f"✓ Context state saved!")
print(f"  - Location: {state_dir}")
print(f"  - Datasets saved: {len(ctx.datasets)}")
print(f"  - Available for reuse: {list(ctx.datasets.keys())}")
print(f"\nDownstream notebooks can load this state with:")
print(f"  from src.core.persistence import load_context_state")
# Raw-string prefix keeps Windows backslashes in the printed snippet from acting as escapes (e.g. \t)
print(f"  ctx, base_cfg = load_context_state(r'{state_dir}')")

✓ Context state saved!
  - Location: data\context_states\test_client_20251022_121117
  - Datasets saved: 3
  - Available for reuse: ['transactions_raw', 'transactions_enriched', 'transactions_filters']

Downstream notebooks can load this state with:
  from src.core.persistence import load_context_state
  ctx, base_cfg = load_context_state(r'data\context_states\test_client_20251022_121117')
