# CusFlow - Exploration Notebook

This notebook walks through the core workflow of the CusFlow recommendation system: generating synthetic data, training a LambdaMART ranker, and evaluating it offline.

## Contents
1. Data Generation
2. Model Training
3. Evaluation

In [None]:
# Setup
import sys
sys.path.insert(0, '..')

import numpy as np
import pandas as pd
from pathlib import Path

# CusFlow imports
from src.config import Domain, get_settings
from src.data.loaders import SyntheticDataGenerator, DataLoader
from src.ranking.lambdamart import LambdaMARTRanker
from src.ranking.feature_engineering import FeatureEngineer
from src.evaluation.metrics import RankingMetrics, ndcg_at_k

print("CusFlow loaded successfully!")

## 1. Data Generation

`SyntheticDataGenerator` can produce synthetic data for hotels, wealth reports, or e-commerce. This notebook uses the hotel domain.

In [None]:
# Generate hotel data
generator = SyntheticDataGenerator(domain=Domain.HOTEL, seed=42)

items = generator.generate_items(n_items=500)
users = generator.generate_users(n_users=200)
events = generator.generate_events(users, items, n_events=5000)

print(f"Generated {len(items)} items, {len(users)} users, {len(events)} events")

# Inspect sample item
sample_item = items[0]
print("\nSample Item:")
print(f"  ID: {sample_item.item_id}")
print(f"  Name: {sample_item.name}")
print(f"  Features: {sample_item.features.features}")

## 2. Model Training

Build per-query training examples, split them on query boundaries, and train a LambdaMART ranking model.

In [None]:
# Generate training examples
from collections import Counter

training_examples = list(generator.generate_training_data(users, items, events))
print(f"Generated {len(training_examples)} training examples")

# Convert to arrays
X = np.array([ex.features for ex in training_examples])
y = np.array([ex.relevance for ex in training_examples])
query_ids = [ex.query_id for ex in training_examples]
# Group sizes per query, in order of first appearance (Counter preserves insertion order).
# This assumes the examples for each query are contiguous in training_examples.
groups = np.array(list(Counter(query_ids).values()))

# Split on query boundaries: the first 80% of queries train, the rest validate
n_train = int(len(groups) * 0.8)
train_size = int(groups[:n_train].sum())
X_train, X_val = X[:train_size], X[train_size:]
y_train, y_val = y[:train_size], y[train_size:]
groups_train, groups_val = groups[:n_train], groups[n_train:]

print(f"Training: {X_train.shape[0]} samples, {len(groups_train)} queries")
print(f"Validation: {X_val.shape[0]} samples, {len(groups_val)} queries")
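
The grouping and split above rely on rows for the same query sitting next to each other. The sketch below shows the same idea on toy data using only NumPy. The helper names `group_sizes` and `split_by_query` are hypothetical and not part of CusFlow. Unlike a plain count per query id, this version counts runs of consecutive rows, so the group sizes always line up with the row order even if a query id reappears later.

```python
import numpy as np


def group_sizes(query_ids):
    """Return the length of each run of consecutive identical query ids."""
    sizes = []
    previous = object()  # sentinel that never equals a real query id
    for qid in query_ids:
        if qid == previous:
            sizes[-1] += 1
        else:
            sizes.append(1)
            previous = qid
    return np.array(sizes)


def split_by_query(X, y, groups, train_fraction=0.8):
    """Split rows so that every query lands entirely in train or validation."""
    n_train_queries = int(len(groups) * train_fraction)
    n_train_rows = int(groups[:n_train_queries].sum())
    train = (X[:n_train_rows], y[:n_train_rows], groups[:n_train_queries])
    val = (X[n_train_rows:], y[n_train_rows:], groups[n_train_queries:])
    return train, val


# Toy data: 5 queries with 3, 2, 4, 1, and 2 candidate rows (illustrative values only)
toy_query_ids = ["q1", "q1", "q1", "q2", "q2", "q3", "q3", "q3", "q3", "q4", "q5", "q5"]
toy_X = np.arange(len(toy_query_ids) * 2, dtype=float).reshape(-1, 2)
toy_y = np.array([2, 0, 1, 1, 0, 3, 0, 0, 1, 1, 0, 2])

toy_groups = group_sizes(toy_query_ids)
(train_X, train_y, train_groups), (val_X, val_y, val_groups) = split_by_query(
    toy_X, toy_y, toy_groups
)

print("group sizes:", toy_groups)
print("train rows:", train_X.shape[0], "validation rows:", val_X.shape[0])
```

Splitting on query boundaries matters for ranking models because the group sizes must sum to the number of rows in each split. A row-level random split would break that invariant.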

In [None]:
# Train LambdaMART
model = LambdaMARTRanker(num_boost_round=100, early_stopping_rounds=20)

model.fit(
    X_train, y_train, groups_train,
    X_val=X_val, y_val=y_val, groups_val=groups_val,
)

print("\nTop 10 Feature Importance:")
for name, score in list(model.get_feature_importance(top_k=10).items())[:10]:
    print(f"  {name}: {score:.4f}")

## 3. Evaluation

Evaluate the model on the validation set with standard ranking metrics at cutoffs 5, 10, and 20.

In [None]:
# Evaluate
y_pred = model.predict(X_val)
metrics = RankingMetrics(cutoffs=[5, 10, 20])
results = metrics.evaluate(y_val, y_pred, groups_val)

print("Evaluation Results:")
print("-" * 30)
for metric, value in sorted(results.items()):
    print(f"{metric:15s}: {value:.4f}")
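
To make the per-query evaluation concrete, here is a minimal NumPy sketch of NDCG@k averaged over query groups. It is an illustration only and not the implementation behind `RankingMetrics` or `ndcg_at_k` in `src.evaluation.metrics`. The function names are hypothetical, and the exponential gain `2**rel - 1` is one common convention that CusFlow may or may not share.

```python
import numpy as np


def dcg_at_k(relevance, k):
    """Discounted cumulative gain of a relevance list already in ranked order."""
    rel = np.asarray(relevance, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))  # log2(rank + 1) for ranks 1..k
    return float(np.sum((2.0 ** rel - 1.0) / discounts))


def ndcg_at_k_sketch(y_true, y_score, k):
    """NDCG@k for one query: DCG of the predicted order divided by the ideal DCG."""
    y_true = np.asarray(y_true, dtype=float)
    order = np.argsort(-np.asarray(y_score, dtype=float), kind="stable")
    ideal = dcg_at_k(np.sort(y_true)[::-1], k)
    if ideal == 0.0:
        return 0.0  # no relevant items for this query; score it as 0 by convention
    return dcg_at_k(y_true[order], k) / ideal


def mean_ndcg(y_true, y_score, groups, k):
    """Average NDCG@k over queries, where groups holds the row count per query."""
    scores = []
    start = 0
    for size in groups:
        stop = start + int(size)
        scores.append(ndcg_at_k_sketch(y_true[start:stop], y_score[start:stop], k))
        start = stop
    return float(np.mean(scores))


# Toy example: two queries with 4 and 2 candidates (illustrative values only)
toy_true = np.array([3, 2, 0, 1, 1, 0])
toy_score = np.array([0.9, 0.8, 0.7, 0.1, 0.2, 0.6])
toy_groups = np.array([4, 2])

toy_ndcg = mean_ndcg(toy_true, toy_score, toy_groups, k=5)
print(f"mean NDCG@5 on toy data: {toy_ndcg:.4f}")
```

The metric is computed per query and then averaged, which is why the evaluation call above needs `groups_val` alongside the labels and predictions.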

## Summary

This notebook demonstrated:
- Synthetic data generation for hotel recommendations
- LambdaMART model training
- Offline evaluation with NDCG, MAP, and Recall

For more:
- See `scripts/train_model.py` for production training
- See `scripts/run_ab_sim.py` for A/B simulation
- Run `python -m src.cli serve` for the API