# Chapter 10: Pipeline Generation

Generate production-ready pipeline code from exploration findings.

**Generation Targets:**
1. **Local (Feast + MLflow)** - Local feature store and experiment tracking
2. **Databricks (FS + MLflow)** - Unity Catalog, DLT, Feature Store, MLflow
3. **LLM Documentation** - Markdown files for AI-assisted development

**Output Formats:**
- Python files (`.py`)
- Jupyter notebooks (`.ipynb`)

---

## 10.1 Configuration

In [1]:
from pathlib import Path
from enum import Enum

class GenerationTarget(Enum):
    LOCAL_FEAST_MLFLOW = "local"
    DATABRICKS = "databricks"
    LLM_DOCS = "llm_docs"

class OutputFormat(Enum):
    PYTHON = "py"
    NOTEBOOK = "ipynb"

# === USER CONFIGURATION ===
PIPELINE_NAME = "customer_churn"
GENERATION_TARGET = GenerationTarget.LOCAL_FEAST_MLFLOW
OUTPUT_FORMAT = OutputFormat.PYTHON

# Paths
# FINDINGS_DIR imported from customer_retention.core.config.experiments
OUTPUT_BASE_DIR = Path("../generated_pipelines")

# Databricks settings (only used when GENERATION_TARGET == DATABRICKS)
DATABRICKS_CATALOG = "main"
DATABRICKS_SCHEMA = "ml_features"

print(f"Pipeline: {PIPELINE_NAME}")
print(f"Target: {GENERATION_TARGET.value}")
print(f"Format: {OUTPUT_FORMAT.value}")

Pipeline: customer_churn
Target: local
Format: py


## 10.2 Load Findings and Recommendations

In [2]:
import yaml
from customer_retention.analysis.auto_explorer import ExplorationFindings
from customer_retention.analysis.auto_explorer.layered_recommendations import RecommendationRegistry
from customer_retention.core.config.experiments import FINDINGS_DIR, EXPERIMENTS_DIR, OUTPUT_DIR, setup_experiments_structure

def load_findings_and_recommendations(findings_dir: Path):
    findings_files = sorted(
        [f for f in findings_dir.glob("*_findings.yaml") if "multi_dataset" not in f.name],
        key=lambda f: f.stat().st_mtime, reverse=True
    )
    if not findings_files:
        raise FileNotFoundError(f"No findings in {findings_dir}. Run exploration notebooks first.")
    
    findings = ExplorationFindings.load(str(findings_files[0]))
    
    # Look for recommendations file matching the findings file pattern
    # Step 06 saves as: {name}_recommendations.yaml (matching {name}_findings.yaml)
    findings_name = findings_files[0].stem.replace("_findings", "")
    recommendations_path = findings_dir / f"{findings_name}_recommendations.yaml"
    
    # Fallback to generic recommendations.yaml if not found
    if not recommendations_path.exists():
        recommendations_path = findings_dir / "recommendations.yaml"
    
    # Final fallback: find any *_recommendations.yaml
    if not recommendations_path.exists():
        rec_files = sorted(findings_dir.glob("*_recommendations.yaml"), 
                          key=lambda f: f.stat().st_mtime, reverse=True)
        if rec_files:
            recommendations_path = rec_files[0]
    
    registry = None
    if recommendations_path.exists():
        with open(recommendations_path) as f:
            registry = RecommendationRegistry.from_dict(yaml.safe_load(f))
        print(f"Loaded recommendations from: {recommendations_path.name}")
    
    multi_dataset_path = findings_dir / "multi_dataset_findings.yaml"
    multi_dataset = None
    if multi_dataset_path.exists():
        with open(multi_dataset_path) as f:
            multi_dataset = yaml.safe_load(f)
    
    return findings, registry, multi_dataset

findings, registry, multi_dataset = load_findings_and_recommendations(FINDINGS_DIR)

print(f"Loaded: {findings.source_path}")
print(f"Rows: {findings.row_count:,} | Columns: {findings.column_count}")
print(f"Target: {findings.target_column}")
print(f"Recommendations: {'Loaded' if registry else 'Not found'}")
print(f"Multi-dataset: {'Loaded' if multi_dataset else 'Not found'}")

Loaded recommendations from: customer_emails_408768_aggregated_d24886_recommendations.yaml
Loaded: ../experiments/findings/customer_emails_408768_aggregated.parquet
Rows: 4,998 | Columns: 68
Target: target
Recommendations: Loaded
Multi-dataset: Loaded
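
The lookup order used above for the recommendations file (exact name match, then a generic `recommendations.yaml`, then the most recently modified `*_recommendations.yaml`) can be isolated into a small helper. The following is a minimal sketch that uses only the standard library; `resolve_recommendations_path` and the `orders_*` file names are hypothetical and exist only for illustration.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_recommendations_path(findings_file: Path) -> Optional[Path]:
    """Return the recommendations file paired with a findings file, or None."""
    findings_dir = findings_file.parent
    name = findings_file.stem.replace("_findings", "")

    # 1. Exact match: {name}_recommendations.yaml
    exact = findings_dir / f"{name}_recommendations.yaml"
    if exact.exists():
        return exact

    # 2. Generic recommendations.yaml
    generic = findings_dir / "recommendations.yaml"
    if generic.exists():
        return generic

    # 3. Most recently modified *_recommendations.yaml, if any
    candidates = sorted(
        findings_dir.glob("*_recommendations.yaml"),
        key=lambda f: f.stat().st_mtime,
        reverse=True,
    )
    return candidates[0] if candidates else None


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    findings_file = tmp_dir / "orders_findings.yaml"  # hypothetical file name
    findings_file.touch()
    (tmp_dir / "recommendations.yaml").touch()

    # Only the generic file exists, so step 2 applies.
    fallback_name = resolve_recommendations_path(findings_file).name

    # Once the matching file exists, step 1 takes precedence.
    (tmp_dir / "orders_recommendations.yaml").touch()
    exact_name = resolve_recommendations_path(findings_file).name

print(fallback_name)
print(exact_name)
```

Returning `None` instead of a non-existent path keeps the "not found" case explicit for the caller.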


## 10.3 Review Layered Recommendations

Recommendations are organized by medallion layer:
- **Bronze**: null_handling, outlier_handling, type_conversions, deduplication, filtering, text_processing
- **Silver**: joins, aggregations, derived_columns
- **Gold**: encoding, scaling, feature_selection, transformations

In [3]:
def display_recommendations(registry: RecommendationRegistry):
    if not registry:
        print("No recommendations loaded. Run notebooks 02-07 first.")
        return
    
    for layer in ["bronze", "silver", "gold"]:
        recs = registry.get_by_layer(layer)
        print(f"\n{layer.upper()} ({len(recs)} recommendations):")
        print("-" * 50)
        for rec in recs[:5]:
            print(f"  [{rec.category}] {rec.target_column}: {rec.action}")
        if len(recs) > 5:
            print(f"  ... and {len(recs) - 5} more")

display_recommendations(registry)


BRONZE (50 recommendations):
--------------------------------------------------
  [null] time_to_open_hours_mean_180d: impute
  [null] time_to_open_hours_max_180d: impute
  [null] send_hour_mean_180d: impute
  [null] send_hour_max_180d: impute
  [null] opened_mean_180d: impute
  ... and 45 more

SILVER (8 recommendations):
--------------------------------------------------
  [derived] event_count_180d_to_event_count_180d_ratio: ratio
  [derived] event_count_180d_x_event_count_365d: interaction
  [derived] event_count_180d_x_event_count_all_time: interaction
  [derived] event_count_180d_x_time_to_open_hours_sum_180d: interaction
  [derived] event_count_365d_x_event_count_all_time: interaction
  ... and 3 more

GOLD (295 recommendations):
--------------------------------------------------
  [encoding] lifecycle_quadrant: one_hot
  [encoding] lifecycle_quadrant: onehot
  [scaling] send_hour_mean_180d: standard
  [scaling] send_hour_max_180d: standard
  [scaling] send_hour_mean_365d: stan

---

## 10.4 Generate Pipeline

Select generation based on configured target.

In [4]:
import os

output_dir = OUTPUT_BASE_DIR / GENERATION_TARGET.value / PIPELINE_NAME
output_dir.mkdir(parents=True, exist_ok=True)

print(f"Output directory: {output_dir}")

Output directory: ../generated_pipelines/local/customer_churn


### Option A: Local (Feast + MLflow)

In [5]:
if GENERATION_TARGET == GenerationTarget.LOCAL_FEAST_MLFLOW:
    from customer_retention.generators.spec_generator import MLflowPipelineGenerator, MLflowConfig
    from customer_retention.generators.pipeline_generator import PipelineGenerator
    
    mlflow_config = MLflowConfig(
        tracking_uri="./mlruns",
        experiment_name=PIPELINE_NAME,
        log_data_quality=True,
        nested_runs=True
    )
    
    mlflow_gen = MLflowPipelineGenerator(mlflow_config=mlflow_config, output_dir=str(output_dir))
    
    if OUTPUT_FORMAT == OutputFormat.PYTHON:
        saved = mlflow_gen.save_all(findings)
        print("Generated MLflow pipeline files:")
        for f in saved:
            print(f"  {f}")
    
    if multi_dataset:
        pipeline_gen = PipelineGenerator(
            findings_dir=str(FINDINGS_DIR),
            output_dir=str(output_dir),
            pipeline_name=PIPELINE_NAME
        )
        orch_files = pipeline_gen.generate()
        print("\nGenerated pipeline files (Bronze/Silver/Gold/Training):")
        for f in orch_files:
            print(f"  {f}")
else:
    print(f"Skipping Local generation (target is {GENERATION_TARGET.value})")

Generated MLflow pipeline files:
  pipeline.py
  requirements.txt

Generated pipeline files (Bronze/Silver/Gold/Training):
  ../generated_pipelines/local/customer_churn/run_all.py
  ../generated_pipelines/local/customer_churn/config.py
  ../generated_pipelines/local/customer_churn/bronze/bronze_customer_emails_aggregated.py
  ../generated_pipelines/local/customer_churn/silver/silver_merge.py
  ../generated_pipelines/local/customer_churn/gold/gold_features.py
  ../generated_pipelines/local/customer_churn/training/ml_experiment.py
  ../generated_pipelines/local/customer_churn/pipeline_runner.py
  ../generated_pipelines/local/customer_churn/workflow.json
  ../generated_pipelines/local/customer_churn/feature_repo/feature_store.yaml
  ../generated_pipelines/local/customer_churn/feature_repo/features.py
  ../generated_pipelines/local/customer_churn/scoring/run_scoring.py
  ../generated_pipelines/local/customer_churn/scoring/scoring_dashboard.ipynb


### Option B: Databricks (FS + MLflow)

In [6]:
if GENERATION_TARGET == GenerationTarget.DATABRICKS:
    from customer_retention.generators.spec_generator import DatabricksSpecGenerator, PipelineSpec, SourceSpec
    
    spec = PipelineSpec(
        name=PIPELINE_NAME,
        version="1.0.0",
        sources=[SourceSpec(
            # Path.stem drops any extension (.csv, .parquet, ...), not only .csv
            name=Path(findings.source_path).stem,
            path=findings.source_path,
            format=findings.source_format
        )]
    )
    
    if findings.target_column:
        from customer_retention.generators.spec_generator import ModelSpec
        spec.model_config = ModelSpec(
            name=f"{PIPELINE_NAME}_model",
            model_type="gradient_boosting",
            target_column=findings.target_column
        )
    
    db_gen = DatabricksSpecGenerator(
        catalog=DATABRICKS_CATALOG,
        schema=DATABRICKS_SCHEMA,
        output_dir=str(output_dir)
    )
    
    saved = db_gen.save_all(spec)
    print("Generated Databricks artifacts:")
    for f in saved:
        print(f"  {f}")
else:
    print(f"Skipping Databricks generation (target is {GENERATION_TARGET.value})")

Skipping Databricks generation (target is local)


### Option C: LLM Documentation

In [7]:
if GENERATION_TARGET == GenerationTarget.LLM_DOCS:
    from customer_retention.analysis.auto_explorer import RecommendationEngine
    
    recommender = RecommendationEngine()
    target_rec = recommender.recommend_target(findings)
    feature_recs = recommender.recommend_features(findings)
    cleaning_recs = recommender.recommend_cleaning(findings)
    
    docs_dir = output_dir / "docs"
    docs_dir.mkdir(parents=True, exist_ok=True)
    
    # 1. Overview
    overview = f"""# {PIPELINE_NAME} Pipeline Overview

## Data Source
- **Path**: {findings.source_path}
- **Format**: {findings.source_format}
- **Rows**: {findings.row_count:,}
- **Columns**: {findings.column_count}
- **Quality Score**: {findings.overall_quality_score:.1f}/100

## Target Variable
- **Column**: {target_rec.column_name}
- **Type**: {target_rec.target_type}
- **Rationale**: {target_rec.rationale}

## Column Types
| Column | Type | Nulls | Unique |
|--------|------|-------|--------|
"""
    for name, col in list(findings.columns.items())[:20]:
        overview += f"| {name} | {col.inferred_type.value} | {col.null_percentage:.1f}% | {col.unique_count} |\n"
    (docs_dir / "01_overview.md").write_text(overview)
    
    # 2. Bronze layer - separate file per source
    if registry and registry.sources:
        for source_name, bronze_recs in registry.sources.items():
            bronze_doc = f"""# Bronze Layer - {source_name}

## Source File
`{bronze_recs.source_file}`

## Null Handling
"""
            for rec in bronze_recs.null_handling:
                bronze_doc += f"- `{rec.target_column}`: {rec.action} ({rec.parameters.get('strategy', '')}) - {rec.rationale}\n"
            
            bronze_doc += "\n## Outlier Handling\n"
            for rec in bronze_recs.outlier_handling:
                bronze_doc += f"- `{rec.target_column}`: {rec.action} - {rec.rationale}\n"
            
            bronze_doc += "\n## Type Conversions\n"
            for rec in bronze_recs.type_conversions:
                bronze_doc += f"- `{rec.target_column}`: {rec.action} - {rec.rationale}\n"
            
            bronze_doc += "\n## Deduplication\n"
            for rec in bronze_recs.deduplication:
                bronze_doc += f"- `{rec.target_column}`: {rec.action} - {rec.rationale}\n"
            
            bronze_doc += "\n## Filtering\n"
            for rec in bronze_recs.filtering:
                bronze_doc += f"- `{rec.target_column}`: {rec.action} - {rec.rationale}\n"
            
            bronze_doc += "\n## Text Processing\n"
            for rec in bronze_recs.text_processing:
                bronze_doc += f"- `{rec.target_column}`: {rec.action} - {rec.rationale}\n"
            
            safe_name = source_name.replace(" ", "_").lower()
            (docs_dir / f"02_bronze_cleaning_{safe_name}.md").write_text(bronze_doc)
    else:
        bronze_doc = f"""# Bronze Layer - Data Cleaning

## Cleaning Recommendations
"""
        for rec in cleaning_recs:
            bronze_doc += f"\n### {rec.column_name}\n- **Strategy**: {rec.strategy}\n- **Severity**: {rec.severity}\n- **Rationale**: {rec.rationale}\n"
        (docs_dir / "02_bronze_cleaning.md").write_text(bronze_doc)
    
    # 3. Silver layer
    silver_doc = """# Silver Layer - Feature Engineering

## Aggregations and Joins
"""
    if registry and registry.silver:
        silver_doc += "\n### Joins\n"
        for rec in registry.silver.joins:
            silver_doc += f"- {rec.parameters.get('left_source', '')} ⟷ {rec.parameters.get('right_source', '')} on `{rec.parameters.get('join_keys', [])}`\n"
        
        silver_doc += "\n### Aggregations\n"
        for rec in registry.silver.aggregations:
            silver_doc += f"- `{rec.target_column}`: {rec.action} - windows: {rec.parameters.get('windows', [])}\n"
        
        silver_doc += "\n### Derived Columns\n"
        for rec in registry.silver.derived_columns:
            silver_doc += f"- `{rec.target_column}`: {rec.parameters.get('expression', rec.action)}\n"
    else:
        silver_doc += "\nNo silver-layer recommendations found.\n"
    (docs_dir / "03_silver_features.md").write_text(silver_doc)
    
    # 4. Gold layer
    gold_doc = """# Gold Layer - ML Features

## Feature Recommendations
"""
    for rec in feature_recs[:15]:
        gold_doc += f"\n### {rec.feature_name}\n- **Source**: {rec.source_column}\n- **Type**: {rec.feature_type}\n- **Description**: {rec.description}\n"
    
    if registry and registry.gold:
        gold_doc += "\n## Encoding\n"
        for rec in registry.gold.encoding:
            gold_doc += f"- `{rec.target_column}`: {rec.parameters.get('method', rec.action)}\n"
        
        gold_doc += "\n## Scaling\n"
        for rec in registry.gold.scaling:
            gold_doc += f"- `{rec.target_column}`: {rec.parameters.get('method', rec.action)}\n"
        
        gold_doc += "\n## Feature Selection\n"
        for rec in registry.gold.feature_selection:
            gold_doc += f"- `{rec.target_column}`: {rec.action} - {rec.rationale}\n"
        
        gold_doc += "\n## Transformations\n"
        for rec in registry.gold.transformations:
            gold_doc += f"- `{rec.target_column}`: {rec.action} - {rec.parameters}\n"
    (docs_dir / "04_gold_ml_features.md").write_text(gold_doc)
    
    # 5. Training
    training_doc = f"""# Model Training

## Target
- **Column**: {target_rec.column_name}
- **Type**: {target_rec.target_type}

## Recommended Models
1. **Gradient Boosting** - Good for tabular data with mixed types
2. **Random Forest** - Robust baseline, handles missing values
3. **Logistic Regression** - Interpretable, good for imbalanced data

## Evaluation Metrics
- ROC-AUC (primary)
- Precision/Recall at threshold
- F1 Score
"""
    (docs_dir / "05_training.md").write_text(training_doc)
    
    print("Generated LLM documentation:")
    for f in sorted(docs_dir.glob("*.md")):
        print(f"  {f.name}")
else:
    print(f"Skipping LLM docs generation (target is {GENERATION_TARGET.value})")

Skipping LLM docs generation (target is local)


---

## 10.5 Convert to Notebooks (Optional)

In [8]:
import json

def py_to_notebook(py_path: Path):
    content = py_path.read_text()
    cells = []
    current_lines = []
    
    for line in content.split("\n"):
        # Lines come from split("\n"), so a bare cell marker is exactly "# %%"
        if line == "# %%" or line.startswith("# %% "):
            if current_lines:
                cells.append({"cell_type": "code", "metadata": {}, "source": current_lines, "outputs": [], "execution_count": None})
                current_lines = []
            title = line[len("# %%"):].strip()
            if title:
                cells.append({"cell_type": "markdown", "metadata": {}, "source": [f"## {title}"]})
        else:
            current_lines.append(line + "\n")
    
    if current_lines:
        cells.append({"cell_type": "code", "metadata": {}, "source": current_lines, "outputs": [], "execution_count": None})
    
    notebook = {
        "cells": cells,
        "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}},
        "nbformat": 4, "nbformat_minor": 4
    }
    
    out_path = py_path.with_suffix(".ipynb")
    out_path.write_text(json.dumps(notebook, indent=1))
    return out_path

if OUTPUT_FORMAT == OutputFormat.NOTEBOOK:
    print("Converting Python files to notebooks...")
    for py_file in output_dir.rglob("*.py"):
        if py_file.name != "__init__.py":
            nb_path = py_to_notebook(py_file)
            print(f"  {py_file.name} -> {nb_path.name}")
else:
    print("Output format is Python. Set OUTPUT_FORMAT = OutputFormat.NOTEBOOK to convert.")

Output format is Python. Set OUTPUT_FORMAT = OutputFormat.NOTEBOOK to convert.
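
To see how `# %%` markers map to cells, the splitting step can be tried on a short script without writing any files. This is a simplified sketch: `split_percent_cells` is a hypothetical helper that returns `(title, code)` pairs, whereas the converter above emits a separate markdown cell for each title.

```python
from typing import List, Tuple


def split_percent_cells(source: str) -> List[Tuple[str, str]]:
    """Split a '# %%'-delimited script into (title, code) pairs."""
    cells: List[Tuple[str, str]] = []
    title, lines = "", []
    for line in source.splitlines():
        # A marker is either a bare "# %%" or "# %% <title>"
        if line == "# %%" or line.startswith("# %% "):
            if lines:
                cells.append((title, "\n".join(lines)))
            title, lines = line[len("# %%"):].strip(), []
        else:
            lines.append(line)
    if lines:
        cells.append((title, "\n".join(lines)))
    return cells


script = "# %% Load data\nx = 1\n# %%\ny = x + 1\n"
cells = split_percent_cells(script)
for title, code in cells:
    print(repr(title), "->", repr(code))
```

The first cell carries the title `Load data`; the second comes from a bare marker and has an empty title.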


---

## 10.6 Run Pipeline

A single command runs every stage: Bronze (parallel) → Silver → Gold → Training → MLflow UI (opens the browser automatically).

In [9]:
# Set RUN_PIPELINE = True to run the pipeline after generation
RUN_PIPELINE = False

run_all_path = output_dir / "run_all.py"

if RUN_PIPELINE and GENERATION_TARGET == GenerationTarget.LOCAL_FEAST_MLFLOW:
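    import subprocess
    import sys
    if run_all_path.exists():
        print(f"Running: python {run_all_path}")
        print("Pipeline will run and MLflow UI will open automatically...")
        # cwd is the script's directory, so pass the bare file name; the relative
        # path would otherwise be resolved a second time against cwd.
        subprocess.run([sys.executable, run_all_path.name], cwd=run_all_path.parent)
    else:
        print("run_all.py not found. Generate first by running cells above.")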
else:
    print("To run the complete pipeline:")
    print(f"\n  cd {output_dir}")
    print(f"  python run_all.py")
    print(f"\nThis will:")
    print("  1. Run Bronze layers (parallel)")
    print("  2. Run Silver merge")
    print("  3. Run Gold features")
    print("  4. Train models with MLflow")
    print("  5. Auto-start MLflow UI and open browser")
    print("  6. Press Ctrl+C to stop MLflow UI when done")

To run the complete pipeline:

  cd ../generated_pipelines/local/customer_churn
  python run_all.py

This will:
  1. Run Bronze layers (parallel)
  2. Run Silver merge
  3. Run Gold features
  4. Train models with MLflow
  5. Auto-start MLflow UI and open browser
  6. Press Ctrl+C to stop MLflow UI when done
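
The stage ordering listed above (Bronze jobs in parallel, then Silver, Gold, and Training in sequence) can be sketched with the standard library. This illustrates the ordering only and is not the contents of the generated `run_all.py`, which are not shown here; `run_pipeline` and the lambda jobs are hypothetical stand-ins for the real layer scripts.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_pipeline(bronze_jobs: List[Callable[[], str]],
                 later_stages: List[Callable[[], str]]) -> List[str]:
    """Run bronze jobs concurrently, then the remaining stages in order."""
    with ThreadPoolExecutor() as pool:
        # map() returns results in submission order even though jobs overlap.
        log = list(pool.map(lambda job: job(), bronze_jobs))
    # Leaving the "with" block waits for every bronze job to finish,
    # so the later stages never start on partial bronze output.
    for stage in later_stages:
        log.append(stage())
    return log


stage_log = run_pipeline(
    bronze_jobs=[lambda: "bronze:emails", lambda: "bronze:orders"],
    later_stages=[lambda: "silver", lambda: "gold", lambda: "training"],
)
print(stage_log)
```

Threads suit I/O-bound bronze jobs; CPU-heavy cleaning would more likely call for separate processes.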


---

## 10.7 Summary

In [10]:
print("Generated Artifacts Summary")
print("=" * 60)
print(f"Pipeline: {PIPELINE_NAME}")
print(f"Target: {GENERATION_TARGET.value}")
print(f"Format: {OUTPUT_FORMAT.value}")
print(f"Output: {output_dir}")
print()

def show_tree(path: Path, prefix: str = ""):
    items = sorted(path.iterdir(), key=lambda p: (p.is_file(), p.name))
    for i, item in enumerate(items):
        is_last = i == len(items) - 1
        connector = "└── " if is_last else "├── "
        if item.is_file():
            size = item.stat().st_size
            print(f"{prefix}{connector}{item.name} ({size:,} bytes)")
        else:
            print(f"{prefix}{connector}{item.name}/")
            show_tree(item, prefix + ("    " if is_last else "│   "))

if output_dir.exists():
    show_tree(output_dir)

Generated Artifacts Summary
Pipeline: customer_churn
Target: local
Format: py
Output: ../generated_pipelines/local/customer_churn

├── bronze/
│   └── bronze_customer_emails_aggregated.py (769 bytes)
├── feature_repo/
│   ├── data/
│   ├── feature_store.yaml (188 bytes)
│   └── features.py (1,112 bytes)
├── gold/
│   └── gold_features.py (2,780 bytes)
├── scoring/
│   ├── run_scoring.py (6,071 bytes)
│   └── scoring_dashboard.ipynb (15,645 bytes)
├── silver/
│   └── silver_merge.py (692 bytes)
├── training/
│   └── ml_experiment.py (9,523 bytes)
├── config.py (1,558 bytes)
├── pipeline.py (8,519 bytes)
├── pipeline_runner.py (845 bytes)
├── requirements.txt (111 bytes)
├── run_all.py (2,489 bytes)
└── workflow.json (968 bytes)


---

## 10.8 Recommendations Hash

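The recommendations hash is a short identifier for the gold layer feature engineering configuration. Tagging experiments and feature views with it makes each run traceable to the configuration that produced it, which supports experiment tracking and reproducibility.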

In [11]:
if registry:
    recommendations_hash = registry.compute_recommendations_hash()
    print("Recommendations Hash")
    print("=" * 60)
    print(f"Hash: {recommendations_hash}")
    print(f"Full version tag: v1.0.0_{recommendations_hash}")
    print()
    print("This hash uniquely identifies the gold layer configuration:")
    print(f"  - Encodings: {len(registry.gold.encoding) if registry.gold else 0}")
    print(f"  - Scalings: {len(registry.gold.scaling) if registry.gold else 0}")
    print(f"  - Transformations: {len(registry.gold.transformations) if registry.gold else 0}")
    print(f"  - Feature selections: {len(registry.gold.feature_selection) if registry.gold else 0}")
    
    # Show what's in each layer for debugging
    print()
    print("Recommendations by layer:")
    for layer in ["bronze", "silver", "gold"]:
        recs = registry.get_by_layer(layer)
        print(f"  {layer.upper()}: {len(recs)} recommendations")
        if recs and layer == "gold":
            for rec in recs[:3]:
                print(f"    - [{rec.category}] {rec.target_column}: {rec.action}")
            if len(recs) > 3:
                print(f"    ... and {len(recs) - 3} more")
    
    # Report whether the gold layer has been initialized
    if registry.gold:
        print(f"\n✓ Gold layer initialized (target: {registry.gold.target_column})")
    else:
        print("\n⚠ Gold layer not initialized - run step 06 first")
    
    print()
    print("Use this hash to:")
    print("  - Track MLflow experiments (tag: recommendations_hash)")
    print("  - Version Feast feature views (tag in feature_store)")
    print("  - Return to a specific feature engineering configuration")
else:
    print("No recommendations loaded - hash not available")
    print("Run notebooks 02-07 first, then re-run this notebook.")

Recommendations Hash
Hash: 9d2a86e3
Full version tag: v1.0.0_9d2a86e3

This hash uniquely identifies the gold layer configuration:
  - Encodings: 2
  - Scalings: 6
  - Transformations: 100
  - Feature selections: 187

Recommendations by layer:
  BRONZE: 50 recommendations
  SILVER: 8 recommendations
  GOLD: 295 recommendations
    - [encoding] lifecycle_quadrant: one_hot
    - [encoding] lifecycle_quadrant: onehot
    - [scaling] send_hour_mean_180d: standard
    ... and 292 more

✓ Gold layer initialized (target: target)

Use this hash to:
  - Track MLflow experiments (tag: recommendations_hash)
  - Version Feast feature views (tag in feature_store)
  - Return to a specific feature engineering configuration
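
How `compute_recommendations_hash` derives its value is not shown in this notebook. One common way to build such a configuration hash is to serialize the configuration canonically and truncate a digest. The sketch below assumes SHA-256 over sorted-key JSON with an 8-character prefix; `config_hash` and the sample dictionaries are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def config_hash(config: Dict[str, Any], length: int = 8) -> str:
    """Hash a configuration dict independently of key order."""
    # sort_keys gives a canonical serialization, so equal configs hash equally.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


gold_a = {
    "encoding": {"lifecycle_quadrant": "one_hot"},
    "scaling": {"send_hour_mean_180d": "standard"},
}
# Same settings, different key order
gold_b = {
    "scaling": {"send_hour_mean_180d": "standard"},
    "encoding": {"lifecycle_quadrant": "one_hot"},
}
# One scaling method changed
gold_c = {
    "encoding": {"lifecycle_quadrant": "one_hot"},
    "scaling": {"send_hour_mean_180d": "minmax"},
}

hash_a, hash_b, hash_c = config_hash(gold_a), config_hash(gold_b), config_hash(gold_c)
print(hash_a == hash_b)  # key order does not affect the hash
print(hash_a != hash_c)  # a changed setting produces a different hash
```

An 8-character prefix is convenient for tags, at the cost of a small collision probability across many configurations.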


---

## 10.9 Feast Feature Store Validation

Check what's registered in Feast after running the pipeline.
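
A minimal sketch of such a check follows. It assumes Feast is installed and that the generated `feature_repo/` has already been applied to the registry. `summarize_feast_repo` is a hypothetical helper; it returns `None` when the repo or Feast itself is unavailable.

```python
import importlib.util
from pathlib import Path
from typing import Dict, List, Optional


def summarize_feast_repo(repo_path: Path) -> Optional[Dict[str, List[str]]]:
    """List entity and feature view names, or return None if unavailable."""
    if not (repo_path / "feature_store.yaml").exists():
        return None  # pipeline not generated yet
    if importlib.util.find_spec("feast") is None:
        return None  # Feast is not installed in this environment

    from feast import FeatureStore

    store = FeatureStore(repo_path=str(repo_path))
    return {
        "entities": [entity.name for entity in store.list_entities()],
        "feature_views": [view.name for view in store.list_feature_views()],
    }


# Path taken from the generation output above; adjust for other targets.
repo_dir = Path("../generated_pipelines/local/customer_churn/feature_repo")
summary = summarize_feast_repo(repo_dir)
print(summary if summary is not None else "Feast repo not available yet")
```

Empty lists usually mean the repo exists but nothing has been registered yet.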

---

## 10.10 Next Steps

### Run Pipeline (Single Command)
```bash
cd ../generated_pipelines/local/customer_churn
python run_all.py
```

This single command:
1. Runs Bronze layers in **parallel**
2. Runs Silver merge
3. Runs Gold features  
4. Trains models with MLflow tracking
5. **Auto-starts MLflow UI** and opens browser
6. Keeps the MLflow UI running until you press `Ctrl+C`

### Generated Structure
```
generated_pipelines/local/{pipeline}/
├── run_all.py          # Single entry point
├── config.py           # Configuration (includes RECOMMENDATIONS_HASH)
├── bronze/
│   └── bronze_*.py     # Parallel execution
├── silver/
│   └── silver_merge.py
├── gold/
│   └── gold_features.py  # Includes feature version tag
├── training/
│   └── ml_experiment.py  # MLflow tags with recommendations_hash
├── pipeline.py         # Standalone pipeline script
└── requirements.txt
```

### Tracking Your Experiment
After a run, locate the experiment through:
- **MLflow UI**: Filter by tag `recommendations_hash = <your_hash>`
- **Feast**: Check feature view tags for `recommendations_hash`
- **Return to config**: The hash uniquely identifies the gold layer settings
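
The MLflow tag filter can also be applied programmatically. The sketch below assumes MLflow is installed, that runs were logged under the `./mlruns` tracking URI and `customer_churn` experiment configured earlier, and that the training script sets a `recommendations_hash` tag. `hash_filter` and `find_runs` are hypothetical helpers.

```python
import importlib.util


def hash_filter(recommendations_hash: str) -> str:
    """Build an MLflow search filter for the recommendations_hash tag."""
    return f"tags.recommendations_hash = '{recommendations_hash}'"


def find_runs(recommendations_hash: str,
              experiment_name: str = "customer_churn",
              tracking_uri: str = "./mlruns"):
    """Return matching runs as a DataFrame, or None if MLflow is missing."""
    if importlib.util.find_spec("mlflow") is None:
        return None
    import mlflow

    mlflow.set_tracking_uri(tracking_uri)
    return mlflow.search_runs(
        experiment_names=[experiment_name],
        filter_string=hash_filter(recommendations_hash),
    )


filter_string = hash_filter("9d2a86e3")
print(filter_string)
```

Calling `find_runs("9d2a86e3")` from the directory that contains `mlruns` would return the runs tagged with that hash.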

---

## Complete!