# 🎄 12 Days of Demos: North Pole Modernization Office (NPMO)

Welcome to the **North Pole Modernization Office**! As part of our initiative to move from handwritten letters and spreadsheets to an AI-driven Lakehouse, we need to ingest our legacy data.

This notebook initializes the environment for the **Model Context Protocol (MCP)** demo by:
1. Creating and/or selecting the Unity Catalog catalog and schema.
2. Ingesting synthetic CSV data (Gift Requests, Reindeer Telemetry, etc.) into Delta Tables.

*Let's get this data ready before Christmas Eve!* 🎅

In [None]:
import os
from pathlib import Path
import shutil

# Configuration
# TODO: Update these values for your environment
catalog_name = "12daysofdemos"
schema_name = "npmo"

# Path where the CSV files are located in the repo
source_data_path = Path(f"{os.getcwd()}/data")
volume_data_path = f"/Volumes/{catalog_name}/{schema_name}/data"
print(source_data_path)
print(volume_data_path)

In [None]:
# Setup Catalog and Schema
# spark.sql(f"CREATE CATALOG IF NOT EXISTS {catalog_name}")
spark.sql(f"USE CATALOG {catalog_name}")
# spark.sql(f"CREATE SCHEMA IF NOT EXISTS {schema_name}")
spark.sql(f"USE SCHEMA {schema_name}")

spark.sql("CREATE VOLUME IF NOT EXISTS data")

print(f"Using Catalog: {catalog_name}, Schema: {schema_name}, Volume: data")
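
The catalog and schema names above are interpolated directly into SQL strings. If you change them to names containing hyphens or other special characters, the statements may fail to parse. The sketch below shows one way to guard against that with a hypothetical `quote_ident` helper (not part of the original notebook). It assumes Spark SQL's backtick-delimited identifiers, where an embedded backtick is escaped by doubling it.

```python
# Hypothetical helper (an assumption, not part of the original notebook):
# wrap an identifier in backticks, doubling any backtick it contains.
def quote_ident(name):
    return "`" + name.replace("`", "``") + "`"

catalog_name = "12daysofdemos"
schema_name = "npmo"

# Build a quoted, two-level name that could be interpolated into SQL,
# e.g. f"SHOW TABLES IN {qualified_schema}"
qualified_schema = f"{quote_ident(catalog_name)}.{quote_ident(schema_name)}"
print(qualified_schema)
```

Quoting is optional for simple names like the defaults used here. It mainly matters once the `TODO` values are replaced with environment-specific names.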

## 🏗️ Ingest Holiday-Themed Data to Delta
We are taking our raw CSV files, representing everything from *Reindeer Telemetry* to *Gift Requests*, and loading them into managed Delta tables. This provides the foundation for our AI agents to query and analyze North Pole operations.
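
Before running the ingestion, it can help to confirm that every expected CSV is actually present in the repo's `data` folder, so a missing file shows up as one clear message rather than a failed copy. This is a minimal sketch using a hypothetical `find_missing_files` helper (not part of the original notebook). It assumes the files live under `data` in the current working directory, as configured above.

```python
from pathlib import Path

# Hypothetical helper (an assumption, not part of the original notebook):
# return the names in `file_names` that do not exist as files in `source_dir`.
def find_missing_files(source_dir, file_names):
    source_dir = Path(source_dir)
    return [name for name in file_names if not (source_dir / name).is_file()]

# File names taken from the datasets ingested in this notebook
expected_files = [
    "gift_requests.csv",
    "reindeer_telemetry.csv",
    "workshop_production.csv",
    "behavioral_analytics.csv",
    "delivery_optimization.csv",
]

missing = find_missing_files(Path.cwd() / "data", expected_files)
if missing:
    print(f"Missing CSV files: {missing}")
else:
    print("All expected CSV files are present.")
```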

In [None]:
def ingest_csv_to_delta(table_name, file_name):
    """Copy a CSV from the repo into the Volume, then load it into a Delta table."""
    # Copy the file into the Volume first so Spark can read it
    source_file_path = f"{source_data_path}/{file_name}"
    volume_file_path = f"{volume_data_path}/{file_name}"
    shutil.copy(source_file_path, volume_file_path)

    # Start ingestion
    print(f"Ingesting {volume_file_path} into table {table_name}...")
    try:
        # Read CSV with header and infer schema
        df = spark.read.format("csv") \
            .option("header", "true") \
            .option("inferSchema", "true") \
            .load(volume_file_path)
            
        # Write to Delta table
        df.write.format("delta") \
            .mode("overwrite") \
            .saveAsTable(table_name)
            
        print(f"✅ Successfully created table: {catalog_name}.{schema_name}.{table_name}")
        print(f"   Row count: {df.count()}")
    except Exception as e:
        print(f"❌ Error ingesting {table_name}: {e}")

# List of datasets to ingest
datasets = [
    ("gift_requests", "gift_requests.csv"),
    ("reindeer_telemetry", "reindeer_telemetry.csv"),
    ("workshop_production", "workshop_production.csv"),
    ("behavioral_analytics", "behavioral_analytics.csv"),
    ("delivery_optimization", "delivery_optimization.csv")
]

# Run ingestion
for table, file in datasets:
    ingest_csv_to_delta(table, file)

In [None]:
# Verify tables
display(spark.sql(f"SHOW TABLES IN {catalog_name}.{schema_name}"))
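
Because the ingestion function prints errors instead of raising them, a failed table can be easy to miss in the output. This sketch shows one way to check the result programmatically, using a hypothetical `find_missing_tables` helper (not part of the original notebook). The `actual_tables` list is placeholder sample data, not real output. In the notebook it could be collected from the `SHOW TABLES` result, assuming that result exposes a `tableName` column.

```python
# Hypothetical helper (an assumption, not part of the original notebook):
# return expected table names that are absent from `actual`,
# comparing names case-insensitively.
def find_missing_tables(expected, actual):
    actual_lower = {name.lower() for name in actual}
    return [name for name in expected if name.lower() not in actual_lower]

# Table names taken from the datasets ingested in this notebook
expected_tables = [
    "gift_requests",
    "reindeer_telemetry",
    "workshop_production",
    "behavioral_analytics",
    "delivery_optimization",
]

# In the notebook, this list could instead be built with something like:
#   rows = spark.sql(f"SHOW TABLES IN {catalog_name}.{schema_name}").collect()
#   actual_tables = [row.tableName for row in rows]
actual_tables = ["gift_requests", "reindeer_telemetry"]  # placeholder sample

print(find_missing_tables(expected_tables, actual_tables))
```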