# 🎅 Welcome to the North Pole Modernization Office
## Santa's AI-Powered Letter Processing System

---

### 📬 The Challenge
Every year, Santa receives **millions of letters** from children worldwide. Manual processing can't keep up!

### ✨ The Solution: Databricks AI Functions
The North Pole Modernization Office (NPMO) uses [**Databricks AI Functions**](https://docs.databricks.com/aws/en/large-language-models/ai-functions) - powerful built-in AI capabilities that analyze text using simple SQL. No ML expertise required!

**What You'll Learn:**
* 🎯 `ai_classify()` - Categorize text (Naughty or Nice, Gift Departments)
* 💪 `ai_extract()` - Pull specific information (Dream gifts, Goodness claims)
* 🎄 `ai_analyze_sentiment()` - Measure excitement levels
* ✉️ `ai_gen()` - Generate personalized responses
* 🌍 `ai_translate()` - Convert to other languages
* 📊 `ai_summarize()` - Create executive briefings
* 🎯 `ai_query()` - Use custom AI models like Claude Sonnet 4

---

**Let's help Santa modernize his operations!** 🚀

## ⚙️ Configuration

**Before you begin:** Update the configuration below to match your environment.

The default values point to the demo dataset, but you can customize:
* **Catalog name** - Your Unity Catalog catalog
* **Schema name** - The schema that holds the source letters and the sample table
* **Volume name** - The Unity Catalog volume for raw data
* **Table names** - The source letters table and the sample table built from it

The sample size (100 letters) is set by the `LIMIT` clause of the sampling query in Step 1.

👇 **Update the cell below with your values, then run it!**

In [0]:
# Configuration
# TODO: Update these values for your environment
TARGET_CATALOG = "main"
TARGET_SCHEMA = "dbrx_12daysofdemos"
TARGET_VOLUME = "raw_data_volume"
SAMPLE_TABLE_NAME = "holiday_letters_sample"
SOURCE_TABLE_NAME = "santa_letters_canada"

# Derived values
source_table = f"{TARGET_CATALOG}.{TARGET_SCHEMA}.{SOURCE_TABLE_NAME}"
sample_table = f"{TARGET_CATALOG}.{TARGET_SCHEMA}.{SAMPLE_TABLE_NAME}"

print("✅ Configuration loaded:")
print(f"   📊 Source Table: {source_table}")
print(f"   🎯 Sample Table: {sample_table}")
print()
print("💡 All queries will use these table references.")
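The names above are interpolated straight into SQL text, so a typo or stray character only shows up when a query fails. A minimal sketch of a guard follows. `qualified_name` is a hypothetical helper (not part of this notebook or of Databricks), and it assumes simple identifiers made of letters, digits, and underscores:

```python
import re

# Assumption: identifiers use only letters, digits, and underscores.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Return a backtick-quoted three-level name, or raise ValueError."""
    parts = (catalog, schema, table)
    for part in parts:
        if not _IDENTIFIER.match(part):
            raise ValueError(f"Unexpected characters in identifier: {part!r}")
    return ".".join(f"`{part}`" for part in parts)


print(qualified_name("main", "dbrx_12daysofdemos", "santa_letters_canada"))
```

The returned string can be used wherever the notebook interpolates a table reference into a query.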

In [0]:
%run "../00-init/load-data"

## 📊 Step 1: Explore the Raw Letter Data

**What's happening:** We're looking at letters stored in a **Delta Lake table** - Databricks' high-performance storage format.

**The data includes:**
* Child's name, city, and province
* Date the letter was written
* Full letter text
* Requested gifts

👇 **Run the cell below to see sample letters!**

In [0]:
spark.sql(f"""
  
  SELECT
    *
  FROM 
    {source_table}
  LIMIT 
    10
  
""").display()

In [0]:
spark.sql(f"""
  
  CREATE OR REPLACE TABLE {sample_table} AS
  SELECT 
    name,
    province,
    city,
    date,
    letter,
    gifts
  FROM {source_table}
  ORDER BY RANDOM()
  LIMIT 100
  
""")
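`ORDER BY RANDOM()` draws a different sample on every run, so the results of later cells change between runs. Below is a sketch of a seeded variant, assuming the optional seed argument of Spark SQL's `rand()`. The function name `build_sample_query`, the seed value, and the table names are illustrative, and a seed is only expected to give repeatable results while the underlying data layout stays the same:

```python
def build_sample_query(source: str, target: str, n: int = 100, seed: int = 42) -> str:
    """Build a CREATE TABLE ... AS SELECT statement with a seeded random order."""
    if n <= 0:
        raise ValueError("n must be positive")
    return f"""
  CREATE OR REPLACE TABLE {target} AS
  SELECT name, province, city, date, letter, gifts
  FROM {source}
  ORDER BY rand({int(seed)})
  LIMIT {int(n)}
"""


# Placeholder table names; substitute source_table and sample_table from the configuration.
query = build_sample_query("main.demo.letters", "main.demo.letters_sample")
print(query)
# In a Databricks notebook: spark.sql(query)
```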

# 🎅 Part 1: Built-in AI Functions

---

## What Are AI Functions?

**AI Functions** let you apply artificial intelligence using simple SQL - no ML infrastructure needed!

| Function | Purpose | Example |
|----------|---------|----------|
| `ai_classify()` | Categorize text | "Naughty or Nice?" |
| `ai_extract()` | Pull information | "What's the dream gift?" |
| `ai_analyze_sentiment()` | Measure emotion | "How excited?" |
| `ai_gen()` | Generate text | "Write Santa's response" |
| `ai_translate()` | Convert language | "Translate to French" |
| `ai_summarize()` | Create summaries | "Daily briefing" |

**The Magic:** Databricks uses advanced Large Language Models (LLMs) behind the scenes. You just write SQL!

📚 **[Full AI Functions Documentation](https://docs.databricks.com/aws/en/large-language-models/ai-functions)**

---

### 🎯 Naughty or Nice Classifier

**Using `ai_classify()`** to sort each letter into one of four ratings, based on its tone and the behavior the child claims.

In [0]:
spark.sql(f"""
  
  SELECT 
    name,
    city,
    province,
    ai_classify(
      letter,
      ARRAY('Definitively Nice', 'Nice (Mostly)', 'On Thin Ice', 'Coal Candidate')
    ) AS nice_rating,
    letter
  FROM {sample_table}
  ORDER BY nice_rating
  LIMIT 10
  
""").display()
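The label list is the part of this query most likely to change. One way to keep it in Python is sketched below. `sql_string_array` is a hypothetical helper, and it simply rejects labels that contain quotes or backslashes instead of guessing at escaping rules:

```python
def sql_string_array(labels):
    """Render a Python list of labels as a SQL ARRAY('a', 'b', ...) literal."""
    if not labels:
        raise ValueError("At least one label is required")
    for label in labels:
        if "'" in label or "\\" in label:
            raise ValueError(f"Label needs manual escaping: {label!r}")
    return "ARRAY(" + ", ".join(f"'{label}'" for label in labels) + ")"


ratings = ["Definitively Nice", "Nice (Mostly)", "On Thin Ice", "Coal Candidate"]
classify_expr = f"ai_classify(letter, {sql_string_array(ratings)}) AS nice_rating"
print(classify_expr)
```

The resulting expression can be dropped into the `SELECT` list of the query above.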

## 💪 Detecting "I've Been Good" Claims

**Using `ai_extract()`** to pull out:
* **proof_of_goodness** - Phrases they use ("I tried my best" vs "I've been on my best behavior")
* **name_of_sender** - The name the child signs the letter with

In [0]:
spark.sql(f"""
  
  SELECT 
    name,
    city,
    ai_extract(
      letter,
      ARRAY('proof_of_goodness','name_of_sender')
    ) AS goodness_claims,
    letter
  FROM {sample_table}
  LIMIT 10
  
""").display()
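The extracted values arrive together in one column. The sketch below assumes that `ai_extract()` returns a struct with one field per label, so each field can be read with dot notation. `build_extract_query` and the table name are illustrative:

```python
def build_extract_query(table, labels, limit=10):
    """Build a query that calls ai_extract once and expands each label into a column."""
    for label in labels:
        if not label.isidentifier():
            raise ValueError(f"Label is not a simple identifier: {label!r}")
    label_array = ", ".join(f"'{label}'" for label in labels)
    field_columns = ",\n    ".join(f"extracted.{label} AS {label}" for label in labels)
    return f"""
  WITH extracted_letters AS (
    SELECT name, city, ai_extract(letter, ARRAY({label_array})) AS extracted
    FROM {table}
    LIMIT {int(limit)}
  )
  SELECT
    name,
    city,
    {field_columns}
  FROM extracted_letters
"""


query = build_extract_query(
    "main.demo.letters_sample",  # placeholder table name
    ["proof_of_goodness", "name_of_sender"],
)
print(query)
# In a Databricks notebook: spark.sql(query).display()
```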

## 🎄 Christmas Spirit-O-Meter

**Using `ai_analyze_sentiment()`** to measure holiday excitement:
* Positive - Excited, grateful, joyful
* Neutral - Matter-of-fact
* Negative - Disappointed, demanding
* Mixed - Both positive and negative feelings in the same letter

**Real-world use:** Companies analyze customer satisfaction, product reviews, and brand perception the same way!

In [0]:
spark.sql(f"""
  
  SELECT 
    name,
    city,
    ai_analyze_sentiment(letter) AS sentiment,
    letter
  FROM {sample_table}
  ORDER BY random()
  LIMIT 10
  
""").display()

## 🏭 Auto-Sort Gifts to Workshop Departments

**Using `ai_classify()` on gifts** to route to specialized workshops:
* Electronics Lab 🔌 - iPads, gaming consoles
* Toy Factory 🧸 - Dolls, LEGO, plushies
* Sports Equipment Shed ⛷️ - Bikes, skateboards
* Arts & Crafts Corner 🎨 - Art supplies, puzzles
* Luxury Item Vault 💎 - High-end requests

**Business value:** Optimized production, inventory management, no lost requests!

In [0]:
spark.sql(f"""
  
  SELECT 
    name,
    gifts,
    ai_classify(
      gifts,
      ARRAY('Electronics Lab 🔌', 'Toy Factory 🧸', 'Sports Equipment Shed ⛷️', 'Arts & Crafts Corner 🎨', 'Luxury Item Vault 💎')
    ) AS workshop_department
  FROM {sample_table}
  LIMIT 10
  
""").display()

## 🎁 Extract the #1 Dream Gift

**Using `ai_extract()`** to find the top priority gift - usually mentioned first or with extra enthusiasm.

**Why it matters:** If Santa can't deliver everything, he knows which gift brings the most joy!

In [0]:
spark.sql(f"""
  
  SELECT 
    name,
    city,
    ai_extract(
      letter,
      ARRAY('top_priority_gift')
    ) AS dream_gift,
    gifts AS all_gifts
  FROM {sample_table}
  LIMIT 10
  
""").display()

## ✉️ Generate Personalized Santa Responses

**Using `ai_gen()`** to create warm, personalized responses that:
* Address the child by name
* Reference their city and gifts
* Maintain Santa's jolly tone

**Scaling personal touch:** Millions of personalized responses while maintaining North Pole magic! ✨

In [0]:
spark.sql(f"""
  
  SELECT 
    name,
    city,
    ai_gen(
      CONCAT('Write a personalized response from Santa to ', name, ' from ', city, '. Reference their letter where they requested: ', gifts, '. Keep it warm, jolly, and under 100 words.')
    ) AS santa_response
  FROM {sample_table}
  LIMIT 5
  
""").display()
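One detail to watch in the prompt: Spark SQL's `CONCAT` returns NULL when any argument is NULL, so a letter with a missing `gifts` value would send a NULL prompt to `ai_gen()`. Below is a sketch of a NULL-safe prompt expression. The helper name and the fallback phrases are illustrative and not part of the original notebook:

```python
def null_safe_prompt_sql(name_col="name", city_col="city", gifts_col="gifts"):
    """Return a SQL CONCAT expression whose column inputs are wrapped in COALESCE."""
    return (
        "CONCAT("
        f"'Write a personalized response from Santa to ', COALESCE({name_col}, 'a child'), "
        f"' from ', COALESCE({city_col}, 'their hometown'), "
        f"'. Reference their letter where they requested: ', COALESCE({gifts_col}, 'a surprise'), "
        "'. Keep it warm, jolly, and under 100 words.')"
    )


prompt_expr = null_safe_prompt_sql()
print(f"ai_gen({prompt_expr}) AS santa_response")
```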

## 🇫🇷 Translate for International Elves

**Using `ai_translate()`** to convert letters to French for Quebec elves.

**Global operations:** Local elves work in their native language, preserving cultural nuances!

In [0]:
spark.sql(f"""
  
  SELECT 
    name,
    province,
    city,
    letter AS original_letter,
    ai_translate(letter, 'fr') AS french_translation
  FROM {sample_table}
  WHERE province = 'Quebec'
  LIMIT 5
  
""").display()

## 📊 Mrs. Claus's Daily Briefing

**Using `ai_summarize()`** to create executive summaries from batches of letters. A **CTE (Common Table Expression)** first combines multiple letters into one block of text, which is then summarized in a single call.

In [0]:
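spark.sql(f"""
  
  -- First, take a sample of 10 letters, then combine them into one block of text
  WITH letter_batch AS (
    SELECT 
      CONCAT_WS('\n---\n', COLLECT_LIST(CONCAT(name, ' from ', city, ', ', province, ': ', LEFT(letter, 200)))) AS batch_text
    FROM (
      -- LIMIT must run before COLLECT_LIST; after aggregation only one row is left to limit
      SELECT name, city, province, letter
      FROM {sample_table}
      LIMIT 10
    ) AS sampled_letters
  )
  SELECT 
    ai_summarize(batch_text, 1000) AS daily_briefing
  FROM letter_batch
  
""").display()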
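The text that `ai_summarize()` receives is assembled from each child's name, city, province, and the first 200 characters of the letter. A plain-Python mirror of that expression makes the shape of the input easy to inspect. The rows are made up and `build_batch_text` is a hypothetical helper:

```python
def build_batch_text(rows, max_chars=200, separator="\n---\n"):
    """Mirror of the SQL CONCAT_WS / COLLECT_LIST / LEFT expression for a list of dicts."""
    entries = [
        f"{row['name']} from {row['city']}, {row['province']}: {row['letter'][:max_chars]}"
        for row in rows
    ]
    return separator.join(entries)


# Made-up rows for illustration only.
sample_rows = [
    {"name": "Ava", "city": "Halifax", "province": "Nova Scotia",
     "letter": "Dear Santa, I would like a sled."},
    {"name": "Liam", "city": "Regina", "province": "Saskatchewan",
     "letter": "Dear Santa, " + "please " * 50},
]

batch = build_batch_text(sample_rows)
print(batch)
```

The second letter is longer than 200 characters, so only its first 200 characters appear in the batch.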

# 🎯 Part 2: Advanced AI with ai_query()

---

## Beyond Built-in Functions: Custom AI Models

While built-in AI Functions are powerful, `ai_query()` gives you **more control**:

* Use **any model available through a Databricks Model Serving endpoint** (for example Claude, Llama, GPT, Gemini)
* Write **custom prompts** for specific tasks
* Get **structured JSON output** with guaranteed schemas

### 🎨 Two Approaches

**1️⃣ Simple Text Output**
```sql
ai_query('databricks-claude-sonnet-4', 'Analyze this letter...')
```

**2️⃣ Structured JSON Output**
```sql
ai_query('model', 'prompt', responseFormat => '{...json_schema...}')
```
Define an exact schema, get guaranteed structured data.
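The schema is JSON embedded in a SQL string, and in this notebook the SQL itself sits inside a Python f-string, where literal braces must be doubled. One way to avoid hand-written braces is to build the schema as a Python dict and serialize it. In the sketch below, `build_response_format`, the schema name, and the table name are illustrative, and the helper assumes every property should be required:

```python
import json


def build_response_format(name: str, properties: dict) -> str:
    """Serialize a json_schema response format; every property is marked required."""
    response_format = {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "schema": {
                "type": "object",
                "properties": properties,
                "required": list(properties),
            },
            "strict": True,
        },
    }
    text = json.dumps(response_format)
    if "'" in text:
        raise ValueError("Single quotes would need escaping inside a SQL string literal")
    return text


response_format = build_response_format(
    "letter_tone_only",  # illustrative schema name
    {"letter_tone": {"type": "string", "enum": ["polite_grateful", "demanding"]}},
)

# Braces inside an interpolated value are not re-parsed by the f-string.
query = f"""
  SELECT ai_query(
    'databricks-claude-sonnet-4',
    CONCAT('Classify the tone of this letter: ', letter),
    responseFormat => '{response_format}'
  ) AS tone_json
  FROM main.demo.letters_sample  -- placeholder table name
  LIMIT 5
"""
print(query)
```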

---

### 🎯 Priority Classification Example

We'll use **Claude Sonnet 4** to classify letters by priority (1-5) with detailed reasoning, first as free-text output and then as structured JSON.

In [0]:
spark.sql(f"""
  
  SELECT
    name,
    city,
    ai_query(
      'databricks-claude-sonnet-4',
      CONCAT(
        'Analyze this letter to Santa and classify it across multiple dimensions. ',
        'Letter to analyze: ', letter
      )
    ) AS ai_response,
    gifts
  FROM {sample_table}
  LIMIT 10
  
""").display()

In [0]:
spark.sql(f"""
  
  -- Step 1: Use a CTE ("classified") to call the Claude Sonnet 4 model via ai_query for each letter,
  --         requesting a structured JSON classification across multiple dimensions (priority, behavior, tone, etc.)
  WITH classified AS (
    SELECT 
      name,
      city,
      ai_query(
        'databricks-claude-sonnet-4',
        CONCAT(
          'Analyze this letter to Santa and classify it across multiple dimensions. ',
          'Letter to analyze: ', letter
        ),
        -- JSON braces are doubled because this query is built with a Python f-string
        responseFormat => '{{
          "type": "json_schema",
          "json_schema": {{
            "name": "letter_priority_classification",
            "schema": {{
              "type": "object",
              "properties": {{
                "priority_level": {{
                  "type": "integer",
                  "enum": [1, 2, 3, 4, 5],
                  "description": "Priority from 1 (urgent/special) to 5 (lowest priority)"
                }},
                "behavior_score": {{
                  "type": "string",
                  "enum": ["excellent", "good", "average", "needs_improvement"],
                  "description": "Assessment of claimed behavior throughout the year"
                }},
                "letter_tone": {{
                  "type": "string",
                  "enum": ["polite_grateful", "casual_friendly", "neutral", "demanding"],
                  "description": "Overall tone and politeness of the letter"
                }},
                "gift_reasonableness": {{
                  "type": "string",
                  "enum": ["very_reasonable", "reasonable", "ambitious", "unrealistic"],
                  "description": "How reasonable are the gift requests"
                }},
                "reasoning": {{
                  "type": "string",
                  "description": "Brief explanation for the priority assignment"
                }}
              }},
              "required": ["priority_level", "behavior_score", "letter_tone", "gift_reasonableness", "reasoning"]
            }},
            "strict": true
          }}
        }}'
      ) AS priority_json, -- The model's structured JSON output for each letter
      gifts
    FROM {sample_table}
    LIMIT 10 -- Limit to 10 letters for demonstration
  )
  -- Step 2: Parse the JSON output once into a struct, then expand its fields into columns
  SELECT 
    name,
    city,
    parsed.priority_level AS priority, -- Numeric priority level
    parsed.behavior_score AS behavior, -- Behavior assessment
    parsed.letter_tone AS tone, -- Letter tone
    parsed.gift_reasonableness AS gift_rating, -- Reasonableness of gift requests
    parsed.reasoning AS reason, -- Model's reasoning for priority
    gifts
  FROM (
    SELECT
      name,
      city,
      gifts,
      from_json(priority_json, 'priority_level INT, behavior_score STRING, letter_tone STRING, gift_reasonableness STRING, reasoning STRING') AS parsed
    FROM classified
  ) AS parsed_letters
  ORDER BY priority -- Sort results by priority level
  
""").display()

## 📊 Structured Priority Classification

**Approach #2:** Guaranteed structured data with JSON schema

### 🎯 What We Get

For each letter, Claude returns JSON with:
```json
{
  "priority_level": 2,
  "behavior_score": "good",
  "letter_tone": "polite_grateful",
  "gift_reasonableness": "very_reasonable",
  "reasoning": "Margaret's letter demonstrates..."
}
```

### 🔒 Schema Validation

The `responseFormat` parameter defines:
* **Exact field names** - No typos
* **Data types** - Integer, string, etc.
* **Allowed values** - Enums ensure consistency
* **Required fields** - No missing data
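Even with a schema in the request, a defensive check on the response is cheap. The sketch below applies the same four rules in plain Python. `check_classification` is a hypothetical helper, and the allowed values are copied from the schema used in this notebook:

```python
import json

# Allowed values copied from the responseFormat schema in this notebook.
ALLOWED = {
    "priority_level": (1, 2, 3, 4, 5),
    "behavior_score": ("excellent", "good", "average", "needs_improvement"),
    "letter_tone": ("polite_grateful", "casual_friendly", "neutral", "demanding"),
    "gift_reasonableness": ("very_reasonable", "reasonable", "ambitious", "unrealistic"),
}
REQUIRED = [*ALLOWED, "reasoning"]


def check_classification(raw_json: str) -> list:
    """Return a list of problems found in one model response (empty if none)."""
    try:
        record = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["top-level value is not an object"]
    problems = [f"missing field: {field}" for field in REQUIRED if field not in record]
    for field, allowed in ALLOWED.items():
        if field in record and record[field] not in allowed:
            problems.append(f"unexpected value for {field}: {record[field]!r}")
    if "reasoning" in record and not isinstance(record["reasoning"], str):
        problems.append("reasoning is not a string")
    return problems


good = ('{"priority_level": 2, "behavior_score": "good", "letter_tone": "polite_grateful", '
        '"gift_reasonableness": "very_reasonable", "reasoning": "Example only"}')
bad = '{"priority_level": 9, "behavior_score": "good"}'
print(check_classification(good))
print(check_classification(bad))
```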

### 💡 Why This Matters

✅ **Reliable parsing** - No regex needed  
✅ **Type safety** - Numbers are numbers  
✅ **Easy analysis** - Can aggregate, filter, join  
✅ **BI compatible** - Works with dashboards

---

## 🔍 Parsing JSON for Analysis

**Two-step process:**

1. **CTE:** Call AI model, store JSON response
2. **Parse:** Use `from_json()` to extract fields into columns

**Result:** AI output becomes queryable data you can sort, filter, aggregate, and visualize!
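The same idea can be tried outside SQL, for example on a handful of responses collected for inspection. This minimal sketch uses made-up responses and only the standard library. It parses each JSON string, sorts by priority, and counts tones:

```python
import json
from collections import Counter

# Made-up model responses for illustration only.
raw_responses = [
    '{"priority_level": 3, "behavior_score": "good", "letter_tone": "neutral", '
    '"gift_reasonableness": "reasonable", "reasoning": "Example only"}',
    '{"priority_level": 1, "behavior_score": "excellent", "letter_tone": "polite_grateful", '
    '"gift_reasonableness": "very_reasonable", "reasoning": "Example only"}',
    '{"priority_level": 2, "behavior_score": "good", "letter_tone": "polite_grateful", '
    '"gift_reasonableness": "ambitious", "reasoning": "Example only"}',
]

# Step 1: parse each JSON string into a dict. Step 2: treat the fields as data.
records = sorted((json.loads(raw) for raw in raw_responses), key=lambda r: r["priority_level"])
tone_counts = Counter(r["letter_tone"] for r in records)

print([r["priority_level"] for r in records])  # [1, 2, 3]
print(tone_counts.most_common(1))              # [('polite_grateful', 2)]
```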

# 🎊 Congratulations!

---

## 🎓 What You've Learned

### ✅ Built-in AI Functions
* `ai_classify()`
* `ai_extract()`
* `ai_analyze_sentiment()`
* `ai_gen()`
* `ai_translate()`
* `ai_summarize()`

### ✅ Advanced Custom AI
* `ai_query()` with custom prompts
* Structured JSON output with schemas
* Parsing AI results into queryable columns

---

## 🎯 Next Steps

* **Modify prompts** - Change classification categories
* **Scale up** - Remove the `LIMIT` clauses and query the full source table instead of the 100-letter sample
* **Build dashboard** - Visualize in Databricks AI/BI
* **Automate** - Create pipelines with Databricks Workflows

### 📚 Resources
* [AI Functions Documentation](https://docs.databricks.com/aws/en/large-language-models/ai-functions)
* [Unity Catalog](https://docs.databricks.com/en/data-governance/unity-catalog/index.html)
* [Databricks Workflows](https://docs.databricks.com/en/workflows/index.html)

---

### 🎄 Happy Holidays from the North Pole Team! 🎄