[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Simone-Alghisi/HMD-Lab/blob/master/notebooks/2_nlu.ipynb)

On Colab:
1. Switch to a GPU Runtime by clicking on *Runtime > Change runtime type > T4 GPU*
2. Run the cell below

In [None]:
# For Google Colab only
!git clone https://github.com/Simone-Alghisi/HMD-Lab.git
%cd /content/HMD-Lab/notebooks 

![system_architecture](./assets/system.jpg)

# Natural Language Understanding (NLU)

Natural Language Understanding (NLU) is the component that, given a user's free-text utterance, performs Intent Classification (deciding what the user wants) and Slot Filling (extracting the values of the parameters needed to fulfill that intent).

- *Intent* — a coarse surrogate of the meaning of the sentence. In the Amazon Alexa framework: "A representation of the action that fulfills a customer's spoken request". Examples: 
  - `pizza_ordering`
  - `drink_ordering`
  - `out_of_domain`
  - ...
- *Slots* — a set of parameters that are required to fulfill an intent. Examples for `pizza_ordering`: 
  - `pizza_size`
  - `pizza_type`
  - `pizza_count`
  - ...
- *Values* — the set of valid attributes for a slot (slot: value). Examples: 
  - `pizza_size`: `medium`
  - `pizza_type`: `margherita`
  - `pizza_count`: `2`
  - ...

Why this matters:
- The intent tells the system what to do next (e.g., collect order details vs. provide a menu).
- Slots capture the pieces of information you need to complete the intent (e.g., size, type, quantity).
- Missing values (`null`) tell the system which slots it still has to ask the user for before the intent can be fulfilled.
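
The relationship between intents, slots, and values can be sketched as a small schema. This is only an illustration, not the lab's implementation: `INTENT_SLOTS`, `empty_frame`, `missing_slots`, and the `drink_ordering` slot names are hypothetical.

```python
# Hypothetical schema: each intent maps to the slots it needs.
# The drink_ordering slot names are assumptions made for illustration.
INTENT_SLOTS = {
    "pizza_ordering": ["pizza_size", "pizza_type", "pizza_count"],
    "drink_ordering": ["drink_type", "drink_count"],
    "out_of_domain": [],
}


def empty_frame(intent: str) -> dict:
    """Build a frame for `intent` with every slot set to None (missing)."""
    return {"intent": intent, "slots": {slot: None for slot in INTENT_SLOTS[intent]}}


def missing_slots(frame: dict) -> list:
    """Return the names of the slots that still have no value."""
    return [slot for slot, value in frame["slots"].items() if value is None]


frame = empty_frame("pizza_ordering")
frame["slots"]["pizza_size"] = "medium"
frame["slots"]["pizza_type"] = "margherita"
print(missing_slots(frame))  # ['pizza_count']
```

Keeping every slot in the frame, with `None` for the ones not yet filled, makes the missing information explicit instead of implicit.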

## Question

You want to design an OrderBot to collect the user's pizza order.

Consider the following sentence: *"I'd like a medium margherita pizza, please."*

1. What is the intent?
2. What are the values in the user request?
3. Which slots are missing from the user request to fulfill its intent?


```json
{
  "intent": "",
  "slots": {
  }
}
```

### Solution

Consider the following sentence: *"I'd like a medium margherita pizza, please."*

```json
{
  "intent": "pizza_ordering",
  "slots": {
    "pizza_size": "medium",
    "pizza_type": "margherita",
    "pizza_count": null
  }
}
```

## Code

In [1]:
import sys

sys.path.append("..")

from utils import MODELS
from transformers import AutoTokenizer

model_name, InitModel, prepare_text = MODELS["qwen3"]

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = InitModel(
    model_name,
    dtype="auto",
    device_map="cuda:0",
)



In [2]:
import torch

from notebooks.notebook_utils import display_conversation
from models.qwen3 import prepare_text

task_prompt = """Identify the user intent from this list: 
[pizza_ordering, drink_ordering, out_of_domain].

If the intent is pizza_ordering, extract the following slot values from the user input
- pizza_size, the size of the pizza
- pizza_type, the type of pizza
- pizza_count, the number of pizzas as a numeric value.
If no values are present in the user input you have to put 'null' as value.

Only output the json object without any additional text.
The json format is: 
{
    "intent": "...", 
    "slots": {
        "slot": "value"
    }
}"""

In [3]:
user_message = "I would like a medium margherita pizza, please."

zero_shot = [
    {
        "role": "system", 
        "content": task_prompt
    }
]

text = prepare_text(user_message, tokenizer, zero_shot)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

with torch.no_grad():
    generated_ids = model.generate(**model_inputs, max_new_tokens=16384).cpu()

# decode the output
output_ids = generated_ids[0][len(model_inputs.input_ids[0]) :].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)
display_conversation(zero_shot, user_message, content)


### Conversation

**System:** Identify the user intent from this list: 
[pizza_ordering, drink_ordering, out_of_domain].

If the intent is pizza_ordering, extract the following slot values from the user input
- pizza_size, the size of the pizza
- pizza_type, the type of pizza
- pizza_count, the number of pizzas as a numeric value.
If no values are present in the user input you have to put 'null' as value.

Only output the json object without any additional text.
The json format is: 
{
    "intent": "...", 
    "slots": {
        "slot": "value"
    }
}

**User:** I would like a medium margherita pizza, please.

**Assistant:** {
    "intent": "pizza_ordering",
    "slots": {
        "pizza_size": "medium",
        "pizza_type": "margherita",
        "pizza_count": "1"
    }
}

In [4]:
import json

print(json.dumps(json.loads(content), ensure_ascii=False, indent=2))

{
  "intent": "pizza_ordering",
  "slots": {
    "pizza_size": "medium",
    "pizza_type": "margherita",
    "pizza_count": "1"
  }
}
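
`json.loads(content)` raises `json.JSONDecodeError` whenever the model adds any text around the JSON object, despite the instruction in the prompt. A more tolerant parser might look like the sketch below; `extract_json` is a hypothetical helper, not part of the lab utilities, and it relies only on the standard `json.JSONDecoder.raw_decode` method.

```python
import json


def extract_json(text: str):
    """Return the first JSON object found in `text`, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever follows it.
            obj, _ = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace: try the next one.
            start = text.find("{", start + 1)
    return None


raw = 'Here is the result:\n{"intent": "pizza_ordering", "slots": {"pizza_count": "1"}}\nEnjoy!'
print(extract_json(raw))
```

Returning `None` instead of raising lets the caller decide whether to retry the generation or fall back to `out_of_domain`.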


### Exercise

Extend the prompt above to handle the other intents:
- `out_of_domain`
- `drink_ordering`

For each intent, specify the (optional) slots that may be required to fulfill it.

# Dialogue State Tracking (DST)

The Dialogue State Tracker (DST) updates the Dialogue State (DS) based on the current turn.

The dialogue state contains information about the intent and its extracted (or missing) slot-value pairs.

The DS gives the system a compact summary (an abstract meaning representation) of the dialogue, which is easier to work with than the whole dialogue history.

The dialogue state is passed among components and lets the system track which slots are still missing (`null`), so that the Dialogue Manager (DM) can ask follow-up questions.

DST responsibilities:
- Merge new NLU outputs with the existing state.
- Keep canonical, normalized slot values (e.g., "medium" not "Medium ").
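
The normalization step could be sketched as follows. `normalize_slot_value` is a hypothetical helper, and the specific rules (lowercasing, collapsing whitespace, converting digit strings to integers) are assumptions about what "canonical" means for this domain.

```python
def normalize_slot_value(value):
    """Map a raw NLU value to a canonical form; None means 'missing'."""
    if value is None:
        return None
    if isinstance(value, str):
        # Trim, collapse inner whitespace, and lowercase.
        text = " ".join(value.split()).lower()
        if text in {"", "null", "none"}:
            return None
        if text.isdecimal():
            return int(text)
        return text
    # Numbers and other types are kept unchanged.
    return value


print(normalize_slot_value("Medium "))  # medium
print(normalize_slot_value("2"))        # 2
print(normalize_slot_value("null"))     # None
```

Normalizing before merging means the dialogue state never holds two spellings of the same value.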

## Example

Suppose the current DS is:

```json
{
    "intent": "pizza_ordering", 
    "slots": {
        "pizza_type": null,
        "pizza_size": "medium",
        "pizza_count": null
    }
}
```

What do you expect to happen if the next user's utterance is *"I would like a margherita"*?

### Solution

We have a new value for `pizza_type`, so we update our DS accordingly:

```json
{
    "intent": "pizza_ordering", 
    "slots": {
        "pizza_type": "margherita",
        "pizza_size": "medium",
        "pizza_count": null
    }
}
```

## Code

In [5]:
from copy import deepcopy


def clean_response(response: dict) -> dict:
    """Clean an NLU response for safe merging into the dialogue state."""
    def _clean(obj):
        if isinstance(obj, dict):
            out = {}
            for k, v in obj.items():
                # Normalize explicit textual nulls
                if isinstance(v, str) and v.strip().lower() == "null":
                    v = None
                if v is None:
                    continue
                if isinstance(v, dict):
                    cleaned = _clean(v)
                    if cleaned:
                        out[k] = cleaned
                else:
                    out[k] = v
            return out
        else:
            return obj

    return _clean(deepcopy(response))



def update_ds(ds: dict, nlu_response: dict) -> dict:
    """Merge a cleaned NLU response into the dialogue state `ds`."""
    for key, value in nlu_response.items():
        if value is None:
            continue
        # If the incoming value is a dict (commonly 'slots'), merge recursively
        if isinstance(value, dict):
            if key not in ds or not isinstance(ds.get(key), dict):
                ds[key] = {}
            for subk, subv in value.items():
                if subv is None:
                    continue
                ds[key][subk] = subv
        else:
            ds[key] = value
    return ds


In [6]:
nlu_response = """{
    "intent": "pizza_ordering", 
    "slots": {
        "pizza_type": "margherita",
        "pizza_size": null,
        "pizza_count": null
    }
}"""

In [7]:
import json

# Initial dialogue state (previous context)
dialogue_state = {
    "intent": "pizza_ordering",
    "slots": {
        "pizza_type": None,
        "pizza_size": "medium",
        "pizza_count": None
    }
}

print("Dialogue state BEFORE update:")
print(json.dumps(dialogue_state, ensure_ascii=False, indent=2))

parsed = json.loads(nlu_response)
cleaned = clean_response(parsed)

# Merge into dialogue state
updated_state = update_ds(dialogue_state, cleaned)
print("\nDialogue state AFTER update:")
print(json.dumps(updated_state, ensure_ascii=False, indent=2))


Dialogue state BEFORE update:
{
  "intent": "pizza_ordering",
  "slots": {
    "pizza_type": null,
    "pizza_size": "medium",
    "pizza_count": null
  }
}

Dialogue state AFTER update:
{
  "intent": "pizza_ordering",
  "slots": {
    "pizza_type": "margherita",
    "pizza_size": "medium",
    "pizza_count": null
  }
}


In [8]:
# Slots that are still missing: the Dialogue Manager should ask about these next
missing_slots = [k for k, v in updated_state.get("slots", {}).items() if v is None]
print("\nMissing slots:", missing_slots)


Missing slots: ['pizza_count']
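
Once the missing slots are known, the Dialogue Manager can turn the first one into a follow-up question. The sketch below is a toy illustration: `FOLLOW_UP_QUESTIONS`, `next_question`, and the question wording are hypothetical.

```python
# Hypothetical question templates, one per pizza_ordering slot.
FOLLOW_UP_QUESTIONS = {
    "pizza_size": "What size would you like?",
    "pizza_type": "Which pizza would you like?",
    "pizza_count": "How many pizzas would you like?",
}


def next_question(state: dict):
    """Return a question for the first missing slot, or None if nothing is missing."""
    for slot, value in state.get("slots", {}).items():
        if value is None:
            # Fall back to a generic question for slots without a template.
            return FOLLOW_UP_QUESTIONS.get(slot, f"Could you specify the {slot}?")
    return None


state = {
    "intent": "pizza_ordering",
    "slots": {"pizza_type": "margherita", "pizza_size": "medium", "pizza_count": None},
}
print(next_question(state))  # How many pizzas would you like?
```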


# Evaluation

You are given:
- a sentence
- the ground-truth intent and slot-value pairs

Question: *How can we evaluate the NLU component?*

### Example

Sentence: *"I would like a margherita pizza"*
 
Ground-Truth:
```json
{
    "intent": "pizza_ordering", 
    "slots": {
        "pizza_type": "margherita",
        "pizza_size": null,
        "pizza_count": null
    }
}
```

How can we use this information to evaluate our NLU component?

#### Solution
We expect a proper NLU model to correctly:
1. classify the intent
2. fill the slots using the right values

By comparing the **ground-truth** with the **prediction** from the NLU, we can compute metrics such as the overall Intent and Slot *Accuracy* and *F1-Score*. The code below computes the two accuracy scores.

In [None]:
from typing import List, Dict, Tuple, Any

def _normalize_val(v: Any):
    """Normalize slot values for comparison."""
    if not isinstance(v, str):
        # None and numeric values are already comparable as-is
        return v
    s = v.strip()
    if s.lower() == "null":
        return None
    # normalize numeric strings: integers to int, other numbers to float
    if s.isdigit():
        return int(s)
    try:
        return float(s)
    except ValueError:
        return s.lower()

def _equal_slot(a: Any, b: Any) -> bool:
    a_n = _normalize_val(a)
    b_n = _normalize_val(b)
    return a_n == b_n

def evaluate_nlu(pred_states: List[Dict], gt_states: List[Dict]) -> Dict:
    """
    Evaluate NLU predictions against ground-truth dialogue states.

    Args:
      pred_states: list of predicted dialogue states, each like {"intent": "...", "slots": {...}}
      gt_states:   list of ground-truth dialogue states, same format

    Returns:
      Dictionary with:
        - intent_accuracy: fraction of examples with correct intent
        - slot_accuracy: fraction of slot values correct (computed over all GT slots)
    """
    if len(pred_states) != len(gt_states):
        raise ValueError("pred_states and gt_states must have the same length")

    n = len(gt_states)
    if n == 0:
        return {"intent_accuracy": 0.0, "slot_accuracy": 0.0}

    intent_correct = 0
    total_slots = 0
    correct_slots = 0

    for pred, gt in zip(pred_states, gt_states):
        # intents
        if pred.get("intent") == gt.get("intent"):
            intent_correct += 1

        gt_slots = gt.get("slots", {}) or {}
        pred_slots = pred.get("slots", {}) or {}

        for slot_name, gt_val in gt_slots.items():
            total_slots += 1
            # predicted might miss the slot key -> treat as None
            pred_val = pred_slots.get(slot_name, None)
            if _equal_slot(pred_val, gt_val):
                correct_slots += 1

    return {
        "intent_accuracy": intent_correct / n,
        "slot_accuracy": (correct_slots / total_slots) if total_slots else 0.0
    }

In [10]:
# Example usage:
preds = [
    {
        "intent": "pizza_ordering",
        "slots": {
            "pizza_size": "medium",
            "pizza_type": "margherita",
            "pizza_count": None,
        },
    }
]
gts = [
    {
        "intent": "pizza_ordering",
        "slots": {"pizza_size": "medium", "pizza_type": None, "pizza_count": None},
    }
]
print(evaluate_nlu(preds, gts))

{'intent_accuracy': 1.0, 'slot_accuracy': 0.6666666666666666}
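
Slot accuracy rewards the model for every slot that is correctly left empty. A slot-level F1-Score, which only counts non-null values, avoids that. The sketch below shows one common way to define it (micro-averaged over all examples); `slot_f1` is a hypothetical helper and it assumes the values have already been normalized, since it compares them with plain equality.

```python
def slot_f1(pred_states, gt_states):
    """Micro-averaged precision, recall and F1 over non-null slot values."""
    tp = fp = fn = 0
    for pred, gt in zip(pred_states, gt_states):
        # Keep only the slots that actually carry a value.
        pred_slots = {k: v for k, v in (pred.get("slots") or {}).items() if v is not None}
        gt_slots = {k: v for k, v in (gt.get("slots") or {}).items() if v is not None}
        for name, value in pred_slots.items():
            if gt_slots.get(name) == value:
                tp += 1  # predicted value matches the ground truth
            else:
                fp += 1  # predicted a value that is wrong or not expected
        for name, value in gt_slots.items():
            if pred_slots.get(name) != value:
                fn += 1  # expected value is missing or wrong
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Same example as above: the prediction hallucinates a pizza_type.
preds = [{"intent": "pizza_ordering",
          "slots": {"pizza_size": "medium", "pizza_type": "margherita", "pizza_count": None}}]
gts = [{"intent": "pizza_ordering",
        "slots": {"pizza_size": "medium", "pizza_type": None, "pizza_count": None}}]
print(slot_f1(preds, gts))
```

On this example the precision is 0.5 (one of the two predicted values is correct), the recall is 1.0 (the only ground-truth value was found), and the F1-Score is about 0.67.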


Per-intent and per-slot F1-Score and Accuracy are often more informative than the overall figures, since they show which intents or slots the model gets wrong.
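
A per-intent and per-slot accuracy breakdown can be sketched as follows. The helper names and the toy data (including the `drink_ordering` example) are hypothetical, and values are compared with plain equality, so they are assumed to be normalized beforehand.

```python
from collections import defaultdict


def per_intent_accuracy(pred_states, gt_states):
    """Intent accuracy grouped by ground-truth intent."""
    correct, total = defaultdict(int), defaultdict(int)
    for pred, gt in zip(pred_states, gt_states):
        intent = gt.get("intent")
        total[intent] += 1
        if pred.get("intent") == intent:
            correct[intent] += 1
    return {intent: correct[intent] / total[intent] for intent in total}


def per_slot_accuracy(pred_states, gt_states):
    """Slot accuracy grouped by slot name (computed over ground-truth slots)."""
    correct, total = defaultdict(int), defaultdict(int)
    for pred, gt in zip(pred_states, gt_states):
        pred_slots = pred.get("slots") or {}
        for name, gt_val in (gt.get("slots") or {}).items():
            total[name] += 1
            # A slot missing from the prediction counts as None.
            if pred_slots.get(name) == gt_val:
                correct[name] += 1
    return {name: correct[name] / total[name] for name in total}


# Toy data: one pizza order and one misclassified drink order.
preds = [
    {"intent": "pizza_ordering",
     "slots": {"pizza_size": "medium", "pizza_type": "margherita", "pizza_count": None}},
    {"intent": "out_of_domain", "slots": {}},
]
gts = [
    {"intent": "pizza_ordering",
     "slots": {"pizza_size": "medium", "pizza_type": None, "pizza_count": None}},
    {"intent": "drink_ordering", "slots": {}},
]
print(per_intent_accuracy(preds, gts))  # {'pizza_ordering': 1.0, 'drink_ordering': 0.0}
print(per_slot_accuracy(preds, gts))    # {'pizza_size': 1.0, 'pizza_type': 0.0, 'pizza_count': 1.0}
```

Here the overall numbers would hide that every `drink_ordering` example is misclassified and that all slot errors come from `pizza_type`.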