![](./cover.jpg)

Your airline's customer service team has been collecting chat data for years—thousands of conversations, each labeled with the user’s intent and an ideal response. Now, it's time to put that data to work.

You've been tasked with fine-tuning a TinyLlama model to power the airline’s next-gen AI assistant. The goal? Given a user message, the model should predict the intent (like booking a flight, checking baggage status, or requesting special assistance) and generate a helpful, human-like response. Accurate intent detection is key since it helps the system understand what the customer wants, so it can respond appropriately and trigger downstream actions when needed.

### The Data
You'll work with a dataset of various travel query examples. 

| Column | Description |
|--------|-------------|
| `instruction` | A user request from the Travel domain |
| `category` | The high-level semantic category for the intent |
| `intent` | The specific intent corresponding to the user instruction |
| `response` | An example of an expected response from the virtual assistant |

___
### Update to Python 3.10

Due to how frequently the libraries required for this project are updated, you'll need to update your environment to Python 3.10:

1. In the workbook, click "Environment" in the top toolbar and select "Session details".

2. In the workbook language dropdown, select "Python 3.10".

3. Click "Confirm" and hit "Done" once the session is ready.

In [21]:
# First install the necessary packages
!pip install -q -q -q trl==0.16.0
!pip install -q -q -q tf-keras==2.19.0
!pip install -q -q -q peft==0.14.0

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


In [22]:
# Import the required dependencies for this project
from transformers import AutoTokenizer, AutoModelForCausalLM
from trl import SFTTrainer, SFTConfig
from peft import LoraConfig

from datasets import Dataset, load_dataset
from collections import Counter, defaultdict
import random

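The code below loads the travel query dataset and reduces it from ~30k to ~50 records by sampling a few examples per intent and keeping the first 50. This speeds up fine-tuning. Because the sampled list is ordered by intent, the final truncation may drop some intent types, so the subset is not guaranteed to cover all of them. Run it before starting, and feel free to experiment with it later!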

In [23]:
# First load the entire dataset
ds = load_dataset('bitext/Bitext-travel-llm-chatbot-training-dataset', split="train")

# Group examples by intent
random.seed(42)
intent_groups = defaultdict(list)
for record in ds:
    intent = record["intent"]
    intent_groups[intent].append(record)

# Determine how many samples per intent
total_intents = len(intent_groups)
samples_per_intent = 100 // total_intents

# Sample from each intent
balanced_subset = []
for intent, examples in intent_groups.items():
    sampled = random.sample(examples, min(samples_per_intent, len(examples)))
    balanced_subset.extend(sampled)

total_num_of_records = 50    
travel_chat_ds = Dataset.from_list(balanced_subset[:total_num_of_records])

travel_chat_ds.to_pandas().head(3)

| | instruction | intent | category | tags | response |
|---|-------------|--------|----------|------|----------|
| 0 | I'd like information about my checked baggage ... | check_baggage_allowance | BAGGAGE | BCIP | To retrieve your checked baggage allowance det... |
| 1 | i have to see the fucking checked baggage allo... | check_baggage_allowance | BAGGAGE | BCIQW | To determine your checked baggage allowance, p... |
| 2 | I want to know about my checked baggage allowa... | check_baggage_allowance | BAGGAGE | BCI | To find details regarding your checked baggage... |
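
The cell above truncates an intent-ordered list, so the 50-record subset may miss some intents. One possible alternative, sketched here on toy records rather than the real dataset, is to draw round-robin across intents until the budget is reached. The function name `sample_across_intents` and the toy data are illustrative assumptions, not part of the original notebook.

```python
import random
from collections import defaultdict


def sample_across_intents(records, total, seed=42):
    """Pick up to `total` records, cycling through intents.

    Every intent is represented as long as `total` is at least
    the number of distinct intents.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[record["intent"]].append(record)

    # Shuffle within each intent so the picks are random but reproducible
    for examples in groups.values():
        rng.shuffle(examples)

    subset = []
    round_index = 0
    while len(subset) < total:
        added = False
        for examples in groups.values():
            if round_index < len(examples) and len(subset) < total:
                subset.append(examples[round_index])
                added = True
        if not added:  # every group is exhausted
            break
        round_index += 1
    return subset


# Toy data: 5 hypothetical intents with 10 examples each
toy_records = [
    {"instruction": f"example {i}", "intent": f"intent_{i % 5}"} for i in range(50)
]
subset = sample_across_intents(toy_records, total=12)
print(len(subset), sorted({r["intent"] for r in subset}))
```

The returned list could be passed to `Dataset.from_list` in the same way as `balanced_subset` in the cell above.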


In [24]:
# Start the project with the dataset below

print(travel_chat_ds)

Dataset({
    features: ['instruction', 'intent', 'category', 'tags', 'response'],
    num_rows: 50
})


In [25]:
# Start coding here
# Use as many cells as you need

In [26]:
# Modify the dataset to include a conversation field
def merge_example(row):
    row['conversation'] = f"Query: {row['instruction']}\nIntent: {row['intent']}\nResponse: {row['response']}"
    return row

travel_chat_ds = travel_chat_ds.map(merge_example)

print(travel_chat_ds[0]['conversation'])


Query: I'd like information about my checked baggage allowance, how can I find it?
Intent: check_baggage_allowance
Response: To retrieve your checked baggage allowance details, please follow these instructions:

1. Visit {{WEBSITE_URL}} or launch the {{APP_NAME}} application.
2. Log in to your personal account.
3. Select the {{BOOKINGS_OPTION}} section.
4. Enter the required booking or flight information.
5. The specific baggage allowance for your trip will be displayed.

Should you require additional help, do not hesitate to contact customer support via the {{APP_NAME}} app or on {{WEBSITE_URL}}.
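
Since the training text follows a fixed `Query / Intent / Response` layout, the same layout can be reused at inference time. The sketch below shows one way to build a prompt that stops at `Intent:` and to split a completion back into its parts. The helper names and the sample completion are hypothetical and written by hand for illustration; they are not model output.

```python
def build_prompt(instruction):
    """Format a user message like the training text, ending at the Intent label."""
    return f"Query: {instruction}\nIntent:"


def parse_completion(completion):
    """Split generated text into (intent, response).

    Expects the intent first, then a line starting with the Response label.
    Returns None for the response if that label is missing.
    """
    intent_part, separator, response_part = completion.partition("\nResponse:")
    intent = intent_part.strip()
    response = response_part.strip() if separator else None
    return intent, response


prompt = build_prompt("I'm trying to book a flight")

# Hypothetical completion, hand-written for illustration only
completion = " book_flight\nResponse: Sure, I can help you book a flight."
intent, response = parse_completion(completion)

print(repr(prompt))
print(intent, "|", response)
```

Ending the prompt at `Intent:` nudges the model to emit the label first, which makes the output easier to parse for downstream actions.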


# 1. Load the model and tokenizer
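In this section, we load the pre-trained TinyLlama chat model and its tokenizer from the Hugging Face Hub. TinyLlama is a compact (1.1B-parameter) causal language model suited to lightweight inference and experimentation. The code also reuses the end-of-sequence token as the padding token so that batches can be padded.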

In [27]:
# Load the TinyLlama model and its tokenizer
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Reuse the end-of-sequence token for padding
tokenizer.pad_token = tokenizer.eos_token


# 2. Set up the training arguments
This block defines the training configuration for supervised fine-tuning (SFT) using Hugging Face's TRL (Transformer Reinforcement Learning) library.

`SFTConfig` controls how the model learns during fine-tuning. We set small step and batch values here to allow quick experimentation and avoid resource bottlenecks.

Key Parameters:
- `max_steps`: Limits the total number of optimizer steps. Set to 1 for a quick test run.
- `per_device_train_batch_size`: Batch size per device (GPU/CPU).
- `gradient_accumulation_steps`: Number of batches whose gradients are accumulated before each optimizer update.
- `learning_rate`: Step size for weight updates.
- `max_grad_norm`: Maximum gradient norm; larger gradients are clipped to prevent explosion.
- `save_steps`: Number of steps between model checkpoints.
- `dataset_text_field`: Name of the dataset column containing the training text (here, `conversation`).
- `output_dir`: Directory to save logs and model checkpoints.


In [28]:
# Initialize SFTConfig
sftConfig = SFTConfig(
    max_steps=1,    
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    learning_rate=2e-3,
    max_grad_norm=0.3,
    save_steps=100,
    dataset_text_field='conversation',
    output_dir="/tmp",
)
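
To make the interaction between these settings concrete, here is a small arithmetic sketch of how many examples a run consumes. The helper `examples_seen` is a hypothetical name, and the single-device default is an assumption about the environment.

```python
def examples_seen(max_steps, per_device_batch_size, grad_accum_steps, num_devices=1):
    """Number of training examples consumed over the whole run."""
    # One optimizer step processes this many examples in total
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    return max_steps * effective_batch


# Values from the SFTConfig above, assuming a single device
seen = examples_seen(max_steps=1, per_device_batch_size=1, grad_accum_steps=1)
print(seen)  # 1

# Hypothetical larger run: 100 steps, batch size 2, 4 accumulation steps
larger = examples_seen(100, 2, 4)
print(larger)  # 800
```

With the configuration above, the run touches a single example, which is enough to check the pipeline but not to change the model's behavior noticeably.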

# 3. Initialize LoRA (Low-Rank Adaptation) Configuration

This configuration sets up Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning of large language models. Instead of updating the full weight matrices of the model, LoRA freezes them and injects small trainable matrices into selected layers (here, attention projections), which significantly reduces the number of trainable parameters.

Key Parameters:
- `r`: The rank of the low-rank matrices A and B (controls adapter capacity and memory usage).
- `lora_alpha`: Scaling factor for the LoRA update. Controls how much the LoRA adaptation influences the base model.
- `lora_dropout`: Dropout applied on the LoRA layers to improve generalization.
- `bias`: Whether to train bias terms. `"none"` disables it to keep things lightweight.
- `task_type`: Specifies the type of task. For causal language modeling (GPT-style), use `"CAUSAL_LM"`.
- `target_modules`: List of model modules (typically attention projections) where LoRA will be applied.

In [29]:
# Initialize LoRA config
lora_config = LoraConfig(    
    r= 4,    
    lora_alpha= 16,    
    lora_dropout=0.05,    
    bias="none",    
    task_type="CAUSAL_LM",    
    target_modules=['q_proj', 'v_proj']
)
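
As a rough illustration of why LoRA is lightweight, the sketch below counts the parameters that rank-`r` adapters add to `q_proj` and `v_proj`. The layer shapes are assumptions about TinyLlama-1.1B that are not stated in this notebook; check `model.config` before relying on them.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one linear layer.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * d_in + d_out * r


# Assumed TinyLlama-1.1B shapes (verify against model.config)
hidden_size = 2048   # input width of q_proj and v_proj
q_out = 2048         # q_proj output width
v_out = 256          # v_proj output width under grouped-query attention
num_layers = 22

# Values from the LoraConfig above
r = 4
lora_alpha = 16

per_layer = lora_param_count(hidden_size, q_out, r) + lora_param_count(hidden_size, v_out, r)
total = per_layer * num_layers
full = (hidden_size * q_out + hidden_size * v_out) * num_layers

print(total)  # 563200 under the assumed shapes
print(f"{total / full:.4%} of the weights in the adapted projections")
print(lora_alpha / r)  # standard LoRA scaling applied to the update: 4.0
```

Raising `r` grows the adapter linearly, so doubling the rank doubles the trainable parameter count.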

# 4. Initialize the SFTTrainer for Fine-Tuning

This block initializes the `SFTTrainer`, a high-level trainer provided by Hugging Face's `trl` library for supervised fine-tuning (SFT) of language models, including with parameter-efficient methods like LoRA.

The `SFTTrainer` handles tokenization, batching, gradient accumulation, checkpointing, and LoRA integration under the hood, allowing for quick experimentation without boilerplate.

Parameters:
- `model`: The pre-trained causal language model to be fine-tuned (TinyLlama in this case).
- `train_dataset`: The dataset containing the training text (here, the `conversation` column named in `dataset_text_field`).
- `peft_config`: The LoRA configuration that determines which parts of the model are adapted.
- `args`: Training arguments specified via `SFTConfig` (e.g., learning rate, batch size, max steps).



In [30]:
# Initialize SFTTrainer
trainer = SFTTrainer(
    model=model,
    train_dataset=travel_chat_ds,
    peft_config=lora_config,
    args=sftConfig
)


# 5. Start Fine-Tuning the Model

This command starts the supervised fine-tuning process using the `SFTTrainer`.

What happens under the hood:
- Only the LoRA adapter weights (as specified by the LoRA configuration) are updated using the training data; the base model weights stay frozen.
- If `max_steps` is set (e.g., `max_steps=1`), training terminates after that number of optimization steps.
- Intermediate metrics such as loss are logged every `logging_steps` steps.
- Checkpoints are written to `output_dir` every `save_steps` steps.

This is where the model begins learning domain-specific behavior, in this case for travel-related customer support. With `max_steps=1` and a batch size of 1, the run processes a single example, so it only confirms that the pipeline works end to end; meaningful adaptation needs more steps.


In [31]:
# Kickstart fine-tuning process
trainer.train()



TrainOutput(global_step=1, training_loss=2.135936975479126, metrics={'train_runtime': 52.4515, 'train_samples_per_second': 0.019, 'train_steps_per_second': 0.019, 'total_flos': 1372512940032.0, 'train_loss': 2.135936975479126})
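
The numbers in `TrainOutput` are easier to interpret after a little arithmetic. Assuming the reported loss is the mean token-level cross-entropy in nats, the sketch below converts it to perplexity and derives the time per step.

```python
import math

# Values copied from the TrainOutput above
training_loss = 2.135936975479126
train_runtime = 52.4515  # seconds
global_step = 1

# Cross-entropy loss in nats converts to perplexity via exp(loss)
perplexity = math.exp(training_loss)
seconds_per_step = train_runtime / global_step

print(f"perplexity: {perplexity:.2f}")
print(f"seconds per step: {seconds_per_step:.1f}")
```

Tracking perplexity over a longer run gives a quick sense of whether the model is fitting the `Query / Intent / Response` format.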

# 6. Generate a Response with the Fine-Tuned Model

Now that the model is fine-tuned, we can prompt it with a user query and generate a response using the `generate()` method.

Steps:
1. Tokenize the input prompt using the model's tokenizer.
2. Use the `.generate()` method to produce a continuation from the model.
3. Decode the generated tokens (ignoring special tokens) to get a readable string.

In [19]:
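# Generate a response with the fine-tuned model
prompt = "Query: I'm trying to book a flight"
# Tokenize the prompt and move the tensors to the model's device
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding; early_stopping only affects beam search, so it is omitted
outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=False,
    num_beams=1,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens, dropping the prompt
prompt_length = inputs["input_ids"].shape[1]
model_response = tokenizer.decode(outputs[0, prompt_length:], skip_special_tokens=True)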
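
With `do_sample=False` and `num_beams=1`, generation is greedy: at each step the single most likely next token is appended. The toy sketch below illustrates that loop with a hand-written score table. The vocabulary and scores are hypothetical and unrelated to TinyLlama.

```python
# Hypothetical next-token scores for a tiny vocabulary
NEXT_TOKEN_SCORES = {
    "Query:": {"book": 0.6, "cancel": 0.4},
    "book": {"a": 0.9, "<eos>": 0.1},
    "a": {"flight": 0.7, "hotel": 0.3},
    "flight": {"<eos>": 0.8, "a": 0.2},
}


def greedy_decode(prompt_tokens, max_new_tokens, eos_token="<eos>"):
    """Repeatedly append the highest-scoring next token until EOS or the budget."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        scores = NEXT_TOKEN_SCORES.get(tokens[-1])
        if scores is None:  # no known continuation for this token
            break
        next_token = max(scores, key=scores.get)
        if next_token == eos_token:
            break
        tokens.append(next_token)
    # Return only the new tokens, mirroring the slice that drops the prompt
    return tokens[len(prompt_tokens):]


generated = greedy_decode(["Query:"], max_new_tokens=20)
print(generated)  # ['book', 'a', 'flight']
```

Greedy decoding is deterministic, which makes it convenient for checking whether fine-tuning changed the output for a fixed prompt.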


In [20]:
model_response

"from London Gatwick to New York JFK. I've found 3 ways to"