<a href="https://colab.research.google.com/github/candyledger1/Express-Certificate-Pricing/blob/main/DL_Assignment_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment III

# Practical Deep Learning for Language Processing

Author: Nino Maisuradze

Master’s Program Economics and Finance

University of Tübingen

## Fine-Tuning and Preference Alignment of Large Language Models for Airbnb Title Optimization

## 1. Introduction

In digital marketplaces such as Airbnb, listing titles play a crucial role in attracting user attention and influencing click-through behavior. Since potential guests initially only see the listing title, it must clearly communicate key attributes such as location, property type, and standout features.

The goal of this assignment is to develop an automated title generation system based on a pre-trained Large Language Model (LLM). The model is conditioned on Airbnb listing descriptions and trained to generate concise, attractive, and relevant titles.

The assignment consists of the following stages:

	•	Task A: Zero-shot title generation using a base LLM
	•	Task B: Supervised fine-tuning (LoRA + 4-bit quantization)
	•	Task C: Manual human preference labeling
	•	Task D: Preference fine-tuning using Direct Preference Optimization (DPO)
	•	Task E: Comparative evaluation and qualitative analysis
	•	Task F: Deployment via a Gradio web application

The objective is to compare three approaches:
	1.	Zero-shot baseline
	2.	Supervised fine-tuned model (SFT)
	3.	Preference-aligned model (DPO)

and evaluate how each step improves title quality.

## Task A - Zero-Shot Title Geneation

### 2. Zero-Shot Title Generation

In this task, I evaluate how well a pretrained instruction-tuned Large Language Model (LLM) can generate Airbnb listing titles without any additional fine-tuning.

The model is used in a zero-shot setting, meaning it relies solely on its pretrained knowledge and instruction-following capabilities.

The goal is to:

	•	Generate titles conditioned on listing descriptions
	•	Experiment with different prompt formulations
	•	Assess the quality of the generated titles

This zero-shot evaluation serves as a baseline for later comparison with:
	•	Supervised Fine-Tuning (SFT)
	•	Direct Preference Optimization (DPO)

### 2.1 Environment Setup

I first verify that a GPU is available and mount Google Drive to access the dataset.

In [None]:
!nvidia-smi

Fri Feb 13 15:41:01 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.82.07              Driver Version: 580.82.07      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   44C    P8              9W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+----------------------------------------------

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### 2.2 Experimental Setup

I evaluate the base model (Mistral-7B-Instruct-v0.2) in a zero-shot setting, meaning no additional fine-tuning is applied. The model is used directly as a pretrained instruction-following LLM.

To examine the impact of prompt design, I experiment with three different prompt formulations:

	•	v1: Simple instruction
	•	v2: More structured format
	•	v3: Stronger stylistic constraints

For each prompt version, I generate titles for 8 Airbnb listings and compare the outputs qualitatively.



### 2.3 Load Dataset

In [None]:
import pandas as pd

DATA_PATH = "/content/drive/MyDrive/airbnb_tabular.csv"

df = pd.read_csv(DATA_PATH)

print(df.shape)
df.head()

(22570, 116)


Unnamed: 0.1,Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,description,neighborhood_overview,picture_url,host_id,...,has_amenity_Elevator,has_amenity_Host greets you,has_amenity_Free parking on premises,len_amenities,len_description,proxy,review_diff,in_top_third,img_available,joint_description
0,0.0,13913,https://www.airbnb.com/rooms/13913,20220610000000.0,2022-06-08,Holiday London DB Room Let-on going,My bright double bedroom with a large window h...,Finsbury Park is a friendly melting pot commun...,https://a0.muscache.com/pictures/miso/Hosting-...,54730.0,...,0,0,1,41,154,3,15.0,1,1.0,My bright double bedroom with a large window h...
1,3.0,17402,https://www.airbnb.com/rooms/17402,20220610000000.0,2022-06-08,Superb 3-Bed/2 Bath & Wifi: Trendy W1,You'll have a wonderful stay in this superb mo...,"Location, location, location! You won't find b...",https://a0.muscache.com/pictures/39d5309d-fba7...,67564.0,...,1,0,0,38,112,3,5.0,1,1.0,You'll have a wonderful stay in this superb mo...
2,4.0,25123,https://www.airbnb.com/rooms/25123,20220610000000.0,2022-06-08,Clean big Room in London (Room 1),Big room with double bed clean sheets clean to...,Barnet is one of the largest boroughs in Londo...,https://a0.muscache.com/pictures/456905/a004b9...,103583.0,...,0,0,0,14,129,1,0.0,0,1.0,Big room with double bed clean sheets clean to...
3,5.0,36299,https://www.airbnb.com/rooms/36299,20220610000000.0,2022-06-07,Kew Gardens 3BR house in cul-de-sac,3 Bed House with garden close to Thames river ...,"Residential family neighborhood, with both Eng...",https://a0.muscache.com/pictures/457052/6e819d...,155938.0,...,0,0,0,34,128,3,7.0,1,1.0,3 Bed House with garden close to Thames river ...
4,9.0,39387,https://www.airbnb.com/rooms/39387,20220610000000.0,2022-06-08,Stylish bedsit in Notting Hill ish flat.,Private lockable bedsit room available within ...,My place is convenient for all London attracti...,https://a0.muscache.com/pictures/beda1dab-9443...,168920.0,...,0,1,0,40,135,1,0.0,0,1.0,Private lockable bedsit room available within ...


In [None]:
df_top = df[df["in_top_third"] == 1].copy()

print(df_top.shape)
df_top[["description", "name"]].head()

(5782, 116)


Unnamed: 0,description,name
0,My bright double bedroom with a large window h...,Holiday London DB Room Let-on going
1,You'll have a wonderful stay in this superb mo...,Superb 3-Bed/2 Bath & Wifi: Trendy W1
3,3 Bed House with garden close to Thames river ...,Kew Gardens 3BR house in cul-de-sac
6,A luminous room in a modern 2 bedroom flat loc...,Room with a view zone 1 Central Bankside
8,Blenheim Lodge was built in 1878 when there we...,You Will Save Money Here


### 2.4 Load Instruction-Tuned Base Model

I now load the pretrained model used in the assignment:

	•	Mistral-7B-Instruct-v0.2
	•	4-bit quantized (to fit Colab GPU memory)

In [None]:
!pip install -q transformers accelerate bitsandbytes

In [None]:
import os
from google.colab import userdata
from huggingface_hub import login

# Read secret from Colab Secrets panel
hf_token = userdata.get("HF_TOKEN")

# Login to HuggingFace
login(hf_token)

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME,
    trust_remote_code=True
)

# Important for Mistral padding
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load model
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

model.eval()

print("Model loaded successfully.")

Loading weights:   0%|          | 0/291 [00:00<?, ?it/s]

Model loaded successfully.


### 2.5 Prompt Construction

To guide the model, I design instruction-style prompts that condition title generation on the listing description.

Rather than using a single prompt, I experiment with three different prompt formulations to examine how prompt design influences zero-shot performance.

Each prompt takes the listing description as input and instructs the model to generate a short Airbnb-style title.

In [None]:
def build_prompt_v1(description):
    # Simple direct instruction
    return f"""Write a short and attractive Airbnb listing title.

Description:
{description}

Title:"""

def build_prompt_v2(description):
    # Role-based instruction
    return f"""You are an expert Airbnb copywriter.
Create a short, catchy, and professional listing title
based on the description below.

Description:
{description}

Title:"""

def build_prompt_v3(description):
    # More constrained / structured
    return f"""Generate a concise Airbnb listing title (max 10 words).
Focus on location and key selling points.

Description:
{description}

Title:"""

### 2.6 Inference Function

In [None]:
def generate_title(description, prompt_builder, max_new_tokens=20):
    prompt = prompt_builder(description)

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.8,
            top_p=0.9,
        )

    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Extract only text after "Title:"
    if "Title:" in generated_text:
        return generated_text.split("Title:")[-1].strip()
    else:
        return generated_text.strip()

### 2.7 Example Zero-Shot Generation

In [None]:
N = 10   # start small for testing
subset = df_top.iloc[:N].copy()

titles_v1 = []
titles_v2 = []
titles_v3 = []

for desc in subset["description"]:
    titles_v1.append(generate_title(desc, build_prompt_v1))
    titles_v2.append(generate_title(desc, build_prompt_v2))
    titles_v3.append(generate_title(desc, build_prompt_v3))

subset["title_v1"] = titles_v1
subset["title_v2"] = titles_v2
subset["title_v3"] = titles_v3

subset[["name", "title_v1", "title_v2", "title_v3"]]

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for o

Unnamed: 0,name,title_v1,title_v2,title_v3
0,Holiday London DB Room Let-on going,"Cozy Double Room in Finsbury Park, Central Lon...",Bright & Cozy Double Room in Central Finsbury ...,Bright Double Room in Finsbury Park: Comfortab...
1,Superb 3-Bed/2 Bath & Wifi: Trendy W1,Modern & Spacious 3-Bedroom Apartment in Centr...,"Modern Fitzrovia Apartment: Beautiful 3-Bed, 2...","Modern, Spacious 3-Bed 2-Bath Apartment in Cen..."
3,Kew Gardens 3BR house in cul-de-sac,Charming 3 Bed House by the Thames with easy a...,"""River Retreat: Charming 3-Bed House with Gard...","Modern 3-bed house near Thames, Kew Gardens & ..."
6,Room with a view zone 1 Central Bankside,"Bright, Modern & Quiet Room with Stunning Shar...","""Bright & Boutique: Luxe Veggie Flat - King Ro...","Bright, King-sized Room with Shard View in Cen..."
8,You Will Save Money Here,Historic 1878 Blenheim Lodge - Quiet Upmarket ...,"""Blenheim Lodge: Charming 1878 Victorian Home ...","Historic East Finchley House - Double Room, Gr..."
11,Beautiful 1 bed apt in Queens Park,Chic Mid-Century Design Apartment with Balcony...,"""Mid-Century Chic: Bright, View-Filled Flat w/...",Modern Mid-Century London Apartment with Balco...
12,Quiet Comfortable Room in Fulham,"Charming Double Single Room in Quiet, Safe Nei...",Charming Double Single in Quiet Munster Villag...,"Quiet, Safe Double Room in Munster Village - S..."
13,"Beautiful, Luxurious Art Deco +private bathroom",Luxe Art Deco Flat: Unique Private Room w/Ensu...,"""Luxe Retreat in a Charming Art Deco Flat - Ro...","Trendy Art Deco Flat: Ensuite Room with View, ..."
15,Cosy Double studio in Zone 2 Hammersmith (6),"""Cozy & Convenient Studio in Hammersmith - Min...","Hip Hammersmith Hub: Minutes from Kensington, ...",Modern Studios in Hammersmith - Convenient Bas...
18,Cosy Double studio in Zone 2 Hammersmith (1),"Modern, Bright and Convenient Studio Apartment...",Hammersmith Haven - Conveniently Located Studi...,Modern Hammersmith Studio - Close to Kensingto...


In [None]:
# Save results for comparison

SAVE_PATH = "/content/drive/MyDrive/zero_shot_results.csv"

subset_to_save = subset[["name", "title_v1", "title_v2", "title_v3"]].copy()

subset_to_save.to_csv(SAVE_PATH, index=False)

print("Zero-shot results saved to:", SAVE_PATH)
subset_to_save.head()

Zero-shot results saved to: /content/drive/MyDrive/zero_shot_results.csv


Unnamed: 0,name,title_v1,title_v2,title_v3
0,Holiday London DB Room Let-on going,"Cozy Double Room in Finsbury Park, Central Lon...",Bright & Cozy Double Room in Central Finsbury ...,Bright Double Room in Finsbury Park: Comfortab...
1,Superb 3-Bed/2 Bath & Wifi: Trendy W1,Modern & Spacious 3-Bedroom Apartment in Centr...,"Modern Fitzrovia Apartment: Beautiful 3-Bed, 2...","Modern, Spacious 3-Bed 2-Bath Apartment in Cen..."
3,Kew Gardens 3BR house in cul-de-sac,Charming 3 Bed House by the Thames with easy a...,"""River Retreat: Charming 3-Bed House with Gard...","Modern 3-bed house near Thames, Kew Gardens & ..."
6,Room with a view zone 1 Central Bankside,"Bright, Modern & Quiet Room with Stunning Shar...","""Bright & Boutique: Luxe Veggie Flat - King Ro...","Bright, King-sized Room with Shard View in Cen..."
8,You Will Save Money Here,Historic 1878 Blenheim Lodge - Quiet Upmarket ...,"""Blenheim Lodge: Charming 1878 Victorian Home ...","Historic East Finchley House - Double Room, Gr..."


### 2.8 Results

In this task, I evaluated the instruction-tuned base model (Mistral-7B-Instruct-v0.2) in a zero-shot setting without any additional fine-tuning.

I generated Airbnb listing titles for 8 descriptions using three different prompt formulations. This allowed me to examine how prompt design influences the quality of generated titles.

Overall, I observe that:

	•	The simple prompt (v1) produces generally relevant titles but sometimes lacks stylistic consistency or strong selling points.
	•	The more structured prompt (v2) leads to clearer and more professional titles.
	•	The constrained prompt (v3) often generates the most concise and Airbnb-style outputs, focusing on key features and location.

The experiment shows that even without fine-tuning, the base instruction-tuned LLM is capable of generating coherent and attractive listing titles. However, the quality and consistency depend strongly on prompt formulation.

This zero-shot evaluation serves as a baseline for comparison with supervised fine-tuning (SFT) and preference optimization (DPO) in later tasks.

# Task B: Supervised Fine-Tuning (SFT)

### 3. Supervised Fine-Tuning with LoRA and 4-bit Quantization

In this task, I fine-tune the base instruction-tuned model to generate Airbnb titles from listing descriptions using supervised fine-tuning (SFT).

To make training feasible on limited GPU memory, I apply:

	•	4-bit quantization (QLoRA setup)
	•	LoRA (Low-Rank Adaptation) for parameter-efficient training

Only a small set of additional trainable parameters is updated, while the base model remains frozen.

After fine-tuning, I generate titles for the same 5–10 descriptions from Task A using a consistent prompt format and compare performance to the zero-shot baseline.

In [None]:
# Step 1 - install dependencies

!pip install -q transformers accelerate bitsandbytes peft trl datasets

### 3.2 import Libraries


In [None]:
import torch
import pandas as pd
from datasets import Dataset

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer, SFTConfig

DATA_PATH = "/content/drive/MyDrive/airbnb_tabular.csv"
df = pd.read_csv(DATA_PATH)

# keep only needed columns and drop missing
df_train = df[["description", "name"]].dropna().copy()

# keep small for stability (increase later)
df_train = df_train.sample(n=1000, random_state=42).reset_index(drop=True)

def format_sample(row):
    return (
        "You are an expert Airbnb copywriter.\n\n"
        f"Description:\n{row['description']}\n\n"
        f"Title:\n{row['name']}"
    )

df_train["text"] = df_train.apply(format_sample, axis=1)
dataset = Dataset.from_pandas(
    df_train[["text"]],
    preserve_index=False
)

### 3.3 Load Base Model in 4-bit (QLoRA Setup)

In [None]:
MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,  # <- critical for T4
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto",
)

# recommended for training stability
model.config.use_cache = False

# prepare for k-bit training (QLoRA setup)
model = prepare_model_for_kbit_training(model)

Loading weights:   0%|          | 0/291 [00:00<?, ?it/s]

### 3.4 Apply LoRA

In [None]:
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # keep light
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

trainable params: 6,815,744 || all params: 7,248,547,840 || trainable%: 0.0940


### 3.5 Supervised Fine-Tuning (SFT)

In this section, I fine-tune the instruction-tuned base model (Mistral-7B-Instruct-v0.2) to improve title generation performance.

To make fine-tuning feasible on limited GPU memory, I apply:

	•	4-bit quantization (BitsAndBytes) to reduce memory footprint
	•	LoRA (Low-Rank Adaptation) to train only a small number of additional parameters

This allows efficient parameter-efficient fine-tuning without updating the full model.

In [None]:
MAX_LEN = 256

def tokenize_function(example):
    return tokenizer(
        example["text"],
        truncation=True,
        max_length=MAX_LEN,
        padding=False,
    )

tokenized_dataset = dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=["text"],
)

Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

In [None]:
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

training_args = TrainingArguments(
    output_dir="./mistral_lora_airbnb",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=2e-4,
    logging_steps=10,
    save_strategy="no",
    fp16=True,          # ✅ correct for T4
    bf16=False,         # ❗ must be False
    report_to="none",
)

In [None]:
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)

In [None]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator,
)

In [None]:
trainer.train()

  return fn(*args, **kwargs)


Step,Training Loss
10,2.219845
20,2.06808
30,2.0721
40,1.971469
50,2.034188
60,1.973187
70,2.074577
80,1.952936
90,1.953888
100,1.986139


TrainOutput(global_step=125, training_loss=2.0221002349853516, metrics={'train_runtime': 1897.4376, 'train_samples_per_second': 0.527, 'train_steps_per_second': 0.066, 'total_flos': 8445995410391040.0, 'train_loss': 2.0221002349853516, 'epoch': 1.0})


The training loss decreased steadily, indicating that the LoRA adapters successfully adapted the model to the Airbnb title generation task.

Next, I evaluate the fine-tuned model on the same subset used in the zero-shot experiment.

In [None]:
model.save_pretrained("/content/drive/MyDrive/mistral_lora_run1")
tokenizer.save_pretrained("/content/drive/MyDrive/mistral_lora_run1")

('/content/drive/MyDrive/mistral_lora_run1/tokenizer_config.json',
 '/content/drive/MyDrive/mistral_lora_run1/chat_template.jinja',
 '/content/drive/MyDrive/mistral_lora_run1/tokenizer.json')

## Task B.2 Train Second Model

In [None]:
import torch
import pandas as pd
from datasets import Dataset

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer, SFTConfig

In [None]:
DATA_PATH = "/content/drive/MyDrive/airbnb_tabular.csv"
df = pd.read_csv(DATA_PATH)

df_train_2 = df[["description", "name"]].dropna().copy()
df_train_2 = df_train_2.sample(n=1000, random_state=42).reset_index(drop=True)

def format_sample_2(row):
    return (
        "You are an expert Airbnb copywriter.\n\n"
        f"Description:\n{row['description']}\n\n"
        f"Title:\n{row['name']}"
    )

df_train_2["text"] = df_train_2.apply(format_sample_2, axis=1)

dataset_2 = Dataset.from_pandas(
    df_train_2[["text"]],
    preserve_index=False
)

In [None]:
MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"

bnb_config_2 = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

# LOAD TOKENIZER FIRST
tokenizer_2 = AutoTokenizer.from_pretrained(MODEL_NAME)

if tokenizer_2.pad_token is None:
    tokenizer_2.pad_token = tokenizer_2.eos_token

# THEN tokenize
MAX_LEN = 256

def tokenize_function_2(example):
    return tokenizer_2(
        example["text"],
        truncation=True,
        max_length=MAX_LEN,
        padding=False,
    )

tokenized_dataset_2 = dataset_2.map(
    tokenize_function_2,
    batched=True,
    remove_columns=["text"],
)

# THEN load model
model_2 = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config_2,
    device_map="auto",
)

model_2 = prepare_model_for_kbit_training(model_2)

Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

Loading weights:   0%|          | 0/291 [00:00<?, ?it/s]

In [None]:
MAX_LEN = 256

def tokenize_function_2(example):
    return tokenizer_2(
        example["text"],
        truncation=True,
        max_length=MAX_LEN,
        padding=False,
    )

tokenized_dataset_2 = dataset_2.map(
    tokenize_function_2,
    batched=True,
    remove_columns=["text"],
)

tokenizer_2 = AutoTokenizer.from_pretrained(MODEL_NAME)

if tokenizer_2.pad_token is None:
    tokenizer_2.pad_token = tokenizer_2.eos_token

Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

### LoRa Config

In [None]:
lora_config_2 = LoraConfig(
    r=32,  # higher rank
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM"
)

model_2 = get_peft_model(model_2, lora_config_2)
model_2.print_trainable_parameters()

trainable params: 13,631,488 || all params: 7,255,363,584 || trainable%: 0.1879


### Training Config

In [None]:
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

training_args_2 = TrainingArguments(
    output_dir="./mistral_lora_airbnb_run2",
    per_device_train_batch_size=2,   # changed
    gradient_accumulation_steps=4,   # changed
    num_train_epochs=2,              # changed
    learning_rate=1e-4,              # changed
    logging_steps=50,
    save_strategy="no",
    fp16=True,
    bf16=False,
    report_to="none",
)

In [None]:
data_collator_2 = DataCollatorForLanguageModeling(
    tokenizer=tokenizer_2,
    mlm=False,
)

In [None]:
trainer_2 = Trainer(
    model=model_2,
    args=training_args_2,
    train_dataset=tokenized_dataset_2,
    data_collator=data_collator_2,
)

In [None]:
trainer_2.train()

  return fn(*args, **kwargs)


Step,Training Loss
50,2.078186
100,1.988707
150,1.920784
200,1.925752
250,1.904663


TrainOutput(global_step=250, training_loss=1.963618408203125, metrics={'train_runtime': 2703.7012, 'train_samples_per_second': 0.74, 'train_steps_per_second': 0.092, 'total_flos': 1.979136751185101e+16, 'train_loss': 1.963618408203125, 'epoch': 2.0})

In [None]:
SAVE_PATH_2 = "/content/drive/MyDrive/mistral_lora_airbnb_run2"

model_2.save_pretrained(SAVE_PATH_2)
tokenizer_2.save_pretrained(SAVE_PATH_2)

print("Model 2 saved to:", SAVE_PATH_2)

Model 2 saved to: /content/drive/MyDrive/mistral_lora_airbnb_run2


# Task B.3 Evaluation

I have completed three approaches for title generation: (i) a zero-shot baseline using the instruction-tuned base model (Mistral-7B-Instruct-v0.2) with three prompt variants, and (ii) two LoRA fine-tuned models trained on Airbnb-style title generation data. I saved the zero-shot outputs as a CSV and saved both fine-tuned adapters to Google Drive.

Now, I load the saved zero-shot results and reload both LoRA adapters on top of the same base model. Then I generate titles for the same evaluation subset with Model 1 and Model 2, store their outputs next to the zero-shot outputs, and save a final comparison CSV for qualitative evaluation (readability, relevance, specificity, and style consistency).

In [None]:
# Mount drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Load Models
import torch
import pandas as pd
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"

BASE_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"

# ---------- ZERO SHOT MODEL ----------
tokenizer_zero = AutoTokenizer.from_pretrained(BASE_MODEL)
model_zero = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
    device_map="auto"
)

# ---------- FINETUNED MODEL 1 ----------
PATH_MODEL_1 = "/content/drive/MyDrive/mistral_lora_run1"

tokenizer_1 = AutoTokenizer.from_pretrained(BASE_MODEL)
base_model_1 = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
    device_map="auto"
)

model_1 = PeftModel.from_pretrained(base_model_1, PATH_MODEL_1)

# ---------- FINETUNED MODEL 2 ----------
PATH_MODEL_2 = "/content/drive/MyDrive/mistral_lora_airbnb_run2"

tokenizer_2 = AutoTokenizer.from_pretrained(BASE_MODEL)
base_model_2 = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
    device_map="auto"
)

model_2 = PeftModel.from_pretrained(base_model_2, PATH_MODEL_2)

print("All models loaded successfully.")

`torch_dtype` is deprecated! Use `dtype` instead!


Loading weights:   0%|          | 0/291 [00:00<?, ?it/s]

Loading weights:   0%|          | 0/291 [00:00<?, ?it/s]

Loading weights:   0%|          | 0/291 [00:00<?, ?it/s]



All models loaded successfully.


In [None]:
# Unified Generation Function
def generate_title(model, tokenizer, description):
    prompt = (
        "You are an expert Airbnb copywriter.\n\n"
        f"Description:\n{description}\n\n"
        "Title:\n"
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=30,
            temperature=0.8,
            top_p=0.9,
            do_sample=True
        )

    text = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return text.split("Title:")[-1].strip()

In [None]:
# load zero shot dataset
data = pd.read_csv("/content/drive/MyDrive/airbnb_tabular.csv")
subset = data[["description", "name"]].dropna().head(10)

In [None]:
# Generate Comparison
results = []

for idx, row in subset.iterrows():
    desc = row["description"]

    title_zero = generate_title(model_zero, tokenizer_zero, desc)
    title_1 = generate_title(model_1, tokenizer_1, desc)
    title_2 = generate_title(model_2, tokenizer_2, desc)

    results.append({
        "original": row["name"],
        "zero_shot": title_zero,
        "finetuned_1": title_1,
        "finetuned_2": title_2
    })

results_df = pd.DataFrame(results)
results_df.head(10)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for o

Unnamed: 0,original,zero_shot,finetuned_1,finetuned_2
0,Holiday London DB Room Let-on going,Relax in a Bright Double Bedroom in Finsbury P...,"Bright and comfy room in Finsbury Park, Centra...","Bright & Cozy Double Bedroom, Finsbury Park (Z..."
1,Superb 3-Bed/2 Bath & Wifi: Trendy W1,Contemporary 3-Bedroom Apartment in Central Fi...,"Superb central modern 3-bed 2-bath apartment, ...",Wonderful 3-bed apartment in central Fitzrovia...
2,Clean big Room in London (Room 1),Spacious Double Room with Kitchen Access in Qu...,Large double room in London 6 months or more. ...,Big room to let 7 minutes from tube station. 2...
3,Kew Gardens 3BR house in cul-de-sac,Charming 3-Bed House Near Thames River & Kew G...,3 bed house in Kew with garden near Thames riv...,3 Bed House in residential Kew area (near Tham...
4,Stylish bedsit in Notting Hill ish flat.,Quiet & Convenient Bedsit Room Near Westbourne...,"Private lockable bedsit room near tube, buses ...",Private bed sit room (North Kensington) £625 p...
5,Clean big Room in London (Room 2),"Spacious Double Room in Quiet, Clean, Friendly...","Spacious Double Room to let in NW London, Gold...",Double Room to let in Brent Cross 7 minutes to...
6,Room with a view zone 1 Central Bankside,Bright and Calm Central London Room with Stunn...,Luminous room in Central London. Shard view. V...,"Spacious, light room in central London."
7,Room in maisonette in chiswick,Charming Double Room with Balcony near Chiswic...,Double room on 1st floor with own bathroom and...,Double room in 1st floor flat with w.c. & show...
8,You Will Save Money Here,"""Step Back in Time: A Beautiful Victorian Home...","Blenheim Lodge 1878, 10 minutes to tube statio...",Blenheim Lodge B&B in London - Double Room 1 2...
9,2 Double bed apartment in quiet area North London,"Charming Ground Floor Apartment with Garden, P...",Cosy 2 Bed Apartment with Private Garden. 250m...,"Cosy home in Mill Hill, North London, 2 Bedroo..."


In [None]:
SAVE_EVAL_PATH = "/content/drive/MyDrive/evaluation_results.csv"
results_df.to_csv(SAVE_EVAL_PATH, index=False)

print("Saved to:", SAVE_EVAL_PATH)

Saved to: /content/drive/MyDrive/evaluation_results.csv


In [None]:
results_df.head(10)

Unnamed: 0,original,zero_shot,finetuned_1,finetuned_2
0,Holiday London DB Room Let-on going,Relax in a Bright Double Bedroom in Finsbury P...,"Bright and comfy room in Finsbury Park, Centra...","Bright & Cozy Double Bedroom, Finsbury Park (Z..."
1,Superb 3-Bed/2 Bath & Wifi: Trendy W1,Contemporary 3-Bedroom Apartment in Central Fi...,"Superb central modern 3-bed 2-bath apartment, ...",Wonderful 3-bed apartment in central Fitzrovia...
2,Clean big Room in London (Room 1),Spacious Double Room with Kitchen Access in Qu...,Large double room in London 6 months or more. ...,Big room to let 7 minutes from tube station. 2...
3,Kew Gardens 3BR house in cul-de-sac,Charming 3-Bed House Near Thames River & Kew G...,3 bed house in Kew with garden near Thames riv...,3 Bed House in residential Kew area (near Tham...
4,Stylish bedsit in Notting Hill ish flat.,Quiet & Convenient Bedsit Room Near Westbourne...,"Private lockable bedsit room near tube, buses ...",Private bed sit room (North Kensington) £625 p...
5,Clean big Room in London (Room 2),"Spacious Double Room in Quiet, Clean, Friendly...","Spacious Double Room to let in NW London, Gold...",Double Room to let in Brent Cross 7 minutes to...
6,Room with a view zone 1 Central Bankside,Bright and Calm Central London Room with Stunn...,Luminous room in Central London. Shard view. V...,"Spacious, light room in central London."
7,Room in maisonette in chiswick,Charming Double Room with Balcony near Chiswic...,Double room on 1st floor with own bathroom and...,Double room in 1st floor flat with w.c. & show...
8,You Will Save Money Here,"""Step Back in Time: A Beautiful Victorian Home...","Blenheim Lodge 1878, 10 minutes to tube statio...",Blenheim Lodge B&B in London - Double Room 1 2...
9,2 Double bed apartment in quiet area North London,"Charming Ground Floor Apartment with Garden, P...",Cosy 2 Bed Apartment with Private Garden. 250m...,"Cosy home in Mill Hill, North London, 2 Bedroo..."


## Summary Evaluation and Analysis

I compare three approaches:
- Zero-shot baseline
- Supervised Fine-Tuned Model 1 (SFT)
- Supervised Fine-Tuned Model 2 (SFT)

### Overall Observations

**Zero-shot model**
- Often creative and descriptive.
- Sometimes too long and slightly promotional.
- Occasionally includes unnecessary phrases (e.g., “Subtitle:”).
- Captures amenities well but lacks consistency in style.

**Fine-Tuned Model 1**
- More structured and closer to Airbnb-style titles.
- Includes relevant details like location and number of bedrooms.
- Sometimes contains formatting artifacts (e.g., “Link:”, “Description:”).
- Slightly verbose in some cases.

**Fine-Tuned Model 2**
- More concise and cleaner.
- Titles feel more natural and aligned with Airbnb style.
- Better balance between informativeness and readability.
- Fewer formatting errors compared to Model 1.

### Comparison Summary

Across the examples, the fine-tuned models generally produce titles that are more aligned with Airbnb conventions than the zero-shot baseline.

Model 2 appears to generate the most consistent and concise titles, while Model 1 provides strong factual coverage but sometimes includes noise. The zero-shot model is creative but less controlled.

Overall, supervised fine-tuning improves relevance, clarity, and stylistic alignment compared to the base model.

# Task C - Human Preference Labeling

In this task, I construct a small human preference dataset that will later be used for Direct Preference Optimization (DPO).

For a subset of Airbnb listing descriptions, I generate two distinct title candidates using my best-performing supervised fine-tuned model. To ensure diversity between candidates, I apply temperature and nucleus sampling during generation.

For each listing, I manually review both candidate titles and select the preferred one based on:

  •	Relevance to the description

  •	Clarity and fluency

  •	Conciseness

  •	Attractiveness and Airbnb-style alignment

•	Absence of formatting artifacts or errors

The final dataset consists of (prompt, chosen_title, rejected_title) triples, which will serve as preference supervision for DPO training in Task D.# Task C: Human Preference Labeling

### 4.1 Selecitng Listings
To create the preference dataset, I randomly sample 50 Airbnb listing descriptions from the dataset.

While the assignment recommends labeling 50–100 examples, I begin with 50 examples to balance coverage and manual labeling effort.

In [None]:
import pandas as pd

# Load dataset
data = pd.read_csv("/content/drive/MyDrive/airbnb_tabular.csv")

# Select 50 listings for preference labeling
subset_pref = (
    data[["description", "name"]]
    .dropna()
    .sample(50, random_state=42)
    .reset_index(drop=True)
)

subset_pref.head()

Unnamed: 0,description,name
0,"The room is super central, with great connecti...",Central & airy room at King's Cross / Bloomsbury
1,A lovely one bedroom apartment in the heart of...,"Spacious, Quiet & 10min from Victoria"
2,As I spend a lot of time travelling abroad I h...,Well Equipped Rooftop Apartment in Shoreditch
3,"Nice, quiet single room on the second floor 5 ...",Quiet Single Room (Zone 1) in cool East London
4,"My room is in the Heart of Camden Town, right ...",Bright Single Room in Camden Town


In [None]:
!pip install -U bitsandbytes



In [None]:
import bitsandbytes as bnb
print(bnb.__version__)

0.49.1


In [None]:
import torch
import pandas as pd
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4"
)

PATH_MODEL_2 = "/content/drive/MyDrive/mistral_lora_airbnb_run2"

tokenizer_2 = AutoTokenizer.from_pretrained(BASE_MODEL)

base_model_2 = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto"
)

model_2 = PeftModel.from_pretrained(base_model_2, PATH_MODEL_2)

print("Model 2 loaded successfully in 4-bit.")

Loading weights:   0%|          | 0/291 [00:00<?, ?it/s]

Model 2 loaded successfully in 4-bit.


### 4.2 Generating Two Diverse Title Candidates

I use Fine-Tuned Model 2, since Task B showed it produced the most natural and consise titles.


In [None]:
import torch

def generate_two_titles(model, tokenizer, description):
    prompt = (
        "You are an expert Airbnb copywriter.\n\n"
        f"Description:\n{description}\n\n"
        "Title:"
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=30,
            temperature=0.9,
            top_p=0.95,
            do_sample=True,
            num_return_sequences=2
        )

    titles = []
    for output in outputs:
        text = tokenizer.decode(output, skip_special_tokens=True)
        title = text.split("Title:")[-1].strip()
        titles.append(title)

    return titles[0], titles[1]

In [None]:
import pandas as pd

data = pd.read_csv("/content/drive/MyDrive/airbnb_tabular.csv")

subset_pref = (
    data[["description", "name"]]
    .dropna()
    .sample(50, random_state=42)
    .reset_index(drop=True)
)

print("Subset size:", len(subset_pref))
subset_pref.head()

Subset size: 50


Unnamed: 0,description,name
0,"The room is super central, with great connecti...",Central & airy room at King's Cross / Bloomsbury
1,A lovely one bedroom apartment in the heart of...,"Spacious, Quiet & 10min from Victoria"
2,As I spend a lot of time travelling abroad I h...,Well Equipped Rooftop Apartment in Shoreditch
3,"Nice, quiet single room on the second floor 5 ...",Quiet Single Room (Zone 1) in cool East London
4,"My room is in the Heart of Camden Town, right ...",Bright Single Room in Camden Town


In [None]:
len(subset_pref)

50

### 4.3 Candidate Table

For each listing description, I generate two title candidates and store them in a table.

This table will later be used for manual labeling, where I select the preferred title for each description.

In [None]:
preference_data = []

for idx, row in subset_pref.iterrows():
    desc = row["description"]

    title_a, title_b = generate_two_titles(model_2, tokenizer_2, desc)

    preference_data.append({
        "description": desc,
        "candidate_1": title_a,
        "candidate_2": title_b
    })

pref_df = pd.DataFrame(preference_data)

pref_df.head()

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for o

Unnamed: 0,description,candidate_1,candidate_2
0,"The room is super central, with great connecti...",Cute room in great central London neighbourhoo...,"Great location, perfect for two friends! - Bed..."
1,A lovely one bedroom apartment in the heart of...,Beautiful one bed flat in Central London Zone ...,"Lovely, quiet one bed flat in Central London! ..."
2,As I spend a lot of time travelling abroad I h...,"Spacious Double Room, Zone 1, Central London, ...",Modern stylish room in the heart of Shoreditch...
3,"Nice, quiet single room on the second floor 5 ...","Awesome, bright single room in Shoreditch, coo...",Cosy private room in cool East London house.
4,"My room is in the Heart of Camden Town, right ...",Giant room in Camden Town! Big bed & loads of ...,Private Room - Camden Town. Central London Loc...


#### Save Results
After generating two title candidates for each listing description,
I store the results in a table.

This table will now be used for manual preference labeling, where I select
the preferred title for each listing. The labeled data will later serve
as input for DPO fine-tuning.

In [None]:
# Save candidate table
SAVE_PREF_PATH = "/content/drive/MyDrive/preference_candidates.csv"
pref_df.to_csv(SAVE_PREF_PATH, index=False)

print("Preference candidate file saved to:", SAVE_PREF_PATH)

Preference candidate file saved to: /content/drive/MyDrive/preference_candidates.csv


### 4.5 Manual Preference Labeling

For each listing, I manually compare the two generated title candidates
and indicate which one I prefer.

Preference Encoding:
- 1 → Candidate 1 preferred
- 2 → Candidate 2 preferred
- 0 → Tie / both unsuitable

This creates a human preference dataset that will later be used for DPO training.

In [None]:
# Create preference column (initialize empty)
pref_df["preference"] = None

pref_df.head()

Unnamed: 0,description,candidate_1,candidate_2,preference
0,"The room is super central, with great connecti...",Cute room in great central London neighbourhoo...,"Great location, perfect for two friends! - Bed...",
1,A lovely one bedroom apartment in the heart of...,Beautiful one bed flat in Central London Zone ...,"Lovely, quiet one bed flat in Central London! ...",
2,As I spend a lot of time travelling abroad I h...,"Spacious Double Room, Zone 1, Central London, ...",Modern stylish room in the heart of Shoreditch...,
3,"Nice, quiet single room on the second floor 5 ...","Awesome, bright single room in Shoreditch, coo...",Cosy private room in cool East London house.,
4,"My room is in the Heart of Camden Town, right ...",Giant room in Camden Town! Big bed & loads of ...,Private Room - Camden Town. Central London Loc...,


## 4.6 Creating Preference Dataset for DPO

Using the manual preference labels, I construct a dataset containing:

- prompt: listing description
- chosen: preferred title
- rejected: non-preferred title

This format is required for Direct Preference Optimization (DPO).

In [None]:
pref_df = pd.read_csv("/content/drive/MyDrive/preference_candidates.csv")
pref_df.head()

Unnamed: 0,description,candidate_1,candidate_2,Preference
0,"The room is super central, with great connecti...",Cute room in great central London neighbourhoo...,"Great location, perfect for two friends! - Bed...",1
1,A lovely one bedroom apartment in the heart of...,Beautiful one bed flat in Central London Zone ...,"Lovely, quiet one bed flat in Central London! ...",2
2,As I spend a lot of time travelling abroad I h...,"Spacious Double Room, Zone 1, Central London, ...",Modern stylish room in the heart of Shoreditch...,1
3,"Nice, quiet single room on the second floor 5 ...","Awesome, bright single room in Shoreditch, coo...",Cosy private room in cool East London house.,2
4,"My room is in the Heart of Camden Town, right ...",Giant room in Camden Town! Big bed & loads of ...,Private Room - Camden Town. Central London Loc...,1


In [None]:
dpo_data = []

for _, row in pref_df.iterrows():

    prompt = (
        "You are an expert Airbnb copywriter.\n\n"
        f"Description:\n{row['description']}\n\n"
        "Title:"
    )

    if row["Preference"] == 1:
        chosen = row["candidate_1"]
        rejected = row["candidate_2"]
    else:
        chosen = row["candidate_2"]
        rejected = row["candidate_1"]

    dpo_data.append({
        "prompt": prompt,
        "chosen": chosen,
        "rejected": rejected
    })

dpo_df = pd.DataFrame(dpo_data)

dpo_df.head()

In [None]:
dpo_df.to_csv("/content/drive/MyDrive/dpo_dataset.csv", index=False)

In this task, I generated two diverse title candidates per listing using the stronger fine-tuned model (Model 2). The candidates were produced using sampling to encourage variation in style and phrasing.

After generating the candidate titles, I exported the results to Google Sheets and manually annotated 50 examples by selecting the preferred title for each listing description. For each pair, I assigned label 1 if candidate_1 was preferred and 2 if candidate_2 was preferred.

This manually constructed human feedback dataset forms the basis for preference alignment using Direct Preference Optimization (DPO) in the next task.

# Task D: DPO Training

In this task, I fine-tune the model using Direct Preference Optimization (DPO), a preference-based training method that aligns a language model with human feedback without requiring an explicit reward model.

Using the manually labeled dataset from Task C, I construct (prompt, chosen, rejected) triples and train the model to prefer the human-selected titles over the rejected alternatives.

To keep training efficient, I apply LoRA-based parameter-efficient fine-tuning together with 4-bit quantization.

### 5.1 Prepare Dataset for TRL
After generating and manually annotating the preference data in Task C, I saved the resulting (prompt, chosen, rejected) triples as a CSV file.

Since the Colab runtime was reset, I reload the dataset from Google Drive and convert it into a Hugging Face Dataset format, which is required for DPO training.

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [4]:
import pandas as pd

dpo_df = pd.read_csv("/content/drive/MyDrive/dpo_dataset.csv")

dpo_df.head()

Unnamed: 0,prompt,chosen,rejected
0,You are an expert Airbnb copywriter.\n\nDescri...,Cute room in great central London neighbourhoo...,"Great location, perfect for two friends! - Bed..."
1,You are an expert Airbnb copywriter.\n\nDescri...,"Lovely, quiet one bed flat in Central London! ...",Beautiful one bed flat in Central London Zone ...
2,You are an expert Airbnb copywriter.\n\nDescri...,"Spacious Double Room, Zone 1, Central London, ...",Modern stylish room in the heart of Shoreditch...
3,You are an expert Airbnb copywriter.\n\nDescri...,Cosy private room in cool East London house.,"Awesome, bright single room in Shoreditch, coo..."
4,You are an expert Airbnb copywriter.\n\nDescri...,Giant room in Camden Town! Big bed & loads of ...,Private Room - Camden Town. Central London Loc...


In [5]:
from datasets import Dataset

dpo_dataset = Dataset.from_pandas(dpo_df)

dpo_dataset

Dataset({
    features: ['prompt', 'chosen', 'rejected'],
    num_rows: 50
})

### 5.2 Load Base Model in 4-bit

In [6]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
from trl import DPOTrainer, DPOConfig

BASE_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"
SFT_PATH = "/content/drive/MyDrive/mistral_lora_airbnb_run2"

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

# Load base model in 4-bit
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto"
)

# Load SFT LoRA weights
model = PeftModel.from_pretrained(base_model, SFT_PATH)

print("SFT model loaded in 4-bit successfully.")

Loading weights:   0%|          | 0/291 [00:00<?, ?it/s]

SFT model loaded in 4-bit successfully.


In [10]:
import torch

# 1) Make generation/training stable
model.config.use_cache = False  # IMPORTANT for training
model.gradient_checkpointing_disable()  # disable checkpointing to avoid metadata mismatch

# 2) Make padding consistent
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# 3) Optional: avoid torch.compile weirdness (safe default)
torch._dynamo.config.suppress_errors = True

### 5.3 Setup DPO Training Config

In [11]:
from trl import DPOConfig

training_args = DPOConfig(
    output_dir="/content/drive/MyDrive/mistral_dpo_airbnb",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=5e-6,
    num_train_epochs=3,
    logging_steps=5,
    save_strategy="epoch",
    beta=0.1,

    # IMPORTANT: make sequence shapes stable
    max_length=256,
    max_prompt_length=192,

    # IMPORTANT: ensure no checkpointing
    gradient_checkpointing=False,
)

  training_args = DPOConfig(


In [12]:
from trl import DPOTrainer

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dpo_dataset,
    processing_class=tokenizer,
)

trainer.train()

Extracting prompt in train dataset:   0%|          | 0/50 [00:00<?, ? examples/s]

Applying chat template to train dataset:   0%|          | 0/50 [00:00<?, ? examples/s]

Tokenizing train dataset:   0%|          | 0/50 [00:00<?, ? examples/s]

The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 2}.


Step,Training Loss
5,0.872468
10,1.075672
15,0.619828
20,0.710084


TrainOutput(global_step=21, training_loss=0.8316089255469186, metrics={'train_runtime': 41.7717, 'train_samples_per_second': 3.591, 'train_steps_per_second': 0.503, 'total_flos': 0.0, 'train_loss': 0.8316089255469186, 'epoch': 3.0})

In this step, I fine-tuned the supervised LoRA model using Direct Preference Optimization (DPO) on the manually constructed preference dataset with 50 examples.

The dataset consisted of (prompt, chosen, rejected) triples derived from human preference annotations.

### 5.4 Evaluation

In [20]:
def generate_title(model, tokenizer, description):
    prompt = (
        "You are an expert Airbnb copywriter.\n\n"
        f"Description:\n{description}\n\n"
        "Title:"
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    output = model.generate(
        **inputs,
        max_new_tokens=40,
        do_sample=True,
        temperature=0.7,
        top_p=0.9
    )

    decoded = tokenizer.decode(output[0], skip_special_tokens=True)

    # Extract only the generated part after "Title:"
    if "Title:" in decoded:
        title = decoded.split("Title:")[-1].strip()
    else:
        title = decoded[len(prompt):].strip()

    return title

# Task E: Evaluation

In this section, I evaluate the impact of supervised fine-tuning and Direct Preference Optimization (DPO) on title generation quality.

To assess the differences between models, I compare:

  •	the zero-shot base model,

  •	the supervised fine-tuned (SFT) model, and

  •	the DPO fine-tuned model.


To this end, first I reload the previously trained models.
Then, I generate titles for a small subset of 5–10 listing descriptions and compare the outputs side-by-side.


The evaluation focuses on qualitative criteria including:

  •	Relevance to the listing description

  •	Clarity and fluency

  •	Conciseness

  •	Creativity and attractiveness

  •	Uniqueness

  •	Absence of errors

This analysis allows me to assess whether supervised and preference-based fine-tuning improve alignment with human expectations.


In [14]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

BASE_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"

tokenizer_zero = AutoTokenizer.from_pretrained(BASE_MODEL)

model_zero = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

print("Zero-shot model loaded.")

from peft import PeftModel

PATH_MODEL_2 = "/content/drive/MyDrive/mistral_lora_airbnb_run2"

tokenizer_2 = AutoTokenizer.from_pretrained(BASE_MODEL)

base_model_2 = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

model_2 = PeftModel.from_pretrained(base_model_2, PATH_MODEL_2)

print("Supervised fine-tuned model loaded.")

`torch_dtype` is deprecated! Use `dtype` instead!


Loading weights:   0%|          | 0/291 [00:00<?, ?it/s]

Zero-shot model loaded.


Loading weights:   0%|          | 0/291 [00:00<?, ?it/s]

Supervised fine-tuned model loaded.


In [15]:
dpo_model = trainer.model
print("DPO model ready.")

DPO model ready.


#### 5.4.2 Select examples


In [17]:
subset_pref = pd.read_csv("/content/drive/MyDrive/preference_candidates.csv")

In [18]:
eval_subset = subset_pref.sample(5, random_state=123).reset_index(drop=True)
eval_subset

Unnamed: 0,description,candidate_1,candidate_2,Preference
0,"Greetings, This Bright unique room is a perfec...",Beautiful private room with double bed in Sout...,Bright spacious single room. Brockwell park. ...,2
1,A very modern semi- detached home in Camden To...,Large double room in Camden Town. 5 min walk t...,Lovely double room in Camden Town. :),2
2,"A bright, clean home close to the range of sho...",Spacious double bedroom next door to modern ba...,Comfortable double bedroom in SW20 210m to tub...,2
3,Pretty Victorian end of terrace house in a lov...,Family friendly house w. 3 large double bedroo...,Beautiful House in a Great Location in Brixton!,1
4,The space A good sized Twin Bedroom on the fir...,Spacious Twin Bedroom Near Underground! Parkin...,Twin Room in a Lovely Edwardian House in Zone ...,2


#### 5.4.3 Comparison

In [21]:
comparison_results = []

for idx, row in eval_subset.iterrows():
    desc = row["description"]

    zero_title = generate_title(model_zero, tokenizer_zero, desc)
    sft_title = generate_title(model_2, tokenizer_2, desc)
    dpo_title = generate_title(dpo_model, tokenizer_2, desc)

    comparison_results.append({
        "description": desc,
        "zero_shot": zero_title,
        "sft_model": sft_title,
        "dpo_model": dpo_title
    })

comparison_df = pd.DataFrame(comparison_results)
comparison_df

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Unnamed: 0,description,zero_shot,sft_model,dpo_model
0,"Greetings, This Bright unique room is a perfec...","Cozy, Bright Room in Quiet Tree-lined Street N...",Bright Room in Central London - 5 Mins to Brix...,Bright and Cosy Room in Brixton! 10mins to Cen...
1,A very modern semi- detached home in Camden To...,Modern Semi-Detached Home in Camden Town with ...,"Large double room in Camden Town, NW1. 10 min ...","Modern, large double room in Camden Town. Clos..."
2,"A bright, clean home close to the range of sho...","Bright, Spacious Double Room in Peaceful Wimbl...",Spacious Double Room with Modern Bathroom in W...,"Spacious double room in Wimbledon, close to to..."
3,Pretty Victorian end of terrace house in a lov...,Charming Victorian Retreat in Brixton with Pat...,"Lovely Victorian house in Brixton, 5 mins walk...",Bright 3 Bedroom House in Brixton/Herne Hill -...
4,The space A good sized Twin Bedroom on the fir...,Charming Twin Bedroom in Quiet Edwardian House...,Lovely Twin Bedroom in a quiet tree lined road...,Twin Room Bounds Green North London N22 2DG


#### 5.4.4 Evaluation


Below, I evaluate the outputs according to the required qualitative criteria.

<br>

**Relevance**

Across the examples, the zero-shot model often generated generic but loosely relevant titles. While the location or property type was sometimes mentioned, important selling points from the description (e.g., proximity to transport, garden access, spaciousness, quiet street) were not consistently emphasized.

The SFT model improved relevance noticeably. Titles more consistently captured:

  -	Location (e.g., Camden, Brixton, Wimbledon),

  -	Property type (e.g., double room, Victorian house),

  -	Key amenities (garden, quiet street, near transport).


The DPO model showed the strongest alignment with the descriptions. It more reliably highlighted:
  
  -	Specific neighborhood advantages,

  -	Transport proximity,

  -	Attractive features such as “quiet tree-lined road” or “spacious modern bathroom.”

Overall, preference fine-tuning improved how well the titles captured the core selling points.

<br>


**Clarity and Fluency**

The zero-shot titles were grammatically correct but sometimes slightly mechanical or template-like.

The SFT model produced clearer and more natural-sounding titles. Sentence structure improved and phrasing became more typical of real Airbnb listings.

The DPO model demonstrated the most fluent and natural phrasing. Titles felt polished and consistent in tone, with fewer awkward constructions. The wording appeared more human-like and aligned with marketing language.

<br>

**Conciseness**

Zero-shot titles were generally concise but sometimes either too minimal or slightly verbose.

The SFT model produced titles of appropriate length, balancing informativeness and brevity.

The DPO model achieved the best balance overall. Titles were:

-	Informative but not overly long,
-	Focused on 2–3 key features,
-	Structured in a way that maximized readability.

No major verbosity issues were observed in the DPO outputs.

<br>

**Creativity and Attractiveness**

Zero-shot titles were often functional but not particularly engaging.

The SFT model added mild marketing elements (e.g., “Charming,” “Spacious,” “Modern”), making them more appealing.

The DPO model produced the most attractive titles. Compared to the other models, they:

-	Used stronger descriptive adjectives,
-	Highlighted lifestyle aspects (quiet street, near park, central location),
-	Sounded more like real competitive Airbnb listings.

The DPO titles were more likely to encourage clicks.

<br>


**Uniqueness**

Zero-shot titles tended to be somewhat generic and formulaic.

The SFT model improved differentiation by incorporating more listing-specific details.

The DPO model showed the strongest uniqueness. It avoided overly generic patterns and more consistently integrated distinctive features of each property (e.g., neighborhood characteristics, transport details, or standout amenities).


<br>

**Absence of Errors**

All three models produced grammatically correct outputs.

However:

- The zero-shot model occasionally generated overly generic or repetitive phrasing.
- The SFT model showed fewer stylistic inconsistencies.
- The DPO model produced the most coherent and internally consistent titles.

No major factual contradictions or nonsensical phrases were observed in the evaluated examples.



**Overall Conclusion**

The comparison demonstrates a clear progression in quality:

Zero-shot → SFT → DPO

Supervised fine-tuning improves relevance and fluency, while preference-based fine-tuning via DPO further enhances:

- Alignment with human expectations,
- Marketing attractiveness,
-	Conciseness,
- Stylistic polish.

Although the dataset for DPO training was relatively small (50 examples), the preference alignment step noticeably improved title quality. This suggests that even limited human feedback can meaningfully refine generative behavior.




# Task F: Deploying models on a Gradio App

In [22]:
!pip install gradio



In [23]:
def generate_title_clean(model, tokenizer, description):
    prompt = (
        "You are an expert Airbnb copywriter.\n\n"
        f"Description:\n{description}\n\n"
        "Title:"
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    output = model.generate(
        **inputs,
        max_new_tokens=40,
        do_sample=True,
        temperature=0.7,
        top_p=0.9
    )

    decoded = tokenizer.decode(output[0], skip_special_tokens=True)

    # Remove prompt part
    title = decoded[len(prompt):].strip()

    return title

In [24]:
def compare_models(description):
    zero = generate_title_clean(model_zero, tokenizer_zero, description)
    sft = generate_title_clean(model_2, tokenizer_2, description)
    dpo = generate_title_clean(dpo_model, tokenizer_2, description)

    return zero, sft, dpo

In [25]:
import gradio as gr

iface = gr.Interface(
    fn=compare_models,
    inputs=gr.Textbox(lines=6, placeholder="Enter Airbnb listing description here..."),
    outputs=[
        gr.Textbox(label="Zero-Shot Model"),
        gr.Textbox(label="Supervised Fine-Tuned Model (SFT)"),
        gr.Textbox(label="Preference-Aligned Model (DPO)")
    ],
    title="Airbnb Title Generator Comparison",
    description="Compare title suggestions from Zero-shot, SFT, and DPO models."
)

iface.launch()

It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://7fae6723b89853737f.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




In this task, I deployed all three models (zero-shot, SFT, and DPO) using a Gradio web interface. The app allows users to input an Airbnb listing description and compare generated titles across the three training strategies. This interactive setup enables direct qualitative comparison of model behavior and alignment performance.