<a href="https://colab.research.google.com/github/candyledger1/Express-Certificate-Pricing/blob/main/DL_Assignment_3_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment III

# Practical Deep Learning for Language Processing

Author: Nino Maisuradze

Master’s Program Economics and Finance

University of Tübingen

## Fine-Tuning and Preference Alignment of Large Language Models for Airbnb Title Optimization

## Introduction

In digital marketplaces such as Airbnb, listing titles play a crucial role in attracting user attention and influencing click-through behavior. Since potential guests initially only see the listing title, it must clearly communicate key attributes such as location, property type, and standout features.

The goal of this assignment is to develop an automated title generation system based on a pre-trained Large Language Model (LLM). The model is conditioned on Airbnb listing descriptions and trained to generate concise, attractive, and relevant titles.

The assignment consists of the following stages:

- Task A: Zero-shot title generation using a base LLM
- Task B: Supervised fine-tuning (LoRA + 4-bit quantization)
- Task C: Manual human preference labeling
- Task D: Preference fine-tuning using Direct Preference Optimization (DPO)
- Task E: Comparative evaluation and qualitative analysis
- Task F: Deployment via a Gradio web application

The objective is to compare three approaches:

1. Zero-shot baseline
2. Supervised fine-tuned model (SFT)
3. Preference-aligned model (DPO)

and evaluate how each step improves title quality.

## Task A - Zero-Shot Title Generation

### A.1 Objective

The goal of this task is to evaluate the performance of a pre-trained instruction-tuned large language model (LLM) in a zero-shot setting. In this setup, no task-specific fine-tuning is applied. The model is directly prompted to generate Airbnb listing titles based solely on its pretrained knowledge and instruction-following capabilities.

This experiment serves two purposes:

1. To establish a baseline for later comparison with supervised fine-tuning (SFT) and preference optimization (DPO).
2. To analyze how prompt design influences generation quality in the absence of additional training.

The zero-shot setting allows us to isolate the effect of prompting from the effect of parameter updates.

### A.2 Environment Setup

Mount Google Drive to access the dataset.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### A.3 Prompt Design

To assess the sensitivity of the base model to prompt formulation, I designed three progressively more structured prompts:

- **Version 1 (Simple Instruction):** A direct request to generate a short Airbnb listing title.
- **Version 2 (Role-Based Framing):** The model is instructed to act as an expert Airbnb copywriter, encouraging a more professional tone.
- **Version 3 (Structured Constraint):** The model is instructed to generate a concise title (max 10 words) and explicitly focus on location and key selling points.

This design allows us to examine how increasing prompt specificity affects relevance, stylistic consistency, and conciseness of generated titles.

#### Load Dataset

In [None]:
import pandas as pd

DATA_PATH = "/content/drive/MyDrive/airbnb_tabular.csv"

df = pd.read_csv(DATA_PATH)

print(df.shape)
df.head()

(22570, 116)


Unnamed: 0.1,Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,description,neighborhood_overview,picture_url,host_id,...,has_amenity_Elevator,has_amenity_Host greets you,has_amenity_Free parking on premises,len_amenities,len_description,proxy,review_diff,in_top_third,img_available,joint_description
0,0.0,13913,https://www.airbnb.com/rooms/13913,20220610000000.0,2022-06-08,Holiday London DB Room Let-on going,My bright double bedroom with a large window h...,Finsbury Park is a friendly melting pot commun...,https://a0.muscache.com/pictures/miso/Hosting-...,54730.0,...,0,0,1,41,154,3,15.0,1,1.0,My bright double bedroom with a large window h...
1,3.0,17402,https://www.airbnb.com/rooms/17402,20220610000000.0,2022-06-08,Superb 3-Bed/2 Bath & Wifi: Trendy W1,You'll have a wonderful stay in this superb mo...,"Location, location, location! You won't find b...",https://a0.muscache.com/pictures/39d5309d-fba7...,67564.0,...,1,0,0,38,112,3,5.0,1,1.0,You'll have a wonderful stay in this superb mo...
2,4.0,25123,https://www.airbnb.com/rooms/25123,20220610000000.0,2022-06-08,Clean big Room in London (Room 1),Big room with double bed clean sheets clean to...,Barnet is one of the largest boroughs in Londo...,https://a0.muscache.com/pictures/456905/a004b9...,103583.0,...,0,0,0,14,129,1,0.0,0,1.0,Big room with double bed clean sheets clean to...
3,5.0,36299,https://www.airbnb.com/rooms/36299,20220610000000.0,2022-06-07,Kew Gardens 3BR house in cul-de-sac,3 Bed House with garden close to Thames river ...,"Residential family neighborhood, with both Eng...",https://a0.muscache.com/pictures/457052/6e819d...,155938.0,...,0,0,0,34,128,3,7.0,1,1.0,3 Bed House with garden close to Thames river ...
4,9.0,39387,https://www.airbnb.com/rooms/39387,20220610000000.0,2022-06-08,Stylish bedsit in Notting Hill ish flat.,Private lockable bedsit room available within ...,My place is convenient for all London attracti...,https://a0.muscache.com/pictures/beda1dab-9443...,168920.0,...,0,1,0,40,135,1,0.0,0,1.0,Private lockable bedsit room available within ...


In [None]:
df_top = df[df["in_top_third"] == 1].copy()

print(df_top.shape)
df_top[["description", "name"]].head()

(5782, 116)


Unnamed: 0,description,name
0,My bright double bedroom with a large window h...,Holiday London DB Room Let-on going
1,You'll have a wonderful stay in this superb mo...,Superb 3-Bed/2 Bath & Wifi: Trendy W1
3,3 Bed House with garden close to Thames river ...,Kew Gardens 3BR house in cul-de-sac
6,A luminous room in a modern 2 bedroom flat loc...,Room with a view zone 1 Central Bankside
8,Blenheim Lodge was built in 1878 when there we...,You Will Save Money Here


#### Load Instruction-Tuned Base Model

I now load the pretrained model used in the assignment:

-	Mistral-7B-Instruct-v0.2
-	4-bit quantized (to fit Colab GPU memory)

In [2]:
!pip install -q transformers accelerate bitsandbytes

In [None]:
import os
from google.colab import userdata
from huggingface_hub import login

# Read secret from Colab Secrets panel
hf_token = userdata.get("HF_TOKEN")

# Login to HuggingFace
login(hf_token)

In [6]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME,
    trust_remote_code=True
)

# Important for Mistral padding
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load model
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

model.eval()

print("Model loaded successfully.")

ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set `llm_int8_enable_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details. 

#### Prompt Construction

To guide the model, I design instruction-style prompts that condition title generation on the listing description.

Rather than using a single prompt, I experiment with three different prompt formulations to examine how prompt design influences zero-shot performance.

Each prompt takes the listing description as input and instructs the model to generate a short Airbnb-style title.

The three prompt templates are implemented as follows:

In [None]:
def build_prompt_v1(description):
    # Simple direct instruction
    return f"""Write a short and attractive Airbnb listing title.

Description:
{description}

Title:"""

def build_prompt_v2(description):
    # Role-based instruction
    return f"""You are an expert Airbnb copywriter.
Create a short, catchy, and professional listing title
based on the description below.

Description:
{description}

Title:"""

def build_prompt_v3(description):
    # More constrained / structured
    return f"""Generate a concise Airbnb listing title (max 10 words).
Focus on location and key selling points.

Description:
{description}

Title:"""

#### Inference Function

In [None]:
def generate_title(description, prompt_builder, max_new_tokens=20):
    prompt = prompt_builder(description)

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.8,
            top_p=0.9,
        )

    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Extract only text after "Title:"
    if "Title:" in generated_text:
        return generated_text.split("Title:")[-1].strip()
    else:
        return generated_text.strip()
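Several titles in the A.4 results table begin with a quotation mark, and with a fixed `max_new_tokens` budget the model may continue past the title onto further lines. The sketch below shows one possible post-processing step. `clean_title` is a hypothetical helper that is not part of the notebook, and the example string is made up for illustration.

```python
def clean_title(generated_text: str) -> str:
    """Keep the first non-empty line after the last 'Title:' marker, without wrapping quotes."""
    # Same split as generate_title: everything after the final "Title:" marker.
    tail = generated_text.split("Title:")[-1]
    # Take the first non-empty line; the model may continue with extra text.
    lines = [line.strip() for line in tail.splitlines() if line.strip()]
    first_line = lines[0] if lines else ""
    # Remove straight or curly double quotes that wrap the title.
    return first_line.strip('"\u201c\u201d').strip()


# Made-up model output used only for illustration.
example = 'Description:\nA flat by the river.\n\nTitle: "River Retreat: 3-Bed House"\nNote: extra text'
print(clean_title(example))
```

If used, such a helper would be applied to the decoded text in place of the plain `split("Title:")[-1].strip()` step.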

### A.4 Results
#### Example Zero-Shot Generation

To evaluate the zero-shot baseline, I generate Airbnb titles for a subset of listings using the three prompt formulations described above. The table below reports the generated titles for direct qualitative comparison.

In [None]:
N = 10   # start small for testing
subset = df_top.iloc[:N].copy()

titles_v1 = []
titles_v2 = []
titles_v3 = []

for desc in subset["description"]:
    titles_v1.append(generate_title(desc, build_prompt_v1))
    titles_v2.append(generate_title(desc, build_prompt_v2))
    titles_v3.append(generate_title(desc, build_prompt_v3))

subset["title_v1"] = titles_v1
subset["title_v2"] = titles_v2
subset["title_v3"] = titles_v3

subset[["name", "title_v1", "title_v2", "title_v3"]]

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.

Unnamed: 0,name,title_v1,title_v2,title_v3
0,Holiday London DB Room Let-on going,"Cozy Double Room in Finsbury Park, Central Lon...",Bright & Cozy Double Room in Central Finsbury ...,Bright Double Room in Finsbury Park: Comfortab...
1,Superb 3-Bed/2 Bath & Wifi: Trendy W1,Modern & Spacious 3-Bedroom Apartment in Centr...,"Modern Fitzrovia Apartment: Beautiful 3-Bed, 2...","Modern, Spacious 3-Bed 2-Bath Apartment in Cen..."
3,Kew Gardens 3BR house in cul-de-sac,Charming 3 Bed House by the Thames with easy a...,"""River Retreat: Charming 3-Bed House with Gard...","Modern 3-bed house near Thames, Kew Gardens & ..."
6,Room with a view zone 1 Central Bankside,"Bright, Modern & Quiet Room with Stunning Shar...","""Bright & Boutique: Luxe Veggie Flat - King Ro...","Bright, King-sized Room with Shard View in Cen..."
8,You Will Save Money Here,Historic 1878 Blenheim Lodge - Quiet Upmarket ...,"""Blenheim Lodge: Charming 1878 Victorian Home ...","Historic East Finchley House - Double Room, Gr..."
11,Beautiful 1 bed apt in Queens Park,Chic Mid-Century Design Apartment with Balcony...,"""Mid-Century Chic: Bright, View-Filled Flat w/...",Modern Mid-Century London Apartment with Balco...
12,Quiet Comfortable Room in Fulham,"Charming Double Single Room in Quiet, Safe Nei...",Charming Double Single in Quiet Munster Villag...,"Quiet, Safe Double Room in Munster Village - S..."
13,"Beautiful, Luxurious Art Deco +private bathroom",Luxe Art Deco Flat: Unique Private Room w/Ensu...,"""Luxe Retreat in a Charming Art Deco Flat - Ro...","Trendy Art Deco Flat: Ensuite Room with View, ..."
15,Cosy Double studio in Zone 2 Hammersmith (6),"""Cozy & Convenient Studio in Hammersmith - Min...","Hip Hammersmith Hub: Minutes from Kensington, ...",Modern Studios in Hammersmith - Convenient Bas...
18,Cosy Double studio in Zone 2 Hammersmith (1),"Modern, Bright and Convenient Studio Apartment...",Hammersmith Haven - Conveniently Located Studi...,Modern Hammersmith Studio - Close to Kensingto...


In [None]:
# Save results for comparison

SAVE_PATH = "/content/drive/MyDrive/zero_shot_results.csv"

subset_to_save = subset[["name", "title_v1", "title_v2", "title_v3"]].copy()

subset_to_save.to_csv(SAVE_PATH, index=False)

print("Zero-shot results saved to:", SAVE_PATH)
subset_to_save.head()

Zero-shot results saved to: /content/drive/MyDrive/zero_shot_results.csv


Unnamed: 0,name,title_v1,title_v2,title_v3
0,Holiday London DB Room Let-on going,"Cozy Double Room in Finsbury Park, Central Lon...",Bright & Cozy Double Room in Central Finsbury ...,Bright Double Room in Finsbury Park: Comfortab...
1,Superb 3-Bed/2 Bath & Wifi: Trendy W1,Modern & Spacious 3-Bedroom Apartment in Centr...,"Modern Fitzrovia Apartment: Beautiful 3-Bed, 2...","Modern, Spacious 3-Bed 2-Bath Apartment in Cen..."
3,Kew Gardens 3BR house in cul-de-sac,Charming 3 Bed House by the Thames with easy a...,"""River Retreat: Charming 3-Bed House with Gard...","Modern 3-bed house near Thames, Kew Gardens & ..."
6,Room with a view zone 1 Central Bankside,"Bright, Modern & Quiet Room with Stunning Shar...","""Bright & Boutique: Luxe Veggie Flat - King Ro...","Bright, King-sized Room with Shard View in Cen..."
8,You Will Save Money Here,Historic 1878 Blenheim Lodge - Quiet Upmarket ...,"""Blenheim Lodge: Charming 1878 Victorian Home ...","Historic East Finchley House - Double Room, Gr..."


### A.5 Discussion

The zero-shot experiment demonstrates that the instruction-tuned base model can generate coherent and generally relevant Airbnb titles without task-specific fine-tuning.

However, generation quality depends strongly on prompt formulation:

- The simple prompt (V1) produces relevant titles but often lacks stylistic consistency and strong marketing emphasis.
- The role-based prompt (V2) improves professionalism and fluency.
- The structured prompt (V3) yields the most consistent and Airbnb-style outputs, focusing on key attributes such as location and property type.

These observations suggest that prompt design alone can noticeably influence output quality, although they rest on a qualitative reading of ten listings. Prompting does not change the model's parameters, which motivates the supervised fine-tuning and preference-alignment stages in subsequent tasks.
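Prompt V3 caps titles at 10 words, so the qualitative reading above could be complemented by a simple adherence check. The following is a minimal sketch: `word_count` and `share_within_limit` are hypothetical helpers, and the sample titles are illustrative strings rather than model outputs (the titles displayed in A.4 are truncated).

```python
MAX_WORDS = 10  # limit stated in prompt V3


def word_count(title: str) -> int:
    """Count whitespace-separated words in a title."""
    return len(title.split())


def share_within_limit(titles: list[str], max_words: int = MAX_WORDS) -> float:
    """Fraction of titles whose word count does not exceed max_words."""
    if not titles:
        return 0.0
    return sum(word_count(t) <= max_words for t in titles) / len(titles)


# Illustrative titles only, not taken from the model outputs.
sample_titles = [
    "Bright Double Room near Finsbury Park",
    "Modern and Spacious Three Bedroom Apartment in Central London with Fast Wifi",
]
print(share_within_limit(sample_titles))
```

In the notebook, the same function could be applied to each of `subset["title_v1"]`, `subset["title_v2"]`, and `subset["title_v3"]` (converted with `.tolist()`) to compare conciseness across prompts.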

## Task B - Supervised Fine-Tuning (SFT)

### B.1 Objective

In this task, I fine-tune the instruction-tuned base model to improve Airbnb title generation performance. The goal is to adapt the model more closely to the Airbnb title style observed in the dataset.

To make fine-tuning computationally feasible, I combine:

- **4-bit quantization (BitsAndBytes)** to reduce memory footprint.
- **LoRA (Low-Rank Adaptation)** to update only a small subset of parameters.
- **Hugging Face Transformers** (`Trainer`) for supervised fine-tuning; TRL is installed and imported, but the training runs below use `Trainer`.
- **PEFT** for parameter-efficient training.

Two fine-tuned variants are trained and later compared against the zero-shot baseline using a consistent generation prompt.

In [None]:
# Step 1 - install dependencies

!pip install -q transformers accelerate bitsandbytes peft trl datasets

### B.2 Training Setup

For supervised fine-tuning, I use Airbnb listing descriptions paired with their original titles. The dataset is preprocessed to create instruction-style training examples that align with the generation format used during inference.

Each training sample follows the structure:

"You are an expert Airbnb copywriter.


Description: < listing description >

Title: < ground-truth title >"

This ensures consistency between training and evaluation prompts.
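The consistency between training and inference prompts can be made explicit by building both strings from one template. The sketch below follows the layout used by `format_sample` in this task and by the generation prompt in B.5 (a newline after each `Description:` and `Title:` label). The function names and the example listing are hypothetical.

```python
SYSTEM_LINE = "You are an expert Airbnb copywriter.\n\n"


def build_inference_prompt(description: str) -> str:
    """Prompt that ends right where the title should begin."""
    return f"{SYSTEM_LINE}Description:\n{description}\n\nTitle:\n"


def build_training_text(description: str, title: str) -> str:
    """Training example: the inference prompt followed by the ground-truth title."""
    return build_inference_prompt(description) + title


# Hypothetical listing used only for illustration.
desc = "Sunny studio, five minutes from the tube."
title = "Sunny Studio near the Tube"

training_text = build_training_text(desc, title)
# The inference prompt is an exact prefix of the training text.
assert training_text.startswith(build_inference_prompt(desc))
print(training_text)
```

Deriving both strings from a single function avoids silent drift between the two formats, for example a missing newline after `Title:`.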

In [7]:
import torch
import pandas as pd
from datasets import Dataset

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer, SFTConfig

DATA_PATH = "/content/drive/MyDrive/airbnb_tabular.csv"
df = pd.read_csv(DATA_PATH)

# keep only needed columns and drop missing
df_train = df[["description", "name"]].dropna().copy()

# keep small for stability (increase later)
df_train = df_train.sample(n=1000, random_state=42).reset_index(drop=True)

def format_sample(row):
    return (
        "You are an expert Airbnb copywriter.\n\n"
        f"Description:\n{row['description']}\n\n"
        f"Title:\n{row['name']}"
    )

df_train["text"] = df_train.apply(format_sample, axis=1)
dataset = Dataset.from_pandas(
    df_train[["text"]],
    preserve_index=False
)

ModuleNotFoundError: No module named 'trl'

### B.3 Model Preparation (4-bit Quantization + LoRA)

To enable efficient fine-tuning, the base model is loaded using 4-bit quantization. This reduces the GPU memory needed for the weights to roughly a quarter of the 16-bit footprint while largely preserving output quality.
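The memory saving can be estimated from the parameter count. The sketch below is a rough back-of-the-envelope calculation, assuming the base parameter count implied by the `print_trainable_parameters()` output later in this section (all parameters minus trainable parameters). It ignores quantization constants, layers kept in higher precision, activations, and the KV cache, so actual usage will be higher. The function name is illustrative.

```python
# Rough weight-memory estimate; real usage is higher because activations,
# the KV cache, quantization constants, and 16-bit layers are ignored here.
BASE_PARAMS = 7_248_547_840 - 6_815_744  # all params minus LoRA params (B.3 output)


def weight_memory_gib(n_params: int, bits_per_param: float) -> float:
    """Return the memory needed to store n_params weights, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3


fp16_gib = weight_memory_gib(BASE_PARAMS, 16)
nf4_gib = weight_memory_gib(BASE_PARAMS, 4)

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f"4-bit weights:  {nf4_gib:.1f} GiB")
```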

LoRA adapters are then applied to selected attention layers. Instead of updating all model parameters, LoRA introduces trainable low-rank matrices that modify the forward pass. This allows effective adaptation with a small number of additional parameters while keeping the backbone model frozen.

In [None]:
MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,  # <- critical for T4
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto",
)

# recommended for training stability
model.config.use_cache = False

# prepare for k-bit training (QLoRA setup)
model = prepare_model_for_kbit_training(model)

#### Apply LoRA

In [None]:
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # keep light
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

trainable params: 6,815,744 || all params: 7,248,547,840 || trainable%: 0.0940
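The trainable-parameter count above can be reproduced by hand. LoRA adds two matrices of shapes `(d_in, r)` and `(r, d_out)` per adapted weight, that is `r * (d_in + d_out)` parameters. The sketch below assumes the Mistral-7B dimensions from the public model configuration (32 decoder layers, hidden size 4096, 8 key/value heads of size 128). The function name is illustrative.

```python
# Mistral-7B dimensions (assumed from the public model config): 32 decoder layers,
# hidden size 4096, and 8 key/value heads of size 128, so v_proj maps 4096 -> 1024.
NUM_LAYERS = 32
TARGET_SHAPES = {"q_proj": (4096, 4096), "v_proj": (4096, 1024)}


def lora_param_count(rank: int) -> int:
    """Trainable parameters added by LoRA: rank * (d_in + d_out) per adapted matrix."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in TARGET_SHAPES.values())
    return NUM_LAYERS * per_layer


print(lora_param_count(16))  # 6815744, matching the output above
```

Doubling the rank doubles the count, which is consistent with the 13,631,488 trainable parameters reported for the second run.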


### B.4 Fine-Tuning Procedure

Supervised fine-tuning is performed using the Hugging Face `Trainer` API with a causal language modeling objective (next-token prediction).

To make fine-tuning feasible on limited GPU memory, I combine:

- **4-bit quantization (BitsAndBytes)** to reduce memory footprint.
- **LoRA (Low-Rank Adaptation)** to update only a small subset of trainable parameters.
- **Mixed precision training (FP16)** for computational efficiency.

The training data is formatted as instruction-style prompts:

"You are an expert Airbnb copywriter.
Description: ...
Title: ..."

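The model is trained with next-token prediction over the full sequence, so it learns to continue the description context with a title. With the data collator used below, the loss covers the prompt and description tokens as well as the title tokens.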

Key training settings include:

- A small learning rate suitable for adapter training (2e-4 in the first run, 1e-4 in the second)
- Gradient accumulation to reach an effective batch size of 8 per device
- One or two training epochs to limit overfitting
- Logging of the training loss every 10 or 50 steps to monitor convergence
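The interaction between batch size, gradient accumulation, and epochs can be checked with a short calculation. The sketch below uses the values from the two `TrainingArguments` cells in this task and assumes a single device. `optimizer_steps` is a hypothetical helper, and how a partial final accumulation window is rounded may differ between library versions (here the numbers divide evenly).

```python
import math


def optimizer_steps(n_samples: int, per_device_batch: int, grad_accum: int, epochs: int) -> int:
    """Number of optimizer updates on a single device."""
    effective_batch = per_device_batch * grad_accum
    steps_per_epoch = math.ceil(n_samples / effective_batch)
    return steps_per_epoch * epochs


run1 = optimizer_steps(n_samples=1000, per_device_batch=1, grad_accum=8, epochs=1)
run2 = optimizer_steps(n_samples=1000, per_device_batch=2, grad_accum=4, epochs=2)
print(run1, run2)  # 125 250
```

These values agree with the `global_step` fields in the two `TrainOutput` results reported in B.4.1 and B.4.2.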

#### B.4.1 First Fine-Tuned Model

In the first fine-tuning run, I train a LoRA-adapted version of the quantized Mistral-7B-Instruct model using the instruction-formatted Airbnb dataset described above.

The objective is to adapt the base model’s general language capabilities to the specific task of concise and engaging title generation. Only the LoRA adapter weights are updated; the 4-bit quantized backbone remains frozen.

This setup serves as the primary fine-tuned baseline against which the second model will be compared.

In [None]:
MAX_LEN = 256

def tokenize_function(example):
    return tokenizer(
        example["text"],
        truncation=True,
        max_length=MAX_LEN,
        padding=False,
    )

tokenized_dataset = dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=["text"],
)

In [None]:
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

training_args = TrainingArguments(
    output_dir="./mistral_lora_airbnb",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=2e-4,
    logging_steps=10,
    save_strategy="no",
    fp16=True,
    bf16=False,
    report_to="none",
)

In [None]:
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)
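With `DataCollatorForLanguageModeling(mlm=False)`, the labels are a copy of the input ids (with padding positions set to -100), so the loss also covers the description tokens. A common alternative, which this notebook does not use, is to mask the prompt positions so that only title tokens contribute to the loss. The sketch below illustrates the idea with plain lists and made-up token ids. `mask_prompt_labels` is a hypothetical helper.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def mask_prompt_labels(input_ids: list[int], prompt_length: int) -> list[int]:
    """Copy input_ids as labels, ignoring the first prompt_length positions."""
    prompt_length = min(prompt_length, len(input_ids))
    return [IGNORE_INDEX] * prompt_length + input_ids[prompt_length:]


# Made-up token ids: five prompt tokens followed by three title tokens.
input_ids = [1, 733, 16289, 28793, 9215, 4501, 2184, 2]
labels = mask_prompt_labels(input_ids, prompt_length=5)
print(labels)
```

In practice the prompt length would come from tokenizing the text up to and including `Title:\n`.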

In [None]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator,
)

In [None]:
trainer.train()


Step,Training Loss
10,2.219845
20,2.06808
30,2.0721
40,1.971469
50,2.034188
60,1.973187
70,2.074577
80,1.952936
90,1.953888
100,1.986139


TrainOutput(global_step=125, training_loss=2.0221002349853516, metrics={'train_runtime': 1897.4376, 'train_samples_per_second': 0.527, 'train_steps_per_second': 0.066, 'total_flos': 8445995410391040.0, 'train_loss': 2.0221002349853516, 'epoch': 1.0})


The training loss fell from 2.22 at step 10 to roughly 1.95 to 1.99 over steps 80 to 100, with fluctuations in between. This indicates that the LoRA adapters adapted the model to the Airbnb title generation task.

The fine-tuned model is evaluated together with the second run in Section B.5.

In [None]:
model.save_pretrained("/content/drive/MyDrive/mistral_lora_run1")
tokenizer.save_pretrained("/content/drive/MyDrive/mistral_lora_run1")

('/content/drive/MyDrive/mistral_lora_run1/tokenizer_config.json',
 '/content/drive/MyDrive/mistral_lora_run1/chat_template.jinja',
 '/content/drive/MyDrive/mistral_lora_run1/tokenizer.json')

#### B.4.2 Second Fine-Tuned Model

To examine the robustness of the supervised fine-tuning approach, I train a second LoRA-adapted model with slightly modified hyperparameters.

While the training procedure remains identical in structure, key configuration values (such as batch size, gradient accumulation, and learning rate) are adjusted. This allows us to assess how sensitive performance is to training dynamics and optimization settings.

Both fine-tuned models use:

- The same instruction-formatted training data
- 4-bit quantization
- LoRA adapters applied to attention projection layers

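The differences arise from the LoRA and training configuration.

Compared to Model 1, the following hyperparameters are modified:

- LoRA rank (16 to 32), scaling factor `lora_alpha` (32 to 64), and dropout (0.05 to 0.1)
- Per-device batch size (1 to 2)
- Gradient accumulation steps (8 to 4)
- Learning rate (2e-4 to 1e-4)
- Number of training epochs (1 to 2)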

These adjustments allow investigation of how optimization dynamics influence convergence speed and generation quality.

In [None]:
import torch
import pandas as pd
from datasets import Dataset

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer, SFTConfig

In [None]:
DATA_PATH = "/content/drive/MyDrive/airbnb_tabular.csv"
df = pd.read_csv(DATA_PATH)

df_train_2 = df[["description", "name"]].dropna().copy()
df_train_2 = df_train_2.sample(n=1000, random_state=42).reset_index(drop=True)

def format_sample_2(row):
    return (
        "You are an expert Airbnb copywriter.\n\n"
        f"Description:\n{row['description']}\n\n"
        f"Title:\n{row['name']}"
    )

df_train_2["text"] = df_train_2.apply(format_sample_2, axis=1)

dataset_2 = Dataset.from_pandas(
    df_train_2[["text"]],
    preserve_index=False
)

In [None]:
MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"

bnb_config_2 = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

# LOAD TOKENIZER FIRST
tokenizer_2 = AutoTokenizer.from_pretrained(MODEL_NAME)

if tokenizer_2.pad_token is None:
    tokenizer_2.pad_token = tokenizer_2.eos_token

# THEN tokenize
MAX_LEN = 256

def tokenize_function_2(example):
    return tokenizer_2(
        example["text"],
        truncation=True,
        max_length=MAX_LEN,
        padding=False,
    )

tokenized_dataset_2 = dataset_2.map(
    tokenize_function_2,
    batched=True,
    remove_columns=["text"],
)

# THEN load model
model_2 = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config_2,
    device_map="auto",
)

model_2 = prepare_model_for_kbit_training(model_2)

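#### LoRA Config

The LoRA configuration for the second model targets the same attention projections (`q_proj`, `v_proj`) as the first, but doubles the rank and scaling factor (r=32, `lora_alpha`=64) and raises dropout to 0.1. Differences in performance can therefore stem from adapter capacity as well as from optimization settings.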

In [None]:
lora_config_2 = LoraConfig(
    r=32,  # higher rank
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM"
)

model_2 = get_peft_model(model_2, lora_config_2)
model_2.print_trainable_parameters()

trainable params: 13,631,488 || all params: 7,255,363,584 || trainable%: 0.1879


#### Training Config

The training configuration for Model 2 differs from Model 1 in per-device batch size (2 instead of 1), gradient accumulation steps (4 instead of 8), learning rate (1e-4 instead of 2e-4), and number of epochs (2 instead of 1). The effective batch size remains 8, so the main optimization changes are the lower learning rate and the additional epoch.

In [None]:
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

training_args_2 = TrainingArguments(
    output_dir="./mistral_lora_airbnb_run2",
    per_device_train_batch_size=2,   # changed
    gradient_accumulation_steps=4,   # changed
    num_train_epochs=2,              # changed
    learning_rate=1e-4,              # changed
    logging_steps=50,
    save_strategy="no",
    fp16=True,
    bf16=False,
    report_to="none",
)

In [None]:
data_collator_2 = DataCollatorForLanguageModeling(
    tokenizer=tokenizer_2,
    mlm=False,
)

In [None]:
trainer_2 = Trainer(
    model=model_2,
    args=training_args_2,
    train_dataset=tokenized_dataset_2,
    data_collator=data_collator_2,
)

In [None]:
trainer_2.train()


Step,Training Loss
50,2.078186
100,1.988707
150,1.920784
200,1.925752
250,1.904663


TrainOutput(global_step=250, training_loss=1.963618408203125, metrics={'train_runtime': 2703.7012, 'train_samples_per_second': 0.74, 'train_steps_per_second': 0.092, 'total_flos': 1.979136751185101e+16, 'train_loss': 1.963618408203125, 'epoch': 2.0})

The logged training loss falls from 2.08 at step 50 to 1.90 at step 250, with a small uptick at step 200, indicating stable convergence under the modified configuration.

This second model provides an alternative fine-tuned variant for comparison in the evaluation stage.
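The mean training losses of the two runs can be put on a more interpretable scale by converting them to perplexity, `exp(loss)`. The sketch below uses the `training_loss` values from the two `TrainOutput` results. These are averages over training data rather than held-out data, and the second run covers two epochs, so the comparison is only indicative.

```python
import math

# Mean training losses reported by the two TrainOutput results above.
mean_loss = {"run 1": 2.0221002349853516, "run 2": 1.963618408203125}

# Perplexity is the exponential of the mean cross-entropy loss (in nats).
perplexity = {name: math.exp(loss) for name, loss in mean_loss.items()}

for name, ppl in perplexity.items():
    print(f"{name}: loss={mean_loss[name]:.3f}, perplexity={ppl:.2f}")
```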

### B.5 Evaluation

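To evaluate the effect of supervised fine-tuning, I generate titles for the first ten listings in the dataset that have both a description and a title. This subset overlaps with, but is not identical to, the Task A subset, which was restricted to listings with `in_top_third == 1`. Because the training sample was drawn at random from all listings, some evaluation listings may also have been seen during training. The generation prompt is kept consistent with the training format to ensure alignment.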

The table below compares:

- The original title
- The zero-shot baseline output
- Fine-tuned Model 1
- Fine-tuned Model 2

In [None]:
# Mount drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [5]:
# Load Models
import torch
import pandas as pd
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"

BASE_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"

# ---------- ZERO SHOT MODEL ----------
tokenizer_zero = AutoTokenizer.from_pretrained(BASE_MODEL)
model_zero = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
    device_map="auto"
)

# ---------- FINETUNED MODEL 1 ----------
PATH_MODEL_1 = "/content/drive/MyDrive/mistral_lora_run1"

tokenizer_1 = AutoTokenizer.from_pretrained(BASE_MODEL)
base_model_1 = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
    device_map="auto"
)

model_1 = PeftModel.from_pretrained(base_model_1, PATH_MODEL_1)

# ---------- FINETUNED MODEL 2 ----------
PATH_MODEL_2 = "/content/drive/MyDrive/mistral_lora_airbnb_run2"

tokenizer_2 = AutoTokenizer.from_pretrained(BASE_MODEL)
base_model_2 = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
    device_map="auto"
)

model_2 = PeftModel.from_pretrained(base_model_2, PATH_MODEL_2)

print("All models loaded successfully.")

`torch_dtype` is deprecated! Use `dtype` instead!


All models loaded successfully.


In [None]:
# Unified Generation Function
def generate_title(model, tokenizer, description):
    prompt = (
        "You are an expert Airbnb copywriter.\n\n"
        f"Description:\n{description}\n\n"
        "Title:\n"
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=30,
            temperature=0.8,
            top_p=0.9,
            do_sample=True
        )

    text = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return text.split("Title:")[-1].strip()
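Because generation uses `do_sample=True` with `temperature=0.8` and `top_p=0.9`, repeated calls can return different titles for the same description. The sketch below is a toy illustration of what these two settings do to a next-token distribution. It is not the `transformers` implementation, and the logits are made up.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities; temperature < 1 sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: list[float], top_p: float) -> dict[int, float]:
    """Keep the smallest set of most likely tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
probs = softmax_with_temperature(toy_logits, temperature=0.8)
nucleus = top_p_filter(probs, top_p=0.9)
print(sorted(nucleus))  # indices of the tokens that remain eligible for sampling
```

Since a token is then drawn at random from the remaining candidates, differences between models in a ten-listing comparison partly reflect sampling noise; fixing a seed or using greedy decoding would make the comparison reproducible.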

In [None]:
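# Load the dataset and take the first 10 listings with a description and title
# (not restricted to in_top_third, so this differs from the Task A subset)
data = pd.read_csv("/content/drive/MyDrive/airbnb_tabular.csv")
subset = data[["description", "name"]].dropna().head(10)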

In [None]:
# Generate Comparison
results = []

for idx, row in subset.iterrows():
    desc = row["description"]

    title_zero = generate_title(model_zero, tokenizer_zero, desc)
    title_1 = generate_title(model_1, tokenizer_1, desc)
    title_2 = generate_title(model_2, tokenizer_2, desc)

    results.append({
        "original": row["name"],
        "zero_shot": title_zero,
        "finetuned_1": title_1,
        "finetuned_2": title_2
    })

results_df = pd.DataFrame(results)
results_df.head(10)

In [None]:
SAVE_EVAL_PATH = "/content/drive/MyDrive/evaluation_results.csv"
results_df.to_csv(SAVE_EVAL_PATH, index=False)

print("Saved to:", SAVE_EVAL_PATH)

Saved to: /content/drive/MyDrive/evaluation_results.csv


In [None]:
results_df.head(10)

| | original | zero_shot | finetuned_1 | finetuned_2 |
|---|---|---|---|---|
| 0 | Holiday London DB Room Let-on going | Relax in a Bright Double Bedroom in Finsbury P... | Bright and comfy room in Finsbury Park, Centra... | Bright & Cozy Double Bedroom, Finsbury Park (Z... |
| 1 | Superb 3-Bed/2 Bath & Wifi: Trendy W1 | Contemporary 3-Bedroom Apartment in Central Fi... | Superb central modern 3-bed 2-bath apartment, ... | Wonderful 3-bed apartment in central Fitzrovia... |
| 2 | Clean big Room in London (Room 1) | Spacious Double Room with Kitchen Access in Qu... | Large double room in London 6 months or more. ... | Big room to let 7 minutes from tube station. 2... |
| 3 | Kew Gardens 3BR house in cul-de-sac | Charming 3-Bed House Near Thames River & Kew G... | 3 bed house in Kew with garden near Thames riv... | 3 Bed House in residential Kew area (near Tham... |
| 4 | Stylish bedsit in Notting Hill ish flat. | Quiet & Convenient Bedsit Room Near Westbourne... | Private lockable bedsit room near tube, buses ... | Private bed sit room (North Kensington) £625 p... |
| 5 | Clean big Room in London (Room 2) | Spacious Double Room in Quiet, Clean, Friendly... | Spacious Double Room to let in NW London, Gold... | Double Room to let in Brent Cross 7 minutes to... |
| 6 | Room with a view zone 1 Central Bankside | Bright and Calm Central London Room with Stunn... | Luminous room in Central London. Shard view. V... | Spacious, light room in central London. |
| 7 | Room in maisonette in chiswick | Charming Double Room with Balcony near Chiswic... | Double room on 1st floor with own bathroom and... | Double room in 1st floor flat with w.c. & show... |
| 8 | You Will Save Money Here | "Step Back in Time: A Beautiful Victorian Home... | Blenheim Lodge 1878, 10 minutes to tube statio... | Blenheim Lodge B&B in London - Double Room 1 2... |
| 9 | 2 Double bed apartment in quiet area North London | Charming Ground Floor Apartment with Garden, P... | Cosy 2 Bed Apartment with Private Garden. 250m... | Cosy home in Mill Hill, North London, 2 Bedroo... |


### B.6 Discussion

In the ten examples above, supervised fine-tuning improves stylistic alignment and structural consistency compared to the zero-shot baseline.

While the zero-shot model produces fluent and often creative titles, it shows greater variability in format and occasionally includes overly promotional or verbose phrasing. In contrast, both fine-tuned models generate more concise and structured outputs that better reflect Airbnb-style naming conventions.

Fine-tuning appears to improve:

- Consistency in title structure
- Inclusion of relevant attributes (location, property type, room size)
- Reduction of unnecessary stylistic variation

Model 2 exhibits slightly stronger structural regularity and more focused phrasing than Model 1. This suggests that the modified training configuration (batch size, gradient accumulation, learning rate, and number of epochs) influences convergence behavior and output control.

These findings highlight the effectiveness of LoRA-based supervised fine-tuning under memory-efficient 4-bit quantization. Even without updating the full backbone model, adapter training is sufficient to meaningfully adapt generation behavior to a domain-specific task.

However, increased structural consistency may come at the cost of reduced stylistic diversity. This trade-off between creativity and alignment is typical in supervised fine-tuning and should be considered when optimizing generation systems.

Overall, supervised fine-tuning provides more controlled, consistent, and domain-aligned outputs than zero-shot prompting for this task.

## Task C - Human Preference Labeling

In this task, I construct a human preference dataset to be used for Direct Preference Optimization (DPO) in Task D.

The objective is to collect pairwise comparisons between alternative title generations for the same listing description. For each description, two diverse candidate titles are generated using the best-performing supervised fine-tuned model from Task B. I then manually select the preferred title based on predefined quality criteria.

This dataset captures human judgments about stylistic alignment, clarity, and relevance, and serves as supervision for preference-based optimization.

### C.1 Selecting Listings
To construct the preference dataset, I randomly sample 50 Airbnb listing descriptions from the full dataset.

A sample size of 50 provides a manageable labeling workload while still covering diverse listing types (e.g., rooms, apartments, different locations). Although larger datasets (100–150 examples) would improve coverage of edge cases, 50 examples are sufficient to construct a meaningful proof-of-concept preference dataset for DPO training.

In [None]:
import pandas as pd

# Load dataset
data = pd.read_csv("/content/drive/MyDrive/airbnb_tabular.csv")

# Select 50 listings for preference labeling
subset_pref = (
    data[["description", "name"]]
    .dropna()
    .sample(50, random_state=42)
    .reset_index(drop=True)
)

subset_pref.head()

| | description | name |
|---|---|---|
| 0 | The room is super central, with great connecti... | Central & airy room at King's Cross / Bloomsbury |
| 1 | A lovely one bedroom apartment in the heart of... | Spacious, Quiet & 10min from Victoria |
| 2 | As I spend a lot of time travelling abroad I h... | Well Equipped Rooftop Apartment in Shoreditch |
| 3 | Nice, quiet single room on the second floor 5 ... | Quiet Single Room (Zone 1) in cool East London |
| 4 | My room is in the Heart of Camden Town, right ... | Bright Single Room in Camden Town |


In [4]:
!pip install -U bitsandbytes



In [None]:
import bitsandbytes as bnb
print(bnb.__version__)

0.49.1


In [None]:
import torch
import pandas as pd
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4"
)

PATH_MODEL_2 = "/content/drive/MyDrive/mistral_lora_airbnb_run2"

tokenizer_2 = AutoTokenizer.from_pretrained(BASE_MODEL)

base_model_2 = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto"
)

model_2 = PeftModel.from_pretrained(base_model_2, PATH_MODEL_2)

print("Model 2 loaded successfully in 4-bit.")

### C.2 Generating Two Diverse Title Candidates

To encourage diversity between the two title candidates, I use stochastic decoding with temperature sampling and nucleus sampling (top-p). With `do_sample=True`, a temperature of 0.9, and a top-p value of 0.95, the model produces stylistically varied but semantically consistent alternatives for the same description.

This diversity is crucial for meaningful preference comparisons.
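
To make the two decoding controls concrete, the following is a minimal conceptual sketch of temperature scaling and nucleus filtering on a toy next-token distribution. It is not the Hugging Face implementation; the function names and the four toy logits are assumptions chosen purely for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize so the kept probabilities sum to one
    return {i: probs[i] / mass for i in kept}

toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for four tokens
sharp = softmax_with_temperature(toy_logits, temperature=0.5)
flat = softmax_with_temperature(toy_logits, temperature=0.9)
nucleus = nucleus_filter(flat, top_p=0.95)
```

A higher temperature flattens the distribution, so less likely tokens are sampled more often, while top-p removes the low-probability tail before sampling. Together they allow two draws from the same prompt to differ without drifting into implausible tokens.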


In [None]:
import torch

def generate_two_titles(model, tokenizer, description):
    prompt = (
        "You are an expert Airbnb copywriter.\n\n"
        f"Description:\n{description}\n\n"
        "Title:"
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=30,
            temperature=0.9,
            top_p=0.95,
            do_sample=True,
            num_return_sequences=2
        )

    titles = []
    for output in outputs:
        text = tokenizer.decode(output, skip_special_tokens=True)
        title = text.split("Title:")[-1].strip()
        titles.append(title)

    return titles[0], titles[1]


### C.3 Candidate Table

For each listing description, I generate two title candidates and store them in a table.

This table will later be used for manual labeling, where I select the preferred title for each description.

In [None]:
preference_data = []

for idx, row in subset_pref.iterrows():
    desc = row["description"]

    title_a, title_b = generate_two_titles(model_2, tokenizer_2, desc)

    preference_data.append({
        "description": desc,
        "candidate_1": title_a,
        "candidate_2": title_b
    })

pref_df = pd.DataFrame(preference_data)

pref_df.head()

#### Save Results
After generating two title candidates for each listing description, I save the candidate table to Google Drive.

The saved file is the basis for manual preference labeling, and the labeled pairs later serve as input for DPO fine-tuning.

In [None]:
# Save candidate table
SAVE_PREF_PATH = "/content/drive/MyDrive/preference_candidates.csv"
pref_df.to_csv(SAVE_PREF_PATH, index=False)

print("Preference candidate file saved to:", SAVE_PREF_PATH)

Preference candidate file saved to: /content/drive/MyDrive/preference_candidates.csv


Storing both candidates per description in a structured table allows the pairs to be inspected before labeling. Because the titles are sampled stochastically, saving them also fixes the exact candidates that the preference labels refer to.
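
Since both candidates are sampled from the same model, a pair can occasionally come out identical or nearly identical, in which case there is little to choose between. A small check like the one below could flag such pairs before labeling. This is a sketch only: `normalize_title` and `is_informative_pair` are hypothetical helper names that do not appear elsewhere in the notebook.

```python
import re

def normalize_title(title):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(text.split())

def is_informative_pair(candidate_1, candidate_2):
    """Return True when the two candidates still differ after normalization."""
    return normalize_title(candidate_1) != normalize_title(candidate_2)

# Illustrative strings, not taken from the generated candidates
print(is_informative_pair("Cosy Room, Camden!", "cosy room camden"))
print(is_informative_pair("Cosy room in Camden", "Bright flat in Brixton"))
```

Pairs that fail the check could either be regenerated or labeled as ties.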

### C.4 Manual Preference Labeling

For each listing description, I manually compare the two generated title candidates and select the preferred one.

The selection is based on the following criteria:

- Relevance to the listing description
- Clarity and fluency
- Conciseness
- Alignment with Airbnb-style naming conventions
- Absence of formatting artifacts or awkward phrasing

If both titles are very similar, I select the one that is more concise or structurally consistent. If a title appears unsuitable, this is noted during labeling.

Preference Encoding:
- 1 → Candidate 1 preferred
- 2 → Candidate 2 preferred
- 0 → Tie / both unsuitable

This process produces the pairwise preference annotations that are later used for DPO training. Pairs labeled 0 have no preferred title and therefore provide no chosen/rejected signal.
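
Before the labels are converted into training pairs, it can help to confirm that every row carries a valid code. The sketch below assumes the labels are available as a plain Python list (for example, from the label column of the table); `summarize_labels` is a hypothetical helper, not part of the notebook.

```python
from collections import Counter

# 0 = tie / both unsuitable, 1 = candidate 1 preferred, 2 = candidate 2 preferred
VALID_LABELS = {0, 1, 2}

def summarize_labels(labels):
    """Count each label and raise if an entry is missing or outside the encoding."""
    bad = [(i, x) for i, x in enumerate(labels) if x not in VALID_LABELS]
    if bad:
        raise ValueError(f"Invalid or missing labels at (row, value): {bad}")
    counts = Counter(labels)
    usable = counts[1] + counts[2]  # ties cannot form a chosen/rejected pair
    return counts, usable

# Illustrative labels, not the actual annotations
counts, usable = summarize_labels([1, 2, 0, 2])
print(dict(counts), "usable pairs:", usable)
```

The check surfaces unlabeled rows (for example `None`) early, and the `usable` count shows how many pairs remain once ties are excluded.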

In [None]:
# Create preference column (initialize empty)
pref_df["preference"] = None

pref_df.head()

| | description | candidate_1 | candidate_2 | preference |
|---|---|---|---|---|
| 0 | The room is super central, with great connecti... | Cute room in great central London neighbourhoo... | Great location, perfect for two friends! - Bed... | |
| 1 | A lovely one bedroom apartment in the heart of... | Beautiful one bed flat in Central London Zone ... | Lovely, quiet one bed flat in Central London! ... | |
| 2 | As I spend a lot of time travelling abroad I h... | Spacious Double Room, Zone 1, Central London, ... | Modern stylish room in the heart of Shoreditch... | |
| 3 | Nice, quiet single room on the second floor 5 ... | Awesome, bright single room in Shoreditch, coo... | Cosy private room in cool East London house. | |
| 4 | My room is in the Heart of Camden Town, right ... | Giant room in Camden Town! Big bed & loads of ... | Private Room - Camden Town. Central London Loc... | |


### C.5 Creating Preference Dataset for DPO

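Using the manually assigned preference labels, I construct a dataset in the format required for Direct Preference Optimization (DPO). The labeled candidate file is read back from Drive; in this file the label column is named `Preference`.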

Each entry contains:

- `prompt`: The instruction-formatted listing description
- `chosen`: The preferred title
- `rejected`: The non-preferred title

This format enables preference-based fine-tuning where the model is encouraged to assign higher likelihood to the chosen response compared to the rejected one.

In [None]:
pref_df = pd.read_csv("/content/drive/MyDrive/preference_candidates.csv")
pref_df.head()

| | description | candidate_1 | candidate_2 | Preference |
|---|---|---|---|---|
| 0 | The room is super central, with great connecti... | Cute room in great central London neighbourhoo... | Great location, perfect for two friends! - Bed... | 1 |
| 1 | A lovely one bedroom apartment in the heart of... | Beautiful one bed flat in Central London Zone ... | Lovely, quiet one bed flat in Central London! ... | 2 |
| 2 | As I spend a lot of time travelling abroad I h... | Spacious Double Room, Zone 1, Central London, ... | Modern stylish room in the heart of Shoreditch... | 1 |
| 3 | Nice, quiet single room on the second floor 5 ... | Awesome, bright single room in Shoreditch, coo... | Cosy private room in cool East London house. | 2 |
| 4 | My room is in the Heart of Camden Town, right ... | Giant room in Camden Town! Big bed & loads of ... | Private Room - Camden Town. Central London Loc... | 1 |


In [None]:
dpo_data = []

for _, row in pref_df.iterrows():

    prompt = (
        "You are an expert Airbnb copywriter.\n\n"
        f"Description:\n{row['description']}\n\n"
        "Title:"
    )

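    # Map the label to a (chosen, rejected) pair. Ties (0) have no preferred
    # title, so they are skipped instead of being treated as "candidate 2".
    if row["Preference"] == 1:
        chosen = row["candidate_1"]
        rejected = row["candidate_2"]
    elif row["Preference"] == 2:
        chosen = row["candidate_2"]
        rejected = row["candidate_1"]
    else:
        continue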

    dpo_data.append({
        "prompt": prompt,
        "chosen": chosen,
        "rejected": rejected
    })

dpo_df = pd.DataFrame(dpo_data)

dpo_df.head()

In [None]:
dpo_df.to_csv("/content/drive/MyDrive/dpo_dataset.csv", index=False)

In this task, I constructed a human preference dataset consisting of 50 pairwise comparisons between alternative title generations produced by the stronger supervised fine-tuned model (Model 2). Diversity between candidates was induced through stochastic decoding (temperature and nucleus sampling).

Each example contains an instruction-formatted prompt together with a preferred (“chosen”) and non-preferred (“rejected”) title. The preference labels were manually assigned based on relevance, clarity, conciseness, and stylistic alignment with Airbnb conventions.

This dataset provides explicit human supervision in the form required for Direct Preference Optimization (DPO), enabling the model in the next task to learn from relative preferences rather than only next-token prediction.

## Task D - Preference Fine-Tuning with Direct Preference Optimization (DPO)
### D.1 Conceptual Overview of Direct Preference Optimization

Direct Preference Optimization (DPO) is a preference-based fine-tuning method that aligns a language model with human judgments without requiring an explicit reward model.

Instead of learning via next-token prediction (as in supervised fine-tuning), DPO directly optimizes the model to assign higher likelihood to preferred responses (“chosen”) compared to less preferred alternatives (“rejected”) for the same prompt.

Formally, DPO transforms pairwise human preferences into a classification-style objective that encourages:
- Increased probability of preferred outputs relative to a reference model
- Decreased probability of rejected outputs relative to the same reference model

In this assignment, the manually constructed preference dataset from Task C provides the supervision signal for DPO training.
Unlike reinforcement learning with a learned reward model, DPO directly optimizes the policy using implicit preference comparisons, simplifying the alignment pipeline while maintaining theoretical grounding in KL-regularized policy optimization.
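
The objective can be illustrated for a single preference pair. The sketch below computes the DPO loss from four sequence log-probabilities: the chosen and rejected response under the policy being trained and under the reference model. The function name and the toy values are assumptions for illustration. In real training these log-probabilities are obtained by summing token log-probabilities from the two models.

```python
import math

def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)), computed in a numerically stable way
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Toy log-probabilities (illustrative values only)
loss_neutral = dpo_pair_loss(-10.0, -10.0, -10.0, -10.0)  # policy equals reference
loss_better = dpo_pair_loss(-8.0, -12.0, -10.0, -10.0)    # policy favors the chosen title
loss_worse = dpo_pair_loss(-12.0, -8.0, -10.0, -10.0)     # policy favors the rejected title
```

When the policy matches the reference, the loss equals log 2. It decreases as the policy shifts probability toward the chosen title and away from the rejected one, which is the behavior the training procedure rewards.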

### D.2 Preparing the Preference Dataset

The manually labeled dataset from Task C is structured into triples of the form:
-	prompt: Instruction-formatted listing description
-	chosen: Human-preferred title
-	rejected: Non-preferred title

This format is required by the TRL DPOTrainer.

The dataset is loaded from CSV and converted into a Hugging Face Dataset object to ensure compatibility with the training pipeline.

This explicit structuring ensures that preference comparisons are preserved during optimization.

In [None]:
import pandas as pd

dpo_df = pd.read_csv("/content/drive/MyDrive/dpo_dataset.csv")

dpo_df.head()

| | prompt | chosen | rejected |
|---|---|---|---|
| 0 | You are an expert Airbnb copywriter.\n\nDescri... | Cute room in great central London neighbourhoo... | Great location, perfect for two friends! - Bed... |
| 1 | You are an expert Airbnb copywriter.\n\nDescri... | Lovely, quiet one bed flat in Central London! ... | Beautiful one bed flat in Central London Zone ... |
| 2 | You are an expert Airbnb copywriter.\n\nDescri... | Spacious Double Room, Zone 1, Central London, ... | Modern stylish room in the heart of Shoreditch... |
| 3 | You are an expert Airbnb copywriter.\n\nDescri... | Cosy private room in cool East London house. | Awesome, bright single room in Shoreditch, coo... |
| 4 | You are an expert Airbnb copywriter.\n\nDescri... | Giant room in Camden Town! Big bed & loads of ... | Private Room - Camden Town. Central London Loc... |


In [None]:
from datasets import Dataset

dpo_dataset = Dataset.from_pandas(dpo_df)

dpo_dataset

Dataset({
    features: ['prompt', 'chosen', 'rejected'],
    num_rows: 50
})

### D.3 Model Initialization and 4-bit Loading

To minimize computational cost, I load the instruction-tuned base model using:
-	4-bit NF4 quantization (BitsAndBytes)
-	float16 computation
-	Device mapping for GPU usage

The previously trained LoRA adapters from supervised fine-tuning serve as the initialization point for DPO training.

This ensures that DPO builds on an already stylistically aligned model rather than starting from a generic base model.
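
A rough back-of-the-envelope calculation shows why 4-bit loading matters on a single GPU. The sketch below estimates the memory taken by the weights alone. The parameter count of 7.2 billion is an assumed round figure for a 7B-class model, and the estimate ignores quantization constants, activations, optimizer state, and the LoRA adapters.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3

ASSUMED_PARAMS = 7.2e9  # assumption: rough parameter count, not read from the model

mem_16bit = weight_memory_gib(ASSUMED_PARAMS, 16)
mem_4bit = weight_memory_gib(ASSUMED_PARAMS, 4)
print(f"16-bit weights: {mem_16bit:.1f} GiB, 4-bit weights: {mem_4bit:.1f} GiB")
```

Under these assumptions the quantized weights need about a quarter of the 16-bit footprint, which leaves room for activations and adapter gradients during training.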

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
from trl import DPOTrainer, DPOConfig

BASE_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"
SFT_PATH = "/content/drive/MyDrive/mistral_lora_airbnb_run2"

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

# Load base model in 4-bit
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto"
)

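# Load SFT LoRA weights. PEFT loads adapters frozen by default, so
# is_trainable=True is passed to keep them updatable during DPO training.
model = PeftModel.from_pretrained(base_model, SFT_PATH, is_trainable=True)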

print("SFT model loaded in 4-bit successfully.")

SFT model loaded in 4-bit successfully.


In [None]:
import torch

# 1) Make generation/training stable
model.config.use_cache = False  # IMPORTANT for training
model.gradient_checkpointing_disable()  # disable checkpointing to avoid metadata mismatch

# 2) Make padding consistent
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# 3) Optional: avoid torch.compile weirdness (safe default)
torch._dynamo.config.suppress_errors = True

### D.4 Setup DPO Training Configuration

DPO training is implemented using the TRL library via the `DPOTrainer`. The configuration specifies hyperparameters that control preference strength, optimization stability, and training duration.

In particular:
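- `beta` controls how strongly the policy is tied to the reference model: higher values keep it closer to the reference, while lower values let the preference signal move it further away.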
- A small batch size is used due to limited GPU memory.
- The number of epochs is kept modest given the dataset size (50 examples) to reduce overfitting.
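
The training length implied by these choices can be worked out by hand. The sketch below uses the values from the configuration cell that follows (batch size 2, gradient accumulation 4, 3 epochs) with the 50 preference pairs. It assumes a single GPU and that a final partial accumulation group still counts as one optimizer step, which is consistent with the `checkpoint-21` directory loaded in Task E. `optimizer_steps` is a hypothetical helper.

```python
import math

def optimizer_steps(num_examples, per_device_batch_size, grad_accum_steps, num_epochs):
    """Number of optimizer updates, counting a partial final group as a step."""
    batches_per_epoch = math.ceil(num_examples / per_device_batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum_steps)
    return steps_per_epoch * num_epochs

effective_batch = 2 * 4  # per-device batch size x accumulation steps
total_steps = optimizer_steps(50, 2, 4, 3)
print("effective batch size:", effective_batch)
print("total optimizer steps:", total_steps)
```

With 25 batches per epoch grouped into sets of four, each epoch yields 7 updates, giving 21 updates in total. This small number of updates is one reason to keep the learning rate and number of epochs conservative.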

In [None]:
from trl import DPOConfig

training_args = DPOConfig(
    output_dir="/content/drive/MyDrive/mistral_dpo_airbnb",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=5e-6,
    num_train_epochs=3,
    logging_steps=5,
    save_strategy="epoch",
    beta=0.1,

    #  make sequence shapes stable
    max_length=256,
    max_prompt_length=192,

    # ensure no checkpointing
    gradient_checkpointing=False,
)



In [None]:
from trl import DPOTrainer

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dpo_dataset,
    processing_class=tokenizer,
)

trainer.train()

In this step, I fine-tuned the supervised LoRA model using Direct Preference Optimization (DPO) on the manually constructed preference dataset with 50 examples.

The dataset consisted of (prompt, chosen, rejected) triples derived from human preference annotations.

### D.5 Evaluation Preparation

After training, the DPO-adapted model is evaluated alongside:
-	The zero-shot base model
-	The supervised fine-tuned (SFT) model

This comparison allows assessment of whether preference-based fine-tuning improves:
-	Relevance
-	Fluency
-	Conciseness
-	Stylistic attractiveness
-	Structural consistency

The results of this comparison are presented in Task E.

In [None]:
def generate_title(model, tokenizer, description):
    prompt = (
        "You are an expert Airbnb copywriter.\n\n"
        f"Description:\n{description}\n\n"
        "Title:"
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    output = model.generate(
        **inputs,
        max_new_tokens=40,
        do_sample=True,
        temperature=0.7,
        top_p=0.9
    )

    decoded = tokenizer.decode(output[0], skip_special_tokens=True)

    # Extract only the generated part after "Title:"
    if "Title:" in decoded:
        title = decoded.split("Title:")[-1].strip()
    else:
        title = decoded[len(prompt):].strip()

    return title

## Task E - Comparative Evaluation

This section evaluates the incremental impact of supervised fine-tuning and Direct Preference Optimization (DPO) on title generation quality.

To assess the differences between models, I compare:

- the zero-shot base model,
- the supervised fine-tuned (SFT) model, and
- the DPO fine-tuned model.

For a small subset of five listing descriptions, I generate titles using each model and compare the outputs side-by-side.

The evaluation is conducted along the following qualitative dimensions:

- Relevance to the listing description  
- Clarity and fluency  
- Conciseness  
- Creativity and attractiveness  
- Uniqueness  
- Absence of errors  

This structured comparison enables an assessment of whether supervised learning and preference-based fine-tuning improve alignment with human expectations and marketplace-style writing.

In [8]:
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

BASE_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"

# Zero-shot baseline: the instruction-tuned base model without adapters
tokenizer_zero = AutoTokenizer.from_pretrained(BASE_MODEL)

model_zero = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    dtype=torch.bfloat16,  # `torch_dtype` is reported as deprecated in this environment
    device_map="auto"
)

print("Zero-shot model loaded.")

# SFT model: a separate copy of the base model with the LoRA adapters from run 2
PATH_MODEL_2 = "/content/drive/MyDrive/mistral_lora_airbnb_run2"

tokenizer_2 = AutoTokenizer.from_pretrained(BASE_MODEL)

base_model_2 = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    dtype=torch.bfloat16,
    device_map="auto"
)

model_2 = PeftModel.from_pretrained(base_model_2, PATH_MODEL_2)

print("Supervised fine-tuned model loaded.")

Zero-shot model loaded.


Supervised fine-tuned model loaded.


In [11]:
# DPO model: base model plus the adapters saved at the final DPO checkpoint
PATH_DPO = "/content/drive/MyDrive/mistral_dpo_airbnb/checkpoint-21"

tokenizer_dpo = AutoTokenizer.from_pretrained(BASE_MODEL)

base_model_dpo = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    dtype=torch.bfloat16,  # `torch_dtype` is reported as deprecated in this environment
    device_map="auto"
)

dpo_model = PeftModel.from_pretrained(base_model_dpo, PATH_DPO)

print("DPO model loaded successfully.")

DPO model loaded successfully.


### E.2 Sample Selection for Qualitative Comparison

To keep the qualitative evaluation manageable, I randomly select five listing descriptions from the labeled dataset. A fixed random seed makes the selection reproducible.

The subset covers different property types and locations, and random sampling avoids hand-picking favorable examples. Because these descriptions were also part of the DPO training pairs, the comparison reflects in-sample behavior for the DPO model rather than performance on unseen listings.

In [None]:
subset_pref = pd.read_csv("/content/drive/MyDrive/preference_candidates.csv")

In [None]:
eval_subset = subset_pref.sample(5, random_state=123).reset_index(drop=True)
eval_subset

| | description | candidate_1 | candidate_2 | Preference |
|---|---|---|---|---|
| 0 | Greetings, This Bright unique room is a perfec... | Beautiful private room with double bed in Sout... | Bright spacious single room. Brockwell park. ... | 2 |
| 1 | A very modern semi- detached home in Camden To... | Large double room in Camden Town. 5 min walk t... | Lovely double room in Camden Town. :) | 2 |
| 2 | A bright, clean home close to the range of sho... | Spacious double bedroom next door to modern ba... | Comfortable double bedroom in SW20 210m to tub... | 2 |
| 3 | Pretty Victorian end of terrace house in a lov... | Family friendly house w. 3 large double bedroo... | Beautiful House in a Great Location in Brixton! | 1 |
| 4 | The space A good sized Twin Bedroom on the fir... | Spacious Twin Bedroom Near Underground! Parkin... | Twin Room in a Lovely Edwardian House in Zone ... | 2 |


### E.3 Side-by-Side Model Output Comparison

For each selected listing description, I generate titles using the zero-shot base model, the supervised fine-tuned (SFT) model, and the DPO-refined model.

The outputs are collected in a structured table to enable direct side-by-side comparison. This format facilitates systematic evaluation of differences in content prioritization, stylistic quality, and alignment with the original description.

In [None]:
comparison_results = []

for idx, row in eval_subset.iterrows():
    desc = row["description"]

    zero_title = generate_title(model_zero, tokenizer_zero, desc)
    sft_title = generate_title(model_2, tokenizer_2, desc)
    dpo_title = generate_title(dpo_model, tokenizer_dpo, desc)

    comparison_results.append({
        "description": desc,
        "zero_shot": zero_title,
        "sft_model": sft_title,
        "dpo_model": dpo_title
    })

comparison_df = pd.DataFrame(comparison_results)
comparison_df

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


| | description | zero_shot | sft_model | dpo_model |
|---|---|---|---|---|
| 0 | Greetings, This Bright unique room is a perfec... | Cozy, Bright Room in Quiet Tree-lined Street N... | Bright Room in Central London - 5 Mins to Brix... | Bright and Cosy Room in Brixton! 10mins to Cen... |
| 1 | A very modern semi- detached home in Camden To... | Modern Semi-Detached Home in Camden Town with ... | Large double room in Camden Town, NW1. 10 min ... | Modern, large double room in Camden Town. Clos... |
| 2 | A bright, clean home close to the range of sho... | Bright, Spacious Double Room in Peaceful Wimbl... | Spacious Double Room with Modern Bathroom in W... | Spacious double room in Wimbledon, close to to... |
| 3 | Pretty Victorian end of terrace house in a lov... | Charming Victorian Retreat in Brixton with Pat... | Lovely Victorian house in Brixton, 5 mins walk... | Bright 3 Bedroom House in Brixton/Herne Hill -... |
| 4 | The space A good sized Twin Bedroom on the fir... | Charming Twin Bedroom in Quiet Edwardian House... | Lovely Twin Bedroom in a quiet tree lined road... | Twin Room Bounds Green North London N22 2DG |


### E.4 Qualitative Evaluation of Model Performance


Below, I evaluate the outputs according to the required qualitative criteria.


#### Relevance to the Listing Description

The zero-shot model often produced generic titles that loosely reflected the listing content. While location or property type was sometimes mentioned, important selling points (e.g., proximity to transport, distinctive amenities, quiet surroundings) were inconsistently emphasized.

The SFT model demonstrated improved relevance. It more reliably captured:
- Location information (e.g., Camden, Brixton, Wimbledon),
- Property type (e.g., double room, Victorian house),
- Key amenities (e.g., garden, quiet street, near transport).

The DPO-refined model showed the strongest alignment with the descriptions. It consistently highlighted distinctive selling points and better prioritized information that would likely matter to potential guests.

Overall, preference-based fine-tuning improved the model’s ability to reflect core listing attributes.


<br>


#### Clarity and Fluency

All models generated grammatically correct titles.

However:
- The zero-shot outputs sometimes appeared templated or slightly mechanical.
- The SFT model produced smoother phrasing and more natural Airbnb-style wording.
- The DPO model demonstrated the most polished and human-like phrasing, with improved flow and more consistent stylistic tone.

Preference alignment appears to enhance stylistic coherence beyond supervised fine-tuning alone.



<br>

#### Conciseness

The zero-shot model occasionally generated titles that were either too brief or slightly verbose.

The SFT model achieved better balance between informativeness and brevity.

The DPO model produced concise yet information-rich titles, structuring content efficiently while avoiding redundancy.

No significant verbosity issues were observed in the DPO outputs.
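
Conciseness can also be checked with simple surface statistics alongside the qualitative reading. The sketch below computes mean character and word counts per model. `length_stats` is a hypothetical helper, and the example titles are placeholders rather than outputs from the notebook. In practice the columns of the comparison table would be passed in instead.

```python
from statistics import mean

def length_stats(titles):
    """Return mean character and word counts for a list of titles."""
    return {
        "mean_chars": mean(len(t) for t in titles),
        "mean_words": mean(len(t.split()) for t in titles),
    }

# Hypothetical placeholder titles, NOT outputs from the notebook
example_titles = {
    "model_a": ["Charming and wonderfully bright double room near the park", "Lovely flat"],
    "model_b": ["Double room near park", "One-bed flat in Zone 2"],
}

summary = {name: length_stats(titles) for name, titles in example_titles.items()}
for name, stats in summary.items():
    print(name, stats)
```

Such counts do not measure quality, but a large spread in length across a model's titles is a quick indicator of the "too brief or slightly verbose" pattern described above.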

<br>

#### Creativity and Attractiveness

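In the examples above, the zero-shot model relied on stock promotional adjectives such as “Charming” and “Cozy,” which read as attractive but generic.

The SFT model used plainer descriptors such as “Large,” “Lovely,” or “Spacious” and combined them with concrete details such as walking times.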

The DPO model generated the most compelling titles. Compared to the other models, it:
- Used descriptive but controlled adjectives,
- Highlighted distinctive features strategically,
- Produced titles that resemble competitive marketplace listings.

This suggests that preference optimization improves not only factual alignment but also marketing effectiveness.


<br>


#### Uniqueness

The zero-shot outputs were somewhat generic and formulaic.

The SFT model improved differentiation by incorporating more listing-specific details.

The DPO model showed the highest degree of uniqueness. Titles more consistently integrated distinctive attributes (e.g., neighborhood characteristics, transport access, standout amenities), reducing template-like repetition.



<br>

#### Absence of Errors

All models produced grammatically correct outputs without major factual inconsistencies.

However:
- The zero-shot model occasionally generated overly generic phrasing.
- The SFT model was more consistent.
- The DPO model exhibited the most stable stylistic and structural consistency.

No hallucinations or factual contradictions were observed in the evaluated subset.




### Overall Conclusion

On the five evaluated examples, the comparison suggests a progression in output quality:

Zero-shot → Supervised Fine-Tuning → DPO

Supervised fine-tuning improves relevance and fluency, while preference-based fine-tuning (DPO) further enhances:

- Alignment with human expectations,
- Marketing appeal,
- Informational prioritization,
- Stylistic coherence.

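Even with a relatively small preference dataset (50 examples), DPO produced noticeable qualitative differences in the evaluated titles. This suggests that targeted human feedback can meaningfully refine generative behavior, although a larger evaluation on unseen listings would be needed to confirm the effect.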

## Task F - Deployment via a Gradio Web Application

In [3]:
!pip install gradio



In [12]:
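def generate_title_clean(model, tokenizer, description):
    """Generate a title for `description` and return only the newly generated text."""
    prompt = (
        "You are an expert Airbnb copywriter.\n\n"
        f"Description:\n{description}\n\n"
        "Title:"
    )

    # Tokenize the prompt and move the tensors to the model's device
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Sample a short continuation after the "Title:" cue
    output = model.generate(
        **inputs,
        max_new_tokens=40,
        do_sample=True,
        temperature=0.7,
        top_p=0.9
    )

    # Decode only the newly generated tokens. Slicing the decoded string by
    # len(prompt) can cut at the wrong position when decoding does not
    # reproduce the prompt character for character.
    prompt_length = inputs["input_ids"].shape[1]
    title = tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True).strip()

    return title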
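
With up to 40 new tokens, a sampled continuation can run past the title onto further lines or wrap the title in quotation marks. A small post-processing step can keep only the first non-empty line. The helper `clean_title` below is a hypothetical sketch and not part of the original notebook.

```python
def clean_title(generated_text):
    """Keep the first non-empty line of a continuation and strip wrapping quotes."""
    for line in generated_text.splitlines():
        # Remove surrounding whitespace, then straight or curly quotes
        line = line.strip().strip("\"'“”").strip()
        if line:
            return line
    return ""


# Hypothetical raw continuation, as a model might return it
raw = '  "Sunny Loft near Old Town"\nDescription: A bright loft ...'
print(clean_title(raw))
```

In the app, such a helper could be applied to the string returned by `generate_title_clean` before the title is shown to the user.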

In [13]:
def compare_models(description):
    zero = generate_title_clean(model_zero, tokenizer_zero, description)
    sft = generate_title_clean(model_2, tokenizer_2, description)
    dpo = generate_title_clean(dpo_model, tokenizer_2, description)

    return zero, sft, dpo
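
In a public demo, an empty input or a failure in one model would otherwise surface as an error for the whole comparison. The following sketch shows one possible defensive wrapper. The function `compare_titles_safely` and the stub generators are hypothetical and exist only so the example runs without loading any model.

```python
def compare_titles_safely(description, generators):
    """Run each title generator on the description and collect one output per model.

    `generators` maps a display label to a callable that takes the description.
    """
    description = description.strip()
    if not description:
        return tuple("Please enter a listing description." for _ in generators)

    results = []
    for label, generate in generators.items():
        try:
            results.append(generate(description))
        except Exception as error:  # report the failure in the UI instead of crashing
            results.append(f"[{label} failed: {error}]")
    return tuple(results)


# Stub generators standing in for the three models (assumption for illustration)
stub_generators = {
    "Zero-Shot": lambda d: "Apartment in Berlin",
    "SFT": lambda d: "Charming Apartment in Berlin",
    "DPO": lambda d: 1 / 0,  # simulates a model that raises an error
}
print(compare_titles_safely("Nice flat in Berlin", stub_generators))
```

In the notebook, the stubs would be replaced by callables that wrap `generate_title_clean` with the respective model and tokenizer, and the returned tuple maps onto the three output textboxes in the same order.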

In [14]:
import gradio as gr

iface = gr.Interface(
    fn=compare_models,
    inputs=gr.Textbox(lines=6, placeholder="Enter Airbnb listing description here..."),
    outputs=[
        gr.Textbox(label="Zero-Shot Model"),
        gr.Textbox(label="Supervised Fine-Tuned Model (SFT)"),
        gr.Textbox(label="Preference-Aligned Model (DPO)")
    ],
    title="Airbnb Title Generator Comparison",
    description="Compare title suggestions from Zero-shot, SFT, and DPO models."
)

iface.launch()

It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://266b8c96eaac364435.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




## Deployment and Interactive Demo

In addition to the notebook-based evaluation, I deployed a lightweight interactive version of the model comparison interface using TinyLlama on Hugging Face Spaces.

The public demo is available here:

https://huggingface.co/spaces/candyledger/airbnb-title-generator

This deployment allows real-time comparison between the zero-shot, supervised fine-tuned (SFT), and preference-aligned (DPO) models.

In case the public link expires or becomes temporarily unavailable, the full implementation and all experiments remain reproducible within this notebook.

In summary, Task F exposes all three models (zero-shot, SFT, and DPO) through a Gradio web interface. Users enter an Airbnb listing description and compare the generated titles across the three training strategies side by side, which enables a direct qualitative comparison of model behavior and alignment performance.