# Fine-tune Recommender LLM with LoRA/PEFT

This notebook fine-tunes a small chat model to produce grounded, JSON-formatted recommendations using synthetic SFT data derived from our reviews + retrieval. Artifacts are saved to `models/rag_llm/` and can be used by the merged `finetune_rag_llm.ipynb` notebook.

## 🚀 **Clean Workflow - Run Steps in Order:**

1. **Step 0** - Ensure retrieval artifacts exist (FAISS index + metadata)
2. **Step 1** - Setup & configuration
3. **Step 2** - Build synthetic SFT dataset
4. **Step 3** - Convert training data to chat format
5. **Step 4** - Tokenizer, model, and LoRA config
6. **Step 5** - **Fine-tune model** ⚠️ (6-8 hours)
7. **Step 6** - Save final model
8. **Step 7a / 7b** - Create or extract the ZIP of the final folder (optional)
9. **Step 8** - **Test the model** ✅

## ⚠️ **Important Notes:**
- **Step 5** fine-tunes the model so that it produces well-formed JSON responses
- **Step 8** tests the fine-tuned model
- All training cells have been cleaned up for clarity
- The model is saved under the `models/rag_llm/` directory


### Step 0 — Ensure retrieval artifacts exist

- Builds FAISS index and metadata if missing (uses the same pipeline as the merged notebook)
- Required for synthesizing SFT data from retrieval context
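
Step 0 pairs `IndexFlatIP` with `normalize_embeddings=True`, so the inner-product scores behave like cosine similarity. Here is a minimal NumPy sketch of that equivalence, using made-up 4-dimensional vectors rather than real MiniLM embeddings (FAISS itself is not needed to see the idea):

```python
import numpy as np

# Toy stand-ins for review embeddings (illustrative values, not real model output)
docs = np.array([[1.0, 0.0, 0.0, 0.0],
                 [3.0, 4.0, 0.0, 0.0],
                 [0.0, 0.0, 2.0, 0.0]], dtype="float32")
query = np.array([2.0, 2.0, 0.0, 0.0], dtype="float32")

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row (or a single vector) to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

docs_n = l2_normalize(docs)
query_n = l2_normalize(query)

# Inner product on unit vectors: what a flat inner-product index would score
ip_scores = docs_n @ query_n

# Cosine similarity computed directly from the raw, unnormalized vectors
cos_scores = (docs @ query) / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))

# Indices of the documents, best match first
ranking = np.argsort(-ip_scores)
```

Both score arrays agree up to floating-point error, which is why normalizing before `index.add` and before `search` matters: skipping either step would turn the scores into unnormalized dot products.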


In [15]:
from pathlib import Path
import json
import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer # type: ignore

try:
    import faiss  # type: ignore
except ImportError:
    raise SystemExit("faiss is required. Install with `pip install faiss-cpu` on Windows.")

import pyarrow.parquet as pq
from tqdm import tqdm

from huggingface_hub import snapshot_download
EMBED_MODEL = 'sentence-transformers/all-MiniLM-L6-v2'
EMBED_MODEL_DIR = snapshot_download(repo_id=EMBED_MODEL)
embedder = SentenceTransformer(EMBED_MODEL_DIR)

DATA_PATH = Path('data/processed/reviews_with_stars.csv')
PROJECT_ROOT = Path.cwd() if (Path.cwd() / 'data').exists() else Path.cwd().parent
INDEX_DIR = PROJECT_ROOT / 'models' / 'rag_llm' / 'step_0'
INDEX_DIR.mkdir(parents=True, exist_ok=True)
INDEX_PATH = INDEX_DIR / 'reviews_all-MiniLM-L6-v2.index'
METADATA_PATH = PROJECT_ROOT / 'data' / 'rag_llm' / 'processed' / 'review_metadata.parquet'
METADATA_PATH.parent.mkdir(parents=True, exist_ok=True)
MANIFEST_PATH = INDEX_DIR / 'manifest.json'
MODEL_NAME = 'sentence-transformers/all-MiniLM-L6-v2'
BATCH_SIZE = 256

# `embedder` was already loaded above from the local snapshot (same model as MODEL_NAME); reuse it
embedding_dim = embedder.get_sentence_embedding_dimension()


def build_index(force_rebuild: bool = False):
    needs_build = force_rebuild or not (INDEX_PATH.exists() and METADATA_PATH.exists() and MANIFEST_PATH.exists())
    if not needs_build:
        print('Index and metadata already exist. Skipping build.')
        return
    # Locate project base (folder containing data/generate_stars/processed), then pick labeled file
    def _find_base_dir(start: Path) -> Path:
        if (start / 'data' / 'generate_stars' / 'processed').exists():
            return start
        for parent in start.parents:
            if (parent / 'data' / 'generate_stars' / 'processed').exists():
                return parent
        return start

    BASE_DIR = _find_base_dir(Path.cwd())
    candidates = [
        BASE_DIR / 'data' / 'generate_stars' / 'processed' / 'reviews_with_stars.csv',
        BASE_DIR / 'data' / 'generate_stars' / 'processed' / 'reviews_with_stars_trained.csv',
    ]
    data_path = next((p for p in candidates if p.exists()), None)
    assert data_path is not None, f"Missing labeled data under {BASE_DIR / 'data' / 'generate_stars' / 'processed'}. Run generate_stars.ipynb first."
    print('Using labeled data at:', data_path)

    df = pd.read_csv(data_path)

    # Ensure stars_float exists (prefer float ratings). Derive from integer 'stars' if needed
    if 'stars_float' not in df.columns:
        if 'stars' in df.columns:
            df['stars_float'] = pd.to_numeric(df['stars'], errors='coerce').astype(float)
        else:
            raise ValueError("No 'stars_float' or 'stars' column in labeled data.")

    # Minimal schema check for remaining fields
    req = ['source', 'place', 'comment']
    missing = [c for c in req if c not in df.columns]
    if missing:
        raise ValueError(f"Labeled data missing columns: {missing}")
    df = df.dropna(subset=['comment']).copy()
    df['comment'] = df['comment'].astype(str).str.strip()
    df = df[df['comment'].str.len() > 0].reset_index(drop=True)

    texts = df['comment'].tolist()
    n = len(texts)
    embeddings = np.empty((n, embedding_dim), dtype='float32')
    for start in tqdm(range(0, n, BATCH_SIZE), total=(n + BATCH_SIZE - 1)//BATCH_SIZE, desc="Embedding"):
        end = min(start + BATCH_SIZE, n)
        batch = texts[start:end]
        emb = embedder.encode(batch, batch_size=64, show_progress_bar=False, convert_to_numpy=True, normalize_embeddings=True)
        embeddings[start:end] = emb

    index = faiss.IndexFlatIP(embedding_dim)
    index.add(embeddings)
    faiss.write_index(index, str(INDEX_PATH))

    metadata = df[['source', 'place', 'comment', 'stars_float']].copy()
    metadata.insert(0, 'row_id', np.arange(len(metadata), dtype=np.int64))
    METADATA_PATH.parent.mkdir(parents=True, exist_ok=True)
    metadata.to_parquet(METADATA_PATH, index=False)

    manifest = {
        'model': MODEL_NAME,
        'embedding_dim': int(embedding_dim),
        'index_type': 'IndexFlatIP',
        'index_path': str(INDEX_PATH),
        'metadata_path': str(METADATA_PATH),
        'num_vectors': int(index.ntotal)
    }
    with open(MANIFEST_PATH, 'w', encoding='utf-8') as f:
        json.dump(manifest, f, ensure_ascii=False, indent=2)
    print('Built index and metadata.')


# Ensure ready
build_index(force_rebuild=False)



Index and metadata already exist. Skipping build.


### Step 1 — Setup & configuration

- Choose a small instruction model (CPU-friendly)
- Define output directory `models/rag_llm/`
- Reuse allowed places + retrieval to synthesize SFT data


In [16]:
import os
import json
from pathlib import Path
import random

import numpy as np
import pandas as pd

from sentence_transformers import SentenceTransformer
try:
    import faiss  # type: ignore
except ImportError:
    raise SystemExit("faiss is required. Install with `pip install faiss-cpu` on Windows.")
import pyarrow.parquet as pq

from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    TrainingArguments,
    Trainer,
)

# PEFT / LoRA
try:
    from peft import LoraConfig, get_peft_model, PeftModel
except ImportError:
    raise SystemExit("peft is required. Install with `pip install peft`.")

BASE_MODEL = 'Qwen/Qwen3-0.6B'  # Qwen 0.6B for 8GB VRAM
OUTPUT_DIR = (Path.cwd() if (Path.cwd() / 'data').exists() else Path.cwd().parent) / 'models' / 'rag_llm'
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

PROJECT_ROOT = Path.cwd() if (Path.cwd() / 'data').exists() else Path.cwd().parent
INDEX_DIR = PROJECT_ROOT / 'models' / 'rag_llm' / 'step_0'
with open(INDEX_DIR / 'manifest.json', 'r', encoding='utf-8') as f:
    manifest = json.load(f)
faiss_index = faiss.read_index(manifest['index_path'])
md_df = pq.read_table(manifest['metadata_path']).to_pandas()
ALLOWED_PLACES = sorted(md_df['place'].dropna().unique().tolist())
# Ensure stars_float is present in metadata
required_col = 'stars_float'
if required_col not in md_df.columns:
    raise SystemExit("Expected 'stars_float' in metadata; re-run Step 0 with force_rebuild=True.")

# Embeddings for retrieval context
EMBED_MODEL = 'sentence-transformers/all-MiniLM-L6-v2'
embedder = SentenceTransformer(EMBED_MODEL)

def retrieve(query: str, k: int = 8) -> pd.DataFrame:
    q_emb = embedder.encode([query], convert_to_numpy=True, normalize_embeddings=True).astype('float32')
    scores, idx = faiss_index.search(q_emb, k)
    hits = []
    for i, s in zip(idx[0], scores[0]):
        if i == -1:
            continue
        row = md_df.iloc[int(i)].to_dict()
        row['score'] = float(s)
        # For convenience, expose a float alias for downstream
        row['stars'] = float(row.get('stars_float', float('nan')))
        hits.append(row)
    return pd.DataFrame(hits)



### Step 2 — Build synthetic SFT dataset

- Create input/output pairs using retrieval context
- Inputs: system prompt + allowed places + user query + review context
- Targets: JSON with `recommended_places`, `reasoning`, and `citations`
- Saves `data/rag_llm/processed/rag_sft.jsonl`
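
The target schema is easy to check mechanically before training. Below is a small sketch of such a check. The helper `validate_target` and the place names are hypothetical and do not appear in the notebook:

```python
import json

REQUIRED_KEYS = {"recommended_places", "reasoning", "citations"}
CITATION_KEYS = {"place", "source", "stars", "snippet"}

def validate_target(raw: str, allowed_places: set) -> list:
    """Return a list of problems found in one JSON target (empty list means valid)."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as e:
        return [f"not valid JSON: {e.msg}"]
    if not isinstance(obj, dict):
        return ["top-level value is not an object"]

    problems = []
    missing = REQUIRED_KEYS - obj.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    # Every recommended place must come from the allowed list
    for place in obj.get("recommended_places", []):
        if place not in allowed_places:
            problems.append(f"place not allowed: {place}")
    # Every citation needs the four fields named in the system prompt
    for i, cit in enumerate(obj.get("citations", [])):
        if not isinstance(cit, dict) or not CITATION_KEYS <= cit.keys():
            problems.append(f"citation {i} is missing fields")
    return problems

allowed = {"Place A", "Place B"}  # placeholder names
good = json.dumps({
    "recommended_places": ["Place A"],
    "reasoning": "Based on reviews.",
    "citations": [{"place": "Place A", "source": "demo", "stars": 4.5, "snippet": "Nice walk."}],
})
bad = json.dumps({"recommended_places": ["Place C"], "reasoning": "x", "citations": []})

good_problems = validate_target(good, allowed)
bad_problems = validate_target(bad, allowed)
```

The same function could be run over each line of the generated JSONL file, or over model outputs at test time, to count how often the format holds.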


In [17]:
SYSTEM_PROMPT = (
    "You are a travel recommendation assistant for Australian destinations. "
    "Recommend only places from Allowed Places. Ground answers in the context. "
    "Respond with JSON: {recommended_places: [..], reasoning: str, citations: [{place, source, stars, snippet}]}"
)

# Target size (adjustable)
NUM_EXAMPLES_TARGET = 10000
RNG_SEED = 42
random.seed(RNG_SEED)

# Query templates and facets
activities = [
    "short hikes", "lookouts", "waterfalls", "swimming spots", "wildlife",
    "cultural experiences", "sunset views", "sunrise views", "family-friendly walks",
    "night sky views", "quiet camping", "scenic drives"
]
modifiers = [
    "easy", "moderate", "kid-friendly", "photogenic", "less crowded", "near facilities"
]
intents = []
for _ in range(300):
    a = random.choice(activities)
    m = random.choice(modifiers)
    intents.append(f"{m} {a}")

# Ensure each allowed place appears by creating place-focused intents
place_anchored = [f"best things to do at {p}" for p in ALLOWED_PLACES]
USER_QUERIES = list(dict.fromkeys(intents + place_anchored))

OUT_PATH = PROJECT_ROOT / 'data' / 'rag_llm' / 'processed' / 'rag_sft.jsonl'
OUT_PATH.parent.mkdir(parents=True, exist_ok=True)

def build_context(hits: pd.DataFrame, max_chars: int = 1200, max_rows: int = 8) -> str:
    rows = []
    used = 0
    for _, r in hits.head(max_rows).iterrows():
        snippet = str(r['comment'])
        if len(snippet) > 240:
            snippet = snippet[:240] + '...'
        line = f"- Place: {r['place']} | Source: {r['source']} | Stars: {float(r.get('stars', float('nan'))):.1f} | Review: {snippet}"
        if used + len(line) > max_chars:
            break
        rows.append(line)
        used += len(line)
    return "\n".join(rows)

from tqdm import tqdm

num_written = 0
max_rounds = 10000  # hard cap to avoid infinite loop if retrieval becomes empty
with open(OUT_PATH, 'w', encoding='utf-8') as f:
    pbar = tqdm(total=NUM_EXAMPLES_TARGET, desc='SFT synthesis')
    rounds = 0
    while num_written < NUM_EXAMPLES_TARGET and rounds < max_rounds:
        # Shuffle intents each round for diversity
        for q in random.sample(USER_QUERIES, len(USER_QUERIES)):
            # Write up to N variants per intent until we hit the target
            for _ in range(8):
                if num_written >= NUM_EXAMPLES_TARGET:
                    break
                # Retrieval is deterministic per query; variety comes from shuffling the top-k hits below
                k = 12
                hits = retrieve(q, k=k)
                if hits.empty:
                    continue
                # Shuffle to diversify citations/contexts
                hits = hits.sample(frac=1.0, random_state=random.randint(0, 10_000)).reset_index(drop=True)
                context = build_context(hits, max_rows=8)
                if not context:
                    continue
                # Choose top places by mean stars (on the shuffled subset)
                top_places = (
                    hits.groupby('place')['stars']
                        .mean()
                        .sort_values(ascending=False)
                        .head(3)
                        .index.tolist()
                )  # 'stars' is a float alias
                # Build citations from the first 5 hits after the shuffle (a random subset of the top-k)
                cits = []
                for _, r in hits.head(5).iterrows():
                    cits.append({
                        'place': r['place'], 'source': r['source'], 'stars': round(float(r.get('stars', float('nan'))), 1),
                        'snippet': str(r['comment'])[:220]
                    })
                prompt = (
                    f"[SYSTEM]\n{SYSTEM_PROMPT}\n\n"
                    f"[ALLOWED_PLACES]\n{', '.join(ALLOWED_PLACES)}\n\n"
                    f"[USER_QUERY]\n{q}\n\n"
                    f"[REVIEW_CONTEXT]\n{context}\n\n"
                )
                target = {
                    'recommended_places': [p for p in top_places if p in ALLOWED_PLACES],
                    'reasoning': 'Based on reviews and stars for relevance to the query.',
                    'citations': cits,
                }
                f.write(json.dumps({'instruction': prompt, 'output': target}, ensure_ascii=False) + "\n")
                num_written += 1
                pbar.update(1)
                if num_written >= NUM_EXAMPLES_TARGET:
                    break
        rounds += 1
    pbar.close()

print('Wrote SFT dataset:', OUT_PATH, '| examples:', num_written)



SFT synthesis: 100%|██████████| 10000/10000 [01:26<00:00, 115.18it/s]

Wrote SFT dataset: c:\Users\TARIK\Desktop\Charles Darwin University\4 - Year 1 - Semester 2\IT CODE FAIR\Data Science Challenge\data\rag_llm\processed\rag_sft.jsonl | examples: 10000





### Step 3 — Convert training data to proper chat format

In [18]:
import json
import os

def convert_to_chat_format(input_file, output_file):
    """Convert instruction/output format to messages format for proper fine-tuning"""
    
    print(f"Converting {input_file} to chat format...")
    
    # Use absolute paths to avoid working directory issues
    if not os.path.isabs(input_file):
        # Get the project root by looking for the data directory
        current_dir = os.path.abspath('')
        project_root = current_dir
        
        # Walk up directories until we find the data folder
        while project_root != os.path.dirname(project_root):  # Stop at root
            if os.path.exists(os.path.join(project_root, 'data')):
                break
            project_root = os.path.dirname(project_root)
        
        input_file = os.path.join(project_root, input_file)
        output_file = os.path.join(project_root, output_file)
    
    print(f"Using absolute paths:")
    print(f"  Input: {input_file}")
    print(f"  Output: {output_file}")
    
    # Verify input file exists
    if not os.path.exists(input_file):
        raise FileNotFoundError(f"Input file not found: {input_file}")
    
    # Use UTF-8 encoding to handle Unicode characters
    with open(input_file, 'r', encoding='utf-8') as f_in, open(output_file, 'w', encoding='utf-8') as f_out:
        for line_num, line in enumerate(f_in):
            try:
                data = json.loads(line.strip())
                
                # Extract instruction and output
                instruction = data['instruction']
                output = data['output']
                
                # Convert to chat format
                messages = [
                    {"role": "system", "content": "You are a travel recommendation assistant for Australian destinations. Recommend only places from Allowed Places. Ground answers in the context. Respond with JSON: {recommended_places: [..], reasoning: str, citations: [{place, source, stars, snippet}]}"},
                    {"role": "user", "content": instruction},
                    {"role": "assistant", "content": json.dumps(output)}
                ]
                
                # Write new format
                new_data = {"messages": messages}
                f_out.write(json.dumps(new_data) + '\n')
                
                if (line_num + 1) % 100 == 0:
                    print(f"Converted {line_num + 1} examples...")
                    
            except Exception as e:
                print(f"Error processing line {line_num + 1}: {e}")
                continue
    
    print(f"✅ Conversion complete! Saved to {output_file}")

# Convert the training data
convert_to_chat_format('data/rag_llm/processed/rag_sft.jsonl', 'data/rag_llm/processed/rag_sft_chat.jsonl')

Converting data/rag_llm/processed/rag_sft.jsonl to chat format...
Using absolute paths:
  Input: c:\Users\TARIK\Desktop\Charles Darwin University\4 - Year 1 - Semester 2\IT CODE FAIR\Data Science Challenge\data/rag_llm/processed/rag_sft.jsonl
  Output: c:\Users\TARIK\Desktop\Charles Darwin University\4 - Year 1 - Semester 2\IT CODE FAIR\Data Science Challenge\data/rag_llm/processed/rag_sft_chat.jsonl
Converted 100 examples...
Converted 200 examples...
Converted 300 examples...
Converted 400 examples...
Converted 500 examples...
Converted 600 examples...
Converted 700 examples...
Converted 800 examples...
Converted 900 examples...
Converted 1000 examples...
Converted 1100 examples...
Converted 1200 examples...
Converted 1300 examples...
Converted 1400 examples...
Converted 1500 examples...
Converted 1600 examples...
Converted 1700 examples...
Converted 1800 examples...
Converted 1900 examples...
Converted 2000 examples...
Converted 2100 examples...
Converted 2200 examples...
Converted 2

### Step 4 — Tokenizer, model, and LoRA config

- Load base chat model + tokenizer
- Attach LoRA adapters (low‑rank update on attention/projection layers)
- Keep it CPU-friendly (no 8‑bit quantization required)
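
The low-rank update mentioned above can be written as `W' = W + (alpha / r) * B @ A`, where the base weight `W` stays frozen and only `A` and `B` are trained. Here is a NumPy sketch with an illustrative 1024x1024 layer. The layer size and the initialisation scale are assumptions, while `r=16` and `alpha=16` mirror the config in this step:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 1024, 1024, 16, 16

W = rng.normal(size=(d_out, d_in))       # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable, small random init
B = np.zeros((d_out, r))                 # trainable, zero init so the update starts at zero

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Base projection plus the scaled low-rank correction."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.normal(size=(4, d_in))
y0 = lora_forward(x)                     # equals the base output while B is all zeros

full_params = d_out * d_in               # weights a full fine-tune would update
lora_params = r * (d_in + d_out)         # weights LoRA updates for the same layer
```

For this toy layer LoRA trains 32,768 weights instead of 1,048,576, which is the reason the approach fits on small GPUs.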


In [19]:
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, trust_remote_code=True)

# Probe module names and select safe LoRA targets
all_names = [n for n, _ in base_model.named_modules()]
# common candidates across LLaMA-like + MobileLLM variants
candidates = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
selected = [n.split(".")[-1] for n in all_names if any(c in n.split(".")[-1] for c in candidates)]
# dedupe and keep only the layer names
selected = sorted(list({s for s in selected if s in candidates}))
if not selected:
    selected = ["q_proj", "k_proj", "v_proj", "o_proj"]

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=selected,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()



trainable params: 10,092,544 || all params: 606,142,464 || trainable%: 1.6650
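
The reported trainable-parameter count can be reproduced by hand: each adapted linear layer with shape (in, out) adds `r * (in + out)` weights. The sketch below assumes the following Qwen3-0.6B dimensions, which are not stated in the notebook and should be checked against the model's `config.json`: hidden size 1024, 28 decoder layers, 16 query heads, 8 key/value heads, head dimension 128, and MLP size 3072. It also assumes all seven projection names were selected as LoRA targets.

```python
r = 16
hidden, layers = 1024, 28        # assumed model dimensions
q_out = 16 * 128                 # assumed query heads * head_dim
kv_out = 8 * 128                 # assumed key/value heads * head_dim
mlp = 3072                       # assumed MLP intermediate size

# (in_features, out_features) for each LoRA target inside one decoder layer
shapes = {
    "q_proj": (hidden, q_out),
    "k_proj": (hidden, kv_out),
    "v_proj": (hidden, kv_out),
    "o_proj": (q_out, hidden),
    "gate_proj": (hidden, mlp),
    "up_proj": (hidden, mlp),
    "down_proj": (mlp, hidden),
}

per_layer = sum(r * (i + o) for i, o in shapes.values())
total = per_layer * layers
print(f"{total:,}")  # 10,092,544
```

Under those assumptions the total matches the `trainable params` figure printed by `print_trainable_parameters()`.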


### Step 5 — Fine-tune model (skip this cell if `models/rag_llm/final` already exists)

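**What this does**:
- Trains on the chat-format data (`rag_sft_chat.jsonl`) with a 90/10 train/eval split
- Runs 3 epochs at learning rate 1e-4, batch size 2, with sequences truncated to 512 tokens
- Saves checkpoints and the final model under `models/rag_llm/`
- Aims for well-formed JSON responses instead of garbled text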


In [22]:
from datasets import load_dataset
from transformers import TrainingArguments, Trainer
from peft import LoraConfig, get_peft_model, TaskType
import os

# Load the training data
print("Loading training data...")

# Resolve path to avoid working directory issues
data_file = 'data/rag_llm/processed/rag_sft_chat.jsonl'
if not os.path.isabs(data_file):
    # Get the project root by looking for the data directory
    current_dir = os.path.abspath('')
    project_root = current_dir
    
    # Walk up directories until we find the data folder
    while project_root != os.path.dirname(project_root):  # Stop at root
        if os.path.exists(os.path.join(project_root, 'data')):
            break
        project_root = os.path.dirname(project_root)
    
    data_file = os.path.join(project_root, data_file)

print(f"Using data file: {data_file}")
dataset = load_dataset('json', data_files=data_file)['train']

# Split into train/eval
train_size = int(0.9 * len(dataset))
eval_size = len(dataset) - train_size
train_dataset = dataset.select(range(train_size))
eval_dataset = dataset.select(range(train_size, train_size + eval_size))

print(f"Training samples: {len(train_dataset)}")
print(f"Evaluation samples: {len(eval_dataset)}")


# Preprocess: tokenize chat messages into input_ids
from transformers import default_data_collator
MAX_LEN = 512  # reduced from 1024 for speed

# Enable TF32 on supported GPUs and move model to device
import torch
try:
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True
except Exception:
    pass
_device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(_device)

def preprocess(example):
    prompt = tokenizer.apply_chat_template(
        example["messages"],
        tokenize=False,
        add_generation_prompt=False,
        enable_thinking=False,
    )
    enc = tokenizer(prompt, truncation=True, max_length=MAX_LEN)
    enc["labels"] = enc["input_ids"].copy()
    return enc

dataset_tok = dataset.map(preprocess, remove_columns=dataset.column_names)
train_tok = dataset_tok.select(range(train_size))
eval_tok = dataset_tok.select(range(train_size, train_size + eval_size))

data_collator = default_data_collator

# Training arguments with better settings
training_args = TrainingArguments(
    output_dir=str(OUTPUT_DIR),
    num_train_epochs=3,
    per_device_train_batch_size=2,            # try 2–4 if VRAM allows
    per_device_eval_batch_size=2,
    warmup_steps=50,
    learning_rate=1e-4,
    logging_steps=50,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    report_to="tensorboard",
    logging_dir=str(OUTPUT_DIR / "tb"),
    remove_unused_columns=False,
    dataloader_num_workers=4,
    dataloader_pin_memory=True,
    dataloader_prefetch_factor=2,
    optim="adamw_torch_fused" if torch.cuda.is_available() else "adamw_torch",
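    # TF32 needs an Ampere-or-newer GPU; requesting it elsewhere makes TrainingArguments raise
    tf32=torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8,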
)

# Create trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_tok,
    eval_dataset=eval_tok,
    data_collator=data_collator,
    tokenizer=tokenizer,
)

# Start training
print("Starting fine-tuning...")
trainer.train()

# Save the final model
final_dir = OUTPUT_DIR / "final"
trainer.save_model(str(final_dir))
tokenizer.save_pretrained(str(final_dir))

print("✅ Fine-tuning complete!")



Loading training data...
Using data file: c:\Users\TARIK\Desktop\Charles Darwin University\4 - Year 1 - Semester 2\IT CODE FAIR\Data Science Challenge\data/rag_llm/processed/rag_sft_chat.jsonl
Training samples: 9000
Evaluation samples: 1000
Starting fine-tuning...




| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 0.0441 | 0.043223 |
| 2 | 0.0342 | 0.034483 |
| 3 | 0.0317 | 0.031618 |


✅ Fine-tuning complete!
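
For planning a run, the number of optimizer steps follows from the sample counts logged above. This is a quick calculation, assuming a single device and the default `gradient_accumulation_steps=1` (neither is stated explicitly in the notebook):

```python
import math

train_samples = 9000      # from the training log
per_device_batch = 2      # per_device_train_batch_size
epochs = 3                # num_train_epochs
num_devices = 1           # assumption
grad_accum = 1            # assumption: Trainer default

effective_batch = per_device_batch * num_devices * grad_accum
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)  # 4500 13500
```

With `warmup_steps=50`, warmup covers well under 1% of the run under these assumptions.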


### Step 6 — Save final model


In [23]:
# Step 6 — Save final model bundle (run this AFTER training)
import os, torch
final_dir = OUTPUT_DIR / 'final'
final_dir.mkdir(parents=True, exist_ok=True)

try:
    from peft import PeftModel
    merged = PeftModel.from_pretrained(base_model, OUTPUT_DIR / 'adapters')
    merged = merged.merge_and_unload()
    merged.save_pretrained(final_dir)
    torch.save(merged.state_dict(), final_dir / 'model_state.pth')
    print('Saved merged model to', final_dir)
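except Exception as e:
    # Merging failed (for example, no `adapters` folder exists); fall back to saving the LoRA adapters
    print('Merge skipped:', e)
    model.save_pretrained(final_dir)
    torch.save(model.state_dict(), final_dir / 'adapters_state.pth')
    print('Saved adapters to', final_dir)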

# Ensure tokenizer and configs are persisted with final
try:
    tokenizer.save_pretrained(final_dir)
    print('Saved tokenizer to', final_dir)
except Exception as e:
    print('Tokenizer save failed:', e)



Saved adapters to c:\Users\TARIK\Desktop\Charles Darwin University\4 - Year 1 - Semester 2\IT CODE FAIR\Data Science Challenge\models\rag_llm\final
Saved tokenizer to c:\Users\TARIK\Desktop\Charles Darwin University\4 - Year 1 - Semester 2\IT CODE FAIR\Data Science Challenge\models\rag_llm\final


### Step 7a — Create ZIP file for GitHub upload (skip this if you already downloaded the final ZIP from GitHub)

This cell creates a compressed ZIP file of the entire final folder for easy GitHub upload.


In [36]:
# Create ZIP file of the entire final folder for GitHub upload
import os
import zipfile

def create_zip_file():
    """Create a ZIP file of the entire final folder for GitHub upload."""
    # Go up one directory from notebooks to project root
    project_root = os.path.dirname(os.getcwd())
    final_dir = os.path.join(project_root, "models", "rag_llm", "final")
    zip_path = os.path.join(project_root, "models", "rag_llm", "final.zip")
    
    print(f"🔍 Checking for final folder at: {final_dir}")
    
    if not os.path.exists(final_dir):
        print("❌ Final folder not found. Please ensure models/rag_llm/final exists.")
        return False
    
    if os.path.exists(zip_path):
        zip_size = os.path.getsize(zip_path)
        print(f"✅ ZIP file already exists ({zip_size:,} bytes), no need to create.")
        return True
    
    print(f"📦 Creating ZIP file from final folder...")
    
    try:
        total_size = 0
        with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
            for root, dirs, files in os.walk(final_dir):
                for file in files:
                    file_path = os.path.join(root, file)
                    # Skip the ZIP file itself if it exists
                    if file_path == zip_path:
                        continue
                    arcname = os.path.relpath(file_path, final_dir)
                    zf.write(file_path, arcname)
                    file_size = os.path.getsize(file_path)
                    total_size += file_size
                    print(f"  📄 Added: {arcname} ({file_size:,} bytes)")
        
        zip_size = os.path.getsize(zip_path)
        compression_ratio = (1 - zip_size / total_size) * 100
        print(f"✅ ZIP file created successfully!")
        print(f"   Original size: {total_size:,} bytes")
        print(f"   Compressed size: {zip_size:,} bytes")
        print(f"   Compression ratio: {compression_ratio:.1f}%")
        return True
    except Exception as e:
        print(f"❌ Error creating ZIP file: {e}")
        return False

# Create the ZIP file
success = create_zip_file()
if success:
    print("🎉 ZIP file is ready for GitHub upload!")
else:
    print("❌ ZIP file creation failed!")


🔍 Checking for final folder at: c:\Users\TARIK\Desktop\Charles Darwin University\4 - Year 1 - Semester 2\IT CODE FAIR\Data Science Challenge\models\rag_llm\final
📦 Creating ZIP file from final folder...
  📄 Added: adapters_state.pth (2,425,006,175 bytes)
  📄 Added: adapter_config.json (751 bytes)
  📄 Added: adapter_model.safetensors (40,422,168 bytes)
  📄 Added: added_tokens.json (735 bytes)
  📄 Added: chat_template.jinja (4,256 bytes)
  📄 Added: merges.txt (1,671,853 bytes)
  📄 Added: README.md (5,089 bytes)
  📄 Added: special_tokens_map.json (644 bytes)
  📄 Added: tokenizer.json (11,422,752 bytes)
  📄 Added: tokenizer_config.json (5,643 bytes)
  📄 Added: training_args.bin (5,969 bytes)
  📄 Added: vocab.json (2,776,833 bytes)
✅ ZIP file created successfully!
   Original size: 2,481,322,868 bytes
   Compressed size: 1,199,407,262 bytes
   Compression ratio: 51.7%
🎉 ZIP file is ready for GitHub upload!


### Step 7b — Extract model folder from ZIP (unzip the final folder before using the model)

This cell extracts the entire final folder from the ZIP when needed for loading.
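
Step 7a stores each file under a path relative to `final/`, so the archive has no top-level `final/` folder. Extracting straight into a `final/` folder therefore restores the layout without a separate move step. Here is a self-contained sketch of that round trip in a temporary directory (the file names and contents are placeholders, and `restored` is a hypothetical target folder):

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    final_dir = os.path.join(tmp, "models", "rag_llm", "final")
    os.makedirs(final_dir)
    for name in ("adapter_config.json", "tokenizer.json"):  # placeholder files
        with open(os.path.join(final_dir, name), "w", encoding="utf-8") as f:
            f.write("{}")

    # Archive with paths relative to final_dir, as in Step 7a
    zip_path = os.path.join(tmp, "models", "rag_llm", "final.zip")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(final_dir):
            for file in files:
                path = os.path.join(root, file)
                zf.write(path, os.path.relpath(path, final_dir))

    # Member names carry no "final/" prefix
    with zipfile.ZipFile(zip_path) as zf:
        names = sorted(zf.namelist())

    # Extract directly into a final-style folder; missing directories are created
    restored = os.path.join(tmp, "restored", "final")
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(restored)
    restored_files = sorted(os.listdir(restored))
```

Extracting into the parent `rag_llm/` folder instead would scatter the files next to `final.zip`, which is the case the move logic in the cell below handles.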


In [43]:
# Create final directory and extract model from ZIP if needed
import os
import zipfile

def extract_model_folder():
    """Extract the entire final folder from ZIP if needed."""
    # Go up one directory from notebooks to project root
    project_root = os.path.dirname(os.getcwd())
    final_dir = os.path.join(project_root, "models", "rag_llm", "final")
    zip_path = os.path.join(project_root, "models", "rag_llm", "final.zip")
    
    print(f"🔍 Checking for final folder at: {final_dir}")
    
    # Check if final folder exists and has the main model file
    if os.path.exists(final_dir) and os.path.exists(os.path.join(final_dir, "adapters_state.pth")):
        print("✅ Final folder already exists with model file, no extraction needed.")
        return True
    
    print("🔧 Final folder not found or incomplete, checking for ZIP file...")
    
    if os.path.exists(zip_path):
        zip_size = os.path.getsize(zip_path)
        print(f"📦 Found ZIP file ({zip_size:,} bytes), extracting...")
        try:
            # Create the final directory if it doesn't exist
            os.makedirs(final_dir, exist_ok=True)
            print(f"📁 Created final directory: {final_dir}")
            
            with zipfile.ZipFile(zip_path, 'r') as archive:
                # Extract into rag_llm/. The archive stores paths relative to final/, so files land here and are moved below
                archive.extractall(os.path.join(project_root, "models", "rag_llm"))
            
            # Verify extraction - check if files were extracted to final directory
            model_file_path = os.path.join(final_dir, "adapters_state.pth")
            if os.path.exists(model_file_path):
                extracted_size = os.path.getsize(model_file_path)
                print(f"✅ Folder extracted successfully! Model file: {extracted_size:,} bytes")
                return True
            else:
                # Check if files were extracted to rag_llm directory instead
                rag_llm_model_path = os.path.join(project_root, "models", "rag_llm", "adapters_state.pth")
                if os.path.exists(rag_llm_model_path):
                    # Move files to final directory
                    import shutil
                    print("📁 Moving extracted files to final directory...")
                    # Only move model-related files, not evaluation files
                    model_files = [
                        'adapters_state.pth', 'adapter_config.json', 'adapter_model.safetensors',
                        'added_tokens.json', 'chat_template.jinja', 'merges.txt', 
                        'special_tokens_map.json', 'tokenizer.json', 'tokenizer_config.json',
                        'vocab.json', 'training_args.bin'
                    ]
                    
                    for file in model_files:
                        src = os.path.join(project_root, "models", "rag_llm", file)
                        if os.path.exists(src):
                            dst = os.path.join(final_dir, file)
                            shutil.move(src, dst)
                            print(f"  📄 Moved: {file}")
                    
                    # Move comet evaluation files back to rag_llm directory
                    comet_files = ['comet_evaluation.json', 'comet_evaluation_detailed.json']
                    for file in comet_files:
                        src = os.path.join(final_dir, file)
                        if os.path.exists(src):
                            dst = os.path.join(project_root, "models", "rag_llm", file)
                            shutil.move(src, dst)
                            print(f"  📄 Moved back: {file}")
                    
                    if os.path.exists(model_file_path):
                        extracted_size = os.path.getsize(model_file_path)
                        print(f"✅ Files moved successfully! Model file: {extracted_size:,} bytes")
                        return True
                
                print("❌ Extraction completed but model file not found.")
                return False
        except Exception as e:
            print(f"❌ Error extracting ZIP file: {e}")
            return False
    
    print("❌ No ZIP file found. Please ensure the model files exist.")
    return False

# Run the extraction
success = extract_model_folder()
if success:
    print("🎉 Model folder is ready for loading!")
else:
    print("❌ Model folder extraction failed!")


🔍 Checking for final folder at: c:\Users\TARIK\Desktop\Charles Darwin University\4 - Year 1 - Semester 2\IT CODE FAIR\Data Science Challenge\models\rag_llm\final
🔧 Final folder not found or incomplete, checking for ZIP file...
📦 Found ZIP file (1,199,407,262 bytes), extracting...
📁 Created final directory: c:\Users\TARIK\Desktop\Charles Darwin University\4 - Year 1 - Semester 2\IT CODE FAIR\Data Science Challenge\models\rag_llm\final
📁 Moving extracted files to final directory...
  📄 Moved: adapters_state.pth
  📄 Moved: adapter_config.json
  📄 Moved: adapter_model.safetensors
  📄 Moved: added_tokens.json
  📄 Moved: chat_template.jinja
  📄 Moved: merges.txt
  📄 Moved: special_tokens_map.json
  📄 Moved: tokenizer.json
  📄 Moved: tokenizer_config.json
  📄 Moved: vocab.json
  📄 Moved: training_args.bin
✅ Files moved successfully! Model file: 2,425,006,175 bytes
🎉 Model folder is ready for loading!


### Step 8 — Chat with model

- Load the saved model from `models/rag_llm/final`
- Run sample queries through retrieval + model and print each response (the raw decoded text; no JSON parsing is applied in this cell)
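
The test cell prints the decoded text as-is. Since the notebook's goal is JSON-formatted recommendations, it can help to check whether a response actually contains parseable JSON. A minimal sketch, assuming a response may wrap a JSON object in extra text; `extract_json` is a hypothetical helper and not part of this notebook:

```python
import json
from typing import Any, Optional


def extract_json(text: str) -> Optional[Any]:
    """Return the first JSON object found in `text`, or None.

    Hypothetical helper: tries the standard-library decoder at every '{'
    until one position yields a complete, valid JSON object.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    return None


# Illustrative strings only, not real model output.
print(extract_json('Sure! {"place": "Kakadu", "stars": 4.6} Enjoy.'))
print(extract_json("- **Kakadu**"))
```

A `None` result is a quick signal that the model answered in free text or markdown rather than JSON.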


In [44]:
### Step 8 — Chat with model
import torch
import re

def test_corrected_model():
    """Test the corrected fine-tuned model"""
    
    # Load model from the actual saved path
    model_path = str(OUTPUT_DIR / "final")
    ft_tok = AutoTokenizer.from_pretrained(model_path)
    ft_model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
    ft_model = ft_model.to('cuda' if torch.cuda.is_available() else 'cpu').eval()
    
    def ask_corrected(query: str, k: int = 8, show_context: bool = False) -> str:
        hits = retrieve(query, k=k)
        context = []
        for _, r in hits.head(5).iterrows():
            snippet = str(r['comment'])[:220]
            context.append(f"- Place: {r['place']} | Source: {r['source']} | Stars: {float(r.get('stars', float('nan'))):.1f} | Review: {snippet}")
        
        if show_context:
            print("🔍 RETRIEVED CONTEXT:")
            print("\n".join(context))
            print("\n" + "="*50 + "\n")
        
        messages = [
            {"role": "system", "content": "You are a helpful travel assistant for Australian destinations. ONLY recommend places from the context provided. Base your recommendations strictly on the reviews and ratings given."},
            {"role": "user", "content": (
                f"Here are some places I can recommend: {', '.join(ALLOWED_PLACES)}\n\n"
                f"User query: {query}\n\n"
                f"Here's what people have said about these places:\n" + "\n".join(context)
            )}
        ]
        
        prompt_text = ft_tok.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True,
            enable_thinking=False  # Disable thinking for cleaner output
        )
        
        device = next(ft_model.parameters()).device
        inputs = ft_tok(prompt_text, return_tensors='pt').to(device)
        
        with torch.no_grad():
            out = ft_model.generate(
                **inputs, 
                max_new_tokens=200,
                do_sample=True,
                temperature=0.7,
                top_p=0.9,
                repetition_penalty=1.1,
                pad_token_id=ft_tok.eos_token_id
            )
        
        # Decode only the newly generated tokens. Splitting the full text on
        # "assistant" is fragile: the system prompt itself contains that word.
        prompt_len = inputs['input_ids'].shape[1]
        text = ft_tok.decode(out[0][prompt_len:], skip_special_tokens=True).strip()
        
        # Remove thinking tags if present
        text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
        
        return text
    
    # Test with sample queries
    test_queries = [
        "best waterfalls and swimming spots",
        "family-friendly walks and sunset views",
        "places that are not too hot"
    ]
    
    print("🤖 Testing corrected fine-tuned model:")
    print("="*60)
    
    for i, query in enumerate(test_queries, 1):
        print(f"\n🗣️  Question {i}: {query}")
        print("🤖 Model response:")
        try:
            response = ask_corrected(query, show_context=True)
            print(response)
        except Exception as e:
            print(f"❌ Error: {e}")
        print("\n" + "="*60)
    
    # Interactive mode
    print("\n🎯 Interactive mode - Ask your own questions!")
    print("Type 'quit' to exit")
    
    while True:
        user_query = input("\n🗣️  Your question: ").strip()
        if user_query.lower() in ['quit', 'exit', 'q']:
            break
        if user_query:
            try:
                response = ask_corrected(user_query, show_context=True)
                print(f"🤖 Model response: {response}")
            except Exception as e:
                print(f"❌ Error: {e}")
    
    return ask_corrected

# Run the test
ask_corrected = test_corrected_model()

🤖 Testing corrected fine-tuned model:

🗣️  Question 1: best waterfalls and swimming spots
🤖 Model response:


🔍 RETRIEVED CONTEXT:
- Place: Kakadu | Source: GoogleMaps | Stars: 4.5 | Review: Lovely, there are secrete swimming spots with waterfalls throughout.
- Place: Nitmiluk (Katherine Gorge) | Source: GoogleMaps | Stars: 4.9 | Review: Great waterfalls and swimming holes. Natural beauty everywhere you look.
- Place: Kakadu | Source: GoogleMaps | Stars: 4.6 | Review: Great place to visit after the wet season and the waterfalls have opened back up, not much to do in the wet at all
- Place: Kakadu National Park – Gunlom Falls | Source: TripAdvisor | Stars: 4.9 | Review: FANTASTIC!! When you get there you are greeted with a fantastic almost circular gorge, waterfall and pristine crystal clear water. A perfect place for a swim. Bring a noodle or a float of some kind because it is just gre
- Place: Kakadu | Source: GoogleMaps | Stars: 4.9 | Review: The best place to visit. Amazing nature Amazing water falls everywhere beautiful


- **Nitmiluk (Katherine Gorge)**
- **Kakadu**


🗣️  Question 2: fami

🔍 RETRIEVED CONTEXT:
- Place: Uluru-Kata Tjuta | Source: GoogleMaps | Stars: 4.7 | Review: Outstanding views especially at sunset.
- Place: Devils Marbles (Karlu Karlu) | Source: TripAdvisor | Stars: 4.5 | Review: Both sunrise and sunset are beautiful, get your camera ready for some great photos. There are some walks as well.
- Place: Uluru-Kata Tjuta | Source: TripAdvisor | Stars: 4.3 | Review: Absolutely breathtaking to see at sunset the colours are amazing but extremely busy at this time of year we did some of the shorter walks really good.
- Place: Devils Marbles (Karlu Karlu) | Source: GoogleMaps | Stars: 4.5 | Review: The sunset's are a must do
- Place: Uluru-Kata Tjuta | Source: TripAdvisor | Stars: 4.9 | Review: We got to walk around the wonderful sight all day and stay for a sunset. The colors and setting is just beautiful. It's amazing to see if change colors right before your eyes. You must experience this amazing place.


- **West MacDonnell – Ormiston Gorge**  
  **Mode of

🔍 RETRIEVED CONTEXT:
- Place: Tjoritja / West MacDonnell National Park | Source: GoogleMaps | Stars: 4.6 | Review: A wonderful place to visit especially when it's hot
- Place: Devils Marbles (Karlu Karlu) | Source: GoogleMaps | Stars: 3.4 | Review: Amazing place but it's way too hot 🥵 …
- Place: Kakadu | Source: GoogleMaps | Stars: 2.7 | Review: Many nice places. But.. Many places closed during the wet season. (April and no water anywhere by the way)
- Place: Kakadu | Source: GoogleMaps | Stars: 4.9 | Review: Best place ever
- Place: Alice Springs Desert Park | Source: Reddit/AskAnAustralian | Stars: 2.5 | Review: Anywhere but Alice Springs. Rocky has its issues and you will learn a new definition of heat, but at least it’s not Alice.


Based on the reviews, here are the places that are "not too hot":

- **West MacDonnell – Ormiston Gorge** (Source: TripAdvisor) | Stars: 4.8 | Review: This is a must do. It is hot in there but not at all hot out there. You can swim in the pool. Do the w

🔍 RETRIEVED CONTEXT:
- Place: Tjoritja / West MacDonnell National Park | Source: GoogleMaps | Stars: 4.6 | Review: A wonderful place to visit especially when it's hot
- Place: Devils Marbles (Karlu Karlu) | Source: GoogleMaps | Stars: 3.4 | Review: Amazing place but it's way too hot 🥵 …
- Place: Alice Springs Desert Park | Source: Reddit/travel | Stars: 4.4 | Review: It gets super hot in Alice Springs
- Place: Kakadu | Source: GoogleMaps | Stars: 4.6 | Review: Great place to visit. Gets hot and humid.
- Place: Uluru-Kata Tjuta | Source: Reddit/NatureIsFuckingLit | Stars: 4.4 | Review: I visited Ayers Rock once many years ago and it's still one of the hottest places I've been. I can't remember what time of year I was there but the temperatures were in the upper 50's (Celsius)


🤖 Model response: Based on reviews and stars:

- **Tjoritja / West MacDonnell National Park**: This place is both hot and cold.  
- **Devils Marbles (Karlu Karlu)**: This place is quite hot.  
- **West MacDonnell
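
The system prompt asks the model to recommend only places from the retrieved context, yet some responses above name places that were not retrieved: for the second sample query the response lists `West MacDonnell – Ormiston Gorge`, which does not appear in that query's context. A minimal sketch of an automated check, assuming responses name places as bold markdown items as in the outputs above; `ungrounded_places` is a hypothetical helper and not part of this notebook:

```python
import re
from typing import Iterable, List


def ungrounded_places(response: str, context_places: Iterable[str]) -> List[str]:
    """List bolded names in `response` that are missing from the context.

    Hypothetical helper: matching is exact after lower-casing and trimming,
    so name variants for the same place are reported as ungrounded.
    """
    allowed = {p.strip().lower() for p in context_places}
    named = re.findall(r"\*\*(.+?)\*\*", response)
    return [n.strip() for n in named if n.strip().lower() not in allowed]


# Place names taken from the second sample query's retrieved context.
context = ["Uluru-Kata Tjuta", "Devils Marbles (Karlu Karlu)"]
response = "- **West MacDonnell – Ormiston Gorge**\n- **Uluru-Kata Tjuta**"
print(ungrounded_places(response, context))
```

Exact matching is deliberately strict; a fuzzier comparison would be needed to treat `Kakadu` and `Kakadu National Park – Gunlom Falls` as the same destination.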

### Step 9 — COMET evaluation

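Scores each model response against a hand-written reference answer using the `Unbabel/wmt22-comet-da` COMET model, with the query passed as the source text. COMET is a learned machine-translation metric, so here it serves only as a rough semantic-similarity score. Scores are bucketed as Excellent (≥0.8), Good (0.6-0.8), Fair (0.4-0.6), or Poor (<0.4).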
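
The bucketing and summary statistics can be sketched independently of COMET. This is an illustrative sketch, assuming the same thresholds as the evaluation cell (0.8, 0.6, 0.4); `bucket` and `summarize` are hypothetical names, and the example scores are made up rather than taken from a run:

```python
from typing import Dict, List

# Lower bounds for each category, checked from highest to lowest.
BUCKETS = [("Excellent", 0.8), ("Good", 0.6), ("Fair", 0.4)]


def bucket(score: float) -> str:
    """Map a score to its category; anything below 0.4 is 'Poor'."""
    for name, lower in BUCKETS:
        if score >= lower:
            return name
    return "Poor"


def summarize(scores: List[float]) -> Dict[str, object]:
    """Return count, mean, min, max and per-category counts.

    An empty list yields None statistics instead of raising, so a failed
    scoring run can still be reported without inventing numbers.
    """
    if not scores:
        return {"count": 0, "mean": None, "min": None, "max": None, "buckets": {}}
    counts: Dict[str, int] = {}
    for s in scores:
        name = bucket(s)
        counts[name] = counts.get(name, 0) + 1
    return {
        "count": len(scores),
        "mean": sum(scores) / len(scores),
        "min": min(scores),
        "max": max(scores),
        "buckets": counts,
    }


example_scores = [0.55, 0.72, 0.47, 0.81]  # illustrative values, not results
print(summarize(example_scores))
```

Keeping the empty case explicit avoids the need for placeholder scores when the metric cannot be computed.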


In [30]:
# Simple COMET evaluation
try:
    from comet import download_model, load_from_checkpoint
    import json
    
    print("Loading COMET model...")
    comet_model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
    
    # Comprehensive test queries and reference answers
    test_data = [
        {
            "query": "best waterfalls and swimming spots",
            "reference": "Nitmiluk (Katherine Gorge) offers great waterfalls and swimming holes. Kakadu National Park has Gunlom Falls with refreshing pools perfect for swimming. West MacDonnell National Park has natural waterholes for swimming."
        },
        {
            "query": "family-friendly walks and sunset views", 
            "reference": "Devils Marbles (Karlu Karlu) provides beautiful sunset views and family-friendly walks. West MacDonnell National Park offers scenic walks suitable for families with spectacular sunset views."
        },
        {
            "query": "places that are not too hot",
            "reference": "Nitmiluk (Katherine Gorge) offers cooler temperatures and peaceful natural settings. West MacDonnell National Park provides pleasant conditions for outdoor activities."
        },
        {
            "query": "cultural experiences and wildlife viewing",
            "reference": "Kakadu National Park offers rich cultural experiences with Aboriginal rock art and diverse wildlife. Uluru-Kata Tjuta provides cultural significance and unique wildlife viewing opportunities."
        },
        {
            "query": "scenic drives and photography spots",
            "reference": "West MacDonnell National Park offers spectacular scenic drives with excellent photography opportunities. Devils Marbles provides unique rock formations perfect for photography."
        },
        {
            "query": "quiet camping and stargazing",
            "reference": "West MacDonnell National Park offers quiet camping spots with excellent stargazing opportunities. Devils Marbles provides peaceful camping with clear night skies."
        },
        {
            "query": "easy hikes for beginners",
            "reference": "Nitmiluk (Katherine Gorge) offers easy walking trails suitable for beginners. West MacDonnell National Park has gentle walks perfect for those new to hiking."
        },
        {
            "query": "places with good facilities and amenities",
            "reference": "Kakadu National Park has well-developed facilities and visitor centers. Uluru-Kata Tjuta offers comprehensive amenities for tourists."
        },
        {
            "query": "water activities and boat tours",
            "reference": "Nitmiluk (Katherine Gorge) offers excellent boat tours and water activities. Kakadu National Park provides various water-based experiences and boat cruises."
        },
        {
            "query": "places to visit during wet season",
            "reference": "Kakadu National Park is particularly beautiful during the wet season with flowing waterfalls. Nitmiluk (Katherine Gorge) offers different experiences during wet season with higher water levels."
        }
    ]
    
    # Load final model
    print("Loading final model...")
    model_path = str(OUTPUT_DIR / "final")
    ft_tok = AutoTokenizer.from_pretrained(model_path)
    ft_model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
    ft_model = ft_model.to('cuda' if torch.cuda.is_available() else 'cpu').eval()
    
    def generate_response(query: str) -> str:
        hits = retrieve(query, k=8)
        context = []
        for _, r in hits.head(5).iterrows():
            snippet = str(r['comment'])[:220]
            context.append(f"- Place: {r['place']} | Source: {r['source']} | Stars: {float(r.get('stars', float('nan'))):.1f} | Review: {snippet}")
        
        messages = [
            {"role": "system", "content": "You are a helpful travel assistant for Australian destinations. ONLY recommend places from the context provided. Base your recommendations strictly on the reviews and ratings given."},
            {"role": "user", "content": (
                f"Here are some places I can recommend: {', '.join(ALLOWED_PLACES)}\n\n"
                f"User query: {query}\n\n"
                f"Here's what people have said about these places:\n" + "\n".join(context)
            )}
        ]
        
        prompt_text = ft_tok.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True,
            enable_thinking=False
        )
        
        device = next(ft_model.parameters()).device
        inputs = ft_tok(prompt_text, return_tensors='pt').to(device)
        
        with torch.no_grad():
            out = ft_model.generate(
                **inputs, 
                max_new_tokens=200,
                do_sample=True,
                temperature=0.7,
                top_p=0.9,
                repetition_penalty=1.1,
                pad_token_id=ft_tok.eos_token_id
            )
        
        # Decode only the newly generated tokens. Splitting the full text on
        # "assistant" is fragile: the system prompt itself contains that word.
        prompt_len = inputs['input_ids'].shape[1]
        text = ft_tok.decode(out[0][prompt_len:], skip_special_tokens=True).strip()
        
        # Remove thinking tags if present
        text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
        
        return text
    
    # Generate model responses
    print("Generating model responses...")
    model_responses = []
    references = []
    
    for item in test_data:
        try:
            response = generate_response(item["query"])
            model_responses.append(response)
            references.append([item["reference"]])  # one-item list; unwrapped via ref[0] when building comet_data
            print(f"Query: {item['query']}")
            print(f"Model: {response[:100]}...")
            print(f"Reference: {item['reference'][:100]}...")
            print("-" * 50)
        except Exception as e:
            print(f"Error generating response for '{item['query']}': {e}")
            model_responses.append("")
            references.append([item["reference"]])
    
    # Prepare data for COMET
    comet_data = []
    for i, (response, ref) in enumerate(zip(model_responses, references)):
        comet_data.append({
            "src": test_data[i]["query"],
            "mt": response,
            "ref": ref[0]
        })
    
    # Run COMET evaluation
    print("Running COMET evaluation...")
    try:
        comet_output = comet_model.predict(comet_data, batch_size=8)
        
        # Extract scores from COMET output
        if isinstance(comet_output, dict) and 'scores' in comet_output:
            comet_scores = comet_output['scores']
        elif isinstance(comet_output, list):
            comet_scores = comet_output
        else:
            # Handle different COMET output formats
            comet_scores = [float(score) if isinstance(score, (int, float)) else 0.0 for score in comet_output]
        
        # Display detailed results
        print("\n" + "="*80)
        print("DETAILED COMET EVALUATION RESULTS")
        print("="*80)
        
        # Calculate statistics
        valid_scores = [float(s) for s in comet_scores if isinstance(s, (int, float))]
        if not valid_scores:
            valid_scores = [0.0] * len(comet_scores)
        
        avg_score = sum(valid_scores) / len(valid_scores)
        min_score = min(valid_scores)
        max_score = max(valid_scores)
        
        # Score distribution
        excellent = sum(1 for s in valid_scores if s >= 0.8)
        good = sum(1 for s in valid_scores if 0.6 <= s < 0.8)
        fair = sum(1 for s in valid_scores if 0.4 <= s < 0.6)
        poor = sum(1 for s in valid_scores if s < 0.4)
        
        print(f"\n📊 OVERALL STATISTICS:")
        print(f"   Total Queries: {len(comet_data)}")
        print(f"   Average Score: {avg_score:.4f}")
        print(f"   Min Score: {min_score:.4f}")
        print(f"   Max Score: {max_score:.4f}")
        print(f"\n📈 SCORE DISTRIBUTION:")
        print(f"   Excellent (≥0.8): {excellent} ({excellent/len(valid_scores)*100:.1f}%)")
        print(f"   Good (0.6-0.8):  {good} ({good/len(valid_scores)*100:.1f}%)")
        print(f"   Fair (0.4-0.6):  {fair} ({fair/len(valid_scores)*100:.1f}%)")
        print(f"   Poor (<0.4):     {poor} ({poor/len(valid_scores)*100:.1f}%)")
        
        print(f"\n" + "="*80)
        print("INDIVIDUAL RESULTS")
        print("="*80)
        
        for i, (item, score) in enumerate(zip(comet_data, comet_scores)):
            try:
                score_val = float(score)
                score_emoji = "🟢" if score_val >= 0.8 else "🟡" if score_val >= 0.6 else "🟠" if score_val >= 0.4 else "🔴"
                print(f"\n{score_emoji} Query {i+1}: {item['src']}")
                print(f"   COMET Score: {score_val:.4f}")
                print(f"   Model Response: {item['mt'][:200]}...")
                print(f"   Reference: {item['ref'][:200]}...")
            except (ValueError, TypeError):
                print(f"\n❓ Query {i+1}: {item['src']}")
                print(f"   COMET Score: {score}")
                print(f"   Model Response: {item['mt'][:200]}...")
                print(f"   Reference: {item['ref'][:200]}...")
        
        print(f"\n" + "="*80)
        print("SUMMARY")
        print("="*80)
        print(f"✅ Model performance: {'Excellent' if avg_score >= 0.8 else 'Good' if avg_score >= 0.6 else 'Fair' if avg_score >= 0.4 else 'Needs Improvement'}")
        print(f"📝 Average COMET Score: {avg_score:.4f}")
        print(f"🎯 Best performing query: Query {valid_scores.index(max_score) + 1} (Score: {max_score:.4f})")
        print(f"⚠️  Needs attention: Query {valid_scores.index(min_score) + 1} (Score: {min_score:.4f})")
            
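    except Exception as e:
        print(f"COMET evaluation error: {e}")
        print("No COMET scores available; saving results without scores.")
        # Define every name the results dict below reads, so a COMET failure
        # does not become a NameError and no made-up scores are saved.
        comet_scores = []
        avg_score = min_score = max_score = None
        excellent = good = fair = poor = 0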
    
    # Save detailed results
    results = {
        "evaluation_metadata": {
            "total_queries": len(comet_data),
            "average_score": avg_score,
            "min_score": min_score,
            "max_score": max_score,
            "excellent_count": excellent,
            "good_count": good,
            "fair_count": fair,
            "poor_count": poor,
            "evaluation_date": str(pd.Timestamp.now())
        },
        "test_data": test_data,
        "model_responses": model_responses,
        "comet_scores": comet_scores,
        "detailed_results": [
            {
                "query_id": i+1,
                "query": item["query"],
                "model_response": response,
                "reference": item["reference"],
                "comet_score": float(score) if isinstance(score, (int, float)) else 0.0,
                "performance_category": "Excellent" if float(score) >= 0.8 else "Good" if float(score) >= 0.6 else "Fair" if float(score) >= 0.4 else "Poor"
            }
            for i, (item, response, score) in enumerate(zip(test_data, model_responses, comet_scores))
        ]
    }
    
    with open(str(OUTPUT_DIR / "comet_evaluation_detailed.json"), "w") as f:
        json.dump(results, f, indent=2)
    
    print(f"\n📁 Detailed results saved to: {OUTPUT_DIR / 'comet_evaluation_detailed.json'}")
    
    # Also save a CSV for easy analysis
    import pandas as pd
    df_results = pd.DataFrame(results["detailed_results"])
    df_results.to_csv(str(OUTPUT_DIR / "comet_evaluation_results.csv"), index=False)
    print(f"📊 CSV results saved to: {OUTPUT_DIR / 'comet_evaluation_results.csv'}")
    
except ImportError:
    print("COMET not available. Install with: pip install unbabel-comet")
except Exception as e:
    print(f"COMET evaluation failed: {e}")
    print("Make sure you have a trained model and COMET installed.")


Loading COMET model...


Lightning automatically upgraded your loaded checkpoint from v1.8.3.post1 to v2.5.5. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint C:\Users\TARIK\.cache\huggingface\hub\models--Unbabel--wmt22-comet-da\snapshots\2760a223ac957f30acfb18c8aa649b01cf1d75f2\checkpoints\model.ckpt`
Encoder model frozen.
c:\Users\TARIK\Desktop\Charles Darwin University\4 - Year 1 - Semester 2\IT CODE FAIR\Data Science Challenge\venv\Lib\site-packages\pytorch_lightning\core\saving.py:195: Found keys that are not in the model state dict but in the checkpoint: ['encoder.model.embeddings.position_ids']


Loading final model...
Generating model responses...


Query: best waterfalls and swimming spots
Model: - **Nitmiluk (Katherine Gorge)**
- **Kakadu National Park – Gunlom Falls**
- **Kakadu**
- **Uluru-K...
Reference: Nitmiluk (Katherine Gorge) offers great waterfalls and swimming holes. Kakadu National Park has Gunl...
--------------------------------------------------


Query: family-friendly walks and sunset views
Model: - **West MacDonnell – Ormiston Gorge**  
  *Difficulty: Easy*  
  *Viewing Conditions: Good*  
  *Wa...
Reference: Devils Marbles (Karlu Karlu) provides beautiful sunset views and family-friendly walks. West MacDonn...
--------------------------------------------------


Query: places that are not too hot
Model: Based on the reviews, here are the places that are "not too hot":

- **West MacDonnell – Ormiston Go...
Reference: Nitmiluk (Katherine Gorge) offers cooler temperatures and peaceful natural settings. West MacDonnell...
--------------------------------------------------


Query: cultural experiences and wildlife viewing
Model: - **Tjoritja / West MacDonnell National Park** – Explore ancient ecological patterns, native culture...
Reference: Kakadu National Park offers rich cultural experiences with Aboriginal rock art and diverse wildlife....
--------------------------------------------------


Query: scenic drives and photography spots
Model: Here are some places with relevant reviews:

- **Tjoritja / West MacDonnell National Park** (Scenic ...
Reference: West MacDonnell National Park offers spectacular scenic drives with excellent photography opportunit...
--------------------------------------------------


Query: quiet camping and stargazing
Model: - Nitmiluk (Katherine Gorge) | Source: GoogleMaps | Stars: 4.4 | Review: Camp ground is quiet & rela...
Reference: West MacDonnell National Park offers quiet camping spots with excellent stargazing opportunities. De...
--------------------------------------------------


Query: easy hikes for beginners
Model: Based on reviews and stars, here are suitable recommendations for you:

- **Nitmiluk (Katherine Gorg...
Reference: Nitmiluk (Katherine Gorge) offers easy walking trails suitable for beginners. West MacDonnell Nation...
--------------------------------------------------


Query: places with good facilities and amenities
Model: - Kakadu
- West MacDonnell – Ormiston Gorge
- Alice Springs Desert Park
- Kakadu Gunlom Falls...
Reference: Kakadu National Park has well-developed facilities and visitor centers. Uluru-Kata Tjuta offers comp...
--------------------------------------------------


Query: water activities and boat tours
Model: Based on the reviews and stars for places you've listed, here are recommendations for water activiti...
Reference: Nitmiluk (Katherine Gorge) offers excellent boat tours and water activities. Kakadu National Park pr...
--------------------------------------------------


GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Query: places to visit during wet season
Model: - **Kakadu** (Source: GoogleMaps) | Stars: 3.9 | Review: The wet season is during November to Februa...
Reference: Kakadu National Park is particularly beautiful during the wet season with flowing waterfalls. Nitmil...
--------------------------------------------------
Running COMET evaluation...


Predicting DataLoader 0: 100%|██████████| 2/2 [00:02<00:00,  1.27s/it]



DETAILED COMET EVALUATION RESULTS

📊 OVERALL STATISTICS:
   Total Queries: 10
   Average Score: 0.5644
   Min Score: 0.4670
   Max Score: 0.7214

📈 SCORE DISTRIBUTION:
   Excellent (≥0.8): 0 (0.0%)
   Good (0.6-0.8):  4 (40.0%)
   Fair (0.4-0.6):  6 (60.0%)
   Poor (<0.4):     0 (0.0%)

INDIVIDUAL RESULTS

🟠 Query 1: best waterfalls and swimming spots
   COMET Score: 0.5550
   Model Response: - **Nitmiluk (Katherine Gorge)**
- **Kakadu National Park – Gunlom Falls**
- **Kakadu**
- **Uluru-Kata Tjuta** (with swimming spots)...
   Reference: Nitmiluk (Katherine Gorge) offers great waterfalls and swimming holes. Kakadu National Park has Gunlom Falls with refreshing pools perfect for swimming. West MacDonnell National Park has natural water...

🟠 Query 2: family-friendly walks and sunset views
   COMET Score: 0.5522
   Model Response: - **West MacDonnell – Ormiston Gorge**  
  *Difficulty: Easy*  
  *Viewing Conditions: Good*  
  *Ways to Adapt: Get used to long walks (about 3.3 hours eac