# Privacy Audit - DPO Ablation Training (Stage 2)

Train two DPO variants for the canary ablation experiment:
- **Section A**: DPO-no-canary (preference data without canary pairs)
- **Section B**: DPO-with-canary (preference data with canary pairs)

Both variants use the same SFT base model and identical hyperparameters.

**Prerequisites:**
1. Upload the following files to Colab:
   - `data/wiki_trimmed_with_canary.jsonl`
   - `data/canary_output.txt`
   - `models/stage1_sft/` folder (expected at `./stage1_sft` in Colab; see `SFT_MODEL_DIR` in section 3)
   - `src/prepare_preference_data.py`
   - `src/train_dpo.py`

## 1. Install Dependencies

In [48]:
!pip install -q datasets transformers peft trl accelerate

## 2. Check GPU

In [49]:
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("Warning: No GPU detected. DPO training requires a GPU.")

PyTorch version: 2.9.0+cu126
CUDA available: True
GPU: Tesla T4
GPU Memory: 15.8 GB


## 3. Configure Paths

In [54]:
import os

# Base model (downloaded from HuggingFace)
BASE_MODEL_NAME = "Qwen/Qwen2.5-0.5B-Instruct"

# Uploaded paths (adjust based on your Colab upload location)
SFT_MODEL_DIR = "./stage1_sft"
WIKI_FILE = "./data/wiki_trimmed_with_canary.jsonl"
CANARY_FILE = "./data/canary_output.txt"
PREPARE_SCRIPT = "./src/prepare_preference_data.py"
TRAIN_SCRIPT = "./src/train_dpo.py"

# Output paths
DATA_NO_CANARY = "./data/preference_data_no_canary.jsonl"
DATA_WITH_CANARY = "./data/preference_data_with_canary.jsonl"
OUTPUT_NO_CANARY = "./stage2_dpo_no_canary"
OUTPUT_WITH_CANARY = "./stage2_dpo_with_canary"

# Verify uploaded files
required_files = [
    (SFT_MODEL_DIR, "SFT model directory"),
    (WIKI_FILE, "Wiki data file"),
    (CANARY_FILE, "Canary file"),
    (PREPARE_SCRIPT, "Preference data script"),
    (TRAIN_SCRIPT, "DPO training script"),
]
all_ok = True
for path, desc in required_files:
    exists = os.path.exists(path)
    status = "OK" if exists else "MISSING"
    print(f"  [{status}] {desc}: {path}")
    if not exists:
        all_ok = False

if all_ok:
    print("\nAll files verified!")
else:
    print("\nSome files are missing. Please upload them before proceeding.")

  [OK] SFT model directory: ./stage1_sft
  [OK] Wiki data file: ./data/wiki_trimmed_with_canary.jsonl
  [OK] Canary file: ./data/canary_output.txt
  [OK] Preference data script: ./src/prepare_preference_data.py
  [OK] DPO training script: ./src/train_dpo.py

All files verified!


## 4. Prepare Preference Data (Two Variants)

Generate the no-canary and with-canary preference data with the same seed, so that
the normal (non-canary) preference pairs are identical across both variants.

In [51]:
# Generate both variants
!python {PREPARE_SCRIPT} --seed 42

[INFO] Loading data...
[INFO] Loaded 10000 wiki texts, 10 canaries
[INFO] Generating no-canary variant (seed=42)...
[DONE] Saved 1912 pairs to data/preference_data_no_canary.jsonl
[INFO] Generating with-canary variant (seed=42)...
[DONE] Saved 1932 pairs to data/preference_data_with_canary.jsonl
[INFO] Verifying data equivalence...
[OK] Normal preference pairs are identical across variants.


In [52]:
# Verify generated files
import json

for path, label in [(DATA_NO_CANARY, "no-canary"), (DATA_WITH_CANARY, "with-canary")]:
    with open(path, encoding="utf-8") as f:  # data contains non-ASCII text
        lines = f.readlines()
    print(f"{label}: {len(lines)} pairs")
    sample = json.loads(lines[0])
    print(f"  Sample prompt: {sample['prompt'][:80]}...")

no-canary: 1912 pairs
  Sample prompt: Summarize the following text in one sentence:

Yener Yörük (born May 25, 1963 in...
with-canary: 1932 pairs
  Sample prompt: Summarize the following text in one sentence:

Yener Yörük (born May 25, 1963 in...
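
The equivalence check reported by the preparation script can also be reproduced independently of it. The sketch below is a minimal, hypothetical helper (`load_pairs` and `diff_variants` are not part of `prepare_preference_data.py`). It assumes each JSONL line is a JSON object and treats two pairs as equal only when all of their fields match.

```python
import json
from collections import Counter


def load_pairs(path):
    """Read a JSONL file and return one canonical JSON string per record."""
    with open(path, encoding="utf-8") as f:
        # sort_keys makes the string independent of field order in the file
        return [
            json.dumps(json.loads(line), sort_keys=True, ensure_ascii=False)
            for line in f
            if line.strip()
        ]


def diff_variants(no_canary_path, with_canary_path):
    """Return (missing, extra) as lists of dicts.

    missing: pairs in the no-canary file that are absent from the with-canary file
    extra:   pairs that appear only in the with-canary file
    """
    base = Counter(load_pairs(no_canary_path))
    full = Counter(load_pairs(with_canary_path))
    # Counter subtraction keeps duplicates, so repeated pairs are counted correctly
    missing = [json.loads(s) for s in (base - full).elements()]
    extra = [json.loads(s) for s in (full - base).elements()]
    return missing, extra
```

With the paths from section 3, `missing, extra = diff_variants(DATA_NO_CANARY, DATA_WITH_CANARY)` should give an empty `missing` list if the normal pairs really are shared. For the run above, `extra` would be expected to hold the 20 additional pairs (1932 - 1912).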


## 5. Section A: Train DPO-no-canary

Train DPO using preference data **without** canary pairs.
Output: `./stage2_dpo_no_canary/` in Colab (extracted to `models/stage2_dpo_no_canary/` locally after download)
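
For orientation, each `prompt`/`chosen`/`rejected` record contributes one term to the DPO objective, which rewards the policy for raising the log-probability of the chosen response relative to the rejected one, measured against a frozen reference model. The sketch below computes that per-pair loss in plain Python from sequence log-probabilities. It illustrates the published DPO formula and is not the code in `train_dpo.py`; the function name and the `beta=0.1` value are assumptions, since the script's actual settings are not shown here.

```python
import math


def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    margin = (log pi(chosen) - log pi_ref(chosen))
           - (log pi(rejected) - log pi_ref(rejected))
    All arguments are summed token log-probabilities of a full response.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)); branch to avoid overflow in exp()
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy equals the reference model the margin is zero and the loss is `log(2)`; it falls below that as the policy favours the chosen response more strongly than the reference does.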

In [56]:
!python {TRAIN_SCRIPT} \
    --preference-data {DATA_NO_CANARY} \
    --output-dir {OUTPUT_NO_CANARY} \
    --sft-model {SFT_MODEL_DIR} \
    --base-model {BASE_MODEL_NAME}

Privacy Audit - DPO Training (Stage 2)
[INFO] Treating --base-model as HuggingFace model ID: Qwen/Qwen2.5-0.5B-Instruct

[INFO] Loading tokenizer...
[OK] Tokenizer loaded. Vocab size: 151665

[INFO] Loading SFT model (Stage 1)...
`torch_dtype` is deprecated! Use `dtype` instead!
Loading weights: 100% 290/290 [00:00<00:00, 770.08it/s, Materializing param=model.norm.weight]                              
[OK] SFT model loaded!

[INFO] Loading preference dataset from ./data/preference_data_no_canary.jsonl...
[OK] Dataset loaded. Number of examples: 1912
[INFO] Sample data: {'prompt': 'Summarize the following text in one sentence:\n\nYener Yörük (born May 25, 1963 in Manisa) is a Turkish physician specialising in thoracic surgery, a university professor, and Chancellor (Rector) of the Trakya University, Edirne 2012-2016.\n\nBiograph', 'chosen': 'The text discusses Yener Yörük (born May 25, 1963 in Manisa)...', 'rejected': 'This is not relevant to my knowledge.'}

[INFO] Configuring DPO Trai

In [57]:
# Verify no-canary model output
print("DPO-no-canary model files:")
!ls -la {OUTPUT_NO_CANARY}/

DPO-no-canary model files:
total 19660
drwxr-xr-x 4 root root     4096 Feb 10 08:01 .
drwxr-xr-x 1 root root     4096 Feb 10 07:04 ..
-rw-r--r-- 1 root root      980 Feb 10 08:01 adapter_config.json
-rw-r--r-- 1 root root  8663400 Feb 10 08:01 adapter_model.safetensors
-rw-r--r-- 1 root root     2507 Feb 10 08:01 chat_template.jinja
drwxr-xr-x 2 root root     4096 Feb 10 07:56 checkpoint-100
drwxr-xr-x 2 root root     4096 Feb 10 08:01 checkpoint-120
-rw-r--r-- 1 root root     2472 Feb 10 08:01 README.md
-rw-r--r-- 1 root root      665 Feb 10 08:01 tokenizer_config.json
-rw-r--r-- 1 root root 11421892 Feb 10 08:01 tokenizer.json
-rw-r--r-- 1 root root     6097 Feb 10 08:01 training_args.bin


## 6. Section B: Train DPO-with-canary

Train DPO using preference data **with** canary pairs.
Output: `./stage2_dpo_with_canary/` in Colab (extracted to `models/stage2_dpo_with_canary/` locally after download)

In [58]:
!python {TRAIN_SCRIPT} \
    --preference-data {DATA_WITH_CANARY} \
    --output-dir {OUTPUT_WITH_CANARY} \
    --sft-model {SFT_MODEL_DIR} \
    --base-model {BASE_MODEL_NAME}

Privacy Audit - DPO Training (Stage 2)
[INFO] Treating --base-model as HuggingFace model ID: Qwen/Qwen2.5-0.5B-Instruct

[INFO] Loading tokenizer...
[OK] Tokenizer loaded. Vocab size: 151665

[INFO] Loading SFT model (Stage 1)...
`torch_dtype` is deprecated! Use `dtype` instead!
Loading weights: 100% 290/290 [00:00<00:00, 905.28it/s, Materializing param=model.norm.weight]                              
[OK] SFT model loaded!

[INFO] Loading preference dataset from ./data/preference_data_with_canary.jsonl...
Generating train split: 1932 examples [00:00, 228773.76 examples/s]
[OK] Dataset loaded. Number of examples: 1932
[INFO] Sample data: {'prompt': 'Summarize the following text in one sentence:\n\nYener Yörük (born May 25, 1963 in Manisa) is a Turkish physician specialising in thoracic surgery, a university professor, and Chancellor (Rector) of the Trakya University, Edirne 2012-2016.\n\nBiograph', 'chosen': 'The text discusses Yener Yörük (born May 25, 1963 in Manisa)...', 'rejected':

In [59]:
# Verify with-canary model output
print("DPO-with-canary model files:")
!ls -la {OUTPUT_WITH_CANARY}/

DPO-with-canary model files:
total 19660
drwxr-xr-x 4 root root     4096 Feb 10 08:32 .
drwxr-xr-x 1 root root     4096 Feb 10 07:04 ..
-rw-r--r-- 1 root root      980 Feb 10 08:32 adapter_config.json
-rw-r--r-- 1 root root  8663400 Feb 10 08:32 adapter_model.safetensors
-rw-r--r-- 1 root root     2507 Feb 10 08:32 chat_template.jinja
drwxr-xr-x 2 root root     4096 Feb 10 08:27 checkpoint-100
drwxr-xr-x 2 root root     4096 Feb 10 08:32 checkpoint-121
-rw-r--r-- 1 root root     2476 Feb 10 08:32 README.md
-rw-r--r-- 1 root root      665 Feb 10 08:32 tokenizer_config.json
-rw-r--r-- 1 root root 11421892 Feb 10 08:32 tokenizer.json
-rw-r--r-- 1 root root     6097 Feb 10 08:32 training_args.bin
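
Since the two variants are meant to differ only in their training data, it can be worth confirming that the saved adapter configurations match. This is a minimal sketch, assuming both output directories contain the `adapter_config.json` shown in the listings above. `compare_adapter_configs` is a hypothetical helper, and it only covers the adapter settings, not trainer hyperparameters such as the learning rate, which are stored in `training_args.bin`.

```python
import json
import os


def compare_adapter_configs(dir_a, dir_b, filename="adapter_config.json"):
    """Return {key: (value_a, value_b)} for every key whose value differs."""
    configs = []
    for d in (dir_a, dir_b):
        with open(os.path.join(d, filename), encoding="utf-8") as f:
            configs.append(json.load(f))
    cfg_a, cfg_b = configs

    diffs = {}
    for key in sorted(set(cfg_a) | set(cfg_b)):
        # .get() yields None for a key that exists in only one of the files
        if cfg_a.get(key) != cfg_b.get(key):
            diffs[key] = (cfg_a.get(key), cfg_b.get(key))
    return diffs
```

Calling `compare_adapter_configs(OUTPUT_NO_CANARY, OUTPUT_WITH_CANARY)` returns an empty dict when the two files agree. List-valued fields are compared in order, so a reordered list would be reported as a difference even if its contents are the same.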


## 7. (Optional) Upload Models to Google Drive

In [34]:
# Uncomment to mount Google Drive and copy models
# from google.colab import drive
# drive.mount('/content/drive')
#
# import shutil
# drive_dest = "/content/drive/MyDrive/privacy-audit/models"
# os.makedirs(drive_dest, exist_ok=True)
#
# shutil.copytree(OUTPUT_NO_CANARY, f"{drive_dest}/stage2_dpo_no_canary", dirs_exist_ok=True)
# shutil.copytree(OUTPUT_WITH_CANARY, f"{drive_dest}/stage2_dpo_with_canary", dirs_exist_ok=True)
# print("Models uploaded to Google Drive!")

## 8. Download Models

Download the trained models to your local machine:
- Right-click the model directories in the Colab file browser to download
- Or use the zip cell below

In [60]:
import shutil

# Zip no-canary model
shutil.make_archive("/content/stage2_dpo_no_canary", 'zip', OUTPUT_NO_CANARY)
print("Created /content/stage2_dpo_no_canary.zip")

# Zip with-canary model
shutil.make_archive("/content/stage2_dpo_with_canary", 'zip', OUTPUT_WITH_CANARY)
print("Created /content/stage2_dpo_with_canary.zip")

print("\nDownload these zip files and extract to:")
print("  models/stage2_dpo_no_canary/")
print("  models/stage2_dpo_with_canary/")

Created /content/stage2_dpo_no_canary.zip
Created /content/stage2_dpo_with_canary.zip

Download these zip files and extract to:
  models/stage2_dpo_no_canary/
  models/stage2_dpo_with_canary/
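
Before downloading, the archives can be checked for the adapter files. This is a minimal sketch using the standard-library `zipfile` module; `check_archive` is a hypothetical helper, and the required file names are taken from the directory listings in sections 5 and 6.

```python
import posixpath
import zipfile


def check_archive(zip_path,
                  required=("adapter_config.json", "adapter_model.safetensors")):
    """Return the required file names that are missing from the archive root."""
    with zipfile.ZipFile(zip_path) as zf:
        # testzip() returns the first member with a bad CRC, or None if all pass
        bad = zf.testzip()
        if bad is not None:
            raise zipfile.BadZipFile(f"Corrupt member in {zip_path}: {bad}")
        # Normalise names so "./file" and "file" are treated the same
        names = {posixpath.normpath(n) for n in zf.namelist()}
    return [name for name in required if name not in names]
```

For example, `check_archive("/content/stage2_dpo_no_canary.zip")` returns an empty list when both adapter files are present at the top level of the archive.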


In [61]:
print("DPO ablation training complete!")
print(f"  No-canary model: {OUTPUT_NO_CANARY}")
print(f"  With-canary model: {OUTPUT_WITH_CANARY}")

DPO ablation training complete!
  No-canary model: ./stage2_dpo_no_canary
  With-canary model: ./stage2_dpo_with_canary
