# ADL 2025 Final - Jailbreak Olympics

在 Colab 上執行推理和評估

## 重要提示
1. 確保選擇 GPU（Runtime -> Change runtime type -> GPU -> A100）
2. 上傳整個專案到 Colab（或從 GitHub 克隆）
3. 按照順序執行每個 cell


## 1. 環境設置


In [1]:
# 安裝依賴
# transformers 在導入時需要 torchvision，所以我們需要安裝兼容的版本

# 先安裝 torch 2.8.0
!pip install torch==2.8.0

# 安裝兼容 torch 2.8.0 的 torchvision（transformers 導入時需要）
# torchvision 0.19.0 對應 torch 2.8.0
!pip install torchvision==0.19.0 --force-reinstall

# 安裝其他依賴
# 調整 transformers 版本以兼容 sentence-transformers (sentence-transformers 5.1.0 需要 transformers >= 4.41.0)
!pip install transformers==4.41.0 sentence-transformers==5.1.0 python-dotenv==1.1.1 accelerate==1.10.1 gdown datasets==4.0.0 tqdm==4.67.1

print("依賴安裝完成！")

Collecting torchvision==0.19.0
  Using cached torchvision-0.19.0-cp312-cp312-manylinux1_x86_64.whl.metadata (6.0 kB)
Using cached torchvision-0.19.0-cp312-cp312-manylinux1_x86_64.whl (7.0 MB)
Installing collected packages: torchvision
  Attempting uninstall: torchvision
    Found existing installation: torchvision 0.19.0
    Uninstalling torchvision-0.19.0:
      Successfully uninstalled torchvision-0.19.0
Successfully installed torchvision-0.19.0
Collecting transformers==4.38.2
  Using cached transformers-4.38.2-py3-none-any.whl.metadata (130 kB)
Collecting sentence-transformers==5.1.0
  Using cached sentence_transformers-5.1.0-py3-none-any.whl.metadata (16 kB)
Collecting python-dotenv==1.1.1
  Using cached python_dotenv-1.1.1-py3-none-any.whl.metadata (24 kB)
Collecting accelerate==1.10.1
  Using cached accelerate-1.10.1-py3-none-any.whl.metadata (19 kB)
Collecting tokenizers<0.19,>=0.14 (from transformers==4.38.2)
  Using cached tokenizers-0.15.2-cp312-cp312-manylinux_2_17_x86_64.ma

In [2]:
# 如果從 GitHub 克隆
!git clone https://github.com/LCK0527/ADL
%cd ADL
# 如果已經上傳到 Colab，進入目錄
# %cd /content/2025-ADL-Final-Challenge-Release

# 檢查當前目錄
import os
print(f"當前目錄: {os.getcwd()}")
print(f"專案文件: {os.listdir('.')}")


fatal: destination path 'ADL' already exists and is not an empty directory.
/content/ADL
當前目錄: /content/ADL
專案文件: ['src', 'data', 'run_eval.py', 'run_inference.py', 'models', 'requirements.txt', 'ADL', '.gitignore', '.git', 'README.md', 'results', 'colab_setup.ipynb']


## 2. 執行推理（重寫 Prompts）

這會讀取數據集，使用你的算法重寫 prompts，並保存結果


In [3]:
# 使用小樣本測試（快速驗證）
#!python run_inference.py --dataset data/toy_data.jsonl --algorithm naive_algorithm

# 或使用完整數據集（從 HuggingFace 下載）
!python run_inference.py --dataset theblackcat102/ADL_Final_25W_part1_with_cost --algorithm naive_algorithm


--- Running INFERENCE for Algorithm: naive_algorithm ---
Dataset Path: theblackcat102/ADL_Final_25W_part1_with_cost
Output File: results/naive_algorithm/prompts_ADL_Final_25W_part1_with_cost.jsonl
Loading dataset from theblackcat102/ADL_Final_25W_part1_with_cost...
Local path not found: theblackcat102/ADL_Final_25W_part1_with_cost. Attempting to load from Hugging Face Hub...
PromptSafetyAgent initialized with algorithm: naive_algorithm
Processing 389 prompts in split 'test'...
Detected existing results file at results/naive_algorithm/prompts_ADL_Final_25W_part1_with_cost.jsonl.
Resuming processing from index 389/389 (skipping 389 items already completed).

Inference complete. Rewritten prompts saved to: results/naive_algorithm/prompts_ADL_Final_25W_part1_with_cost.jsonl


## 3. 執行評估（計算分數）

這會評估重寫後的 prompts，計算 safety_score 和 relevance_score


In [4]:
# 評估結果（這會花一些時間，因為需要調用多個模型）
#!python run_eval.py --dataset data/toy_data.jsonl --algorithm naive_algorithm

# 或評估完整數據集
!python run_eval.py --dataset theblackcat102/ADL_Final_25W_part1_with_cost --algorithm naive_algorithm


Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/import_utils.py", line 2317, in __getattr__
    module = self._get_module(self._class_to_module[name])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/import_utils.py", line 2347, in _get_module
    raise e
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/import_utils.py", line 2345, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importl

## 4. 查看結果


In [5]:
# 查看分數摘要
import json

with open('results/naive_algorithm/summary_toy_data.json', 'r') as f:
    summary = json.load(f)
    print("=== 評估結果摘要 ===")
    print(f"總樣本數: {summary['total_samples']}")
    print(f"平均 Safety Score: {summary['average_safety_score']:.4f}")
    print(f"平均 Relevance Score: {summary['average_relevance_score']:.4f}")
    print(f"Final Accuracy: {summary['final_acc']:.6f}")
    if 'weighted_final_acc' in summary:
        print(f"Weighted Final Accuracy: {summary['weighted_final_acc']:.6f}")


FileNotFoundError: [Errno 2] No such file or directory: 'results/naive_algorithm/summary_toy_data.json'

In [None]:
# 查看前幾個樣本的詳細結果
import json

with open('results/naive_algorithm/raw_toy_data.jsonl', 'r') as f:
    for i, line in enumerate(f):
        if i >= 3:  # 只看前3個
            break
        data = json.loads(line)
        print(f"\n=== 樣本 {i+1} ===")
        print(f"原始 Prompt: {data['original_prompt'][:100]}...")
        print(f"重寫 Prompt: {data['rewritten_prompt'][:150]}...")
        print(f"Safety Score: {data['safety_score']}")
        print(f"Relevance Score: {data['relevance_score']}")


## 5. 下載結果（可選）


In [None]:
from google.colab import files

# 下載結果文件
files.download('results/naive_algorithm/summary_toy_data.json')
# files.download('results/naive_algorithm/raw_toy_data.jsonl')
# files.download('results/naive_algorithm/prompts_toy_data.jsonl')


In [None]:
import os
os._exit(0)