# ADL 2025 Final - Jailbreak Olympics

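Run inference and evaluation on Colab.

## Important notes
1. Make sure a GPU runtime is selected (Runtime -> Change runtime type -> GPU -> A100)
2. Upload the whole project to Colab (or clone it from GitHub)
3. Run the cells in order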
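
A quick way to confirm that the runtime actually has a GPU attached is to query `nvidia-smi` before running anything heavy. The snippet below is a minimal sketch; the helper name `gpu_summary` is made up for illustration and is not part of the project.

```python
import shutil
import subprocess

def gpu_summary():
    """Return the GPU names reported by nvidia-smi, or None if no GPU is visible."""
    # nvidia-smi is missing on CPU-only runtimes
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    names = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    return names or None

print(gpu_summary() or "No GPU detected; check Runtime -> Change runtime type")
```

If the printed name is not the accelerator you expected, change the runtime type before installing dependencies, since switching runtimes resets the session.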


## 1. Environment setup


In [1]:
# Install dependencies
# Pin torch and torchvision to compatible versions (torch 2.4.0 + torchvision 0.19.0)
!pip install torch==2.4.0 torchvision==0.19.0

# Install the latest transformers and accelerate to support the Qwen/Qwen3Guard models
# Leave sentence-transformers unpinned to avoid version conflicts
!pip install --upgrade transformers accelerate sentence-transformers python-dotenv gdown datasets tqdm

print("Dependencies installed! Be sure to restart the runtime (Runtime -> Restart runtime)!")

Collecting transformers
  Using cached transformers-4.57.3-py3-none-any.whl.metadata (43 kB)
Using cached transformers-4.57.3-py3-none-any.whl (12.0 MB)
Installing collected packages: transformers
Successfully installed transformers
Dependencies installed! Be sure to restart the runtime (Runtime -> Restart runtime)!
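
After restarting the runtime, it can help to confirm which versions were actually picked up. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical and not part of the project.

```python
from importlib import metadata

def installed_versions(packages):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Report the packages installed by the cell above
for name, version in installed_versions(["torch", "torchvision", "transformers"]).items():
    print(f"{name}: {version or 'not installed'}")
```

This reads package metadata only, so it does not import torch or touch the GPU.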


In [2]:
# If cloning from GitHub
!git clone https://github.com/LCK0527/ADL
%cd ADL
# If the project was already uploaded to Colab, change into its directory instead
# %cd /content/2025-ADL-Final-Challenge-Release

# Check the current directory
import os
print(f"Current directory: {os.getcwd()}")
print(f"Project files: {os.listdir('.')}")


fatal: destination path 'ADL' already exists and is not an empty directory.
/content/ADL
Current directory: /content/ADL
Project files: ['src', 'data', 'run_eval.py', 'run_inference.py', 'models', 'requirements.txt', '.gitignore', '.git', 'README.md', 'results', 'colab_setup.ipynb']
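
The `fatal: destination path 'ADL' already exists` message in the output above appears when the cell is re-run after a successful clone; it is harmless here because `%cd ADL` still succeeds. One way to avoid it is to skip the clone when a checkout is already present. The sketch below assumes that approach; `clone_if_missing` is a hypothetical helper, not something the project provides.

```python
import os
import subprocess

def clone_if_missing(repo_url, dest):
    """Clone repo_url into dest unless dest already holds a git checkout.

    Returns True if a clone was performed, False if it was skipped.
    """
    if os.path.isdir(os.path.join(dest, ".git")):
        return False
    subprocess.run(["git", "clone", repo_url, dest], check=True)
    return True

# Example call (commented out so that nothing is cloned by accident):
# clone_if_missing("https://github.com/LCK0527/ADL", "ADL")
```

Note that skipping the clone also means an existing checkout is not updated; run `git pull` inside the directory if you need the latest commits.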


## 2. Run inference (rewrite prompts)

This step reads the dataset, rewrites the prompts with your algorithm, and saves the results.


In [3]:
# Test on a small sample (quick sanity check)
#!python run_inference.py --dataset data/toy_data.jsonl --algorithm advanced_obfuscation_algorithm

# Or use the full dataset (downloaded from HuggingFace)
# Uses advanced_obfuscation_algorithm, which targets low safety_score cases and avoids obvious jailbreak keywords
!python run_inference.py --dataset theblackcat102/ADL_Final_25W_part1_with_cost --algorithm advanced_obfuscation_algorithm


--- Running INFERENCE for Algorithm: advanced_obfuscation_algorithm ---
Dataset Path: theblackcat102/ADL_Final_25W_part1_with_cost
Output File: results/advanced_obfuscation_algorithm/prompts_ADL_Final_25W_part1_with_cost.jsonl
Loading dataset from theblackcat102/ADL_Final_25W_part1_with_cost...
Local path not found: theblackcat102/ADL_Final_25W_part1_with_cost. Attempting to load from Hugging Face Hub...
Dataset loading or setup failed: Algorithm 'advanced_obfuscation_algorithm' not found in algorithms.py. Make sure the function name is correct.
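
The run above stops because `advanced_obfuscation_algorithm` is not defined in `algorithms.py`, so no prompts file is written. Before re-running, it may help to list the function names that the file actually defines and compare them with the `--algorithm` argument. This is a minimal sketch using the standard `ast` module; `list_functions` is a hypothetical helper, and the location of `algorithms.py` inside the repository is an assumption.

```python
import ast

def list_functions(source_path):
    """Return the names of the top-level functions defined in a Python source file."""
    with open(source_path, "r", encoding="utf-8") as f:
        tree = ast.parse(f.read(), filename=source_path)
    return [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]

# Example call; the path below is an assumption about the repository layout:
# print(list_functions("src/algorithms.py"))
```

Parsing the file instead of importing it avoids loading any models that the module might initialise at import time.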


## 3. Run evaluation (compute scores)

This step evaluates the rewritten prompts and computes safety_score and relevance_score.


In [4]:
# Evaluate the results (this takes a while because several models are called)
#!python run_eval.py --dataset data/toy_data.jsonl --algorithm advanced_obfuscation_algorithm

# Or evaluate the full dataset
# Uses advanced_obfuscation_algorithm, which targets low safety_score cases and avoids obvious jailbreak keywords
!python run_eval.py --dataset theblackcat102/ADL_Final_25W_part1_with_cost --algorithm advanced_obfuscation_algorithm


2025-11-29 17:44:46.866343: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-11-29 17:44:46.884591: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1764438286.906238   29924 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1764438286.913035   29924 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1764438286.929452   29924 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking 

## 4. View results


In [5]:
# Show the score summary
import json

# Note: the file path below matches the full dataset used with run_inference.py and run_eval.py
# Results for advanced_obfuscation_algorithm
with open('results/advanced_obfuscation_algorithm/summary_ADL_Final_25W_part1_with_cost.json', 'r') as f:
    summary = json.load(f)
    print("=== Evaluation summary ===")
    print(f"Total samples: {summary['total_samples']}")
    print(f"Average Safety Score: {summary['average_safety_score']:.4f}")
    print(f"Average Relevance Score: {summary['average_relevance_score']:.4f}")
    print(f"Final Accuracy: {summary['final_acc']:.6f}")
    if 'weighted_final_acc' in summary:
        print(f"Weighted Final Accuracy: {summary['weighted_final_acc']:.6f}")

FileNotFoundError: [Errno 2] No such file or directory: 'results/advanced_obfuscation_algorithm/summary_ADL_Final_25W_part1_with_cost.json'
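
The `FileNotFoundError` above means the summary file was never written, most likely because the inference step in section 2 failed and evaluation had nothing to score. A guarded load makes that situation easier to read than a traceback. The sketch below assumes the same path and keys as the cell above; `load_summary` is a hypothetical helper.

```python
import json
import os

def load_summary(path):
    """Return the parsed summary dict, or None if the file does not exist."""
    if not os.path.exists(path):
        print(f"Summary not found: {path}. Check that inference and evaluation finished without errors.")
        return None
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

summary = load_summary("results/advanced_obfuscation_algorithm/summary_ADL_Final_25W_part1_with_cost.json")
if summary is not None:
    print(f"Final Accuracy: {summary['final_acc']:.6f}")
```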

In [None]:
# Show detailed results for the first few samples
import json

# Note: the file path below matches the full dataset used with run_inference.py and run_eval.py
# Results for advanced_obfuscation_algorithm
with open('results/advanced_obfuscation_algorithm/raw_ADL_Final_25W_part1_with_cost.jsonl', 'r') as f:
    for i, line in enumerate(f):
        if i >= 3:  # Only look at the first 3
            break
        data = json.loads(line)
        print(f"\n=== Sample {i+1} ===")
        print(f"Original Prompt: {data['original_prompt'][:100]}...")
        print(f"Rewritten Prompt: {data['rewritten_prompt'][:150]}...")
        print(f"Safety Score: {data['safety_score']}")
        print(f"Relevance Score: {data['relevance_score']}")

## 5. Download results (optional)


In [None]:
from google.colab import files

# Download the result files
# Note: the file paths below match the full dataset used with run_inference.py and run_eval.py
# Results for advanced_obfuscation_algorithm
files.download('results/advanced_obfuscation_algorithm/summary_ADL_Final_25W_part1_with_cost.json')
files.download('results/advanced_obfuscation_algorithm/raw_ADL_Final_25W_part1_with_cost.jsonl')
# files.download('results/advanced_obfuscation_algorithm/prompts_ADL_Final_25W_part1_with_cost.jsonl')

In [None]:
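import os
# Exit the kernel process immediately, without cleanup handlers, which forces the Colab kernel to restart
os._exit(0)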