# Qwen2.5-Coder (Hugging Face)

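This notebook installs dependencies and loads a Qwen2.5-Coder model locally on macOS. The loader defaults to `Qwen/Qwen2.5-Coder-1.5B-Instruct` for faster startup, unless the `QWEN_MODEL` environment variable names another model (one cell below sets it to the 7B variant). The device is chosen automatically: Apple Silicon MPS when available, otherwise CPU.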

Notes:
- The first load downloads the model weights (~3–4 GB for the 1.5B model; the 7B model is roughly 15 GB). Ensure you have enough disk space and a stable connection.
- You can switch to larger models later (7B, 32B, etc.) or try quantized GGUF via llama.cpp for lower RAM.
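
Before the first download, it can help to confirm that the volume holding the Hugging Face cache has room for the weights. The snippet below is a minimal sketch using only the standard library. The cache location (`~/.cache/huggingface`) is the usual default and the 5 GiB threshold is an illustrative assumption, not a value taken from this notebook.

```python
import shutil
from pathlib import Path

def nearest_existing(path: Path) -> Path:
    """Walk up from `path` until a directory that exists is found."""
    while not path.exists():
        path = path.parent
    return path

def free_gib(path: Path) -> float:
    """Free space, in GiB, on the volume that holds `path`."""
    return shutil.disk_usage(nearest_existing(path)).free / 2**30

def has_room(path: Path, required_gib: float) -> bool:
    """True when the volume holding `path` has at least `required_gib` free."""
    return free_gib(path) >= required_gib

# Assumed default cache location; it may not exist before the first download.
cache_dir = Path.home() / ".cache" / "huggingface"
# Hypothetical threshold: the 1.5B weights plus some headroom.
REQUIRED_GIB = 5.0

print(f"Free space near {cache_dir}: {free_gib(cache_dir):.1f} GiB")
print("Enough room:", has_room(cache_dir, REQUIRED_GIB))
```

Raise `REQUIRED_GIB` before switching to a larger model.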


## AdvBench dataset (walledai/AdvBench)

This dataset is gated on Hugging Face. To access:
- You must be logged in to Hugging Face and accept the dataset conditions on its page.
- Optionally, log in from a terminal to persist your credentials: `huggingface-cli login`
- The code below will attempt to load via `datasets.load_dataset`. If access is denied, it will fall back to a public CSV from the paper’s GitHub mirror.
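
The try-then-fall-back logic described above can be sketched in isolation. The example below is a generic illustration with stand-in loaders. `first_successful`, `gated_source`, and `public_mirror` are hypothetical names introduced here, and no network access is involved.

```python
from typing import Callable, Iterable, List, Tuple

def first_successful(loaders: Iterable[Tuple[str, Callable[[], List[str]]]]) -> Tuple[str, List[str]]:
    """Try each (name, loader) pair in order and return the first result that loads."""
    errors = []
    for name, load in loaders:
        try:
            return name, load()
        except Exception as exc:  # keep the reason so the final error is informative
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All sources failed: " + "; ".join(errors))

def gated_source() -> List[str]:
    # Stand-in for a gated dataset whose conditions have not been accepted.
    raise PermissionError("access not granted")

def public_mirror() -> List[str]:
    # Stand-in for a public CSV mirror.
    return ["row 1", "row 2"]

source, rows = first_successful([("hub", gated_source), ("mirror", public_mirror)])
print(source, len(rows))  # mirror 2
```

Collecting the error from each source means the final exception reports why every attempt failed, not only the last one.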


In [None]:
# Download AdvBench into data/advbench
# Requires `datasets` and `pandas`; run the install cell below first if they are missing.
import pathlib
from datasets import load_dataset
import pandas as pd

root = pathlib.Path.cwd()
data_dir = root / 'data' / 'advbench'
data_dir.mkdir(parents=True, exist_ok=True)

print('Target directory:', data_dir)

# Try Hugging Face datasets first (requires accepting the dataset conditions)
saved_from_hf = False
try:
    dset = load_dataset('walledai/AdvBench')
    # The dataset is small; save each split to CSV
    for split in dset:
        df = dset[split].to_pandas()
        out = data_dir / f'{split}.csv'
        df.to_csv(out, index=False)
        print('Saved:', out)
    # Set only after every split is written, so a failed save still triggers the fallback
    saved_from_hf = True
except Exception as e:
    print('HF dataset load or save failed (it may be gated); falling back to GitHub CSV. Error:', e)

if not saved_from_hf:
    # Fallback to the GitHub mirror from the paper
    url = 'https://raw.githubusercontent.com/llm-attacks/llm-attacks/main/data/advbench/harmful_behaviors.csv'
    try:
        df = pd.read_csv(url)
        out = data_dir / 'harmful_behaviors.csv'
        df.to_csv(out, index=False)
        print('Saved fallback CSV:', out)
    except Exception as e2:
        raise RuntimeError(f'Failed to fetch AdvBench from both HF and GitHub: {e2}') from e2

print('Done.')

In [None]:
# Install runtime deps. Run once per environment, before any cell that imports them.
import sys, subprocess

pkgs = [
    'transformers>=4.44.0',
    'torch',        # macOS CPU/MPS wheels
    'accelerate>=0.33.0',
    'sentencepiece',
    'huggingface_hub>=0.23.0',
    'datasets>=2.20.0'
]

print('Installing:', pkgs)
subprocess.check_call([sys.executable, '-m', 'pip', 'install', '--upgrade'] + pkgs)
print('Done. You may need to restart the kernel if the torch backend was updated.')

In [None]:
# Optional: override the default 1.5B model with the 7B Instruct model (skip this cell to keep 1.5B)
import os
os.environ['QWEN_MODEL'] = 'Qwen/Qwen2.5-Coder-7B-Instruct'
print('QWEN_MODEL =', os.environ['QWEN_MODEL'])

In [None]:
# Load Qwen2.5-Coder and run a quick test
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
import torch, os

model_id = os.environ.get('QWEN_MODEL', 'Qwen/Qwen2.5-Coder-1.5B-Instruct')

# Device selection: prefer Apple Silicon MPS if available, else CPU
if torch.backends.mps.is_available() and torch.backends.mps.is_built():
    device = torch.device('mps')
else:
    device = torch.device('cpu')
print('Using device:', device)

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device.type == 'mps' else torch.float32,
    device_map=None
)
model.to(device)

prompt = "# Write a Python function to reverse a string with tests."
messages = [
    {"role": "system", "content": "You are Qwen2.5-Coder, a helpful coding assistant."},
    {"role": "user", "content": prompt}
]

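# Qwen Instruct ships a chat template; fall back to the plain prompt when none is defined.
# Checking `chat_template` (not `hasattr(..., 'apply_chat_template')`) matters because the
# method exists on every tokenizer in recent Transformers releases.
if getattr(tokenizer, 'chat_template', None):
    input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
else:
    input_text = prompt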

inputs = tokenizer(input_text, return_tensors='pt').to(device)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

with torch.inference_mode():
    output_ids = model.generate(
        **inputs,
        streamer=streamer,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.2,
        top_p=0.95,
        eos_token_id=tokenizer.eos_token_id
    )

print('\n\nDone.')

## Optional: Download with `huggingface-cli`

If you prefer to pre-download the model instead of letting Transformers fetch it on first use, run the following in a terminal:

```bash
# Log in (needed only for gated or private repos; public models download anonymously)
huggingface-cli login

# Download the 1.5B Instruct model into the default Hugging Face cache.
# Omitting --local-dir keeps the standard cache layout that Transformers reads.
huggingface-cli download Qwen/Qwen2.5-Coder-1.5B-Instruct
```
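
To see whether a model is already in the cache, you can derive its folder name from the repo name. This is a minimal sketch that assumes the usual hub cache layout (`models--<org>--<name>/snapshots/...` under `~/.cache/huggingface/hub`). The root can differ if `HF_HOME` or `HF_HUB_CACHE` is set, and `cache_repo_dir` is a hypothetical helper, not a library function.

```python
from pathlib import Path

def cache_repo_dir(repo_id: str, cache_root: Path) -> Path:
    """Map a model repo id such as 'org/name' to its expected cache folder."""
    return cache_root / ("models--" + repo_id.replace("/", "--"))

# Assumed default cache root; adjust if HF_HOME or HF_HUB_CACHE is set.
default_root = Path.home() / ".cache" / "huggingface" / "hub"
repo_dir = cache_repo_dir("Qwen/Qwen2.5-Coder-1.5B-Instruct", default_root)
print(repo_dir.name)  # models--Qwen--Qwen2.5-Coder-1.5B-Instruct

# A populated `snapshots` directory suggests at least one revision was downloaded.
snapshots = repo_dir / "snapshots"
print("Already cached:", snapshots.is_dir() and any(snapshots.iterdir()))
```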

To try larger variants, replace the repo name:
- `Qwen/Qwen2.5-Coder-7B-Instruct`
- `Qwen/Qwen2.5-Coder-32B-Instruct`

For lower RAM via quantization, consider GGUF with llama.cpp (CPU/GPU offload).

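> Memory note: The 7B model needs roughly 15 GB of RAM/VRAM for the weights alone in float16 (the MPS path above) and about twice that in float32 (the CPU path above). On a Mac with MPS it may still run, but it can be slow or run out of memory (OOM) on low-memory machines. If you hit OOM, try:
> - Close other apps and make sure enough swap space is free.
> - Reduce `max_new_tokens` to limit the KV cache. Note that greedy decoding does not save memory; if you want it, set `do_sample=False`, since `temperature=0.0` with sampling enabled is rejected by Transformers.
> - Switch to a smaller model: `Qwen/Qwen2.5-Coder-1.5B-Instruct`.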
> - Consider quantized GGUF via llama.cpp for lower RAM.
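
The dependence on dtype follows from simple arithmetic: weight memory is roughly the parameter count times the bytes per parameter. The sketch below uses nominal parameter counts (1.5e9 and 7e9) as rounded assumptions, and ignores activations, the KV cache, and framework overhead, so real usage will be higher.

```python
# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}

def weight_gb(num_params: float, dtype: str) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

# Nominal parameter counts; the exact counts differ slightly.
for name, params in [("1.5B", 1.5e9), ("7B", 7.0e9)]:
    for dtype in ("float16", "float32"):
        print(f"{name} {dtype}: ~{weight_gb(params, dtype):.0f} GB")
```

The same arithmetic explains why quantized formats, which use fewer bits per parameter, lower the RAM requirement.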