Pull requests: criticalml-uw/TamperBench

Pull requests list

Add Lab Bench eval
#103 opened Feb 24, 2026 by mruwnik
t_vaccine: Add T-vaccine defense
#100 opened Feb 24, 2026 by tomtseng Draft
infra: added defense hparam sweep scripts
#98 opened Feb 17, 2026 by sdhossain
attacks: Unify chat templating
#97 opened Feb 14, 2026 by tomtseng
Defenses & evals: Unify chat templating
#95 opened Feb 14, 2026 by tomtseng
defense: RSN-Tune (label: defense)
#47 opened Dec 11, 2025 by mruwnik
eval: added lm-eval (label: evaluation)
#36 opened Oct 14, 2025 by psyonp
eval: GPQA Evaluation (label: evaluation)
#30 opened Sep 22, 2025 by MKowal2
attack: Added refusal ablation attack (label: attack)
#27 opened Sep 2, 2025 by NayeemaNonta
attack: added Wanda Pruning (attack) (label: attack)
#26 opened Aug 29, 2025 by esveee
attack: added latent perturbation attack (label: attack)
#23 opened Aug 27, 2025 by psyonp
attack: ported bidirectional fine-tuning attack (label: attack)
#15 opened Aug 12, 2025 by psyonp Draft