🐢 Open-Source Evaluation & Testing for LLMs and ML models
A curated list of awesome responsible machine learning resources.
DPLL(T)-based verification tool for DNNs
Universal Neurons in GPT2 Language Models
Aira is a series of chatbots developed as an experimentation playground for value alignment.
Extended multi-agent and multi-objective (MaMoRL) environments based on DeepMind's AI Safety Gridworlds, a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. The environments are compatible with OpenAI Gym/Gymnasium and the Farama Foundation's PettingZoo (see the interaction sketch after this list).
RuLES: a benchmark for evaluating rule-following in language models
Scan your AI/ML models for problems before you put them into production.
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
A novel physical adversarial attack tackling the Digital-to-Physical Visual Inconsistency problem.
The official implementation of the paper "Data Contamination Calibration for Black-box LLMs" (ACL 2024)
Code for our paper "ModelObfuscator: Obfuscating Model Information to Protect Deployed ML-Based Systems", published at ISSTA '23
An attack that induces hallucinations in LLMs
Website to track people, organizations, and products (tools, websites, etc.) in AI safety
Measure and Boost Backdoor Robustness
Awesome PrivEx: Privacy-Preserving Explainable AI (PPXAI)
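Since the MaMoRL gridworld entry above claims Gymnasium compatibility, the standard Gymnasium reset/step loop should apply to it. Below is a minimal sketch of that loop; the environment ID `SafetyGridworld-v0` is hypothetical, since the listing does not name the IDs the suite actually registers.

```python
# Minimal Gymnasium interaction loop. The environment ID below is
# hypothetical -- substitute whichever ID the gridworld suite registers.
import gymnasium as gym

env = gym.make("SafetyGridworld-v0")  # hypothetical ID
obs, info = env.reset(seed=0)

done = False
while not done:
    action = env.action_space.sample()  # random policy, for illustration only
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated

env.close()
```

The multi-agent variants would instead follow PettingZoo's agent-iteration API, but the single-agent loop above is the simplest way to smoke-test a Gymnasium-compatible environment.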