This library tests the ethics of language models using natural adversarial texts.
The tool keeps the required code short and simple, so validation takes little effort. It is also extensible: by inheriting from the base class, it can be applied to any language model and any problem setting.
```bash
pip install prompt2slip
```
The simplest example looks like this:

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

from prompt2slip import CLMAttacker

# Load a pre-trained causal language model and its tokenizer.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# The prompt to perturb and the word we want to make the model emit.
base_text = ["This project respects pytorch developers."]
target_words = ["keras"]
target_ids = torch.tensor(tokenizer.convert_tokens_to_ids(target_words), dtype=torch.long)

# Search for a natural adversarial prompt that makes the target word appear.
attacker = CLMAttacker(model, tokenizer)
output = attacker.attack_by_text(base_text, target_ids)
```
More realistic use cases are available in the examples directory.
"prompt2slip" provides the function to search for prompts which cause appearance of any specific word against a pre trained natural language generation model. Furthermore, with user customization, it can be applied to a wide range of tasks, including classification tasks.If you want to generate a hostile sample for a classification model, you can simply override the method to compute the adversarial loss function to generate a natural adversarial text.
The unique feature of this library is that it can generate test cases for verifying the risks of a pre-trained natural language model with only a few lines of code.
Install poetry, then install the dependencies:

```bash
poetry install
```

Run the tests with pytest:

```bash
poetry run pytest --cov .
```
Thanks goes to these wonderful people (emoji key):
- Aki Fukuchi: 💻 🐛 🖋 💡 📹 👀 🧑🏫
- Takashi MIMA: 🐛 💻 📖
- Koga Kobayashi: 💡 🤔 💬 👀
- Uno-Takashi: 🐛 💻 📖 🤔 📦 👀