Identify & remediate your AI models' security risks with Mindgard's market leading attack library. Mindgard covers many threats including:
β Jailbreaks
β Prompt Injection
β Model Inversion
β Extraction
β Poisoning
β Evasion
β Membership Inference
Mindgard CLI is fully integrated with Mindgard's platform to help you identify and triage threats, select remediations, and track your security posture over time.
Test continuously in your MLOps pipeline to identify model posture changes from customisation activities including prompt engineering, RAG, fine-tuning, and pre-training.
pip install mindgard
mindgard login
If you are a mindgard enterprise customer, login to your enterprise instance using the command:
mindgard login --instance <name>
Replace <name>
with the instance name provided by your Mindgard representative. This instance name identifies your SaaS, private tenant, or on-prem deployment.
To perform a bulk deployment:
- Login and Configure: Login and Configure the Mindgard CLI on a test workstation
- Provision Files: Provision the files contained in the
.mindgard/
folder within your home directory to your target instances via your preferred deployment mechanism.
The .mindgard/
folder contains:
token.txt
: A JWT for authentication.instance.txt
(enterprise only): Custom instance configuration for your SaaS or private tenant.
mindgard sandbox mistral
mindgard sandbox cfp_faces
Our testing infrastructure can be pointed at your models using the CLI.
Testing an external model uses the test
command and can target either LLMs or Image Classifiers
mindgard test <name> --url <url> <other settings>
mindgard test my-model-name \
--url http://127.0.0.1/infer \ # url to test
--selector '["response"]' \ # JSON selector to match the textual response
--request-template '{"prompt": "[INST] {system_prompt} {prompt} [/INST]"}' \ # how to format the system prompt and prompt in the API request
--system-prompt 'respond with hello' # system prompt to test the model with
Image models require a few more parameters than LLMs so we recommend using a configuration file:
target = "my-custom-model"
model-type = "image"
api_key = "hf_###"
url = "https://####.@@@@.aws.endpoints.huggingface.cloud"
dataset = "beans"
labels='''{
"0": "angular_leaf_spot",
"1": "bean_rust",
"2": "healthy"
}'''
After saving as image-model-config.toml
, it can be used in the test command as follows:
mindgard test --config=image-model-config.toml
Click here for more on our datasets, labels and our supported API.
A preflight check is run automatically when submitting a new test, but if you want to invoke it manually:
mindgard validate --url <url> <other settings>
mindgard validate \
--url http://127.0.0.1/infer \ # url to test
--selector '["response"]' \ # JSON selector to match the textual response
--request-template '{"prompt": "[INST] {system_prompt} {prompt} [/INST]"}' \ # how to format the system prompt and prompt in the API request
--system-prompt 'respond with hello' # system prompt to test the model with
You can specify the settings for the mindgard test
command in a TOML configuration file. This allows you to manage your settings in a more structured way and avoid passing them as command-line arguments.
Then run: mindgard test --config-file mymodel.toml
There are examples of what the configuration file (mymodel.toml
) might look like here in the examples/ folder
Here are two examples:
This example uses the built in preset settings for openai. Presets exist for openai
, huggingface
, and anthropic
target = "my-model-name"
preset = "openai"
api_key= "CHANGE_THIS_TO_YOUR_OPENAI_API_KEY"
system-prompt = '''
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
'''
You will need to substitute your own api_key
value.
The target
setting is an identifier for the model you are testing within the Mindgard platform, tests for the same model will be grouped and traceable over time.
Altering the system-prompt
enables you to compare results with different system prompts in use. Some of Mindgard's tests assess the efficacy of your system prompt.
Any of these settings can also be passed as command line arguments. e.g. mindgard test my-model-name --system-prompt 'You are...'
. This may be useful to pass in a dynamic value for any of these settings.
This example shows how you might test OpenAI if the preset did not exist. With the request_template
and selector
settings you can interface with any JSON API.
target = "my-model-name"
url = "https://api.openai.com/v1/chat/completions"
request_template = '''
{
"messages": [
{"role": "system", "content": "{system_prompt}"},
{"role": "user", "content": "{prompt}"}],
"model": "gpt-3.5-turbo",
"temperature": 0.0,
"max_tokens": 1024
}
'''
selector = '''
choices[0].message.content
'''
headers = "Authorization: Bearer CHANGE_THIS_TO_YOUR_OPENAI_API_KEY"
system_prompt = '''
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
'''
The request_template
setting specifies how to structure an outgoing message to the model. You will need to specify the {system_prompt}
and {prompt}
placeholders so that Mindgard knows how to pass this information to your custom API.
The url
setting should point to an inference endpoint for your model under test. Mindgard will POST messages here formatted by the above request_template
setting.
The selector
setting is a JSON selector and specifies how to extract the model's response from the API response.
The headers
setting allows you to specify a custom HTTP header to include with outgoing requests, for example to implement a custom authentication method.
We have a fixed set of datasets to chose from covering diverse domains such as facial recognition, medical imaging, satellite imagery, and handwritten digit recognition, allowing for a suite of different custom models to be tested.
CLI Dataset | Domain | Source/Name |
---|---|---|
mri | Classification of Alzheimers based on MRI scans | HuggingFace Alzheimer_MRI |
xray | Classification of Pneumonia based on chest x-rays | HuggingFace chest-xray-classification |
rvltest_mini | Classification of documents as letter, memo, etc | HuggingFace rvlTest |
eurosat | Classification of satellite images by terrain features | HuggingFace eurosat-demo |
mnist | Classification of handwritten digits 0 - 9 | TorchVision MNIST |
beans | Classification of leaves as either healthy or unhealthy. | HuggingFace beans |
Many image classifiers don't return probabilities for all classes. A config is required to make sure we're aware of all the labels and tensor indexes for the classes you're going to send us.
For a eurosat model,
{
"0": "AnnualCrop",
"1": "Forest",
"2": "HerbaceousVegetation",
"3": "Highway",
"4": "Industrial",
"5": "Pasture",
"6": "PermanentCrop",
"7": "Residential",
"8": "River",
"9": "SeaLake"
}
Mindgard supports any image model compatible with an API compatible with HuggingFace's InferenceEndpoints for image classifiers.
curl "https://address.com/model" \
-X POST \
--data-binary '@cats.jpg' \
-H "Accept: application/json" \
-H "Content-Type: image/jpeg"
The image in bytes will be sent in the data field of the POST request, and the HTTP response body should include predictions in the form:
[
{
"label": "stretcher",
"score": 0.44380655884742737
},
{
"label": "basketball",
"score": 0.08756192773580551
},
{
"label": "prison, prison house",
"score": 0.06375777721405029
},
{
"label": "scoreboard",
"score": 0.043840788304805756
},
{
"label": "neck brace",
"score": 0.029874464496970177
}
]
The exit code of a test will be non-zero if the test identifies risks above your risk threshold. To override the default risk-threshold pass --risk-threshold 50
. This will cause the CLI to exit with an non-zero exit status if any test results in a risk score over 50.
See an example of this in action here: https://github.com/Mindgard/mindgard-github-action-example
You have the option to set the parallelism parameter which sets the maximum amount of requests you want to target your model concurrently. This enables you to protect your model from getting too many requests.
We require that your model responds within 60s so set parallelism accordingly (it should be less than the number of requests you can serve per minute).
Then run: mindgard test --config-file mymodel.toml --parallelism X
You can provide the flag mindgard --log-level=debug <command>
to get some more info out of whatever command you're running. On unix-like systems, mindgard --log-level=debug test --config=<toml> --parallelism=5 2> stderr.log
will write stdout and stderr to file.
When running tests with huggingface-openai
preset you may encounter compatibility issues. Some models, e.g. llama2, mistral-7b-instruct are not fully compatible with the OpenAI system. This can manifest in Template errors which can be seen by setting --log-level=debug
:
DEBUG Received 422 from provider: Template error: unknown method: string has no method named strip (in <string>:1)
DEBUG Received 422 from provider: Template error: syntax error: Conversation roles must alternate user/assistant/user/assistant/... (in <string>:1)
Try using the simpler huggingface
preset, which provides more compatibility through manual configuration, but sacrifices chat completion support.
From our experience, newer versions of models have started including the correct jinja templates so will not require config adjustments.
We would like to thank and acknowledge various research works from the Adversarial Machine Learning community, which inspired and informed the development of several AI security tests accessible through Mindgard CLI.
Jiang, F., Xu, Z., Niu, L., Xiang, Z., Ramasubramanian, B., Li, B., & Poovendran, R. (2024). ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs. arXiv [Cs.CL]. Retrieved from http://arxiv.org/abs/2402.11753
Russinovich, M., Salem, A., & Eldan, R. (2024). Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack. arXiv [Cs.CR]. Retrieved from http://arxiv.org/abs/2404.01833
Goodside, R. LLM Prompt Injection Via Invisible Instructions in Pasted Text, Retreved from https://x.com/goodside/status/1745511940351287394
Yuan, Y., Jiao, W., Wang, W., Huang, J.-T., He, P., Shi, S., & Tu, Z. (2024). GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher. arXiv [Cs.CL]. Retrieved from http://arxiv.org/abs/2308.06463