This repo contains a more secure AI Agent that can complete tasks using Slack, Gmail, Firecrawl (scraping and crawling URLs), and files. An LLM generates a prediction of the tools needed to complete each task, and the reference monitor ensures that the AI Agent only calls those tools, preventing the Agent from using unexpected tools during injection attacks.
To run the AI agent with its test suite, run this command.
python3 main.py --model=openai --file_mode=native
--model={llama, openai}
--model_size={small, medium, big}
--file_mode={sandbox, native}
Model size is only relevant to the Llama models. The file mode determines whether prompts that modify files operate on actual files or on files in a sandboxed environment.
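For example, to run the test suite against a small Llama model with sandboxed file operations (flag values taken from the options above):

```shell
python3 main.py --model=llama --model_size=small --file_mode=sandbox
```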
These files are only generated after running tests.
This file gives warnings about anything that needs to be manually checked to verify correct output for the given test files. It also reports whether the expected files were generated.
This is a copy of the program's sandboxed file system, printed at the end of a run to show exactly how the files were modified.
You will need to create a .env file with a SLACK_BOT_TOKEN to use the Slack bot functions of this agent.
Put your Gmail credentials here if you want to use the Gmail functions of this agent.
Put your OpenAI API key here if you want to use the OpenAI models.
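A minimal .env might look like the following. Only SLACK_BOT_TOKEN is named above; the OPENAI_API_KEY variable name and the token formats are assumptions, so check the code for the exact names it reads:

```shell
SLACK_BOT_TOKEN=xoxb-your-slack-bot-token
OPENAI_API_KEY=sk-your-openai-api-key
```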
These are the initial files needed to successfully complete the given prompts. The AI agents will make modifications in this folder if the selected file_mode is native. Otherwise, the dictionary representation of these files will be used.
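The dictionary representation mentioned above can be pictured as a simple path-to-contents mapping. This is a hypothetical sketch for illustration, not the repo's actual data structure:

```python
# Hypothetical sketch of a sandboxed file system as a dict
# mapping relative paths to file contents.
sandbox_fs = {
    "notes.txt": "meeting at 3pm\n",
    "todo.txt": "- send report\n",
}

def write_file(fs, path, contents):
    """Write (or overwrite) a file in the sandbox."""
    fs[path] = contents

def read_file(fs, path):
    """Read a file from the sandbox, raising if it does not exist."""
    if path not in fs:
        raise FileNotFoundError(path)
    return fs[path]

write_file(sandbox_fs, "report.txt", "done\n")
print(read_file(sandbox_fs, "report.txt"))
```

Because the sandbox is just an in-memory dict, test runs leave the real test_files folder untouched.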
These are the files expected to appear in test_files after all test prompts have completed with a native file_mode.
Test orchestration script. Runs test commands from testing.py, creates numbered directories (001/, 002/, etc.) for each test, generates recordings of LLM/agent interactions, and tests for prompt injection vulnerabilities.
The trusted LLM that takes user prompts and generates a matching XML tree of instructions for the Reference Monitor to enforce. Uses OS.py when using Llama models, otherwise calls OpenAI.
Implements the Llama models for both the LLM and AI Agent. Contains functionality for loading these models and generating responses. This does not implement the OpenAI models.
The untrusted AI Agent that can call tools to complete given prompts. These tool calls must match the previously generated XML tree; otherwise the reference monitor blocks the unauthorized tool use, preventing injection attacks. Uses OS.py when using Llama models, otherwise calls OpenAI.
This is the reference monitor that enforces the XML trees. It parses the XMLs and defines the functions for validating AI agent calls.
This implements the available functions for the AI agent within a Refmon Environment. This includes file operations, Gmail, Slack bots, and web scraping with Firecrawl. Each tool call must be checked by the reference monitor before executing. The RefmonEnvironment uses the XML tree from refmon to validate each Agent tool call against that tree. Unauthorized operations will be blocked.
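The validation flow described above can be sketched roughly as follows. The tag names, function names, and tool names here are assumptions for illustration, not the repo's actual API:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML plan of authorized tool calls, as the trusted
# LLM might produce it for a single task.
AUTHORIZED = ET.fromstring("""
<plan>
    <tool name="read_file"/>
    <tool name="send_slack_message"/>
</plan>
""")

def is_authorized(tree, tool_name):
    """Return True only if the tool appears in the authorized plan."""
    return any(t.get("name") == tool_name for t in tree.findall("tool"))

def call_tool(tree, tool_name, impl, *args):
    """Reference-monitor style gate: block any tool not in the plan."""
    if not is_authorized(tree, tool_name):
        raise PermissionError(f"Blocked unauthorized tool: {tool_name}")
    return impl(*args)

# An injected instruction asking for an unplanned tool is blocked:
try:
    call_tool(AUTHORIZED, "send_email", lambda: None)
except PermissionError as e:
    print(e)
```

The key property is that the allowlist is fixed by the trusted LLM before the untrusted Agent runs, so a prompt injection encountered mid-task cannot expand the set of callable tools.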
This contains the system prompts for the LLM and AI Agent.
This contains the test suite, the expected output for those tests, and checks for whether tests are as expected.
This reads the error.txt file from each test directory and prints the errors in an easy-to-read, deduplicated form.
Used for evaluating the correctness of the LLM's generated XML trees for 20 specific prompts, using the examples in expected_xmls_impor. The test suite in testing.py must be updated to match before using this.
These are the expected XMLs for the impor.py test.