LLMAgentOps
The LLMAgentOps Toolkit is a repository that contains the basic structure of an LLM Agent based application built on top of Semantic Kernel. The toolkit is designed to be a starting point for data scientists and developers to experiment, evaluate, and finally deploy their own LLM Agent based applications to production.
The sample MySql Copilot has been implemented using the concept of StateFlow (a Finite State Machine (FSM) based LLM workflow) with Semantic Kernel agents. This is equivalent to the AutoGen Selector Group Chat pattern with a custom selector function.
For more details on StateFlow, refer to the research paper StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows.
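To illustrate the StateFlow idea, the sketch below models the conversation as a finite state machine: each state maps to exactly one agent, the transition function plays the role of the custom selector, and reaching the terminal state (or a turn budget) ends the chat. It is plain Python rather than Semantic Kernel code, and the state names, agent names, and transition rules are hypothetical placeholders, not the ones used by the MySql Copilot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """A single chat turn: who spoke and what was said."""
    sender: str
    content: str


def next_state(state: str, last: Message) -> str:
    """Hypothetical FSM transitions; "End" is the terminal state."""
    if state == "Init":
        return "Observe"
    if state == "Observe":
        return "Solve"
    if state == "Solve":
        return "Execute"
    if state == "Execute":
        # A failed query loops back to Solve; a successful one is verified.
        return "Solve" if last.content.startswith("ERROR") else "Verify"
    return "End"  # Verify (and any unknown state) ends the conversation


# Each non-terminal state is served by exactly one (hypothetical) agent.
STATE_TO_AGENT = {
    "Observe": "ObserveAgent",
    "Solve": "SolveAgent",
    "Execute": "CodeExecuteAgent",
    "Verify": "VerifyAgent",
}


@dataclass
class StateFlowController:
    max_turns: int = 10
    state: str = "Init"
    turns: int = 0

    def select_next_agent(self, history: list[Message]) -> Optional[str]:
        """Selection logic: advance the FSM, return the agent for the new state."""
        last = history[-1] if history else Message("user", "")
        self.state = next_state(self.state, last)
        self.turns += 1
        return STATE_TO_AGENT.get(self.state)  # None once "End" is reached

    def should_terminate(self) -> bool:
        """Termination logic: terminal state reached or turn budget used up."""
        return self.state == "End" or self.turns >= self.max_turns


# Usage: drive the controller with scripted replies instead of real agents.
replies = {
    "ObserveAgent": "Table customers(id, name)",
    "SolveAgent": "SELECT COUNT(*) FROM customers;",
    "CodeExecuteAgent": "[(2,)]",
    "VerifyAgent": "There are 2 customers.",
}
controller = StateFlowController(max_turns=10)
history = [Message("user", "How many customers are there?")]
spoken = []
while not controller.should_terminate():
    agent = controller.select_next_agent(history)
    if agent is None:
        break
    spoken.append(agent)
    history.append(Message(agent, replies[agent]))
print(spoken)  # ['ObserveAgent', 'SolveAgent', 'CodeExecuteAgent', 'VerifyAgent']
```

Keeping selection and termination in one small state machine makes the flow explicit and testable: every possible path through the conversation is visible in `next_state`, instead of being left to an LLM-based speaker selection.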
This toolkit can be used by replacing the MySql Copilot with any other LLM Agent based solution, or it can be enhanced for a specific use case.
The LLMAgentOps architecture can be constructed using the following key components, divided into two phases similar to the DevOps / MLOps / LLMOps development and deployment phases:
- LLM Agent Development Phase (inner loop):
  - Agent Architecture: Designing the agent architecture for the LLM Agent based solution. For this sample, we have used the Semantic Kernel development kit with the Python programming language.
  - Experimentation & Evaluation: Experimentation and evaluation of the LLM Agent based solution. Experimentation is done in `console`, `ui`, or `batch` mode, and evaluation is done using LLM as Judge and Human Evaluation.
- LLM Agent Deployment Phase (outer loop):
  - GitHub Actions: Continuous Integration, Continuous Evaluation, and Continuous Deployment of the LLM Agent based solution, with the addition of Continuous Security for security checks of the LLM Agents.
  - Deployment: Deployment of the LLM Agent based solution in a local or cloud environment.
  - Monitoring: Monitoring the LLM Agent based solution for data collection, performance, and other metrics.
This repository has the following key features:
- Source Code Structure: The source code is structured so that data scientists and developers can easily develop and maintain it together, following the key concept of dividing the code into two parts, `core` and `ops`:
  - Core: The LLM Agent core implementation code.
    - Agents Base Class: The base class for the agents.
    - Agents: All of the agents with their specific prompts and descriptions, for example the Observe Agent.
    - Code Execute Agent: An agent that joins the group of agents like the others, but instead of using an LLM to generate a response, it executes code (in this sample, SQL queries) and returns the result.
    - Group Chat Selection Logic: Selects the appropriate next agent based on the current state of the conversation. In this sample, the concept of StateFlow is used to select the next agent.
    - Group Chat Termination Logic: Terminates the conversation based on the current state of the conversation or the maximum number of turns. In this sample, the concept of StateFlow is used to terminate the conversation.
    - Group Chat: Contains the group chat client that serves the conversation between the user and the agents.
  - Ops: The operational code for the LLM Agent based solution.
    - Observability: The code for logging and monitoring the agents. In this sample, OpenTelemetry is used for logging and monitoring.
    - MySql Interaction: The code for interacting with the MySql database.
    - Deployment: The code for deploying the agents in a local or cloud environment. In this sample, code is provided for deploying the agents to Azure Web App Service. The deployment code includes:
      - Source Module: The core implementation of the agents and group chat.
      - REST API Based App: A REST API based app for calling the agents and getting the response (in this example, FastAPI).
      - Dockerfile: The Dockerfile for building the image of the entire application.
      - Requirements file: The dependencies of the application.
- Experimentation: The experimentation setup, using `console`, `ui`, or `batch` mode.
- Evaluation: The evaluation setup, using LLM as Judge and Human Evaluation.
- Security: The security setup for the security checks of the LLM Agent based solution.
- GitHub Actions: The CI, CE, CD, and CS setup for continuous integration, continuous evaluation, continuous deployment, and continuous security of the LLM Agent based solution.
- Engineering Fundamentals: The engineering fundamentals for the development and maintenance of the LLM Agent based solution.
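The Code Execute Agent differs from the other agents in that it runs code instead of calling an LLM. The following is a minimal sketch of that idea together with a small agents base class. It is not the repository's implementation: `BaseAgent`, `CodeExecuteAgent`, and `respond` are hypothetical names, and an in-memory SQLite database stands in for MySql so the example runs with the Python standard library alone.

```python
import sqlite3
from abc import ABC, abstractmethod


class BaseAgent(ABC):
    """Hypothetical agents base class: every agent has a name, a description,
    and produces a reply from the conversation history."""

    def __init__(self, name: str, description: str) -> None:
        self.name = name
        self.description = description

    @abstractmethod
    def respond(self, history: list[str]) -> str:
        ...


class CodeExecuteAgent(BaseAgent):
    """Joins the group like any other agent, but instead of calling an LLM it
    executes the SQL in the last message and returns the resulting rows."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        super().__init__("CodeExecuteAgent", "Executes SQL queries and returns the rows.")
        self.connection = connection

    def respond(self, history: list[str]) -> str:
        query = history[-1]
        try:
            rows = self.connection.execute(query).fetchall()
        except sqlite3.Error as exc:
            # Report the failure as a chat message so another agent can fix the query.
            return f"ERROR: {exc}"
        return repr(rows)


# Usage with an in-memory SQLite database standing in for MySql.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("Ada",), ("Linus",)])
executor = CodeExecuteAgent(conn)
print(executor.respond(["SELECT COUNT(*) FROM customers"]))  # [(2,)]
print(executor.respond(["SELEC oops"]))  # an "ERROR: ..." message
```

Returning errors as ordinary messages, rather than raising, keeps the group chat running so that the selection logic can route the conversation back to an LLM agent to correct the query.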
This repository includes a sample implementation that can be used as-is by following the steps below. Alternatively, the sample can be replaced with any other LLM Agent based solution or enhanced for a specific use case.
Prerequisites:
- Visual Studio Code with Dev Containers extension.
- Docker Desktop.
- Azure AI Foundry Service.
- Azure OpenAI Chat Model.
- Azure Web App Service (needed only for CD).
- Azure Application Insights (needed only for CD).
Experimentation is the process of designing the agents and testing a hypothesis or a proposed LLM Agents based solution to a problem.
Evaluation is the process of measuring the performance of the LLM Agents based solution, which supports decision making about that solution.
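Batch experimentation and LLM as Judge evaluation can be combined into a single loop: run the agents on each question of a dataset, ask a judge model to grade each answer, and aggregate the scores. The sketch below assumes a hypothetical 1 to 5 scoring scale, prompt template, and dataset fields (`question`, `expected`); the `agent` and `judge` callables are stubs that a real setup would replace with calls to the group chat and to the Azure OpenAI chat model.

```python
import re
from statistics import mean
from typing import Callable

# Hypothetical judge prompt; a real setup would send this to the chat model.
JUDGE_PROMPT = (
    "You are grading an answer to a database question.\n"
    "Question: {question}\nExpected: {expected}\nAnswer: {answer}\n"
    "Reply with 'Score: N' where N is an integer from 1 (wrong) to 5 (fully correct)."
)


def parse_score(judge_reply: str) -> int:
    """Extract the 1-5 score from the judge's reply; raise if it is missing."""
    match = re.search(r"Score:\s*([1-5])\b", judge_reply)
    if match is None:
        raise ValueError(f"No score found in judge reply: {judge_reply!r}")
    return int(match.group(1))


def evaluate_batch(
    dataset: list[dict], agent: Callable[[str], str], judge: Callable[[str], str]
) -> dict:
    """Batch mode: run the agent on every question, then let the judge grade it."""
    scores = []
    for item in dataset:
        answer = agent(item["question"])
        prompt = JUDGE_PROMPT.format(
            question=item["question"], expected=item["expected"], answer=answer
        )
        scores.append(parse_score(judge(prompt)))
    return {"scores": scores, "mean_score": mean(scores)}


# Usage with stubs in place of the real agents and judge model.
dataset = [
    {"question": "How many customers?", "expected": "2"},
    {"question": "Name of customer 1?", "expected": "Ada"},
]


def fake_agent(question: str) -> str:
    return "2" if "How many" in question else "Linus"


def fake_judge(prompt: str) -> str:
    # Stub judge: full marks only when the answer equals the expected value.
    expected = re.search(r"Expected: (.*)", prompt).group(1)
    answer = re.search(r"Answer: (.*)", prompt).group(1)
    return "Score: 5" if expected == answer else "Score: 1"


report = evaluate_batch(dataset, fake_agent, fake_judge)
print(report)
```

Asking the judge for a fixed, machine-readable format such as `Score: N` keeps parsing simple, and failing loudly on an unparseable reply avoids silently skewing the aggregate score.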
Security is the process of ensuring the security of the LLM Agents based solution. Agents may write and execute code, browse the web, and interact with databases, so security is a key concern and must be designed and implemented from the beginning.
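Because the Code Execute Agent runs SQL produced by an LLM, one simple safeguard is to reject anything that is not a single read-only statement before it reaches the database. The sketch below is a hypothetical keyword-based check (`is_read_only_query` is not part of the repository). It is deliberately conservative and is best combined with a database user that only has read permissions, since string matching alone cannot catch every unsafe query.

```python
import re

# Hypothetical allow-list: only these read-only statement types may run.
ALLOWED_FIRST_KEYWORDS = {"SELECT", "SHOW", "DESCRIBE", "EXPLAIN"}

# Keywords that change data, schema, or permissions are rejected anywhere.
FORBIDDEN_KEYWORDS = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|TRUNCATE|GRANT|REVOKE)\b",
    re.IGNORECASE,
)


def is_read_only_query(sql: str) -> bool:
    """Return True only for a single statement that starts with an allowed keyword
    and contains no forbidden keyword. Conservative: may reject some harmless queries."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        # Empty input and stacked statements ("SELECT 1; DROP ...") are rejected.
        return False
    first_keyword = statement.split(None, 1)[0].upper()
    if first_keyword not in ALLOWED_FIRST_KEYWORDS:
        return False
    return FORBIDDEN_KEYWORDS.search(statement) is None


# Usage: only the first two queries would be allowed to run.
for query in [
    "SELECT COUNT(*) FROM customers;",
    "SHOW TABLES",
    "SELECT 1; DROP TABLE customers",
    "DELETE FROM customers",
]:
    print(is_read_only_query(query), query)
```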
The repository is set up with GitHub Actions for the continuous integration, continuous evaluation, continuous deployment, and continuous security of the LLM Agent based solution.
- CI: The CI workflow is triggered on every push or pull request to the repository. The CI workflow will run the unit tests and linting checks.
- CE: The CE workflow is triggered manually. The CE workflow will run the batch experimentation and batch evaluation (LLM as Judge).
- CD: The CD workflow is triggered manually. The CD workflow will deploy the LLM Agent based solution to the Azure Web App Service.
- CS: The CS workflow is triggered manually. The CS workflow will run the security checks of the LLM Agent based solution.
The repository is set up with Dev Containers for development and testing.
Run the linting checks:
conda activate base
pylint src
Run the unit tests:
conda activate base
python -m unittest discover -s tests
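Unit tests are picked up by `python -m unittest discover -s tests` when their file names match the default `test*.py` pattern. A hypothetical test module for the group chat termination logic could look like the following. The `should_terminate` function is inlined here as an assumption so that the file runs on its own; a real test would presumably import the termination logic from `src` instead.

```python
# tests/test_termination.py (hypothetical file name matching the default "test*.py" pattern)
import unittest


def should_terminate(state: str, turns: int, max_turns: int = 10) -> bool:
    """Hypothetical stand-in for the group chat termination logic under test."""
    return state == "End" or turns >= max_turns


class TerminationLogicTest(unittest.TestCase):
    def test_stops_in_terminal_state(self):
        self.assertTrue(should_terminate("End", turns=1))

    def test_stops_at_max_turns(self):
        self.assertTrue(should_terminate("Solve", turns=10, max_turns=10))

    def test_continues_otherwise(self):
        self.assertFalse(should_terminate("Solve", turns=3, max_turns=10))


if __name__ == "__main__":
    unittest.main()
```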
Get the test coverage report:
pip install coverage
python -m coverage run --source src -m unittest discover -s tests
python -m coverage report -m
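Build the Docker image and run the API container locally. The API listens on port 8000 inside the container and is published on host port 8085; the container is linked to a running MySQL container named mysql_server under the alias mysql-local:
cp env_docker .env_docker # only once and update the values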
docker build --rm -t stateflow-semantic-kernel-api:latest .
docker run -d --link mysql_server:mysql-local --name StateFlowApiSemanticKernel -p 8085:8000 --env-file .env_docker stateflow-semantic-kernel-api:latest