This is a demonstration of an AI agent-based system for automatically grading programming assignments. It uses a WebSocket-based backend with multiple specialized agents that work together to analyze, review, and grade student code submissions.
- agent_backend.py - The WebSocket server that manages agent communication and execution flow
- grader_initialize.json - Initialization payload for the grading agents
- grader_execute.json - Execution payload with problem and student submission
- grader_payload_full.json - Complete reference payload (both init and execute)
- grader_test.py - Python script to test the grading system
- grader_demo.py - Demo script that simulates the grading workflow without API keys
- run_grader.sh - Bash script to run the grader using websocat
- Make sure you have Python 3.7+ installed
- Install the required packages (asyncio is part of the Python standard library and does not need to be installed separately):
    pip install websockets python-dotenv
- Create a .env file in the project root with your OpenAI API key:
    OPENAI_API_KEY=your_openai_api_key_here
- If using the shell script, install websocat:
  - Linux: sudo apt install websocat
  - macOS: brew install websocat
  - Windows: Download from the GitHub releases page
To run the grader with the Python test script:
- Start the backend server:
    python agent_backend.py
- Run the test script:
    python grader_test.py
To see a simulated demonstration that does not require an API key:
    python grader_demo.py
To run the grader with the shell script instead:
- Start the backend server:
    python agent_backend.py
- Run the shell script:
    bash run_grader.sh
Or manually send the WebSocket requests:
    websocat ws://localhost:8765 < initialize_payload.json
    websocat ws://localhost:8765 < execution_payload.json
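The manual websocat steps can also be scripted in Python. The following is a minimal sketch, not the project's own client: it assumes the server listens on ws://localhost:8765 as in the command above, that the third-party websockets package is installed, and that the server sends at least one reply per request. The helper names load_payload and send_payloads are hypothetical.

```python
import asyncio
import json
from pathlib import Path

SERVER_URL = "ws://localhost:8765"  # same address as the websocat example


def load_payload(path):
    """Read a JSON payload file, validate it, and return it as a one-line JSON string."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return json.dumps(data)


async def send_payloads(paths, url=SERVER_URL):
    """Send each payload over a single connection and collect one raw reply per payload.

    Assumption: the server answers every request with at least one message.
    """
    import websockets  # third-party; imported here so load_payload works without it

    replies = []
    async with websockets.connect(url) as ws:
        for path in paths:
            await ws.send(load_payload(path))
            replies.append(await ws.recv())
    return replies


# Usage (requires a running server):
# replies = asyncio.run(send_payloads(["initialize_payload.json", "execution_payload.json"]))
```

Unlike two separate websocat invocations, this sketch keeps both requests on one connection. Whether the server requires that is not stated here, so treat it as a design choice rather than a requirement.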
Example initialization payload:

    {
      "action": "initialize",
      "nodes": [
        {
          "id": "agent1",
          "name": "data_collector",
          "description": "Collects and organizes data",
          "system_message": "You are a data collection expert. Collect and organize information from the user's query in a structured format.",
          "feedback_enabled": true
        },
        {
          "id": "agent2",
          "name": "researcher",
          "description": "Researches specific topics in depth",
          "system_message": "You are a research specialist. Focus on providing detailed, well-researched information on the user's query.",
          "feedback_enabled": true
        },
        {
          "id": "agent3",
          "name": "analyst",
          "description": "Analyzes data and provides insights",
          "system_message": "You are a data analyst. Analyze information and provide meaningful insights and patterns.",
          "feedback_enabled": false
        },
        {
          "id": "agent4",
          "name": "summarizer",
          "description": "Summarizes information concisely",
          "system_message": "You are a summarization expert. Create concise, informative summaries that capture the key points.",
          "feedback_enabled": false
        }
      ]
    }

Note: The feedback_enabled flag allows an agent to request additional information from the user during execution. When set to true, the agent can ask for clarification or more details by including [FEEDBACK_REQUEST: your question here] in its response. Execution then pauses, waits for user input, and continues with the additional information.
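To illustrate the [FEEDBACK_REQUEST: ...] marker, here is a small sketch of how such a marker could be pulled out of an agent response. The regular expression and the function names are assumptions made for illustration; the backend's actual parsing may differ.

```python
import re

# Matches the marker described above; the question is everything up to the closing bracket.
FEEDBACK_PATTERN = re.compile(r"\[FEEDBACK_REQUEST:\s*(.*?)\s*\]", re.DOTALL)


def extract_feedback_request(response):
    """Return the question embedded in an agent response, or None if there is none."""
    match = FEEDBACK_PATTERN.search(response)
    return match.group(1) if match else None


def strip_feedback_request(response):
    """Return the response text with any feedback markers removed."""
    return FEEDBACK_PATTERN.sub("", response).strip()
```

A question that itself contains a closing bracket would be cut short by this pattern, which is one reason to treat it as a sketch only.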
Example execution payload:

    {
      "action": "execute",
      "user_question": "Explain the impact of artificial intelligence on healthcare over the next decade.",
      "execution_flow": [
        "agent1",
        ["agent2", "agent3"],
        "agent4"
      ]
    }

When an agent with feedback_enabled: true requests user feedback, the server sends a message in the following format:
    {
      "type": "feedback_request",
      "agent": "requirements_analyst",
      "agent_id": "requirements",
      "question": "Can you provide more details about the workout types the app should track?",
      "original_response": "Based on your initial request, I've outlined the following requirements... [FEEDBACK_REQUEST: Can you provide more details about the workout types the app should track?]",
      "feedback_count": 1,
      "max_feedback": 2
    }

The client can respond in one of three ways:
- Providing detailed feedback:

    {
      "type": "feedback_response",
      "content": "The app should track weightlifting with sets, reps, and weight; cardio with distance, time, and calories; and flexibility exercises with duration and difficulty level. I also want custom workout creation."
    }

- Indicating satisfaction (short positive response):
    {
      "type": "feedback_response",
      "content": "Yes, that's perfect. Please continue."
    }

  Note: Short positive responses such as "yes", "good", or "continue" are automatically detected as satisfaction indicators and allow the agent to move to the next step without further requests.
- Skipping feedback entirely:

    {
      "type": "feedback_skip"
    }

Features and Limitations:
- Each agent is limited to a maximum of 2 feedback requests
- When the user indicates satisfaction, the agent will stop requesting feedback and move to the next step
- Feedback count is tracked per agent
- The system automatically detects satisfaction indicators in short responses
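The rules above can be sketched in a few lines of Python. This is only an illustration: the keyword set uses the three examples given in this README, and the length threshold SHORT_RESPONSE_WORDS is an assumed value, since the server's actual detection logic is not documented here.

```python
MAX_FEEDBACK = 2  # per-agent limit stated above

# Assumed values: keywords taken from the examples in this README,
# threshold chosen arbitrarily for illustration.
SATISFACTION_WORDS = {"yes", "good", "continue"}
SHORT_RESPONSE_WORDS = 8


def is_satisfied(content):
    """Return True for a short response that contains a satisfaction keyword."""
    words = [w.strip(".,!?'\"").lower() for w in content.split()]
    is_short = 0 < len(words) <= SHORT_RESPONSE_WORDS
    return is_short and any(w in SATISFACTION_WORDS for w in words)


def may_request_feedback(feedback_count, satisfied):
    """Return True if an agent may still ask; feedback_count is the requests made so far."""
    return not satisfied and feedback_count < MAX_FEEDBACK
```

With these assumptions, a long, detailed answer is never mistaken for satisfaction even if it happens to contain "good", because only short responses are checked for keywords.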
The grading system uses a pipeline of 3 specialized AI agents:
- Problem Analyzer - Breaks down the problem requirements into evaluation criteria
- Code Reviewer - Reviews the student code against these criteria
- Grader - Assigns a final grade with justification based on the analysis and review
Communication is managed through a WebSocket server that routes messages between agents and returns the final grade.
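The routing idea can be sketched with stub agents in place of real model calls. This is a hypothetical illustration, not the code in agent_backend.py: it assumes each agent receives the previous step's output as text, and that a nested list in an execution_flow (as in ["agent2", "agent3"]) means those agents run concurrently. The names run_flow and make_stub are invented for the example.

```python
import asyncio


async def run_flow(flow, agents, question):
    """Run agents in order; a nested list is treated as a concurrent group (assumption)."""
    context = question
    outputs = {}
    for step in flow:
        group = step if isinstance(step, list) else [step]
        # Every agent in the group receives the same input and runs concurrently.
        results = await asyncio.gather(*(agents[agent_id](context) for agent_id in group))
        for agent_id, result in zip(group, results):
            outputs[agent_id] = result
        context = "\n".join(results)  # the next step sees the outputs of this step
    return outputs


def make_stub(name):
    """Create a stand-in agent that wraps its input so the data flow is visible."""
    async def agent(text):
        return f"{name}({text})"
    return agent


# Usage with the three grading roles (the ids are hypothetical):
# agents = {n: make_stub(n) for n in ["analyzer", "reviewer", "grader"]}
# asyncio.run(run_flow(["analyzer", "reviewer", "grader"], agents, "problem + code"))
```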
To grade different types of programming assignments:
- Modify the problem statement in grader_execute.json
- Update the student code submission in the same file
- Run the grading process again
You can also customize the agents by modifying their system messages in grader_initialize.json.
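Editing a system message can also be done programmatically. The sketch below assumes grader_initialize.json has the same shape as the initialization payload shown earlier (a "nodes" list whose entries carry "id" and "system_message"); the function name set_system_message and the agent id in the usage comment are hypothetical.

```python
import json


def set_system_message(payload, agent_id, message):
    """Return a copy of an initialize payload with one agent's system_message replaced."""
    updated = json.loads(json.dumps(payload))  # deep copy via a JSON round trip
    for node in updated["nodes"]:
        if node["id"] == agent_id:
            node["system_message"] = message
            return updated
    raise KeyError(f"no agent with id {agent_id!r}")


# Usage (the agent id "grader" is an assumption about the file's contents):
# with open("grader_initialize.json", encoding="utf-8") as f:
#     payload = json.load(f)
# payload = set_system_message(payload, "grader", "You are a strict but fair grader.")
# with open("grader_initialize.json", "w", encoding="utf-8") as f:
#     json.dump(payload, f, indent=2)
```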