Intelligent Student Assignment Feedback System
EvalMate is a comprehensive AI-powered assessment platform that automatically evaluates student submissions against custom rubrics using Large Language Model technology. It supports multiple document formats, provides detailed feedback, and offers both a web interface and API access.
- 🔍 Multi-format Document Processing: PDF, DOCX, and image files
- 🤖 AI-Powered Evaluation: GPT-4 powered assessment with detailed feedback
- 📋 Smart Rubric Processing: Automatic rubric parsing from uploaded documents
- 🎯 Structured Scoring: Criterion-based evaluation with evidence citations
- 🌐 Multiple Interfaces: Web UI, REST API, and CLI access
- 💾 Flexible Storage: SQLite database with optional JSON backup
- 📊 Rich Export Options: JSON and CSV result formats
- 🔧 Enterprise Ready: Robust error handling and scalable architecture
evalmate/
├── 🎯 Frontend (Next.js)
│ ├── components/ # React components for UI
│ ├── pages/ # Next.js pages and routing
│ ├── styles/ # Tailwind CSS styling
│ └── public/ # Static assets
│
├── 🔧 Backend (FastAPI)
│ ├── app/main.py # FastAPI application entry
│ ├── app/api/ # REST API endpoints
│ │ ├── server.py # API server configuration
│ │ └── schemas.py # API request/response models
│ │
│ ├── app/core/ # Core business logic
│ │ ├── io/ # Document processing & parsing
│ │ │ ├── ingest.py # Multi-format document ingestion
│ │ │ └── rubric_parser.py # Rubric structure extraction
│ │ │
│ │ ├── fusion/ # Evaluation context assembly
│ │ │ ├── builder.py # Context fusion logic
│ │ │ └── schema.py # Fusion data structures
│ │ │
│ │ ├── llm/ # AI evaluation engine
│ │ │ ├── evaluator.py # Main evaluation pipeline
│ │ │ ├── prompts.py # LLM prompt engineering
│ │ │ └── model_api.py # OpenAI API integration
│ │ │
│ │ ├── store/ # Data persistence layer
│ │ │ ├── repo.py # Repository interface
│ │ │ └── sqlite_store.py # SQLite implementation
│ │ │
│ │ ├── models/ # Data schemas and validation
│ │ │ └── schemas.py # Pydantic models
│ │ │
│ │ └── visual_extraction.py # Image and table processing
│ │
│ └── app/ui/ # Command-line interface
│ └── cli.py # Legacy CLI (deprecated)
│
├── 📊 CLI Tool
│ └── evalmate_cli.py # Unified CLI interface
│
├── 📁 Data Storage
│ ├── data/rubrics/ # Uploaded rubric documents
│ ├── data/questions/ # Assignment question files
│ ├── data/submissions/ # Student submission files
│ ├── data/results/ # Evaluation results export
│ └── data/fusion/ # Cached evaluation contexts
│
└── 🧪 Testing & Configuration
├── tests/ # Test suite
├── scripts/ # Utility scripts
└── pyproject.toml # Dependencies and configuration
- Python 3.12+
- Node.js 18+ (for frontend)
- OpenAI API key
git clone <repository-url>
cd evalmate
# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# Install dependencies with uv (recommended)
pip install uv
uv sync
# Or install with pip
pip install -e .
# Copy environment template
cp .env.example .env
# Edit .env file with your settings
OPENAI_API_KEY=sk-your-openai-key-here
EVALMATE_STORAGE_MODE=sqlite
OPENAI_MODEL=gpt-4o
# Initialize SQLite database and data directories
uv run python app/main.py
cd frontend
# Install dependencies
npm install
# Build for production
npm run build
# Or run development server
npm run dev
# Start FastAPI server
uv run uvicorn app.api.server:app --host 0.0.0.0 --port 8000
# Server will be available at http://localhost:8000
# API documentation at http://localhost:8000/docs
# Terminal 1: Start backend
uv run uvicorn app.main:app --host 0.0.0.0 --port 8000
# Terminal 2: Start frontend
cd frontend
npm run start
# Check system status
uv run python evalmate_cli.py status
# Run interactive evaluation
uv run python evalmate_cli.py run
All data is stored in the data/ directory:
- data/rubrics/ - Grading rubrics
- data/questions/ - Assignment questions
- data/submissions/ - Student submissions
- data/evals/ - Evaluation results
- data/db.sqlite3 - SQLite database (when using SQLite mode)
Run the complete test suite:
pytest app/tests/ -v
Run with coverage:
pytest app/tests/ --cov=app --cov-report=html
Start the FastAPI server:
# Start the server
uvicorn app.api.server:app --reload --port 8000
# Health check
curl http://localhost:8000/health
Rubrics:
# Upload a rubric file
curl -X POST "http://localhost:8000/rubrics/upload" \
-F "file=@rubric.pdf" \
-F "params={\"course\": \"CS101\", \"assignment\": \"A1\", \"version\": \"v1\"}"
# List all rubrics
curl "http://localhost:8000/rubrics/"
# Get specific rubric
curl "http://localhost:8000/rubrics/{rubric_id}"Questions:
# Upload a question file
curl -X POST "http://localhost:8000/questions/upload" \
-F "file=@question.docx" \
-F "params={\"title\": \"Problem 1\", \"rubric_id\": \"rubric_123\"}"
# List questions (with optional filtering)
curl "http://localhost:8000/questions/?rubric_id=rubric_123"
# Get specific question
curl "http://localhost:8000/questions/{question_id}"Submissions:
# Upload a submission file
curl -X POST "http://localhost:8000/submissions/upload" \
-F "file=@submission.pdf" \
-F "params={\"rubric_id\": \"rubric_123\", \"question_id\": \"question_456\", \"student_handle\": \"alice\"}"
# List submissions (with optional filtering)
curl "http://localhost:8000/submissions/?student=alice&rubric_id=rubric_123"
# Get specific submission
curl "http://localhost:8000/submissions/{submission_id}"# Show help
python -m app.ui.cli --help
# Show rubric commands
python -m app.ui.cli rubrics --help
# Upload and parse a rubric
python -m app.ui.cli rubrics upload --file rubric.pdf --course CS101 --assignment A1 --version v1
# List all rubrics
python -m app.ui.cli rubrics list
# Get rubric details
python -m app.ui.cli rubrics get --id rubric_123
# Upload a question
python -m app.ui.cli questions upload --file question.docx --title "Problem 1" --rubric-id rubric_123
# List questions (optionally filtered by rubric)
python -m app.ui.cli questions list --rubric-id rubric_123
# Get question details
python -m app.ui.cli questions get --id question_456
# Upload a submission
python -m app.ui.cli submissions upload --file submission.pdf --rubric-id rubric_123 --question-id question_456 --student alice
# List submissions (optionally filtered)
python -m app.ui.cli submissions list --student alice --rubric-id rubric_123
# Get submission details
python -m app.ui.cli submissions get --id submission_789
- Multi-format Ingestion: PDF, DOCX, and image files
- Text Extraction: Clean text extraction with preprocessing
- Visual Element Detection: Extract images and tables from documents
- OCR Support: Text extraction from images and visual content
- Intelligent Parsing: Handle bulleted lists, numbered lists, and tables
- Weight Normalization: Automatic percentage/point conversion
- Criterion Mapping: Smart classification of evaluation criteria
- Multi-format Support: PDF and DOCX rubric documents
- OpenAI Integration: GPT-4o multimodal evaluation
- Per-criterion Assessment: Individual scoring with evidence
- Weighted Scoring: Automatic total score calculation
- JSON Output: Structured evaluation results
- Dual Backend: SQLite or JSON file storage
- REST API: Complete FastAPI endpoints
- CLI Interface: Rich command-line tools
- File Management: Organized upload and asset storage
# OpenAI API (required for evaluation)
export OPENAI_API_KEY=sk-your-key-here
export OPENAI_MODEL=gpt-4o # optional override
# Storage backend (default: sqlite)
export EVALMATE_STORAGE_MODE=sqlite # or 'json'
# Data directory (default: ./data)
export EVALMATE_DATA_DIR=/path/to/data
- SQLite (default): Single file database with full SQL support
- JSON: File-based storage for simple deployment scenarios
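A minimal sketch, under assumed names (BackendConfig and backend_from_env are not EvalMate's actual API), of how these environment variables can drive backend selection; the real switching logic lives in the storage layer (app/core/store/):

import os
from dataclasses import dataclass
from pathlib import Path

@dataclass
class BackendConfig:
    mode: str       # "sqlite" or "json"
    data_dir: Path  # root directory for the database file or JSON documents

def backend_from_env() -> BackendConfig:
    """Read storage settings from the environment variables shown above (sketch only)."""
    mode = os.getenv("EVALMATE_STORAGE_MODE", "sqlite").lower()
    if mode not in {"sqlite", "json"}:
        raise ValueError(f"Unsupported EVALMATE_STORAGE_MODE: {mode!r}")
    return BackendConfig(mode=mode, data_dir=Path(os.getenv("EVALMATE_DATA_DIR", "./data")))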
- Pydantic Models: Comprehensive type-safe schemas for all entities
- ID Management: Deterministic URL-safe ID generation and validation
- Cross-Model Validation: Business rule enforcement and data integrity
- JSON Serialization: Round-trip compatible serialization utilities
- Comprehensive Testing: Validation and serialization test suites
- Core Blocks: VisualBlock, DocBlock with strict content validation
- Documents: CanonicalDoc with structured content blocks
- Evaluation: Rubric, RubricItem with weight validation
- Assignment: Question, Submission with referential integrity
- Results: EvalResult, ScoreItem with evidence tracking
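As a rough sketch of how such weight-validated schemas look with Pydantic (field names here are illustrative; the actual definitions live in app/core/models/schemas.py):

from pydantic import BaseModel, Field, field_validator

class RubricItemSketch(BaseModel):
    """Illustrative stand-in for a rubric item; not the real EvalMate schema."""
    id: str
    title: str
    description: str = ""
    weight: float = Field(..., ge=0.0, le=1.0)  # normalized fraction of the total score

class RubricSketch(BaseModel):
    id: str
    items: list[RubricItemSketch]

    @field_validator("items")
    @classmethod
    def weights_sum_to_one(cls, items: list[RubricItemSketch]) -> list[RubricItemSketch]:
        # Mirrors the "weight validation" rule mentioned above (within a small tolerance).
        total = sum(item.weight for item in items)
        if items and abs(total - 1.0) > 1e-6:
            raise ValueError(f"rubric weights must sum to 1.0, got {total:.3f}")
        return items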
Run the complete schema test suite:
# Test round-trip JSON serialization
python app/tests/test_schemas_roundtrip.py
# Test validation rules (pass/fail scenarios)
python app/tests/test_validations.py
# Run all tests with pytest (once dependencies installed)
python -m pytest app/tests/ -v
- PDF Text Extraction: Extract and clean text from PDF files using pdfplumber
- DOCX Processing: Parse Word documents and extract paragraphs
- Image Support: Placeholder for future OCR capabilities
- Text Utilities: Comprehensive text cleaning and preprocessing
- Batch Processing: Ingest multiple files in one operation
- Auto-Detection: Automatic file type detection and routing
- Document Formats: PDF, DOCX, and image files (images return empty blocks for now)
- Text Processing: Clean extracted text, remove artifacts, normalize whitespace
- Structured Output: Convert documents into CanonicalDoc with DocBlock segments
- Error Handling: Comprehensive error handling with informative messages
- Testing: Complete test suite with mocking for external dependencies
from app.core.io.ingest import ingest_any, batch_ingest
# Ingest a single document
doc = ingest_any("student_essay.pdf")
print(f"Extracted {len(doc.blocks)} text blocks")
# Batch ingest multiple files
docs = batch_ingest(["essay1.pdf", "report.docx", "image.jpg"])
print(f"Processed {len(docs)} documents")pip install pdfplumber python-docx # Install document processing libraries- Unified Repository Interface: Single API for all CRUD operations across entities
- Dual Backend Support: Seamless switching between JSON and SQLite storage
- Complete CRUD Operations: Save, retrieve, list, and delete for rubrics, questions, submissions, and evaluation results
- Backend Abstraction: Pluggable storage system with consistent interface
- Automatic Database Initialization: SQLite schema creation and migration handling
- Comprehensive Testing: Full test coverage for both storage backends and error scenarios
- Repository Pattern: Clean separation between business logic and data persistence
- Backend Selection: Environment-driven backend switching (EVALMATE_STORAGE_MODE)
- Metadata Extraction: Efficient listing with extracted metadata (no full object loading)
- Error Handling: Robust error handling with informative logging
- Data Integrity: Referential integrity and validation across storage backends
- Atomic Operations: Safe atomic writes for JSON backend, transaction support for SQLite
from app.core.store import repo
# Save entities
rubric_id = repo.save_rubric(my_rubric)
question_id = repo.save_question(my_question)
submission_id = repo.save_submission(my_submission)
eval_id = repo.save_eval_result(my_eval_result)
# List entities with metadata
rubrics = repo.list_rubrics()
questions = repo.list_questions(rubric_id="RUB001") # Filter by rubric
submissions = repo.list_submissions(rubric_id="RUB001") # Filter by rubric
evaluations = repo.list_eval_results(submission_id="SUB001") # Filter by submission
# Retrieve full entities
rubric = repo.get_rubric("RUB001")
question = repo.get_question("Q001")
submission = repo.get_submission("SUB001")
evaluation = repo.get_eval_result("EVAL001")
# Use SQLite backend (default)
export EVALMATE_STORAGE_MODE=sqlite
# Use JSON file backend
export EVALMATE_STORAGE_MODE=json
# Run repository tests
pytest tests/test_repo_storage.py -v
# Test with different backends
EVALMATE_STORAGE_MODE=json pytest tests/test_repo_storage.py -v
EVALMATE_STORAGE_MODE=sqlite pytest tests/test_repo_storage.py -v
# Run repository smoke test
python app/main.py
Phase 4 implements a robust rubric structuring engine that converts CanonicalDoc objects (from Phase 2 ingestion) into normalized Rubric.items lists with intelligent parsing, weight normalization, and criterion mapping.
- Multi-format Rubric Parsing: Handle rubrics authored as bulleted lists, numbered lists, and tables
- Table Extraction: Extract tables from DOCX and PDF documents using camelot-py and tabula-py
- Weight Normalization: Automatic normalization of percentages, points, or equal distribution
- Criterion Mapping: Intelligent classification of rubric items to RubricCriterion enum values
- Graceful Fallbacks: Resilient parsing with deterministic output even for edge cases
- Comprehensive Testing: 22 test cases covering all parsing scenarios and edge cases
from app.core.io.ingest import ingest_any  # Phase 2 ingestion entry point
from app.core.io.rubric_parser import parse_rubric_to_items, ParseConfig
from app.core.models.schemas import CanonicalDoc
# Basic usage - parse any rubric structure
canonical_doc = ingest_any("rubric.docx")  # CanonicalDoc produced by Phase 2 ingestion
rubric_items = parse_rubric_to_items(canonical_doc)
# Advanced configuration
config = ParseConfig(
    prefer_tables=True,               # Prioritize table parsing
    normalize_missing_weights=True,   # Equal distribution for missing weights
    criterion_keywords={              # Custom criterion mapping
        "accuracy": ["correct", "precise"],
        "content": ["depth", "understanding"]
    },
    default_criterion="content"
)
items = parse_rubric_to_items(canonical_doc, config)
# Each item has normalized weights and mapped criteria
for item in items:
print(f"{item.title}: {item.weight:.1%} ({item.criterion.value})")
print(f" {item.description}")- camelot-py[cv]>=0.11.0: PDF table extraction with computer vision
- tabula-py>=2.9.0: Alternative PDF table extraction
- regex>=2024.4.28: Advanced pattern matching for text parsing
- Text Utilities: Bullet/numbered detection, weight parsing, heading/body splitting (7 tests)
- Bullet/Numbered Parsing: List-based rubrics with various weight formats (2 tests)
- Table Parsing: DOCX/PDF tables with and without headers (2 tests)
- Weight Normalization: Percentage/point conversion, missing weight handling (2 tests)
- Criterion Mapping: Keyword-based classification, custom mappings (2 tests)
- Fallback Behaviors: Unstructured text, empty documents, configuration preferences (3 tests)
- Data Integrity: Weight validation, description completeness, enum validity (3 tests)
- Graceful Degradation: External dependency handling (1 test)
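To make the weight normalization behaviour described in this phase concrete, here is a simplified sketch; the real rules in rubric_parser.py handle more cases:

def normalize_weights(raw_weights: list[float | None]) -> list[float]:
    """Illustrative: convert percentages or points into fractions summing to 1.0."""
    if not raw_weights:
        return []
    if all(w is None for w in raw_weights):
        # No weights found anywhere: distribute equally.
        return [1.0 / len(raw_weights)] * len(raw_weights)
    total = sum(w for w in raw_weights if w is not None)
    # Percentages (20, 30, 50) and raw points (10, 15, 25) both normalize the same way.
    return [(w or 0.0) / total for w in raw_weights]

# e.g. normalize_weights([20, 30, 50]) -> [0.2, 0.3, 0.5]
#      normalize_weights([None, None]) -> [0.5, 0.5]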
Phase 5 implements complete user flows for uploading and managing rubrics, questions, and submissions through both REST API and command-line interfaces, providing full parity between programmatic and interactive access.
- FastAPI Server: Complete REST API with automatic documentation and validation
- CLI Interface: Full-featured command-line interface with rich table output
- File Upload Management: Secure file handling with sanitization and organized storage
- Metadata Inference: Auto-detection of entity metadata from filenames and content
- Type Detection: Automatic file type detection and appropriate ingestion routing
- Validation & Error Handling: Comprehensive validation with informative error messages
- Repository Integration: Seamless integration with Phase 3 storage layer
- Comprehensive Testing: 42 test cases covering API endpoints and CLI commands
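To make the file upload management bullet above concrete, here is a minimal sketch of filename sanitization plus organized storage; the helper name and exact rules are assumptions, not EvalMate's implementation:

import re
from pathlib import Path

def safe_destination(upload_name: str, kind: str, data_dir: str = "data") -> Path:
    """Illustrative: sanitize an uploaded filename and place it under data/<kind>/."""
    name = Path(upload_name).name                      # drop any directory components
    name = re.sub(r"[^A-Za-z0-9._-]+", "_", name) or "upload"
    dest_dir = Path(data_dir) / kind                   # e.g. data/rubrics, data/submissions
    dest_dir.mkdir(parents=True, exist_ok=True)
    return dest_dir / name

# e.g. safe_destination("../report (final).pdf", "submissions") -> data/submissions/report_final_.pdf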
- Rubrics: POST /rubrics/upload, GET /rubrics/, GET /rubrics/{id}
- Questions: POST /questions/upload, GET /questions/, GET /questions/{id}
- Submissions: POST /submissions/upload, GET /submissions/, GET /submissions/{id}
- Health Check: GET /health for service monitoring
- Rubrics: rubrics upload, rubrics list, rubrics get
- Questions: questions upload, questions list, questions get
- Submissions: submissions upload, submissions list, submissions get
- fastapi>=0.110.0: Modern async web framework with automatic validation
- uvicorn>=0.29.0: High-performance ASGI server for FastAPI
- python-multipart>=0.0.9: Multipart form data support for file uploads
- typer>=0.12.0: Modern CLI framework with rich output formatting
- httpx>=0.24.0: HTTP client for testing API endpoints
# Run API tests
pytest tests/test_api_phase5.py -v
# Run CLI tests
pytest tests/test_cli_phase5.py -v
# Run both together
pytest tests/test_api_phase5.py tests/test_cli_phase5.py -v
- ✅ PDF/DOCX visual element detection
- ✅ Image and table extraction from documents
- ✅ OCR processing for visual text content
- ✅ Asset management and storage
- ✅ Visual metadata enrichment
- ✅ OpenAI GPT-4o multimodal API integration
- ✅ Image captioning with semantic understanding
- ✅ Visual content description generation
- ✅ Environment configuration management
- ✅ Token estimation and cost optimization
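As a rough illustration of the GPT-4o multimodal captioning step noted above (prompt wording and function name are assumptions; the project's own integration lives in app/core/llm/model_api.py and app/core/visual_extraction.py):

import base64
from openai import OpenAI

def caption_image(path: str) -> str:
    """Illustrative: ask GPT-4o to describe an extracted figure or table image."""
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.0,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this figure from a student submission in one or two sentences."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content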
This phase merges rubric, question, and submission into a single structured JSON called FusionContext — ready for the LLM evaluator in Phase 9.
- ✅ FusionContext schema with all required fields for evaluation
- ✅ Builder assembles and validates rubric, question, submission
- ✅ Token estimation via tiktoken for cost management
- ✅ FusionContext JSON saved under data/fusion/ for reuse
- ✅ API endpoints for fusion creation and inspection
- ✅ CLI commands for building and managing fusion contexts
- ✅ Comprehensive unit tests ensuring completeness
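A small sketch of the tiktoken-based token estimation listed above; the helper name is hypothetical and the builder's actual accounting may differ:

import tiktoken

def estimate_tokens(text: str, model: str = "gpt-4o") -> int:
    """Rough token count for a fusion context chunk (illustrative)."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Older tiktoken releases may not know the model name; fall back to a common encoding.
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))

# e.g. estimate_tokens("rubric + question + submission text") -> approximate prompt size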
# Build fusion context
uv run python -m app.ui.cli_fusion build --rubric-id R1 --question-id Q1 --submission-id S1
# List all fusion contexts
uv run python -m app.ui.cli_fusion list
# View specific fusion context
uv run python -m app.ui.cli_fusion view FUSION-S1
# Validate fusion context
uv run python -m app.ui.cli_fusion validate FUSION-S1
# Get statistics
uv run python -m app.ui.cli_fusion stats
# Build fusion context
POST /fusion/build
GET /fusion/build?rubric_id=R1&question_id=Q1&submission_id=S1
# Get fusion context
GET /fusion/FUSION-S1
# List all fusion contexts
GET /fusion/
# Get fusion summary
GET /fusion/FUSION-S1/summary
# Validate fusion context
GET /fusion/FUSION-S1/validate
# Get text content
GET /fusion/FUSION-S1/text
# Get statistics
GET /fusion/stats/overview
uv run pytest tests/test_fusion_builder.py -v
This phase introduces the LLM grader that evaluates each rubric item and returns strict JSON results with per-criterion scores, justifications, and evidence block IDs, plus a weighted total.
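For orientation, a result in that shape might look roughly like the sketch below. The keys are illustrative, loosely following the EvalResult/ScoreItem names used in this README; the exact schema is defined in app/core/models/schemas.py.

example_result = {  # illustrative only, not a verbatim EvalResult dump
    "submission_id": "S1",
    "rubric_id": "R1",
    "items": [
        {
            "criterion": "content",              # rubric item being scored
            "score": 0.8,                        # per-criterion score
            "weight": 0.4,                       # weight of this item in the rubric
            "justification": "Covers the main argument but omits the required counterexample.",
            "evidence_block_ids": ["B3", "B7"],  # submission blocks cited as evidence
        },
        # ...one entry per rubric item
    ],
    "total_score": 0.82,                         # weighted sum over all items, including those elided above
}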
- OpenAI Integration: Uses GPT models for intelligent evaluation
- Strict JSON Output: Enforced schema compliance with repair mechanisms
- Per-Criterion Assessment: Individual evaluation of each rubric item
- Evidence Citation: Links scores to specific submission blocks
- Weighted Scoring: Computes total scores based on rubric weights
- Token Management: Efficient content chunking and cost control
- Retry Logic: Robust error handling with exponential backoff
export OPENAI_API_KEY=sk-your-key-here
export OPENAI_MODEL=gpt-4o # optional override
# Evaluate a submission
uv run python -m app.ui.cli_evaluate run --rubric-id R1 --question-id Q1 --submission-id S1
# Check evaluation status
uv run python -m app.ui.cli_evaluate status --submission-id S1
# View evaluation results
uv run python -m app.ui.cli_evaluate result --submission-id S1
# Start the server
uv run python -m app.api.server
# Evaluate submission
POST /evaluate?rubric_id=R1&question_id=Q1&submission_id=S1
# Get evaluation result
GET /evaluate/result/S1
# Check status
GET /evaluate/status/S1
- Main evaluation pipeline that consumes FusionContext
- Per-item LLM calls with keyword-based content slicing
- Weighted total computation and result validation
- System prompts that enforce JSON-only outputs
- Structured templates for rubric and submission data
- Schema instructions for consistent formatting
- Strict JSON parsing with repair mechanisms
- Handles common LLM output formatting issues
- Fallback strategies for malformed responses
- Exponential backoff for API rate limits
- Retry logic for transient failures
- Request logging and monitoring
- Deterministic: temperature=0.0 for consistent results
- Cost Optimized: Per-criterion chunking reduces token usage
- Validated: Evidence block IDs must reference actual submission content
- Persistent: Results saved via repo.save_eval_result()
- Robust: Graceful fallbacks for API failures
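A minimal sketch of the strict JSON parsing with repair and the exponential backoff retry described above; error handling and limits in the actual evaluator may differ:

import json
import re
import time

def parse_strict_json(raw: str) -> dict:
    """Parse an LLM reply as JSON, stripping common wrappers like Markdown code fences."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Repair pass: drop code fences and keep only the outermost JSON object.
        cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip(), flags=re.MULTILINE)
        match = re.search(r"\{.*\}", cleaned, flags=re.DOTALL)
        if match is None:
            raise
        return json.loads(match.group(0))

def call_with_backoff(fn, retries: int = 3, base_delay: float = 1.0):
    """Retry a flaky API call, doubling the wait after each failure (illustrative)."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))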
uv run pytest tests/test_evaluator_phase9.py -v
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Add tests for new functionality
- Ensure all tests pass (pytest)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
If you encounter any issues or have questions:
- Check the Issues section
- Review the test suite for examples
- Ensure your environment meets the requirements
- Verify the initialization completed successfully
- Built with modern Python best practices
- Designed for educational technology applications
- Inspired by the need for intelligent, scalable assessment tools