ModelTriage is an LLM decision and verification layer that intelligently routes prompts to the most appropriate model and optionally runs multiple models in parallel for comparison. Instead of guessing which model to use, ModelTriage analyzes your prompt and explains why it selected a particular model (e.g., analytical tasks get routed to quality-focused models, code tasks to code-specialized models). Verify Mode allows side-by-side comparison of 2-3 models with automatic diff analysis to highlight agreements, disagreements, and conflicting assumptions. The system streams responses progressively using Server-Sent Events (SSE) for a responsive, real-time experience.
This is the MVP (Minimum Viable Product) implementation. The following features are fully implemented:
- Smart rules-based routing with human-readable explanation
- Real-time SSE streaming (no buffering)
- Loading states and partial output preservation
- Model metadata (latency, tokens, provider)
- Cancel functionality
- Error handling with "Try again" action
- Clear button to reset UI
- Input validation (4,000 character max)
- Toggle to enable multi-model comparison (default: OFF)
- Parallel execution of 2-3 models simultaneously
- Side-by-side streaming panels (each model streams independently)
- Per-panel error isolation (one failure doesn't affect others)
- Diff summary showing agreement, disagreement, omissions, and conflicts
- Cost warning displayed only when Verify Mode is ON
- localStorage persistence for Verify Mode settings and last prompt
- Every request shows which model was selected and why
- Priority order: analytical → code → creative → long prompts → short prompts → general (fallback)
- Example: "Compare React and Vue" routes to
mock-quality-1because it's an analytical task
- Automatically compares outputs from multiple models in Verify Mode
- Highlights:
- Agreement (what all models agree on)
- Disagreement (where models differ)
- Omissions (what some models include that others don't)
- Conflicting assumptions (different foundational approaches)
The following are explicitly out of scope for MVP v1:
- ❌ Real LLM providers (OpenAI, Anthropic, Google) - using MockProvider only
- ❌ Database persistence (no user accounts, no saved sessions)
- ❌ Feedback/rating system
- ❌ Rate limiting UI
- ❌ Retry logic (errors require manual "Try again")
- ❌ Authentication or user management
- ❌ Cost tracking or billing
- ❌ Advanced diff features (semantic analysis, syntax highlighting)
- ❌ Model performance benchmarking
- ❌ Export/share functionality
Prerequisites:
- Node.js 18+ and npm

1. Install dependencies: `npm install`
2. Run the development server: `npm run dev`
3. Open the app: navigate to http://localhost:3000 in your browser
Run unit tests:
- `npm run test:mock` tests the MockProvider
- `npm run test:routing` tests the routing logic

Run integration tests (requires the dev server to be running):
- `npm run test:stream` tests the streaming API

ModelTriage is ready for zero-config deployment on Vercel.
1. Import your repository:
   - Go to vercel.com
   - Click "New Project"
   - Import your Git repository
2. Configure (no environment variables needed):
   - Framework Preset: Next.js (auto-detected)
   - Build Command: `npm run build` (default)
   - Output Directory: `.next` (default)
   - Environment Variables: none required for MVP
3. Deploy:
   - Click "Deploy"
   - Wait for the build to complete (~2-3 minutes)
After deployment, test the following:
✅ Single-Answer Mode:
- Enter a prompt (e.g., "Explain React hooks")
- Verify response streams progressively (not all at once)
- Check routing explanation displays correctly
- Verify metadata shows (model, latency, tokens)
✅ Verify Mode:
- Enable Verify Mode toggle
- Select 2 or 3 models
- Enter a prompt
- Verify both/all panels stream independently
- Check diff summary appears after streaming completes
✅ Error Handling:
- Test empty prompt (should show validation)
- Test prompt > 4,000 characters (should show error)
- Cancel a streaming request (should preserve partial output)
Runtime:
- The `/api/stream` endpoint uses the Node.js runtime (not Edge) for optimal SSE streaming
- This is configured in `src/app/api/stream/route.ts` with `export const runtime = "nodejs"`
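For orientation, here is a minimal sketch of what such a Node-runtime SSE handler can look like. It only illustrates the runtime flag and the response headers; the real handler in `src/app/api/stream/route.ts` differs in its routing, validation, and event payloads.

```ts
// Sketch only: shows the runtime flag and SSE response shape, not the project's real handler.
export const runtime = "nodejs"; // opt out of the Edge runtime so chunks stream without buffering

export async function POST(req: Request): Promise<Response> {
  const { prompt } = await req.json();
  const encoder = new TextEncoder();

  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      // In the real route, chunks come from the selected provider as they are generated.
      for (const chunk of [`Echo: ${prompt}`, " ...", " done."]) {
        controller.enqueue(encoder.encode(`data: ${JSON.stringify({ chunk })}\n\n`));
      }
      controller.close();
    },
  });

  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache, no-transform",
      Connection: "keep-alive",
    },
  });
}
```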
MockProvider:
- The app uses MockProvider by default (zero external API calls)
- No API keys or secrets required
- All responses are deterministic and work offline
Environment Variables:
- None required for MVP
- Future flags (`USE_LIVE_PROVIDERS`, `ENABLE_DB_WRITES`) are not yet implemented
Streaming:
- SSE streaming works out-of-the-box on Vercel
- No additional configuration needed
- Chunks are not buffered (progressive rendering confirmed)
Limitations:
- Serverless function timeout: 10 seconds on free tier (sufficient for MockProvider)
- Concurrent executions: 100 on free tier (ample for testing)
Problem: Streaming appears buffered
- Cause: Browser caching or CDN edge caching
- Solution: Hard refresh (Cmd+Shift+R / Ctrl+Shift+R)
- Verification: Open DevTools → Network → check EventStream tab
Problem: Build fails with TypeScript errors
- Cause: Type errors in code
- Solution: Run `npm run build` locally to identify issues
- Fix: Run `npm run lint` and resolve errors
Problem: 404 on API routes
- Cause: Incorrect file structure
- Solution: Verify that `src/app/api/stream/route.ts` exists
- Verification: Check the deployment logs for the file tree
Problem: Application crashes
- Cause: Missing dependencies
- Solution: Verify that `package.json` includes all dependencies
- Check: Look for "Module not found" errors in the deployment logs
ModelTriage uses Server-Sent Events (SSE) to stream LLM responses in real-time:
- Client sends prompt → `POST /api/stream` with the prompt text
- Server routes request → Rules-based router selects the appropriate model
- Provider streams chunks → MockProvider generates response chunks asynchronously
- SSE delivers chunks → Server forwards chunks to the client as `data: {...}` events (see the client-side sketch below)
- UI renders progressively → Client appends each chunk to the display in real-time
- Metadata sent on completion → Final event includes latency, tokens, and cost
Key benefits:
- No buffering delays (chunks appear immediately)
- Partial output preserved if stream is cancelled or errors
- Clean stream closure on completion
- Multiple models can stream in parallel (Verify Mode)
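As a rough illustration of the client side of this flow, the sketch below reads the `fetch` response body and splits it into SSE events. The event field name (`chunk`) is an assumption, not the project's exact contract; see `docs/streaming-api.md` for the real one.

```ts
// Minimal SSE consumer sketch; error handling, cancellation, and metadata parsing are omitted.
async function streamPrompt(prompt: string, onChunk: (text: string) => void): Promise<void> {
  const res = await fetch("/api/stream", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE events are separated by a blank line; keep any trailing partial event in the buffer.
    const events = buffer.split("\n\n");
    buffer = events.pop() ?? "";

    for (const event of events) {
      const line = event.trim();
      if (!line.startsWith("data:")) continue;
      const payload = JSON.parse(line.slice("data:".length));
      if (payload.chunk) onChunk(payload.chunk); // append progressively to the UI
    }
  }
}
```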
When Verify Mode is enabled, the workflow changes:
- Client sends prompt with model list → `POST /api/stream` with `models: ["model-1", "model-2"]`
- Server starts parallel streams → Each model gets its own provider instance
- Events are multiplexed → SSE events include a `modelId` to identify the source panel (see the event sketch below)
- Panels stream independently → Each panel updates as its model streams
- Error isolation → If one model fails, the others continue (the failed panel shows an error card)
- Diff analysis runs after completion → Compares successful outputs to generate the summary
Key benefits:
- See how different models approach the same prompt
- Identify which model provides the most complete or accurate answer
- Catch hallucinations or omissions by comparing outputs
- Understand trade-offs between speed, quality, and cost
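To make the multiplexing concrete, one possible shape for the request and the tagged events is sketched below. The field names are illustrative assumptions rather than the project's documented contract (see `docs/streaming-api.md`).

```ts
// Illustrative types only; the real SSE event contract is documented in docs/streaming-api.md.
type StreamRequest = {
  prompt: string;
  models?: string[]; // e.g. ["model-1", "model-2"]; omitted in single-answer mode
};

type StreamEvent =
  | { type: "chunk"; modelId: string; text: string }                        // progressive output for one panel
  | { type: "error"; modelId: string; message: string }                     // isolated failure for one panel
  | { type: "metadata"; modelId: string; latencyMs: number; tokens: number }
  | { type: "done" };                                                       // all streams have finished
```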
- Framework: Next.js 15 (App Router)
- Language: TypeScript
- Styling: Tailwind CSS
- Runtime: Node.js (SSE streaming)
- Testing: Jest + ts-jest
modeltriage/
├── src/app/ # Next.js App Router
│ ├── api/stream/ # SSE streaming endpoint
│ ├── page.tsx # Main UI with Verify Mode
│ ├── layout.tsx # Root layout
│ └── globals.css # Global styles
├── lib/ # Core library modules
│ ├── providers/ # Provider interface + MockProvider
│ ├── routing/ # Rules-based router
│ └── diff/ # Diff analyzer for Verify Mode
├── __tests__/ # Unit tests
├── docs/ # Technical documentation
├── .specify/ # Product specifications (source of truth)
└── package.json # Dependencies and scripts
Provider Interface (lib/providers/):
- Defines the contract for all LLM providers
- `MockProvider` implements this interface for development
- Real providers (OpenAI, Anthropic) will implement the same interface
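A rough sketch of what such a contract can look like is shown below; the actual interface in `lib/providers/` may use different names and chunk types.

```ts
// Sketch of a streaming provider contract; names and signatures are illustrative.
export interface LLMProvider {
  readonly name: string;
  // Yields response chunks asynchronously so the API route can forward them as SSE events.
  stream(prompt: string, modelId: string): AsyncIterable<string>;
}
```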
Router (lib/routing/):
- Rules-based model selection
- Returns model name, reason, and confidence
- Prioritizes analytical intents over code keyword matches
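A simplified sketch of this kind of priority-ordered routing follows. Apart from `mock-quality-1` (mentioned in the routing example above), the model names, keyword lists, and length thresholds are hypothetical; the real rules live in `lib/routing/` and are documented in `docs/routing.md`.

```ts
// Illustrative rules-based routing; keywords, thresholds, and most model names are assumptions.
type RoutingDecision = { model: string; reason: string; confidence: number };

function routePrompt(prompt: string): RoutingDecision {
  const p = prompt.toLowerCase();
  // Priority order: analytical → code → creative → long prompts → short prompts → general (fallback)
  if (/\b(compare|analy[sz]e|evaluate|pros and cons)\b/.test(p))
    return { model: "mock-quality-1", reason: "Analytical task detected", confidence: 0.9 };
  if (/\b(code|function|bug|refactor|typescript)\b/.test(p))
    return { model: "mock-code-1", reason: "Code task detected", confidence: 0.85 };
  if (/\b(story|poem|creative|brainstorm)\b/.test(p))
    return { model: "mock-creative-1", reason: "Creative task detected", confidence: 0.8 };
  if (prompt.length > 1000)
    return { model: "mock-quality-1", reason: "Long prompt", confidence: 0.6 };
  if (prompt.length < 40)
    return { model: "mock-fast-1", reason: "Short prompt", confidence: 0.6 };
  return { model: "mock-general-1", reason: "General fallback", confidence: 0.5 };
}
```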
Diff Analyzer (lib/diff/):
- Compares outputs from multiple models
- Identifies agreement, disagreement, omissions, and conflicts
- Runs client-side to avoid blocking streaming
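For reference, a plausible shape for the diff result is sketched below; the actual structure produced by `lib/diff/` may differ (see `docs/verify-mode.md`).

```ts
// Illustrative result shape for the client-side diff analysis; field names are assumptions.
interface DiffSummary {
  agreement: string[];                  // points all models agree on
  disagreement: string[];               // points where the models differ
  omissions: Record<string, string[]>;  // per-model points that other models included
  conflictingAssumptions: string[];     // different foundational approaches
}
```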
Streaming API (src/app/api/stream/):
- Single endpoint for both single-answer and Verify Mode
- Validates input (prompt length, model count)
- Streams SSE events with proper multiplexing for Verify Mode
- Per-model error isolation (uses `Promise.allSettled`)
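The isolation pattern can be sketched as follows; `streamOneModel` and `emitEvent` are hypothetical helpers standing in for the project's internals, and only the `Promise.allSettled` usage reflects the described design.

```ts
// Sketch of per-model error isolation in Verify Mode.
async function runModelsInParallel(
  models: string[],
  prompt: string,
  streamOneModel: (modelId: string, prompt: string) => Promise<void>,
  emitEvent: (e: { type: "error"; modelId: string; message: string }) => void
): Promise<void> {
  const results = await Promise.allSettled(
    models.map((modelId) => streamOneModel(modelId, prompt))
  );
  // A rejected promise fails only that model's panel; successful streams keep their output.
  results.forEach((result, i) => {
    if (result.status === "rejected") {
      emitEvent({ type: "error", modelId: models[i], message: String(result.reason) });
    }
  });
}
```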
The following environment variables are reserved for future use but do NOT currently work:
`USE_LIVE_PROVIDERS`: When implemented, this flag will:
- Enable OpenAI, Anthropic, and Google providers
- Require API keys in environment variables
- Incur real API costs
Current behavior: Only MockProvider is available regardless of this flag.
`ENABLE_DB_WRITES`: When implemented, this flag will:
- Enable session persistence to database
- Save prompts, responses, and user ratings
- Require database connection string
Current behavior: No database writes occur regardless of this flag. Only localStorage is used for UI settings.
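As a small illustration of that localStorage usage (the Verify Mode settings and last prompt mentioned under the feature list), a sketch follows; the storage keys are hypothetical, and the actual behavior is documented in `docs/persistence.md`.

```ts
// Hypothetical storage keys; see docs/persistence.md for the actual localStorage usage.
const VERIFY_MODE_KEY = "modeltriage.verifyMode";
const LAST_PROMPT_KEY = "modeltriage.lastPrompt";

function saveUiState(verifyModeOn: boolean, lastPrompt: string): void {
  localStorage.setItem(VERIFY_MODE_KEY, JSON.stringify(verifyModeOn));
  localStorage.setItem(LAST_PROMPT_KEY, lastPrompt);
}

function loadVerifyMode(): boolean {
  return JSON.parse(localStorage.getItem(VERIFY_MODE_KEY) ?? "false");
}
```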
By default, the application uses MockProvider to ensure:
- ✅ No accidental API costs during development
- ✅ Deterministic, reproducible testing
- ✅ Offline development capability
- ✅ Fast response times
This project is built strictly according to specifications in .specify/:
- `product.md` - Product definition and scope
- `conventions.md` - Technical conventions and limits
- `user-stories.md` - User stories and acceptance criteria
- `requirements.md` - Functional requirements
See docs/ for detailed technical documentation:
- `architecture.md` - System architecture, folder structure, SSE event contract, and MockProvider rationale
- `development.md` - Development commands, workflow, and troubleshooting guide
- `deployment-checklist.md` - Vercel deployment verification and post-deployment testing
- `streaming-api.md` - SSE endpoint reference
- `verify-mode.md` - Verify Mode implementation details
- `routing.md` - Routing logic and priority rules
- `persistence.md` - localStorage usage
- `ui-states.md` - Empty, loading, and error states
- `streaming-controls.md` - Control locking and cancel behavior
- `execution-correctness.md` - Concurrent run prevention and error isolation
- `hardening-summary.md` - Robustness improvements overview