Black-box test harness for LLM-powered systems
Promptly is a comprehensive testing platform for LLM applications. It captures LLM interactions through HTTP endpoints, evaluates responses against expectations using both deterministic rules and LLM judges, and provides detailed test results with full trace visibility.
- Multi-Environment Testing: Test across development, staging, and production environments
- Flexible Response Mapping: Define JSONPath-based mappings to extract canonical traces from any API response format
- Deterministic & LLM-Based Evaluations: Combine regex, text matching, tool call verification with AI-powered judgement
- Test Suite Management: Organize tests with YAML import/export, Git integration, and historical tracking
- Comprehensive Reporting: View pass rates, failure reasons, latency, token usage, and cost metrics
- Background Test Execution: Queue runs and process them asynchronously with configurable concurrency
- SDK & CLI: Integrate Promptly into CI/CD pipelines with the .NET SDK and CLI tool
┌─────────────────────┐ ┌──────────────────────┐ ┌────────────────────┐
│ React Frontend │────▶│ ASP.NET Core API │────▶│ PostgreSQL DB │
│ (Material UI) │ │ (.NET 10) │ │ │
└─────────────────────┘ └──────────────────────┘ └────────────────────┘
│
│ HTTP
▼
┌──────────────────────┐
│ Python Worker │
│ (FastAPI) │
│ - LLM Judges │
│ - Mapping Proposals │
└──────────────────────┘
C# Control Plane: Authentication, CRUD APIs, deterministic evaluation, test orchestration, background worker Python Worker: LLM judges, groundedness scoring, mapping proposals via OpenAI API React Frontend: Material UI, mapping wizard, test management, results visualization Database: PostgreSQL with EF Core
- Docker & Docker Compose
- Azure OpenAI resource (or OpenAI API key)
- .NET 10 SDK (for local development - optional)
- Node.js 18+ (for local frontend development - optional)
-
Clone the repository
git clone <repository-url> cd Promptly
-
Configure Azure OpenAI (or OpenAI)
Create a
.envfile in thedockerdirectory:For Azure OpenAI (Recommended):
# Azure OpenAI Configuration - REQUIRED PROMPTLY_LLM_PROVIDER=azureopenai PROMPTLY_LLM_API_KEY=your_azure_openai_api_key PROMPTLY_LLM_AZURE_ENDPOINT=https://your-resource-name.openai.azure.com PROMPTLY_LLM_API_VERSION=2024-08-01-preview PROMPTLY_LLM_MODEL_DEFAULT=your-deployment-name
For standard OpenAI:
PROMPTLY_LLM_PROVIDER=openai PROMPTLY_LLM_API_KEY=sk-your-openai-api-key PROMPTLY_LLM_MODEL_DEFAULT=gpt-4o-mini
Database and other settings are pre-configured - see
SETUP_CHECKLIST.mdfor details. -
Start all services
cd docker docker-compose up -d -
Access the application
- Web UI: http://localhost:3000
- API: http://localhost:5000
- Swagger: http://localhost:5000/swagger
- Python Worker: http://localhost:8000
-
Initialize database
The database will be automatically migrated on first startup.
Configuration via appsettings.json or environment variables:
- ConnectionStrings:Default: PostgreSQL connection string
- JWT:Key: Secret key for JWT signing (min 32 characters)
- JWT:Issuer: JWT issuer
- JWT:Audience: JWT audience
- JWT:ExpiryMinutes: Token expiry time (default: 60)
- DATA_PROTECTION_PATH: Path for Data Protection keys persistence
- PROMPTLY_EVAL_BASE_URL: Python worker base URL
- TestRunner:PollingIntervalSeconds: Background worker polling interval (default: 5)
- TestRunner:MaxConcurrentRuns: Max concurrent test runs (default: 2)
Configuration via environment variables:
For Azure OpenAI:
- PROMPTLY_LLM_PROVIDER:
azureopenai - PROMPTLY_LLM_API_KEY: Your Azure OpenAI API key
- PROMPTLY_LLM_AZURE_ENDPOINT: Your Azure endpoint (e.g.,
https://your-resource.openai.azure.com) - PROMPTLY_LLM_API_VERSION: API version (default:
2024-08-01-preview) - PROMPTLY_LLM_MODEL_DEFAULT: Your deployment name (NOT model ID - use the name you gave the deployment in Azure AI Studio)
For standard OpenAI:
- PROMPTLY_LLM_PROVIDER:
openai - PROMPTLY_LLM_API_KEY: Your OpenAI API key (starts with
sk-) - PROMPTLY_LLM_MODEL_DEFAULT: Model ID (e.g.,
gpt-4o-mini)
Important for Azure: Use your deployment name, not the model name. If you deployed GPT-4o and named it "my-gpt4-deployment", use my-gpt4-deployment as the model default.
Configuration via environment variables:
- VITE_API_BASE_URL: Base URL for C# API (default:
http://localhost:5000)
Navigate to http://localhost:3000 and create an account.
Projects organize your testing efforts. Create a project for each application you're testing.
Define environments (dev, staging, prod) with base URLs and headers:
- Base URL:
http://localhost:5000/demo - Headers: Add authentication headers if needed (encrypted at rest)
Create an endpoint pointing to your LLM application's API:
- Path:
/chat - Method:
POST - Timeout: 30s
Use the Mapping Wizard to define how to extract canonical traces from responses:
- Provide a sample response JSON
- AI proposes a mapping spec with JSONPath expressions
- Validate the mapping against the sample
- Save as default
Organize tests into suites. Import tests from YAML:
- id: test-1
name: Greeting Test
description: Verify the assistant greets users properly
input:
messages:
- role: user
content: Hello!
expectations:
- type: contains_text
text: "Hello"
case_insensitive: true
- type: banned_text
text: "error"
case_insensitive: trueExpectation types:
- contains_text: Substring search
- banned_text: Ensure text is NOT present
- regex_match: Pattern matching
- link_pattern: Verify URLs match pattern
- tool_called: Check if specific tool was called
- tool_sequence: Verify tool call order
- llm_judge: AI-based scoring with custom rubric
- groundedness: Check response is grounded in retrieved docs
Queue a test run:
- Select environment, endpoint, and mapping spec
- Optionally tag with Git commit hash
- View live progress and results
View detailed results:
- Summary: Pass rate, latency, tokens, cost
- Per-Test Results: Status, expectations breakdown, failure reasons
- Canonical Trace: Messages, tool calls, usage, retrieved docs
- Raw Response: Original JSON for debugging
Define how to parse arbitrary JSON into canonical traces:
{
"version": 1,
"messages": {
"itemsPath": "$.choices[*].message",
"rolePath": "$.role",
"contentPath": "$.content"
},
"toolCalls": {
"itemsPath": "$.tool_calls[*]",
"namePath": "$.function.name",
"argumentsPath": "$.function.arguments"
},
"usage": {
"objectPath": "$.usage",
"promptTokensPath": "$.prompt_tokens",
"completionTokensPath": "$.completion_tokens",
"totalTokensPath": "$.total_tokens"
},
"retrievedDocs": {
"itemsPath": "$.retrieved_docs[*]",
"contentPath": "$.content",
"titlePath": "$.title",
"idPath": "$.id"
},
"fallback": {
"singleAssistantContentPath": "$.response"
}
}Install the CLI tool:
dotnet tool install --global Promptly.CliTrigger runs from CI/CD:
promptly trigger \
--base-url http://localhost:5000 \
--api-key <your-api-key> \
--suite <suite-id> \
--env <environment-id> \
--endpoint <endpoint-id> \
--mapping <mapping-id> \
--commit $(git rev-parse HEAD)
promptly wait --base-url http://localhost:5000 --api-key <your-api-key> --run-id <run-id>Exit code:
- 0: All tests passed
- 1: One or more tests failed
C# API:
cd Promptly.Server
dotnet ef database update
dotnet runPython Worker:
cd Promptly.Worker
pip install -r requirements.txt
uvicorn main:app --reload --port 8000React Frontend:
cd Promptly.Web/src/web
npm install
npm run devCreate a new migration:
cd Promptly.Server
dotnet ef migrations add <MigrationName>
dotnet ef database updateSwagger UI is available at http://localhost:5000/swagger with JWT bearer authentication.
Key endpoints:
- POST /api/auth/register: Create account
- POST /api/auth/login: Get JWT token
- POST /api/projects: Create project
- POST /api/environments: Create environment
- POST /api/endpoints/{id}/mapping/propose: AI-powered mapping proposal
- POST /api/suites: Create test suite
- POST /api/suites/{id}/tests/import: Import tests from YAML
- POST /api/runs: Queue test run
- GET /api/runs/{id}/results: Get run results
Ensure PostgreSQL is running:
docker-compose ps postgresCheck logs:
docker-compose logs postgresCheck LLM API key is set:
docker-compose exec promptly-eval printenv PROMPTLY_LLM_API_KEYView logs:
docker-compose logs promptly-evalEnsure VITE_API_BASE_URL points to the correct API URL. Check browser console for errors.
Check worker logs:
docker-compose logs promptly-server | grep TestRunWorkerServiceVerify TestRunner configuration in appsettings.json.
- Backend: ASP.NET Core 10, Entity Framework Core, PostgreSQL
- Worker: Python 3.11+, FastAPI, OpenAI SDK
- Frontend: React 19, TypeScript, Material-UI, Vite
- Auth: ASP.NET Identity, JWT, API Keys
- Security: Data Protection API for encrypted storage
- Mapping: JsonPath.Net for deterministic parsing
- Evaluation: Regex, LLM judges (OpenAI/Azure)
[Specify your license here]
[Specify contribution guidelines]
For issues and questions, please open a GitHub issue.