Skip to content

Jacob-J-Thomas/Promptly

Repository files navigation

Promptly v1

Black-box test harness for LLM-powered systems

Promptly is a comprehensive testing platform for LLM applications. It captures LLM interactions through HTTP endpoints, evaluates responses against expectations using both deterministic rules and LLM judges, and provides detailed test results with full trace visibility.

Features

  • Multi-Environment Testing: Test across development, staging, and production environments
  • Flexible Response Mapping: Define JSONPath-based mappings to extract canonical traces from any API response format
  • Deterministic & LLM-Based Evaluations: Combine regex, text matching, tool call verification with AI-powered judgement
  • Test Suite Management: Organize tests with YAML import/export, Git integration, and historical tracking
  • Comprehensive Reporting: View pass rates, failure reasons, latency, token usage, and cost metrics
  • Background Test Execution: Queue runs and process them asynchronously with configurable concurrency
  • SDK & CLI: Integrate Promptly into CI/CD pipelines with the .NET SDK and CLI tool

Architecture

┌─────────────────────┐     ┌──────────────────────┐     ┌────────────────────┐
│   React Frontend    │────▶│  ASP.NET Core API    │────▶│   PostgreSQL DB    │
│  (Material UI)      │     │  (.NET 10)           │     │                    │
└─────────────────────┘     └──────────────────────┘     └────────────────────┘
                                      │
                                      │ HTTP
                                      ▼
                            ┌──────────────────────┐
                            │  Python Worker       │
                            │  (FastAPI)           │
                            │  - LLM Judges        │
                            │  - Mapping Proposals │
                            └──────────────────────┘

C# Control Plane: Authentication, CRUD APIs, deterministic evaluation, test orchestration, background worker Python Worker: LLM judges, groundedness scoring, mapping proposals via OpenAI API React Frontend: Material UI, mapping wizard, test management, results visualization Database: PostgreSQL with EF Core

Quick Start

Prerequisites

  • Docker & Docker Compose
  • Azure OpenAI resource (or OpenAI API key)
  • .NET 10 SDK (for local development - optional)
  • Node.js 18+ (for local frontend development - optional)

Running with Docker Compose

  1. Clone the repository

    git clone <repository-url>
    cd Promptly
  2. Configure Azure OpenAI (or OpenAI)

    Create a .env file in the docker directory:

    For Azure OpenAI (Recommended):

    # Azure OpenAI Configuration - REQUIRED
    PROMPTLY_LLM_PROVIDER=azureopenai
    PROMPTLY_LLM_API_KEY=your_azure_openai_api_key
    PROMPTLY_LLM_AZURE_ENDPOINT=https://your-resource-name.openai.azure.com
    PROMPTLY_LLM_API_VERSION=2024-08-01-preview
    PROMPTLY_LLM_MODEL_DEFAULT=your-deployment-name

    For standard OpenAI:

    PROMPTLY_LLM_PROVIDER=openai
    PROMPTLY_LLM_API_KEY=sk-your-openai-api-key
    PROMPTLY_LLM_MODEL_DEFAULT=gpt-4o-mini

    Database and other settings are pre-configured - see SETUP_CHECKLIST.md for details.

  3. Start all services

    cd docker
    docker-compose up -d
  4. Access the application

  5. Initialize database

    The database will be automatically migrated on first startup.

Configuration

C# Control Plane (Promptly.Server)

Configuration via appsettings.json or environment variables:

  • ConnectionStrings:Default: PostgreSQL connection string
  • JWT:Key: Secret key for JWT signing (min 32 characters)
  • JWT:Issuer: JWT issuer
  • JWT:Audience: JWT audience
  • JWT:ExpiryMinutes: Token expiry time (default: 60)
  • DATA_PROTECTION_PATH: Path for Data Protection keys persistence
  • PROMPTLY_EVAL_BASE_URL: Python worker base URL
  • TestRunner:PollingIntervalSeconds: Background worker polling interval (default: 5)
  • TestRunner:MaxConcurrentRuns: Max concurrent test runs (default: 2)

Python Worker (Promptly.Worker)

Configuration via environment variables:

For Azure OpenAI:

  • PROMPTLY_LLM_PROVIDER: azureopenai
  • PROMPTLY_LLM_API_KEY: Your Azure OpenAI API key
  • PROMPTLY_LLM_AZURE_ENDPOINT: Your Azure endpoint (e.g., https://your-resource.openai.azure.com)
  • PROMPTLY_LLM_API_VERSION: API version (default: 2024-08-01-preview)
  • PROMPTLY_LLM_MODEL_DEFAULT: Your deployment name (NOT model ID - use the name you gave the deployment in Azure AI Studio)

For standard OpenAI:

  • PROMPTLY_LLM_PROVIDER: openai
  • PROMPTLY_LLM_API_KEY: Your OpenAI API key (starts with sk-)
  • PROMPTLY_LLM_MODEL_DEFAULT: Model ID (e.g., gpt-4o-mini)

Important for Azure: Use your deployment name, not the model name. If you deployed GPT-4o and named it "my-gpt4-deployment", use my-gpt4-deployment as the model default.

React Frontend (Promptly.Web)

Configuration via environment variables:

  • VITE_API_BASE_URL: Base URL for C# API (default: http://localhost:5000)

Usage

1. Register & Login

Navigate to http://localhost:3000 and create an account.

2. Create a Project

Projects organize your testing efforts. Create a project for each application you're testing.

3. Set Up Environment

Define environments (dev, staging, prod) with base URLs and headers:

  • Base URL: http://localhost:5000/demo
  • Headers: Add authentication headers if needed (encrypted at rest)

4. Add Endpoint & Mapping

Create an endpoint pointing to your LLM application's API:

  • Path: /chat
  • Method: POST
  • Timeout: 30s

Use the Mapping Wizard to define how to extract canonical traces from responses:

  1. Provide a sample response JSON
  2. AI proposes a mapping spec with JSONPath expressions
  3. Validate the mapping against the sample
  4. Save as default

5. Create Test Suite

Organize tests into suites. Import tests from YAML:

- id: test-1
  name: Greeting Test
  description: Verify the assistant greets users properly
  input:
    messages:
      - role: user
        content: Hello!
  expectations:
    - type: contains_text
      text: "Hello"
      case_insensitive: true
    - type: banned_text
      text: "error"
      case_insensitive: true

Expectation types:

  • contains_text: Substring search
  • banned_text: Ensure text is NOT present
  • regex_match: Pattern matching
  • link_pattern: Verify URLs match pattern
  • tool_called: Check if specific tool was called
  • tool_sequence: Verify tool call order
  • llm_judge: AI-based scoring with custom rubric
  • groundedness: Check response is grounded in retrieved docs

6. Run Tests

Queue a test run:

  • Select environment, endpoint, and mapping spec
  • Optionally tag with Git commit hash
  • View live progress and results

7. Analyze Results

View detailed results:

  • Summary: Pass rate, latency, tokens, cost
  • Per-Test Results: Status, expectations breakdown, failure reasons
  • Canonical Trace: Messages, tool calls, usage, retrieved docs
  • Raw Response: Original JSON for debugging

MappingSpec Format

Define how to parse arbitrary JSON into canonical traces:

{
  "version": 1,
  "messages": {
    "itemsPath": "$.choices[*].message",
    "rolePath": "$.role",
    "contentPath": "$.content"
  },
  "toolCalls": {
    "itemsPath": "$.tool_calls[*]",
    "namePath": "$.function.name",
    "argumentsPath": "$.function.arguments"
  },
  "usage": {
    "objectPath": "$.usage",
    "promptTokensPath": "$.prompt_tokens",
    "completionTokensPath": "$.completion_tokens",
    "totalTokensPath": "$.total_tokens"
  },
  "retrievedDocs": {
    "itemsPath": "$.retrieved_docs[*]",
    "contentPath": "$.content",
    "titlePath": "$.title",
    "idPath": "$.id"
  },
  "fallback": {
    "singleAssistantContentPath": "$.response"
  }
}

CLI Tool

Install the CLI tool:

dotnet tool install --global Promptly.Cli

Trigger runs from CI/CD:

promptly trigger \
  --base-url http://localhost:5000 \
  --api-key <your-api-key> \
  --suite <suite-id> \
  --env <environment-id> \
  --endpoint <endpoint-id> \
  --mapping <mapping-id> \
  --commit $(git rev-parse HEAD)

promptly wait --base-url http://localhost:5000 --api-key <your-api-key> --run-id <run-id>

Exit code:

  • 0: All tests passed
  • 1: One or more tests failed

Development

Running Locally (Without Docker)

C# API:

cd Promptly.Server
dotnet ef database update
dotnet run

Python Worker:

cd Promptly.Worker
pip install -r requirements.txt
uvicorn main:app --reload --port 8000

React Frontend:

cd Promptly.Web/src/web
npm install
npm run dev

Database Migrations

Create a new migration:

cd Promptly.Server
dotnet ef migrations add <MigrationName>
dotnet ef database update

API Documentation

Swagger UI is available at http://localhost:5000/swagger with JWT bearer authentication.

Key endpoints:

  • POST /api/auth/register: Create account
  • POST /api/auth/login: Get JWT token
  • POST /api/projects: Create project
  • POST /api/environments: Create environment
  • POST /api/endpoints/{id}/mapping/propose: AI-powered mapping proposal
  • POST /api/suites: Create test suite
  • POST /api/suites/{id}/tests/import: Import tests from YAML
  • POST /api/runs: Queue test run
  • GET /api/runs/{id}/results: Get run results

Troubleshooting

Database Connection Issues

Ensure PostgreSQL is running:

docker-compose ps postgres

Check logs:

docker-compose logs postgres

Python Worker Errors

Check LLM API key is set:

docker-compose exec promptly-eval printenv PROMPTLY_LLM_API_KEY

View logs:

docker-compose logs promptly-eval

Frontend Connection Issues

Ensure VITE_API_BASE_URL points to the correct API URL. Check browser console for errors.

Background Worker Not Processing Runs

Check worker logs:

docker-compose logs promptly-server | grep TestRunWorkerService

Verify TestRunner configuration in appsettings.json.

Tech Stack

  • Backend: ASP.NET Core 10, Entity Framework Core, PostgreSQL
  • Worker: Python 3.11+, FastAPI, OpenAI SDK
  • Frontend: React 19, TypeScript, Material-UI, Vite
  • Auth: ASP.NET Identity, JWT, API Keys
  • Security: Data Protection API for encrypted storage
  • Mapping: JsonPath.Net for deterministic parsing
  • Evaluation: Regex, LLM judges (OpenAI/Azure)

License

[Specify your license here]

Contributing

[Specify contribution guidelines]

Support

For issues and questions, please open a GitHub issue.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors