PromptOps

PromptOps is a production-minded prompt versioning, evaluation, and testing platform for teams shipping AI features. It combines Git-like prompt lifecycle management, a runnable playground, batch evaluations on datasets, pluggable scoring, queue-backed execution, and an operational UI for comparing versions and promoting winners safely.

Why it matters

Shipping LLM features without versioning and evaluation creates silent regressions: prompts drift, models change, and nobody can prove which variant actually worked. PromptOps makes prompt changes reviewable, measurable, and reversible with audit trails, structured runs, and explicit promotion rules.

Key features

Email/password auth with JWT access tokens and rotating refresh sessions
Prompt registry with tags, categories, and ownership
Versioned prompts with variables, model settings, changelog, and lifecycle states
Datasets and test cases with optional expectations (keywords, JSON schema, reference text)
Playground runs with compiled preview, async execution via BullMQ, persisted runs
Batch evaluations across many cases and prompt versions with progress tracking
Pluggable evaluators (keyword, regex, JSON schema, similarity, heuristic, LLM-judge adapter)
Manual review scores stored per case result
Experiment views with charts and per-case inspection
Single active version per prompt with promotion history and audit logs
Health, readiness, and liveness endpoints; structured logging; centralized API errors
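
To make the "variables" and "compiled preview" features concrete, here is a minimal sketch of template compilation. The `{{name}}` placeholder syntax and the names `compileTemplate` and `CompileResult` are assumptions for illustration; the actual helper in packages/shared may look different.

```typescript
// Hypothetical sketch: the {{name}} placeholder syntax and all names here are
// assumptions for illustration, not the actual packages/shared implementation.
type CompileResult = {
  text: string;      // template with known variables substituted
  missing: string[]; // placeholders that had no value supplied
};

function compileTemplate(
  template: string,
  variables: Record<string, string>,
): CompileResult {
  const missing: string[] = [];
  const placeholder = /\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g;
  const text = template.replace(placeholder, (match, name: string) => {
    if (Object.prototype.hasOwnProperty.call(variables, name)) {
      return variables[name];
    }
    if (!missing.includes(name)) missing.push(name);
    return match; // leave unresolved placeholders visible in the preview
  });
  return { text, missing };
}

const preview = compileTemplate("Summarize {{ topic }} for {{audience}}.", {
  topic: "release notes",
});
console.log(preview.text); // Summarize release notes for {{audience}}.
```

Returning the unresolved names alongside the text lets a preview flag missing variables before a run is queued.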

Architecture (ASCII)

                       +------------------+
                       |  Next.js (web)   |
                       +--------+---------+
                                |  REST + JWT
                                v
+----------------+     +--------+---------+     +----------------+
|     Redis      |<----+   Fastify API    +---->|   PostgreSQL   |
| (BullMQ broker)|     +--------+---------+     |    (Prisma)    |
+-------+--------+              |               +----------------+
        ^                       |
        |              +--------v---------+
        +--------------+  Worker service  |
                       | (BullMQ consumer)|
                       +--------+---------+
                                |
                       +--------v---------+
                       | OpenAI-compatible|
                       |  provider (HTTP) |
                       +------------------+

Monorepo layout

apps/api          Fastify REST API (/api/v1)
apps/worker       BullMQ workers (prompt-run, evaluation-run, evaluation-case)
apps/web          Next.js App Router UI (shadcn/ui, Recharts)
packages/db       Prisma schema + client export
packages/shared   Template compilation, diffs, promotion helpers, scoring utils
packages/evals    Evaluator implementations + runner
packages/ai       OpenAI-compatible HTTP provider abstraction
packages/config   Zod-backed env loading + pagination helpers
packages/logger   Pino logger factory
packages/types    Shared API typing helpers

Prerequisites

Node.js 20+
pnpm 9+
Docker (optional, for compose stack)

Setup

cp .env.example .env
pnpm install
docker compose up -d postgres redis
pnpm db:generate
pnpm db:migrate
pnpm dev

API: http://localhost:4000
Web: http://localhost:3000

Scripts

| Script | Purpose |
| --- | --- |
| `pnpm dev` | API + worker + web concurrently |
| `pnpm build` | Production build for all packages |
| `pnpm lint` / `pnpm typecheck` | Quality gates |
| `pnpm test` | Unit tests (Vitest) |
| `pnpm test:e2e` | Playwright (install browsers first) |
| `pnpm db:migrate` | Prisma migrate (dev) |
| `pnpm db:migrate:deploy` | Prisma migrate (CI/prod) |
| `pnpm db:generate` | Regenerate Prisma client |

Environment variables

See .env.example. Critical values:

DATABASE_URL, REDIS_URL
JWT_ACCESS_SECRET, JWT_REFRESH_SECRET (each ≥ 32 characters)
OPENAI_API_KEY / OPENAI_BASE_URL for real model calls (worker + playground)
NEXT_PUBLIC_API_URL for the browser-facing API base URL
CORS_ORIGIN (comma-separated allowed origins)
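
The constraints above (required URLs, secrets of at least 32 characters, comma-separated origins) can be checked at startup. The project does this with Zod in packages/config; the sketch below uses plain TypeScript to stay dependency-free, and `loadEnv` and `AppEnv` are hypothetical names.

```typescript
// Dependency-free sketch of the checks listed above. The real project uses
// Zod in packages/config; loadEnv and AppEnv are hypothetical names.
type AppEnv = {
  databaseUrl: string;
  redisUrl: string;
  jwtAccessSecret: string;
  jwtRefreshSecret: string;
  corsOrigins: string[];
};

function loadEnv(raw: Record<string, string | undefined>): AppEnv {
  const errors: string[] = [];

  // Records an error when the key is absent or empty.
  const required = (key: string): string => {
    const value = raw[key];
    if (!value) errors.push(`${key} is required`);
    return value ?? "";
  };

  // Same as required, plus the 32-character minimum for JWT secrets.
  const secret = (key: string): string => {
    const value = required(key);
    if (value && value.length < 32) {
      errors.push(`${key} must be at least 32 characters`);
    }
    return value;
  };

  const env: AppEnv = {
    databaseUrl: required("DATABASE_URL"),
    redisUrl: required("REDIS_URL"),
    jwtAccessSecret: secret("JWT_ACCESS_SECRET"),
    jwtRefreshSecret: secret("JWT_REFRESH_SECRET"),
    corsOrigins: (raw["CORS_ORIGIN"] ?? "")
      .split(",")
      .map((origin) => origin.trim())
      .filter((origin) => origin.length > 0),
  };

  if (errors.length > 0) {
    throw new Error(`Invalid environment: ${errors.join("; ")}`);
  }
  return env;
}
```

Collecting every problem before throwing means a misconfigured deployment reports all missing values in one pass rather than one per restart.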

Database migrations

Prisma migrations live in packages/db/prisma/migrations. For a fresh database:

pnpm db:migrate:deploy

Testing strategy

Unit tests: template compilation, weighted scoring, promotion rules, diff helpers, password hashing, JWT signing, evaluator primitives (packages/shared, packages/evals, apps/api).
Integration tests: run against a real Postgres + Redis instance (see CI workflow). Extend apps/api with fastify.inject suites wired to DATABASE_URL when you add full HTTP integration coverage.
E2E: Playwright smoke under apps/web/e2e (pnpm test:e2e after pnpm exec playwright install).

API surface

Base path: /api/v1

Auth: register, login, refresh, logout, me
Prompts + versions: CRUD, clone, promote, diff
Datasets + cases: CRUD
Playground run, runs list/detail
Evaluations: create, detail, results pagination, manual review
System: /health, /live, /ready
Dashboard aggregate: /stats

All list endpoints accept pagination (page, pageSize, sort, order, q where applicable).
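
As an illustration of how those query parameters might be normalized, here is a small sketch. The defaults (page 1, pageSize 20), the cap of 100, and the name `parsePagination` are assumptions; the real helpers live in packages/config.

```typescript
// Sketch of parsing the list-endpoint query parameters named above.
// Defaults (page 1, pageSize 20) and the cap of 100 are assumptions.
type Pagination = {
  page: number;
  pageSize: number;
  sort?: string;
  order: "asc" | "desc";
  q?: string;
  skip: number; // row offset derived from page and pageSize
};

function parsePagination(query: Record<string, string | undefined>): Pagination {
  // Accept only positive integers; anything else falls back to the default.
  const toInt = (value: string | undefined, fallback: number): number => {
    const parsed = Number.parseInt(value ?? "", 10);
    return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
  };

  const page = toInt(query.page, 1);
  const pageSize = Math.min(toInt(query.pageSize, 20), 100);
  const q = query.q?.trim();

  return {
    page,
    pageSize,
    sort: query.sort || undefined,
    order: query.order === "desc" ? "desc" : "asc",
    q: q ? q : undefined,
    skip: (page - 1) * pageSize,
  };
}

console.log(parsePagination({ page: "3", pageSize: "25", order: "desc" }).skip); // 50
```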

Evaluation engine

Evaluators implement a small interface: each accepts a structured context (model output, case input, expectations) and returns weighted score components. runEvaluators aggregates those components using the configurable weights stored in evaluatorConfig on each EvaluationRun. Adding an evaluator means adding a module under packages/evals and registering it in the worker's case processor.
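
The shape of that interface is not spelled out here, so the following is only a sketch of how an evaluator and a weighted aggregation could fit together. All type names, the two sample evaluators, and `runEvaluatorsSketch` are hypothetical; the weighted mean is one plausible aggregation, not necessarily the one runEvaluators uses.

```typescript
// Hypothetical shapes: the README names runEvaluators and evaluatorConfig but
// does not show their signatures, so everything below is an assumption.
type EvalContext = {
  output: string;
  input: Record<string, string>;
  expectations: { keywords?: string[] };
};

type ScoreComponent = { evaluator: string; score: number }; // score in [0, 1]

type Evaluator = {
  name: string;
  evaluate(ctx: EvalContext): ScoreComponent;
};

// Fraction of expected keywords found in the output (case-insensitive).
const keywordEvaluator: Evaluator = {
  name: "keyword",
  evaluate(ctx) {
    const keywords = ctx.expectations.keywords ?? [];
    if (keywords.length === 0) return { evaluator: "keyword", score: 1 };
    const haystack = ctx.output.toLowerCase();
    const hits = keywords.filter((k) => haystack.includes(k.toLowerCase())).length;
    return { evaluator: "keyword", score: hits / keywords.length };
  },
};

// Scores 1 when the output is non-empty and at most 200 characters long.
const lengthEvaluator: Evaluator = {
  name: "length",
  evaluate(ctx) {
    const ok = ctx.output.trim().length > 0 && ctx.output.length <= 200;
    return { evaluator: "length", score: ok ? 1 : 0 };
  },
};

// Weighted mean over the evaluators that have a positive weight configured.
function runEvaluatorsSketch(
  evaluators: Evaluator[],
  weights: Record<string, number>,
  ctx: EvalContext,
): { components: ScoreComponent[]; total: number } {
  const components = evaluators.map((e) => e.evaluate(ctx));
  let weighted = 0;
  let weightSum = 0;
  for (const c of components) {
    const w = weights[c.evaluator] ?? 0;
    if (w <= 0) continue;
    weighted += c.score * w;
    weightSum += w;
  }
  return { components, total: weightSum > 0 ? weighted / weightSum : 0 };
}
```

With weights `{ keyword: 3, length: 1 }`, an output that matches one of two keywords and passes the length check scores (0.5 × 3 + 1 × 1) / 4 = 0.625.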

Sequence flows

Create prompt version

sequenceDiagram
  participant U as User
  participant API as Fastify API
  participant DB as Postgres
  U->>API: POST /prompts/:id/versions
  API->>DB: insert PromptVersion + variables
  API->>DB: audit VERSION_CREATE
  API-->>U: 201 + version payload

Run evaluation job

sequenceDiagram
  participant U as User
  participant API as API
  participant Q as Redis/BullMQ
  participant W as Worker
  participant P as Provider
  U->>API: POST /evaluations
  API->>Q: enqueue evaluation-run
  API-->>U: 201 + run id
  W->>Q: fan out evaluation-case jobs
  W->>P: completions per case/version
  W->>W: run evaluators + persist EvaluationCaseResult
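
The fan-out step in this flow can be pictured as one evaluation-case job per (case, version) pair. The sketch below only plans the job payloads in memory and does not talk to BullMQ; the payload fields, the id format, and the names `planCaseJobs` and `progressPercent` are assumptions.

```typescript
// Sketch of the fan-out step: one evaluation-case job per (case, version) pair.
// Payload fields and the jobId format are assumptions, not the worker's schema.
type CaseJob = {
  jobId: string;
  evaluationRunId: string;
  caseId: string;
  promptVersionId: string;
};

function planCaseJobs(
  evaluationRunId: string,
  caseIds: string[],
  promptVersionIds: string[],
): CaseJob[] {
  const jobs: CaseJob[] = [];
  for (const promptVersionId of promptVersionIds) {
    for (const caseId of caseIds) {
      jobs.push({
        // Deterministic id, so the same pair always maps to the same job.
        jobId: `${evaluationRunId}_${promptVersionId}_${caseId}`,
        evaluationRunId,
        caseId,
        promptVersionId,
      });
    }
  }
  return jobs;
}

// Whole-number progress for the UI; an empty plan counts as complete.
function progressPercent(completed: number, total: number): number {
  if (total <= 0) return 100;
  return Math.floor((Math.min(completed, total) / total) * 100);
}

const planned = planCaseJobs("run1", ["c1", "c2", "c3"], ["v1", "v2"]);
console.log(planned.length); // 6
```

Deriving the id from the run, version, and case makes each unit of work addressable, which helps when tracing a single case result back to its job.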

Score outputs

sequenceDiagram
  participant W as Worker
  participant E as Evaluators
  participant DB as Postgres
  W->>E: runEvaluators(...)
  E-->>W: per-evaluator scores + weighted total
  W->>DB: upsert EvaluationCaseResult (autoScores, total)

Promote version

sequenceDiagram
  participant U as User
  participant API as API
  participant DB as Postgres
  U->>API: POST /prompt-versions/:id/promote
  API->>DB: transaction demote other ACTIVE, set target ACTIVE
  API->>DB: insert PromotionHistory + audit VERSION_PROMOTE
  API-->>U: updated version
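
The invariant behind this flow is that a prompt has at most one ACTIVE version. The sketch below models the rule in memory; the real flow runs inside a database transaction. The DRAFT and ARCHIVED state names, the choice to demote to ARCHIVED, and the names `promote` and `PromotionRecord` are assumptions.

```typescript
// In-memory model of the promotion rule. State names other than ACTIVE, and
// demoting the previous version to ARCHIVED, are assumptions for illustration.
type VersionState = "DRAFT" | "ACTIVE" | "ARCHIVED";
type PromptVersion = { id: string; promptId: string; state: VersionState };
type PromotionRecord = {
  promptId: string;
  fromVersionId: string | null; // null when the prompt had no active version
  toVersionId: string;
};

function promote(
  versions: PromptVersion[],
  targetId: string,
): { versions: PromptVersion[]; history: PromotionRecord } {
  const target = versions.find((v) => v.id === targetId);
  if (!target) throw new Error(`Unknown version: ${targetId}`);

  const previous = versions.find(
    (v) => v.promptId === target.promptId && v.state === "ACTIVE" && v.id !== targetId,
  );

  // Demote any other ACTIVE version of the same prompt, then activate the target.
  const next = versions.map((v): PromptVersion => {
    if (v.promptId !== target.promptId) return v;
    if (v.id === targetId) return { ...v, state: "ACTIVE" };
    if (v.state === "ACTIVE") return { ...v, state: "ARCHIVED" };
    return v;
  });

  return {
    versions: next,
    history: {
      promptId: target.promptId,
      fromVersionId: previous ? previous.id : null,
      toVersionId: targetId,
    },
  };
}
```

Because both state changes happen in one step, there is never a moment with two active versions; in the database that guarantee comes from the transaction.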

Docker

docker compose up --build starts Postgres, Redis, API, worker, and web. Run migrations inside the API container on first boot:

docker compose run --rm api sh -lc "pnpm db:migrate:deploy"

Security notes

Passwords hashed with bcrypt; refresh tokens stored hashed (SHA-256) server-side
JWT secrets must be strong and unique per environment
Rate limiting + request size limits on the API
Structured logging redacts common secret-bearing fields
Provider keys belong in environment variables or a future secrets vault, never in prompts or datasets
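
To illustrate the hashed refresh-token note above, here is a minimal sketch using Node's built-in crypto module. Only the SHA-256 digest would be persisted; the raw token goes to the client once. The function names and the 32-byte token length are assumptions.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Sketch of storing only a SHA-256 digest of each refresh token.
// Function names and the 32-byte token length are assumptions.
function hashToken(token: string): string {
  return createHash("sha256").update(token).digest("hex");
}

// Returns the raw token for the client and the digest for the database.
function issueRefreshToken(): { token: string; tokenHash: string } {
  const token = randomBytes(32).toString("base64url");
  return { token, tokenHash: hashToken(token) };
}

// Hashes the presented token and compares digests in constant time.
function matchesStoredHash(presented: string, storedHash: string): boolean {
  const a = Buffer.from(hashToken(presented), "hex");
  const b = Buffer.from(storedHash, "hex");
  return a.length === b.length && timingSafeEqual(a, b);
}

const issued = issueRefreshToken();
console.log(matchesStoredHash(issued.token, issued.tokenHash)); // true
```

A fast hash is reasonable here because the token is high-entropy random data rather than a user-chosen password, which is why passwords use bcrypt while refresh tokens can use SHA-256.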

Scaling considerations

Horizontally scale apps/worker behind the same Redis URL; BullMQ coordinates concurrency
Read replicas can back analytics-style queries (dashboard aggregates) as traffic grows
Partition evaluation case jobs by tenant or prompt for noisy-neighbor isolation
Move provider calls to dedicated inference gateways for token accounting and policy enforcement

Tradeoffs

Postgres JSON fields hold flexible evaluator payloads instead of over-normalized score tables: fast to ship, easy to query for v1, and splittable into dedicated tables later
OpenAI-compatible HTTP keeps the dependency surface small; native SDKs can wrap the same interface later
Storing tokens in localStorage keeps the portfolio demo simple; a production deployment would move access tokens to HTTP-only cookies

Future improvements

Multi-tenant workspaces and RBAC
Encrypted provider credential storage per user/team
Real-time evaluation progress via WebSocket or SSE
Golden-run baselines and automatic regression alerts
Durable idempotency keys for provider calls

Screenshots

Add screenshots under docs/screenshots/ (dashboard, evaluation detail, playground) when you capture them.

CI

GitHub Actions workflow (.github/workflows/ci.yml) installs dependencies, applies migrations to a service container database, runs lint, typecheck, tests, and production builds.
