Prompt control and observability platform for LLMs
"To imprint a mental pattern." β Inspired by Minsky's The Society of Mind: a prompt doesn't instruct a unified intelligence, it activates a specific configuration of the model's internal society. Imprimer makes that activation measurable, comparable, and improvable.
Most prompt engineering is trial and error. Imprimer treats it as a control problem: given two prompt variants, which gives you more control over the model's output distribution?
It measures this with a Reachability Index grounded in "What's the Magic Word? A Control Theory of LLM Prompting" (Bhargava et al., 2023), the first rigorous analysis of prompt controllability over autoregressive models. Every evaluation is persisted; the system learns which prompts control each task most effectively, surfaced via the `best` command and the `/best` endpoint.
Demo: imprimer (Reflective Prompt Evolution). For heavy use, run locally with `python -m demo.app` or use the CLI.
An LLM is a stochastic dynamical system over token sequences; a prompt is a control input. For each generated token with logprob $\ell_i$, a smooth threshold decides whether the token was naturally reachable:

$$r_i = \sigma\bigl(\alpha\,(\ell_i - \tau)\bigr), \qquad R = \frac{1}{n}\sum_{i=1}^{n} r_i$$

- $\tau = \log(0.1)$: a token needs ≥10% probability to count as naturally reachable
- $\alpha$: sharpness of the reachable/unreachable separation
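A minimal Python sketch of this scoring, under the assumption of a sigmoid gate over per-token logprobs with the threshold and sharpness parameters described above (the exact functional form and the value of $\alpha$ are illustrative, not the production implementation):

```python
import math

def token_reachability(logprob: float, tau: float = math.log(0.1), alpha: float = 4.0) -> float:
    """Sigmoid gate: near 1 when the token is well above the 10% threshold, near 0 far below it."""
    return 1.0 / (1.0 + math.exp(-alpha * (logprob - tau)))

def reachability_index(logprobs: list[float]) -> float:
    """Mean per-token reachability over a generated sequence."""
    return sum(token_reachability(lp) for lp in logprobs) / len(logprobs)

# A sequence the model finds natural scores near 1.0:
natural = [math.log(0.6), math.log(0.5), math.log(0.8)]
# A sequence of improbable, forced tokens scores near 0:
forced = [math.log(0.001), math.log(0.002)]
```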
| Score | Meaning |
|---|---|
| ~1.0 | Follows the model's natural trajectory |
| ~0.6-0.8 | Good prompt-model alignment |
| ~0.3-0.5 | Partial control, fighting the model |
| <0.3 | Largely unnatural output |
Maximize semantic alignment with
Two services connected by gRPC. The proto file is the single source of truth: Go and Python share no code, only the contract. The orchestrator cycle integrates a reflective agent pattern: the evaluation node analyzes outputs and returns a feedback signal, creating the improvement loop.
| Layer | Responsibility |
|---|---|
| Go | HTTP ingress, auth, audit logging, Prometheus metrics, gRPC routing |
| Python | LLM inference (Ollama/OpenAI/HuggingFace), logprob extraction, reachability computation, reflections, Optuna/RPE optimization, injection scanning, registry persistence |
| Contract | `proto/imprimer.proto`: three RPCs, minimal surface |
CLI integrated for immediate use. See Imprimer CLI.
Qwen2.5:1.5b (no fine-tuning) classifying spam via Reflective Prompt Evolution: the system discovers the optimal prompt autonomously.
Scoring is task-aware and backend-adaptive, routing through different strategies depending on available signals (logprobs, embeddings, etc.).
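Illustratively, that routing might look like the sketch below (the capability keys and strategy names are assumptions for exposition, not the actual registry):

```python
def pick_strategy(capabilities: dict[str, bool]) -> str:
    """Choose a scoring strategy from the signals a backend exposes.
    Keys and strategy names here are illustrative stand-ins."""
    if capabilities.get("logprobs"):
        return "reachability"   # full logprob-based Reachability Index
    if capabilities.get("embeddings"):
        return "similarity"     # embedding-similarity fallback
    return "exact_match"        # last resort: string-level checks
```

The point of the cascade is graceful degradation: a backend that exposes no logprobs still gets a usable, if weaker, score.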
`imprimer optimize` searches structured linguistic mutations via spaCy dependency parsing:

- VerbMutator: root verb rewrites (`summarize` → `distill`, `condense`)
- NounMutator: primary object noun chunk rewrites
- ModalityMutator: mood shifts (imperative → directive → interrogative)
One dimension at a time across graph iterations. Optuna's TPE builds a surrogate over the mutation space.
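A toy, dependency-free illustration of one-dimension-at-a-time search over a structured mutation space (the real system uses spaCy mutators and Optuna's TPE sampler; the candidate lists and scorer here are hypothetical):

```python
# Toy coordinate search over a structured mutation space.
SPACE = {
    "verb": ["summarize", "distill", "condense"],
    "noun": ["the text", "the passage", "the document"],
    "modality": ["imperative", "directive", "interrogative"],
}

def coordinate_search(score, iterations=3):
    """Mutate one dimension per iteration, keeping a change only if it improves the score."""
    best = {dim: options[0] for dim, options in SPACE.items()}
    best_score = score(best)
    dims = list(SPACE)
    for i in range(iterations):
        dim = dims[i % len(dims)]        # one dimension per iteration
        for option in SPACE[dim]:
            candidate = {**best, dim: option}
            s = score(candidate)
            if s > best_score:
                best, best_score = candidate, s
    return best, best_score

# Hypothetical scorer that happens to prefer "distill" + "the passage":
toy = lambda c: ("distill" in c["verb"]) + ("passage" in c["noun"])
```

TPE improves on this naive sweep by building a surrogate model of which mutations tend to score well, so later trials concentrate on promising regions.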
The LLM generates its own variant prompts based on the current best plus verbal feedback: open-ended search discovering transformations spaCy cannot.
Semantic Self-Consistency (SSC): same prompt run K times at temperature > 0, measuring average pairwise semantic similarity. High SSC = reliable steering. Low SSC = too much variance.
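A self-contained sketch of the SSC computation, using a toy bag-of-words embedder in place of the real embedding model (the cosine-similarity formulation is an assumption about the similarity measure):

```python
import itertools
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; the real system uses a sentence-embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def ssc(outputs: list[str]) -> float:
    """Average pairwise semantic similarity over K sampled outputs of one prompt."""
    pairs = list(itertools.combinations(outputs, 2))
    return sum(cosine(embed(x), embed(y)) for x, y in pairs) / len(pairs)

# K identical outputs -> SSC of 1.0 (reliable steering);
# divergent outputs pull the score down.
```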
Both paths share the same LangGraph outer loop: generator → evaluator → controller, cycling until the score exceeds the baseline or the iteration cap.
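The outer loop's control flow, sketched in plain Python (the real implementation is a LangGraph graph; the node signatures and feedback plumbing here are simplified assumptions):

```python
def optimize(generate, evaluate, baseline: float, max_iterations: int = 3):
    """generator -> evaluator -> controller, stopping once the score beats
    the baseline or the iteration cap is reached."""
    best_prompt, best_score, feedback = None, float("-inf"), None
    for _ in range(max_iterations):
        prompt = generate(best_prompt, feedback)   # generator node
        score, feedback = evaluate(prompt)         # evaluator node + feedback signal
        if score > best_score:
            best_prompt, best_score = prompt, score
        if best_score > baseline:                  # controller: stop condition
            break
    return best_prompt, best_score

# Toy example: each round folds feedback into the prompt; score grows with length.
gen = lambda prev, fb: (prev or "summarize") + (" " + fb if fb else "")
ev = lambda p: (len(p.split()) / 3.0, "briefly")
```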
| Step | Per iteration | 3 iterations |
|---|---|---|
| Optuna trials | 20 | 60 |
| Evaluator + Feedback | 2 | 6 |
| Total | 22 | ~66 |
| Step | Per iteration | 3 iterations |
|---|---|---|
| Variant generation | 1 | 3 |
| SSC scoring (NΓK, parallel) | 6 | 18 |
| Evaluator + Feedback | 2 | 6 |
| Total | 9 | ~27 |
| Backend | Cost | Logprobs | Notes |
|---|---|---|---|
| Ollama (local) | Free | ✓ Full | `qwen2.5:1.5b` runs on CPU |
| OpenAI `gpt-4o-mini` | ~$0.15 in / $0.60 out per 1M tokens | ✓ Full | UI: ~$0.001-0.003 per run |
| HuggingFace Inference | Free (rate-limited) / ~$9/mo | ✗ None | Falls back to similarity scoring |
Reduce cost: lower `--trials`, `n_variants`, `ssc_runs`, or `--max-iterations`. With Ollama, cost is zero.
- Docker Desktop
- Ollama with `qwen2.5:1.5b`: `ollama pull qwen2.5:1.5b`
- Ollama listening on all interfaces (required for Docker):

```sh
export OLLAMA_HOST=0.0.0.0
# Then restart Ollama from the system tray
```

```sh
docker compose up --build
```

Gateway on :8080. Engine on :50051 (internal).
```sh
go install github.com/BalorLC3/Imprimer/gateway/cmd/imprimer@latest

# Or locally:
go install ./gateway/cmd/imprimer/
```

Every request passes through the security layer before any LLM interaction:
- Prompt injection detection: 9 regex patterns (OWASP Top 10 for LLM Applications, LLM01)
- PII detection: SSNs, credit card numbers, and emails flagged in the audit log
- Auth middleware: Bearer token validation (`IMPRIMER_API_KEY`)
- Least privilege: the engine container has no host write access
ISO 27001 alignment: A.9, A.12.6, A.14.2.
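A minimal sketch of regex-based injection screening in the spirit of OWASP LLM01 (the patterns below are illustrative examples, not the gateway's actual nine):

```python
import re

# Illustrative injection signatures; not the production pattern set.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.I),
    re.compile(r"disregard\s+your\s+system\s+prompt", re.I),
    re.compile(r"you\s+are\s+now\s+(in\s+)?developer\s+mode", re.I),
]

def scan_prompt(text: str) -> list[str]:
    """Return the patterns the input trips; an empty list means it passes the gate."""
    return [p.pattern for p in INJECTION_PATTERNS if p.search(text)]
```

Running the scan before any LLM call keeps flagged inputs out of the model entirely, and the matched patterns can be written straight to the audit log.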
```sh
# Terminal 1: Python engine
cd engine && python main.py

# Terminal 2: Go gateway
go run ./gateway/cmd/main.go

# Terminal 3: CLI
imprimer evaluate --task summarize --input "..." --a "..." --b "..."
```

```sh
# Python
python -m grpc_tools.protoc \
  -I proto --python_out=engine --grpc_python_out=engine proto/imprimer.proto

# Go
mkdir -p gateway/gen
protoc -I proto \
  --go_out=gateway/gen --go-grpc_out=gateway/gen \
  --go_opt=paths=source_relative --go-grpc_opt=paths=source_relative \
  proto/imprimer.proto
```

- `imprimer ui`: TensorBoard-style dashboard reading from the registry
- Fine-tuning escalation: LoRA when the optimizer plateaus on complex tasks
- What's the Magic Word? A Control Theory of LLM Prompting, Bhargava et al., 2023 · arxiv.org/abs/2310.04444
- The Society of Mind, Marvin Minsky, 1986
- OWASP Top 10 for LLM Applications · owasp.org
MIT




