Synthed is a TypeScript toolkit for generating deterministic synthetic test data across finance, healthcare, EDI, telecom, and logging formats. It includes:
- A CLI for local and CI pipelines
- An API server for service-to-service generation and validation
- A registry/orchestrator model for composing repeatable generation jobs
- Optional corruption and manifests for "bad data" test suites
This document is designed as a practical, implementation-level guide for repository users and contributors.
Use Synthed when you need test fixtures that are:
- Safe: no production data exposure
- Repeatable: the same `seed` produces the same data
- Domain-shaped: output resembles real format conventions
- Stress-ready: corruption support for validator/parser testing
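
To make the repeatability point concrete, here is a minimal sketch of seed-driven generation. It assumes a mulberry32 PRNG and a hypothetical `makeAccountIds` helper; neither is taken from Synthed's source, and the toolkit's own RNG may work differently.

```typescript
// Illustrative only: mulberry32 is a well-known small PRNG, not necessarily
// the one Synthed uses internally.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // value in [0, 1)
  };
}

// Hypothetical generator: builds `count` fake 8-digit account ids from a seed.
function makeAccountIds(seed: number, count: number): string[] {
  const next = mulberry32(seed);
  const ids: string[] = [];
  for (let i = 0; i < count; i++) {
    const n = Math.floor(next() * 1e8);
    ids.push(String(n).padStart(8, "0"));
  }
  return ids;
}

const first = makeAccountIds(42, 5);
const second = makeAccountIds(42, 5);
console.log(first.join(",") === second.join(",")); // prints true
```

Because all randomness flows from the seed, rerunning with the same seed and count reproduces the same values.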
Typical use cases:
- Integration testing against HL7/FIX/OFX/X12 payloads
- ETL test fixture generation
- Parser hardening and negative testing
- Validator regression checks using manifests
| Feature | AI Generators | Synthed |
|---|---|---|
| Requires Real Data | Yes (Training) | No (Schema-based) |
| Deterministic Output | Rare | Always (via Seed) |
| Intentional Corruption | No | Yes (Chaos Mode) |
| Cost | High $$$ | Free (MIT) |
Registered generators include:
- `apache-access-log`
- `hl7v2`
- `fix`
- `ofx`
- `x12`
- `edifact`
- `swift-mt`
- `cdr`
- `syslog-rfc5424`
- `windows-evtx`
- `fhir-r4`

Registered validators include:
- `hl7v2`
- `fix`
- `ofx`
- `x12`
- `swift-mt`
- `cdr`
List available generators at runtime:

    npm run dev -- list

Prerequisites:
- Node.js 20+ recommended
- npm 10+ recommended

Install dependencies, build, and list the generators to confirm the setup:

    npm ci
    npm run build
    npm run dev -- list

The CLI entrypoint is `src/cli/index.ts`.
Generate one dataset quickly from a single generator.

    npm run dev -- generate <generator> --count <N> --seed <S> --output <path>

Key options:
- `--count`: record count (default `100`)
- `--seed`: deterministic seed (default `42`)
- `--output`: output file path; omitted means stdout
- `--corrupt`: enable corruption injection
- `--corrupt-rate`: fraction to corrupt (`0` to `1`)
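
The option defaults above can be pictured as a small resolution step. This is a sketch only: `GenerateOptions` and `resolveGenerateOptions` are hypothetical names, and the range checks are assumptions about reasonable behavior rather than what the CLI is known to do.

```typescript
// Hypothetical shape mirroring the documented flags; not Synthed's actual types.
interface GenerateOptions {
  count: number;                   // --count, documented default 100
  seed: number;                    // --seed, documented default 42
  output: string | undefined;      // --output, undefined means stdout
  corrupt: boolean;                // --corrupt
  corruptRate: number | undefined; // --corrupt-rate, fraction in [0, 1]
}

function resolveGenerateOptions(input: Partial<GenerateOptions>): GenerateOptions {
  const resolved: GenerateOptions = {
    count: input.count ?? 100,
    seed: input.seed ?? 42,
    output: input.output,
    corrupt: input.corrupt ?? false,
    corruptRate: input.corruptRate,
  };
  // Assumed validation: a non-negative integer count and a rate inside [0, 1].
  if (!Number.isInteger(resolved.count) || resolved.count < 0) {
    throw new RangeError("count must be a non-negative integer");
  }
  const rate = resolved.corruptRate;
  if (rate !== undefined && !(rate >= 0 && rate <= 1)) {
    throw new RangeError("corruptRate must be between 0 and 1");
  }
  return resolved;
}

const defaults = resolveGenerateOptions({});
console.log(defaults.count, defaults.seed); // prints 100 42
```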
Examples:

    npm run dev -- generate hl7v2 --count 100 --seed 42 --output output/hl7-clean.hl7
    npm run dev -- generate fix --count 100 --seed 42 --corrupt --corrupt-rate 0.15 --output output/fix-bad.fix

Run one or more jobs from YAML.

    npm run dev -- run --config examples/jobs/full-suite.yaml

This is the best approach for repeatable suites in CI.
Validate a generated/input file using a matching validator.

    npm run dev -- validate --generator hl7v2 --input output/hl7-clean.hl7 --format json --output output/hl7.validation.json

Options:
- `--format text|json|junit` (default `text`)
- `--strict` exits with code `1` when validation fails
- `--suppress <RULE1,RULE2,...>` suppresses selected rules
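
How `--suppress` and `--strict` might interact can be sketched as follows. The `Finding` shape, the rule ids, and the choice to apply suppression before computing the exit code are all assumptions made for illustration; check the validator's actual report format before relying on them.

```typescript
// Hypothetical finding shape; the real validator report may differ.
interface Finding {
  rule: string;
  message: string;
}

// Drops findings whose rule id appears in a comma-separated suppress list.
function applySuppressions(findings: Finding[], suppress: string): Finding[] {
  const ignored = new Set(
    suppress
      .split(",")
      .map((rule) => rule.trim())
      .filter((rule) => rule.length > 0),
  );
  return findings.filter((finding) => !ignored.has(finding.rule));
}

// Assumed semantics: in strict mode, any remaining finding yields exit code 1.
function exitCodeFor(findings: Finding[], strict: boolean): number {
  return strict && findings.length > 0 ? 1 : 0;
}

// Rule ids below are made up for the example.
const findings: Finding[] = [
  { rule: "RULE1", message: "missing required segment" },
  { rule: "RULE2", message: "unexpected field length" },
];
const remaining = applySuppressions(findings, "RULE2");
console.log(remaining.length, exitCodeFor(remaining, true)); // prints 1 1
```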
Run built-in self-tests.

    npm run dev -- selftest
    npm run dev -- selftest fix --format json

Compare corruption manifest results with validator report output.

    npm run dev -- manifest-check --manifest output/hl7-bad.manifest.json --report output/hl7.validation.json

YAML configs are parsed and validated with Zod (`src/orchestrator/config.ts`).
Job fields:
- `id`: unique job name
- `generator`: generator id
- `recordCount`, `seed`, `locale`, `extras`
- `corrupt`, `corruptRate`, optional `corruptStrategies`
- `output`:
  - `type`: `stdout` or `file`
  - `path`: file path when using `file`
  - `compress`: optional gzip output
- `manifest`:
  - `type`: currently `file`
  - `path`: where injected corruption metadata is written
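
The field list above can be read as a type. The sketch below is a plain-TypeScript illustration, not the repository's Zod schema: which fields are optional, and the `checkJobs` helper itself, are assumptions, and `locale`, `extras`, and `corruptStrategies` are omitted because their shapes are not described here.

```typescript
// Plain-TypeScript illustration of the documented job fields. Optionality is
// assumed; the authoritative schema is the Zod one in the repository.
type OutputConfig =
  | { type: "stdout" }
  | { type: "file"; path: string; compress?: boolean };

interface JobConfig {
  id: string;
  generator: string;
  recordCount?: number;
  seed?: number;
  corrupt?: boolean;
  corruptRate?: number;
  output: OutputConfig;
  manifest?: { type: "file"; path: string };
}

// Returns human-readable problems; an empty array means the jobs look sane.
function checkJobs(jobs: JobConfig[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const job of jobs) {
    if (seen.has(job.id)) problems.push(`duplicate job id: ${job.id}`);
    seen.add(job.id);
    const rate = job.corruptRate;
    if (rate !== undefined && !(rate >= 0 && rate <= 1)) {
      problems.push(`${job.id}: corruptRate must be between 0 and 1`);
    }
    if (job.output.type === "file" && job.output.path.length === 0) {
      problems.push(`${job.id}: file output needs a path`);
    }
  }
  return problems;
}

const jobs: JobConfig[] = [
  {
    id: "ofx-bad",
    generator: "ofx",
    recordCount: 60,
    corrupt: true,
    corruptRate: 0.2,
    output: { type: "file", path: "./output/ofx-bad.ofx" },
    manifest: { type: "file", path: "./output/ofx-bad.manifest.json" },
  },
];
console.log(checkJobs(jobs)); // prints []
```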
Minimal example:

    version: "1"
    seed: 42
    jobs:
      - id: hl7-clean
        generator: hl7v2
        recordCount: 100
        output:
          type: file
          path: ./output/hl7-clean.hl7

Bad-data example:

    version: "1"
    seed: 42
    jobs:
      - id: ofx-bad
        generator: ofx
        recordCount: 60
        corrupt: true
        corruptRate: 0.2
        output:
          type: file
          path: ./output/ofx-bad.ofx
        manifest:
          type: file
          path: ./output/ofx-bad.manifest.json

Corruption is applied by the orchestrator corruption layer during streaming generation.
Two ways to enable:
- CLI: `--corrupt --corrupt-rate <0..1>`
- YAML job: `corrupt: true` + `corruptRate: <0..1>`

Recommended rates:
- Smoke tests: `0.05`
- Validator stress: `0.1` to `0.2`
- Fuzz-like scenarios: `0.25+` (expect high parse failure rates)
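
To get a feel for what a rate means in practice, the sketch below selects records for corruption independently with probability equal to the rate, so roughly `rate × recordCount` records are affected. This is an assumed selection model, not a description of Synthed's corruption layer; `unitHash` and `pickCorruptedIndexes` are hypothetical helpers.

```typescript
// Maps (seed, index) to a repeatable value in [0, 1) using a 32-bit integer
// mixer (the murmur3 finalizer). Purely illustrative.
function unitHash(seed: number, index: number): number {
  let h = (seed ^ Math.imul(index + 1, 0x9e3779b1)) >>> 0;
  h = Math.imul(h ^ (h >>> 16), 0x85ebca6b);
  h = Math.imul(h ^ (h >>> 13), 0xc2b2ae35);
  return ((h ^ (h >>> 16)) >>> 0) / 4294967296;
}

// Returns the record indexes that would be corrupted for a given seed and rate.
function pickCorruptedIndexes(seed: number, recordCount: number, rate: number): number[] {
  const picked: number[] = [];
  for (let i = 0; i < recordCount; i++) {
    if (unitHash(seed, i) < rate) picked.push(i);
  }
  return picked;
}

const smoke = pickCorruptedIndexes(42, 1000, 0.05);
const stress = pickCorruptedIndexes(42, 1000, 0.2);
console.log(smoke.length, stress.length); // counts near 50 and 200 under this model
```

Because the decision depends only on the seed and the record index, the same records are picked on every run, which is what lets a manifest line up with validator output.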
The repository currently includes a multi-format bad-data config: `examples/jobs/bad-data.yaml`

Run all bad-data jobs:

    npm run dev -- run --config examples/jobs/bad-data.yaml

By convention, generated files are stored under `output/`.
Examples:
- Healthcare: `output/hl7-bad.hl7`, `output/fhir-bad.json`
- Financial: `output/fix-bad.fix`, `output/ofx-bad.ofx`
- Manifests: `output/*-bad.manifest.json`
Inspect quickly:

    # PowerShell
    Get-ChildItem .\output
    Get-Content .\output\hl7-bad.hl7 -TotalCount 30
    Get-Content .\output\ofx-bad.ofx -TotalCount 30

Note on FIX readability: FIX uses SOH (`0x01`) delimiters, which most plain-text viewers do not display, so the fields appear to run together.
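
One way to make FIX output readable is to swap SOH for a visible character before viewing. A minimal sketch, with a hypothetical `makeFixReadable` helper and a placeholder message whose length and checksum values are not valid FIX:

```typescript
const SOH = "\x01";

// Replaces every SOH delimiter with a visible one (default "|").
function makeFixReadable(raw: string, delimiter: string = "|"): string {
  return raw.split(SOH).join(delimiter);
}

// Placeholder tag=value pairs; not a valid FIX message.
const sample = ["8=FIX.4.4", "9=5", "35=0", "10=000"].join(SOH) + SOH;
console.log(makeFixReadable(sample)); // prints 8=FIX.4.4|9=5|35=0|10=000|
```

Treat the converted text as a view only; validators expect the original SOH-delimited bytes.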
The server implementation lives at `src/api/server.ts`.

    npm run build
    node dist/api/server.js --start

Default listener:
- Host: `0.0.0.0`
- Port: `3000`
Endpoints:
- `GET /api/v1/generators`
- `GET /api/v1/generators/:id`
- `POST /api/v1/generate`
- `POST /api/v1/validate/:id`
- `POST /api/v1/selftest/:id`

Example requests:

    curl -X POST http://localhost:3000/api/v1/generate \
      -H "content-type: application/json" \
      -d '{"generator":"hl7v2","options":{"seed":42,"recordCount":10}}'

    curl -X POST http://localhost:3000/api/v1/validate/hl7v2 \
      -H "content-type: text/plain" \
      --data-binary @output/hl7-bad.hl7

Error semantics:
- `400`: payload validation error (Zod)
- `404`: unknown generator/validator
- `500`: internal server error
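
For service-to-service callers, the generate request and the status codes above can be wrapped in small helpers. This is a sketch: `buildGenerateRequest` and `explainStatus` are hypothetical names, and the response body format is not covered here, so the example stops short of parsing it.

```typescript
// Plain description of an HTTP request, kept free of runtime-specific types.
interface GenerateRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the same request as the generate curl example.
function buildGenerateRequest(
  baseUrl: string,
  generator: string,
  options: { seed: number; recordCount: number },
): GenerateRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/v1/generate`,
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ generator, options }),
  };
}

// Maps the documented status codes to short explanations.
function explainStatus(status: number): string {
  switch (status) {
    case 400:
      return "payload validation error";
    case 404:
      return "unknown generator or validator";
    case 500:
      return "internal server error";
    default:
      return status >= 200 && status < 300 ? "ok" : "unexpected status";
  }
}

const request = buildGenerateRequest("http://localhost:3000/", "hl7v2", {
  seed: 42,
  recordCount: 10,
});
console.log(request.url); // prints http://localhost:3000/api/v1/generate
```

The resulting object can be passed to any HTTP client, for example `fetch(request.url, request)` on runtimes that provide a global `fetch`.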
Determinism depends on:
- same generator id
- same `seed`
- same `recordCount`
- same corruption configuration
- same library version
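
One way to keep track of these inputs is to store a fingerprint next to each fixture and regenerate when it changes. The sketch below is an assumption about workflow, not a Synthed feature; `RunInputs`, `runFingerprint`, and the version string are hypothetical.

```typescript
// The inputs that, per the list above, determine the generated bytes.
interface RunInputs {
  generator: string;
  seed: number;
  recordCount: number;
  corrupt: boolean;
  corruptRate: number; // ignored when corrupt is false
  libraryVersion: string;
}

// Builds a stable string key; equal keys mean the output should be identical.
function runFingerprint(run: RunInputs): string {
  const corruption = run.corrupt ? `corrupt@${run.corruptRate}` : "clean";
  return [run.generator, run.seed, run.recordCount, corruption, run.libraryVersion].join("|");
}

const baseline: RunInputs = {
  generator: "hl7v2",
  seed: 42,
  recordCount: 100,
  corrupt: false,
  corruptRate: 0,
  libraryVersion: "1.0.0", // placeholder version for the example
};
console.log(runFingerprint(baseline)); // prints hl7v2|42|100|clean|1.0.0
```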
Best practice:
- Store YAML job configs in version control
- Fix seed values in CI
- Keep manifest artifacts for negative test investigations
Run lint, build, and tests locally:

    npm run lint
    npm run build
    npm test

Optional benchmark:

    npm run benchmark -- 10000

Suggested CI sequence:
- `npm ci`
- `npm run lint`
- `npm run build`
- `npm test`
- Generate one clean and one bad-data fixture smoke set

Repository layout:

    src/
      api/             # Fastify HTTP server
      cli/             # CLI command definitions
      core/            # Shared interfaces, RNG, corruption, utility engines
      generators/      # Domain/format-specific generators
      orchestrator/    # Registry, config loading, run pipeline, sinks
      validators/      # Validator implementations and manifest matching
    tests/             # Vitest suites
    examples/jobs/     # Reusable YAML job configurations
    benchmarks/        # Throughput/benchmark scripts

If a generator is not found:
- Run `npm run dev -- list`
- Confirm the id matches exactly
- Confirm build artifacts are current (`npm run build`)

If no output file is written:
- Check that the job's `output.type` is `file`
- Verify the output path exists or its parent folder can be created
- Check that the command exited successfully

If validation results are unexpected:
- Confirm a matching validator exists for the format
- Check whether corruption was enabled
- Compare the manifest (`*.manifest.json`) with the validator report
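
Comparing a manifest with a validator report amounts to a set comparison. The sketch below assumes both files identify records by a zero-based index, and the `ManifestEntry` and `ReportFinding` shapes are hypothetical; inspect the real JSON files before adapting it.

```typescript
// Hypothetical shapes; the real manifest and report schemas may differ.
interface ManifestEntry {
  recordIndex: number;
  strategy: string;
}
interface ReportFinding {
  recordIndex: number;
  rule: string;
}

interface Comparison {
  detected: number;     // corrupted records the validator flagged
  missed: number[];     // corrupted records with no finding
  unexpected: number[]; // flagged records that were never corrupted
}

function compareManifest(manifest: ManifestEntry[], findings: ReportFinding[]): Comparison {
  const injected = new Set(manifest.map((entry) => entry.recordIndex));
  const flagged = new Set(findings.map((finding) => finding.recordIndex));
  const byValue = (a: number, b: number) => a - b;
  const missed = Array.from(injected).filter((i) => !flagged.has(i)).sort(byValue);
  const unexpected = Array.from(flagged).filter((i) => !injected.has(i)).sort(byValue);
  return { detected: injected.size - missed.length, missed, unexpected };
}

// Strategy and rule names are made up for the example.
const result = compareManifest(
  [
    { recordIndex: 3, strategy: "truncate" },
    { recordIndex: 7, strategy: "drop-field" },
  ],
  [
    { recordIndex: 3, rule: "RULE1" },
    { recordIndex: 9, rule: "RULE2" },
  ],
);
console.log(result.detected, result.missed, result.unexpected); // prints 1 [ 7 ] [ 9 ]
```

Entries in `missed` point at validator gaps, while `unexpected` entries suggest either a false positive or a generator bug in clean records.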

If FIX output is hard to read:
- Convert SOH to a visible delimiter (`|`) in your editor or script
- Generated data is synthetic and intended for testing
- Do not assume clinical/financial standards certification completeness
- Validate assumptions before using outputs in compliance-sensitive workflows
When adding generators/validators:
- Register in `createDefaultRegistry()` (`src/orchestrator/defaultRegistry.ts`)
- Add tests under `tests/`
- Add at least one reproducible YAML sample job where relevant
- Document any new CLI/API option changes in this file