High-performance, secure authentication service built with Go, following the Jedi Architecture principles.
- Blazing Fast Registration: The best recent local registration-only benchmark reached ~41k RPS; the current tuned defaults deliver a stable mid-30k RPS on full reset-and-profile runs.
- Asynchronous Password Hardening: Background workers upgrade weak fast-path hashes to Argon2id (OWASP-recommended) without affecting user-facing latency.
- High-Performance Web Framework: Powered by Atreugo (fasthttp-based) with Prefork mode enabled.
- Hot Path Optimized for Writes: Registration uses a real bulk insert into `users` plus a queue enqueue into `password_upgrade_queue`.
- Worker Isolation: The hardening worker runs with a smaller DB pool and tighter CPU/memory limits to reduce contention with the API.
- Secure Token Management: Uses Paseto (Platform-Agnostic Security Tokens) instead of JWT for improved security.
- Clean Architecture: Strict separation of concerns (Domain, Application, Infrastructure, Delivery).
- Database Integrity: Schema-first approach with sqlc and golang-migrate.
- Observability: Structured JSON logging with zerolog and built-in pprof for profiling.
- Language: Go 1.25+
- Web Framework: Atreugo (High performance)
- Database: PostgreSQL 16
- Token: Paseto V2
- Hashing: Argon2id + SHA-256 (Fast Path)
- Migrations: golang-migrate
- SQL Gen: sqlc
- Load Testing: k6
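The sketch below makes the Argon2id parameters concrete: 64 MiB, 1 iteration, and 4 threads, taken from the hardening step of the request flow. It records these values and renders a hash in the common PHC string format. Storing hashes in PHC format is an assumption, because this README does not say how `user_passwords` encodes them. `argon2Params`, `hardening`, and `encodePHC` are hypothetical names. Memory is expressed in KiB, so 64 MiB becomes `65536`.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// argon2Params holds Argon2id cost settings. MemoryKiB is in KiB, the unit
// golang.org/x/crypto/argon2.IDKey expects for its memory argument.
type argon2Params struct {
	MemoryKiB  uint32
	Iterations uint32
	Threads    uint8
}

// hardening mirrors the README's settings: 64 MiB, 1 iteration, 4 threads.
var hardening = argon2Params{MemoryKiB: 64 * 1024, Iterations: 1, Threads: 4}

// encodePHC renders salt and derived key as a PHC-style Argon2id string
// (v=19 is Argon2 version 0x13). The storage format is an assumption.
func encodePHC(p argon2Params, salt, key []byte) string {
	b64 := base64.RawStdEncoding // PHC strings use unpadded standard base64
	return fmt.Sprintf("$argon2id$v=19$m=%d,t=%d,p=%d$%s$%s",
		p.MemoryKiB, p.Iterations, p.Threads,
		b64.EncodeToString(salt), b64.EncodeToString(key))
}

func main() {
	salt := make([]byte, 16) // placeholder; use crypto/rand in real code
	key := make([]byte, 32)  // placeholder; would come from argon2.IDKey
	fmt.Println(encodePHC(hardening, salt, key))
}
```

In real code the key would come from `argon2.IDKey(password, salt, hardening.Iterations, hardening.MemoryKiB, hardening.Threads, 32)` in `golang.org/x/crypto/argon2`, with a random per-password salt.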
- Request: User sends registration data.
- Fast Path: The API computes a fast SHA-256 hash (`v1$...`) and batches registration requests.
- Bulk Write: The batcher does a real bulk insert into `users` and enqueues weak hashes into `password_upgrade_queue`.
- Response: The API returns the created user immediately; password hardening stays off the request path.
- Hardening: A background `worker` polls `password_upgrade_queue`, re-hashes with `Argon2id` (64 MiB, 1 iteration, 4 threads), writes the strong hash into `user_passwords`, and removes the queue row.
- Smart Login: Login looks up the user by `email` and verifies either the strong hash or the pending weak hash, depending on upgrade state.
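The smart-login branch can be sketched as follows. This is a minimal illustration under stated assumptions. `storedCredentials`, `weakHash`, and `verifyLogin` are hypothetical, and the email lookup is omitted. Only the `v1$` prefix comes from this README; encoding the rest as the hex of an unsalted SHA-256 is a guess. The Argon2id check is injected as a function so the example stays stdlib-only.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// storedCredentials is a hypothetical view of what login can read for a user.
type storedCredentials struct {
	StrongHash string // empty until the worker writes user_passwords
	WeakHash   string // pending fast-path hash while the upgrade is queued
}

var errInvalidCredentials = errors.New("invalid credentials")

// weakHash computes a fast-path hash. Only the "v1$" prefix is from the
// README; hex-encoded SHA-256 of the raw password is an assumption.
func weakHash(password string) string {
	sum := sha256.Sum256([]byte(password))
	return "v1$" + hex.EncodeToString(sum[:])
}

// verifyLogin prefers the strong hash once hardening has finished and falls
// back to the weak hash only while the upgrade is still pending.
func verifyLogin(c storedCredentials, password string, verifyStrong func(encoded, password string) bool) error {
	if c.StrongHash != "" {
		if verifyStrong(c.StrongHash, password) {
			return nil
		}
		return errInvalidCredentials
	}
	// Constant-time comparison avoids leaking how many characters matched.
	if strings.HasPrefix(c.WeakHash, "v1$") &&
		subtle.ConstantTimeCompare([]byte(c.WeakHash), []byte(weakHash(password))) == 1 {
		return nil
	}
	return errInvalidCredentials
}

func main() {
	creds := storedCredentials{WeakHash: weakHash("s3cret")}
	err := verifyLogin(creds, "s3cret", func(string, string) bool { return false })
	fmt.Println("pending-upgrade login ok:", err == nil)
}
```

Once the strong hash exists, the sketch never consults the weak hash. This matches the flow above, where the worker removes the queue row after writing `user_passwords`.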
- Docker & Docker Compose
```bash
docker compose up --build -d
curl -sf http://localhost:8080/health

# k6 does NOT auto-start on `docker compose up`; it is kept out of the default stack
# Registration stress test used in recent profiling runs
docker compose run --rm -v $(pwd)/load-test-reg-only.js:/load-test.js k6 run --vus 500 --duration 20s /load-test.js

# One-shot reset + rebuild + eBPF/k6 profile run with artifacts
./scripts/profile_k6.sh full
```

Current tuned defaults:

```env
SERVER_PREFORK=true
APP_DB_POOL_MAX_CONNS=128
POSTGRES_MAX_CONNECTIONS=256
POSTGRES_SHARED_BUFFERS=512MB
POSTGRES_CHECKPOINT_TIMEOUT=15min
POSTGRES_CHECKPOINT_COMPLETION_TARGET=0.95
POSTGRES_WAL_BUFFERS=64MB
POSTGRES_EFFECTIVE_IO_CONCURRENCY=32
PGBOUNCER_DEFAULT_POOL_SIZE=100
REGISTRATION_BATCH_SIZE=100
REGISTRATION_BATCH_WAIT=10ms
```

- Worker isolation remains enabled (`WORKER_DB_POOL_MAX_CONNS=4`, `WORKER_GOMAXPROCS=1`, `WORKER_GOMEMLIMIT=256MiB`).
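The Go runtime already reads `GOMAXPROCS` and `GOMEMLIMIT` from the environment, so the simplest wiring is to pass the `WORKER_*` values through under those names. If the worker instead applies them in code, a sketch could look like the following. `applyWorkerLimits` and `parseMemLimit` are hypothetical. The DB pool cap (`WORKER_DB_POOL_MAX_CONNS`) is left out because it belongs to the database driver's pool configuration.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/debug"
	"strconv"
	"strings"
)

// parseMemLimit parses sizes such as "256MiB" using the binary suffixes that
// GOMEMLIMIT accepts. Longer suffixes are checked before the bare "B".
func parseMemLimit(s string) (int64, error) {
	units := []struct {
		suffix string
		mult   int64
	}{
		{"TiB", 1 << 40}, {"GiB", 1 << 30}, {"MiB", 1 << 20}, {"KiB", 1 << 10}, {"B", 1},
	}
	for _, u := range units {
		if strings.HasSuffix(s, u.suffix) {
			n, err := strconv.ParseInt(strings.TrimSuffix(s, u.suffix), 10, 64)
			if err != nil {
				return 0, err
			}
			if n < 0 {
				return 0, fmt.Errorf("negative size %q", s)
			}
			return n * u.mult, nil
		}
	}
	return strconv.ParseInt(s, 10, 64) // bare number of bytes
}

// applyWorkerLimits reads the hypothetical WORKER_* variables and applies
// them through the runtime APIs instead of the native env variables.
func applyWorkerLimits() error {
	if v := os.Getenv("WORKER_GOMAXPROCS"); v != "" {
		n, err := strconv.Atoi(v)
		if err != nil {
			return fmt.Errorf("WORKER_GOMAXPROCS: %w", err)
		}
		runtime.GOMAXPROCS(n)
	}
	if v := os.Getenv("WORKER_GOMEMLIMIT"); v != "" {
		limit, err := parseMemLimit(v)
		if err != nil {
			return fmt.Errorf("WORKER_GOMEMLIMIT: %w", err)
		}
		debug.SetMemoryLimit(limit) // soft limit; the GC works harder near it
	}
	return nil
}

func main() {
	if err := applyWorkerLimits(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
}
```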
Recent measured results on local Docker Compose runs:
| Scenario | Registration RPS | Avg Latency | p95 Latency |
|---|---|---|---|
| Before the hot-path fixes (worker contending with API) | ~3,875.8/s | n/a | n/a |
| Clean-slate optimized stack | 34,023.7/s | 14.56ms | 21.89ms |
| After dropping DB-level `username` uniqueness | 34,557.6/s | 14.32ms | 20.67ms |
| Best measured sweep peak after `CopyFrom(users)` plus tuned API DB pool (`pool_max_conns=128`) | 41,227.2/s | 11.87ms | 22.41ms |
| After reducing Postgres `max_connections` to 256 (with API pool 128) | 35,156.0/s | 14.05ms | 20.53ms |
| Current tuned defaults incl. heavy Postgres combo, confirmatory full run | 35,170.0/s | 14.03ms | 20.89ms |
| Same setup under eBPF profiling | 31,975.0/s | 15.38ms | 24.75ms |
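The table gives a rough sense of scale through simple arithmetic on the measured values, subject to the run-to-run variance noted below. The hot-path fixes account for nearly a 9× gain, and eBPF profiling costs about 9% of throughput:

$$
\begin{aligned}
\frac{34{,}023.7}{3{,}875.8} &\approx 8.78 \quad \text{(clean-slate optimized vs. before the hot-path fixes)}\\
1 - \frac{31{,}975.0}{35{,}170.0} &\approx 0.091 \quad \text{(throughput lost under eBPF profiling)}
\end{aligned}
$$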
Notes:
- Numbers above are from registration-only `k6` runs (500 VUs, 20s) against the local Docker Compose stack.
- The current tuned default for the API path is `APP_DB_POOL_MAX_CONNS=128`; larger defaults such as `500` reduced throughput noticeably in the tuning sweep.
- A later confirmatory A/B still kept `128` ahead of `500` (35,885.0/s vs 33,456.5/s), even though absolute numbers varied between runs.
- The current tuned Postgres default is `POSTGRES_MAX_CONNECTIONS=256`; in an isolated A/B it beat the previous `1000` setting (35,156.0/s vs 34,339.9/s).
- Additional `shared_buffers`/`checkpoint_timeout` experiments did not beat the current baseline, so the defaults remain `512MB` and `15min`.
- Heavy Postgres frontier tuning was fixed to `checkpoint_completion_target=0.95`, `wal_buffers=64MB`, and `effective_io_concurrency=32`; a confirmatory A/B showed a repeatable gain over the prior baseline, though local Docker Desktop runs still show visible variance.
- `email` remains unique and is used for login lookup. `username` is currently not unique at the DB level; removing `users_username_key` reduced write cost on the registration path.
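The two A/B comparisons translate into the relative gains below. These are simple ratios of the reported throughput. Given the visible run-to-run variance, treat them as indicative only:

$$
\begin{aligned}
\frac{35{,}885.0}{33{,}456.5} &\approx 1.073 \quad (\text{APP\_DB\_POOL\_MAX\_CONNS: } 128 \text{ vs. } 500,\ \approx +7.3\%)\\
\frac{35{,}156.0}{34{,}339.9} &\approx 1.024 \quad (\text{POSTGRES\_MAX\_CONNECTIONS: } 256 \text{ vs. } 1000,\ \approx +2.4\%)
\end{aligned}
$$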
```text
.
├── cmd/
│ ├── api/ # Entry point for the Atreugo API
│ └── worker/ # Entry point for the Hardening Worker
├── internal/
│ ├── auth/ # Auth domain logic
│ │ ├── delivery/ # HTTP handlers & middleware
│ │ ├── repository/ # Database interactions
│ │ ├── service/ # Business logic (Usecases)
│ │ └── token/ # Paseto & Hashing implementation
│ ├── config/ # Configuration management
│ └── db/ # Generated sqlc code
├── migrations/ # SQL migration files
├── query.sql # SQL queries for sqlc
└── sqlc.yaml # sqlc configuration
```
- Paseto eliminates common JWT pitfalls (algorithm confusion, etc.).
- Argon2id is memory-hard and side-channel resistant.
- Prefork isolation: each prefork child process has its own memory space.
- Current trade-off: login identity is `email`; `username` is currently treated as display data, not as a unique login identifier.