v1.11.0

Letdown2491 released this 10 Jun 01:06

Security

  • Kill switch NIP-17 command forgery fixed: the NIP-17 (gift-wrap) admin-command path only checked the unsigned inner rumor's claimed author and never verified the signed seal. Because NIP-44 conversation keys are symmetric (the sender's private key with the recipient's public key yields the same key as the reverse), a seal that decrypts correctly only shows it came from the key named on the seal, which an attacker can choose. An attacker could therefore seal, with their own key, a rumor claiming the admin as its author, forge a gift wrap from public information alone (the signer pubkey, the admin npub, an admin relay), and execute any kill-switch command: panic, resumeall, or alive. The seal's signature is now verified (verifyEvent) and bound to the admin pubkey, and the rumor author must match the verified seal author. (The NIP-04 path was already correctly signature-verified.)
  • Kill switch command replay protection: admin commands are now rejected unless their signed timestamp is within 60 seconds of now. The since relay filter is only advisory and the dedup cache is in-memory (1h TTL, cleared on restart), so a genuine admin command captured from a relay could otherwise be replayed (e.g. replaying alive to defeat the dead man's switch indefinitely). The check uses the kind-4 created_at for NIP-04 and the inner rumor created_at for NIP-17 (the gift-wrap timestamp is deliberately fuzzed per spec and is not trusted).
  • Key-unlock endpoint rate-limited: POST /keys/:name/unlock is the passphrase-verification endpoint and was the only sensitive key route without rate limiting, making it a full-speed passphrase brute-force oracle against the key at rest. It now carries the same rate-limit guard as the other key operations.
  • Dashboard is no longer framable (clickjacking): the UI server now sends X-Frame-Options: DENY and Content-Security-Policy: frame-ancestors 'none' (plus X-Content-Type-Options: nosniff and Referrer-Policy: no-referrer). Because the dashboard can approve signing requests, a malicious page the operator visits could otherwise embed it in an iframe and clickjack an approval. This is a browser-origin attack, so network isolation does not mitigate it. The proxy error response also no longer leaks the daemon's internal address.
  • requireAuth fails loudly instead of silently locking out: application-layer JWT login is not implemented (no endpoint issues a token), so enabling requireAuth previously made every request fail with 401. The daemon now refuses to start with requireAuth: true and points the operator to network-level access control (loopback bind, Tailscale ACLs, firewall), which is the supported deployment model. docs/SECURITY.md has been corrected accordingly.
  • Stronger NIP-46 replay protection: in addition to the existing dedup cache and absolute freshness window, each connected app now has a per-sender high-water mark — a request older than the newest one already seen from that app is rejected. This survives eviction of the (bounded) dedup cache, closing the gap where an attacker could flood the cache to evict a captured request's id and replay it within the freshness window.
  • Bounded pending authorizations: the number of concurrent manual-approval authorizations is now capped. Each pending authorization holds a polling promise and timer and triggers an outbound auth_url publish, so an unauthenticated connect flood from rotating pubkeys could previously grow memory/timers and amplify relay traffic without bound. The cap is far above any legitimate concurrent load.
  • Atomic trust-level changes: granting permissions on connect and changing an app's trust level now run in a database transaction. A crash mid-update could previously leave a downgraded (e.g. paranoid) app still holding the full-trust sign_event/kind:all permission row — which is evaluated before the trust level and would keep auto-approving everything.
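The NIP-17 seal check and the 60-second freshness window described above can be combined into a single acceptance function. This is a minimal sketch under stated assumptions, not the daemon's code: the event shapes are trimmed to the fields the check reads, the signature verifier is injected as a stand-in for a real Schnorr check such as verifyEvent, decryption of the gift wrap and seal is assumed to have already happened, and acceptNip17Command is a hypothetical name.

```typescript
// Trimmed, hypothetical event shapes: only the fields this check reads.
interface SignedEvent {
  id: string;
  pubkey: string;
  created_at: number; // unix seconds
  kind: number;
  sig: string;
}

interface Rumor {
  pubkey: string; // unsigned claim of authorship
  created_at: number; // unix seconds, used for freshness
  content: string;
}

// Injected signature check (stand-in for a real Schnorr verifier).
type VerifyFn = (event: SignedEvent) => boolean;

type Verdict = { ok: true; command: string } | { ok: false; reason: string };

const MAX_SKEW_SECONDS = 60;

function acceptNip17Command(
  seal: SignedEvent,
  rumor: Rumor,
  adminPubkey: string,
  nowSeconds: number,
  verify: VerifyFn,
): Verdict {
  // 1. The seal is the signed layer that names the sender: verify it first.
  if (!verify(seal)) return { ok: false, reason: "invalid seal signature" };
  // 2. Bind the verified seal author to the configured admin key.
  if (seal.pubkey !== adminPubkey) return { ok: false, reason: "seal not signed by admin" };
  // 3. The unsigned rumor may only claim the author who actually sealed it.
  if (rumor.pubkey !== seal.pubkey) return { ok: false, reason: "rumor author mismatch" };
  // 4. Freshness: trust the rumor timestamp, never the fuzzed gift-wrap one.
  if (Math.abs(nowSeconds - rumor.created_at) > MAX_SKEW_SECONDS) {
    return { ok: false, reason: "stale or future-dated command" };
  }
  return { ok: true, command: rumor.content.trim() };
}
```

The order matters in this sketch: the rumor's author claim is only compared after the seal signature and the admin binding have both passed, so an attacker-chosen seal key can never satisfy the check.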
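A per-sender high-water mark of the kind described for NIP-46 replay protection might look like the following minimal sketch. ReplayWatermark is a hypothetical name, timestamps are assumed to be the signed created_at in unix seconds, and exact duplicates are assumed to be caught separately by the dedup cache, which is why a request with the same timestamp as the mark is still admitted.

```typescript
// Hypothetical per-sender high-water mark for NIP-46 requests.
class ReplayWatermark {
  // Newest created_at (unix seconds) seen per sender pubkey.
  private readonly newest = new Map<string, number>();

  /** Returns true to admit the request, false if it is older than the newest seen. */
  admit(senderPubkey: string, createdAt: number): boolean {
    const mark = this.newest.get(senderPubkey);
    // Strictly older than something already seen from this sender: treat as replay.
    if (mark !== undefined && createdAt < mark) return false;
    // Same-second or newer: admit (exact duplicates are the dedup cache's job).
    if (mark === undefined || createdAt > mark) this.newest.set(senderPubkey, createdAt);
    return true;
  }
}
```

The map holds one number per sender, so its size tracks the number of connected apps rather than the number of requests. Flooding the request-id cache therefore cannot push a sender's mark out of memory.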
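The cap on pending authorizations reduces to a counting guard around each manual approval. In this sketch, PendingAuthLimiter and its methods are hypothetical names, and the limit is an assumed configuration value. A connect request that cannot get a slot is rejected up front, before any polling promise, timer, or auth_url publish is created.

```typescript
// Hypothetical cap on concurrent manual-approval authorizations.
class PendingAuthLimiter {
  private pending = 0;

  constructor(private readonly maxPending: number) {}

  /** Reserve a slot; returns a release function, or null when the cap is reached. */
  tryAcquire(): (() => void) | null {
    if (this.pending >= this.maxPending) return null;
    this.pending++;
    let released = false;
    return () => {
      // Idempotent: an approval racing a timeout must not free two slots.
      if (released) return;
      released = true;
      this.pending--;
    };
  }

  get inFlight(): number {
    return this.pending;
  }
}
```

Calling the returned release function from a finally block around the approval wait would free the slot on every exit path (approved, denied, or timed out).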

Reliability

  • Graceful shutdown no longer hangs: the HTTP server is created with forceCloseConnections, and shutdown has a watchdog that forces exit if teardown overruns. Previously an open dashboard's SSE connection kept fastify.close() pending, so SIGTERM never completed and the supervisor eventually SIGKILLed — skipping clean teardown (locking keys, closing the relay pool, disconnecting the DB).
  • Crashes are now visible to the supervisor: the CLI parent propagates the daemon child's exit code and forwards SIGTERM/SIGINT to it. Previously a crashed daemon left the parent exiting 0 (so systemd Restart=on-failure wouldn't fire), and in Docker the daemon child was SIGKILLed by container teardown instead of shutting down gracefully.
  • Startup failures exit non-zero: a failure during daemon startup (port in use, DB/config error) now exits the process so the supervisor restarts it, instead of being swallowed by the global rejection handler and leaving the process alive but half-initialized.
  • nostrconnect apps reconnect after a restart: per-app relay subscriptions are now re-established when a key starts, not only at connect time. Previously, after a daemon restart or lock/unlock cycle, a nostrconnect:// app whose relays weren't in the shared pool would silently stop receiving requests until manually reconnected.
  • Kill switch never stops retrying a relay: the admin-command listener now caps its reconnect delay instead of giving up after ~10 attempts (~9.5 min). A longer relay outage previously disabled the emergency kill switch on that relay for the rest of the daemon's uptime.
  • Failed migrations abort startup: the local launch script now exits on a failed migration instead of starting the daemon against a possibly-mismatched schema.
  • Kill-switch websocket leak fixed: closed admin-relay sockets are now pruned from the tracking array (and their listeners dropped) in each socket's close handler. Combined with the retry-forever change above, an unreachable admin relay would otherwise have accumulated a dead WebSocket per reconnect for the life of the process; sockets are now released as they close.
  • Per-key cache teardown on lock: locking a key now destroys its NIP-46 backend's caches (conversation-key, rate-limiter, relay) instead of only clearing them, stopping each cache's cleanup interval so a lock/unlock cycle no longer leaks a timer.
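Retrying forever with a capped delay, as the kill-switch listener now does, can be sketched as a pure function of the attempt number. The base delay (1 s) and cap (60 s) are illustrative assumptions; the release notes do not state the daemon's actual values.

```typescript
// Illustrative values; the daemon's real base delay and cap are not stated.
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 60_000;

/** Delay before reconnect attempt `attempt` (0-based). There is no "give up" result. */
function reconnectDelayMs(attempt: number, baseMs = BASE_DELAY_MS, capMs = MAX_DELAY_MS): number {
  // Exponential backoff; the exponent is clamped so huge attempt counts stay finite.
  const backoff = baseMs * 2 ** Math.min(Math.max(attempt, 0), 30);
  // Once the cap is reached, the listener keeps retrying at a steady cadence.
  return Math.min(backoff, capMs);
}
```

A listener would schedule each reconnect with this delay and reset the attempt counter after a successful connection, so a long outage costs at most one attempt per cap interval instead of disabling the relay permanently.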
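The websocket-leak fix amounts to pruning each socket from the tracking array inside that socket's own close handler. The sketch below assumes a minimal, hypothetical TrackableSocket interface (an onClose hook plus removeAllListeners, loosely modelled on EventEmitter-style sockets) so it runs without a WebSocket library. AdminSocketTracker is likewise a hypothetical name.

```typescript
// Hypothetical minimal socket surface so the sketch runs without a WebSocket library.
interface TrackableSocket {
  onClose(handler: () => void): void;
  removeAllListeners(): void;
}

class AdminSocketTracker<S extends TrackableSocket> {
  private readonly sockets: S[] = [];

  track(socket: S): void {
    this.sockets.push(socket);
    socket.onClose(() => {
      // Prune in the socket's own close handler, so a retry-forever loop
      // against a dead relay never accumulates closed sockets.
      const index = this.sockets.indexOf(socket);
      if (index !== -1) this.sockets.splice(index, 1);
      // Drop listeners so the closed socket and its closures can be collected.
      socket.removeAllListeners();
    });
  }

  get size(): number {
    return this.sockets.length;
  }
}
```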

Improved

  • SQLite WAL mode: the database now runs in WAL mode with synchronous=NORMAL and a 5s busy timeout, so readers no longer block the writer (and vice versa). This matters because the better-sqlite3 calls are synchronous and sit in the request hot path (ACL lookups, per-response key-user lookup, log writes).
  • Cached per-response relay lookup: outbound responses no longer hit the database on every send to fetch a nostrconnect app's custom relays. The list is cached per app (primed at connect time, so it is never stale, and re-fetched from the database on a miss), removing one DB round-trip per response.
  • Faster inbound request handling: the NIP-44 conversation key (an ECDH derivation) is now cached per peer instead of recomputed on every request. The ECDH dominates inbound crypto (~1.9ms vs ~20µs for the symmetric decrypt), and it is constant for a given peer, so memoizing it makes the inbound path roughly signature-verify-bound rather than ECDH-bound — about 2.6× higher full-inbound throughput in benchmarks (~310 → ~800 req/s). The cache is bounded (TTL + max size) and cleared when the key is locked. Applies to request decryption, response encryption, and the nip44_encrypt/nip44_decrypt methods.
  • Relay-publish resilience under request bursts: a connected app that fires a large burst of requests (e.g. a client decrypting a whole DM backlog) could make the daemon flood the relays with responses, tripping relay rate limits ("too many events", "slow down") and "publish timed out" errors; a large enough burst could lose responses entirely. The publish path is now paced and defensive:
    • Bounded publish concurrency so a backlog can't fire hundreds of EVENT frames at the relay sockets simultaneously.
    • Per-relay cooldown: a relay that rate-limits or times out is skipped for a short window so the daemon stops hammering a relay that asked it to slow down (falling back to the full relay set if every relay is cooling down, so responses are never dropped just because of cooldown).
    • No failure amplification: when a response publish fails outright, the daemon no longer immediately publishes a second error frame (which, mid-storm, also can't get through and only deepens the congestion). Publish failures are logged instead.
    • Per-app request throttle: requests from a single connected app are paced with a token bucket (burst + sustained rate), smoothing moderate bursts and shedding extreme floods, so one client can no longer saturate the shared relays for every other app on the same key.
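A bounded per-peer cache for NIP-44 conversation keys, with a TTL, a maximum size, and a clear() for key locking, could look like this minimal sketch. The derivation function is injected as a hypothetical stand-in for the real ECDH-based derivation, and ConversationKeyCache, the TTL, and the size limit are assumptions for illustration. Eviction here is least-recently-used, relying on Map preserving insertion order.

```typescript
// Injected derivation: a hypothetical stand-in for the NIP-44 ECDH-based
// conversation-key derivation (the expensive step being memoized).
type DeriveKey = (peerPubkey: string) => Uint8Array;

interface Entry {
  key: Uint8Array;
  expiresAt: number; // ms timestamp
}

class ConversationKeyCache {
  private readonly entries = new Map<string, Entry>();

  constructor(
    private readonly derive: DeriveKey,
    private readonly ttlMs: number,
    private readonly maxSize: number,
  ) {}

  get(peerPubkey: string, nowMs: number): Uint8Array {
    const hit = this.entries.get(peerPubkey);
    if (hit !== undefined && hit.expiresAt > nowMs) {
      // Re-insert to mark as most recently used (Map keeps insertion order).
      this.entries.delete(peerPubkey);
      this.entries.set(peerPubkey, hit);
      return hit.key;
    }
    if (hit !== undefined) this.entries.delete(peerPubkey); // expired
    if (this.entries.size >= this.maxSize) {
      // Evict the least recently used entry: the first key in iteration order.
      const oldest = this.entries.keys().next();
      if (!oldest.done) this.entries.delete(oldest.value);
    }
    const key = this.derive(peerPubkey);
    this.entries.set(peerPubkey, { key, expiresAt: nowMs + this.ttlMs });
    return key;
  }

  /** Drop every derived key, e.g. when the signing key is locked. */
  clear(): void {
    this.entries.clear();
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Because the conversation key is constant for a given peer, a hit returns exactly what a fresh derivation would; the TTL and size bound only limit how long and how many derived secrets stay in memory.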
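Bounded publish concurrency is a classic counting semaphore. This is a generic sketch rather than the daemon's implementation, and Semaphore and run are hypothetical names. A released slot is handed directly to the oldest waiter, so the number of in-flight publishes never exceeds the limit even while a backlog drains.

```typescript
// Generic counting semaphore for capping concurrent publishes (hypothetical names).
class Semaphore {
  private active = 0;
  private readonly waiters: Array<() => void> = [];

  constructor(private readonly limit: number) {}

  acquire(): Promise<void> {
    if (this.active < this.limit) {
      this.active++;
      return Promise.resolve();
    }
    // Over the limit: park until release() hands this caller a slot.
    return new Promise<void>((resolve) => {
      this.waiters.push(() => resolve());
    });
  }

  release(): void {
    const next = this.waiters.shift();
    // Hand the slot straight to the oldest waiter; otherwise free it.
    if (next !== undefined) next();
    else this.active--;
  }

  /** Run a task while holding a slot, releasing it even if the task throws. */
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }

  get inFlight(): number {
    return this.active;
  }

  get queued(): number {
    return this.waiters.length;
  }
}
```

Each response publish would be wrapped in sem.run(...), so a burst of hundreds of responses queues in memory instead of hitting the relay sockets all at once.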
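Per-relay cooldown with a fall-back to the full relay set can be sketched as below. RelayCooldown, the cooldown length, and the isThrottleError pattern are assumptions. The pattern only matches the relay messages mentioned above (too many events, slow down, publish timed out) plus a generic "rate limit", and real relays may word their notices differently.

```typescript
// Hypothetical matcher for relay throttle/timeout notices; real wording varies by relay.
function isThrottleError(message: string): boolean {
  return /too many events|slow down|timed out|rate[- ]?limit/i.test(message);
}

class RelayCooldown {
  // Relay URL -> time (ms) until which it should be skipped.
  private readonly coolingUntil = new Map<string, number>();

  constructor(private readonly cooldownMs: number) {}

  /** Start (or extend) a cooldown after a throttle or timeout from this relay. */
  markThrottled(relayUrl: string, nowMs: number): void {
    this.coolingUntil.set(relayUrl, nowMs + this.cooldownMs);
  }

  /** Relays to publish to now; if every relay is cooling down, use them all. */
  select(relays: readonly string[], nowMs: number): string[] {
    const ready = relays.filter((url) => (this.coolingUntil.get(url) ?? 0) <= nowMs);
    // Never drop a response just because every relay is cooling down.
    return ready.length > 0 ? ready : relays.slice();
  }
}
```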
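The per-app throttle is a token bucket: capacity sets the burst size and refillPerSecond the sustained rate. In this sketch, TokenBucket, decide, and the maxDelayMs ceiling are hypothetical. A request that finds the bucket empty is delayed if a token frees up within maxDelayMs and shed otherwise, mirroring the "smooth moderate bursts, shed extreme floods" behaviour described above. Time is passed in explicitly so the logic is deterministic to test.

```typescript
type Decision = { action: "run" } | { action: "delay"; ms: number } | { action: "shed" };

class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;

  constructor(
    private readonly capacity: number, // burst size
    private readonly refillPerSecond: number, // sustained rate
    nowMs: number,
  ) {
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  private refill(nowMs: number): void {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
  }

  /** Run now if a token is available, delay if one frees up within maxDelayMs, else shed. */
  decide(nowMs: number, maxDelayMs: number): Decision {
    this.refill(nowMs);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return { action: "run" };
    }
    const waitMs = Math.ceil(((1 - this.tokens) / this.refillPerSecond) * 1000);
    if (waitMs > maxDelayMs) return { action: "shed" };
    // Reserve the future token now (balance goes negative) so queued requests space out.
    this.tokens -= 1;
    return { action: "delay", ms: waitMs };
  }
}
```

Letting the balance go negative for delayed requests means each further queued request waits one more refill interval, so a moderate burst is spread out at the sustained rate rather than released all at once when the first token returns.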

Build

  • Crypto throughput benchmark: added npm run bench (src/daemon/lib/inbound-crypto.bench.ts) measuring each step of the inbound path (ECDH, symmetric decrypt, Schnorr verify, full inbound, and the cached path) so the optimization is quantified and regressions are visible.
  • ACL decision benchmark: added src/daemon/lib/acl.bench.ts (also run by npm run bench) measuring the indexed SQLite lookup cost of the per-request authorization decision (cached vs cold path), confirming the authorization/DB step sits well below the crypto path on the per-request budget.

Tested

  • Conversation-key cache regression coverage: added tests guarding the cache's correctness (a hit returns the same key a fresh ECDH would, keyed per peer) and its security invariant (derived keys are dropped from memory when the backend stops / the key is locked), plus an end-to-end inbound check that the cached key decrypts requests and encrypts responses correctly.
  • Publish-resilience coverage: added unit tests for the publish concurrency semaphore, the per-app token-bucket throttle (burst, shed, delay, and refill behaviour), the relay throttle-error detection, and an end-to-end check that a failed response publish no longer triggers a second publish (no amplification).
  • Kill-switch authentication coverage: added regression tests proving a genuine admin-sealed NIP-17 command is accepted, a command sealed by a non-admin key (with an admin-claiming rumor) is rejected, and stale/replayed NIP-04 and NIP-17 commands are rejected.
  • Deterministic rate-limiter refill test: the token-bucket refill test now uses fake timers instead of a wall-clock sleep, removing a flake under full-suite load.
  • Replay watermark coverage: added tests that a request older than the newest already seen from a sender is rejected (replay), while same-timestamp/newer requests are still admitted.
  • Pending-authorization cap coverage: added tests that a normal authorization completes and releases its slot, and that new authorizations are rejected once the concurrent pending limit is reached.
  • Authorization-gate coverage: added direct tests for checkRequestPermission (unknown-client default-deny, connect→manual, revoked/suspended denial, suspension expiry, blanket and per-method deny conditions, explicit-allow overriding paranoid, and the full/reasonable/paranoid trust-level matrix) and for updateTrustLevel's transactional reconciliation (downgrade removes the full-trust grants; upgrade adds them idempotently).

Fixed

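  • Hourly activity over-counted: the dashboard's 24-hour activity query compared ISO-8601 timestamps as text against a space-separated datetime('now') value. Because the "T" separator sorts after a space, every row from the cutoff's calendar day compared as newer than the cutoff, which widened the window to as much as ~48 hours. It now compares on unixepoch().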
  • Dashboard stats could stall during bursts: the stats-emission debounce reset its timer on every call, so sustained activity faster than the debounce interval could starve updates indefinitely. A maximum-wait cap now forces an emission during continuous activity.
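The hourly-activity bug can be reproduced outside SQLite, since SQLite's default BINARY collation compares TEXT byte by byte, which matches JavaScript string ordering for ASCII. In this sketch the row timestamp is ISO-8601 text and the cutoff uses the YYYY-MM-DD HH:MM:SS form that datetime() returns; the dates and the 24-hour offset are illustrative assumptions, and the two function names are hypothetical.

```typescript
// Mirrors the old query: lexicographic TEXT comparison.
function includedByTextCompare(rowIso: string, cutoffText: string): boolean {
  return rowIso >= cutoffText;
}

// Mirrors the fix: compare numeric epoch seconds instead of strings.
function includedByEpochCompare(rowIso: string, cutoffEpochSec: number): boolean {
  return Date.parse(rowIso) / 1000 >= cutoffEpochSec;
}

// Illustrative values: "now" is 2025-01-15 01:06 UTC, so the 24-hour cutoff
// in datetime() form is "2025-01-14 01:06:00".
const cutoffText = "2025-01-14 01:06:00";
const cutoffEpoch = Date.parse("2025-01-14T01:06:00Z") / 1000;

// A row 24h36m old: outside the window, but at index 10 the text comparison
// sees "T" (0x54) versus " " (0x20) and counts it anyway.
const staleRow = "2025-01-14T00:30:00.000Z";
console.log(includedByTextCompare(staleRow, cutoffText)); // true (wrong)
console.log(includedByEpochCompare(staleRow, cutoffEpoch)); // false
```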
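A debounce with a maximum-wait cap can be sketched as follows. MaxWaitDebouncer and the Scheduler interface are hypothetical; the scheduler is injected so that real timers (Date.now plus the global setTimeout/clearTimeout) or a fake clock can drive it. Each call still pushes the timer back by waitMs, but never past the first pending call plus maxWaitMs, so continuous activity yields at least one emission per maxWaitMs.

```typescript
// Injected timer source (hypothetical interface) so real timers or a fake clock can drive it.
interface Scheduler {
  now(): number;
  setTimeout(fn: () => void, ms: number): unknown;
  clearTimeout(handle: unknown): void;
}

class MaxWaitDebouncer {
  private timer: unknown = undefined;
  private firstCallAt: number | undefined = undefined;

  constructor(
    private readonly emit: () => void,
    private readonly waitMs: number,
    private readonly maxWaitMs: number,
    private readonly scheduler: Scheduler,
  ) {}

  /** Signal activity; emission is deferred, but never beyond maxWaitMs after the first pending call. */
  call(): void {
    const now = this.scheduler.now();
    if (this.firstCallAt === undefined) this.firstCallAt = now;
    if (this.timer !== undefined) this.scheduler.clearTimeout(this.timer);
    // Normal debounce delay, clamped to the max-wait deadline.
    const deadline = this.firstCallAt + this.maxWaitMs;
    const delay = Math.max(0, Math.min(this.waitMs, deadline - now));
    this.timer = this.scheduler.setTimeout(() => this.flush(), delay);
  }

  private flush(): void {
    this.timer = undefined;
    this.firstCallAt = undefined;
    this.emit();
  }
}
```

Without the deadline clamp, a call every 50 ms with a 100 ms debounce would reset the timer forever and never emit, which is exactly the starvation described above.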