An AI bookkeeping platform for small businesses — autonomous, multi-tenant, and built for speed.
This repo is being rebuilt in phases. Each phase lands as a clean, pushable slice so progress is easy to follow.
- 📥 Reads invoices — pulls text out of PDFs and images, checks the numbers, and sorts them.
- 🏦 Matches bank transactions — lines up what’s in your bank with what’s in your books.
- 📊 Writes financial reports — P&L, balance sheet, cash flow, aging — on demand.
- 💳 Pays bills — approves invoices and kicks off payments (Stripe, ACH).
- 🔄 Syncs with tools you already use — QuickBooks, Xero, Plaid.
- 🛡️ Keeps every business separate — strict multi-tenant isolation at the database layer.
- 🧾 Logs everything — full audit trail for compliance.
- 🟡 Bun + TypeScript (strict mode) — backend runtime and language
- ⚡ Hono — small, fast HTTP framework
- 🐘 PostgreSQL — with Row-Level Security (RLS) and Role-Based Access Control (RBAC)
- ⚛️ React + Vite + Tailwind + shadcn/ui — the dashboard
- 🧠 DeepSeek (primary LLM) + Google Document AI (OCR) + OpenAI SDK (fallback)
- 💰 Plaid, Stripe, QuickBooks, Xero — external integrations
- 🧑‍✈️ 110 AI agents: 1 CEO, 9 orchestrators, 100 workers (hierarchical, skill-driven)
- 🧪 Copy the env template and fill in your keys: `cp .env.example .env` (example file coming in a future phase)
- 📦 Install dependencies: `bun install`
- 🗄️ Set up the database: `bun run db:setup`
- 🌱 Seed demo data: `bun run demo:quick`
- ▶️ Start the API: `bun run dev`
- 🖥️ Start the dashboard (in another terminal): `bun run dashboard:dev`
- API runs on port 4004 by default (set the `PORT` env var to override).
- Dashboard runs on port 3000.
This is a multi-tenant system. Company A’s invoices must never, ever, be visible to Company B. We enforce that in two places at once:
- 🛡️ In PostgreSQL — every tenant-scoped table has Row-Level Security (RLS) policies tied to session variables.
- 🧭 In the app layer: every query filters by `tenant_id` explicitly. Belt and braces.
The RLS plumbing is the riskier of the two because it depends on those session variables being set correctly on every single request. Phase P0-1 (below) fixes a real leak in that plumbing.
Each phase is a small, reviewable PR-sized change. One commit per phase, pushed to this repo.
| Phase | Status | Summary |
|---|---|---|
| P0-1 — Tenant context hardening | ✅ Done | Transaction-scoped RLS session vars. Fixes pool bleed + SET injection. |
| R — Drop "ClawKeeper" brand | ✅ Done | Single name everywhere: TransactionWonder. CEO agent ID is now ceo. |
| P0-2 — RLS isolation test suite | ✅ Done | 7 integration tests prove tenant A can’t read B, viewer can’t write, WITH CHECK blocks cross-tenant INSERTs, and super_admin can read across tenants. |
| P0-3 — CEO → orchestrator real delegation | ✅ Done | delegate_to() now actually calls the orchestrator via agent_runtime and propagates errors. |
| P0-4 — AP Lead → worker dispatch | ✅ Done | AP Lead routes each capability to the matching worker via agent_runtime. Reference impl for the other 8 orchestrators. |
| P0-5 — Skill executor MVP | ✅ Done | src/skills/ with registry, executor, Zod-validated input/output, PII redaction hook, plus invoice-processor and payment-gateway handlers. |
| P0-6 — Task timeout | ✅ Done | Promise.race wraps every execute() with a per-agent timeout (default 30s) so hung LLMs don’t block requests. |
| P0-7 — Memory store tenant context | ✅ Done | New withMemoryStore() helper runs memory operations inside withTenantContext so RLS GUCs fire. |
| P1-1 — Shared HTTP wrapper | ✅ Done | src/integrations/http.ts with retry + circuit breaker + rate-limit awareness. LLM client now uses it. |
| P1-12 — Fail-loud decomposition | ✅ Done | LLM task decomposition throws TaskDecompositionError instead of silently collapsing to "generate report". |
| P1-8 — Audit log persistence | ✅ Done | log_audit() now INSERTs into agent_runs via withTenantContext instead of console.log. |
| P1-9 — Parallel DAG | ✅ Done | execute_plan processes tasks level-by-level, with Promise.all inside each level. New topological_levels() helper. |
| P1-10 — Orchestration retry | ✅ Done | Each task wrapped in retryWithBackoff with exponential backoff, full jitter, and classifier that skips domain-validation errors. |
| P1-11 — PII redaction on LLM path | ✅ Done | llm.complete() redacts PII by default (redact_pii: true). Opt-out for callers that pre-redact. |
| P1-5 — Remaining orchestrators dispatch | ✅ Done | CFO, AR Lead, Reconciliation, Compliance, Reporting, Integration, Data/ETL, Support all route via shared _dispatch.ts helper. |
| P1-6 — Remaining 6 skills | ✅ Done | document-parser, bank-reconciliation, compliance-checker, financial-reporting, data-sync, audit-trail — all 8 skills now registered and callable via invokeSkill(). |
| P1-4 — Stripe idempotency wired | ✅ Done | StripeClient.request accepts + forwards Idempotency-Key; payment-gateway passes its deterministic key; vendor_name removed from Stripe metadata. |
| P1-2 — OAuth token persistence | ✅ Done | oauth_tokens table (RLS-enabled) + TokenManager with AES-256-GCM encryption and auto-refresh via pluggable refreshers. |
| P1-3 — Webhook router | ✅ Done | /webhooks/stripe verifies Stripe-Signature HMAC + timestamp window; /webhooks/plaid accepts signed events (JWKS verification tracked for P2). |
| P1-7 — JWT refresh + revocation | ✅ Done | 1h access + 30d refresh tokens with jti claim; /auth/refresh rotates; /auth/logout revokes; middleware checks revoked_tokens blacklist. |
| P2 batch | ✅ Done | .env.example, bcrypt 10→12, redundant invoice RLS policies removed, port warning fixed, Xero demo mode toggle, Opik span metadata. |
| Remaining P2 (deferred) | 🗓️ Deferred | Flesh out 100 worker AGENT.md files (P2-8); Document AI service-account auth (P2-14); branded types for IDs (P2-15); audit-trigger column-level PII redaction (P2-11); optional pgcrypto defense-in-depth (P2-12). |
- 📝 `.env.example` at the repo root: every required var documented, with a clear generator hint for `OAUTH_ENCRYPTION_KEY`. Future setup just needs `cp .env.example .env`.
- 🔐 bcrypt work factor 10 → 12 in the demo seeder. Modern baseline for a financial system; adds ~50ms per hash.
- 🧹 Redundant invoice RLS policies removed (`invoice_viewer_read`, `invoice_accountant_write`, `invoice_accountant_update`). They duplicated the generic `invoice_tenant_isolation` + app-layer role checks. Single responsibility: RLS enforces tenant isolation, the API enforces roles.
- ⚓ Port inconsistency resolved. The 9100 "EXPECTED_PORT" warning that fired for everyone on the default 4004 is gone. `4004` is canonical; the `PORT` env var overrides.
- 🧬 Xero demo-mode toggle (`XERO_DEMO_MODE=1`) logs a visible banner at module load so operators can tell demo from prod without reading auth logs.
- 📡 Opik spans now carry metadata. `record_llm_usage` and `record_agent_result` stopped being no-ops and attach metrics to the span + emit structured log lines.
A handful of bigger P2 items aren’t in this commit because they’re not small hygiene — they’re their own phases:
- P2-8: Flesh out the 100 worker `AGENT.md` files with per-worker capabilities/instructions (100 files).
- P2-14: Real service-account auth for Document AI (needs `google-auth-library` and reconciling the client type surface).
- P2-15: Branded `AgentId` / `TenantId` / `TaskId` types across the codebase.
- P2-11: Column-aware PII redaction inside the `audit_log` trigger.
- P2-12: pgcrypto-layer encryption on top of the app-level AES from P1-2 (defense-in-depth; app encryption is already authenticated).
- 🪝 P1-3: `/webhooks/*` router mounted before the tenant-context middleware (webhooks are unauthenticated inbound; authenticity is per-provider signature).
  - Stripe: verifies `Stripe-Signature` (HMAC-SHA256 of `timestamp.body` against `STRIPE_WEBHOOK_SECRET`), rejects outside a 5-minute window, timing-safe compare.
  - Plaid: checks for `Plaid-Verification` header; full JWKS-based JWT verification is tracked in P2 (needs key cache).
- 🔐 P1-7: Refresh tokens + revocation.
  - Migration `005_create_revoked_tokens.sql`: blacklist keyed by `jti`, no RLS (lookup happens pre-tenant-context).
  - `auth.ts` now issues a pair on login: 1h access token + 30d refresh token, both carrying `jti`.
  - `POST /auth/refresh` validates the refresh JWT, rejects if revoked, rotates (old jti goes on the blacklist), issues a fresh pair.
  - `POST /auth/logout` revokes whatever token(s) were presented.
  - `server.ts` middleware rejects refresh tokens used as bearer access tokens, and checks the `revoked_tokens` blacklist before opening the tenant transaction.
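The Stripe signature check described above can be sketched roughly as follows. This is an illustration, not the project's `/webhooks/stripe` handler: the function name `verifyStripeSignature`, its parameters, and the `TOLERANCE_SECONDS` constant are assumptions made for the example. The `t=...,v1=...` header layout and the `timestamp.body` signed payload follow Stripe's documented webhook scheme.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: name and signature are illustrative only.
const TOLERANCE_SECONDS = 5 * 60; // the 5-minute window mentioned above

function verifyStripeSignature(
  rawBody: string,
  header: string,
  secret: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  // Header looks like "t=1700000000,v1=<hex hmac>[,v1=<hex hmac>...]".
  let timestamp: string | undefined;
  const candidates: string[] = [];
  for (const part of header.split(",")) {
    const idx = part.indexOf("=");
    if (idx === -1) continue;
    const key = part.slice(0, idx).trim();
    const value = part.slice(idx + 1).trim();
    if (key === "t") timestamp = value;
    if (key === "v1") candidates.push(value);
  }
  if (timestamp === undefined || candidates.length === 0) return false;

  // Reject events whose timestamp falls outside the tolerance window.
  const ts = Number(timestamp);
  if (!Number.isInteger(ts)) return false;
  if (Math.abs(nowSeconds - ts) > TOLERANCE_SECONDS) return false;

  // Signed payload is "<timestamp>.<raw request body>".
  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`, "utf8")
    .digest();

  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return candidates.some((hex) => {
    const given = Buffer.from(hex, "hex");
    return given.length === expected.length && timingSafeEqual(given, expected);
  });
}
```

The check has to run against the raw request body. Re-serializing parsed JSON can change the bytes and invalidate an otherwise valid signature.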
- 💳 P1-4: Stripe idempotency. `StripeClient.request()` now accepts and forwards an `Idempotency-Key` header. `createPaymentIntent()` has a new `options.idempotencyKey` parameter. The `payment-gateway` skill (P0-5) passes its deterministic `hash(tenant_id | invoice_id | amount)` key through, so a retry can never double-charge.
- 🔒 PII removed from Stripe metadata. `vendor_name` no longer rides along; only tenant/invoice IDs go to Stripe metadata, which isn’t treated as private.
- 🗝️ P1-2: OAuth token persistence.
  - 🆕 `db/migrations/004_create_oauth_tokens.sql`: `oauth_tokens` table with RLS (`tenant_id = current_setting('app.current_tenant_id')`), unique on (tenant, provider, realm).
  - 🆕 `src/integrations/crypto.ts`: AES-256-GCM helpers using `OAUTH_ENCRYPTION_KEY` (base64, 32 bytes). Authenticated encryption; tampered ciphertext throws.
  - 🆕 `src/integrations/token-manager.ts`: `TokenManager` with `get` / `upsert` / `revoke` / `registerRefresher`. Auto-refreshes tokens within 60s of expiry via provider-specific refreshers; falls back to stale token on refresh failure.
- 🧭 How it plugs in. QuickBooks / Xero clients register their own refreshers on module load; their existing `exchangeCodeForTokens()` + `refreshAccessToken()` methods become the refresh impl. Wiring those registrations is small and will follow in the webhook phase (P1-3) where real OAuth flows are exercised.
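To make the "authenticated encryption; tampered ciphertext throws" point concrete, here is a minimal AES-256-GCM sketch built on Node's `node:crypto` module. The helper names (`loadKey`, `encryptToken`, `decryptToken`) and the `iv.tag.ciphertext` string layout are assumptions for illustration; the actual `crypto.ts` may store these pieces differently.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical helpers: names and the "iv.tag.ciphertext" layout are assumptions.
function loadKey(base64Key: string): Buffer {
  const key = Buffer.from(base64Key, "base64");
  if (key.length !== 32) throw new Error("key must decode to exactly 32 bytes");
  return key;
}

function encryptToken(plaintext: string, base64Key: string): string {
  const iv = randomBytes(12); // 96-bit nonce, fresh for every message
  const cipher = createCipheriv("aes-256-gcm", loadKey(base64Key), iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // authentication tag over the ciphertext
  return [iv, tag, ciphertext].map((b) => b.toString("base64")).join(".");
}

function decryptToken(payload: string, base64Key: string): string {
  const parts = payload.split(".");
  if (parts.length !== 3) throw new Error("malformed payload");
  const [iv, tag, ciphertext] = parts.map((p) => Buffer.from(p, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", loadKey(base64Key), iv);
  decipher.setAuthTag(tag);
  // final() throws if the tag does not verify (tampered data or wrong key).
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

A suitable key can be produced with `randomBytes(32).toString("base64")`; the important property is that a nonce is never reused with the same key.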
- 📄 `document-parser`: OCR shell. Returns synthetic output today; live Document AI wiring is tracked as P2 hygiene (integration client type reconciliation + real service-account auth).
- 🏦 `bank-reconciliation`: Pulls transactions in range from the tenant-scoped pool, separates matched vs. unmatched, emits up to 50 discrepancies. RLS filters automatically; no tenant params in the SQL.
- 🛡️ `compliance-checker`: Three rules today: existence, duplicate vendor+amount in last 90 days, approval-limit threshold. Returns typed issues with severities.
- 📊 `financial-reporting`: Real SQL for `profit_loss`, `cash_flow`, `balance_sheet`. Runs through `ctx.sql` so RLS scopes results by tenant.
- 🔄 `data-sync`: Returns `status: 'synthetic' | 'no_token' | 'ok'` based on configured creds. Becomes fully live once P1-2 (OAuth token manager) lands.
- 🧾 `audit-trail`: INSERTs into `audit_log` via the tenant-scoped client. The table’s RLS `WITH CHECK` prevents forged cross-tenant entries even through this skill.
All 8 skills are registered in src/skills/index.ts; any worker / agent / route can call invokeSkill(name, input, ctx) and get Zod-validated results with PII redaction applied at the executor layer.
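As a rough picture of the `invokeSkill(name, input, ctx)` flow (validate input, run the optional redaction hook, call the handler, validate output), here is a dependency-free sketch. The real executor validates with Zod; the `Schema<T>` interface below is a hand-written stand-in with only a `parse()` method so the example runs on its own. The `shout` skill, the field names on `SkillDefinition`, and the SSN regex are made up for illustration and are not among the 8 built-in skills.

```typescript
// Stand-in for a Zod schema: parse() returns the typed value or throws.
interface Schema<T> { parse(value: unknown): T; }

class SkillNotFoundError extends Error {}
class SkillInputError extends Error {}
class SkillOutputError extends Error {}

interface SkillContext { tenantId: string; }

// Field names here are assumptions; the project's SkillDefinition may differ.
interface SkillDefinition<In, Out> {
  name: string;
  input: Schema<In>;
  output: Schema<Out>;
  redact?: (input: In) => In; // optional PII redaction hook
  handler: (input: In, ctx: SkillContext) => Promise<Out>;
}

const registry = new Map<string, SkillDefinition<any, any>>();

function registerSkill<In, Out>(skill: SkillDefinition<In, Out>): void {
  registry.set(skill.name, skill);
}

async function invokeSkill(name: string, input: unknown, ctx: SkillContext): Promise<unknown> {
  const skill = registry.get(name);
  if (!skill) throw new SkillNotFoundError(`unknown skill: ${name}`);

  let parsed: unknown;
  try {
    parsed = skill.input.parse(input);
  } catch (err) {
    throw new SkillInputError(`invalid input for ${name}: ${String(err)}`);
  }

  // Redaction runs after validation and before the handler sees the data.
  const safe = skill.redact ? skill.redact(parsed) : parsed;
  const raw = await skill.handler(safe, ctx);

  try {
    return skill.output.parse(raw);
  } catch (err) {
    throw new SkillOutputError(`invalid output from ${name}: ${String(err)}`);
  }
}

// Toy skill for illustration only.
const textSchema: Schema<{ text: string }> = {
  parse(value) {
    const text = (value as { text?: unknown } | null)?.text;
    if (typeof text !== "string") throw new Error("expected { text: string }");
    return { text };
  },
};

registerSkill({
  name: "shout",
  input: textSchema,
  output: textSchema,
  redact: (i) => ({ text: i.text.replace(/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]") }),
  handler: async (i) => ({ text: i.text.toUpperCase() }),
});
```

Keeping validation and redaction in the executor means individual handlers never see unvalidated or unredacted input, whichever caller invoked them.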
- 🔌 8 orchestrators now dispatch to workers: CFO, AR Lead, Reconciliation, Compliance, Reporting, Integration, Data/ETL, Support. Previously all of these were stubs returning `{ matched_count: 0, success: true, ... }`.
- 🧰 Shared helper `src/agents/orchestrators/_dispatch.ts`: all 9 orchestrators now route through a single `dispatchCapability()` function:
  - Look through `required_capabilities`.
  - Find the first one in that orchestrator’s capability→worker table.
  - Delegate through `agent_runtime.get_agent(worker_id).execute_task(...)`.
  - Fall back to a local LLM call with the orchestrator’s role as the system prompt if nothing matches.
- 🎛️ Uniform behavior. Every orchestrator file is now ~35 lines: config + capability table + a thin `execute()` that calls the helper. No more divergent in-line LLM calls.
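The four dispatch steps above might look something like the sketch below. Only `dispatchCapability`, `required_capabilities`, `get_agent`, and `execute_task` come from the description; every type, the `DispatchOptions` shape, and the choice to fall back when a mapped worker is not registered are assumptions for the sake of a runnable example.

```typescript
// Illustrative types; the real ones live in the agents module and may differ.
interface Task { description: string; required_capabilities: string[]; }
interface TaskResult { success: boolean; handled_by: string; output: string; }
interface Agent { execute_task(task: Task): Promise<TaskResult>; }
interface AgentRuntime { get_agent(id: string): Agent | undefined; }

interface DispatchOptions {
  runtime: AgentRuntime;
  capabilityTable: Map<string, string>;          // capability -> worker_id
  fallback: (task: Task) => Promise<TaskResult>; // e.g. a local LLM call
}

async function dispatchCapability(task: Task, opts: DispatchOptions): Promise<TaskResult> {
  // Steps 1-2: first required capability that has an entry in the table.
  const capability = task.required_capabilities.find((c) => opts.capabilityTable.has(c));
  if (capability !== undefined) {
    const workerId = opts.capabilityTable.get(capability) as string;
    const worker = opts.runtime.get_agent(workerId);
    // Step 3: delegate; any worker error propagates to the caller.
    if (worker) return worker.execute_task(task);
    // Assumption: an unregistered worker is treated like "nothing matched".
  }
  // Step 4: nothing matched, so handle the task locally.
  return opts.fallback(task);
}
```

With this shape, an orchestrator file only needs to supply its own capability table and fallback, which is consistent with the ~35-line files described above.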
- The 100 workers are now reachable through the hierarchy for the first time.
- Worker bodies are still `WorkerAgent` stubs (they return templated success); wiring them to real skills is P1-6 and beyond.
📜 Previous phase: P1-8 + P1-9 + P1-10 + P1-11 — Audit persistence, parallel DAG, retry, PII redaction (2026-04-17)
- 🧾 P1-8: Audit log persistence. `base.ts` `log_audit()` now INSERTs into `agent_runs` through `withTenantContext()`, the transaction-scoped pool we built in P0-1. Previously only `console.log`. Audit failures are caught and never take down the request.
- 🚀 P1-9: Parallel DAG execution. New `topological_levels()` helper groups tasks into levels where everything in level `N` has its dependencies in levels `<N`. `execute_plan` now runs each level with `Promise.all`. The old sequential `for (task of order)` is gone; independent tasks fan out.
- 🔁 P1-10: Real retry in orchestration. Every task execution is wrapped in `retryWithBackoff` (from P1-1): exponential backoff with full jitter, up to 3 attempts. The `retryable` classifier skips domain-validation errors (`"invalid"`, `"validation"`, `"isolation"`) so bad inputs don’t get hammered.
- 🛡️ P1-11: PII redaction by default on `llm.complete()`. SSN, credit card, phone, email are scrubbed before the prompt reaches the LLM. Opt-out via `redact_pii: false` only when the caller has already run its own redaction (e.g. the skill layer’s `redactPII` hook).
- 🧹 Type errors cleaned up. `src/agents/orchestration_service.ts` had two pre-existing TS errors (missing `TenantContext` export, `string | null` vs `string | undefined`). Both fixed as a side effect of this work.
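The level-by-level execution from P1-9 can be illustrated with a small sketch. The `PlanTask` shape, the camelCase function names, and the `run` callback are assumptions; the repo's `topological_levels()` and `execute_plan` may be structured differently.

```typescript
// Hypothetical task shape for this sketch.
interface PlanTask { id: string; depends_on: string[]; }

// Groups tasks so that every task in level N depends only on levels < N.
function topologicalLevels(tasks: PlanTask[]): string[][] {
  const remaining = new Map(tasks.map((t) => [t.id, t] as const));
  const done = new Set<string>();
  const levels: string[][] = [];

  while (remaining.size > 0) {
    // Ready = all dependencies were placed in an earlier level.
    const ready = [...remaining.values()]
      .filter((t) => t.depends_on.every((d) => done.has(d)))
      .map((t) => t.id);
    if (ready.length === 0) throw new Error("cycle or unknown dependency in plan");
    for (const id of ready) {
      remaining.delete(id);
      done.add(id);
    }
    levels.push(ready);
  }
  return levels;
}

// Runs levels in order; tasks inside one level run concurrently.
async function executePlan<T>(
  tasks: PlanTask[],
  run: (id: string) => Promise<T>,
): Promise<Map<string, T>> {
  const results = new Map<string, T>();
  for (const level of topologicalLevels(tasks)) {
    const outputs = await Promise.all(level.map((id) => run(id)));
    level.forEach((id, i) => results.set(id, outputs[i]));
  }
  return results;
}
```

Note that `Promise.all` rejects as soon as any task in a level fails, so a real implementation has to decide whether sibling results are kept or discarded in that case.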
- 🆕 `src/integrations/http.ts`: one place for all integration HTTP calls.
  - 🚧 Circuit breaker per service (Plaid, Stripe, QuickBooks, Xero, Document AI, LLM).
  - 🔁 Exponential backoff retry (full jitter, configurable `maxRetries`, `baseDelayMs`, `maxDelayMs`).
  - 🕰️ Retry-After header honored for 429 responses.
  - 🧰 Exposes a generic `retryWithBackoff()` for non-HTTP paths (LLM SDK calls, etc.).
- 🧠 LLM client hardened. `llm.complete()` now goes through the circuit breaker + retry wrapper and classifies errors (`408/425/429/5xx` retry; other `4xx` surface immediately). The long-standing double `trace.end()` bug is fixed as a side-effect.
- 📣 P1-12: LLM decomposition fails loudly. `decompose_financial_task()` now throws `TaskDecompositionError` (with context) on parse failure instead of silently collapsing every request to a single "generate report" task. Callers can catch and decide on recovery.
- 🧪 Tests at `tests/integrations/http.test.ts` cover retry-on-429, Retry-After numeric parsing, retry on transient errors, non-retry on 4xx, exhaustion, and `isRetryable` opt-out.
- P1-10 (orchestration retry) reuses `retryWithBackoff`.
- Each of Plaid/Stripe/QB/Xero/Document AI can opt in by wrapping its `request()`; the same tests cover that path.
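For readers unfamiliar with "exponential backoff with full jitter", a minimal sketch follows. The option names `maxRetries`, `baseDelayMs`, `maxDelayMs`, and `isRetryable` are taken from the notes above, but the overall signature, the injectable `sleep` and `random` hooks, and the exact delay formula are assumptions, not the code in `http.ts`.

```typescript
interface RetryOptions {
  maxRetries: number;   // retries allowed after the first attempt
  baseDelayMs: number;
  maxDelayMs: number;
  isRetryable?: (err: unknown) => boolean;
  sleep?: (ms: number) => Promise<void>; // assumed hook, injectable for tests
  random?: () => number;                 // assumed hook, injectable for tests
}

async function retryWithBackoff<T>(
  fn: (attempt: number) => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));
  const random = opts.random ?? Math.random;
  const isRetryable = opts.isRetryable ?? (() => true);

  for (let attempt = 0; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      // Give up when retries are exhausted or the error is not worth retrying.
      if (attempt >= opts.maxRetries || !isRetryable(err)) throw err;
      // Full jitter: wait a uniform random time in [0, min(maxDelay, base * 2^attempt)).
      const ceiling = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
      await sleep(random() * ceiling);
    }
  }
}
```

Full jitter spreads retries from many clients across the whole backoff window, which avoids the synchronized retry bursts that plain exponential backoff tends to produce.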
- ⏱️ Per-agent task timeout in `src/agents/base.ts`: `execute_task()` now wraps `this.execute(task)` in a `Promise.race` against a timeout (default 30s, configurable via `AgentConfig.timeout_ms`). A hung LLM call or stuck worker now fails closed with a clear error instead of blocking the request.
- 🧠 `withMemoryStore()` helper in `src/memory/index.ts`: runs memory operations inside `withTenantContext`, so the memories-table RLS policy (`tenant_id = current_setting('app.current_tenant_id', true)`) fires on every query. This is the preferred entry point for callers; the legacy singleton pattern is documented as "don’t use this for real requests".
- MemoryStore still has some `.unsafe` string interpolation on dynamic filter fields. Mitigated today because `tenantId` comes from a signed JWT and RLS enforces isolation at the DB, but it’s flagged for the P2 hygiene sweep.
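The per-agent timeout described above follows a common `Promise.race` pattern, sketched here. `withTimeout` and `TaskTimeoutError` are hypothetical names; in the repo the equivalent logic sits inside `execute_task()` and reads its limit from `AgentConfig.timeout_ms`.

```typescript
// Hypothetical error type for this sketch.
class TaskTimeoutError extends Error {
  constructor(ms: number) {
    super(`task timed out after ${ms}ms`);
    this.name = "TaskTimeoutError";
  }
}

function withTimeout<T>(work: Promise<T>, timeoutMs: number = 30_000): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TaskTimeoutError(timeoutMs)), timeoutMs);
  });
  // Whichever promise settles first wins. The timer is always cleared so a
  // finished task does not leave a pending timeout behind.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

One caveat of this pattern: `Promise.race` stops waiting but does not cancel the underlying work. An LLM call that has timed out keeps running unless the client supports an abort signal.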
- 🆕 `src/skills/` layer: the first real entry point for skills, not just markdown docs:
  - `types.ts`: `SkillContext`, `SkillDefinition<In, Out>`, and typed error classes (`SkillNotFoundError`, `SkillInputError`, `SkillOutputError`).
  - `registry.ts`: in-memory registry of skills.
  - `executor.ts`: single `invokeSkill(name, input, ctx)` entry point. Validates input via Zod, runs the optional PII redaction hook, calls the handler, validates the output.
  - `index.ts`: auto-registers built-in skills on import.
- 🧾 Handler: `invoice-processor` takes an OCR blob, redacts PII (SSN/CC/email), calls `llm.parse_invoice()`, returns typed invoice fields.
- 💳 Handler: `payment-gateway` creates a Stripe PaymentIntent with a deterministic idempotency key (hash of `tenant_id | invoice_id | amount`). Retries with the same key never double-charge. Falls back to a synthetic response when `STRIPE_SECRET_KEY` is unset, so local dev and CI don't need real Stripe creds.
- 🧪 Tests added at `tests/skills/executor.test.ts`:
  - Valid input → validated output
  - PII hook runs before handler
  - Unknown skill → `SkillNotFoundError`
  - Bad input → `SkillInputError`
  - Bad output from handler → `SkillOutputError`
  - Built-in skills are registered
  - Synthetic Stripe mode + idempotency key stability
- Workers will call `invokeSkill()` rather than reimplement the logic locally.
- The capability-to-worker table in AP Lead (P0-4) points to worker agents; P1-5 wires each worker to its skill.
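The deterministic idempotency key used by the `payment-gateway` handler could be derived along these lines. The notes only say "hash of `tenant_id | invoice_id | amount`", so the choice of SHA-256, the `|` separator, the integer minor-unit amount, and the function name `paymentIdempotencyKey` are all assumptions.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: algorithm, separator, and name are assumptions.
function paymentIdempotencyKey(tenantId: string, invoiceId: string, amountMinorUnits: number): string {
  // Integer minor units (e.g. cents) avoid float formatting differences
  // producing two different keys for the same payment.
  if (!Number.isInteger(amountMinorUnits) || amountMinorUnits < 0) {
    throw new Error("amount must be a non-negative integer in minor units");
  }
  return createHash("sha256")
    .update(`${tenantId}|${invoiceId}|${amountMinorUnits}`, "utf8")
    .digest("hex");
}
```

Because the key depends only on the payment's identity, a retry of the same payment sends the same `Idempotency-Key`, while a changed amount yields a new key and is treated as a new request.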
- 🧑‍✈️ CEO `delegate_to()` is real now. Looks up the orchestrator through `agent_runtime`, calls `execute_task()` with the current tenant context, and throws on orchestrator failure (previously returned a silent mock).
- 🏢 AP Lead dispatches to workers. `AccountsPayableLeadAgent.execute()` now routes each `required_capability` to the matching worker ID via a tiny capability→worker table (e.g. `invoice_parsing → ap_invoice_parser`).
- 🔁 Reference implementation. The same dispatch pattern (capability table + `dispatch_to_worker()` + `fallback_local()`) will be copied into the other 8 orchestrators in P1-5.
- 🪢 Module cycle handled. Both CEO and AP Lead use `await import('./index')` / `await import('../index')` inside the dispatch function so the `index.ts ↔ agent.ts` circular import resolves cleanly.
- Workers are still `WorkerAgent` stubs (templated responses). Dispatch is real and tested, but actual worker logic (OCR, validation, payment) lands in P0-5 (skill executor) and P1-5 (worker behavior). Until then, the hierarchy returns stub output from workers, which is the correct scaffolding state for this phase.
- 🧪 New test file at `tests/rls/isolation.test.ts`: 7 integration tests that spin up real data in two tenants and prove RLS blocks cross-tenant reads and writes through `withTenantContext()`.
- 🔒 What’s proved:
  - Tenant A sees exactly its own invoices; Tenant B likewise.
  - An `UPDATE` from tenant A aimed at tenant B’s rows → 0 rows affected.
  - `viewer` role cannot `INSERT` invoices (WITH CHECK blocks it).
  - `accountant` can insert in their own tenant but not with a spoofed `tenant_id` (cross-tenant WITH CHECK blocks it).
  - `super_admin` can read across tenants: the intended bypass, confirmed to work.
- 🧹 Fixtures are self-contained: UUIDs per run, clean-up in `afterAll`, no cross-test collisions.
`DATABASE_URL=postgres://user:pass@localhost:5432/transactionwonder bun test tests/rls`

Self-skips without `DATABASE_URL`, same pattern as P0-1.
- ✏️ 315 references across 104 files renamed in a single sweep: all prose, code, docs, SQL, config.
- 🏷️ Brand is now just TransactionWonder everywhere (no more dual "TransactionWonder / ClawKeeper" split).
- 🧑‍✈️ CEO agent is now identified as `ceo` instead of `clawkeeper`: same role, clearer name.
  - Class renamed: `ClawKeeperAgent` → `CeoAgent`.
  - File renamed: `src/agents/clawkeeper.ts` → `src/agents/ceo.ts`.
  - Dir renamed: `agents/clawkeeper/` → `agents/ceo/`.
  - Agent ID everywhere: `'clawkeeper'` → `'ceo'`.
- 📦 Packages renamed: root `transactionwonder`, dashboard `transactionwonder-dashboard`.
- 🗄️ DB / service names kept as the product name, not the agent name: `transactionwonder` database, `transactionwonder_service` role, etc.
- ✅ Full TypeScript typecheck still clean on every changed file.
- `grep -r "ClawKeeper\|clawkeeper"` across the whole tree → 0 hits.
- No orphan references: `ceo` as an identifier only appears where it refers to the CEO agent role.
- 🔴 Cross-tenant leak via the connection pool. The old middleware ran `SET app.current_tenant_id = '...'`, which is session-scoped, not transaction-scoped. With a pool of 10 connections, one request’s tenant context could stick on a connection and be seen by the next request that reused it: a real multi-tenant bleed.
- 🔴 SQL-injection-shaped SET command. The same code did ``sql.unsafe(`SET app.current_tenant_id = '${decoded.tenant_id}'`)``, string-interpolating a JWT value into raw SQL. Safe only while the JWT secret stayed secret.
- ➕ New helper at `src/db/with-context.ts`: opens a Postgres transaction and sets the three RLS session variables via parameterized `set_config(name, value, true)` calls (transaction-scoped, no interpolation).
- 🔁 Middleware rewrite in `src/api/server.ts`: now wraps every authenticated request inside a `withTenantContext()` transaction and hands the transactional SQL client to the handler via `c.var.sql`.
- 🔁 All 11 route files now use `c.var.sql` instead of the module-scoped `sql` pool client. This is the only client with tenant context applied for the current request.
- 🧪 Tests added at `tests/rls/with-context.test.ts`: prove GUCs are scoped to the transaction, do not leak to the outer pool connection, stay isolated under concurrency, and treat injection-shaped values as data.
- 🚰 SSE streaming edge case handled: the streaming endpoint opens its own short-lived `withTenantContext()` for the audit write that happens after the outer transaction has committed.
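The core of the fix, setting tenant GUCs with a parameterized, transaction-local `set_config`, can be sketched as below. The `Db` and `Query` types are a deliberately tiny, made-up client shape so the example stands alone; the project's Postgres client has its own API. Of the three variable names, only `app.current_tenant_id` appears in these notes; `app.current_user_id` and `app.current_user_role` are assumed names.

```typescript
// Minimal client shape assumed for this sketch only.
type Query = (text: string, params: readonly string[]) => Promise<unknown>;
interface Db {
  // Expected to run fn inside BEGIN ... COMMIT and roll back if fn throws.
  transaction<T>(fn: (query: Query) => Promise<T>): Promise<T>;
}

interface TenantContext { tenantId: string; userId: string; role: string; }

async function withTenantContext<T>(
  db: Db,
  ctx: TenantContext,
  fn: (query: Query) => Promise<T>,
): Promise<T> {
  return db.transaction(async (query) => {
    const settings: Array<[string, string]> = [
      ["app.current_tenant_id", ctx.tenantId],
      ["app.current_user_id", ctx.userId],  // assumed GUC name
      ["app.current_user_role", ctx.role],  // assumed GUC name
    ];
    for (const [name, value] of settings) {
      // Third argument true = is_local: the value is dropped when the
      // transaction ends, so it cannot follow the connection back into the pool.
      // Values travel as bind parameters, never as interpolated SQL text.
      await query("SELECT set_config($1, $2, true)", [name, value]);
    }
    return fn(query);
  });
}
```

Handlers then run all their queries through the `query` function they are given, which is the role `c.var.sql` plays in the route files.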
- No cross-tenant GUC bleed.
- No SQL injection path via `SET`.
- Any route handler throw → full transaction rollback (stronger consistency than before).
- Full TypeScript typecheck passes on every changed file.
Requires a running Postgres instance:

`DATABASE_URL=postgres://user:pass@localhost:5432/transactionwonder bun test tests/rls`

The test suite self-skips if `DATABASE_URL` is not set, so CI without a DB still passes.
- 🆕 `src/db/with-context.ts`
- 🆕 `tests/rls/with-context.test.ts`
- ✏️ `src/api/server.ts`
- ✏️ `src/types/hono.ts`
- ✏️ `src/api/routes/{auth,invoices,reports,reconciliation,accounts,agents,dashboard,activity,vendors,customers,metrics}.ts`
- P0-2: RLS isolation test suite (tenant A can’t read tenant B, viewer can’t write, etc.)
- P0-3 + P0-4: Wire the CEO → Orchestrator → Worker execution spine end-to-end on the AP Lead slice.
- P0-5: Build `src/skills/` with a real skill executor.
Built with care by Phani Marupaka 👋
- 🧠 Architecture, product direction, and every line of code you see here.
- 🤖 Audit + refactor cadence supported by Claude Code.
- 💬 Feedback, ideas, contributions — all welcome.
📜 License: MIT