Implement FastAPI app with chat and admin routes#7

Merged: danrixd merged 1 commit into main from codex/implement-api-structure-with-fastapi-or-flask on Aug 1, 2025

@danrixd danrixd commented Aug 1, 2025

Summary

  • add FastAPI app with login endpoint
  • add JWT-based auth helpers
  • add chat routes using conversation manager and response generator
  • add admin routes for tenant management

Testing

  • pytest -q

https://chatgpt.com/codex/tasks/task_e_688c59edde14832a8bb3c8247f37475e

@danrixd danrixd merged commit f647c0b into main Aug 1, 2025
@danrixd danrixd deleted the codex/implement-api-structure-with-fastapi-or-flask branch August 1, 2025 06:13
danrixd added a commit that referenced this pull request Apr 14, 2026
Big coordinated batch covering Tier 2 (functional gaps), Tier 3 (polish),
and Tier 4 (production readiness) from the earlier what's-missing audit.
Only #20 (KMS/Vault secrets manager) is intentionally skipped as overkill
for a single-operator dev box.

# Backend

T2 #5 — generalize exact_lookup (db/query_engine.py)
  Now dispatches by tenant and table. financebench queries with a ticker
  hint (sniffed from the user message) and a YYYY-MM-DD date land on
  daily_bars(ticker, date, open, high, low, close, volume) in
  financebench.db and return the exact row. Verified live: "AAPL closing
  price on 2024-06-14" -> {close: 212.49, volume: 70.1M}. Legacy
  market_data path still works for company/organization vaults.

  ResponseGenerator._lookup_db now tries the intraday pattern first,
  then falls back to date-only; passes message= through so the ticker
  detector can see it. Shared _format_row helper handles both row shapes.
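
A minimal sketch of the ticker-plus-date lookup described above, using only the standard library. The regexes, the `known_tickers` argument, and the placeholder row are assumptions made for illustration; the detector in db/query_engine.py may work differently.

```python
import re
import sqlite3

# Hypothetical patterns; the real ticker detector may be stricter.
TICKER_RE = re.compile(r"\b[A-Z]{1,5}\b")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def exact_lookup(conn, message, known_tickers):
    """Return the daily_bars row for a ticker and date found in message, or None."""
    date_match = DATE_RE.search(message)
    ticker = next((t for t in TICKER_RE.findall(message) if t in known_tickers), None)
    if date_match is None or ticker is None:
        return None
    cur = conn.execute(
        "SELECT ticker, date, open, high, low, close, volume "
        "FROM daily_bars WHERE ticker = ? AND date = ?",
        (ticker, date_match.group(0)),
    )
    row = cur.fetchone()
    if row is None:
        return None
    columns = [d[0] for d in cur.description]
    return dict(zip(columns, row))


def demo_connection():
    """In-memory database with one placeholder row (all values are made up)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily_bars (ticker TEXT, date TEXT, open REAL, high REAL, "
        "low REAL, close REAL, volume INTEGER)"
    )
    conn.execute(
        "INSERT INTO daily_bars VALUES ('XYZ', '2024-01-02', 10.0, 11.0, 9.5, 10.5, 1000)"
    )
    return conn
```

Restricting candidates to a known ticker set keeps ordinary capitalized words in the message from being mistaken for symbols.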

T2 #6 — FinanceBench eval harness (scripts/eval_financebench.py)
  Runs each of the 150 open-source ground-truth questions through
  /chat/message and scores the reply with three strategies:
    - numeric match (2% relative tolerance, handles $/bn/M/%)
    - substring match on the canonical answer
    - optional Claude-as-judge via --judge flag
  Writes a markdown report at docs/financebench_eval.md with headline
  accuracy, per-question-type breakdown, latency stats, sample failures.

  Usage:
    python scripts/eval_financebench.py --model-provider anthropic --model-name claude-opus-4-6 --limit 50 --judge
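
The numeric-match strategy could look roughly like the sketch below. The suffix table, the regex, and the choice to compare against the first number in the canonical answer are assumptions, not the harness's actual code; percentages are compared as plain numbers.

```python
import re

# Hypothetical multipliers for the suffixes named above.
_SCALE = {"billion": 1e9, "bn": 1e9, "b": 1e9,
          "million": 1e6, "mn": 1e6, "m": 1e6, "k": 1e3}
_NUM_RE = re.compile(
    r"(-?\d[\d,]*\.?\d*)\s*(billion|million|bn|mn|b|m|k|%)?(?![A-Za-z])",
    re.IGNORECASE,
)


def extract_numbers(text):
    """Return every number in text, scaled by an attached magnitude suffix."""
    values = []
    for num, suffix in _NUM_RE.findall(text):
        value = float(num.replace(",", ""))
        value *= _SCALE.get(suffix.lower(), 1.0)
        values.append(value)
    return values


def numeric_match(reply, expected, rel_tol=0.02):
    """True if any number in reply is within rel_tol of the first number in expected."""
    targets = extract_numbers(expected)
    if not targets:
        return False
    target = targets[0]
    for candidate in extract_numbers(reply):
        if target == 0:
            if candidate == 0:
                return True
        elif abs(candidate - target) <= rel_tol * abs(target):
            return True
    return False
```

For example, `numeric_match("Revenue was about $1.21 billion.", "$1.2bn")` passes under this sketch because the two values differ by less than 2%.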

T2 #7 — chunked re-ingest on vault edit (api/routes_files.py)
  _ingest_file_into_tenant_store now uses the shared ai/chunking module,
  deletes every existing {tenant}:{filename}#* vector under the file's
  prefix before inserting fresh chunks, and honors section-aware
  chunking. The vault PUT endpoint delegates to the same helper so
  edits match the loader's layout exactly.
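
The delete-then-insert step matters when an edited file produces fewer chunks than before: without it, stale tail chunks would survive. A minimal sketch with a plain dict standing in for the vector store (the numeric suffix after `#` and the `reingest` name are assumptions for illustration):

```python
def reingest(store, tenant, filename, chunks):
    """Replace every vector stored under {tenant}:{filename}# with fresh chunks."""
    prefix = f"{tenant}:{filename}#"
    # Collect keys first so the dict is not mutated while iterating.
    for key in [k for k in store if k.startswith(prefix)]:
        del store[key]
    for index, chunk in enumerate(chunks):
        store[f"{prefix}{index}"] = chunk
    return len(chunks)
```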

T2 #8 — streaming responses (api/routes_chat.py)
  New POST /chat/message/stream endpoint returning SSE events:
    event: delta   data: {"text": "..."}  (60-char chunks)
    event: done    data: {"latency_ms": N}
    event: error   data: {"detail": "..."}
  Records usage + writes audit log + updates conversation history
  just like the non-streaming variant.
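
A sketch of how the SSE frames listed above might be produced. The helper names are hypothetical; the frame layout follows the standard `event:` / `data:` format with a blank line terminating each event.

```python
import json


def sse_event(event, data):
    """Format one server-sent event frame."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def stream_reply(text, latency_ms, chunk_size=60):
    """Yield delta frames of at most chunk_size characters, then a done frame."""
    for start in range(0, len(text), chunk_size):
        yield sse_event("delta", {"text": text[start:start + chunk_size]})
    yield sse_event("done", {"latency_ms": latency_ms})
```

In a FastAPI route, a generator like this would typically be wrapped in `StreamingResponse(..., media_type="text/event-stream")`.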

T2 #9 — cleared all datetime.utcnow() deprecation warnings
  Updated 8 callsites across api/routes_auth, db/{audit_log,
  conversation, file, rag_trace, settings, user}_repository, and
  ingestion/metadata_generator to use datetime.now(timezone.utc).
  Warning count dropped 83 -> 12 on pytest runs.
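
The replacement pattern, shown as a small helper (the `utc_now` name is hypothetical). Note that the result is timezone-aware, so comparing it with a naive timestamp produced by the old `datetime.utcnow()` raises `TypeError`.

```python
from datetime import datetime, timezone


def utc_now():
    """Timezone-aware replacement for the deprecated datetime.utcnow()."""
    return datetime.now(timezone.utc)
```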

T3 #14 — cross-tenant search (api/routes_admin.py + frontend page)
  GET /admin/search?q=...&tenants=... runs the hybrid retriever
  across the Chroma stores of every tenant (or a chosen subset) and
  returns a ranked flat list with tenant tags. Super-admin only. Verified live:
  q=quokka across all 7 tenants returns 29 hits primarily from
  organization + company vaults.

T3 #15 — audit log viewer (db/audit_log_repository + /admin/audit-log + frontend)
  audit_log_repository gains list_logs(limit, offset, username, action)
  and count_logs(). GET /admin/audit-log returns paginated events.
  New pages/AuditLog.jsx renders a filterable table.

T3 #16 — prompt-cache verification (ai/models/anthropic_model.py)
  AnthropicModel.generate now logs
    anthropic usage: input=N cache_create=N cache_read=N output=N
  on every response via a dedicated smartbaseai.anthropic logger,
  tagged with the request id from the middleware. Lets operators
  verify cache_control is actually being hit across turns.

T3 #21 — metadata-aware chunking (new ai/chunking.py)
  Two-tier chunker:
    1. Split markdown on ## / ### headings (preserves 10-K sections
       like Risk Factors, MD&A, Balance Sheet in their own chunks)
    2. Within each section, pack paragraphs with a MAX_CHUNK_CHARS=2000
       hard cap; oversize paragraphs get whitespace-aligned hard-split.
  Each chunk is prefixed with **Section Title** so semantic similarity
  can match on the section name. chunk_with_sections() returns rich
  dicts with section metadata for callers that want it.
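
The two tiers can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the contents of ai/chunking.py: the function names are hypothetical, oversize paragraphs are split on spaces only, and the cap applies to the chunk body without counting the title prefix.

```python
import re

MAX_CHUNK_CHARS = 2000
_HEADING_RE = re.compile(r"^#{2,3}[ \t]+(.*)$", re.MULTILINE)


def split_sections(markdown):
    """Return (title, body) pairs; text before the first heading has an empty title."""
    sections = []
    title, start = "", 0
    for match in _HEADING_RE.finditer(markdown):
        body = markdown[start:match.start()].strip()
        if body:
            sections.append((title, body))
        title, start = match.group(1).strip(), match.end()
    body = markdown[start:].strip()
    if body:
        sections.append((title, body))
    return sections


def hard_split(paragraph, limit):
    """Split an oversize paragraph, cutting at a space where one is available."""
    pieces = []
    while len(paragraph) > limit:
        cut = paragraph.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no space found: cut mid-word
        pieces.append(paragraph[:cut].rstrip())
        paragraph = paragraph[cut:].lstrip()
    if paragraph:
        pieces.append(paragraph)
    return pieces


def chunk_markdown(markdown, limit=MAX_CHUNK_CHARS):
    """Pack paragraphs into chunks per section, prefixing each with its title."""
    chunks = []
    for title, body in split_sections(markdown):
        prefix = f"**{title}**\n\n" if title else ""
        current = ""
        for para in re.split(r"\n\s*\n", body):
            for piece in hard_split(para.strip(), limit):
                candidate = f"{current}\n\n{piece}" if current else piece
                if len(candidate) > limit and current:
                    chunks.append(prefix + current)
                    current = piece
                else:
                    current = candidate
        if current:
            chunks.append(prefix + current)
    return chunks
```

Because a chunk never spans two sections, a query that mentions a section by name can match the bolded prefix of every chunk from that section.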

T4 #17 — structured logging + request IDs (new api/logging_config.py)
  New RequestIdMiddleware assigns uuid4 per request (or honors an
  incoming X-Request-ID header), propagates via contextvars, and
  returns the id in the response header. Log format:
    HH:MM:SS INFO rid=abc123456789 smartbaseai.http: POST /chat/trace -> 200 (142.1ms)
  Installed via configure_logging() in api/app.py.
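
A standard-library sketch of the contextvars propagation described above, leaving out the ASGI middleware itself. Truncating the uuid4 hex to 12 characters is an assumption inferred from the sample log line, and `assign_request_id` and `RequestIdFilter` are hypothetical names.

```python
import contextvars
import logging
import uuid

# Holds the id of the request currently being handled in this context.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current request id to every log record as record.rid."""

    def filter(self, record):
        record.rid = request_id_var.get()
        return True


def assign_request_id(incoming=None):
    """Honor an incoming X-Request-ID value, else generate a short uuid4-based id."""
    rid = incoming or uuid.uuid4().hex[:12]
    request_id_var.set(rid)
    return rid


def configure_logging():
    """Install a handler whose format includes rid=<request id>."""
    handler = logging.StreamHandler()
    handler.addFilter(RequestIdFilter())
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s rid=%(rid)s %(name)s: %(message)s",
        datefmt="%H:%M:%S",
    ))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.INFO)
```

A middleware would call `assign_request_id(request.headers.get("X-Request-ID"))` at the start of each request and copy the returned id into the response header.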

T4 #18 — rate limiting + cost tracking (new db/usage_repository.py)
  llm_usage table records (tenant_id, username, provider, model_name,
  input_tokens, output_tokens, cache_read_tokens, cache_create_tokens,
  latency_ms, created_at) per request. chat_message + chat_message_stream
  both record on every call with a 4-chars-per-token heuristic for
  providers that don't expose usage.

  Per-tenant daily cap: set "daily_token_cap" on the tenant config to
  enforce. Chat endpoint rejects with 429 when hit:
    "Daily token cap reached for tenant 'X' (N/CAP)"
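
The fallback estimate and the cap check could be sketched like this. Rounding up and the function names are assumptions; the error string follows the format quoted above.

```python
def estimate_tokens(text):
    """Rough token count at 4 characters per token, rounded up (assumed rounding)."""
    return (len(text) + 3) // 4


def check_daily_cap(tenant_id, used_today, cap):
    """Return None when the request may proceed, else the detail for a 429 response."""
    if cap is None or used_today < cap:
        return None
    return f"Daily token cap reached for tenant '{tenant_id}' ({used_today}/{cap})"
```

Tenants without a `daily_token_cap` setting are treated as uncapped in this sketch.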

  GET /admin/usage returns a per-day x per-tenant x per-provider
  rollup with estimated cost using blended prices (Anthropic/OpenAI
  public rates). Ollama is free/local.

  New pages/UsageDashboard.jsx renders the rollup with totals across
  requests / tokens / estimated cost.

T4 #19 — session timeout interceptor (frontend/src/api/api.js)
  axios response interceptor catches 401 / "Token expired" / "Invalid
  token" and:
    - clears localStorage (access_token, role, tenant_id, active_tenant, username)
    - stashes the current path in sessionStorage.post_login_redirect
    - window.location.assign('/login')
  Login page reads post_login_redirect on success and bounces back.

# Frontend

T3 #10 — markdown preview in Vault editor (pages/Vault.jsx)
  New edit / split / preview toggle above the textarea. split mode
  shows raw markdown on the left and rendered output on the right.
  Uses react-markdown (new dep).

T3 #11 — export trace (pages/RagVisualizer.jsx)
  Three new buttons above the pipeline diagram:
    - 📋 Copy JSON — writes full trace to clipboard
    - ⬇ .md — downloads a formatted markdown report (query, DB
      lookup, hybrid retrieval ranking, fusion block, LLM reply)
    - ⬇ .json — downloads the raw trace JSON
  These sit alongside the existing "💾 Save trace" button.

T3 #12 — bulk upload in Vault (pages/Vault.jsx)
  File input becomes <input multiple>. upload() iterates every
  selected file, catches per-file errors, and reports
  "Uploaded N/M files · K ingested." when more than one file is selected.

T3 #13 — error boundary (components/ErrorBoundary.jsx + App.jsx)
  React class component wraps <AppRouter/>. Catches render errors
  with a recoverable fallback card (try again / reload app) instead
  of blanking the page.

# Login / app plumbing

  Login.jsx now also stashes username in localStorage and honors the
  post_login_redirect bounce. components/Layout.jsx gets new sidebar
  links for super_admin (Cross-tenant Search / Usage / Audit Log /
  Settings).

# Tests

  24/24 pytest green. Warning count dropped from 83 to 12 due to the
  datetime.utcnow() cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>