Implement chatbot core modules #6
Merged
danrixd merged 1 commit on Aug 1, 2025
danrixd added a commit that referenced this pull request on Apr 14, 2026
Big coordinated batch covering Tier 2 (functional gaps), Tier 3 (polish), and Tier 4 (production readiness) from the earlier what's-missing audit. Only #20 (KMS/Vault secrets manager) is intentionally skipped as overkill for a single-operator dev box.

# Backend

T2 #5: generalize exact_lookup (db/query_engine.py)
Now dispatches by tenant and table. financebench queries with a ticker hint (sniffed from the user message) and a YYYY-MM-DD date land on daily_bars(ticker, date, open, high, low, close, volume) in financebench.db and return the exact row. Verified live: "AAPL closing price on 2024-06-14" -> {close: 212.49, volume: 70.1M}. Legacy market_data path still works for company/organization vaults.

ResponseGenerator._lookup_db now tries the intraday pattern first, then falls back to date-only; passes message= through so the ticker detector can see it. Shared _format_row helper handles both row shapes.

T2 #6: FinanceBench eval harness (scripts/eval_financebench.py)
Runs each of the 150 open-source ground-truth questions through /chat/message and scores the reply with three strategies:
- numeric match (2% relative tolerance, handles $/bn/M/%)
- substring match on the canonical answer
- optional Claude-as-judge via --judge flag
Writes a markdown report at docs/financebench_eval.md with headline accuracy, per-question-type breakdown, latency stats, sample failures.

Usage:
  python scripts/eval_financebench.py --model-provider anthropic --model-name claude-opus-4-6 --limit 50 --judge

T2 #7: chunked re-ingest on vault edit (api/routes_files.py)
_ingest_file_into_tenant_store now uses the shared ai/chunking module, deletes every existing {tenant}:{filename}#* vector under the file's prefix before inserting fresh chunks, and honors section-aware chunking. The vault PUT endpoint delegates to the same helper so edits match the loader's layout exactly.
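The numeric-match strategy listed under T2 #6 could look roughly like the sketch below. It is not the harness's actual code: the function names, the regex, and the set of recognised scale suffixes are assumptions made for illustration. Only the 2% relative tolerance and the $/bn/M/% handling come from the commit message.

```python
import re

# Hypothetical scale suffixes; the real harness may recognise a different set.
_SCALES = {"bn": 1e9, "b": 1e9, "m": 1e6, "k": 1e3}

# A number with optional "$" prefix and optional bn/b/m/k/% suffix.
# The lookahead stops a suffix from matching the start of a word ("3 books").
_NUMBER = re.compile(
    r"\$?\s*(-?\d[\d,]*(?:\.\d+)?)(?:\s*(bn|b|m|k|%)(?![A-Za-z]))?",
    re.IGNORECASE,
)


def extract_numbers(text: str) -> list[float]:
    """Return every number found in text, with scale suffixes applied."""
    values = []
    for digits, suffix in _NUMBER.findall(text):
        value = float(digits.replace(",", ""))
        # A "%" suffix (or no suffix) leaves the value unchanged.
        value *= _SCALES.get(suffix.lower(), 1.0)
        values.append(value)
    return values


def numeric_match(reply: str, expected: str, rel_tol: float = 0.02) -> bool:
    """True if any number in reply is within rel_tol of the first number in expected."""
    targets = extract_numbers(expected)
    if not targets:
        return False
    target = targets[0]
    for candidate in extract_numbers(reply):
        if abs(candidate - target) <= rel_tol * abs(target):
            return True
    return False
```

Under these assumptions, numeric_match("Volume was 70.1M", "70,100,000") succeeds because both sides normalise to about 70.1 million, while a reply of "12%" against an expected "15%" fails the 2% tolerance.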

T2 #8: streaming responses (api/routes_chat.py)
New POST /chat/message/stream endpoint returning SSE events:
  event: delta  data: {"text": "..."}  (60-char chunks)
  event: done   data: {"latency_ms": N}
  event: error  data: {"detail": "..."}
Records usage + writes audit log + updates conversation history just like the non-streaming variant.

T2 #9: cleared all datetime.utcnow() deprecation warnings
Updated 8 callsites across api/routes_auth, db/{audit_log, conversation, file, rag_trace, settings, user}_repository, and ingestion/metadata_generator to use datetime.now(timezone.utc). Warning count dropped 83 -> 12 on pytest runs.

T3 #14: cross-tenant search (api/routes_admin.py + frontend page)
GET /admin/search?q=...&tenants=... runs the hybrid retriever across every (or a subset of) tenant's Chroma store and returns a ranked flat list with tenant tags. Super-admin only. Verified live: q=quokka across all 7 tenants returns 29 hits primarily from organization + company vaults.

T3 #15: audit log viewer (db/audit_log_repository + /admin/audit-log + frontend)
audit_log_repository gains list_logs(limit, offset, username, action) and count_logs(). GET /admin/audit-log returns paginated events. New pages/AuditLog.jsx renders a filterable table.

T3 #16: prompt-cache verification (ai/models/anthropic_model.py)
AnthropicModel.generate now logs
  anthropic usage: input=N cache_create=N cache_read=N output=N
on every response via a dedicated smartbaseai.anthropic logger, tagged with the request id from the middleware. Lets operators verify cache_control is actually being hit across turns.

T3 #21: metadata-aware chunking (new ai/chunking.py)
Two-tier chunker:
1. Split markdown on ## / ### headings (preserves 10-K sections like Risk Factors, MD&A, Balance Sheet in their own chunks)
2. Within each section, pack paragraphs with a MAX_CHUNK_CHARS=2000 hard cap; oversize paragraphs get whitespace-aligned hard-split.
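
The two tiers above can be sketched as follows. This is an illustrative reconstruction, not the contents of ai/chunking.py: only MAX_CHUNK_CHARS=2000 and the heading levels come from the commit message. The helper names split_sections, hard_split, pack_paragraphs, and chunk_markdown are hypothetical.

```python
import re

MAX_CHUNK_CHARS = 2000  # hard cap named in the commit message

# Matches "## Title" and "### Title" lines only (not "#" or "####").
_HEADING = re.compile(r"^#{2,3}[ \t]+(.*)$", re.MULTILINE)


def split_sections(markdown: str) -> list[tuple[str, str]]:
    """Tier 1: split on ## / ### headings into (title, body) pairs."""
    sections = []
    title, start = "", 0
    for match in _HEADING.finditer(markdown):
        body = markdown[start:match.start()].strip()
        if body:
            sections.append((title, body))
        title, start = match.group(1).strip(), match.end()
    body = markdown[start:].strip()
    if body:
        sections.append((title, body))
    return sections


def hard_split(text: str, limit: int) -> list[str]:
    """Split an oversize paragraph at the last whitespace at or before the limit."""
    pieces = []
    while len(text) > limit:
        cut = max(text.rfind(ws, 0, limit + 1) for ws in (" ", "\n", "\t"))
        if cut <= 0:
            cut = limit  # no whitespace available, so cut mid-word
        pieces.append(text[:cut].rstrip())
        text = text[cut:].lstrip()
    if text:
        pieces.append(text)
    return pieces


def pack_paragraphs(body: str, limit: int = MAX_CHUNK_CHARS) -> list[str]:
    """Tier 2: greedily pack paragraphs into chunks of at most limit chars."""
    chunks, current = [], ""
    for para in re.split(r"\n\s*\n", body):
        para = para.strip()
        if not para:
            continue
        for piece in hard_split(para, limit):
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= limit:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


def chunk_markdown(markdown: str) -> list[tuple[str, str]]:
    """Run both tiers and return (section_title, chunk_text) pairs."""
    return [
        (title, chunk)
        for title, body in split_sections(markdown)
        for chunk in pack_paragraphs(body)
    ]
```

Text before the first heading is kept under an empty title, and a heading with no body produces no chunk. Both behaviours are choices of this sketch, not confirmed details of the real module.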
Each chunk is prefixed with **Section Title** so semantic similarity can match on the section name. chunk_with_sections() returns rich dicts with section metadata for callers that want it.

T4 #17: structured logging + request IDs (new api/logging_config.py)
New RequestIdMiddleware assigns uuid4 per request (or honors an incoming X-Request-ID header), propagates via contextvars, and returns the id in the response header. Log format:
  HH:MM:SS INFO rid=abc123456789 smartbaseai.http: POST /chat/trace -> 200 (142.1ms)
Installed via configure_logging() in api/app.py.

T4 #18: rate limiting + cost tracking (new db/usage_repository.py)
llm_usage table records (tenant_id, username, provider, model_name, input_tokens, output_tokens, cache_read_tokens, cache_create_tokens, latency_ms, created_at) per request. chat_message + chat_message_stream both record on every call with a 4-chars-per-token heuristic for providers that don't expose usage.

Per-tenant daily cap: set "daily_token_cap" on the tenant config to enforce. Chat endpoint rejects with 429 when hit:
  "Daily token cap reached for tenant 'X' (N/CAP)"

GET /admin/usage returns a per-day x per-tenant x per-provider rollup with estimated cost using blended prices (Anthropic/OpenAI public rates). Ollama is free/local.

New pages/UsageDashboard.jsx renders the rollup with totals across requests / tokens / estimated cost.

T4 #19: session timeout interceptor (frontend/src/api/api.js)
axios response interceptor catches 401 / "Token expired" / "Invalid token" and:
- clears localStorage (access_token, role, tenant_id, active_tenant, username)
- stashes the current path in sessionStorage.post_login_redirect
- window.location.assign('/login')
Login page reads post_login_redirect on success and bounces back.

# Frontend

T3 #10: markdown preview in Vault editor (pages/Vault.jsx)
New edit / split / preview toggle above the textarea. split mode shows raw markdown on the left and rendered output on the right.
Uses react-markdown (new dep).

T3 #11: export trace (pages/RagVisualizer.jsx)
Three new buttons above the pipeline diagram:
- 📋 Copy JSON: writes full trace to clipboard
- ⬇ .md: downloads a formatted markdown report (query, DB lookup, hybrid retrieval ranking, fusion block, LLM reply)
- ⬇ .json: downloads the raw trace JSON
Live alongside the existing "💾 Save trace" button.

T3 #12: bulk upload in Vault (pages/Vault.jsx)
File input becomes <input multiple>. upload() iterates every selected file, catches per-file errors, reports "Uploaded N/M files · K ingested." when multi.

T3 #13: error boundary (components/ErrorBoundary.jsx + App.jsx)
React class component wraps <AppRouter/>. Catches render errors with a recoverable fallback card (try again / reload app) instead of blanking the page.

# Login / app plumbing

Login.jsx now also stashes username in localStorage and honors the post_login_redirect bounce. components/Layout.jsx gets new sidebar links for super_admin (Cross-tenant Search / Usage / Audit Log / Settings).

# Tests

24/24 pytest green. Warning count dropped from 83 to 12 due to the datetime.utcnow() cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
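
For reference, the token heuristic and daily cap described under T4 #18 might be wired together as in the minimal sketch below. The names estimate_tokens, DailyCapExceeded, and check_daily_cap are hypothetical, and the running total is assumed to be supplied by the caller (the commit message says usage is stored in the llm_usage table but does not describe the query). The 4-chars-per-token ratio, the "daily_token_cap" config key, the 429 status, and the error text come from the commit message.

```python
CHARS_PER_TOKEN = 4  # heuristic for providers that do not report usage


def estimate_tokens(text: str) -> int:
    """Rough token count: about one token per four characters."""
    if not text:
        return 0
    return max(1, len(text) // CHARS_PER_TOKEN)


class DailyCapExceeded(Exception):
    """Raised when a tenant has used up its daily token allowance."""

    status_code = 429  # the chat endpoint is described as rejecting with 429


def check_daily_cap(tenant_id: str, used_today: int, tenant_config: dict) -> None:
    """Raise DailyCapExceeded if the tenant has a cap and has reached it."""
    cap = tenant_config.get("daily_token_cap")
    if cap is None:
        return  # no cap configured for this tenant
    if used_today >= cap:
        raise DailyCapExceeded(
            f"Daily token cap reached for tenant '{tenant_id}' ({used_today}/{cap})"
        )
```

A request handler would call check_daily_cap before invoking the model and translate the exception into an HTTP 429 response. Whether the real endpoint checks before or after the call is not stated in the commit message.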
Summary
Testing
- python -m pytest -q
- python -m py_compile $(git ls-files '*.py')

https://chatgpt.com/codex/tasks/task_e_688c58f66f94832a9cbe23fd2d534d7d