-
Notifications
You must be signed in to change notification settings - Fork 0
Description
Code Audit — Senior Engineering Review
This issue documents findings from a systematic audit of the codebase covering security, correctness, performance, and code quality. Each issue has been validated against the actual source before being included. Issues are ordered by severity within each category.
Security
1. Process-wide TLS disabling is racy with concurrent requests (Critical)
Files: src/core/url-fetcher.ts:164–176, src/connectors/http-utils.ts:25–69
Both files temporarily set process.env["NODE_TLS_REJECT_UNAUTHORIZED"] = "0" globally to support self-signed certs, then restore the previous value in a finally block. Because Node's event loop is single-threaded but I/O is concurrent, two simultaneous HTTP requests where one needs self-signed cert support and one doesn't can race: the restoration in request A's finally can run while request B's TLS handshake is still in progress with "0" still set. The env var is process-global and undici (Node's native fetch) reads it at connection time, not at call time.
Fix: Pass a per-request https.Agent with rejectUnauthorized: false instead of mutating process-wide state. Both call sites already wrap the fetch logic so this is a contained change.
2. API key comparison is vulnerable to timing attacks (High)
File: src/api/middleware.ts:115
if (token !== apiKey) { // string equality — not constant timeJavaScript's !== short-circuits on the first differing character, leaking information about the key through response timing. An attacker making many requests can statistically determine the correct key one character at a time.
Fix:
import { timingSafeEqual } from "node:crypto";
if (token.length !== apiKey.length || !timingSafeEqual(Buffer.from(token), Buffer.from(apiKey))) {Bugs
3. Chunks can be committed to the DB without embeddings, silently (High)
File: src/core/indexing.ts:324–335
Inside the indexing transaction, insertEmbedding failures that are not "no such table" are caught, logged as warn, and then swallowed. The chunk row was already written by insertChunk.run() on the line above. The transaction still commits. The document now has chunks that will never appear in semantic search results, with no flag in the schema to distinguish them from properly-indexed chunks. The issue is especially hard to detect because listDocuments and getDocument return the document normally.
Fix: Track whether any embeddings failed and either (a) abort the transaction and surface the error to the caller, or (b) add an embedding_status column to chunks (pending | ok | failed) so gaps are discoverable and retriable via reindex.
4. minRating filter uses a correlated subquery instead of the pre-joined aggregate (High)
File: src/core/search.ts:433–435, 555–557
The FTS and hybrid search variants both build a LEFT JOIN subquery to compute avg_rating (lines ~413–417 and ~535–539) and select it as avg_r.avg_rating. However, when minRating is set, the filter appended to the WHERE clause is:
AND (SELECT AVG(r.rating) FROM ratings r WHERE r.document_id = d.id) >= ?This is a correlated subquery evaluated once per row, re-aggregating the same data already aggregated by the JOIN. The correct reference is already in scope:
AND avg_r.avg_rating >= ?The first search variant (vector search, lines ~196–229) does it correctly. The other two do not.
Performance
5. N+1 queries in resolveSelector tag filtering (High)
File: src/core/bulk.ts:73–80
ids = ids.filter((id) => {
const docTags = getDocumentTags(db, id).map((t) => t.name); // one DB query per document
return requiredTags.every((rt) => docTags.includes(rt));
});getDocumentTags() executes a prepared statement each time. With MAX_BATCH_SIZE = 1000, a bulk operation with a tag filter issues up to 1000 sequential DB queries. There is also a secondary O(n²) pattern on lines 60–70 where docs.find() is called inside a filter() loop (O(n) per element) instead of using a pre-built Map.
Fix (tag N+1): Add a batch variant — e.g., getTagsByDocumentIds(db, ids) returning Map<string, string[]> — and use it here.
Fix (docs.find O(n²)):
const docMap = new Map(docs.map((d) => [d.id, d]));
ids = ids.filter((id) => {
const doc = docMap.get(id);
return doc != null && doc.createdAt >= from;
});6. loadConfig() called per-request with no caching (Medium)
File: src/api/routes.ts:174, 295–296, 323, 458
loadConfig() is called six times per relevant request, parsing the config file from disk each time. For high-throughput usage this adds unnecessary I/O on the hot path.
Fix: Cache the result with a short TTL (e.g., 30s) or invalidate on SIGHUP. A simple module-level cached value with a lastLoaded timestamp would suffice.
Code Quality & Architecture
7. SSE stream does not handle backpressure or client disconnect (Medium)
File: src/api/routes.ts:342–344
for await (const event of stream) {
res.write(`data: ${JSON.stringify(event)}\n\n`); // return value ignored
}res.write() returns false when the write buffer is full or the socket is closed. Ignoring it means the loop continues generating and serializing events into a dead socket until the upstream LLM stream finishes. This wastes embedding compute and holds open connections.
Fix:
for await (const event of stream) {
const ok = res.write(`data: ${JSON.stringify(event)}\n\n`);
if (!ok) break; // client disconnected or buffer full
}8. Vector table DDL is duplicated between schema.ts and reindex.ts (Medium)
Files: src/db/schema.ts:305–314 (via createVectorTable()), src/core/reindex.ts:64–69
reindex.ts contains an inline CREATE VIRTUAL TABLE IF NOT EXISTS chunk_embeddings USING vec0(...) that duplicates the logic in createVectorTable(). If the two ever diverge (e.g., a dimension change), reindex will silently succeed against a table with the wrong dimensions.
Fix: Replace the inline DDL in reindex.ts with a call to createVectorTable(db, provider.dimensions).
9. Hand-rolled YAML parser in Obsidian connector is fragile (Medium)
File: src/connectors/obsidian.ts:187–230
parseSimpleYaml() is a regex-based custom implementation that handles basic key: value and simple lists but fails silently on common real-world Obsidian frontmatter patterns: quoted strings containing colons (title: "foo: bar"), inline tags (tags: [one, two] with spaces after brackets), multi-line string blocks, nested keys, and boolean/numeric coercion. Malformed or complex frontmatter is silently ignored rather than raising a parse error.
Fix: Replace with js-yaml (MIT, zero dependencies) which is the standard YAML 1.2 parser used across the JS ecosystem. Obsidian itself serializes frontmatter as YAML 1.1/1.2 compatible output.
Implementation Plan
The following groups work that can proceed in parallel across separate PRs.
Phase 1 — Security (do first, no dependency on other phases)
- PR-A: Fix process-wide TLS disabling — refactor
url-fetcher.tsandhttp-utils.tsto use per-requesthttps.Agent(Bug: createVectorTable SQL injection via string interpolation #7, above issue 1) - PR-B: Fix API key timing attack — replace
!==withtimingSafeEqualinmiddleware.ts(issue 2)
Phase 2 — Bug fixes
- PR-C: Silent embedding gaps — add
embedding_statustochunksschema, surface failures in indexing transaction (issue 3) - PR-D: Fix correlated
minRatingsubquery in FTS and hybrid search paths (issue 4)
Phase 3 — Performance
- PR-E: Batch
getDocumentTagsinresolveSelector; fix O(n²)docs.findwith Map (issue 5) - PR-F: Cache
loadConfig()with TTL (issue 6)
Phase 4 — Code quality (can be batched into one PR)
- SSE backpressure check (issue 7)
- Consolidate vector table DDL to
createVectorTable()(issue 8) - Replace
parseSimpleYamlwithjs-yaml(issue 9)
Audit performed 2026-03-04. All findings verified against source before inclusion.