
pgwire SELECT on TYPE DOCUMENT STRICT silently returns null/empty columns via extended protocol (data loss) #44

@hollanf

Description


SELECT col1, col2 FROM <strict-doc-collection> WHERE … executed via the pgwire extended-query protocol (the default in most drivers) returns rows whose column values are all NULL, or empty row objects ([{}]), even though the underlying rows exist and the same query via the simple-query protocol returns the correct data.

This is a silent failure — the client receives rows (not an error), and the rows look well-formed (correct count), but every field is null. Dedup logic, update paths, cache lookups, and any read-modify-write pattern that uses a SELECT result on a STRICT doc collection quietly misbehave.

Symptom seen by client

Two production workaround comments from a downstream app, verbatim:

nodedb-memory-store.ts:507-511

NodeDB 0.0.4 quirk: tagged-template SELECT on TYPE DOCUMENT STRICT
collections returns empty row objects ([{}]). Use simple-query
protocol (sql.unsafe) to get the `result`-wrapped JSON shape that
unwrap() can parse correctly. Without this, entity dedup was
returning null on every lookup → new entity row per mention.

nodedb-memory-store.ts:754-756

NodeDB 0.0.4: tagged-template SELECT on TYPE DOCUMENT STRICT collections
silently returns rows with all columns null. Use sql.unsafe (simple-query)
+ esc() quoting. Same pattern as findEntityByExactName fix.

The "entity dedup was returning null on every lookup → new entity row per mention" line is the business-impact tell: any SELECT … WHERE canonical_name = $1 LIMIT 1 over a STRICT doc collection returned an empty-body row, the app treated it as "not found", and inserted a duplicate. A caller can't tell the difference between "row doesn't exist" and "row exists but all fields decoded to null".
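Until this is fixed server-side, callers can at least refuse to conflate the two cases: a row whose selected columns are all null/undefined is almost certainly this bug, not a miss. A minimal defensive sketch (the helper name and shape are hypothetical, not part of NodeDB or postgres.js):

```typescript
// Hypothetical guard: distinguish "no row" from "row with every field null".
// A STRICT-doc lookup that comes back as [{}] or all-null fields should fail
// loudly instead of being treated as a dedup/cache miss.
type Row = Record<string, unknown>;

function assertDecodedRow(rows: Row[], expectedCols: string[]): Row | null {
  if (rows.length === 0) return null; // genuinely not found
  const row = rows[0];
  const allNull = expectedCols.every(
    (c) => row[c] === null || row[c] === undefined
  );
  if (allNull) {
    // Every selected column decoded to null: surface it instead of
    // silently inserting a duplicate entity.
    throw new Error(
      `all ${expectedCols.length} selected columns decoded to null (NodeDB #44?)`
    );
  }
  return row;
}
```

This turns the silent data-loss path into a hard error at the lookup site, which is far easier to notice than a slowly duplicating entity table.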

Schema shape that triggers it (from the downstream app's schema setup):

CREATE COLLECTION entities TYPE DOCUMENT STRICT (
  id STRING PRIMARY KEY,
  tenant_id STRING NOT NULL,
  user_id STRING NOT NULL,
  canonical_name STRING NOT NULL,
  aliases STRING,
  embedding VECTOR(1024),
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);

Query that manifests (extended protocol — postgres.js tagged template):

const rows = await sql`
  SELECT id, canonical_name FROM entities
  WHERE tenant_id = ${tenantId}
    AND user_id = ${userId}
    AND canonical_name = ${name}
  LIMIT 1
`;
// rows = [{}] or [{ id: null, canonical_name: null }]

Same query via simple-query protocol (sql.unsafe(...)) returns the actual row contents wrapped in the { result: "..." } envelope — correct data.
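For reference, the unwrap step the downstream workaround relies on can be sketched roughly like this. The `{ result: "..." }` envelope shape is taken from this report; the function name and tolerance for already-plain rows are hypothetical:

```typescript
// Hypothetical unwrap: simple-query results are assumed to arrive as rows of
// the shape { result: "<json string>" }; parse that string back into objects.
function unwrapResult<T = unknown>(rows: Array<{ result?: string }>): T[] {
  return rows.map((r) => {
    if (typeof r.result !== 'string') {
      // Already a plain row (e.g. extended protocol, or a future fix).
      return r as unknown as T;
    }
    return JSON.parse(r.result) as T;
  });
}
```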

Scope

  • Only TYPE DOCUMENT STRICT collections are reported affected; regular CREATE COLLECTION (schemaless) works via either protocol, per the downstream app's split — it uses tagged template for schemaless tables without issue.
  • Affects any pgwire driver that defaults to extended protocol (postgres.js, psycopg, JDBC, pgx), unless the client explicitly opts out of PREPARE (prepare: false in postgres.js).
  • Affects the basic SELECT path — not DSL. This is distinct from issue pgwire extended-query protocol rejects DSL statements (SEARCH / GRAPH / UPSERT / MATCH) — forces simple-query + string interpolation #43 (DSL parse rejection) where the extended path errors out; here it succeeds but returns wrong data, which is worse.
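Per the scope above, the only client-side escape hatch in postgres.js is disabling automatic prepared statements globally. A sketch, assuming (as this report does) that `prepare: false` keeps queries off the affected extended path; the option is real postgres.js, the helper is illustrative:

```typescript
// Sketch: postgres.js connection options with automatic prepared statements
// disabled (`prepare: false` is a documented postgres.js option; this helper
// and its usage are illustrative, not part of the downstream app).
function withoutPreparedStatements<T extends object>(opts: T) {
  return { ...opts, prepare: false as const };
}

// Assumed usage:
//   import postgres from 'postgres';
//   const sql = postgres(process.env.DATABASE_URL!, withoutPreparedStatements({ max: 10 }));
```

The cost is global: every query on the connection loses prepared-statement reuse, which is why the report calls this a forced workaround rather than a fix.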

Likely root cause (for triage)

The extended path goes through prepared/execute.rs:46-49:

let mut results = self
    .execute_planned_sql_with_params(&identity, &stmt.sql, tenant_id, &addr, &params)
    .await?;

which routes to execute_planned_sql_inner in routing/mod.rs:87. The simple-query path instead goes through sql_exec.rs:286-307 and ends up in execute_planned_sql plus the DDL dispatcher at line 286. The row-encoding layer (envelope-wrapped JSON result vs. typed columns) differs between the two paths, and somewhere in the typed-column encoder for STRICT-doc scans the fields are not populated — or the result schema inferred at Parse time (prepared/parser.rs:63-76) doesn't match the actual columns decoded at Execute time.

Why this matters

  1. Silent. Silent. Silent. No error, no warning — the driver reports N rows and the application reads N null-valued records.
  2. Corrupts application state. The downstream app was creating a new entity row for every mention because findEntityByExactName returned null. Over days, entity count blows up with duplicates and the LLM's memory graph degrades.
  3. Forces prepare: false workaround globally (see also pgwire extended-query protocol rejects DSL statements (SEARCH / GRAPH / UPSERT / MATCH) — forces simple-query + string interpolation #43), which is also why this bug is not trivially side-stepped.

Repro

CREATE COLLECTION t TYPE DOCUMENT STRICT (
  id STRING PRIMARY KEY,
  name STRING
);
INSERT INTO t (id, name) VALUES ('a', 'alice');

// postgres.js with DEFAULT (extended protocol) — buggy
const rows = await sql`SELECT id, name FROM t WHERE name = ${'alice'} LIMIT 1`;
console.log(rows); // expected: [{ id: 'a', name: 'alice' }], actual: [{}] or [{ id: null, name: null }]

// Same query via simple-query — correct
const rows2 = await sql.unsafe("SELECT id, name FROM t WHERE name = 'alice' LIMIT 1");
console.log(rows2); // [{ id: 'a', name: 'alice' }] (possibly wrapped in { result: "..." })
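A regression test for this bug reduces to protocol parity: the same query over the extended and simple paths must return identical rows. The comparison can be kept pure so it is testable without a server; the DB calls below are commented-out assumptions and `rowsDiffer` is a hypothetical helper:

```typescript
// Hypothetical comparator for a protocol-parity regression test:
// normalize both result sets to JSON and report the first divergence.
function rowsDiffer(extended: unknown[], simple: unknown[]): string | null {
  if (extended.length !== simple.length) {
    return `row count: ${extended.length} vs ${simple.length}`;
  }
  for (let i = 0; i < extended.length; i++) {
    const a = JSON.stringify(extended[i]);
    const b = JSON.stringify(simple[i]);
    if (a !== b) return `row ${i}: ${a} vs ${b}`;
  }
  return null; // paths agree
}

// Assumed usage against a running NodeDB instance:
//   const ext = await sql`SELECT id, name FROM t WHERE name = ${'alice'} LIMIT 1`;
//   const simp = await sql.unsafe("SELECT id, name FROM t WHERE name = 'alice' LIMIT 1");
//   const diff = rowsDiffer(ext, simp);
//   if (diff) throw new Error(`extended/simple mismatch: ${diff}`);
```

On current 0.0.4 behavior this check would report a divergence like `row 0: {} vs {"id":"a","name":"alice"}` for the repro above.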
