Skip to content

Scalar functions in WHERE silently return MATCH ALL instead of raising error #22

@hollanf

Description

@hollanf

Summary

When a scalar function (e.g., LOWER, UPPER, SUBSTRING, LENGTH, arithmetic on columns) appears on the left side of a WHERE comparison, the predicate is silently dropped — the query returns empty rows (or all rows ignoring that predicate) with no error.

Because there is no error, applications receive wrong results without any signal. We only caught this when we noticed duplicate entities in our collection despite a dedup query that looked correct.

Repro

CREATE COLLECTION entities TYPE DOCUMENT STRICT (
  id STRING PRIMARY KEY,
  canonical_name STRING NOT NULL
);

INSERT INTO entities (id, canonical_name) VALUES ('ent_1', 'JavaScript');

-- Works: direct comparison
SELECT id FROM entities WHERE canonical_name = 'JavaScript';
-- → 1 row

-- Broken: LOWER() in WHERE returns empty instead of 1 row, with no error
SELECT id FROM entities WHERE LOWER(canonical_name) = 'javascript';
-- → 0 rows (expected: 1 row, or at minimum an error)

LOWER() itself is implemented and evaluates correctly in SELECT:

SELECT LOWER(canonical_name) FROM entities;
-- → "javascript"

So the function exists — only the WHERE-clause planner path is affected.

Source pointer

From reading the code, the issue appears to be in nodedb/src/control/planner/sql_plan_convert/filter.rs around sql_expr_to_scan_filters() (BinaryOp handler). The SqlExpr::Function branch seems to treat any function as an aggregate (for HAVING), which produces a field name that doesn't match any column, and the default fall-through returns match_all(). No error is raised.

Why this matters

  • Silent failure is the worst outcome — crashes or error messages are recoverable; silent empty results mask data bugs for days or months.
  • Users coming from Postgres/MySQL/SQLite expect LOWER(col) = 'x' to work — it's standard SQL. Hitting an empty result set with no error is very surprising.
  • Scope is broad — not just LOWER. Any scalar function (UPPER, SUBSTRING, LENGTH, DATE_TRUNC, arithmetic like col + 1) in WHERE would hit the same path.

Workaround we're using

For case-insensitive string matching, text_match(col, 'value') with the analyzer pipeline (lowercase + stemming) is the NodeDB-idiomatic approach and works correctly. For our entity dedup path we've also normalized canonical_name to lowercase at write time. These are fine once you know — but the silent fail was what burned us.

Environment

  • NodeDB v0.0.1 (built from main on Apr 2026, also reproducible on farhansyah/nodedb Docker image v0.0.0)
  • Linux ARM64 (Ubuntu 24.04) inside OrbStack VM
  • HTTP /query endpoint (same behavior via pgwire)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions