Skip to content

[Bug] SELECT * FROM view ORDER BY ... LIMIT returns NULL for non-key/non-orderby columns (TopN lazy materialization) #62598

@turbo-one-off

Description

@turbo-one-off

Search before asking

  • I had searched in the issues and found no similar issues.

Version

doris-4.0.5-rc01-59de8c4c524 (also reproducible on 4.0.4-rc01-d74dc1fb93)

What's Wrong?

SELECT * FROM <regular_view> ORDER BY ... LIMIT N returns NULL for
columns that are NOT referenced in WHERE or ORDER BY, while the same
query on the underlying physical table works correctly.

Reproduces with the smallest possible case: a single UNIQUE KEY table
with 3 rows behind a trivial view. No joins, no complex WHERE, no
large data volume.

This appears to be a regression in the TopN Lazy Materialization code
path (introduced in 4.0): the first phase reads ORDER BY columns and
row identifiers correctly, but the second-phase fetch of the remaining
columns fails when the query goes through a view, silently returning
NULL.

This is a silent wrong-result bug (no error raised), which is
especially dangerous for data warehousing use cases where views are
used as abstraction layers — callers may not realize the NULLs are
incorrect.

What You Expected?

Query through a view should return the same data as the query on the
underlying physical table.

SELECT * FROM v ORDER BY id DESC LIMIT 3
= SELECT * FROM t ORDER BY id DESC LIMIT 3 (modulo any extra view columns)

How to Reproduce?

Setup

CREATE DATABASE bug_repro;
USE bug_repro;

CREATE TABLE t (
    id BIGINT NOT NULL,
    name VARCHAR(50),
    email VARCHAR(100),
    age INT
) UNIQUE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES (
    "replication_num" = "1",
    "enable_unique_key_merge_on_write" = "true"
);

INSERT INTO t VALUES
    (1, 'Alice', 'alice@x.com', 30),
    (2, 'Bob',   'bob@x.com',   25),
    (3, 'Carol', 'carol@x.com', 35);

CREATE VIEW v AS
    SELECT id, name, email, age, CAST('a' AS VARCHAR(32)) AS tag FROM t;

Baseline — querying the physical table (correct)

SELECT * FROM t ORDER BY id DESC LIMIT 3;

Output:

+----+-------+-------------+------+
| id | name  | email       | age  |
+----+-------+-------------+------+
|  3 | Carol | carol@x.com |   35 |
|  2 | Bob   | bob@x.com   |   25 |
|  1 | Alice | alice@x.com |   30 |
+----+-------+-------------+------+

Bug — same query through the view

SELECT * FROM v ORDER BY id DESC LIMIT 3;

Actual output:

+----+------+-------+------+------+
| id | name | email | age  | tag  |
+----+------+-------+------+------+
|  3 | NULL | NULL  | NULL | a    |
|  2 | NULL | NULL  | NULL | a    |
|  1 | NULL | NULL  | NULL | a    |
+----+------+-------+------+------+
  • id (ORDER BY column) is populated correctly.
  • tag (a CAST constant) is populated correctly.
  • name, email, age (columns not in WHERE/ORDER BY) are all NULL.

What works (no bug)

SELECT COUNT(*) FROM v;                              -- returns 3
SELECT * FROM v WHERE id = 1;                        -- complete data
SELECT * FROM t ORDER BY id DESC LIMIT 3;            -- complete data (physical table)

Session variables attempted — none of them fix the bug

SET enable_two_phase_read_opt = false;
SET experimental_topn_lazy_materialization_threshold = 0;
SET enable_partition_topn = false;
SET push_topn_to_agg = false;
SET topn_opt_limit_threshold = 0;

-- Still returns NULL for name/email/age:
SELECT * FROM v ORDER BY id DESC LIMIT 3;

Subquery and CTE wrapping — also do not fix the bug

-- Still returns NULL:
SELECT * FROM (SELECT * FROM v) x ORDER BY id DESC LIMIT 3;

-- Still returns NULL:
WITH x AS (SELECT * FROM v)
SELECT * FROM x ORDER BY id DESC LIMIT 3;

Workaround — replacing the view with an async materialized view returns correct data

DROP VIEW v;

CREATE MATERIALIZED VIEW v
BUILD IMMEDIATE
REFRESH AUTO ON SCHEDULE EVERY 1 MINUTE
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1")
AS SELECT id, name, email, age, CAST('a' AS VARCHAR(32)) AS tag FROM t;

-- Returns correct data:
SELECT * FROM v ORDER BY id DESC LIMIT 3;

Anything Else?

Possibly related prior work:

Workaround: replacing the regular view with an async materialized view
returns correct data, but this has performance implications
(full-refresh cost) for large base tables and is not a viable
general-purpose solution.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions