Search before asking
Version
doris-4.0.5-rc01-59de8c4c524 (also reproducible on 4.0.4-rc01-d74dc1fb93)
What's Wrong?
SELECT * FROM <regular_view> ORDER BY ... LIMIT N returns NULL for
columns that are NOT referenced in WHERE or ORDER BY, while the same
query on the underlying physical table works correctly.
Reproduces with the smallest possible case: a single UNIQUE KEY table
with 3 rows behind a trivial view. No joins, no complex WHERE, no
large data volume.
This appears to be a regression in the TopN Lazy Materialization code
path (introduced in 4.0): the first phase reads ORDER BY columns and
row identifiers correctly, but the second-phase fetch of the remaining
columns fails when the query goes through a view, silently returning
NULL.
This is a silent wrong-result bug (no error raised), which is
especially dangerous for data warehousing use cases where views are
used as abstraction layers — callers may not realize the NULLs are
incorrect.
What You Expected?
Query through a view should return the same data as the query on the
underlying physical table.
SELECT * FROM v ORDER BY id DESC LIMIT 3
= SELECT * FROM t ORDER BY id DESC LIMIT 3 (modulo any extra view columns)
How to Reproduce?
Setup
CREATE DATABASE bug_repro;
USE bug_repro;
CREATE TABLE t (
id BIGINT NOT NULL,
name VARCHAR(50),
email VARCHAR(100),
age INT
) UNIQUE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES (
"replication_num" = "1",
"enable_unique_key_merge_on_write" = "true"
);
INSERT INTO t VALUES
(1, 'Alice', 'alice@x.com', 30),
(2, 'Bob', 'bob@x.com', 25),
(3, 'Carol', 'carol@x.com', 35);
CREATE VIEW v AS
SELECT id, name, email, age, CAST('a' AS VARCHAR(32)) AS tag FROM t;
Baseline — querying the physical table (correct)
SELECT * FROM t ORDER BY id DESC LIMIT 3;
Output:
+----+-------+-------------+------+
| id | name | email | age |
+----+-------+-------------+------+
| 3 | Carol | carol@x.com | 35 |
| 2 | Bob | bob@x.com | 25 |
| 1 | Alice | alice@x.com | 30 |
+----+-------+-------------+------+
Bug — same query through the view
SELECT * FROM v ORDER BY id DESC LIMIT 3;
Actual output:
+----+------+-------+------+------+
| id | name | email | age | tag |
+----+------+-------+------+------+
| 3 | NULL | NULL | NULL | a |
| 2 | NULL | NULL | NULL | a |
| 1 | NULL | NULL | NULL | a |
+----+------+-------+------+------+
id (ORDER BY column) is populated correctly.
tag (a CAST constant) is populated correctly.
name, email, age (columns not in WHERE/ORDER BY) are all NULL.
What works (no bug)
SELECT COUNT(*) FROM v; -- returns 3
SELECT * FROM v WHERE id = 1; -- complete data
SELECT * FROM t ORDER BY id DESC LIMIT 3; -- complete data (physical table)
Session variables attempted — none of them fix the bug
SET enable_two_phase_read_opt = false;
SET experimental_topn_lazy_materialization_threshold = 0;
SET enable_partition_topn = false;
SET push_topn_to_agg = false;
SET topn_opt_limit_threshold = 0;
-- Still returns NULL for name/email/age:
SELECT * FROM v ORDER BY id DESC LIMIT 3;
Subquery and CTE wrapping — also do not fix the bug
-- Still returns NULL:
SELECT * FROM (SELECT * FROM v) x ORDER BY id DESC LIMIT 3;
-- Still returns NULL:
WITH x AS (SELECT * FROM v)
SELECT * FROM x ORDER BY id DESC LIMIT 3;
Workaround — replacing the view with an async materialized view returns correct data
DROP VIEW v;
CREATE MATERIALIZED VIEW v
BUILD IMMEDIATE
REFRESH AUTO ON SCHEDULE EVERY 1 MINUTE
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1")
AS SELECT id, name, email, age, CAST('a' AS VARCHAR(32)) AS tag FROM t;
-- Returns correct data:
SELECT * FROM v ORDER BY id DESC LIMIT 3;
Anything Else?
Possibly related prior work:
Workaround: replacing the regular view with an async materialized view
returns correct data, but this has performance implications
(full-refresh cost) for large base tables and is not a viable
general-purpose solution.
Are you willing to submit PR?
Code of Conduct
Search before asking
Version
doris-4.0.5-rc01-59de8c4c524 (also reproducible on 4.0.4-rc01-d74dc1fb93)
What's Wrong?
SELECT * FROM <regular_view> ORDER BY ... LIMIT Nreturns NULL forcolumns that are NOT referenced in WHERE or ORDER BY, while the same
query on the underlying physical table works correctly.
Reproduces with the smallest possible case: a single UNIQUE KEY table
with 3 rows behind a trivial view. No joins, no complex WHERE, no
large data volume.
This appears to be a regression in the TopN Lazy Materialization code
path (introduced in 4.0): the first phase reads ORDER BY columns and
row identifiers correctly, but the second-phase fetch of the remaining
columns fails when the query goes through a view, silently returning
NULL.
This is a silent wrong-result bug (no error raised), which is
especially dangerous for data warehousing use cases where views are
used as abstraction layers — callers may not realize the NULLs are
incorrect.
What You Expected?
Query through a view should return the same data as the query on the
underlying physical table.
SELECT * FROM v ORDER BY id DESC LIMIT 3
= SELECT * FROM t ORDER BY id DESC LIMIT 3 (modulo any extra view columns)
How to Reproduce?
Setup
Baseline — querying the physical table (correct)
Output:
Bug — same query through the view
Actual output:
id(ORDER BY column) is populated correctly.tag(a CAST constant) is populated correctly.name,email,age(columns not in WHERE/ORDER BY) are all NULL.What works (no bug)
Session variables attempted — none of them fix the bug
Subquery and CTE wrapping — also do not fix the bug
Workaround — replacing the view with an async materialized view returns correct data
Anything Else?
Possibly related prior work:
Workaround: replacing the regular view with an async materialized view
returns correct data, but this has performance implications
(full-refresh cost) for large base tables and is not a viable
general-purpose solution.
Are you willing to submit PR?
Code of Conduct