Skip to content

Replace JSON column in the subquery to requested subcolumns #75538

@Avogar

Description

@Avogar

Company or project name

ClickHouse

Use case

Right now when subcolumns are read from the JSON column selected in the subquery we read the whole JSON column and use getSubcolumn function:

:) create table test (data JSON, c1 UInt32, c2 UInt32) engine=MergeTree order by tuple();

CREATE TABLE test
(
    `data` JSON,
    `c1` UInt32,
    `c2` UInt32
)
ENGINE = MergeTree
ORDER BY tuple()

Query id: 05d1ba3c-51de-4f03-8483-07be6479e6df

Ok.

0 rows in set. Elapsed: 0.011 sec.

:) explain actions=1 select data.a, data.b from (select * from test)

EXPLAIN actions = 1
SELECT
    data.a,
    data.b
FROM
(
    SELECT *
    FROM test
)

Query id: 1eab9c56-71ba-4863-9c49-6881f2489e78

    ┌─explain────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
 1. │ Expression ((Project names + (Projection + (Change column names to column identifiers + (Project names + (Projection + Change column names to column identifiers)))))) │
 2. │ Actions: INPUT : 0 -> data JSON : 03. │          COLUMN Const(String) -> 'a'_String String : 14. │          COLUMN Const(String) -> 'b'_String String : 25. │          ALIAS data :: 0 -> __table2.data JSON : 36. │          ALIAS __table2.data :: 3 -> data JSON : 07. │          ALIAS data :: 0 -> __table1.data JSON : 38. │          FUNCTION getSubcolumn(__table1.data : 3, 'a'_String :: 1) -> getSubcolumn(__table1.data, 'a'_String) Dynamic : 09. │          FUNCTION getSubcolumn(__table1.data :: 3, 'b'_String :: 2) -> getSubcolumn(__table1.data, 'b'_String) Dynamic : 110. │          ALIAS getSubcolumn(__table1.data, 'a'_String) :: 0 -> data.a Dynamic : 211. │          ALIAS getSubcolumn(__table1.data, 'b'_String) :: 1 -> data.b Dynamic : 012. │ Positions: 2 013. │   ReadFromPreparedSource (Read from NullSource)                                                                                                                        │
    └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

13 rows in set. Elapsed: 0.002 sec.

Describe the solution you'd like

We could propogate subcolumns to the subquery so only requested subcolumns are read from the table. Like this:

:) explain actions=1 select data.a, data.b from (select data.a, data.b from test)

EXPLAIN actions = 1
SELECT
    data.a,
    data.b
FROM
(
    SELECT
        data.a,
        data.b
    FROM test
)

Query id: a0d40b82-fa91-4534-9b7b-0e719cac3d9d

    ┌─explain────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
 1. │ Expression ((Project names + (Projection + (Change column names to column identifiers + (Project names + (Projection + Change column names to column identifiers)))))) │
 2. │ Actions: INPUT : 0 -> data.a Dynamic : 03. │          INPUT : 1 -> data.b Dynamic : 14. │          ALIAS data.a :: 0 -> __table2.data.a Dynamic : 25. │          ALIAS data.b :: 1 -> __table2.data.b Dynamic : 06. │          ALIAS __table2.data.a :: 2 -> data.a Dynamic : 17. │          ALIAS __table2.data.b :: 0 -> data.b Dynamic : 28. │          ALIAS data.a :: 1 -> __table1.data.a Dynamic : 09. │          ALIAS data.b :: 2 -> __table1.data.b Dynamic : 110. │          ALIAS __table1.data.a :: 0 -> data.a Dynamic : 211. │          ALIAS __table1.data.b :: 1 -> data.b Dynamic : 012. │ Positions: 2 013. │   ReadFromPreparedSource (Read from NullSource)                                                                                                                        │
    └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

13 rows in set. Elapsed: 0.002 sec.

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    featurejsonIssues and pull-requests related to the new JSON data typeperformance

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions