Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-28127: Exception when rebuilding materialized view with calculated columns on iceberg sources #5143

Merged
merged 1 commit into from Apr 16, 2024

Conversation

kasakrisz
Copy link
Contributor

@kasakrisz kasakrisz commented Mar 20, 2024

What changes were proposed in this pull request?

Incremental materialized view rebuild plan has a step when the AST built from the optimized Calcite plan is transformed to an AST which represents a multi insert statement when the MV definition has aggregate and no record were deleted from the source tables since the last rebuild. To update existing values in the MV one of the branches of the multi insert is writing delete delta another is writing insert delta files. In case of Iceberg source tables the one which writes the delete delta needs the old values from the MV. If the MV definition has aggregate functions which return value can be calculated from other projected expressions such columns are not projected from the MV in the incremental rebuild plan.

Example MV from mv_iceberg_orc5.q

create materialized view mat2 stored by iceberg stored as orc tblproperties ('format-version'='2') as
select tbl_ice.b, tbl_ice.c, sum(tbl_ice_v2.f), count(tbl_ice_v2.f), avg(tbl_ice_v2.f)
from tbl_ice
join tbl_ice_v2 on tbl_ice.a=tbl_ice_v2.d where tbl_ice.c > 52
group by tbl_ice.b, tbl_ice.c;

In the AST built from Calcite plan the branch scanning the MV does not project avg

            TOK_QUERY
               TOK_FROM
                  TOK_TABREF
                     TOK_TABNAME
                        default
                        mat2
                     TOK_TABLEPROPERTIES
                        TOK_TABLEPROPLIST
                     default.mat2
               TOK_INSERT
                  TOK_DESTINATION
                     TOK_DIR
                        TOK_TMP_FILE
                  TOK_SELECT
                     TOK_SELEXPR
                        .
                           TOK_TABLE_OR_COL
                              default.mat2
                           b
                        b
                     TOK_SELEXPR
                        .
                           TOK_TABLE_OR_COL
                              default.mat2
                           c
                        c
                     TOK_SELEXPR
                        .
                           TOK_TABLE_OR_COL
                              default.mat2
                           _c2
                        _c2
                     TOK_SELEXPR
                        .
                           TOK_TABLE_OR_COL
                              default.mat2
                           _c3
                        _c3
                     TOK_SELEXPR
                        true
                        $f4
            $hdt$_0

But it is referenced later in the multi insert plan's delete branch (see select expression $hdt$_0._c4):

   TOK_INSERT
      TOK_INSERT_INTO
         TOK_TAB
            TOK_TABNAME
               default
               mat2
      TOK_SELECT
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  $hdt$_0
               PARTITION__SPEC__ID
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  $hdt$_0
               PARTITION__HASH
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  $hdt$_0
               FILE__PATH
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  $hdt$_0
               ROW__POSITION
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  $hdt$_0
               b
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  $hdt$_0
               c
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  $hdt$_0
               _c2
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  $hdt$_0
               _c3
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  $hdt$_0
               _c4
      TOK_WHERE
         .
            TOK_TABLE_OR_COL
               $hdt$_0
            $f4
      TOK_SORTBY
         TOK_TABSORTCOLNAMEASC
            TOK_NULLS_FIRST
               .
                  TOK_TABLE_OR_COL
                     $hdt$_0
                  PARTITION__SPEC__ID
         TOK_TABSORTCOLNAMEASC
            TOK_NULLS_FIRST
               .
                  TOK_TABLE_OR_COL
                     $hdt$_0
                  PARTITION__HASH
         TOK_TABSORTCOLNAMEASC
            TOK_NULLS_FIRST
               .
                  TOK_TABLE_OR_COL
                     $hdt$_0
                  FILE__PATH
         TOK_TABSORTCOLNAMEASC
            TOK_NULLS_FIRST
               .
                  TOK_TABLE_OR_COL
                     $hdt$_0
                  ROW__POSITION

Why are the changes needed?

Old values coming from the MV are necessary for delete Iceberg writers but the current plan does not project calculated column hence Exception is thrown at compile time when generating Hive operator plan.

Does this PR introduce any user-facing change?

No.

Is the change a dependency upgrade?

No.

How was this patch tested?

mvn test -Dtest.output.overwrite -Dtest=TestIcebergCliDriver -Dqfile=mv_iceberg_orc5.q -pl itests/qtest-iceberg -Piceberg -Pitests

Copy link

sonarcloud bot commented Mar 26, 2024

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

Copy link
Member

@deniskuzZ deniskuzZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, minor stuff

Copy link
Member

@deniskuzZ deniskuzZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1

@kasakrisz kasakrisz merged commit f06cc29 into apache:master Apr 16, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants