Skip to content

HIVE-26980: Creation of materialized view stored by Iceberg fails if …#3993

Merged
kasakrisz merged 4 commits intoapache:masterfrom
kasakrisz:HIVE-26980-master-convert-schema
Jan 31, 2023
Merged

HIVE-26980: Creation of materialized view stored by Iceberg fails if …#3993
kasakrisz merged 4 commits intoapache:masterfrom
kasakrisz:HIVE-26980-master-convert-schema

Conversation

@kasakrisz
Copy link
Contributor

…source table has tinyint column

What changes were proposed in this pull request?

Some Hive column datatypes are currently not supported by Iceberg. In case of CTAS statements and materialized views Hive converts some of the source table column types to a compatible Iceberg column type.
For the conversion a select operator is generated. The number of input and output columns has to be the same. The number of output columns also depends on dynamic partitioning but in case of Iceberg target table partitioning is handled by the storage handler so it should be ignored.

Why are the changes needed?

To support partitioned materialized view stored by iceberg and to support ctas statements which create tables stored by Iceberg but the source table/query has a column datatype which is not supported by Iceberg.

Does this PR introduce any user-facing change?

No. But such statements runs successfuly.

How was this patch tested?

mvn test -Dtest.output.overwrite -Dtest=TestIcebergCliDriver -Dqfile=ctas_iceberg_orc.q,mv_iceberg_partitioned_orc.q,mv_iceberg_partitioned_orc2.q -pl itests/qtest-iceberg -Piceberg -Pitests

@kasakrisz kasakrisz self-assigned this Jan 27, 2023
@kasakrisz kasakrisz requested a review from deniskuzZ January 27, 2023 14:38
throw new SemanticException("Unknown destination type: " + destType);
}

if (!(destType == QBMetaData.DEST_DFS_FILE && qb.getIsQuery())) {
Copy link
Member

@deniskuzZ deniskuzZ Jan 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kasakrisz, could you please elaborate on if condition? what if destType=DEST_LOCAL_FILE?
IsQuery() would be always false for CTAS and MV

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@InvisibleProgrammer
Copy link
Contributor

Hi, @kasakrisz !

Thank you for fixing that issue. I have faced with the exact same in https://issues.apache.org/jira/browse/HIVE-26977?

Could you please add that qtest as well to validate your fix fixes the other issue as well. I already tested it on my computer and it worked.

set iceberg.mr.schema.auto.conversion=true;
set hive.vectorized.execution.enabled=true;
set hive.explain.user=false;

drop table if exists source_table;
create external table source_table
(
    id char(16)
);

insert into source_table values ('ID_1');
insert into source_table values ('ID_2');

drop table if exists target_table;

explain
create table target_table
stored by iceberg
stored as orc
as select * from source_table;

create table target_table
stored by iceberg
stored as orc
as select * from source_table;

select count(*) from target_table;

Thank you,
Zsolt

@kasakrisz kasakrisz force-pushed the HIVE-26980-master-convert-schema branch from 8ba31a4 to 084ec5c Compare January 31, 2023 07:03
@kasakrisz
Copy link
Contributor Author

@InvisibleProgrammer
Thanks for validating this patch. I checked your test case and found that the test iceberg/iceberg-handler/src/test/queries/positive/ctas_iceberg_orc.q updated by this patch covers your scenario. The only difference is that you explicitly enabled vectorization but that is enabled by default in PTests.

@kasakrisz kasakrisz requested a review from deniskuzZ January 31, 2023 07:08
@InvisibleProgrammer
Copy link
Contributor

The only difference is that you explicitly enabled vectorization but that is enabled by default in PTests.

Sad to hear. That means, if we run the test on the build server, we get a different result than running it on our machine. Could you please add that vectorization test to the setting to get the exact same behavior?

Copy link
Member

@deniskuzZ deniskuzZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1, pending tests

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

@kasakrisz kasakrisz merged commit 8fe73e6 into apache:master Jan 31, 2023
@kasakrisz kasakrisz deleted the HIVE-26980-master-convert-schema branch January 31, 2023 09:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants