HIVE-26980: Creation of materialized view stored by Iceberg fails if … by kasakrisz · Pull Request #3993 · apache/hive

kasakrisz · 2023-01-27T14:29:34Z

…source table has tinyint column

What changes were proposed in this pull request?

Iceberg does not currently support some Hive column data types. For CTAS statements and materialized views, Hive converts the affected source table column types to compatible Iceberg column types.
A select operator is generated to perform the conversion, and its number of input and output columns has to be the same. The number of output columns also depends on dynamic partitioning, but when the target is an Iceberg table, partitioning is handled by the storage handler, so the dynamic partition columns should be ignored.
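
The column-count rule described above can be sketched in isolation. This is a minimal illustration, not the code in `SemanticAnalyzer`: the class, the method names and the type mapping are hypothetical, and the mapping (tinyint/smallint to int, char/varchar to string) is an assumption about which types get converted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; none of these names exist in Hive.
class ConversionSelectSketch {

    // Assumed mapping from Hive types that Iceberg lacks to compatible types.
    static final Map<String, String> ICEBERG_COMPATIBLE = Map.of(
            "tinyint", "int",
            "smallint", "int",
            "char", "string",
            "varchar", "string");

    // Drops type parameters such as the "(16)" in char(16) before the lookup.
    static String convertType(String hiveType) {
        int paren = hiveType.indexOf('(');
        String base = (paren < 0 ? hiveType : hiveType.substring(0, paren))
                .trim().toLowerCase(Locale.ROOT);
        return ICEBERG_COMPATIBLE.getOrDefault(base, hiveType);
    }

    // Dynamic partition columns are only counted when the storage handler
    // does not take care of partitioning itself.
    static int expectedOutputColumns(int tableColumns, int dynamicPartitionColumns,
                                     boolean partitioningHandledByStorageHandler) {
        return partitioningHandledByStorageHandler
                ? tableColumns
                : tableColumns + dynamicPartitionColumns;
    }

    // Produces the output column types, or fails if the counts disagree.
    static List<String> buildConversionSelect(List<String> inputTypes, int tableColumns,
                                              int dynamicPartitionColumns,
                                              boolean partitioningHandledByStorageHandler) {
        int expected = expectedOutputColumns(
                tableColumns, dynamicPartitionColumns, partitioningHandledByStorageHandler);
        if (inputTypes.size() != expected) {
            throw new IllegalStateException(
                    "Expected " + expected + " columns but got " + inputTypes.size());
        }
        List<String> outputTypes = new ArrayList<>();
        for (String type : inputTypes) {
            outputTypes.add(convertType(type));
        }
        return outputTypes;
    }

    public static void main(String[] args) {
        // Iceberg target with one partition column: the column is not counted twice.
        System.out.println(buildConversionSelect(
                List.of("tinyint", "char(16)", "bigint"), 3, 1, true));
    }
}
```

In this sketch, counting the dynamic partition column for an Iceberg target would make the expected count 4 against 3 input columns, which is the kind of mismatch the description refers to.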

Why are the changes needed?

To support partitioned materialized views stored by Iceberg, and to support CTAS statements that create tables stored by Iceberg when the source table or query has a column data type that Iceberg does not support.

Does this PR introduce any user-facing change?

No, but such statements now run successfully.

How was this patch tested?

mvn test -Dtest.output.overwrite -Dtest=TestIcebergCliDriver -Dqfile=ctas_iceberg_orc.q,mv_iceberg_partitioned_orc.q,mv_iceberg_partitioned_orc2.q -pl itests/qtest-iceberg -Piceberg -Pitests

deniskuzZ · 2023-01-27T17:08:57Z

ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java

      throw new SemanticException("Unknown destination type: " + destType);
    }

+    if (!(destType == QBMetaData.DEST_DFS_FILE && qb.getIsQuery())) {


@kasakrisz, could you please elaborate on the if condition? What if destType=DEST_LOCAL_FILE?
IsQuery() would always be false for CTAS and MV.

IIUC, when destType=DEST_LOCAL_FILE we are inserting into a local directory, and no conversion is required since the target is not an actual table. Example:

hive/ql/src/test/queries/clientpositive/insert_overwrite_local_directory_1.q

Line 2 in 0a5558d

insert overwrite local directory '../../data/files/local_src_table_1'

I extended the if condition to also check that destinationTable is not null and that it has a storage handler.

Yes, isQuery should be false for CTAS and MV:

hive/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java

Lines 2443 to 2444 in 0a5558d

if (qb.isCTAS() || qb.isMaterializedView()) {

qb.setIsQuery(false);
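
The extended condition discussed in this thread can be read roughly as follows. This is a sketch under stated assumptions: the enum, the `TargetTable` class and `needsConversion` are hypothetical stand-ins rather than Hive's API, and the merged code may combine the checks differently.

```java
// Hypothetical sketch of the guard; these types are not Hive classes.
class ConversionGuardSketch {

    // Stand-in for the QBMetaData.DEST_* constants mentioned in the review.
    enum DestType { TABLE, PARTITION, DFS_FILE, LOCAL_FILE }

    // Stand-in for the destination table; only the storage handler flag matters here.
    static final class TargetTable {
        final boolean hasStorageHandler;

        TargetTable(boolean hasStorageHandler) {
            this.hasStorageHandler = hasStorageHandler;
        }
    }

    // Assumed reading: skip plain queries that write to a DFS file, and convert
    // only when a destination table exists and it has a storage handler.
    static boolean needsConversion(DestType destType, boolean isQuery,
                                   TargetTable destinationTable) {
        boolean plainQueryToDfs = destType == DestType.DFS_FILE && isQuery;
        return !plainQueryToDfs
                && destinationTable != null
                && destinationTable.hasStorageHandler;
    }

    public static void main(String[] args) {
        // CTAS or MV into a table with a storage handler (isQuery is false).
        System.out.println(needsConversion(DestType.DFS_FILE, false, new TargetTable(true)));
        // Insert into a local directory: no destination table, so no conversion.
        System.out.println(needsConversion(DestType.LOCAL_FILE, false, null));
    }
}
```

Under this reading the DEST_LOCAL_FILE case needs no special branch, because there is no destination table to check.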

InvisibleProgrammer · 2023-01-30T20:44:37Z

Hi, @kasakrisz !

Thank you for fixing that issue. I ran into the exact same problem in https://issues.apache.org/jira/browse/HIVE-26977.

Could you please add the following qtest as well, to validate that your fix also resolves that issue? I already tested it on my computer and it worked.

set iceberg.mr.schema.auto.conversion=true;
set hive.vectorized.execution.enabled=true;
set hive.explain.user=false;

drop table if exists source_table;
create external table source_table
(
    id char(16)
);

insert into source_table values ('ID_1');
insert into source_table values ('ID_2');

drop table if exists target_table;

explain
create table target_table
stored by iceberg
stored as orc
as select * from source_table;

create table target_table
stored by iceberg
stored as orc
as select * from source_table;

select count(*) from target_table;

Thank you,
Zsolt

kasakrisz · 2023-01-31T07:08:21Z

@InvisibleProgrammer
Thanks for validating this patch. I checked your test case and found that the test iceberg/iceberg-handler/src/test/queries/positive/ctas_iceberg_orc.q updated by this patch covers your scenario. The only difference is that you explicitly enabled vectorization but that is enabled by default in PTests.

InvisibleProgrammer · 2023-01-31T07:16:18Z

The only difference is that you explicitly enabled vectorization but that is enabled by default in PTests.

Sad to hear. That means that if we run the test on the build server, we get a different result than when running it on our own machines. Could you please add the vectorization setting to the test to get the exact same behavior?

deniskuzZ

LGTM +1, pending tests

sonarqubecloud · 2023-01-31T07:57:41Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
No Duplication information

kasakrisz self-assigned this Jan 27, 2023

kgyrtkirk added the tests pending label Jan 27, 2023

kasakrisz requested a review from deniskuzZ January 27, 2023 14:38

deniskuzZ reviewed Jan 27, 2023

kgyrtkirk added tests failed and removed tests pending labels Jan 27, 2023

kasakrisz force-pushed the HIVE-26980-master-convert-schema branch from 5cfab02 to 5e24c8d Compare January 30, 2023 07:42

kgyrtkirk added tests pending, tests unstable, tests passed and removed tests failed, tests pending, tests unstable labels Jan 30, 2023

kasakrisz added 4 commits January 31, 2023 06:52

HIVE-26980: Creation of materialized view stored by Iceberg fails if …

6090b56

…source table has tinyint column

get serializer from tableDescriptor because destinationTable can be null

d104452

add conversion operator for ctas only if storage handler exists

7aa910c

add tests

084ec5c

kasakrisz force-pushed the HIVE-26980-master-convert-schema branch from 8ba31a4 to 084ec5c Compare January 31, 2023 07:03

kgyrtkirk added tests pending and removed tests passed labels Jan 31, 2023

kasakrisz requested a review from deniskuzZ January 31, 2023 07:08

deniskuzZ approved these changes Jan 31, 2023

kgyrtkirk added tests passed and removed tests pending labels Jan 31, 2023

kasakrisz merged commit 8fe73e6 into apache:master Jan 31, 2023

kasakrisz deleted the HIVE-26980-master-convert-schema branch January 31, 2023 09:42
