HIVE-24827. Hive aggregation query returns incorrect results for non text files. #2018

ayushtkn · 2021-02-25T07:11:24Z

No description provided.

kgyrtkirk

Will this also work in cases where the input data is compressed?
I think the existing code worked for that as well... the tests passed, so it was either not covered or is still working fine :)

kgyrtkirk · 2021-02-26T09:26:29Z

ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java

+      if (footerCount > 0 && table.getInputFileFormatClass() != null
+          && !TextInputFormat.class
+          .isAssignableFrom(table.getInputFileFormatClass())) {
+        LOG.warn("skip.footer.line.count is only valid for TextInputFormat "
+            + "files, ignoring the value.");
+        footerCount = 0;
+      }


This seems to be a duplicate block; you could move it into a method.
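A minimal sketch of what such an extraction could look like is shown below. It is not the code that was merged: the class name `FooterCountSketch`, the method name `effectiveFooterCount`, and the nested stand-in types are hypothetical, introduced only so the example compiles without Hadoop on the classpath. In Hive the check would use the real `org.apache.hadoop.mapred.TextInputFormat` and the class's existing logger.

```java
import java.util.logging.Logger;

public class FooterCountSketch {
    private static final Logger LOG =
        Logger.getLogger(FooterCountSketch.class.getName());

    // Hypothetical stand-ins for Hadoop input formats (assumption: the real
    // check runs against org.apache.hadoop.mapred.TextInputFormat).
    interface InputFormat {}
    static class TextInputFormat implements InputFormat {}
    static class NonTextInputFormat implements InputFormat {}

    /**
     * Returns the footer count to apply for the given input format class.
     * The configured value is kept for text formats and when the format is
     * unknown (null); for any other format it is ignored with a warning.
     */
    static int effectiveFooterCount(int footerCount, Class<?> inputFormatClass) {
        if (footerCount > 0 && inputFormatClass != null
            && !TextInputFormat.class.isAssignableFrom(inputFormatClass)) {
            LOG.warning("skip.footer.line.count is only valid for TextInputFormat "
                + "files, ignoring the value.");
            return 0;
        }
        return footerCount;
    }
}
```

Each of the duplicated call sites would then reduce to a single assignment such as `footerCount = effectiveFooterCount(footerCount, table.getInputFileFormatClass());`, which keeps the warning text and the condition in one place.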

ayushtkn · 2021-02-26T11:24:40Z

Thanks @kgyrtkirk for the review. I tried the scenario from HIVE-24224, and I think it worked as expected:

+-------------------+--------------------+----------------+
| bz2tst2.sequence  |     bz2tst2.id     | bz2tst2.other  |
+-------------------+--------------------+----------------+
| 9                 | 20200315 X00 1356  | 123            |
| 17                | 20200315 X00 1357  | 123            |
+-------------------+--------------------+----------------+

The input file was:

 printf "offset,id,other\n9,\"20200315 X00 1356\",123\n17,\"20200315 X00 1357\",123\nrst,rst,rst" > data.csv

bzip2 -f data.csv 

hdfs dfs -put data.csv.bz2 hdfs://hostname:8020/warehouse/tablespace/external/hive/bz2tst2

The table is:

+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
| CREATE EXTERNAL TABLE `bz2tst2`(                   |
|   `sequence` int,                                  |
|   `id` string,                                     |
|   `other` string)                                  |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.serde2.OpenCSVSerde'     |
| STORED AS INPUTFORMAT                              |
|   'org.apache.hadoop.mapred.TextInputFormat'       |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION                                           |
|   'hdfs://ayushsaxena-3.ayushsaxena.root.hwx.site:8020/warehouse/tablespace/external/hive/bz2tst2' |
| TBLPROPERTIES (                                    |
|   'bucketing_version'='2',                         |
|   'skip.footer.line.count'='1',                    |
|   'skip.header.line.count'='1',                    |
|   'transient_lastDdlTime'='1614334965')            |
+----------------------------------------------------+

It seems to be working, and the UT passed as well. :-)
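To make the expected result of this test easier to check by hand, the sketch below applies the two table properties (`skip.header.line.count` = 1, `skip.footer.line.count` = 1) to the four lines of the sample file. It only illustrates the intended semantics on an in-memory list; the class and method names are hypothetical, and this is not how Hive reads or decompresses the file.

```java
import java.util.List;

public class HeaderFooterSkipSketch {
    /**
     * Returns a view of the lines that remain after dropping the first
     * headerCount and the last footerCount lines. Negative counts are
     * treated as zero; if the counts cover the whole input, the result
     * is empty.
     */
    static List<String> skipHeaderAndFooter(List<String> lines,
                                            int headerCount, int footerCount) {
        int from = Math.min(Math.max(headerCount, 0), lines.size());
        int to = Math.max(from, lines.size() - Math.max(footerCount, 0));
        return lines.subList(from, to);
    }
}
```

With the sample file, the header line `offset,id,other` and the footer line `rst,rst,rst` are dropped, leaving the two data rows that appear in the query output above.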


kgyrtkirk added tests pending tests failed tests passed and removed tests pending tests failed labels Feb 25, 2021

kgyrtkirk reviewed Feb 26, 2021

kgyrtkirk added tests pending and removed tests passed labels Feb 26, 2021

kgyrtkirk added tests unstable tests pending tests passed and removed tests pending tests unstable labels Feb 26, 2021

kgyrtkirk reviewed Mar 2, 2021

ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java (outdated, resolved)

kgyrtkirk added tests pending tests unstable and removed tests passed tests pending labels Mar 2, 2021

kgyrtkirk added tests pending tests failed and removed tests unstable tests pending labels Mar 3, 2021

ayushtkn added 3 commits March 3, 2021 18:35

HIVE-24827. Hive aggregation query returns incorrect results for non text files. (1851173)

Refactor. (5783f54)

Fix Review Comments. (ad1a9da)

ayushtkn force-pushed the HIVE-24827 branch from 3db77a2 to ad1a9da March 3, 2021 13:09

kgyrtkirk added tests pending tests passed and removed tests failed tests pending labels Mar 3, 2021

kgyrtkirk approved these changes Mar 8, 2021

kgyrtkirk merged commit 8b0542f into apache:master Mar 8, 2021

aihuaxu pushed a commit to aihuaxu/hive that referenced this pull request Mar 17, 2021

HIVE-24827: Hive aggregation query returns incorrect results for non text files. (apache#2018) (Ayush Saxena reviewed by Zoltan Haindrich) (02c1cb4)
