@AshinGau
Member

@AshinGau AshinGau commented Jun 30, 2023

Proposed changes

Fix two bugs:

  1. COW and Read Optimized tables use the Hive splitter to split files, but the splitter does not recognize some Hudi-specific files and tries to read them as Parquet data files:
ERROR 1105 (HY000): errCode = 2, detailMessage =
(172.21.0.101)[CORRUPTION]Invalid magic number in parquet file, bytes read: 3035, file size: 3035,
path: /usr/hive/warehouse/hudi.db/test/.hoodie/metadata/.hoodie/00000000000000.deltacommit.inflight, read magic:
  2. A Read Optimized table created by Spark gets an empty partition key even if the table has no partitions, so we have to filter out these empty partition keys in the Hive client:
| test_ro | CREATE TABLE `test_ro`(
  `_hoodie_commit_time` string COMMENT '',
  ...
  `ts` bigint COMMENT '')
PARTITIONED BY (
 `` string)
ROW FORMAT SERDE
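
The two fixes described above can be illustrated with a minimal sketch. This is not the code from this PR: the class `HudiSplitFilters` and both method names are hypothetical, and the hidden-file rule (skip any path component starting with `.` or `_`) is an assumption modeled on the usual Hadoop hidden-file convention. The data file name `part-0001.parquet` is also made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the actual Doris implementation.
public class HudiSplitFilters {

    // Assumption: a file is a readable data file only if no component of its
    // path starts with "." or "_" (this excludes everything under ".hoodie").
    public static boolean isVisibleDataFile(String path) {
        for (String part : path.split("/")) {
            if (part.startsWith(".") || part.startsWith("_")) {
                return false;
            }
        }
        return true;
    }

    // Drops null, empty, or blank partition key names, such as the `` key
    // shown in the CREATE TABLE output above.
    public static List<String> nonEmptyPartitionKeys(List<String> keys) {
        List<String> result = new ArrayList<>();
        for (String key : keys) {
            if (key != null && !key.trim().isEmpty()) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Path taken from the error message in the description.
        String meta = "/usr/hive/warehouse/hudi.db/test/.hoodie/metadata/.hoodie/"
                + "00000000000000.deltacommit.inflight";
        // Hypothetical data file path.
        String data = "/usr/hive/warehouse/hudi.db/test/part-0001.parquet";

        System.out.println(isVisibleDataFile(meta)); // false
        System.out.println(isVisibleDataFile(data)); // true

        // A table with only the empty partition key ends up unpartitioned.
        System.out.println(nonEmptyPartitionKeys(Arrays.asList("")).isEmpty()); // true
    }
}
```

With filters of this kind, the splitter would never hand the `.deltacommit.inflight` file to the Parquet reader, and a table whose only partition key is empty would be treated as unpartitioned.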

@github-actions github-actions bot added the area/planner Issues or PRs related to the query planner label Jun 30, 2023
@AshinGau
Member Author

run buildall

morningman added a commit to morningman/doris that referenced this pull request Jun 30, 2023
@hello-stephen
Contributor

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 42.14 seconds
stream load tsv: 458 seconds loaded 74807831229 Bytes, about 155 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 56 seconds loaded 1101869774 Bytes, about 18 MB/s
stream load parquet: 28 seconds loaded 861443392 Bytes, about 29 MB/s
insert into select: 69.0 seconds inserted 10000000 Rows, about 144K ops/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230630094611_clickbench_pr_170456.html

@morningman morningman added the dev/2.0.0 2.0.0 release label Jun 30, 2023
Contributor

@morningman morningman left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jun 30, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@yiguolei yiguolei left a comment

LGTM

@yiguolei yiguolei merged commit 0e17cd4 into apache:master Jul 1, 2023
morningman pushed a commit to morningman/doris that referenced this pull request Jul 3, 2023
@xiaokang xiaokang added dev/2.0.0-merged and removed dev/2.0.0 2.0.0 release labels Jul 3, 2023
xiaokang pushed a commit that referenced this pull request Jul 3, 2023
Labels

approved (Indicates a PR has been approved by one committer.), area/planner (Issues or PRs related to the query planner), dev/2.0.0-merged, reviewed

5 participants