Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP][HUDI-1180] Upgrade HBase to 2.4.9 #4020

Closed
wants to merge 5 commits into from

Conversation

codope
Copy link
Member

@codope codope commented Nov 17, 2021

What is the purpose of the pull request

Upgrade HBase version from 1.x to 2.x.

Brief change log

  • Upgrade HBase version from 1.2.3 to 2.4.7 and Hadoop from 2.7.3 to 2.10.1.
  • KeyValue.KVComparator is deprecated in 2.x and completely removed from 3.x. We need to extend CellComparatorImpl which implements CellComparator interface.
  • HBase 2.x reader and writer factory methods changed.

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

try {
fileInfo = reader.loadFileInfo();
ByteBuffer serializedFilter = reader.getMetaBlock(KEY_BLOOM_FILTER_META_BLOCK, false);
fileInfo = reader.getHFileInfo();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little worried,This method does not exist even in hbase2.2.3

@codope
Copy link
Member Author

codope commented Nov 24, 2021

@danny0405 Could you please help in testing this patch for Flink?

@vinothchandar vinothchandar changed the title [WIP][HUDI-2783] Upgrade HBase [WIP][HUDI-2783] Upgrade HBase to 2.x Nov 24, 2021
ReaderContext context = new ReaderContextBuilder()
.withFilePath(path)
.withInputStreamWrapper(stream)
.withFileSize(getFs("hoodie", conf).getFileStatus(path).getLen())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a fake path so getFileStatus() will fail. file size could be content.length?

.build();
HFileInfo fileInfo = new HFileInfo(context, conf);
this.reader = HFile.createReader(context, fileInfo, new CacheConfig(conf), conf);
fileInfo.initMetaAndIndex(reader);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the use of this explicit call? Is this required or some optimization?

<groupId>org.apache.hbase</groupId>
<artifactId>hbase-hadoop-compat</artifactId>
<version>${hbase.version}</version>
</dependency>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many new jar dependencies and i'm worried about the conflicts, we must exclude the jars that are conflict-prone like google guava explicitly if there are indirect dependency.

@codope codope force-pushed the hudi-2783-hbase-upgrade branch 2 times, most recently from a23ba86 to 790d2ba Compare December 9, 2021 12:21
@alexeykudinkin
Copy link
Contributor

@hudi-bot run azure

return readAllRecords(schema, schema);
}

public List<Pair<String, R>> readRecords(List<String> keys) throws IOException {
reader.loadFileInfo();
Schema schema = new Schema.Parser().parse(new String(reader.loadFileInfo().get(KEY_SCHEMA.getBytes())));
reader.getHFileInfo();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we doing this here?

@vinothchandar vinothchandar added this to Ready for Review in PR Tracker Board Dec 10, 2021
@alexeykudinkin
Copy link
Contributor

I've been trying to address to fix IT tests after HBase upgrade and kept hitting HBase classes conflicts b/w our HBase deps and Hadoop 2.x deps (there are non-BWC changes).

Tried shading our HBase classes but that didn't go too well, and as such decided to go ahead and try upgrading Hadoop to 3.3.x branch.

It's WIP and you can track progress in the following PR #4286

@vinothchandar vinothchandar moved this from Ready for Review to Under Discussion PRs in PR Tracker Board Dec 25, 2021
@yihua
Copy link
Contributor

yihua commented Feb 17, 2022

I'm going to take over this PR and get it ready for review.

@yihua yihua changed the title [WIP][HUDI-2783] Upgrade HBase to 2.x [HUDI-1180] Upgrade HBase to 2.x Feb 17, 2022
@yihua yihua changed the title [HUDI-1180] Upgrade HBase to 2.x [WIP][HUDI-1180] Upgrade HBase to 2.x Feb 17, 2022
@yihua yihua changed the title [WIP][HUDI-1180] Upgrade HBase to 2.x [WIP][HUDI-1180] Upgrade HBase to 2.4.9 Feb 17, 2022
Fix some unit tests

Resolve dependency issue

Upgrade Hadoop to 2.10.1 and fix HFile inline reader test

Separate hbase shaded version for presto bundle

Resolve hbase dep conflicts in flink, utilities and hadoop-mr bundles
Diasble access time validation
@yihua yihua self-assigned this Feb 17, 2022
@hudi-bot
Copy link

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@yihua
Copy link
Contributor

yihua commented Mar 11, 2022

I'm closing this in favor of #5004 which has more changes and deviates from this one which has conflicts with master.

@yihua yihua closed this Mar 11, 2022
PR Tracker Board automation moved this from Under Discussion PRs to Done Mar 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

None yet

9 participants