[FLINK-8862] [HBase] Support HBase snapshot read #5639

Closed
wants to merge 3 commits into from


neoremind
Contributor

What is the purpose of the change

The flink-hbase connector currently supports reading HBase only through region server scanners. HBase also offers snapshot scanning: Hadoop provides two ways to scan HBase, TableInputFormat and TableSnapshotInputFormat. It would be great if Flink supported both, to cover a wider range of use cases and give users an alternative.

Brief change log

  • Create TableInputSplitStrategy interface and its implementations as abstraction logic for AbstractTableInputFormat
  • Update HBaseRowInputFormat and TableInputFormat
  • Add HBaseSnapshotRowInputFormat and TableSnapshotInputFormat
  • Extract two interfaces: HBaseTableScannerAware and ResultToTupleMapper
  • Add HBaseSnapshotReadExample
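
As a rough illustration of the split-strategy abstraction named in the change log, the following is a minimal, self-contained sketch. It does not reproduce the PR's code: the method `createSplits`, the classes `Split`, `RegionServerStrategy`, and `SnapshotStrategy`, and the helper `planSplits` are all hypothetical, and only the interface name `TableInputSplitStrategy` comes from the change log. It assumes, as a simplification, one split per region key range in both modes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitStrategySketch {

    /** Hypothetical stand-in for an input split: a key range plus the source it is read from. */
    static final class Split {
        final String startKey;
        final String endKey;
        final String source;

        Split(String startKey, String endKey, String source) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.source = source;
        }

        @Override
        public String toString() {
            return source + "[" + startKey + "," + endKey + ")";
        }
    }

    /** Hypothetical shape of the strategy; the real signature in the PR may differ. */
    interface TableInputSplitStrategy {
        List<Split> createSplits(List<String> regionBoundaries);
    }

    /** Assumed variant: splits are served by region server scanners. */
    static final class RegionServerStrategy implements TableInputSplitStrategy {
        @Override
        public List<Split> createSplits(List<String> regionBoundaries) {
            return splitsFor(regionBoundaries, "regionserver");
        }
    }

    /** Assumed variant: splits are read from a named snapshot. */
    static final class SnapshotStrategy implements TableInputSplitStrategy {
        private final String snapshotName;

        SnapshotStrategy(String snapshotName) {
            this.snapshotName = snapshotName;
        }

        @Override
        public List<Split> createSplits(List<String> regionBoundaries) {
            return splitsFor(regionBoundaries, "snapshot:" + snapshotName);
        }
    }

    /** Builds one split per adjacent pair of region boundaries. */
    static List<Split> splitsFor(List<String> boundaries, String source) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i + 1 < boundaries.size(); i++) {
            splits.add(new Split(boundaries.get(i), boundaries.get(i + 1), source));
        }
        return splits;
    }

    /** The input format depends only on the strategy interface, not on the scan mode. */
    static List<Split> planSplits(TableInputSplitStrategy strategy, List<String> boundaries) {
        return strategy.createSplits(boundaries);
    }

    public static void main(String[] args) {
        List<String> boundaries = Arrays.asList("a", "m", "z");
        System.out.println(planSplits(new RegionServerStrategy(), boundaries));
        System.out.println(planSplits(new SnapshotStrategy("my_snapshot"), boundaries));
    }
}
```

The point of the sketch is that an abstract input format can stay agnostic of how the table is read; swapping the strategy object switches between region server scanning and snapshot scanning.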

Verifying this change

This change is already covered by the existing tests listed below, and new test cases have been added as well.

org.apache.flink.addons.hbase.HBaseConnectorITCase

This change added tests and can be verified as follows:

  • Manually create a snapshot of a specific HBase table and use TableSnapshotInputFormat to do a full scan.
  • Run the existing HBaseReadExample to do a full scan.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
  • For documentation, please see the JIRA ticket, where a detailed design doc and class diagram are attached.

@neoremind
Contributor Author

@zentol Could you help review the feature and code? Or is there someone more appropriate for this PR? Many thanks.

@neoremind
Contributor Author

@ramkrish86 @fhueske Could you help review this PR? I noticed that you have contributed most of the code. This update enables HBase snapshot reads, and I refactored some of the code and test cases. You can find the design doc and class diagram at https://issues.apache.org/jira/projects/FLINK/issues/FLINK-8862?filter=allopenissues. Thanks!

@fhueske
Contributor

fhueske commented Mar 13, 2018

Thanks for the PR @neoremind.
At the moment, the community is busy working on the 1.5 release which means that PRs for 1.5 fixes have priority right now. Also a large contribution such as this one takes a lot of time to review. Unfortunately, I won't be able to review the PR in the near future. Best, Fabian

@neoremind
Contributor Author

@fhueske Thanks for your response. I understand. Please take your time; I hope this PR can be reviewed in the future and help people who need it. Thanks!

@snuyanzin
Contributor

snuyanzin commented Oct 19, 2023

@neoremind
The Flink HBase connector resides in its own repository nowadays. If this code change is still relevant, please open the PR in https://github.com/apache/flink-connector-hbase/

@snuyanzin snuyanzin closed this Oct 19, 2023
4 participants