[Connector]Add hbase source connector #6348

Merged: 22 commits, May 15, 2024

@TaoZex (Contributor) commented Feb 12, 2024

Purpose of this pull request

#3018

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

@TaoZex marked this pull request as ready for review February 14, 2024 05:39

@TaoZex (Contributor Author) commented Feb 14, 2024

Create an HBase table and insert 10000 rows:
image
Use the HBase source connector to read the table:
hbase_res

@hailin0 (Member) left a comment:

update plugin-mapping.properties?

@TaoZex (Contributor Author) commented Feb 26, 2024

Thanks for your review, @hailin0 @Hisoka-X. I have been busy with my graduation project recently; I will address the comments next week.

@zz977 commented Mar 11, 2024

Excuse me, does the HBase source connector support Kerberos authentication?

@TaoZex (Contributor Author) commented Apr 10, 2024

I have modified the code according to the suggestions. Thanks to @hailin0 and @Hisoka-X for the review. I have communicated with @lihjChina on WeChat; he is already using this PR in his company's production environment. After this PR is merged, he will contribute Kerberos authentication for the HBase source connector.

@TaoZex requested a review from Hisoka-X April 10, 2024 00:09

@Hisoka-X (Member) commented:

cc @TyrantLucifer as well.

@hailin0 (Member) previously approved these changes Apr 18, 2024

LGTM

@TyrantLucifer (Member) left a comment:

Overall LGTM. Could you please provide some screenshots to verify that it works? Ideally on all three engines (Flink/Spark/Zeta).
Screenshots of the data are also needed to verify that it is read correctly.
Because the HBase e2e tests are not enabled in CI/CD, there is currently no other way to prove that this connector is usable.

    Connection connection = ConnectionFactory.createConnection(hbaseConfiguration);
    return connection;
} catch (IOException e) {
    throw new RuntimeException(e);

Member commented:

Use the connector exception here instead of RuntimeException.

@TaoZex (Contributor Author) replied:

done.
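
For illustration, the change requested above could look roughly like the following. This is only a sketch of the pattern, not the code merged in this PR: `DemoConnectorException`, `DemoErrorCode`, its `HBASE-01` code, and the `Opener` interface are hypothetical stand-ins so the example compiles without the HBase client or the connector's own exception classes on the classpath.

```java
import java.io.IOException;

// Hypothetical error-code type; the real connector defines its own codes.
enum DemoErrorCode {
    CONNECTION_FAILED("HBASE-01", "Failed to create HBase connection");

    final String code;
    final String description;

    DemoErrorCode(String code, String description) {
        this.code = code;
        this.description = description;
    }
}

// Hypothetical connector-specific exception that carries an error code.
class DemoConnectorException extends RuntimeException {
    final DemoErrorCode errorCode;

    DemoConnectorException(DemoErrorCode errorCode, Throwable cause) {
        super(errorCode.code + ": " + errorCode.description, cause);
        this.errorCode = errorCode;
    }
}

class ConnectionSketch {
    // Stand-in for a call such as ConnectionFactory.createConnection(conf),
    // which can throw IOException.
    interface Opener<T> {
        T open() throws IOException;
    }

    static <T> T openOrFail(Opener<T> opener) {
        try {
            return opener.open();
        } catch (IOException e) {
            // Wrap the failure in the connector exception instead of a bare
            // RuntimeException, keeping the original cause attached.
            throw new DemoConnectorException(DemoErrorCode.CONNECTION_FAILED, e);
        }
    }
}
```

Callers can then catch the connector exception and branch on its error code rather than parsing a message string.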

        return Double.valueOf(Bytes.toString(cell));
    case BYTES:
        return cell;
    case DATE:

Member commented:

Are date/time/timestamp values read as String? That may cause problems. We could add parameters for setting the format of these types, as the text file connector does.

Member commented:

> Are date/time/timestamp values read as String? That may cause problems. We could add parameters for setting the format of these types, as the text file connector does.

+1

@TaoZex (Contributor Author) replied:

Thanks for the advice. I have already modified it.
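
To make the suggestion concrete, here is a minimal sketch of converting a cell's bytes into typed date, time, and timestamp values using configurable patterns. It assumes the cell stores the value as UTF-8 text; the class name, the `Kind` enum, and the three pattern parameters are hypothetical and are not the option names or code used by the merged connector.

```java
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

class TemporalCellSketch {
    enum Kind { DATE, TIME, TIMESTAMP }

    private final DateTimeFormatter dateFormat;
    private final DateTimeFormatter timeFormat;
    private final DateTimeFormatter timestampFormat;

    // The three patterns would come from user configuration (hypothetical here).
    TemporalCellSketch(String datePattern, String timePattern, String timestampPattern) {
        this.dateFormat = DateTimeFormatter.ofPattern(datePattern);
        this.timeFormat = DateTimeFormatter.ofPattern(timePattern);
        this.timestampFormat = DateTimeFormatter.ofPattern(timestampPattern);
    }

    // Decode the cell as UTF-8 text, then parse it with the formatter for its type.
    Object convert(byte[] cell, Kind kind) {
        String text = new String(cell, StandardCharsets.UTF_8);
        switch (kind) {
            case DATE:
                return LocalDate.parse(text, dateFormat);
            case TIME:
                return LocalTime.parse(text, timeFormat);
            case TIMESTAMP:
                return LocalDateTime.parse(text, timestampFormat);
            default:
                throw new IllegalArgumentException("Unsupported kind: " + kind);
        }
    }
}
```

A value that does not match its configured pattern fails with `DateTimeParseException`, which surfaces bad data at read time instead of passing an unparsed string downstream.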

@TaoZex (Contributor Author) commented Apr 27, 2024

I used my own HBase cluster for testing:
image

To test the different engines, I generated 10000 rows and wrote them to HBase.
HBase source to console sink on each engine:
flink:
image

spark:
image

zeta:
image

@TaoZex closed this Apr 30, 2024
@TaoZex reopened this Apr 30, 2024

@EricJoy2048 (Member) commented:

Why not use hbase docker?

@TaoZex (Contributor Author) commented Apr 30, 2024

> Why not use hbase docker?

This requires adding mapping information.
1714450756121

|--------------------|--------|----------|---------------|
| zookeeper_quorum | string | yes | - |
| table | string | yes | - |
| query_columns | list | yes | - |

Member commented:

Can this option be used to implement schema projection? If so, you need to update this line:

[x] [schema projection](../../concept/connector-v2-features.md)

and HbaseSource needs to implement the SupportColumnProjection interface.

HbaseSource also needs to implement SupportParallelism, because it supports parallelism.
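
As a rough illustration of how a `query_columns` list can drive column projection, the sketch below splits each entry into a column family and a qualifier. It assumes entries are written as `family:qualifier` and that a bare name such as `rowkey` has no family; that format, the class name, and the `Column` type are assumptions made for this example, not the connector's actual parsing code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class QueryColumnsSketch {
    static final class Column {
        final String family;    // null when the entry has no "family:" prefix
        final String qualifier;

        Column(String family, String qualifier) {
            this.family = family;
            this.qualifier = qualifier;
        }
    }

    // Split each configured entry at the first ':' into (family, qualifier).
    static List<Column> parse(List<String> queryColumns) {
        List<Column> result = new ArrayList<>();
        for (String entry : queryColumns) {
            int sep = entry.indexOf(':');
            if (sep < 0) {
                result.add(new Column(null, entry));
            } else {
                result.add(new Column(entry.substring(0, sep), entry.substring(sep + 1)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Column c : parse(Arrays.asList("rowkey", "info:age"))) {
            System.out.println(c.family + " / " + c.qualifier);
        }
    }
}
```

With the HBase client, each pair that has a family would typically be passed to `Scan.addColumn(byte[] family, byte[] qualifier)`, so that only the requested columns are read from the region servers.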

@EricJoy2048 (Member) commented:

I found that the Hbase connector does not implement the Catalog interface. Do you have any plans to implement it?

@EricJoy2048 (Member) commented:

Please fix the CI.

@hailin0 merged commit f108a5e into apache:dev May 15, 2024
6 checks passed