[#3362] feat(flink-connector): Add the code skeleton for flink-connector #2635

Merged: 28 commits into apache:main on May 24, 2024

Conversation

@coolderli (Contributor) commented Mar 22, 2024

What changes were proposed in this pull request?

  • Support GravitinoCatalogStore to register catalogs. In this PR, we add support for creating the Hive catalog.

Why are the changes needed?

  • Fix #3362

Does this PR introduce any user-facing change?

  • Support Flink in Gravitino.

How was this patch tested?

  • Added unit tests (UTs).

@jerryshao (Contributor) commented:

@coolderli can we please move forward with the Flink support?

@coolderli coolderli changed the title feat(flink): support flink in gravitino [#3362] feat(flink): support GravitinoCatalogStore to register the catalog May 13, 2024
@coolderli coolderli marked this pull request as ready for review May 13, 2024 07:31
@coolderli (Contributor, Author) commented:

> @coolderli can we please move forward with the Flink support?

@jerryshao Of course. I will finish the first one today.

@coolderli (Contributor, Author) commented:

@jerryshao @FANNG1 Could you please help review this? Thanks.

@coolderli coolderli closed this May 13, 2024
@coolderli coolderli reopened this May 13, 2024

@Override
public Catalog createCatalog(Context context) {
  this.hiveCatalogFactory = new HiveCatalogFactory();
@coolderli (Contributor, Author) commented on the diff:

We use the HiveCatalogFactory from flink-connector-hive to create the HiveCatalog.

#

com.datastrato.gravitino.flink.connector.store.GravitinoCatalogStoreFactory
com.datastrato.gravitino.flink.connector.hive.GravitinoHiveCatalogFactory
@FANNG1 (Contributor) commented:

If we support Iceberg, would we just provide another GravitinoIcebergCatalogFactory?

@coolderli (Contributor, Author) commented May 14, 2024:

@FANNG1 Yes, this is what I wanted to discuss. Do we implement one CatalogFactory for each storage type, or just one GravitinoCatalogFactory? If we use a single GravitinoCatalogFactory, we may need an extra property to identify which real catalog should be used.

Another question is whether we should use a single GravitinoCatalog for all storage types. I think the CatalogFactory should be consistent with the Catalog. Flink supports registering a catalog manually; if we only provide a GravitinoCatalog, we can also simplify the usage.

The original usage: https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/table/catalogs/#using-sql-ddl

// Create a HiveCatalog 
Catalog catalog = new HiveCatalog("myhive", null, "<path_of_hive_conf>");

// Register the catalog
tableEnv.registerCatalog("myhive", catalog);

Option 1: use a single GravitinoCatalog.

Map<String, String> properties = new HashMap<>();
properties.put("catalog.type", "hive"); // an extra property used to identify the real catalog

Catalog catalog = new GravitinoCatalog("gravitino", properties);

// Register the catalog
tableEnv.registerCatalog("myhive", catalog);

Option 2: use a different GravitinoCatalog for each storage type:

Catalog catalog = new GravitinoHiveCatalog("gravitino", xxx);

// Register the catalog
tableEnv.registerCatalog("myhive", catalog);

Which do you prefer? Can you share your thoughts? Thanks.

@coolderli (Contributor, Author) commented:
Can you also help review it? Thanks. @hackergin

A contributor commented:

I prefer different GravitinoCatalogs, because this solution seems simpler for users.

A contributor commented:

From my understanding, the Catalog Store is essentially for storing the configuration of a Flink Catalog. Therefore, if users create a catalog using the following DDL, we should support saving the catalog's configuration.

CREATE CATALOG myhive WITH (
  'type' = 'hive',
  'hive-conf-dir' = '/opt/hive-conf'
);

At the same time, when constructing a real Catalog instance, we can automatically create a GravitinoCatalog instead of a HiveCatalog.

We should also prohibit the direct creation of Catalog instances:

tableEnv.registerCatalog("myhive", catalog);

When a Catalog instance is registered with the table environment this way, it will not be persisted to the Catalog Store.
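For context, here is a minimal sketch of the pattern described above, written against Flink's CatalogStore / CatalogDescriptor API (Flink 1.18+). The class name and the in-memory map are illustrative stand-ins, not the PR's actual GravitinoCatalogStore, which would translate the descriptor's options into a Gravitino catalog instead.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.flink.table.catalog.CatalogDescriptor;
import org.apache.flink.table.catalog.CatalogStore;

/** Illustrative only: keeps catalog descriptors in memory instead of calling Gravitino. */
public class InMemoryCatalogStoreSketch implements CatalogStore {

  private final Map<String, CatalogDescriptor> descriptors = new ConcurrentHashMap<>();

  @Override
  public void open() {}

  @Override
  public void close() {}

  @Override
  public void storeCatalog(String catalogName, CatalogDescriptor catalog) {
    // Invoked for `CREATE CATALOG ... WITH (...)`; a Gravitino-backed store would
    // translate the descriptor's options and create the catalog in Gravitino here.
    descriptors.put(catalogName, catalog);
  }

  @Override
  public void removeCatalog(String catalogName, boolean ignoreIfNotExists) {
    descriptors.remove(catalogName);
  }

  @Override
  public Optional<CatalogDescriptor> getCatalog(String catalogName) {
    return Optional.ofNullable(descriptors.get(catalogName));
  }

  @Override
  public Set<String> listCatalogs() {
    return descriptors.keySet();
  }

  @Override
  public boolean contains(String catalogName) {
    return descriptors.containsKey(catalogName);
  }
}
```

Catalogs registered directly via tableEnv.registerCatalog(...) bypass the store, which is why they are not persisted.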

@coolderli (Contributor, Author) commented:

@FANNG1 I have updated the PR. Please help review it when you have time. Thanks.

@coolderli coolderli changed the title [#3362] feat(flink): support GravitinoCatalogStore to register the catalog [#3362] feat(flink-connector): support GravitinoCatalogStore to register the catalog May 16, 2024
@coolderli coolderli force-pushed the init-flink branch 2 times, most recently from 77806f4 to 7e20a11 on May 20, 2024 08:49
@coolderli (Contributor, Author) commented:

@FANNG1 Could you help review this PR again? Thanks.

this.hiveCatalogFactory = new HiveCatalogFactory();
final FactoryUtil.CatalogFactoryHelper helper =
    FactoryUtil.createCatalogFactoryHelper(this, context);
helper.validateExcept(
@coolderli (Contributor, Author) commented:

This will skip validation for options with the specified prefix.

@FANNG1 (Contributor) commented:

What if a user adds a property like flink.bypass.aa by mistake? Will this cause all Flink jobs to fail? I would prefer to ignore it.

@coolderli (Contributor, Author) commented:

@FANNG1 That is better for users. How about just removing the validation? Otherwise, I would have to remove these unused keys before the validation.

@coolderli (Contributor, Author) commented:

@FANNG1 I thought about it again. The validation is needed; it can be used to validate required options such as metastore.uris. Let me check how to implement it.

@FANNG1 (Contributor) commented:

Please add a comment before helper.validateExcept to describe how it works.

@coolderli (Contributor, Author) commented:

@FANNG1 I think I have already rebased onto the main branch, but the GitHub CI still fails. Could you help me with it? Thanks.

@coolderli (Contributor, Author) left a review comment:

@FANNG1 I have finished it: hive.metastore.uris is now required, and unknown keys are ignored. Please take a look. Thanks.
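As a rough illustration of that approach (a sketch only, not the PR's GravitinoHiveCatalogFactory; the factory identifier, option names, and the flink.bypass. prefix are assumptions taken from this thread), a Flink CatalogFactory can declare hive.metastore.uris as required while excluding prefixed pass-through keys from validation:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;
import org.apache.flink.table.catalog.Catalog;
import org.apache.flink.table.factories.CatalogFactory;
import org.apache.flink.table.factories.FactoryUtil;

/** Illustrative sketch; the real factory in this PR may differ. */
public class HiveCatalogFactorySketch implements CatalogFactory {

  // Assumed option key and bypass prefix, mirroring the discussion above.
  public static final ConfigOption<String> METASTORE_URIS =
      ConfigOptions.key("hive.metastore.uris").stringType().noDefaultValue();
  private static final String BYPASS_PREFIX = "flink.bypass.";

  @Override
  public String factoryIdentifier() {
    return "gravitino-hive"; // hypothetical identifier
  }

  @Override
  public Set<ConfigOption<?>> requiredOptions() {
    // Validation fails fast if hive.metastore.uris is missing.
    Set<ConfigOption<?>> options = new HashSet<>();
    options.add(METASTORE_URIS);
    return options;
  }

  @Override
  public Set<ConfigOption<?>> optionalOptions() {
    return Collections.emptySet();
  }

  @Override
  public Catalog createCatalog(Context context) {
    final FactoryUtil.CatalogFactoryHelper helper =
        FactoryUtil.createCatalogFactoryHelper(this, context);
    // Keys starting with the bypass prefix are passed through to the underlying
    // catalog and excluded from Flink's option validation.
    helper.validateExcept(BYPASS_PREFIX);
    // ... delegate to the real Hive catalog creation here ...
    throw new UnsupportedOperationException("sketch only");
  }
}
```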

Assertions.assertEquals(
    "unknown.value",
    flinkProperties.get(flinkByPass("unknown.key")),
    "The unknown.key will not cause failure and will be saved in Gravitino.");
@coolderli (Contributor, Author) commented:

The unknown key will be saved in Gravitino rather than being ignored.

@FANNG1 (Contributor) commented:

Why does it work? The unknown.key does not exist in requiredOptions or optionalOptions, and the custom CatalogFactoryHelper only takes effect when loading the catalog from Gravitino. Am I missing something?

@coolderli (Contributor, Author) commented:

@FANNG1 When creating the catalog, the unknown.key is passed to the Context of GravitinoHiveCatalog and to the CatalogDescriptor handled by the GravitinoCatalogStore. The GravitinoCatalogStore does not validate the unknown.key, and we use FactoryUtils.GravitinoCatalogFactoryHelper instead of FactoryUtil.CatalogFactoryHelper to skip the validation. So the unknown.key is saved to Gravitino successfully.

Of course, it is a little weird that a key not listed in optionalOptions does not fail the Flink job.
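A minimal sketch of the "skip unknown keys" idea described above (illustrative only, not the PR's FactoryUtils.GravitinoCatalogFactoryHelper): only the required options are checked, and unrecognized keys are left untouched so they can be persisted along with the catalog descriptor.

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.table.api.ValidationException;

/** Illustrative helper: validate required options only, tolerate unknown keys. */
public final class LenientOptionValidation {

  private LenientOptionValidation() {}

  public static void validateRequiredOnly(
      Set<ConfigOption<?>> requiredOptions, Map<String, String> userOptions) {
    Set<String> missing =
        requiredOptions.stream()
            .map(ConfigOption::key)
            .filter(key -> !userOptions.containsKey(key))
            .collect(Collectors.toSet());
    if (!missing.isEmpty()) {
      throw new ValidationException("Missing required options: " + missing);
    }
    // No check for unknown keys: entries such as unknown.key stay in userOptions
    // and end up in the catalog descriptor that is persisted to Gravitino.
  }
}
```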

@coolderli (Contributor, Author) commented:
@FANNG1 Do you think this is reasonable?

@FANNG1 (Contributor) commented:

It is a little weird, but let's go ahead.

@FANNG1 merged commit 2113efd into apache:main on May 24, 2024
26 checks passed
@FANNG1 (Contributor) commented May 24, 2024:

@coolderli, thanks for proposing the base PR for Flink, a big step!

diqiu50 pushed a commit to diqiu50/gravitino that referenced this pull request on Jun 13, 2024: …connector (apache#2635)

Successfully merging this pull request may close these issues.

[Subtask] [flink-connector] Support GravitinoCatalogStore to register the catalog
4 participants