[FLINK-11518] [table] Add partition related catalog APIs and implement them in GenericInMemoryCatalog #8222

bowenli86 · 2019-04-19T05:53:50Z

What is the purpose of the change

This PR adds support for partition-related operations to the Catalog APIs.

Brief change log

Adds partition-related APIs to both ReadableCatalog and ReadableWritableCatalog
Implements them in GenericInMemoryCatalog

Verifying this change

This change added tests and can be verified as follows:

added corresponding unit tests in GenericInMemoryCatalogTest

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (yes)
The serializers: (no)
The runtime per-record code paths (performance sensitive): (no)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
The S3 file system connector: (no)

Documentation

Does this pull request introduce a new feature? (yes)
If yes, how is the feature documented? (JavaDocs)

flinkbot · 2019-04-19T05:53:54Z

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Review Progress

❓ 1. The [description] looks good.
❓ 2. There is [consensus] that the contribution should go into Flink.
❗ 3. Needs [attention] from.
- Needs attention by @KurtYoung, @wuchong [committer]
❓ 4. The change fits into the overall [architecture].
❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

The bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
@flinkbot approve all to approve all aspects
@flinkbot approve-until architecture to approve everything until architecture
@flinkbot attention @username1 [@username2 ..] to require somebody's attention
@flinkbot disapprove architecture to remove an approval you gave earlier

bowenli86 · 2019-04-19T05:55:07Z

cc @twalthr @dawidwys @zjffdu @JingsongLi

...e/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericCatalogTable.java

flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/CatalogTable.java

...link-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericInMemoryCatalog.java

.../src/main/java/org/apache/flink/table/catalog/exceptions/PartitionAlreadyExistException.java

JingsongLi · 2019-04-23T05:37:55Z

@flinkbot attention @wuchong

wuchong

The pull request looks good to me. I only have some doubts, as I'm not familiar with Hive and catalog.

flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/CatalogTable.java

...-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/CatalogPartition.java

wuchong · 2019-04-23T12:41:21Z

flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/CatalogTable.java

+	 * Get the partition keys of the table. This will be an empty set if the table is not partitioned.
+	 * @return partition keys of the table.
+	 */
+	LinkedHashSet<String> getPartitionKeys() throws TableNotPartitionedException;


I find that Hive doesn't allow partition keys in the table schema. Do we have this restriction?

I don't really understand the question. What do you mean by restriction? Can you elaborate?

You are right that Hive's Table uses a separate field dedicated for partition keys.

...ink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericCatalogPartition.java

...-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/CatalogPartition.java

flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/CatalogTable.java

...k-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java

.../src/main/java/org/apache/flink/table/catalog/exceptions/PartitionAlreadyExistException.java

...on/src/main/java/org/apache/flink/table/catalog/exceptions/TableNotPartitionedException.java

...-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/CatalogPartition.java

xuefuz

Looks good overall. Some suggestions for consideration.

…xception, and update catalog APIs

...le/flink-table-common/src/main/java/org/apache/flink/table/catalog/CatalogPartitionSpec.java

bowenli86 · 2019-04-24T17:23:40Z

Thanks for your review @JingsongLi @wuchong @xuefuz @zjffdu. To summarize the major feedback I've addressed:

made CatalogPartitionSpec its own class and used it as the ID for partitions in the APIs, just as ObjectPath is the ID for tables
removed its copy() method, since it internally keeps an unmodifiable map
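The reasoning behind dropping copy() can be illustrated with a minimal sketch. The class name PartitionSpecSketch and its accessor getSpec() are hypothetical stand-ins, not the CatalogPartitionSpec code from this PR; the only point shown is that a spec backed by an unmodifiable map can be shared safely and used as a map key without copying.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a partition spec; NOT the class added in this PR. */
final class PartitionSpecSketch {
    private final Map<String, String> spec;

    PartitionSpecSketch(Map<String, String> spec) {
        // Defensive copy, then wrap: later changes to the caller's map cannot leak in,
        // and callers cannot mutate the view returned by getSpec().
        this.spec = Collections.unmodifiableMap(new LinkedHashMap<>(spec));
    }

    Map<String, String> getSpec() {
        return spec;
    }

    // Value-based equality lets the spec act as an identifier (e.g. a HashMap key).
    @Override
    public boolean equals(Object o) {
        return o instanceof PartitionSpecSketch && spec.equals(((PartitionSpecSketch) o).spec);
    }

    @Override
    public int hashCode() {
        return spec.hashCode();
    }

    @Override
    public String toString() {
        return "PartitionSpecSketch" + spec;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("k1", "a");
        PartitionSpecSketch id = new PartitionSpecSketch(raw);

        // Use the spec as a key, the way a table path identifies a table.
        Map<PartitionSpecSketch, String> partitions = new HashMap<>();
        partitions.put(id, "partition metadata");
        System.out.println(partitions.get(new PartitionSpecSketch(raw)));
    }
}
```

Because the instance can never change after construction, handing out the same reference is as safe as handing out a copy.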

@KurtYoung Would be great to have you take a look and merge this PR if everything looks good

bowenli86 · 2019-04-24T17:26:01Z

@flinkbot attention @KurtYoung

...e/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericCatalogTable.java

KurtYoung · 2019-04-25T03:52:04Z

...k-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableCatalog.java

+	 * @throws PartitionNotExistException thrown if the partition is not partitioned
+	 * @throws CatalogException	in case of any runtime exception
+	 */
+	CatalogPartition getPartition(ObjectPath tablePath, CatalogPartitionSpec partitionSpec)


Can this functionality be covered by "listPartitions"?

Not really. The partition spec in listPartitions can be a partial spec that identifies a subset of the table's partitions, while in getPartition it must be a full spec that identifies a single partition.

For example, say we have a table with partition keys (k1, k2) and two partitions, (k1=a, k2=b) and (k1=a, k2=c). listPartitions(tablePath, (k1=a)) returns both partitions and, as an extreme case, listPartitions(tablePath, (k1=a, k2=b)) returns a list whose only element is the first partition. getPartition requires a full spec, not a partial one: you can pass (k1=a, k2=b), but passing just (k1=a) throws an exception.
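The partial-versus-full distinction can be sketched as follows. This is an illustrative toy, not the GenericInMemoryCatalog implementation: the class PartitionLookupSketch, the helper spec(), the use of plain maps instead of CatalogPartitionSpec, and the exception types (IllegalArgumentException, NoSuchElementException) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical toy catalog holding one table with partition keys (k1, k2). */
final class PartitionLookupSketch {
    static final List<String> KEYS = Arrays.asList("k1", "k2");

    // The two partitions from the example: (k1=a, k2=b) and (k1=a, k2=c).
    static final List<Map<String, String>> PARTITIONS =
            Arrays.asList(spec("k1", "a", "k2", "b"), spec("k1", "a", "k2", "c"));

    /** Builds a spec from alternating key/value arguments. */
    static Map<String, String> spec(String... kv) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    /** Accepts a partial spec: returns every partition whose entries include all given entries. */
    static List<Map<String, String>> listPartitions(Map<String, String> partialSpec) {
        List<Map<String, String>> matches = new ArrayList<>();
        for (Map<String, String> p : PARTITIONS) {
            if (p.entrySet().containsAll(partialSpec.entrySet())) {
                matches.add(p);
            }
        }
        return matches;
    }

    /** Requires a full spec: every partition key must be present. */
    static Map<String, String> getPartition(Map<String, String> fullSpec) {
        if (!fullSpec.keySet().equals(new HashSet<>(KEYS))) {
            // Assumed exception type; the thread only says "it throws exception".
            throw new IllegalArgumentException("Not a full partition spec: " + fullSpec);
        }
        for (Map<String, String> p : PARTITIONS) {
            if (p.equals(fullSpec)) {
                return p;
            }
        }
        throw new NoSuchElementException("No such partition: " + fullSpec);
    }

    public static void main(String[] args) {
        System.out.println(listPartitions(spec("k1", "a")).size());            // 2
        System.out.println(listPartitions(spec("k1", "a", "k2", "b")).size()); // 1
        System.out.println(getPartition(spec("k1", "a", "k2", "b")));          // {k1=a, k2=b}
    }
}
```

Calling getPartition(spec("k1", "a")) in this sketch fails the full-spec check, which mirrors the behavior described above.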

KurtYoung · 2019-04-25T03:54:40Z

...flink-table-common/src/main/java/org/apache/flink/table/catalog/ReadableWritableCatalog.java

+	 * @throws PartitionAlreadyExistsException thrown if the target partition already exists
+	 * @throws CatalogException in case of any runtime exception
+	 */
+	void createPartition(ObjectPath tablePath, CatalogPartitionSpec partitionSpec, CatalogPartition partition, boolean ignoreIfExists)


Can you explain more about this interface? In which cases would this API be invoked? I mean, do partition-only operations exist? My feeling is that most operations on partitions go through the table.

An example from a SQL perspective is "ALTER TABLE ADD PARTITION". Operations on partitions always use the table, and only the table, as the identifier to locate the partition, which is why this API includes tablePath.
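As a rough sketch of how such a statement could reach the catalog, the toy below models createPartition with an ignoreIfExists flag. Everything here is hypothetical: the class CreatePartitionSketch, string table paths, map-based specs, and the exception types stand in for ObjectPath, CatalogPartitionSpec, and PartitionAlreadyExistsException. Mapping "IF NOT EXISTS" to ignoreIfExists=true is an assumption, not something stated in this thread.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory store: table path -> (partition spec -> partition properties). */
final class CreatePartitionSketch {
    private final Map<String, Map<Map<String, String>, Map<String, String>>> partitions =
            new HashMap<>();

    void createTable(String tablePath) {
        partitions.putIfAbsent(tablePath, new HashMap<>());
    }

    /** The table path locates the table; the spec then identifies the partition within it. */
    void createPartition(
            String tablePath,
            Map<String, String> spec,
            Map<String, String> properties,
            boolean ignoreIfExists) {
        Map<Map<String, String>, Map<String, String>> tablePartitions = partitions.get(tablePath);
        if (tablePartitions == null) {
            throw new IllegalArgumentException("Table does not exist: " + tablePath);
        }
        if (tablePartitions.containsKey(spec)) {
            if (ignoreIfExists) {
                return; // silently keep the existing partition
            }
            // Assumed exception type standing in for PartitionAlreadyExistsException.
            throw new IllegalStateException("Partition already exists: " + spec);
        }
        // Copy the inputs so later changes by the caller do not affect stored state.
        tablePartitions.put(new HashMap<>(spec), new HashMap<>(properties));
    }

    int partitionCount(String tablePath) {
        Map<Map<String, String>, Map<String, String>> tablePartitions = partitions.get(tablePath);
        return tablePartitions == null ? 0 : tablePartitions.size();
    }

    public static void main(String[] args) {
        CreatePartitionSketch catalog = new CreatePartitionSketch();
        catalog.createTable("db1.t1");
        Map<String, String> spec = Collections.singletonMap("k1", "a");

        // Roughly: ALTER TABLE t1 ADD PARTITION (k1='a')
        catalog.createPartition("db1.t1", spec, Collections.emptyMap(), false);
        // Roughly: ALTER TABLE t1 ADD IF NOT EXISTS PARTITION (k1='a')
        catalog.createPartition("db1.t1", spec, Collections.emptyMap(), true);

        System.out.println(catalog.partitionCount("db1.t1")); // 1
    }
}
```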

KurtYoung · 2019-04-25T07:08:53Z

LGTM, will merge this after Travis passes.

…t them in GenericInMemoryCatalog This closes apache#8222

[FLINK-11518] [SQL/TABLE] Add partition related catalog APIs and impl…

a593584

…ement them in GenericInMemoryCatalog

rmetzger added the review=description? label Apr 19, 2019

zjffdu reviewed Apr 19, 2019

...e/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericCatalogTable.java

zjffdu reviewed Apr 19, 2019

...e/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/GenericCatalogTable.java

rmetzger added the component=TableSQL/API label Apr 19, 2019

address feedback

467c42a

JingsongLi reviewed Apr 19, 2019

address feedback

c895717

bowenli86 changed the title ~~[FLINK-11518] [SQL/TABLE] Add partition related catalog APIs and implement them in GenericInMemoryCatalog~~ [FLINK-11518] [table] Add partition related catalog APIs and implement them in GenericInMemoryCatalog Apr 21, 2019

rmetzger requested a review from wuchong April 23, 2019 05:39

wuchong reviewed Apr 23, 2019

rmetzger requested a review from wuchong April 23, 2019 12:44