Spark: Correct partition transform functions to match spec.md #8192

clettieri · 2023-07-31T18:40:16Z

Given the spec for the time transform functions, this PR adds the singular (instead of plural) names of the transform functions.

The spec specifies year, month, hour, and day, but previously Spark only supported the plural versions of these names (years, months, etc.).

Hopefully this saves someone else a few cycles of debugging :)

Fokko · 2023-07-31T21:13:46Z

@clettieri Thanks for raising this. This is a known issue, and I have run into it as well. However, maybe it is better to add day as an option to Spark, since the spec is considered the source of truth.

ajantha-bhat · 2023-08-01T04:52:00Z

format/spec.md

-| **`month`**       | Extract a date or timestamp month, as months from 1970-01-01 | `date`, `timestamp`, `timestamptz`                                                                        | `int`       |
-| **`day`**         | Extract a date or timestamp day, as days from 1970-01-01     | `date`, `timestamp`, `timestamptz`                                                                        | `int`      |
-| **`hour`**        | Extract a timestamp hour, as hours from 1970-01-01 00:00:00  | `timestamp`, `timestamptz`                                                                                        | `int`       |
+| **`years`**       | Extract a date or timestamp year, as years from 1970         | `date`, `timestamp`, `timestamptz`                                                                        | `int`       |
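
For readers unfamiliar with the spec rows above: each time transform maps a value to a whole number of units elapsed since the Unix epoch. The following is a minimal Java sketch of that arithmetic using only `java.time`. The class `EpochTransforms` and its method names are hypothetical and are not Iceberg APIs, and the sketch assumes inputs on or after 1970-01-01; earlier values are not covered here.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper illustrating the spec wording; not part of Iceberg.
final class EpochTransforms {
  private static final LocalDate EPOCH_DATE = LocalDate.ofEpochDay(0);
  private static final LocalDateTime EPOCH_TIME = LocalDateTime.of(1970, 1, 1, 0, 0);

  // "year": whole years elapsed since 1970
  static int year(LocalDate date) {
    return (int) ChronoUnit.YEARS.between(EPOCH_DATE, date);
  }

  // "month": whole months elapsed since 1970-01-01
  static int month(LocalDate date) {
    return (int) ChronoUnit.MONTHS.between(EPOCH_DATE, date);
  }

  // "day": days elapsed since 1970-01-01
  static int day(LocalDate date) {
    return (int) ChronoUnit.DAYS.between(EPOCH_DATE, date);
  }

  // "hour": whole hours elapsed since 1970-01-01 00:00:00
  static int hour(LocalDateTime timestamp) {
    return (int) ChronoUnit.HOURS.between(EPOCH_TIME, timestamp);
  }

  public static void main(String[] args) {
    LocalDate opened = LocalDate.of(2023, 7, 31);
    System.out.println("year=" + year(opened));
    System.out.println("month=" + month(opened));
    System.out.println("day=" + day(opened));
    System.out.println("hour=" + hour(LocalDateTime.of(2023, 7, 31, 18, 40, 16)));
  }
}
```

Whichever spelling an engine exposes (`year` or `years`), the value stored for the partition is the same integer; the naming question in this thread only concerns what users type in DDL.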


This might impact Trino, Dremio, and other engines whose naming conventions follow the spec.

https://trino.io/docs/current/connector/iceberg.html#partitioned-tables
https://docs.dremio.com/cloud/reference/sql/commands/create-table/

IMO, it is better to change the Spark transform names and deprecate the old ones.

cc: @aokolnychyi, @RussellSpitzer

I can do that.

Would it be useful to keep the unintended "backward" compatibility, though? Or do you think we should just force the Spark API to match the spec?

We need to maintain backward compatibility. I would suggest adding the singular options while keeping the plural ones. They don't conflict, and accepting both keeps people from running into errors. I'm also curious what others think.
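
The suggestion to accept both spellings can be sketched as a lookup where the singular and plural names fall through to the same result. This is only an illustration under stated assumptions: `TransformNames`, `TimeTransform`, and `resolve` are hypothetical names, and this is not the code in `Spark3Util.java`.

```java
import java.util.Locale;

// Hypothetical sketch: map singular and plural transform names to one result.
final class TransformNames {
  enum TimeTransform { YEAR, MONTH, DAY, HOUR }

  static TimeTransform resolve(String name) {
    switch (name.toLowerCase(Locale.ROOT)) {
      case "year":
      case "years":
        return TimeTransform.YEAR;
      case "month":
      case "months":
        return TimeTransform.MONTH;
      case "day":
      case "days":
        return TimeTransform.DAY;
      case "hour":
      case "hours":
        return TimeTransform.HOUR;
      default:
        // Unknown names fail loudly instead of being silently ignored.
        throw new IllegalArgumentException("Unsupported transform: " + name);
    }
  }

  public static void main(String[] args) {
    // Both spellings resolve to the same transform, so existing DDL keeps working.
    System.out.println(resolve("years") == resolve("year"));
  }
}
```

Because the old names still resolve, nothing that uses the plural form breaks, which is the backward-compatibility property discussed above.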

I added singular options to each Spark3 version. How does this look?

I can add some tests as well if you think that would be useful.

Also, should I change the prefix of this PR to Spark: ?

> I can add some tests as well if you think that would be useful.

You can add a test in TestAlterTablePartitionFields to make sure that we don't break this in the future. It could be something similar to testSparkTableAddDropPartitions, but using the singular transform names.

> Also, should I change the prefix of this PR to Spark: ?

Yes please :)

For the tests, what if I replicate these but for some or all of year, month, hour, and day?

Yes, that would be perfect

Added the tests.

I ran ./gradlew :iceberg-spark:iceberg-spark-extensions-3.4_2.12:test --tests TestAlterTablePartitionFields locally and saw the tests pass. LMK if I can do anything else here (I'll try to catch up on the contributing guidelines tomorrow). Thanks!

spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/Spark3Util.java

spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/Spark3Util.java

Fokko

I also ran into this, more than once :3 Would be good to get this in! Thanks for working on this, @clettieri.

RussellSpitzer · 2023-08-02T21:50:07Z

I would want @rdblue to check this. If I remember correctly, we originally did this because the base implementation happened to be wrong, so we had to add another implementation that accounted for 0 correctly.

The other concern here is that the non-plural names are also Spark native functions. Is that an issue?

nastra · 2023-09-13T06:00:48Z

...ensions/src/test/java/org/apache/iceberg/spark/extensions/TestAlterTablePartitionFields.java

+    PartitionSpec expected =
+        PartitionSpec.builderFor(table.schema()).withSpecId(1).year("ts").build();
+
+    Assert.assertEquals("Should have new spec field", expected, table.spec());


It would be great not to add more JUnit4-style assertions (even if the underlying class already uses them). Can you please convert those to AssertJ? That will make migration to JUnit5 easier. See https://iceberg.apache.org/contribute/#testing for some details.

@nastra I updated the tests; let me know what you think.

aokolnychyi · 2023-09-19T19:30:49Z

@clettieri, sorry it took so long to get back to this PR.

Could you please add tests showing that the new name is also supported in CREATE TABLE statements? I believe this change only covers ALTER TABLE. In addition, this PR should be rebased and include Spark 3.5.

clettieri · 2023-09-20T17:30:20Z

> @clettieri, sorry it took so long to get back to this PR.
>
> Could you, please, add tests that the new name is also supported in CREATE TABLE statements too? I believe this change only covers ALTER TABLE. In addition, this PR should be rebased and include Spark 3.5.

Hey @aokolnychyi, I can rebase and include Spark 3.5 👍🏼 .

Regarding the CREATE TABLE tests, though, I don't see any for the previous transform functions, and I'm unclear what new functionality they would test compared to what we have now. The current tests seem sufficient IMO to validate that the transform functions can be called. Am I missing something here? Should I create a new test class to verify that creating tables with these transform functions succeeds?

Thanks!

aokolnychyi · 2023-09-20T22:40:18Z

@clettieri, what about TestCreateTable? Unlike what is covered by new tests, we should also check the new syntax is supported by Spark without extensions.

aokolnychyi · 2023-09-23T00:52:00Z

Looks like there are some spotless issues that fail the build.

clettieri · 2023-09-23T10:27:25Z

> Looks like there are some spotless issues that fail the build.

Yep, I see that. I'll try to get to that on Monday. I'm a bit new to the JVM/Spark world and was having some trouble switching Spark environments to run Gradle locally. :)

aokolnychyi · 2023-09-24T00:37:04Z

Thank you, @clettieri!

rdblue · 2023-09-24T20:39:41Z

I fixed a typo and opened clettieri#1 with the spotless changes.

Apply spotless

clettieri · 2023-09-25T10:44:22Z

> I fixed a typo and opened clettieri#1 with the spotless changes.

Thank you for saving me the time this morning :)

nastra · 2023-09-25T12:56:48Z

thanks @clettieri

ajantha-bhat · 2023-09-25T13:38:29Z

Thanks for fixing. I couldn't take another look at it earlier.
LGTM.

But we missed updating the documentation in https://github.com/apache/iceberg/blob/master/docs/spark-ddl.md.

@nk1506 would you like to work on it?

apache#8192 fixed the code, but it seems the documentation is not in sync. Hence the follow-up PR.

ajantha-bhat reviewed Aug 1, 2023

clettieri force-pushed the docs-fix-partition-transforms branch from 433c8bf to 10ddc9b Compare August 1, 2023 13:41

github-actions bot added the spark label Aug 1, 2023

Fokko reviewed Aug 1, 2023

spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/Spark3Util.java (outdated)

Fokko reviewed Aug 1, 2023

spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/Spark3Util.java (outdated)

clettieri changed the title ~~Docs: Correct partition transform functions on spec.md~~ Spark: Correct partition transform functions on spec.md Aug 1, 2023

clettieri changed the title ~~Spark: Correct partition transform functions on spec.md~~ Spark: Correct partition transform functions to match spec.md Aug 1, 2023

Fokko approved these changes Aug 2, 2023

Fokko added this to the Iceberg 1.4.0 milestone Aug 2, 2023

nastra requested a review from rdblue September 13, 2023 05:58

nastra reviewed Sep 13, 2023

clettieri added 8 commits September 20, 2023 13:32

add non-plural transform function names

1d6776d

typos

c20e700

typo

d95d4c3

add tests for singular partition transform functions

ef6a352

update tests to assertj

deef41a

add spark 3.5

9a62170

use new createTable() in tests

a886a6a

formatting

0b9c76e

clettieri force-pushed the docs-fix-partition-transforms branch from 569e5bf to 0b9c76e Compare September 20, 2023 17:59

create table test

cc49a0a

rdblue approved these changes Sep 24, 2023

rdblue added 2 commits September 24, 2023 13:29

Add missing : to fix syntax error.

c13c989

Apply spotless.

3bcdd3d

Merge pull request #1 from rdblue/docs-fix-partition-transforms

60b77f6

Apply spotless

nastra approved these changes Sep 25, 2023

nastra merged commit c0bed74 into apache:master Sep 25, 2023
37 checks passed

nk1506 added a commit to nk1506/iceberg that referenced this pull request Sep 25, 2023

Docs: Update spark partition transform as per spec.

4f5d78f

nk1506 mentioned this pull request Sep 25, 2023

Docs: Update spark partition transform as per spec. #8640

Merged

nk1506 added a commit to nk1506/iceberg that referenced this pull request Sep 25, 2023

Docs: Update spark partition transform as per spec.

129f240

nk1506 added a commit to nk1506/iceberg that referenced this pull request Sep 25, 2023

Docs: Update spark partition transform as per spec.

0787e08
