-
Notifications
You must be signed in to change notification settings - Fork 83
fix(transforms): DayTransform #423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Its transform function returns an `int32`, but its `ResultType` method indicated it's a date. Per the spec, the transform function is correct: the result is an int32 number of days since epoch.
The |
Hmm, ok. The spec states that the result type of the day transform is
Where does it need to be transformed to a string? The implementations of I ran into an issue with this when using |
That's a good point, you're right. I forgot about the |
Can you create a test case that hits the problem you found and is solved by this change? |
Yep, will do. |
@@ -720,7 +720,7 @@ func (t DayTransform) MarshalText() ([]byte, error) { | |||
|
|||
func (DayTransform) String() string { return "day" } | |||
|
|||
func (DayTransform) ResultType(Type) Type { return PrimitiveTypes.Date } | |||
func (DayTransform) ResultType(Type) Type { return PrimitiveTypes.Int32 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://iceberg.apache.org/spec/#partition-transforms
day
transform result type should be int
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was originally Date
probably because of pyiceberg
https://github.com/apache/iceberg-python/blob/c84017dc6fd17da0c077504ba5d8de587563ec9c/pyiceberg/transforms.py#L645-L651
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's probably where I got it from 😄
Then we're all good here once @jhump adds a test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can probably get to adding a test early next week. Sorry for the delays. I got side-tracked with some other urgent stuff that popped up this week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to apologize :P
that's just how OSS contributions go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup yup! No rush nor need for apologizing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so I've pushed a test. But, unfortunately, it did not reproduce the issue I originally encountered.
I stepped through everything in this test with a debugger to figure out what was going on, since I was surprised it passed. It turns out that the avro encoder will happily accept an int32
value (and also accept an iceberg.Date
, since its underlying kind is int32
) even for a field with the "date" logical type. It is a little odd that it doesn't survive a round-trip: the tests writes an int32
but gets back an iceberg.Date
when reading that just-written file. But I didn't add an assertion to verify that all values round-trip correctly because (sigh 😢) none of the int32 values do: while this one gets deserialized as a time.Time
(and then converted to iceberg.Date
in dataFile.initializeMapData
), all of the others get deserialized as int
(instead of int32
).
I did a little spelunking in my own codebase, to try to figure out what error I saw when I originally encountered an issue. It turns out that my application has extra checks before writing a manifest, which verify that the partition data for each data file has the correct/expected types of values. The map had an int32
value, but because DayTransform.ResultType()
returned Date
instead of Int32
the checks was expecting a time.Time
(what would have been loaded directly from Avro, due to the "date" logical type in the schema) or an iceberg.Date
. So it was my own code that was returning an error, not code in this library.
I still think this change makes sense, despite the fact that things do actually work without it. It is possible that integrating with something that tries to read the manifest could fail since the type in the manifest doesn't quite match the expected type per the Iceberg spec. 🤷
… transform result types
Hmmm, looks like a semantic merge conflict with #427... |
@zeroshade, I finally had a second to look at this. It was a very simple/tiny fix, but I just didn't have a chance to even look at it before now. Tests and lint are green for me locally now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Its transform function returns an
int32
, but itsResultType
method indicated it's a date. Per the spec, the transform function is correct: the result is an int32 number of days since epoch.