AVRO-3065: Introduce UUID logical type to Python implementation #1148
This PR introduces the UUID logical type implementation that was missing in the primary python implementation. A new `UUIDSchema` has been introduced, and test cases for schema and io have been updated. Closes: https://issues.apache.org/jira/browse/AVRO-3065
lang/py/avro/test/test_io.py
@@ -80,7 +80,7 @@
     '{"type": "long", "logicalType": "timestamp-micros"}',
     datetime.datetime(2000, 1, 18, 2, 2, 1, 123499, tzinfo=avro.timezones.tst)
   ),
-  ('{"type": "string", "logicalType": "uuid"}', u'12345abcd'),
+  ('{"type": "string", "logicalType": "uuid"}', u'570feebe-2bbc-4937-98df-285944e1dbbd'),
For the test case, should this be:
-  ('{"type": "string", "logicalType": "uuid"}', u'570feebe-2bbc-4937-98df-285944e1dbbd'),
+  ('{"type": "string", "logicalType": "uuid"}', uuid.UUID('570feebe-2bbc-4937-98df-285944e1dbbd')),
I'm not sure if this suggestion is correct, but it's what I would expect once UUID logical type is implemented!
Since this is a new feature, should we go with the expectation that the in-memory representation of a UUID datum is a UUID object instead of a string? Should we accept either/or when serializing, but always deserialize to one or the other? What do you think?
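To make the either/or question concrete, here is a minimal sketch of one possible answer: accept both `str` and `uuid.UUID` when serializing, and always hand back a `uuid.UUID` when deserializing. This is not code from this PR; the helper names `to_wire` and `from_wire` are hypothetical and only use the standard-library `uuid` module.

```python
import uuid


def to_wire(datum):
    """Hypothetical helper: turn a str or uuid.UUID datum into the
    canonical lowercase, hyphenated string stored in the Avro string."""
    if isinstance(datum, uuid.UUID):
        return str(datum)
    # uuid.UUID raises ValueError for a malformed string such as '12345abcd'.
    return str(uuid.UUID(datum))


def from_wire(text):
    """Hypothetical helper: always deserialize to a uuid.UUID instance."""
    return uuid.UUID(text)


value = uuid.UUID('570feebe-2bbc-4937-98df-285944e1dbbd')
# Both input forms produce the same string on the wire.
same_wire = to_wire(value) == to_wire('570feebe-2bbc-4937-98df-285944e1dbbd')
# Reading back always yields a UUID object, whichever form was written.
round_trip = from_wire(to_wire(value))
```

With this shape, existing callers that pass strings keep working on write, but they would see `uuid.UUID` objects on read, which is the backward-compatibility cost under discussion.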
Thanks for taking this on!
My comment on the test case is that if we're (finally!) introducing the UUID type to python, we might use the `uuid` type for these data (like we use `datetime` instances for the date and time logical types).
This could break code for developers that have schemas with uuid logical types and are happy to treat them as `string`s for now, though. Much like using avro-1.7.7 would happily ignore date logical types and treat them as `int`s...
If we decide there's no real advantage to using `uuid` instances, this looks good though!
Hey @RyanSkraba! Thank you for the review. Yes, I was thinking primarily about backward compatibility when I represented UUID as a string. Some aspects (and past experiences) that made string representation a better option for me:
Having said that, UUID object representation might be a better long-term option. Please let me know what you think.
This commit changes UUID validation to accept UUID values of any version, instead of being locked to version 4. Also, return `None` on failed validation instead of `False`, for consistency with others.
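As an illustration of the behaviour this commit message describes, a validator along these lines accepts a UUID string of any version and returns `None` on failure. This is a rough sketch and not the code in `UUIDSchema`; the function name `validate_uuid_string` is hypothetical, and it relies only on the standard-library `uuid` module.

```python
import uuid


def validate_uuid_string(datum):
    """Hypothetical sketch: return the datum if it parses as a UUID of
    any version, otherwise return None (rather than False)."""
    if not isinstance(datum, str):
        return None
    try:
        # No version argument is passed, so the version is not checked.
        # Note that uuid.UUID also accepts unhyphenated, braced, and
        # 'urn:uuid:' prefixed forms.
        uuid.UUID(datum)
    except ValueError:
        return None
    return datum


ok_v4 = validate_uuid_string('570feebe-2bbc-4937-98df-285944e1dbbd')
ok_v1 = validate_uuid_string(str(uuid.uuid1()))  # a version 1 UUID also passes
bad = validate_uuid_string('12345abcd')          # the old test value fails
```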
f7ec849 updates the validation to accept UUIDs of different versions, instead of being locked to version 4.
Thanks for the explanation -- your reasoning is sound. Especially since the
Hey, if nobody has any other comments, let's just merge this! Thanks for the contribution!
This PR introduces the UUID logical type implementation that was missing in the primary Python implementation. A new `UUIDSchema` has been introduced that validates UUID4 string values.
Closes: https://issues.apache.org/jira/browse/AVRO-3065