Skip to content

AVRO-1938: Generate Parsing Canonical Forms of Schema#1167

Merged
kojiromike merged 4 commits intoapache:masterfrom
subhashb:1938-generate-canonical-form
Apr 5, 2021
Merged

AVRO-1938: Generate Parsing Canonical Forms of Schema#1167
kojiromike merged 4 commits intoapache:masterfrom
subhashb:1938-generate-canonical-form

Conversation

@subhashb
Copy link
Copy Markdown
Contributor

@subhashb subhashb commented Mar 30, 2021

This PR adds support for generating Parsing Canonical Forms of Avro Schemas to the main avro package.

The bulk of the work was done by @kojiromike and @forsberg. This PR cleans up code where necessary, adds more test cases, and clarifies on transformations wherever not applicable in Python (ex. Transformation of integers with leading zeros)

Closes: https://issues.apache.org/jira/browse/AVRO-1938
Closes: #700

Make sure you have checked all steps below.

Jira

Tests

  • My PR adds the following unit tests:
    Test cases in test/test_schema.py::CanonicalFormTestCase

Commits

  • My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    Test cases act as stand-ins for the feature for now. Comprehensive documentation needs to be added later.

This PR adds support for generating Parsing Canonical Forms of Avro Schemas
to the main `avro` package.

The bulk of work was done by @kojiromike and @forsberg. This PR cleans up code
where necessary, adds more test cases, and clarifies on transformations where
not applicable in Python (ex. Transformation of integers with leading zeros)

Closes: https://issues.apache.org/jira/browse/AVRO-1938
@subhashb subhashb mentioned this pull request Mar 30, 2021
5 tasks
This commit ensures that `names` is an optional parameter in all subclass
implementations of `to_canonical_json`. It also addresses minor formatting
issues and review comments to previous commits.
@subhashb
Copy link
Copy Markdown
Contributor Author

subhashb commented Mar 31, 2021

@kojiromike Would you prefer adding fingerprinting support in a subsequent PR, or should it be tacked to this PR itself?

I am almost ready with fingerprinting changes. If we would like to let this PR bake for a bit, I can add fingerprinting support to the same PR. Or we can pull this in, and I can raise a separate PR.

Please let me know.

@subhashb
Copy link
Copy Markdown
Contributor Author

Fingerprinting support is now done and ready to push. 👍

@property
def canonical_properties(self):
props = self.props
return collections.OrderedDict(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just use a dictionary here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While keys are inserted in the order specified by the canonical form, dictionary order is guaranteed only in Python 3.7 and above. So I believe we still need this to be OrderedDict.

If we were to stop supporting 3.6 someday, we could revert to using a dictionary; I believe it would work without any changes.

@kojiromike
Copy link
Copy Markdown
Contributor

Please put "closes #700" in the pr description to automatically close that pr on merge.

@subhashb
Copy link
Copy Markdown
Contributor Author

subhashb commented Apr 1, 2021

@kojiromike Addressed the latest review comments. PR is good to go.
I will push fingerprinting changes in a subsequent PR.

@kojiromike kojiromike merged commit c769c43 into apache:master Apr 5, 2021
@subhashb subhashb deleted the 1938-generate-canonical-form branch April 5, 2021 14:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants