-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AVRO-2299 Normalize Avro Standard Canonical Schema. #452
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally had a chance to review this. Thanks for your patience.
lang/java/avro/src/main/java/org/apache/avro/SchemaNormalization.java
Outdated
Show resolved
Hide resolved
lang/java/avro/src/main/java/org/apache/avro/SchemaNormalization.java
Outdated
Show resolved
Hide resolved
Hi @doug Cutting, I have modified the code based on your review comments. Thanks a lot for your commitments. Updated the Spec for properties. The idea is to pass user defined properties for adding canonical form, user defined property should not be avro reserved properties. |
There are a lot of whitespace changes in this which makes it difficult to review. Maybe revert and make only the required changes? If your editor makes whitespace changes automatically then you should disable that feature. |
Whichever works best for you. We just need a pull request that has the
minimal changes.
…On Tue, Mar 5, 2019 at 11:20 AM Rumeshkrishnan ***@***.***> wrote:
There are a lot of whitespace changes in this which makes it difficult to
review. Maybe revert and make only the required changes? If your editor
makes whitespace changes automatically then you should disable that feature.
Is it possible to revert all changes and create new commit or can i fork
newly from master and do these changes ?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#452 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAQhekk65m4gTQ0E9UNiA4C0QMpqN2RMks5vTsN9gaJpZM4az0T7>
.
|
Thank you so much @cutting for your review comments. I refactor the changes with minimal modification and what is required without formatting. Hope this help to make review better way. Kindly let me know, if any information required from my end. I have provided the Specs for The Input Schema: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good except for the two javadoc issues below.
lang/java/avro/src/main/java/org/apache/avro/SchemaNormalization.java
Outdated
Show resolved
Hide resolved
lang/java/avro/src/main/java/org/apache/avro/SchemaNormalization.java
Outdated
Show resolved
Hide resolved
@rumeshkrish What do you think about changing the order of the properties to:
This has the nice behavior that if you stop at If this is conceptually extending the list of properties for the parsing canonical form, instead of mixing other properties in to the list, maybe it helps to simplify the implementation for other languages? I'd also be in favor (https://issues.apache.org/jira/browse/AVRO-2299?focusedCommentId=16755375&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16755375) of adding a named Resolution Canonical Form to the implementation and the spec, but that could happen separately. |
lang/java/avro/src/main/java/org/apache/avro/SchemaNormalization.java
Outdated
Show resolved
Hide resolved
AVRO-2299 Get Avro Standard Canonical Schema. removed namespace from the avro reserved list. AVRO-2299 Get Avro Standard Canonical Schema. 1. additional spec for used defined properties 2. toCanonicalForm function code fix for avoid mutating static states. 2. Code changes done based on review comments. AVRO-2299 Get Avro Standard Canonical Schema. resolved git master conflicts based on https://github.com/apache/avro/pull/452/conflicts. AVRO-2299 Get Avro Standard Canonical Schema. resolved git master conflicts based on https://github.com/apache/avro/pull/452/conflicts. AVRO-2299 Get Avro Standard Canonical Schema. resolved git master conflicts based on https://github.com/apache/avro/pull/452/conflicts. AVRO-2299 Get Avro Standard Canonical Schema. resolved code based on git master conflicts based on https://github.com/apache/avro/pull/452/conflicts. AVRO-2299 Get Avro Standard Canonical Schema. resolved code based on git master conflicts based on https://github.com/apache/avro/pull/452/conflicts. AVRO-2299 Get Avro Standard Canonical Schema. resolved code based on git master conflicts. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit 3d8adfb. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit e0c6249. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit f93aafe. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit 4fe04d4. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit 6326880. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit 2f790f0. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit f18a773. AVRO-2299 Canonical Normalization code changes based on review comments and spec for properties. AVRO-2299 fully qualified named aliases with sorted order in Schema/Field.
AVRO-2299 Get Avro Standard Canonical Schema. removed namespace from the avro reserved list. AVRO-2299 Get Avro Standard Canonical Schema. 1. additional spec for used defined properties 2. toCanonicalForm function code fix for avoid mutating static states. 2. Code changes done based on review comments. AVRO-2299 Get Avro Standard Canonical Schema. resolved git master conflicts based on https://github.com/apache/avro/pull/452/conflicts. AVRO-2299 Get Avro Standard Canonical Schema. resolved git master conflicts based on https://github.com/apache/avro/pull/452/conflicts. AVRO-2299 Get Avro Standard Canonical Schema. resolved git master conflicts based on https://github.com/apache/avro/pull/452/conflicts. AVRO-2299 Get Avro Standard Canonical Schema. resolved code based on git master conflicts based on https://github.com/apache/avro/pull/452/conflicts. AVRO-2299 Get Avro Standard Canonical Schema. resolved code based on git master conflicts based on https://github.com/apache/avro/pull/452/conflicts. AVRO-2299 Get Avro Standard Canonical Schema. resolved code based on git master conflicts. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit 3d8adfb. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit e0c6249. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit f93aafe. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit 4fe04d4. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit 6326880. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit 2f790f0. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit f18a773. AVRO-2299 Canonical Normalization code changes based on review comments and spec for properties. AVRO-2299 fully qualified named aliases with sorted order in Schema/Field.
# Conflicts: # lang/java/avro/src/main/java/org/apache/avro/SchemaNormalization.java # lang/java/avro/src/test/java/org/apache/avro/TestSchemaNormalization.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Standard Canonical Form Spec
added. Kindly review this @cutting .
<section>
<title>Standard Canonical Form for Schemas</title>
<p>One of defined way to normalize the avro schema using
<em>Standard Canonical Form Transformation</em>. This involves
stripping unwanted properties and maintain same canonical
ordering. The canonical ordering involves ordering avro
reserved properties followed by custom properties if mentioned while
transforming. Normalization schema which helps to reduce the
total memory size of schema (removed unwanted properties and whitespace)
while transfer avro schema between two system and also reduce the parsing
time for compatibility check and schema evolution.
</p>
<p><em>Standard Canonical Form</em> is a transformation of a schema
into standard canonical ordered. It contains avro reserved properties
<code>"name", "type", "fields", "symbols", "items", "values",
"logicalType", "size", "order", "doc", "aliases", "default"</code>
and <em>other (custom properties)</em> schema properties.
</p>
<section>
<title>Transforming into Standard Canonical Form</title>
<p>Assuming an input schema (in JSON form) that's already
UTF-8 text for a <em>valid</em> Avro schema (including all
quotes as required by JSON), the following transformations
will produce its Standard Canonical Form:</p>
<ul>
<li> [PRIMITIVES] Convert primitive schemas to their simple
form (e.g., <code>int</code> instead of
<code>{"type":"int"}</code>).</li>
<li> [FULLNAMES] Replace short names with fullnames, using
applicable namespaces to do so. Then eliminate
<code>namespace</code> attributes, which are now redundant.</li>
<li> [STRIP] Keep only attributes that are relevant to
reserved properties, which are:
<code>type</code>, <code>name</code>,
<code>fields</code>, <code>symbols</code>,
<code>items</code>, <code>values</code>,
<code>logicalType</code>, <code>size</code>,
<code>order</code>, <code>doc</code>
<code>aliases</code> and <code>default</code>.
Strip all others user defined properties (e.g., <code>format</code>).</li>
<li> [ORDER] Order the appearance of fields of JSON objects
as follows: <code>name</code>, <code>type</code>,
<code>fields</code>, <code>symbols</code>,
<code>items</code>, <code>values</code>,
<code>logicalType</code>, <code>size</code>,
<code>order</code>, <code>doc</code>,
<code>aliases</code>, <code>default</code>.
For example, if an object has <code>type</code>,
<code>name</code>, and <code>size</code> fields, then the
<code>name</code> field should appear first, followed by the
<code>type</code> and then the <code>size</code> fields.</li>
<li> [STRINGS] For all JSON string literals in the schema
text, replace any escaped characters (e.g., \uXXXX escapes)
with their UTF-8 equivalents.</li>
<li> [INTEGERS] Eliminate quotes around and any leading
zeros in front of JSON integer literals (which appear in the
<code>size</code> attributes of <code>fixed</code> schemas).</li>
<li> [WHITESPACE] Eliminate all whitespace in JSON outside of string literals.</li>
</ul>
</section>
<section>
<title>Transforming with Custom Properties</title>
<p>In addition to the standard canonical form transformation, including
<em>custom</em> <code>Schema</code> or <code>Field</code> properties by
passing the properties names while transforming.
For example, if an object has <code>format</code>, <code>type</code>,
<code>name</code>, and <code>size</code> fields, then the
<code>name</code> field should appear first, followed by the
<code>type</code>, <code>size</code> and then <code>format</code>
(custom properties) fields.
</p>
</section>
</section>
what this mean ?
|
@cutting is it possible to add this new feature in |
…va code (#487) * AVRO-2353: Configure spotless-maven-plugin to auto format/validate Java code * AVRO-2353: Adapt code formatter to Avro default style (2 spaces) * AVRO-2353: Remove trailing spaces check from checkstyle This is a bug of eclipse-jdt-formatter so we should disable the trailing validation temporally. For more info see https://bugs.eclipse.org/bugs/show_bug.cgi?id=488229 * AVRO-2353: Auto-format existing code
We introduced automatic code formatting. For more info see the "How to contribute" page. This probably affected this PR can you please rebase it. |
The changes to the spec require some review but I won't have time in the
next few weeks.
Can someone else help out with this review, please?
Thanks,
Doug
…On Thu, Mar 21, 2019 at 4:58 AM Rumeshkrishnan ***@***.***> wrote:
@cutting <https://github.com/cutting> is it possible to add this new
feature in 1.9.0 version ?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#452 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAQheiXG44e_4tEKqHsIRBkAvSyOT4Rnks5vY3POgaJpZM4az0T7>
.
|
…erated @deprecated code by default
… be mapped to proper java types intead of long
…appropriate value
AVRO-2299 Get Avro Standard Canonical Schema. removed namespace from the avro reserved list. AVRO-2299 Get Avro Standard Canonical Schema. 1. additional spec for used defined properties 2. toCanonicalForm function code fix for avoid mutating static states. 2. Code changes done based on review comments. AVRO-2299 Get Avro Standard Canonical Schema. resolved git master conflicts based on https://github.com/apache/avro/pull/452/conflicts. AVRO-2299 Get Avro Standard Canonical Schema. resolved git master conflicts based on https://github.com/apache/avro/pull/452/conflicts. AVRO-2299 Get Avro Standard Canonical Schema. resolved git master conflicts based on https://github.com/apache/avro/pull/452/conflicts. AVRO-2299 Get Avro Standard Canonical Schema. resolved code based on git master conflicts based on https://github.com/apache/avro/pull/452/conflicts. AVRO-2299 Get Avro Standard Canonical Schema. resolved code based on git master conflicts based on https://github.com/apache/avro/pull/452/conflicts. AVRO-2299 Get Avro Standard Canonical Schema. resolved code based on git master conflicts. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit 3d8adfb. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit e0c6249. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit f93aafe. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit 4fe04d4. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit 6326880. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit 2f790f0. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit f18a773. AVRO-2299 Canonical Normalization code changes based on review comments and spec for properties. AVRO-2299 fully qualified named aliases with sorted order in Schema/Field.
AVRO-2299 Get Avro Standard Canonical Schema. removed namespace from the avro reserved list. AVRO-2299 Get Avro Standard Canonical Schema. 1. additional spec for used defined properties 2. toCanonicalForm function code fix for avoid mutating static states. 2. Code changes done based on review comments. AVRO-2299 Get Avro Standard Canonical Schema. resolved git master conflicts based on https://github.com/apache/avro/pull/452/conflicts. AVRO-2299 Get Avro Standard Canonical Schema. resolved git master conflicts based on https://github.com/apache/avro/pull/452/conflicts. AVRO-2299 Get Avro Standard Canonical Schema. resolved git master conflicts based on https://github.com/apache/avro/pull/452/conflicts. AVRO-2299 Get Avro Standard Canonical Schema. resolved code based on git master conflicts based on https://github.com/apache/avro/pull/452/conflicts. AVRO-2299 Get Avro Standard Canonical Schema. resolved code based on git master conflicts based on https://github.com/apache/avro/pull/452/conflicts. AVRO-2299 Get Avro Standard Canonical Schema. resolved code based on git master conflicts. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit 3d8adfb. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit e0c6249. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit f93aafe. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit 4fe04d4. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit 6326880. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit 2f790f0. Revert "AVRO-2299 Get Avro Standard Canonical Schema." This reverts commit f18a773. AVRO-2299 Canonical Normalization code changes based on review comments and spec for properties. AVRO-2299 fully qualified named aliases with sorted order in Schema/Field.
# Conflicts: # lang/java/avro/src/main/java/org/apache/avro/SchemaNormalization.java # lang/java/avro/src/test/java/org/apache/avro/TestSchemaNormalization.java
Hi, Did any one have some time for reviewing the documentation ? |
@rumeshkrish Something looks off with the PR. It looks like you're changing 631 files, can you fix this? Right now we're running into the same issue, and I believe that the default value should be part of the fingerprint as well. |
Yes, git forked project having problem with latest commits. I am working on
it, I will update you soon.
On Tue, 22 Oct 2019 at 12:37 PM, Fokko Driesprong ***@***.***> wrote:
@rumeshkrish <https://github.com/rumeshkrish> Something looks off with
the PR. It looks like you're changing 631 files, can you fix this? Right
now we're running into the same issue, and I believe that the default value
should be part of the fingerprint as well.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#452?email_source=notifications&email_token=ABHV6YCTQYAKAEI6FC5J5HLQP3JYFA5CNFSM4GWPIT52YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEB5ISXY#issuecomment-544901471>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABHV6YC4CPLJ4XOBLO7F37TQP3JYFANCNFSM4GWPIT5Q>
.
--
Thanks and Regards
*Rumeshkrishnan Mohan*
*Mail: *rumeshkrish@gmail.com
*Mobile:* +49 151 45725188.
*WhatsApp:* +91 94436 87507.
|
JIRA: https://issues.apache.org/jira/browse/AVRO-2299
This is new feature in JAVA component to normalise the Avro schema using canonical order and get the avro schema with reserved properties. The canonical form of schema should follow the below rules,
Avro Schema Reserved Keys:
"name", "type", "fields", "symbols", "items", "values", "logicalType", "size", "order", "doc", "aliases", "default"
As mentioned canonical form for avro types as followed by :-
Input :
{"type":"<primitive_types>", "x":"y"}
Output:
"<primitive_types>" //boolean, bytes, double, float, int, long, null, string
Canonical form for Union:
[ <avro Types>, ... ]
Canonical form for Record:
Canonical form for Enum:
Canonical form for Fixed:
Canonical form for Logical Types: