[Python] Do not sort schema fields by default #12232
@@ -60,6 +60,9 @@ class Record(with_metaclass(RecordMeta, object)):
    # This field is used to set namespace for Avro Record schema.
    _avro_namespace = None

    # Generate a schema where fields are sorted alphabetically
    _sorted_fields = False
Do we need to set this value to True to avoid compatibility problems?
My only concern is that we'll keep getting users stuck with this issue if we don't fix the default behavior.
But might other users run into new issues? This user would be fine, but others could get stuck.
When we updated Avro in 2.8.0, we saw the same thing happen on the Java side.
This should not be a problem, because the consumer has the original writer schema while reading.
@sijie can give more context; he gave me a lot of useful advice during the Avro update.
@merlimat Some test runs failed. Please take a look, thanks.
@eolivelli Putting the compatibility conversation aside, this PR is valuable because it provides a flag to control whether schema fields are sorted. That ensures Python clients and Java clients can generate the schema in the same order, which improves debuggability and readability.
good for me
+1
If we don't think about the compatibility problem, I'm OK.
I found a case involving a schema whose fields are not sorted:

    class T2(Record):
        b = Integer()
        a = Integer()
        d = String()
        c = Double()

But the schema fields of T2 without sorting are:
However, the order we expected is the field definition order of the class (I am not sure), such as
@hangc0276 Could you please help create a separate issue to track the issue you have mentioned?
### Motivation
In Avro schema, the order of fields is used in the validation process, so if we are sorting the fields, that will generate an unexpected schema for a python producer/consumer and it will make it not interoperable with Java and other clients.
(cherry picked from commit 2f3ad4d)
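To make the point about field order concrete, here is a minimal sketch of why order matters on the wire. It relies on the Avro specification's binary encoding, in which a record is written as its field values concatenated in schema order with no field names. The helper names (`zigzag_varint`, `encode_record`) are hypothetical, and only the `int` and `string` types are handled; this is an illustration, not the Pulsar client's code.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode an integer the way Avro encodes int/long: zigzag, then varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps small negatives to small positives
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_value(avro_type: str, value) -> bytes:
    """Encode a single value; only 'int' and 'string' are supported here."""
    if avro_type == "int":
        return zigzag_varint(value)
    if avro_type == "string":
        data = value.encode("utf-8")
        return zigzag_varint(len(data)) + data  # length prefix, then bytes
    raise ValueError("unsupported type in this sketch: " + avro_type)


def encode_record(fields, record) -> bytes:
    """Concatenate field encodings in schema order; no names are written."""
    return b"".join(encode_value(t, record[name]) for name, t in fields)


record = {"name": "al", "age": 30}

# Same fields, two different orders (declared vs. alphabetically sorted).
declared_order = [("name", "string"), ("age", "int")]
sorted_order = [("age", "int"), ("name", "string")]

declared_bytes = encode_record(declared_order, record)
sorted_bytes = encode_record(sorted_order, record)
print(declared_bytes, sorted_bytes)
```

Because the two byte strings differ, a reader that assumes one order cannot decode data written with the other unless it also has the writer's schema and resolves fields by name.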
@codelipenghui I created an issue to track it: apache/pulsar-client-python#48
This PR actually leads to an incompatibility between the Java client and the Python client. See the issue here and the discussion on the mailing list: https://lists.apache.org/thread/wl5rws7m0gqxc9n512llnpzf7kq5sp0j

In short, to be compatible with a POJO like:

    class User {
        String name;
        int age;
        double score;
    }

we have to give the following definition in Python:

    class User(Record):
        _sorted_fields = True
        name = String()
        age = Integer(required=True)
        score = Double(required=True)

The default values are not compatible with the Java client, so I suggest changing these default values. Please continue the discussion on the mailing list if any of you has an objection.

/cc @merlimat @gaoran10 @congbobo184 @codelipenghui @eolivelli @hangc0276 @mattisonchao

(BTW, these Python options are not well documented; I found the solution by reading the source code.)
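The effect of the two options in the definition above can be sketched without the Pulsar client. The helper below (`sketch_fields`, a hypothetical name) models two assumptions that are not confirmed by this thread: that `_sorted_fields = True` orders fields alphabetically by name, and that a field without `required=True` is emitted as a union with `"null"`. It is a rough model of the resulting schema shape, not the client's actual implementation.

```python
import json


def sketch_fields(fields, sorted_fields):
    """Build an Avro-style field list.

    fields: list of (name, avro_type, required) in declaration order.
    """
    ordered = sorted(fields, key=lambda f: f[0]) if sorted_fields else list(fields)
    result = []
    for name, avro_type, required in ordered:
        # Assumption: a non-required field becomes a nullable union.
        field_type = avro_type if required else ["null", avro_type]
        result.append({"name": name, "type": field_type})
    return result


# Defaults: declaration order, nothing marked as required.
default_user = sketch_fields(
    [("name", "string", False), ("age", "int", False), ("score", "double", False)],
    sorted_fields=False,
)

# The adjusted definition from the comment above.
adjusted_user = sketch_fields(
    [("name", "string", False), ("age", "int", True), ("score", "double", True)],
    sorted_fields=True,
)

print(json.dumps(default_user))
print(json.dumps(adjusted_user))
```

Under these assumptions, the adjusted definition yields alphabetically ordered fields with non-nullable `int` and `double` types, while the defaults yield declaration order with every field nullable.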