Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialize default values in oneofs when calling to_dict() or to_json() #110

Merged
merged 12 commits into from
Jul 25, 2020
Merged

Serialize default values in oneofs when calling to_dict() or to_json() #110

merged 12 commits into from
Jul 25, 2020

Conversation

bradykieffer
Copy link
Contributor

This change is consistent with the official protobuf implementation. If
a default value is set when using a oneof, and then a message is
translated from message -> JSON -> message, the default value is kept in
tact. Also, if no default value is set, they remain null.

This change is consistent with the official protobuf implementation. If
a default value is set when using a oneof, and then a message is
translated from message -> JSON -> message, the default value is kept in
tact. Also, if no default value is set, they remain null.
@boukeversteegh
Copy link
Collaborator

Awesome! Thank you for your contribution and investigation! 🐍👍

So I understand that you're isolating the case for groups (oneof) and making sure that default values are also serialized there.
This leaves still open the matter of whether to output explicitly set values as per #95 , which we can leave out of scope for now.

I'm curious about two things:

@bradykieffer
Copy link
Contributor Author

This leaves still open the matter of whether to output explicitly set values as per #95 , which we can leave out of scope for now.

Yeah I think that's a good idea to leave out of scope for now. I definitely would personally bias toward either including default fields explicitly or using a type such as BoolValue so that false is sent by default. Funny thing about this package - we were literally trying to make using BoolValue less cumbersome internally really great that we can't override which lead us to finding this library 😄 .

I'm curious about two things:

* Does this also fix issue #63 ? Test case [oneof_enum](https://github.com/danielgtaylor/python-betterproto/blob/master/betterproto/tests/inputs/oneof_enum/)

* Have you seen that fixing this for `to_dict` also changes the serialization behavior or `bytes`? Because betterproto currently also doesn't write anything over the wire when the data is default (see [my comment here](https://github.com/danielgtaylor/python-betterproto/issues/95#issuecomment-653810144)), and that may be something we should fix as well (at some point).

I'll check both of these before I open the PR for review, @dhendry actually pointed out that I hadn't covered all of the relevant message types so I went ahead and did so before responding to this. So I'll go ahead and look into those and get back to you!

This will cause oneof fields with default values to explicitly be sent
to clients. Note that does not mean that all fields are serialized and
sent to clients, just those that _could_ be null and are not.
@bradykieffer
Copy link
Contributor Author

Okay so @boukeversteegh it looks like #65 should be resolved. I just had to update the test cases to reflect proto behavior. Specifically, the fields on the protobufs within those test cases shouldn't ever be null (we'd have to abuse oneofs to make them null... 😄 ). As for changing the behaviour of bytes. I actually included the default value oneof logic in there too - I think that's what we'd want here as it mimics the behavior in to_dict, we will write a oneof field with a default value explicitly now. I don't know if we want to include tracking if the field is set though, I don't think that's how it's done within Google's Protobuf implementation for Python. But, I could be wrong and if I am I think we can raise another issue for that one, I wouldn't mind picking it up at some point 😄

@bradykieffer bradykieffer marked this pull request as ready for review July 8, 2020 22:29
@boukeversteegh
Copy link
Collaborator

Thank you for your time and efforts to investigate and fix this @bradykieffer !

Comment on lines 16 to 24
message = Test()
message.from_json(get_test_case_json_data("oneof_enum", "oneof_enum-enum-0.json"))
assert message.move is None

assert message.move == Move(
x=0, y=0
) # Proto3 will default this as there is no null
assert message.signal == Signal.PASS
assert betterproto.which_one_of(message, "action") == ("signal", Signal.PASS)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change needed to solve the bug? As I understand the bug is in to_dict and not from_dict..

I think the idea that proto doesn't have null values might actually be somewhat muddled, and incorrect. Because values can be skipped when they are sent over the wire, effectively making them undefined. However, the protobuf libraries will then simply return the default value for those skipped fields. That behavior is indeed part of protobuf specification, but for oneof groups, I consider it not part of the spec that all other fields should have a default value as well (unless its actually wrtten in the spec, sorry I didn't have time to read it 😅 ). At least it doesn't make much sense to me why it would be needed.

Unless it is somehow needed for this bugfix, I would suggest to exclude that part from this PR, because its a functional change (and not sure if its an improvement). And that is a bit harder to judge than the bugfix, which we could merge quickly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is actually default behavior in Google's protobuf implementation as well. What's actually happening here is that the move field is never itself null but it isn't actually sent over the wire - calling to_dict() should only show signal there. I think what's confusing is that a oneof field can have nothing set, an attribute set to a default value, OR an attrbiute set to a non-default value. But, on the actual message object, the attributes themselves are never actually null, they're always in a default state. I replicated this behaviour using these two messages:

message Point{
    int64 x = 1;
    int64 y = 2;
}

message Test{
    oneof test_value {
        Point point = 1;
        google.protobuf.BoolValue bool_value = 2;
        int64 int_value = 3;
    }
}

Compiling with the Google proto impl I then checked the following:

In [4]: t = Test()

In [5]: t
Out[5]:

In [6]: t.point
Out[6]:

In [7]: t.point.x
Out[7]: 0
...

In [9]: type(t.point.x)
Out[9]: int

In [10]: type(t.point)
Out[10]: foo.Point

In [11]: type(t.int_value)
Out[11]: int

In [12]: t.bool_value
Out[12]:

In [13]: type(t.bool_value)
Out[13]: google.protobuf.wrappers_pb2.BoolValue

In [14]: from google.protobuf.json_format import MessageToDict

In [15]: MessageToDict(t)
Out[15]: {}

In [16]: t.point.x = 1

In [17]: MessageToDict(t)
Out[17]: {'point': {'x': '1'}}

In [18]: t.int_value = 1

In [19]: MessageToDict(t)
Out[19]: {'intValue': '1'}

In [20]: t.bool_value.value = True

In [21]: t
Out[21]:
bool_value {
  value: true
}

In [22]: MessageToDict(t)
Out[22]: {'boolValue': True}

I can either add an assert to this test case or a separate test case demonstrating that betterprotos has similar behaviour

# errors.
# TODO: Do we want to allow use of BoolValue directly within a wrapped field or
# should we simply hard fail here?
Test(wrapped_bool_value=False),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

It seems odd that passing a BoolValue to a field that is defined as such will break the library.
Of course, it was meant as a convenience that the BoolValue is not required, but breaking it goes a bit far.

It does raise some questions.
Do we always store a bool internally? Then what happens if you store a BoolValue, then read it again.. it becomes a bool. That may be confusing, and potentially cause bugs.

We may need to look into this, in a separate issue.

Its fine with me to keep the TODO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah if I had to make a choice (with just thinking about this briefly) I would err on the side of failing fast when attempting to assign a non-primitive to a wrapped field. That at least makes the usage of the field unambiguous

Comment on lines 903 to 905
v = v.from_dict(value[key])
if v is not None:
setattr(self, field_name, v)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused how from_dict ever worked properly, if message fields were not assigned to their parent before..

any idea of why this was needed, or why from_dict did actually work?

Also, lines 930-931 assign v to the field. Perhaps just assigning v= would suffice?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I realized it!

v is an existing Message object. The required behavior is that the data from the map is merged into the existing fields. I think no assignment is needed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah correct, the v = v.from_dict(value[key]) isn't needed, I'm just used to transforming objects vs mutating them that I wrote it like that! Also, it looks like I can just move lines 930-931 out an indentation layer and achieve the same effect 😄

betterproto/tests/inputs/oneof_enum/test_oneof_enum.py Outdated Show resolved Hide resolved
baz: str = betterproto.string_field(2, group="group1")

assert bytes(Foo(bar=0)) != b""
assert bytes(Foo(baz="")) == b"" # Baz is just an empty string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean its not possible to detect which_one_of if one of the groups is a String and the value was the empty string?

Same with other default values? 😕

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch - this actually works with the other default values but not for a string at the moment, I'll get a fix in 😄

…into brady/fix-default-values-oneof-serialization
@bradykieffer
Copy link
Contributor Author

Hey @boukeversteegh I just pulled in upstream changes, this branch should be ready for another pass any time!

@dhendry
Copy link

dhendry commented Jul 14, 2020

@boukeversteegh @danielgtaylor it would be amazing if we could get this in sooner rather than later.

I want to live in the brave new world of betterproto and stop having the official protobuf code gen slowly sap my will to live.

@boukeversteegh
Copy link
Collaborator

Hey @bradykieffer! Thanks so much for keeping the branch in sync. Your PR is first on the list (obviously) now. Unfortunately I need some time to further review, and I'll be busy till after this weekend. Hope you could wait a bit, or perhaps @nat-n could have a look?

What I wanted to confirm before merging is that betterproto behaves (still, or more) correctly (according to specs), intuitively and consistently. Personally speaking, as long as we are compatible with protocol and other clients, we don't necessarily need to follow other clients behavior precisely. Hope to come back to this soon!

@bradykieffer
Copy link
Contributor Author

Hey @nat-n & @boukeversteegh, just pinging again on this PR, we'd really love to start using betterproto internally!

@boukeversteegh
Copy link
Collaborator

I currently don't have time to really formalize the way betterproto should behave, and this PR seems to fix some bugs and bad behavior, so I'm going to merge it.

At some point though, we should really specify clearly how the library handles missing values, default values etcetera, and evaluate if it is logical, intuitive and consistent with specs.

Thanks you for your work and especially patience!

@boukeversteegh boukeversteegh merged commit c1a76a5 into danielgtaylor:master Jul 25, 2020
@bradykieffer bradykieffer deleted the brady/fix-default-values-oneof-serialization branch July 27, 2020 14:07
@bradykieffer
Copy link
Contributor Author

Thanks for getting this in @boukeversteegh, I definitely agree on formalizing how betterproto behaves. We could open an issue to at least track that discussion, I'd definitely love to help out where I can 😄

Gobot1234 added a commit to Gobot1234/python-betterproto that referenced this pull request Aug 6, 2020
Gobot1234 pushed a commit to Gobot1234/python-betterproto that referenced this pull request Aug 24, 2020
danielgtaylor#110)

* Serialize default values in oneofs when calling to_dict() or to_json()

This change is consistent with the official protobuf implementation. If
a default value is set when using a oneof, and then a message is
translated from message -> JSON -> message, the default value is kept in
tact. Also, if no default value is set, they remain null.

* Some cleanup + testing for nested messages with oneofs

* Cleanup oneof_enum test cases, they should be fixed

This _should_ address:
danielgtaylor#63

* Include default value oneof fields when serializing to bytes

This will cause oneof fields with default values to explicitly be sent
to clients. Note that does not mean that all fields are serialized and
sent to clients, just those that _could_ be null and are not.

* Remove assignment when populating a sub-message within a proto

Also, move setattr out one indentation level

* Properly transform proto with empty string in oneof to bytes

Also, updated tests to ensure that which_one_of picks up the set field

* Formatting betterproto/__init__.py

* Adding test cases demonstrating equivalent behaviour with google impl

* Removing a temporary file I made locally

* Adding some clarifying comments

* Fixing tests for python38
@abn abn mentioned this pull request Nov 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants