Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Proto3 Field Presence #50

Merged
merged 2 commits into from
Apr 20, 2021
Merged

Conversation

sneako
Copy link
Contributor

@sneako sneako commented Apr 15, 2021

Closes #49

Rather than generating functions like has_#{field_name} to track presence, as suggested in this document, I am simply using the value nil in the case where an optional field was not present. I think this should be alright, because in normal case, where fields are not optional and therefore set to the default value according to their type, none of them use nil.

The strategy I used was to change the field label to :proto3_optional when proto3_optional == true in the %Protox.Google.Protobuf.FieldDescriptorProto{} as it seemed to be the least intrusive change to make.

I also considered adding a new field to the 5 element tuple that is used internally to represent fields, rather than reusing the label field, but I think that would have been more intrusive.

Have you considered refactoring the 5-tuple to a struct? It could possibly be more future proof and easier to pass around than the tuple.

I tried to follow the instructions in the README to run the conformance tests, but it seems something has changed so I was not able to run them yet, but otherwise the new code I added is covered by my additions to the ExampleTest. I'm happy to add more tests if you'd like. EDIT: It appears that the CI runs these for you 😄

Looking forward to hearing your thoughts!

@coveralls
Copy link

coveralls commented Apr 15, 2021

Coverage Status

Coverage increased (+0.01%) to 99.561% when pulling 132f834 on sneako:add-proto3-optional into 0be185c on ahamez:master.

@sneako sneako marked this pull request as ready for review April 16, 2021 07:54
@ahamez
Copy link
Owner

ahamez commented Apr 16, 2021

First, thank you so much! I'll do a more thorough review this weekend, but at a glance, I think everything's fine!

Closes #49

Rather than generating functions like has_#{field_name} to track presence, as suggested in this document, I am simply using the value nil in the case where an optional field was not present. I think this should be alright, because in normal case, where fields are not optional and therefore set to the default value according to their type, none of them use nil.

Completely agree. Moreover, if the need arises for has_xxx functions, it should not be too difficult to add them in the future.

The strategy I used was to change the field label to :proto3_optional when proto3_optional == true in the %Protox.Google.Protobuf.FieldDescriptorProto{} as it seemed to be the least intrusive change to make.

I also considered adding a new field to the 5 element tuple that is used internally to represent fields, rather than reusing the label field, but I think that would have been more intrusive.

Have you considered refactoring the 5-tuple to a struct? It could possibly be more future proof and easier to pass around than the tuple.

No, but now that you're mentioning it, it makes sense 😅. This 5-tuple makes it quite difficult to follow the code, I was never very happy with it…

I tried to follow the instructions in the README to run the conformance tests, but it seems something has changed so I was not able to run them yet, but otherwise the new code I added is covered by my additions to the ExampleTest. I'm happy to add more tests if you'd like. EDIT: It appears that the CI runs these for you 😄

OK, I will check as well that the instructions are still correct. Is there an error message you can share?

Looking forward to hearing your thoughts!

@sneako
Copy link
Contributor Author

sneako commented Apr 17, 2021

Fantastic!

Forget what I said about the conformance tests, I went through the steps again just now and got them running, not sure what happened the other day 😄

$ mix protox.conformance --runner=../conformance/protobuf-3.15.1/conformance/conformance-test-runner
Generated escript protox_conformance with MIX_ENV=dev

CONFORMANCE TEST BEGIN ====================================

CONFORMANCE SUITE PASSED: 1302 successes, 711 skipped, 0 expected failures, 0 unexpected failures.


CONFORMANCE TEST BEGIN ====================================

CONFORMANCE SUITE PASSED: 0 successes, 119 skipped, 0 expected failures, 0 unexpected failures.

README.md Outdated
@@ -320,7 +320,7 @@ Note that protox will still correctly parse unknown fields, they just won't be a
```
It means that if you need to know if a field has been set by the sender, you just have to test if its value is `nil` or not.

* For Protobuf 3, unset optional fields are mapped to their default values, as mandated by the [Protobuf spec](https://developers.google.com/protocol-buffers/docs/proto3#default):
* For Protobuf 3, unset fields are mapped to their [default values](https://developers.google.com/protocol-buffers/docs/proto3#default). However, if you use the `optional` keyword, then unset fields will be mapped to `nil`:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could document that it requires at least protobuf 3.15.

|> Enum.map(fn {tag, _, name, kind, type} ->
single = make_single_case(msg_var, keep_set_fields, tag, name, kind, type)
delimited = make_delimited_case(msg_var, keep_set_fields, single, tag, name, kind, type)
|> Enum.map(fn field ->
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much more readable than before, thank you 😉

ahamez
ahamez previously requested changes Apr 18, 2021
Copy link
Owner

@ahamez ahamez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To test that an optional field is indeed serialized as a oneof, I wrote the following test:

defmodule OptionalProto3Test do
  use ExUnit.Case

  use Protox,
    schema: """
      syntax = "proto3";

      message Msg1 {
        optional int32 foo = 1;
      }

      message Msg2 {
        oneof _foo {
          int32 foo = 1;
        }
      }
    """

  test "An proto3 optional field is encoded as a oneof" do
    msg1 = %Msg1{foo: 1}
    msg2 = %Msg2{_foo: {:foo, 1}}

    encoded_msg1 = msg1 |> Msg1.encode!() |> :binary.list_to_bin()
    encoded_msg2 = msg2 |> Msg2.encode!() |> :binary.list_to_bin()

    assert encoded_msg1 == encoded_msg2
  end

  test "An unset proto3 optional field is not serialized" do
    explicit_nil = %Msg1{foo: nil}
    implicit_nil = %Msg1{}

    encoded_explicit_nil = explicit_nil |> Msg1.encode!() |> :binary.list_to_bin()
    encoded_implicit_nil = implicit_nil |> Msg2.encode!() |> :binary.list_to_bin()

    assert encoded_explicit_nil == <<>>
    assert encoded_implicit_nil == <<>>
  end
end

However, it fails with the following message:

  1) test An unset proto3 optional field is not serialized (OptionalProto3Test)
     test/optional_proto3_test.exs:30
     ** (KeyError) key :_foo not found in: %Msg1{__uf__: [], foo: nil}. Did you mean one of:

           * :foo

     code: encoded_implicit_nil = implicit_nil |> Msg2.encode!() |> :binary.list_to_bin()
     stacktrace:
       test/optional_proto3_test.exs:4: Msg2.encode__foo/2
       test/optional_proto3_test.exs:4: Msg2.encode!/1
       test/optional_proto3_test.exs:34: (test)

It seems that when the optional field is not explicitly set, the synthetic name (_foo) is used.

@ahamez
Copy link
Owner

ahamez commented Apr 18, 2021

Also, when looking at the documentation in descriptor.proto:

[...] Synthetic oneofs must be ordered after all "real" oneofs.

Do you think they mean in the serialized output? TBH, I don't know if it's really important…

@sneako
Copy link
Contributor Author

sneako commented Apr 19, 2021

Thanks! These are definitely good test cases to add.

I'll figure out what is going on with the failing test.

RE: Synthetic oneofs must be ordered after all "real" oneofs.

I can serialize an example message with one of the official protobuf libs and compare the binary to what this PR produces. Will report back soon 👍

@sneako
Copy link
Contributor Author

sneako commented Apr 19, 2021

Looking at the failing test case, isn't this failing because you are calling Msg2.encode/1 on a %Msg1{}? It seems correct to me that Msg2.encode/1 is looking for _foo since in this case it is not actually a synthetic oneof.

@ahamez
Copy link
Owner

ahamez commented Apr 19, 2021

Looking at the failing test case, isn't this failing because you are calling Msg2.encode/1 on a %Msg1{}? It seems correct to me that Msg2.encode/1 is looking for _foo since in this case it is not actually a synthetic oneof.

You're absolutely right! Sorry 😬 With the correct name, the test passes!

@sneako
Copy link
Contributor Author

sneako commented Apr 19, 2021

RE: The order of synthetic oneofs.
I defined these two messages:

message MsgWithOptional {
  int32 a = 1;
  optional int32 b = 2;
  int32 c = 3;
}

message MsgWithoutOptional {
  int32 a = 1;
  int32 b = 2;
  int32 c = 3;
}

Then generated the js code for them: protoc --js_out=import_style=commonjs,binary:. example.proto and then used the official js library to serialize them, writing the output to files:

var msg = new messages.MsgWithOptional();
msg.setA(1);
msg.setB(2);
msg.setC(3);
var serialized = msg.serializeBinary();
fs.writeFile("js_encoded_msg_with_optional", serialized, function(err) { console.log(err)});

var msg = new messages.MsgWithoutOptional();
msg.setA(1);
msg.setB(2);
msg.setC(3);
var serialized = msg.serializeBinary();
fs.writeFile("js_encoded_msg_without_optional", serialized, function(err) { console.log(err)});

And then checked the written bytes:

iex(4)> File.read!("js_encoded_msg_with_optional")
<<8, 1, 16, 2, 24, 3>>
iex(5)> File.read!("js_encoded_msg_without_optional")
<<8, 1, 16, 2, 24, 3>>

So it seems that serialized order of the fields does not change when using this library.

However, running a similar test in protox does yield output with a different order 🤔

    msg_with_optional = %MsgWithOptional{a: 1, b: 2, c: 3}
    msg_without_optional = %MsgWithoutOptional{a: 1, b: 2, c: 3}

    IO.inspect(MsgWithOptional.encode!(msg_with_optional) |> IO.iodata_to_binary())
    IO.inspect(MsgWithoutOptional.encode!(msg_without_optional) |> IO.iodata_to_binary())

output:

<<16, 2, 8, 1, 24, 3>>
<<8, 1, 16, 2, 24, 3>>

I'll try this out with another official implementation and report back with the results.

@sneako
Copy link
Contributor Author

sneako commented Apr 19, 2021

Got the same results with the official Ruby library as well (<<8, 1, 16, 2, 24, 3>>), so I'll make sure Protox encodes them like that in both cases as well.

@sneako
Copy link
Contributor Author

sneako commented Apr 19, 2021

According to this the field order should not matter, so we might not have to worry about this.
It seems this change of order stems from how oneof encoding functions are prepared first, before the rest of the field encoding functions.
If we wanted to make sure the order of the fields matches the encoding order, we could make some changes in DefineEncoder. What do you think?

@ahamez ahamez dismissed their stale review April 19, 2021 20:31

The test was incorrect

@ahamez
Copy link
Owner

ahamez commented Apr 19, 2021

RE: The order of synthetic oneofs.
[...]
However, running a similar test in protox does yield output with a different order 🤔

    msg_with_optional = %MsgWithOptional{a: 1, b: 2, c: 3}
    msg_without_optional = %MsgWithoutOptional{a: 1, b: 2, c: 3}

    IO.inspect(MsgWithOptional.encode!(msg_with_optional) |> IO.iodata_to_binary())
    IO.inspect(MsgWithoutOptional.encode!(msg_without_optional) |> IO.iodata_to_binary())

output:

<<16, 2, 8, 1, 24, 3>>
<<8, 1, 16, 2, 24, 3>>

This is due to how the encoder ast is defined, oneof serialization functions are defined before all other fields.

According to this the field order should not matter, so we might not have to worry about this.
It seems this change of order stems from how oneof encoding functions are prepared first, before the rest of the field encoding functions.
If we wanted to make sure the order of the fields matches the encoding order, we could make some changes in DefineEncoder. What do you think?

Agreed, it makes senses to deserialize fields in whatever order they may be serialized (and protox has been written this way, as recommended by the link you gave), this is why I find it strange they require this order on synthetic oneof.
So, yes, let's keep things as is. If really this becomes a problem in the future (e.g. conformance tests check for this), we'll just have to change the order in which fields are serialized.

One last thing before we merge : I missed this documentation block in descriptor.proto about message fields:

  // For message fields, proto3_optional doesn't create any semantic change,
  // since non-repeated message fields always track presence. However it still
  // indicates the semantic detail of whether the user wrote "optional" or not.
  // This can be useful for round-tripping the .proto file. For consistency we
  // give message fields a synthetic oneof also, even though it is not required
  // to track presence. This is especially important because the parser can't
  // tell if a field is a message or an enum, so it must always create a
  // synthetic oneof.

I would like to write a test for this case as well to make sure everything's fine, I'll try to do it tomorrow (sorry, I might seem excessively cautious, but this also why, I hope, protox is a reliable library 😅).

@ahamez
Copy link
Owner

ahamez commented Apr 20, 2021

So, I write a few more tests, and everything's fine 👏! I will add them after the merge.
@sneako Do you want to add or change something before I merge this PR?

defmodule OptionalProto3Test do
  use ExUnit.Case

  use Protox,
    schema: """
      syntax = "proto3";

      message Msg1 {
        optional int32 foo = 1;
      }

      message Msg2 {
        oneof _foo {
          int32 foo = 1;
        }
      }

      message Msg3 {
        optional Msg1 foo = 1;
      }

      message Msg4 {
        oneof _foo {
          Msg1 foo = 1;
        }
      }
    """

  test "An proto3 optional field is encoded as a oneof" do
    msg1 = %Msg1{foo: 1}
    msg2 = %Msg2{_foo: {:foo, 1}}

    encoded_msg1 = msg1 |> Msg1.encode!() |> :binary.list_to_bin()
    encoded_msg2 = msg2 |> Msg2.encode!() |> :binary.list_to_bin()

    assert encoded_msg1 == encoded_msg2
  end

  test "An proto3 synthetic oneof can be decoded as an optional field" do
    msg1 = %Msg1{foo: 1}
    msg2 = %Msg2{_foo: {:foo, 1}}

    encoded_msg2 = msg2 |> Msg2.encode!() |> :binary.list_to_bin()

    assert Msg1.decode!(encoded_msg2) == msg1
  end

  test "An unset proto3 optional field is not serialized" do
    explicit_nil = %Msg1{foo: nil}
    implicit_nil = %Msg1{}

    assert explicit_nil |> Msg1.encode!() |> :binary.list_to_bin() == <<>>
    assert implicit_nil |> Msg1.encode!() |> :binary.list_to_bin() == <<>>
  end

  test "An proto3 optional empty message field is encoded as a oneof" do
    msg3 = %Msg3{foo: %Msg1{}}
    msg4 = %Msg4{_foo: {:foo, %Msg1{}}}

    encoded_msg3 = msg3 |> Msg3.encode!() |> :binary.list_to_bin()
    encoded_msg4 = msg4 |> Msg4.encode!() |> :binary.list_to_bin()

    assert encoded_msg3 == encoded_msg4
  end

  test "An proto3 optional non-empty message field is encoded as a oneof" do
    msg3 = %Msg3{foo: %Msg1{foo: -42}}
    msg4 = %Msg4{_foo: {:foo, %Msg1{foo: -42}}}

    encoded_msg3 = msg3 |> Msg3.encode!() |> :binary.list_to_bin()
    encoded_msg4 = msg4 |> Msg4.encode!() |> :binary.list_to_bin()

    assert encoded_msg3 == encoded_msg4
  end

  test "An unset proto3 optional message field is not serialized" do
    explicit_nil = %Msg3{foo: nil}
    implicit_nil = %Msg3{}

    assert explicit_nil |> Msg3.encode!() |> :binary.list_to_bin() == <<>>
    assert implicit_nil |> Msg3.encode!() |> :binary.list_to_bin() == <<>>
  end
end

@sneako
Copy link
Contributor Author

sneako commented Apr 20, 2021

Great news and thanks for adding these tests! I just added a note to the docs about proto3_optional being available in 3.15 and up as you suggested. If that looks good to you then I think this should be ready to merge.

@ahamez ahamez merged commit ff72919 into ahamez:master Apr 20, 2021
@ahamez
Copy link
Owner

ahamez commented Apr 20, 2021

Great! I'll release a new version in a few days. Thanks for the awesome work! 👍

@sneako sneako deleted the add-proto3-optional branch April 20, 2021 21:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proto3 Field Presence
3 participants