Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Renamed message used in field incorrectly treated as wire protocol break #609

Closed
rickardp opened this issue Sep 27, 2021 · 11 comments
Closed

Comments

@rickardp
Copy link

rickardp commented Sep 27, 2021

Best shown in the two new unit tests I added here
https://github.com/bufbuild/buf/compare/main...rickardp:compatible-rename-test?expand=1
With this issue I suggest that both these tests should pass. Currently, the TestRunBreakingFieldWireCompatibleMessageTypeRename test fails, which I believe is wrong.

While I am aware that renaming messages breaks dynamic types etc, for many cases it is an allowed (and needed) modification. I believe that the FIELD_WIRE_COMPATIBLE_TYPE check should allow these cases and not, as now, only compare the names of the message types.

Obviously this breaks generated code and JSON so this change would only make sense in the WIRE rule set.

@bufdev
Copy link
Member

bufdev commented Sep 27, 2021

Breaking tests have files in testdata and testdata_previous, I don't see what the tests are trying to show - can you clarify?

@rickardp
Copy link
Author

I might have missed something w.r.t the structure of the test data, but the testdata_previous has

message A {
  MessageOne one = 1;
  ...
}

message MessageOne {
  int32 one = 1;
  string two = 2;
}
...

and testdata has

message A {
  MessageOneX one = 1;
  ...
}

message MessageOneX {
  int32 one = 1;
  string two = 2;
}
...

The two messages MessageOne and MessageOneX are identical and thus this change will not break the wire protocol (unless the "Any" type is used IIRC). The WIRE check does not detect this fact and opting out of FIELD_WIRE_COMPATIBLE_TYPE this check would cause the check to miss true protocol breaks (i.e. if MessageOne and MessageOneX are indeed wire incompatible, which the second test is trying to capture).

@bufdev
Copy link
Member

bufdev commented Sep 27, 2021

You've changed the message name - there's no way for buf to detect that these are the same messages. Fields are detected within an equivalent message, buf can't account for renames - more to the point, there's no way that buf could realistically account for renames of messages.

@rickardp
Copy link
Author

rickardp commented Sep 27, 2021

Yes, that is the point. Renaming messages is a valid non-wire-breaking operation (though dangerous, which is why automatic checks such as "buf breaking" is most needed).
I believe there would indeed be a way to improve this: Protobuf in this context is more or less duck-typed. If a message gets renamed (or even changed into an already existing, but potentially compatible type) the only thing that needs to be done is to check that the previous field type is compatible with the field type. The point is, no one needs to care if the message was renamed, copied, or the field was remapped onto an existing message. The compatibility check here is at the field level, not at the message level.

You already implemented this check (sort of) for enums in the checkEnumWireCompatibleForField function, so I believe the the only thing missing is a checkGroupMessageWireCompatibleForField that applies the message wire checks recursively (not sure how the latter would be done though, as I just glanced the code quickly).

@bufdev
Copy link
Member

bufdev commented Sep 27, 2021

We aren't going to change this unfortunately - fields are detected within a given message, and a message is identified with it's name.

@bufdev bufdev closed this as completed Sep 27, 2021
@rickardp
Copy link
Author

Please see my updated comment. This is a field check, not a message check. I believe renaming/re-mapping messages are one of the potentially most error-prone things you can do in protobuf (one could argue you shouldn't of course, but that's not the point here). When doing this, having tooling support is very important.

@rickardp
Copy link
Author

rickardp commented Sep 27, 2021

And also, may I ask, why do you do this for enums if renames are out of scope? (#80)

@bufdev
Copy link
Member

bufdev commented Sep 27, 2021

Enums are just a list of enum values, and are actually compatible with integers in terms of serialization - messages have many more properties. We felt it was reasonable to say that enums can move around, but for messages, the concept of equivalence on rename is not as similar. Message are identified by their name, while enums (in effect) are identified by the list of their values.

@rickardp
Copy link
Author

rickardp commented Sep 27, 2021

Thank you for explaining your reasoning.

Message are identified by their name, while enums (in effect) are identified by the list of their values.

While I see your point, IMHO this is not entirely correct in terms of the wire protocol. The wire protocol identifies a message by the collection of their fields, the name itself has no meaning in the wire protocol (except in some edge cases).

I tried to implement such a check now, and while it is possible and solves the usecases I had (mimicing the logic of the enum check), I can totally understand if this something you do not want to maintain (specifically, one would have to reevaluate the relevant wire rules recursively). If there is any interest, I could try to write up a PR (assuming it has any chance of getting merged once it meets your standards).

I understand that buf is slightly opinionated in how and what should be done in terms of when to do breaking changes (among other things), but I also believe that you are very close to making the WIRE checks work according to the protobuf official documentation.

@bufdev
Copy link
Member

bufdev commented Sep 28, 2021

Fundamentally, this isn’t a matter of correctness - we would argue that trying to detect equivalent messages of different names would lead to more breaking issues, not less. We think sticking to message name properly implements breaking change detection. We would not accept a PR that changes this.

@rickardp
Copy link
Author

To clarify, I was suggesting that my proposed check was only run after a failed name check, instead of reporting it as an issue, so it should never produce more issues. But I see two downsides of my suggestion, added complexity and less intuitive error messages in some situations (since it is impossible to detect if it was a name change or a swap to an unrelated type). I can understand if you do not want this added complexity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants