Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove runtime dependency on Protobuf #1386

Closed
aslakhellesoy opened this issue Feb 25, 2021 · 18 comments
Closed

Remove runtime dependency on Protobuf #1386

aslakhellesoy opened this issue Feb 25, 2021 · 18 comments

Comments

@aslakhellesoy
Copy link
Contributor

aslakhellesoy commented Feb 25, 2021

When we adopted protobuf as the basis for messages, we had 3 goals:

  • Be able to define messages with an IDL
  • Wide programming language support
  • JSON serialisation format (for debuggability)

Protobuf satisfies all of those, but has some drawbacks.

The primary drawback is runtime dependencies. The protoc code generator (or similar code generators in unofficial ports) generates code that depends on protobuf runtime libraries. This poses two challenges:

  • Size. Loading all this code into a browser take a long time because the resulting code is megabytes.
  • Compatibility. A project depending on a different version of protobuf than the one used by Cucumber might lead to conflicts

Another challenge with protobuf is maintenance cost. Because we want to inspect messages visually, we've decided to use the JSON serialisation format instead of the more commonly used binary representation. While JSON is "supported" by protobuf, the implementations have several bugs that we've had to work around or fix. These bugs exist because few people use protobuf with JSON.

During the weekly community meeting on 25 Feb 2021 two alternative solutions were put forward:

Keep .proto but use custom code generator

We could implement a custom code generator that parses the messages.proto file and reads a programming language-specific template file for code generation.

The code generator would generate code with an interface similar to the code generated by protoc, but without any runtime dependencies. It would only support JSON serialisation and not the binary representation.

For TypeScript for example, this could be just messages.d.ts (just the interface) and no implementation at all. We'd get type safety at compile time, and wouldn't need any runtime dependencies for serialisation (that would be handled by the stdlibrary JSON functions).

JSON Schema

(OpenAPI was suggested. OpenAPI has a lot of HTTP stuff in it that isn't relevant to us, and my understanding is that it relies on JSON Schema to define the schema of request/response payloads).

We could translate the .proto schema to JSON Schema and use existing tools to generate lightweight code. Here are some existing ones:

For dynamically typed languages such as Ruby, we wouldn't need code generation or any runtime dependencies at all.

@mpkorstanje
Copy link
Contributor

So we can generate the POJO's from the schema, how will the POJO's be serialized to JSON?

There are a few tricky things around date/time formats that may need to be looked into.

@laeubi
Copy link

laeubi commented Feb 25, 2021

I recently found the "protonature" of the messages API very useful. At least there should be public API that allows to read/write the messages from/to stream so the could be transfered-over-the-wire, e.g. I use that to "remote-debugging" cucumber runs. It should also be considered that code that currently uses the generated proto-messages have to be rewritten the generated methods do not match.

@davidjgoss
Copy link
Contributor

davidjgoss commented Feb 25, 2021

+1 for JSON schema

So we can generate the POJO's from the schema, how will the POJO's be serialized to JSON?

Are we already using jackson or gson or similar in cucumber-jvm?

There are a few tricky things around date/time formats that may need to be looked into.

I don't think we represent date(time)s in messages though, just timestamps expressed as numbers of seconds and nanos?

@davidjgoss
Copy link
Contributor

One thing to watch out for will be treatment of enum values. With OAI/JSON schema they are generally represented as the name (string), but I think with protobuf they serialise as the ordinal.

@laeubi
Copy link

laeubi commented Feb 25, 2021

Are we already using jackson or gson or similar in cucumber-jvm?

Isn't this replacing one dependency by another?

@aslakhellesoy
Copy link
Contributor Author

but I think with protobuf they serialise as the ordinal.

The binary representation uses the ordinal, but the JSON representation uses the name. (This was buggy in some protobuf implementations, and we had to fix it).

@aslakhellesoy
Copy link
Contributor Author

Isn't this replacing one dependency by another?

We’d be using the same JSON library we’re already using in other parts of the code. That library is shaded and won’t conflict if user code also uses it.

@aslakhellesoy
Copy link
Contributor Author

At least there should be public API that allows to read/write the messages from/to stream so the could be transfered-over-the-wire

Agreed. We’ll keep the stream utilities that reads/writes message objects to/from streams.

@laeubi
Copy link

laeubi commented Feb 25, 2021

Just s

We’d be using the same JSON library we’re already using in other parts of the code. That library is shaded and won’t conflict if user code also uses it.

Just some thought

  • maybe protobuf can be shaded too?
  • instead of create own parsers/generator the protobuf json support could be enhanced

We’ll keep the stream utilities that reads/writes message objects to/from streams

👍

@aslakhellesoy
Copy link
Contributor Author

maybe protobuf can be shaded too?

I think it already is.

The runtime library size problem for browser JavaScript I described above remains until the protobuf runtime is replaced with something lightweight.

Due to the cross platform nature of this project that means protobuf will have to be replaced for all platforms.

@aslakhellesoy
Copy link
Contributor Author

aslakhellesoy commented Feb 25, 2021

instead of create own parsers/generator the protobuf json support could be enhanced

That’s what we have done. We’ve submitted pull requests for existing protobuf implementations. IIRC one of the protobuf implementations had been abandoned, so we had to fork.

This is not a case of NIH.

@aslakhellesoy
Copy link
Contributor Author

@laeubi here are the two PRs to improve JSON support in the protobuf gem. They still haven't been merged, which is why we forked and released protobuf-cucumber. This has a high maintenance cost as we'll have to update our fork when/if security patches are released for example.

@ehuelsmann
Copy link
Contributor

This would remove a major complexity from the Perl implementation too.

@mattwynne
Copy link
Member

mattwynne commented May 20, 2021

There are some docs that need to be updated, around here.

@ehuelsmann
Copy link
Contributor

The current Perl implementation doesn't have a runtime dependency on protobuf. Does that mean I can remove the language:perl tag, or is there additional work that this issue has turned into now?

@aurelien-reeves
Copy link
Contributor

The current Perl implementation doesn't have a runtime dependency on protobuf. Does that mean I can remove the language:perl tag, or is there additional work that this issue has turned into now?

Thanks for the tip. I've removed the label.

Also, we may audi the ruby code, but I think it is now free from any dependency to protobuf.

@aurelien-reeves
Copy link
Contributor

@aslakhellesoy @mattwynne @mpkorstanje could we close that one?

Also it is labeled "library: react". Was that issue only dedicated for the "cucumber-react"?

@aslakhellesoy
Copy link
Contributor Author

The dotnet and elixir implementations still use protobuf (and may have fallen behind the evolution of the JSON Schema). I suggest we close this issue after creating two new issues for dotnet and elixir.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants