Auraescript case-sensitivity issue #128

Closed
future-highway opened this issue Nov 22, 2022 · 7 comments · Fixed by #129

Comments

@future-highway
Contributor

It seems like differences in casing in the generated code are causing property/field values to be lost.

JS/TS uses camelCase for properties while Rust uses snake_case for fields.

@krisnova confirmed this as an issue on stream. A possible solution is to relax the casing requirements with another macro that tags all the fields with #[serde(alias = "camelCaseFieldName")] as Deno uses serde for serializing/deserializing.
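For illustration, here is roughly what such a macro could emit, written by hand (a sketch; the message and field names are hypothetical):

```rust
use serde::{Deserialize, Serialize};

// Hand-written version of what the proposed macro could expand to:
// each snake_case Rust field also accepts its camelCase spelling
// when deserializing.
#[derive(Serialize, Deserialize)]
struct AllocateCellRequest {
    #[serde(alias = "cellName")]
    cell_name: String,
    #[serde(alias = "maxCpu")]
    max_cpu: u64,
}
```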

I actually have a macro for this that probably needs a little adapting before I can contribute it. I'm just a little time-constrained before the break, so I wanted to leave this issue as a note for future me. For now, if anyone is trying things out, just use single-word, all-lowercase field names (e.g., hardtoread) so the casing is the same throughout.

@neoeinstein
Contributor

I know what the issue is here, and will have a PR that fixes this anyway. There's already a solution for this that comes from pbjson. Just using the #[serde] attributes blindly is going to lead to weird inconsistencies with the canonical serialized JSON form. pbjson fixes all that.

@future-highway
Contributor Author

While we wait for the CLA bot, I took a look at pbjson. I'm a little skeptical that it will solve this issue, because I don't think the serialization at the JS/TS <-> proto boundary is the problem; I think the problem is in the JS/TS <-> Rust conversion that Deno does. I'm looking forward to being pleasantly surprised by the PR.

But I'd like to probe your comment here if you have time to respond (at your convenience), somewhat for my own knowledge, though it will also help aurae when we tackle #126.

My preferred approach for JSON serialization/deserialization has been to be liberal, to a degree, in what is accepted (i.e., camelCase, snake_case, etc.), but strict in the output (match the convention, i.e., camelCase fields, SCREAMING_SNAKE_CASE enums, etc.). Your comment reads to me like you have insight into why that might be a bad idea. I appreciate any thoughts/input you might have.
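Concretely, that policy maps onto serde attributes like this (a minimal sketch; the type and field names are hypothetical):

```rust
use serde::{Deserialize, Serialize};

// Strict on output: rename_all emits the canonical camelCase names.
// Liberal on input: alias additionally accepts the snake_case spelling.
#[derive(Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
struct Example {
    #[serde(alias = "field_name")]
    field_name: String,
}

fn main() {
    // Accepts snake_case...
    let e: Example = serde_json::from_str(r#"{"field_name":"x"}"#).unwrap();
    // ...but always writes camelCase.
    assert_eq!(serde_json::to_string(&e).unwrap(), r#"{"fieldName":"x"}"#);
}
```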

@neoeinstein
Contributor

In terms of the V8/Rust interop boundary, deno_core::op uses serde_v8 as the serializer/deserializer for non-fast-call-capable APIs. During serialization into V8, Deno doesn't really care about any TS definition file; it just creates an object in the V8 runtime with properties named as specified in the serialization implementation. Thus, if the serialization specifies snake_case names, then the object in V8 will have snake_case names.

However, there is a mismatch: serde by default generates serializers that use the field names verbatim, and prost/tonic use the Rust-standard snake_case names, while the default code generation for ts-proto uses camelCase names (though it can be configured to use the original snake_case if preferred). Because Auraescript uses the TS definition, it expects the script to use the camelCase convention, but through serialization, the actual V8 object has fields in the snake_case convention.

The same process applies on deserialization out of V8. The deserializer runs through the fields in the V8 object, but, with the default code generation, the new camelCased V8 object doesn't have any of the snake_cased fields that the derived serde deserializer is looking for. This is in part where pbjson comes in.
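To make that failure mode concrete, here is a sketch using serde_json as a stand-in for serde_v8 (both drive the same derived serde implementations; the message type is hypothetical):

```rust
use serde::{Deserialize, Serialize};

// As prost/tonic generates it: Rust-standard snake_case fields; the
// derived serde impls use the field names verbatim.
#[derive(Serialize, Deserialize, Debug)]
struct AllocateCellRequest {
    #[serde(default)]
    cell_name: String,
}

fn main() {
    // A script written against the ts-proto definitions sends camelCase:
    let from_script = r#"{ "cellName": "my-cell" }"#;
    let req: AllocateCellRequest = serde_json::from_str(from_script).unwrap();
    // The derived deserializer only knows `cell_name`; the unknown
    // camelCase property is silently ignored and the value is lost.
    assert_eq!(req.cell_name, "");
}
```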

What you'll find is that the JSON support provided by pbjson is indeed a liberal reader on deserialization: it will accept both snake_case and the canonical camelCase when deserializing JSON. When dealing with the canonical JSON form of Protobuf, it is important not to be too liberal in what you accept; otherwise you might find yourself bound to an edge case of the "liberal" reader, which would be breaking to change, and we want the protobuf to define the schema (and, through its canonical form, the valid JSON), not the edge semantics of a particular JSON reader.
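For reference, wiring pbjson into code generation looks roughly like this (a build.rs sketch; the proto paths and the `.aurae` package name are assumptions):

```rust
// build.rs
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let descriptor_path =
        std::path::PathBuf::from(std::env::var("OUT_DIR")?).join("descriptor.bin");

    // prost-build writes a file descriptor set alongside the generated types...
    prost_build::Config::new()
        .file_descriptor_set_path(&descriptor_path)
        .compile_protos(&["proto/cells.proto"], &["proto"])?;

    // ...which pbjson-build consumes to generate Serialize/Deserialize impls
    // following the canonical protobuf JSON form: camelCase on output,
    // with snake_case also accepted on input.
    let descriptor_set = std::fs::read(&descriptor_path)?;
    pbjson_build::Builder::new()
        .register_descriptors(&descriptor_set)?
        .build(&[".aurae"])?;
    Ok(())
}
```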

There may be a reason that you don't want to use pbjson: in order to have some custom serialization semantics that aren't built around plain old JSON objects. For example, if you're passing byte arrays, it would be more efficient to pass them over the FFI layer without going through the base64 canonical serialization (and this would probably require the TS harness you generate to include a MessageType.fromJSON/toJSON call with the canonical JSON semantics).

What you end up with is a problem where your gRPC API semantics place constraints on your FFI semantics, because the same data structure is used to cross multiple layers, kinda like exposing a database structure directly in the API. You may actually prefer to have a distinct type for FFI (that way you can guarantee, or require from TS, that certain fields are inhabited, even though proto3 semantically allows them to be empty). This would also let you provide the canonical JSON serialization for the gRPC API types while specifying a more efficient serialization for the FFI interaction.
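A sketch of that separation (all types and fields hypothetical):

```rust
use serde::{Deserialize, Serialize};

// The gRPC/proto3 type as generated: message fields are optional,
// so `cell` may legitimately be absent on the wire.
#[derive(Default)]
pub struct CellServiceAllocateRequest {
    pub cell: Option<Cell>,
}

#[derive(Clone, Default, Serialize, Deserialize)]
pub struct Cell {
    pub name: String,
}

// A distinct FFI type: the TS boundary can require `cell` to be
// inhabited, and its serialization can be tuned for serde_v8
// independently of the canonical protobuf JSON form.
#[derive(Serialize, Deserialize)]
pub struct AllocateCellFfi {
    pub cell: Cell,
}

impl From<AllocateCellFfi> for CellServiceAllocateRequest {
    fn from(ffi: AllocateCellFfi) -> Self {
        Self { cell: Some(ffi.cell) }
    }
}
```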

If you're not passing specialized types, and instead are primarily passing strings, integers, arrays, and objects, then the canonical JSON representation and the V8 FFI representation will overlap.

@neoeinstein
Contributor

I know I just left a wall of text there, but wanted to provide another alternate option: Drop ts-proto and use our own protoc compiler plugin to generate TypeScript definitions, the Deno TS glue, and the Rust FFI glue with appropriate V8 serialization.

I am the author of protoc-gen-prost (and protoc-gen-tonic), and have experience writing custom protoc compiler plugins in Rust. It should be pretty straightforward, and then we'd have more control over the FFI boundary and could specify it more exactly. With ts-proto, users will see the Protobuf encode, decode, toJSON, and fromJSON members on the interface types, which really have no relevance within auraescript. We'd also perhaps be able to do all the code generation in that plugin, rather than doing half with ts-proto and the other half with a proc-macro.

@future-highway
Contributor Author

Just to get some very initial thoughts down:

In my opinion, the main goal to keep in mind is that we want to stay as close to fully generated as possible for auraescript; small changes to the proto files should require at most a rebuild. That way, development speed on the core functionality isn't hindered. Having only read your comments so far, I think your solutions meet that goal, which is great.

I've got no issue with finding another solution or dropping ts-proto (it's just what I was able to find). If we keep it, we can change the settings to eliminate toJSON/fromJSON, but they seem possibly useful. (I thought I had already eliminated encode/decode.)

Considering the two alternate solutions so far, buf gives me hesitation for multiple (potentially unfounded) reasons.

  1. I would rather rely on protoc, as I think it is the standard and most widely used generator.
  2. Does it give us a way to inject attributes like tonic/prost do? If not, it may limit our ability, or force workarounds, to generate other code using proc-macros, and limit our ability to keep Rust lints on (see the build.rs sketch after this list for the kind of injection I mean).
  3. Related to 2, we've been generating slightly different outputs for auraed and auraescript. The difference has shrunk since switching to Deno, potentially to the point of not mattering, but I'm not sure we should give up the ability to diverge again. I assume this can be solved by having multiple passes of the generator instead of the proposed `aurae-proto` crate. However, you're not the first to want a single crate for the proto output (I think feat(nix): Add support for building/running with nix #95), so the benefits may be adding up.
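Re point 2, the attribute injection I have in mind is what tonic-build already exposes; a minimal build.rs sketch (the proto paths and field path are assumptions):

```rust
// build.rs
fn main() -> Result<(), Box<dyn std::error::Error>> {
    tonic_build::configure()
        // Inject derives/attributes onto all generated types...
        .type_attribute(".", "#[derive(serde::Serialize, serde::Deserialize)]")
        // ...or onto individual fields (hypothetical proto path).
        .field_attribute("aurae.Cell.cell_name", "#[serde(alias = \"cellName\")]")
        .compile(&["proto/cells.proto"], &["proto"])?;
    Ok(())
}
```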

Admittedly, I don't know much about buf except that I believe it is relatively new, and I'm happy to adapt to whatever others think is best. However, if we can use the lints from buf, whether or not we use it for anything more, I think that is a great idea.

A custom protoc plugin sounds appealing. I don't know how much effort is needed to make one, or whether we want to take on the maintenance cost, but I like that it would still use protoc.

krisnova pushed a commit that referenced this issue Nov 25, 2022
This change enables the crates to be compiled from source without
requiring the installation of `protoc` (if, for example, the crates
were published to a public crate repository for consumption by other
users).

The change introduces a new intermediate crate `aurae-proto` that
is depended upon by both `auraed` and `auraescript` rather than
having `tonic-build` re-compile the protobuf into each of the crates
independently. Generated code is checked in so that it is available
without needing any `build.rs` to invoke `npm` or other external
programs at crate compile-time.

In addition, this change fixes the issue with TypeScript and Rust
protoc-generated code not agreeing on the naming convention being
used for fields when serialized through a JSON interface. By using
`pbjson`, we correctly add the appropriate serializers/deserializers
to ensure that we get the correct canonical JSON serialization for
protobuf types.

Generating these files now relies on a program `buf` which invokes
some remote plugins to generate Rust and TypeScript code. The `Makefile`
has been updated to provide two new commands: `proto` and `proto-lint`.
`proto` runs `buf generate` to generate code from the protobuf
schemas, while `proto-lint` runs `buf lint` to provide linting recommendations
for the protobuf definitions. The linting is configurable, but I did not
update the schemas to match the `buf` defaults, nor try to configure
the linting to match the current _status quo_.

`buf` also provides a `buf format` command, which can be used to
ensure that the protobuf definition files follow a standard format,
similar to `rustfmt`, but this also has not been incorporated or run.

Fixes #128
@neoeinstein
Contributor

I am 100% behind trying to be fully generated, and agree with the idea of using the protobuf schemas as the source of truth for everything downstream.

In regard to relying on buf vs. protoc, we can do both. Buf is effectively a driver for protoc for code generation, but it also looks to build up a more complete ecosystem around protobuf. Thus, we could get rid of buf altogether and just string together the right incantations to protoc. Everything we can do via protoc, we can do via buf [1], but we can also do more: linting, formatting, and breaking-change detection among them. The key thing that I used here was "remote plugins": effectively, instead of needing to install all the various plugins locally (protoc-gen-prost, protoc-gen-tonic, protoc-gen-ts_proto, and protoc-gen-doc), we can invoke them remotely from the Buf Schema Registry. This does require network access for code generation, but if we wanted something in the middle, we could use local plugins instead; we'd just need to make sure that we install all such plugins.

In this, I'm happy to back out buf, but we may want a `Makefile` entry that installs those code-generation prerequisites, and then `make proto` could drive protoc directly as appropriate. I tend to like the additional things that buf brings, but I completely understand if those aren't compelling at this point in the project.

On the code generation side of things, I am the author of protoc-gen-prost and protoc-gen-tonic, co-maintainer of prost, and I've authored additional internal protoc plugins based off of protoc-gen-prost. I believe that, with the pattern already established, the difficulty of creating a custom plugin is not that high, and likely a bit less difficult than writing a proc-macro.

A protoc compiler plugin receives a protobuf CodeGeneratorRequest, which includes options passed on the command line [2], as well as the files that code generation is being requested for. From there, we can enumerate the input protobuf files and messages, and generate arbitrary output files through the CodeGeneratorResponse that is passed back to protoc. The hardest part in that particular endeavor is pulling comments out of the source info [3], but I've done this as well.
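A minimal plugin skeleton using prost-types shows the shape of this (a sketch; the emitted output is a placeholder):

```rust
use prost::Message;
use prost_types::compiler::{
    code_generator_response::File, CodeGeneratorRequest, CodeGeneratorResponse,
};
use std::io::{Read, Write};

fn main() -> std::io::Result<()> {
    // protoc hands the plugin a serialized CodeGeneratorRequest on stdin.
    let mut buf = Vec::new();
    std::io::stdin().read_to_end(&mut buf)?;
    let request =
        CodeGeneratorRequest::decode(buf.as_slice()).expect("invalid CodeGeneratorRequest");

    let mut response = CodeGeneratorResponse::default();
    for proto in &request.proto_file {
        // proto_file includes dependencies; file_to_generate names the
        // files this invocation should actually generate code for.
        if !request.file_to_generate.iter().any(|f| f == proto.name()) {
            continue;
        }
        // Emit arbitrary output files; the content here is a placeholder.
        response.file.push(File {
            name: Some(format!("{}.gen.ts", proto.name())),
            content: Some(format!("// generated from {}\n", proto.name())),
            ..Default::default()
        });
    }

    // The response goes back to protoc as protobuf on stdout.
    let mut out = Vec::new();
    response.encode(&mut out).expect("failed to encode response");
    std::io::stdout().write_all(&out)
}
```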

I think that a purpose-built protoc compiler plugin would give aurae the best level of control, and avoid needing to generate intermediate TypeScript, parse it, and then modify and generate more TypeScript and Rust. We'd also have explicit control over the FFI boundary for V8, rather than relying on implicit guarantees, and the proc-macro could likely be retired in favor of this plugin.

Footnotes:

[1]: You may notice that buf.gen.yaml contains some demonstration of it providing additional command-line options to the compiler plugins.
[2]: Both protoc-gen-prost and protoc-gen-tonic support all of the relevant configuration options that are provided by directly accessing prost-build or tonic-build. This was an explicit goal of these projects.
[3]: The SourceCodeInfo structure that holds these comments is stored in a parallel data structure as a sorted vector that is addressed in a non-intuitive manner.

@future-highway
Contributor Author

First of all, thank you for all the info/context.

I finally had a chance to look at #129, and since it has been merged, I think we are ok with what we have for now. Let's see how things progress and where the pain points are (if any).

Some notes:

  • The proc-macro isn't currently actually parsing any TS, so I'm not concerned about that.
  • Re pbjson: I don't know if squeezing every bit of efficiency out of the TS <-> Rust conversions is necessary, so the code it generates is likely sufficient.
