Allow genqlient types to be marshaled safely #120

benjaminjkraft · 2021-09-29T05:07:05Z

Summary:

When genqlient generates output types, it generates whatever code is
necessary to unmarshal them. Conversely, when it generates input types,
it generates whatever code is necessary to marshal. This is all that's
needed for genqlient itself: it never needs to marshal output types or
unmarshal input types.

But maybe you do! (For example, to put the responses in a cache, which
is the use case that @csilvers hit at Khan, although there are others
one can imagine.) While we can't support every serialization format you
might want (at least not without adding plugins or some such), it's not
unreasonable to expect that since genqlient can read JSON, it can write
it too. Sadly, in the past this was not true for types requiring custom
unmarshaling logic, for several reasons.

In this commit I implement logic to always write both marshalers and
unmarshalers whenever they're needed to be able to correctly round-trip
the types, even though genqlient doesn't do so. I wasn't starting from
scratch, since of course we already write both marshalers and
unmarshalers in some cases. But this ended up requiring surprisingly
large changes on the marshaling side, mostly to correctly support
embedding (which we use for named fragments).

Specifically, as the comments in types.go discuss, the most difficult
issue is spreads with duplicate fields, which translate to Go embedded
fields which end up hidden from the json-marshaler. Ultimately, I had
to do things quite differently from unmarshaling, and essentially
flatten the type when we write marshaler. But in the end it's not so
ugly -- indeed arguably it's cleaner! Mainly it's just different.

One thing to note is that we do marshal __typename based on
what we know about the types; users need not fill it in (and if they
do we'll ignore it). This seemed to me to be a better UX, and
didn't add much complexity.

In general, I begin to wonder whether using encoding/json at all is
really right for genqlient: we're doing a lot of work to appease it,
despite knowing what our types look like. I think it would still be a
significant increase in lines of code to roll our own, but that code
would perhaps be simpler, and would surely be faster (although if we
just want the speed gains we could use another JSON-generator library,
see also #47). Anyway, something to think about in the future.

Test plan:

make tesc

@csilvers

When genqlient generates output types, it generates whatever code is necessary to unmarshal them. Conversely, when it generates input types, it generates whatever code is necessary to marshal. This is all that's needed for genqlient itself: it never needs to marshal output types or unmarshal input types. But maybe you do! (For example, to put the responses in a cache, which is the use case that @csilvers hit at Khan, although there are others one can imagine.) While we can't support every serialization format you might want (at least not without adding plugins or some such), it's not unreasonable to expect that since genqlient can read JSON, it can write it too. Sadly, in the past this was not true for types requiring custom unmarshaling logic, for several reasons. In this commit I implement logic to always write both marshalers and unmarshalers whenever they're needed to be able to correctly round-trip the types, even though genqlient doesn't do so. I wasn't starting from scratch, since of course we already write both marshalers and unmarshalers in some cases. But this ended up requiring surprisingly large changes on the marshaling side, mostly to correctly support embedding (which we use for named fragments). Specifically, as the comments in `types.go` discuss, the most difficult issue is spreads with duplicate fields, which translate to Go embedded fields which end up hidden from the json-marshaler. Ultimately, I had to do things quite differently from unmarshaling, and essentially flatten the type when we write marshaler. But in the end it's not so ugly -- indeed arguably it's cleaner! Mainly it's just different. In general, I begin to wonder whether using `encoding/json` at all is really right for genqlient: we're doing a lot of work to appease it, despite knowing what our types look like. I think it would still be a significant increase in lines of code to roll our own, but that code would perhaps be simpler, and would surely be faster (although if we just want the speed gains we could use another JSON-generator library, see also #47). Anyway, something to think about in the future. Test plan: make tesc Reviewers: marksandstrom, steve, adam, miguel, mahtab, jvoll

csilvers · 2021-09-29T12:07:23Z

generate/types.go

+// Those rules don't work for us.  When unmarshaling, we want to fill in all
+// the potentially-matching fields (QT.A.Id and QT.B.Id in this case), and when
+// marshaling, we want to always marshal exactly one potentially-conflicting
+// field; we're happy to use the Go visibility rules when they apply. such field, choosing an


I think the text got a bit garbled here.

csilvers · 2021-09-29T12:12:45Z

internal/testutil/types.go

+	if t == nil || t.IsZero() {
+		return []byte("null"), nil
+	}


Why did you add this back in?

The t == nil should be unnecessary, I just put it there for completeness. But t.IsZero() we do want, to avoid marshaling a time.Time field to "0001-01-01", since genqlient matches encoding/json in counting zero structs as nonempty for purposes of omitempty.

csilvers · 2021-09-29T12:14:27Z

I'm probably not the right person to review this, but it all seems plausible to me! It is indeed tricky stuff, but it's commented really well.

I do feel like the code would be easier to understand in some ways, actually, if you codegen-ed the json marshal/unmarshal code; you're kinda doing that already for the hard cases. But I agree it would make the size of these generated files explode -- even for the common (?) use-cases where people don't have these fancy embeds and interfaces -- which is a consideration in itself.

StevenACoffman

This took me a while to read through, and I think it's great, and we should merge it. I did wonder if we could offload some of your heavy lifting to a library... but other than possibly the one I slacked about, I didn't casually find one that was an obvious candidate.

benjaminjkraft · 2021-09-29T17:09:48Z

I do feel like the code would be easier to understand in some ways, actually, if you codegen-ed the json marshal/unmarshal code; you're kinda doing that already for the hard cases.

Yeah, that's my theory. I think there's a lot of stuff to write to handle things like omitempty as well as the basics of encoding strings, bools, numbers, etc., all of which we're avoiding now, so I think even just for the genqlient code it would be a big increase in the quantity of code. But I think it would all be much simpler. I may prototype it if I'm bored sometime, and see how it looks.

But I agree it would make the size of these generated files explode -- even for the common (?) use-cases where people don't have these fancy embeds and interfaces -- which is a consideration in itself.

Yeah, I don't know how much of an issue that will be for folks. It's probably a nonzero perf/binary-size impact, too, in that I don't think the Go compiler/linker can be very smart about pruning methods since you might (and indeed encoding/json does) call them via reflection. This PR already adds quite a bit; I think if it becomes an issue for anyone we could add an option to go back to the old behavior of generating marshalers only for input types and unmarshalers only for output types. I think generating both is the better "batteries-included" default; people who care a lot about binary size should expect to need to tune some things. But with generating the json-encoding code, we wouldn't want to have to support both ways.

I did wonder if we could offload some of your heavy lifting to a library... but other than possibly the one I slacked about, I didn't casually find one that was an obvious candidate.

Yeah, thanks for raising that option. I was thinking about how I feel I am learning more and more places where the stdlib API leaves a lot to be desired, but I didn't think to actually look for what others have done. Sadly, as I wrote on #120, as far as I can tell none of the ones we've found will help a lot with the marshaling side, where embedding is the real challenge, rather than interfaces and custom marshalers. (I mean, both require nontrivial code, but the latter is just copied from unmarshaling, so if you went to the pains to wrap your head around what we do there, it's not so bad.) My feeling is that on the whole it doesn't make sense (at least not just for the code-complexity gains; if we used go-codec the perf benefits might be worth it). But I think in any case since the biggest benefits are on the unmarshaling side, I'd want to do it as a separate PR.

dnerdy

The two-pass marshaling process is an interesting solution! A custom marshaler does seem like it could simplify the code, and that would be a good approach to explore.

If your type implements an interface, we add getter methods for the shared fields, so that those may be accessed via the interface. But it turns out occasionally it's useful to have these getter methods when they don't implement a GraphQL interface, so you can use two genqlient-generated types in the same function if they have the same fields. (This comes up most often when you have a GraphQL union that maybe should really be an interface, or if you don't yet support interfaces implementing other interfaces (indeed our parser doesn't either). But one can imagine other use cases.) We can't predict how you want to do that, so we can't generate the interface, but we can generate the methods, so you can define the interface and do a type assertion from there. Since these methods are pretty simple to generate, we just do it always. (As with #120, if binary size becomes an issue we could later add an option to only generate methods that are truly needed but including them seems like the better default.) This also fixes a subtle and rare bug, which would have become much more common (indeed existing tests caught it). Specifically, if you have a query like ```graphql fragment FragmentOne on T { id } fragment FragmentTwo on T { id } query Q { f { # interface type T ...FragmentOne ...FragmentTwo } } ``` since both `FragmentOne` and `FragmentTwo` request some common field, say `id`, we generate a method `GetId` on each one. But since `FragmentOne` and `FragmentTwo` are both on `T`, we also include their interfaces in the interface we generate for the type of `f`, `QFT`. So `QFT` includes a method `GetId`. But on the implementations, the two methods conflict, and neither gets promoted; this causes various code to fail to compile. With this change, this would have happened much more frequently -- even if only one of the two fragments is on `T`, as long as both request the field. Anyway, we now generate explicit methods on each struct for all of its recursively emebedded fields -- using the logic from #120 to compute them -- so that we don't need to rely on method-promotion. Test plan: make tesc Reviewers: steve, marksandstrom, csilvers, jvoll, adam, miguel, mahtab

## Summary: If your type implements an interface, we add getter methods for the shared fields, so that those may be accessed via the interface. But it turns out occasionally it's useful to have these getter methods when they don't implement a GraphQL interface, so you can use two genqlient-generated types in the same function if they have the same fields. (This comes up most often when you have a GraphQL union that maybe should really be an interface, or if you don't yet support interfaces implementing other interfaces (indeed our parser doesn't either). But one can imagine other use cases.) We can't predict how you want to do that, so we can't generate the interface, but we can generate the methods, so you can define the interface and do a type assertion from there. Since these methods are pretty simple to generate, we just do it always. (As with #120, if binary size becomes an issue we could later add an option to only generate methods that are truly needed but including them seems like the better default.) This also fixes a subtle and rare bug, which would have become much more common (indeed existing tests caught it). Specifically, if you have a query like ```graphql fragment FragmentOne on T { id } fragment FragmentTwo on T { id } query Q { f { # interface type T ...FragmentOne ...FragmentTwo } } ``` since both `FragmentOne` and `FragmentTwo` request some common field, say `id`, we generate a method `GetId` on each one. But since `FragmentOne` and `FragmentTwo` are both on `T`, we also include their interfaces in the interface we generate for the type of `f`, `QFT`. So `QFT` includes a method `GetId`. But on the implementations, the two methods conflict, and neither gets promoted; this causes various code to fail to compile. With this change, this would have happened much more frequently -- even if only one of the two fragments is on `T`, as long as both request the field. Anyway, we now generate explicit methods on each struct for all of its recursively emebedded fields -- using the logic from #120 to compute them -- so that we don't need to rely on method-promotion. ## Test plan: make tesc Author: benjaminjkraft Reviewers: csilvers, dnerdy, aberkan, jvoll, mahtabsabet, MiguelCastillo, StevenACoffman Required Reviewers: Approved By: csilvers, dnerdy Checks: ✅ Test (1.17), ✅ Test (1.16), ✅ Test (1.15), ✅ Test (1.14), ✅ Lint, ✅ Test (1.17), ✅ Test (1.16), ✅ Test (1.15), ✅ Test (1.14), ✅ Lint Pull Request URL: #126

benjaminjkraft self-assigned this Sep 29, 2021

benjaminjkraft requested review from aberkan, dnerdy, jvoll, mahtabsabet, MiguelCastillo and StevenACoffman September 29, 2021 05:07

benjaminjkraft mentioned this pull request Sep 29, 2021

Use a different JSON library #47

Open

Fix test failures -- dastardly template whitespace diffs!

833fb9e

benjaminjkraft mentioned this pull request Sep 29, 2021

Include __typename on interface-typed fragments #111

Closed

csilvers reviewed Sep 29, 2021

View reviewed changes

StevenACoffman approved these changes Sep 29, 2021

View reviewed changes

dnerdy approved these changes Sep 29, 2021

View reviewed changes

fix a few comments

d069bb4

benjaminjkraft merged commit f4c9810 into main Sep 29, 2021

benjaminjkraft deleted the benkraft.marshal branch September 29, 2021 17:30

benjaminjkraft mentioned this pull request Oct 1, 2021

Add getter methods to all fields, not just where needed #126

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow genqlient types to be marshaled safely #120

Allow genqlient types to be marshaled safely #120

benjaminjkraft commented Sep 29, 2021 •

edited

Loading

csilvers Sep 29, 2021

csilvers Sep 29, 2021

benjaminjkraft Sep 29, 2021

csilvers commented Sep 29, 2021

StevenACoffman left a comment

benjaminjkraft commented Sep 29, 2021

dnerdy left a comment

Allow genqlient types to be marshaled safely #120

Allow genqlient types to be marshaled safely #120

Conversation

benjaminjkraft commented Sep 29, 2021 • edited Loading

Summary:

Test plan:

csilvers Sep 29, 2021

Choose a reason for hiding this comment

csilvers Sep 29, 2021

Choose a reason for hiding this comment

benjaminjkraft Sep 29, 2021

Choose a reason for hiding this comment

csilvers commented Sep 29, 2021

StevenACoffman left a comment

Choose a reason for hiding this comment

benjaminjkraft commented Sep 29, 2021

dnerdy left a comment

Choose a reason for hiding this comment

benjaminjkraft commented Sep 29, 2021 •

edited

Loading