Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start of new types on the client #4254

Merged
merged 22 commits into from Jul 13, 2022
Merged

Start of new types on the client #4254

merged 22 commits into from Jul 13, 2022

Conversation

pbiggar
Copy link
Member

@pbiggar pbiggar commented Jul 9, 2022

This started as an attempt to move all the client types to match the backend ProgramTypes and RuntimeTypes, but it got to be a lot so I stopped early. So far, just all this really does is updates ID/TLIDs to use int32s, and fixes the fallout (and adds a few types that aren't used yet).

@pbiggar
Copy link
Member Author

pbiggar commented Jul 10, 2022

The integration tests are failing because some packages have uint64 tlids. In dev, the one package created has a 64bit tlid. In production, about half the packages have 64bit tlids.

It seems also that 1000 production toplevels have tlids above 2147483647 (all for some reason being a number like 1587041099384 but 156, 157, or 158.

Also, int_of_string is crashing so we shouldn't use it anyway. But it looks like it will make sense to switch to 64 bit ints here if possible anyway :(

@pbiggar pbiggar removed the request for review from StachuDotNet July 10, 2022 02:16
@pbiggar
Copy link
Member Author

pbiggar commented Jul 10, 2022

Wait, if we json parse 64-bit ints, what makes it to rescript? 64bit ints or strings, or floats with missing precision?

Does it matter whether they're a field in an array (as in the json representation of variants) or as ids in a dictionary).

@pbiggar
Copy link
Member Author

pbiggar commented Jul 10, 2022

Parsing 64 bit ints gets us floats with a loss of precision.

I wonder if we're safe to just convert the tlids in the DB to a lower precision (except for users editing). I don't think tlids are used except to tell the apiserver what db/handler is being deleted.

@pbiggar
Copy link
Member Author

pbiggar commented Jul 10, 2022

Parsing 64 bit ints gets us floats with a loss of precision.

I wonder if we're safe to just convert the tlids in the DB to a lower precision (except for users editing). I don't think tlids are used except to tell the apiserver what db/handler is being deleted.

They are binary serialized into ops. Ugh.

@pbiggar
Copy link
Member Author

pbiggar commented Jul 10, 2022

Related: the expression 923483483489348934 saves properly but evaluates to 923483483489349000. This is due to loss of precision when encoding Dvals as int64 in json.

OCamlTypes has a string in EInteger and FPInteger, which is why programs stay valid using OCamlTypes (when sent back and forth from the apiserver). But dvals don't! Either when sent from a trace, or when sent back from Wasm.

But, I think if we encoded tlid and int64 as strings when they're too big, we could handle this ok.

@pbiggar pbiggar force-pushed the paul/pass-functions-to-client branch from 971e1ac to bb21b0e Compare July 10, 2022 23:48
@pbiggar
Copy link
Member Author

pbiggar commented Jul 11, 2022

OK, the plan here is to serialize int64s and uint64s (including ids and tlids using strings when they're over the max that can fix in an int. That should fix a bunch of small bugs. I'll do that in a separate PR, and then in this PR I'll switch the client to using uint64s (like the backend).

@pbiggar pbiggar mentioned this pull request Jul 11, 2022
@pbiggar pbiggar force-pushed the paul/pass-functions-to-client branch from bb21b0e to 795dcc6 Compare July 12, 2022 01:39
@pbiggar
Copy link
Member Author

pbiggar commented Jul 12, 2022

This addresses all the issues by creating a UInt64 type (backed by a combination of BigInt for printing/serialization as strings, and Int64 for use in-memory), and then using UInt64 for the TLID/ID types.

Numbers over 2^53 are serialized as strings because JSON has a 53bit limit for integers being represented accurately as themselves.

@pbiggar pbiggar force-pushed the paul/pass-functions-to-client branch 2 times, most recently from 72f4b2c to 09a04b5 Compare July 12, 2022 18:40
@pbiggar pbiggar force-pushed the paul/pass-functions-to-client branch from 09a04b5 to ca198a4 Compare July 12, 2022 18:51
@pbiggar
Copy link
Member Author

pbiggar commented Jul 12, 2022

Summary and description

This has gone on a while, so I'm going to restate it all.

Originally, this started as an attempt to replace client types with the types used by the server. And indeed, this adds those types. However, after I made a few subsequent changes, it got big enough that now it's time to stop and get this merged.

The major change was changing the type of TLIDs (and IDs). The backend used uint64s, and the client used strings. Using strings led to some pretty bad situations: the backend used uint64s and so while it was mostly fine to use uint64-encoded-as-strings, in practice we snuck in lots of things that weren't uint64s.

In some cases, those were test items like ID("fake-ac-data1"). But in one place we actually put both traceIDs and IDs into the same data by wrapping the traceID in the ID constructor and pretending everything was OK. That has been fixed.

The attempt to switch to a real uint64 may have been too much work for the value we got out of it, but it's done now. I initially tried to store it as a bs-zarith U.UInt64.t type, but this was incredibly slow in practice. I also tried to store it as a JS BigInt, but using a JS type broke the comparators used in TLID.Dict (and lots of other things probably, there were thousands of broken tests).

As a result, it made more sense to come back to using uint64s-in-theory but int64s-in-practice. That is, I'm using an int64 as a uint64. Since int64 is a known type in Rescript, comparators work. Since it's simply two ints together, the performance is good.

Both uint64 and int64 use 64 bits. There wasn't tooling to just cast them between each other, so I explicitly made the translation (using the negative int64 range to hold the values above the positive int64 range).

The complexity here was converting negative int64s to positive uint64s, especially during stringification (needed for encoders and decoders). For that, I used JS BigInts - this allowed me more range, and also to use their toString functions to avoid needing to create my own.

Along the way I tried a bunch of other things, but they all came out more complex in one way: encoding it as a tuple of 2 int32, manually scaling int64s, my own version of toString, using bs-zarith, etc. All things I came up with were more complex. In retrospect, leaving them as strings and just making the type opaque might have worked, but I'd probably have missed some edge cases so this isn't a terrible outcome.

This PR also:

  • renames dval encode to ocamlDval
  • add RuntimeTypes to the client
  • recompiles on .resi file changes
  • adds tuple encoding/serialization tests
  • refactors client-side encoding/decoding tests (and TestRpcs)

@pbiggar pbiggar marked this pull request as ready for review July 12, 2022 20:58
| Rail
| NoRail

and expr =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this file include some Tuple types? it doesn't right now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it should. Wrote this before they merged, nice catch.

@StachuDotNet StachuDotNet self-requested a review July 13, 2022 15:15
Copy link
Member

@StachuDotNet StachuDotNet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it feels a bit offputting to use negative values to represent those out of bounds, but I can't think of anything better. Looks good!

@pbiggar
Copy link
Member Author

pbiggar commented Jul 13, 2022

it feels a bit offputting to use negative values to represent those out of bounds, but I can't think of anything better.

Yeah. The other option is to keep the type opaque and use strings. Any preferences?

@StachuDotNet
Copy link
Member

StachuDotNet commented Jul 13, 2022

it feels a bit offputting to use negative values to represent those out of bounds, but I can't think of anything better.

Yeah. The other option is to keep the type opaque and use strings. Any preferences?

Not really, they're both kinda bad :) So might as well go with the thing we've already coded.

@pbiggar pbiggar force-pushed the paul/pass-functions-to-client branch from 10035f5 to 4e80ac3 Compare July 13, 2022 22:56
@pbiggar pbiggar merged commit 30a6c23 into main Jul 13, 2022
@pbiggar pbiggar deleted the paul/pass-functions-to-client branch July 13, 2022 23:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants