Making future types renameable #337

Open
nomeata opened this issue Apr 9, 2022 · 8 comments

Comments

@nomeata
Collaborator

nomeata commented Apr 9, 2022

Candid has support for “future types”, meaning that new versions of Candid can introduce new types in a way that old clients can safely skip over them. Here the assumption is that old clients never care about decoding such data, but due to subtyping may have to skip over them in extra record fields, or in optional types. Therefore, the Candid binary format includes the necessary information:

These measures allow the serialisation format to be extended with new types in the future, as long as their representation and the representation of the corresponding values include a length prefix matching the above scheme, and thereby allowing an older deserialiser not understanding them to skip over them. The subtyping rules ensure that upgradability is maintained in this situation, i.e., an old deserialiser has no need to understand the encoded data.
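To make the skipping mechanism concrete, here is a minimal Python sketch of how an older deserialiser might step over a future type's description using only its length prefix. The function names are hypothetical, and the sketch assumes the read position already points just past the unknown opcode.

```python
from typing import Tuple


def read_leb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 number; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if (byte & 0x80) == 0:
            return result, pos


def skip_future_type(buf: bytes, pos: int) -> int:
    """Skip a future type description (hypothetical helper).

    Assumes `pos` points just past the unknown opcode, so the next
    item is the LEB128 byte count followed by that many opaque bytes.
    """
    length, pos = read_leb128(buf, pos)
    end = pos + length
    if end > len(buf):
        raise ValueError("future type description is truncated")
    return end


# Three opaque bytes, followed by one unrelated trailing byte.
data = bytes([0x03, 0xAA, 0xBB, 0xCC, 0x6E])
next_pos = skip_future_type(data, 0)
```

The deserialiser never inspects the skipped bytes, which is exactly why it cannot copy the value into a new message if those bytes contain type table indices.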

It seems, however, that we may need old clients to do more than just skip data if we want to implement generic data or closures (#245, #291, #292). This also comes up with the IC’s HTTP Gateway Protocol, which is specified to pass a generic “token” of any Candid type back to the backend canister.

In these discussions we noticed that such generic, opaque use of Candid values is currently not possible because of future types: if the generic value is of such a future type, an old client only knows the size of the value, but doesn’t know the structure of the type description, and thus can’t copy it into a new message.

Luckily, we don’t have any future types yet, and we can refine the spec to make that possible:

  • We change

    Any such opcode is followed by an LEB128-encoded count, and then a number of bytes corresponding to this count.

    to something to the effect of

    Any such opcode is followed by an LEB128-encoded count, and then a number of bytes corresponding to this count. These bytes start with an LEB128-encoded count, followed by that many SLEB128-encoded type indices (negative for primitive types, positive for type table references). The remaining data in this type description may not contain any type table indices.

  • We also say somewhere that the semantics of a future type must not be affected by renumbering the type table entries, or merging identical table entries.
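As an illustration of the proposed layout, here is a hedged Python sketch that splits a future type description into its list of type indices and its remaining opaque bytes. The function names are hypothetical, the position is assumed to point just past the opcode, and non-negative indices are treated as type table references.

```python
from typing import List, Tuple


def read_leb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 number; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if (byte & 0x80) == 0:
            return result, pos


def read_sleb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a signed LEB128 number; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if (byte & 0x80) == 0:
            if byte & 0x40:  # sign bit of the final group is set
                result -= 1 << shift
            return result, pos


def parse_future_type(buf: bytes, pos: int) -> Tuple[List[int], bytes, int]:
    """Split a future type description into (type refs, opaque rest, next pos).

    Hypothetical helper following the proposed layout: total byte count,
    then a count of type indices, then the indices, then opaque data.
    """
    length, pos = read_leb128(buf, pos)
    end = pos + length
    if end > len(buf):
        raise ValueError("future type description is truncated")
    count, pos = read_leb128(buf, pos)
    refs = []
    for _ in range(count):
        ref, pos = read_sleb128(buf, pos)
        refs.append(ref)
    if pos > end:
        raise ValueError("type indices overrun the declared length")
    return refs, buf[pos:end], end


# Length 5: two indices (table entry 3, primitive -1), then two opaque bytes.
data = bytes([0x05, 0x02, 0x03, 0x7F, 0xAA, 0xBB])
refs, opaque, next_pos = parse_future_type(data, 0)
```

A client that only wants to skip the type can still stop after reading the outer length, so existing decoders are unaffected.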

This way, a client can generically store a value together with its (portion of the) type table, and insert the value anywhere in another Candid value, correctly merging the type tables.
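The re-indexing step could look roughly like the following Python sketch. It assumes the layout proposed above and a hypothetical `mapping` from old to new type table indices, which the client would build while copying table entries into the outgoing message.

```python
from typing import Dict, List


def write_leb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def write_sleb128(value: int) -> bytes:
    """Encode an integer as signed LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # arithmetic shift: negative values converge to -1
        sign_set = (byte & 0x40) != 0
        done = (value == 0 and not sign_set) or (value == -1 and sign_set)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


def renumber_future_type(
    refs: List[int], opaque: bytes, mapping: Dict[int, int]
) -> bytes:
    """Re-encode a future type description with renumbered table indices.

    Hypothetical helper: negative (primitive) indices are kept as they are,
    non-negative indices are looked up in `mapping`, and the opaque tail is
    copied unchanged. The outer length prefix is recomputed because the
    new indices may need a different number of bytes.
    """
    body = bytearray(write_leb128(len(refs)))
    for ref in refs:
        body += write_sleb128(ref if ref < 0 else mapping[ref])
    body += opaque
    return write_leb128(len(body)) + bytes(body)


# Table entry 3 of the incoming message becomes entry 7 of the outgoing one.
encoded = renumber_future_type([3, -1], bytes([0xAA, 0xBB]), {3: 7})
```

Because the opaque tail may not contain type table indices, copying it byte for byte is safe even though the client does not understand it.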

And it’s even backward compatible!

WDYT, @rossberg?

@nomeata nomeata changed the title from “Makeing future types renameable” to “Making future types renameable” on Apr 9, 2022
@crusso
Contributor

crusso commented Apr 9, 2022

Sounds intriguing. Are these indices given meaning by the enclosing encoding? And does the host then need to copy those entries from the input to the output encoding, extending or identifying with existing output types and re-indexing the prefix of the generic value as appropriate? And why are the primitive types necessary - their interpretation is fixed anyway, isn't it? Any two encoded values will agree on the meaning of negative indices, but may differ in the meaning (and possible range) of positive indices, IMU.

@nomeata
Collaborator Author

nomeata commented Apr 9, 2022

> Sounds intriguing. Are these indices given meaning by the enclosing encoding?

Not sure what you mean. They are type constructor arguments, just like the argument to vec etc.

> And does the host then need to copy those entries from the input to the output encoding, extending or identifying with existing output types and re-indexing the prefix of the generic value as appropriate?

Exactly. Allowing that is the goal.

> And why are the primitive types necessary - their interpretation is fixed anyway, isn't it?

For uniformity - these entries are type constructor arguments, and thus can be primitive or non-primitive.

@chenyan-dfinity
Contributor

An alternative is to make the future type self-describing: the future value itself contains a small type table. Then we don't need to change the spec.

@nomeata
Collaborator Author

nomeata commented Apr 11, 2022

Hmm, that would be relatively expensive if multiple future type nodes in the node graph refer to common, possibly large type structures.

And it wouldn't gain us anything: declaring that is already changing the spec. Existing code need not be adjusted in either case, and code that aims to handle Candid data generically has to know how to handle future types either way.

@chenyan-dfinity
Contributor

We don't need to declare that in the spec. It's defined by the specific future opcode. We only include a type table if the opcode is a type constructor. The old client only needs to know the total length, and passes the whole blob to the host.

My concern with the current proposal is that we are imposing some structure on a future type we don't know about. Even a type constructor doesn't only take type indices. For example, a record constructor takes both a field name and a type. It can be made to work with the current format, but it's a bit cumbersome.

@nomeata
Collaborator Author

nomeata commented Apr 11, 2022

> It can be made to work with the current format, but it's a bit cumbersome.

Right, that is the idea and the goal. There is plenty of space after the list of pointers.

Essentially, we separate the pointers from the non-pointer data, to make the structure traceable without knowing the meaning of the non-pointer data. Just like with GC’ed heap structures.
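The GC analogy can be sketched in Python as a plain graph traversal. This is only an illustration under assumptions: `refs_of` is a hypothetical, already-parsed view of a type table that maps each entry's index to the type indices it mentions, whether the entry is a known type or a future type with the proposed index list.

```python
from typing import Dict, List, Set


def reachable_entries(refs_of: Dict[int, List[int]], root: int) -> Set[int]:
    """Collect the type table entries reachable from `root`.

    Negative indices denote primitive types and are not table entries,
    so they are ignored. The opaque, non-pointer part of each entry is
    never inspected, much like a tracing collector ignores payload words.
    """
    seen: Set[int] = set()
    stack = [root]
    while stack:
        idx = stack.pop()
        if idx < 0 or idx in seen:
            continue
        seen.add(idx)
        stack.extend(refs_of[idx])
    return seen


# Entry 0 refers to entry 1 and a primitive; entry 1 refers back to 0
# and to entry 2; entry 3 is not reachable from entry 0.
table = {0: [1, -1], 1: [2, 0], 2: [], 3: [0]}
needed = reachable_entries(table, 0)
```

The resulting set is the portion of the type table that a client would have to store alongside a generic value in order to re-insert it elsewhere.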

I don’t think it’s too cumbersome, and certainly the cumbersomeness is worth not losing the sharing of type nodes.

@crusso
Contributor

crusso commented Apr 11, 2022

Don't we need to know the variance of the type parameters too, so that we can perform the subtype check appropriately?
Or do you apply the same argument as in our chat, that the local type in a subtype check will never be a future type, so there is no need to descend into future types in the subtype check?

However, I can imagine future scenarios where we do want to check subtyping between two foreign type tables that may contain future types, even if the hosting actor doesn't know about them.

For example, suppose we embedded the actor type as a candid binary blob within a wasm module (rather than the textual format) - say typtbl paired with service type index.
Then the management canister could check compatibility of the blobs using a candid subtype check, even if they both involved future types.

@nomeata
Collaborator Author

nomeata commented Apr 12, 2022

I didn't think about subtype checking yet. Good point. It seems there is a whole ladder of “generic features” (just skipping, as we have now; opaque copying, as proposed here; subtyping…).

I'm not sure there is any hope in doing subtyping checks with yet unknown future types, though. There could be arbitrary rules for any concrete future type.
