Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Idea: Generic data #245

Open
rossberg opened this issue Jun 14, 2021 · 11 comments
Open

Feature Idea: Generic data #245

rossberg opened this issue Jun 14, 2021 · 11 comments
Labels
enhancement New feature or request

Comments

@rossberg
Copy link
Contributor

Generic data

Motivation

Candid cannot currently describe services or functions with a type-generic interface. That is a severe limitation that both we and users keep bumping into. For example, it is not possible to describe the interface of a generic key/value store, where the value could be any form of data (either homogeneously or heterogeneously).

Below I collect some thoughts from previous discussions for possible ways to lift this restriction.

Idea 0: Parameterised type definitions

Summary: Allow type parameters in type definitions.

Examples:

type list(t) = variant { nil; cons : record {t; list(t)} }

type kv_store(t) : {
  put : (key : text, val : t) -> ();
  get : (key : text) -> (val : opt t) query;
  fill : (list : vec record {key : text; val : t}) -> ();
}

Since types are structural, this essentially is a simple macro mechanism that avoids having to repeat the same type many times with different instantiations. It's a prerequisite to make generic functions practical.

However, this requires additional checks to make sure that type recursion is well-founded and not diverging. If not restricted somehow, that is a significant extra burden on decoders (unless this is a text-only convenience, and types are duplicated in the serialised value, but that would be wasteful).

Idea 1: Dynamic data type

Summary: Extend Candid with a new type dyn, which can carry any type of data. It is represented as a nested, self-contained Candid blob, complete with its own type description. (Type-theoretically, dyn = exists X. X)

This probably is the simplest approach. It essentially assigns a special type to blobs that contain nested Candid encodings.

Example:

service gen_kv_store : {
  put : (key : text, val : dyn) -> ();
  get : (key : text) -> (val : opt dyn) query;
  fill : (list : vec record {key : text; val : dyn}) -> ();
}

Pros: very simple
Cons:

  • does not convey much intention (e.g., are all vals expected to have homogeneous type?)
  • might be inefficient by repeating the same type many times (e.g., if the dyn vals in fill all share the same type, but this type is rather large; they cannot share any type defs with the outer blob either)

Idea 2: Polymorphism encoded as dyn

Summary: Add syntactic sugar on top of dyn for expressing polymorphic functions with universally quantifed parameters. A value of parameter type is represented like dyn, but with the assumption that its internal type matches the instantiation of the type parameter.

Example:

service gen_kv_store : {
  put : (type t) (key : text, val : t) -> ();
  get : (type t) (key : text) -> (val : opt t) query;
  fill : (type t) (list : vec record {key : text; val : t}) -> ();
}

Parameter types can be referred to by other parameters.
They are not represented in the serialised argument tuple.

Pros: still relatively simple; conveys intention
Cons: maintains potential representation inefficiencies as dyn

Idea 3: Polymorphism with separately serialised types

Summary: Introduce a type type that actually is backed by a serialised type. Values of parameter type are represented as serialised Candid blobs, but without repeating the type information -- that is taken from the separately serialised type.

Example (as before):

service kv_store : {
  put : (type t) (key : text, val : t) -> ();
  get : (type t) (key : text) -> (val : opt t) query;
  fill : (type t) (list : vec record {key : text; val : t}) -> ();
}

Parameter types can be referred to by other parameters.
They are represented by a serialised type (without a value) as part of the serialised argument tuple.

Pros: conveys intention; avoids unnecessary type repetition in the wire format
Cons: somewhat more complicated to implement (not simply a nested decoder)

Idea 4: Add existential polymorphism

Both idea 2 and 3 can be combined with 1, making type dyn available for non-parametric use cases or to encode existentially quantified use cases.

Instead (or additionally), existential types could be introduced more explicitly. Two possibilities are:
(a) Also allow 'type results' right of the array: get : (key : text) -> (type t) (val : opt t) query
(b) Allow type fields in records: get : (key : text) -> opt record {type t; val : opt t} query

In the latter case, dyn becomes expressible as

type dyn = record {type t; val : t}

Thoughts?

@nomeata
Copy link
Collaborator

nomeata commented Jun 15, 2021

(unless this is a text-only convenience, and types are duplicated in the serialised value, but that would be wasteful).

I won’t shy away from starting with interface-only features that don’t affect the serialized format. It’s not more wasteful than what we would have now, where people have to write it out (or what happens if you currently export Candid from Motoko that has parametrized type definitions). And decoder simplicity is a useful goal…

Not sure how much dyn offers over blob. And it seems a “semantic restriction” that is out of scope for Candid (which at least claims to not include semantic restrictions that do not affect representation). OTOH, it’s nice and simple and may already address lots of use cases, or at least take out some of the pressure here. So worth considering.

For polymorphism encoded as dyn, can you clarify what kind of type checking would have to be done on deserializtion? Would the decoder have to check that these are valid Candid blobs, and all of the same type? Would you require identical type tables, or just equivalent one? Or maybe even allow subtyping? In that case, what is the “expected type” that our coercion function requires?

For that reason, “Polymorphism with separately serialised types” may actually be easier to implement! But still, since decoding/coercion requires an expected type, we need more detail here. Maybe the detail is “only decoding happens, but no coerction, given that thes values are only used in parametric ways anyways”?

In terms of bang-for-the-complexity-buck, the dyn is most appealing :-D

@matthewhammer
Copy link
Contributor

matthewhammer commented Jun 15, 2021

idea 4 is what I often want, especially for this project (see put operation's type, which uses a Candid AST to do the dyn idea):

Specifically, I am using a Motoko datatype to encode the fully-parsed AST of the Candid value, from which I can recover the Candid type, when needed.

I believe it would be more systematic and more efficient if this kind of thing were supported by Candid directly, somehow.

In the meantime, I created a CLI tool in Rust that pipes the AST of the existing candid parser over the wire.

The value of this AST approach (or dyn) is that an archiving service (like Candid Spaces wants to be) can accept data generically, and store it generically, as the ecosystem of canisters around it evolves.

The downside is that I need more tools to use it (The caniput tool converts to and from the AST form). If we had dyn as a candid type that a service can accept and produce, it would obviate the need for special tools like caniput, IIUC. That would be nicer, since it would fold that support into the other candid tools, and make it standardized.

@crusso
Copy link
Contributor

crusso commented Jun 15, 2021

(a) Also allow 'type results' right of the array: get : (key : text) -> (type t) (val : opt t) query
I guess that should be arrow, not array.

I'm a bit surprised by the key-value store interfaces above. The one I primarily want is a generic class that I can instantiate to specific K, V types and then use monomorphically (a homogeneous store):

service gen_kv_store<K, V>: () -> {
  put : (key : K, val : V) -> ();
  get : (key : K) -> (val : opt V) query;
  fill : (list : vec record {key : K; val : V}) -> ();
}

This would require generic services, but no generic methods per se, though we might want those too in future. Would that actually be simpler to implement?

@matthewhammer
Copy link
Contributor

matthewhammer commented Jun 15, 2021

The one I primarily want is a generic class that I can instantiate to specific K, V types and then use monomorphically (a homogeneous store)

I wonder what happens when the canister using this key-value storage class as a client undergoes an upgrade, and its definition of K or V changes to one that is not related as a subtype?

I think that's a common situation for people building apps today, as everyone is starting from a clean page and things they build change quickly.

In the meantime, only the last option (option 4) really gives a viable path for people that:

  1. want a DB-like canister to hold their stable storage data
  2. want to avoid constraints that come from stable storage types (K and V above) needing to "agree" across version changes

@rossberg
Copy link
Contributor Author

@nomeata:

Not sure how much dyn offers over blob. And it seems a “semantic restriction” that is out of scope for Candid (which at least claims to not include semantic restrictions that do not affect representation). OTOH, it’s nice and simple and may already address lots of use cases, or at least take out some of the pressure here. So worth considering.

It's a bit more than just a semantic restriction. For one, it documents a representation difference for blobs themselves: serialising a blob as type blob has a different representation than serialising it as type dyn.

Moreover, the knowledge about this type enables serialisers to handle these cases directly, i.e., without the user having to perform the nested serialisations separately. (You could also model this as an implicit conversion between blob and any other type, but that would be a hack that doesn't work in all cases, see first point.)

For polymorphism encoded as dyn, can you clarify what kind of type checking would have to be done on deserializtion? Would the decoder have to check that these are valid Candid blobs, and all of the same type? Would you require identical type tables, or just equivalent one? Or maybe even allow subtyping? In that case, what is the “expected type” that our coercion function requires?

Good question. On the caller side you'll always have a concrete type instantiation to use for deserialisation (of results). So the deserialiser would just work recursively and thereby naturally check that each dyn's internal type match the expected target type at each occurrence (naturally consider subtyping). I don't think you need any extra checking there.

The callee doesn't know the instantiation, and is assumed to behave parametrically. In that case the the serialiser should probably still check that it is a valid Candid blob, though that produces extra cost. I think it's still worth requiring? By the parametricity assumption, I don't think it should have to check anything about the relations of the types inside multiple dyns.

For that reason, “Polymorphism with separately serialised types” may actually be easier to implement! But still, since decoding/coercion requires an expected type, we need more detail here.

Maybe it's easier, thought that would be surprising. Honestly, I don't have a good intuition for that right now.

Maybe the detail is “only decoding happens, but no coerction, given that thes values are only used in parametric ways anyways”?

Yes, that would be my take exactly.

In terms of bang-for-the-complexity-buck, the dyn is most appealing :-D

Agreed, and would potentially be useful independent of more explicit quantification.

Btw, I just reused the name dyn from literature. But here its use seems closer to type any as in gradual typing. So perhaps we should find a better name. Though I'd obviously avoid any itself. The only thing that comes to my mind right now is gen.

@matthewhammer:

Specifically, I am using a Motoko datatype to encode the fully-parsed AST of the Candid value, from which I can recover the Candid type, when needed.

I see what you mean. But this is effectively an alternative representation for the dyn type inside Motoko (one that you can pattern-match over). So instead of first having to convert one type to the other, wouldn't it be nicer to provide the Dyn type in Motoko and add ways to reflect on it directly?

I believe it would be more systematic and more efficient if this kind of thing were supported by Candid directly, somehow.

Hm, not sure I follow. In what way would it show up in Candid definitions?

@crusso:

I'm a bit surprised by the key-value store interfaces above. The one I primarily want is a generic class that I can instantiate to specific K, V types and then use monomorphically (a homogeneous store):

In an ideal world I totally agree with you, but in practice I assume that many use cases will want heterogeneous storage canisters in which one can stuff what ever. The problem with a homogeneous store on this level is that you'd need to spawn a separate canister for every distinct type. That seems cumbersome and I assume doesn't really scale for most apps (especially when types are compositional and their set thus infinite), so they will instead opt for defining some kind of universal type anyway.

But I admit that this might not be the best example to demonstrate function polymorphism. I just picked it because it was simple and common. :)

@matthewhammer
Copy link
Contributor

matthewhammer commented Jun 23, 2021

But this is effectively an alternative representation for the dyn type inside Motoko (one that you can pattern-match over). So instead of first having to convert one type to the other, wouldn't it be nicer to provide the Dyn type in Motoko and add ways to reflect on it directly?

Absolutely.

Also, what is happening now is not ideal because it consists of sending an inefficient representation over the wire.

Hm, not sure I follow. In what way would it show up in Candid definitions?

What I mean is that Candid and Motoko should both permit sub-values that are self-describing, and permit pattern matching. (That is, I want the abilities that come from having the fully type-annotated AST of the candid value, but with some more efficiency than actually creating this AST and sending it over the wire, as I do now in Candid Spaces.)

I've been wanting Candid and Motoko to share a type for this kind of data (call it Dyn), as in the first and last parts of your proposal (Ideas 1 and Idea 4, IIUC).

I do not want, however, for the the service to fix these types at any point in the signature of the methods (unlike ideas 2 and 3, IIUC). By not doing so, the service API remains fixed as the services that use it "evolve" their types in and out of the generic key-value store.

As you suggest above, I'd like to use Dyn in place of Blob when the service is also permitting some extra inspection of the data, because it has a structure that is self-describing and inspectable.

I often imagine a key-value store that also permits some searching for lists of matches. In these cases, I'd want to use Dyn in place of Blob for the "generic data" (a similar idea about using Candid to encode blob data was also advocated by @kritzcreek in our last meeting, IIRC):

/// Simple logic for pattern-matching generic candid values via search criteria
type Pattern = variant {
  hasField : Text ;
  hasNat : Nat ;
  hasText : Text ;
  and : vec Pattern ;
  or : vec Pattern
};

service {
  putKeyValue : (Dyn, Dyn) -> ?();
  getValue : Dyn -> ?Dyn query;
  
  createIndex : Pattern -> ?();
  getMatches : Pattern -> ?[(Dyn, Dyn)] query;
}

@crusso
Copy link
Contributor

crusso commented Jun 23, 2021

We should all probably take another look at this before proceeding....

https://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-47.pdf

@rossberg
Copy link
Contributor Author

rossberg commented Jun 24, 2021

Yeah, that's what I based the name dyn on. Note that that paper assumes a monomorphic type system, though. There is a follow-up by the same authors plus Remy that adds polymorphic types. Spoiler: it gets complicated very fast. No wonder nobody ever implemented it.

@matthewhammer
Copy link
Contributor

Spoiler: it gets complicated very fast. No wonder nobody ever implemented it.

I don't think we are discussing the same ideas, though there seems to be a lot of overlap, and thus, a lot of confusion when I try to advocate for my own.

Specifically, I am more than satisfied with a solution that reuses the existing (value-level) pattern-matching of Motoko, via switch, and eschews this typecase feature entirely. So, for me, most of the paper is a nice exercise for some academics that is causing a lot of distraction when we discuss these ideas in our (very distinct) context.

Unlike the DEC TR with all of the famous PL people, I am advocating for (first order) candid values as monomorphic syntax trees, not a hybrid dynamic-static type system. In particular, typecase is not interesting to me. I would not use it. Instead, I'd just write functions that parse ASTs into more specific datatypes, or fail with a meaningful error that is domain-specific.

I understand that the paper assumes that typecase is the "principled" way to integrate that pattern into the language, but again, I don't want that.

To be clear, there is nothing exotic about what I want: Imagine an SML compiler's orientation toward the abstract syntax of SML --- That's precisely the relationship that I want between my programs and the candid ASTs that I want to have language (or library) support for producing and consuming (and possibly, for efficiently representing on the wire). The type for this AST may just as well be an ordinary Motoko type (from an appropriate compiler-provided def, like our other Prims).

We do not need to extend the Motoko language to support this, as Candid Spaces is not doing this and is accomplishing what it wants without special language support.

What is missing for Candid Spaces and similar projects is a missing feature in the serialization/deserialization layer that connects Candid and Motoko. In particular, there should be a way of saying that, instead of deserializing a candid-typed sub-value into a specific Motoko type, it should go into the AST monotype within Motoko for representing "Candid as syntax data".

In my proposal, there is no "type tag" here beyond perhaps, another Motoko value that says what Candid type the Candid AST would synthesize, if we were to walk over it and synthesize a type for it. For this to be possible, the Motoko value must already know the Candid type that it represents (to distinguish number types). But this Motoko value is not a Motoko type; it's yet another Candid (type) AST, represented in Motoko as a value, with ordinary pattern matching (switch).

I considered adding this AST for candid types to Candid Spaces, but it wasn't useful (yet) for anything, so I haven't. I could though, and it wouldn't be a type tag in the sense of that paper that introduces typecase.

In sum, I think the DEC paper is going off in direction A, and we want direction B, and there are resemblances in what we do along these paths, but they can be very different in terms of complexity for implementors and users.

I think I'm advocating for something that is both simple and useful. They are scratching some academic itch that is neither simple, nor (arguably) useful in that it was never used.

For instance, I see that this paper tries to go very far with the types that they make dynamic (including unrestricted arrows, quantifiers, etc). I'd even say that supporting higher order types within this feature is not obviously useful, and may just be a large headache for a low value. Most of what we want, 99% of the time, is just algebraic data. Hence, everything that paper says about higher order types could be made irrelevant, along with what it says about typecase, at least in the short term.

ericswanson-dfinity added a commit to dfinity/agent-rs that referenced this issue Jul 2, 2021
Candid 0.7.1 does not support decoding a Record into an IDLValue.  We were using this in an unsupported way with Candid 0.6.x, but it doesn't work anymore.

So, this commit specifies the exact type of the streaming http response token.  Unfortunately, this means that we will no longer support arbitrary token types, but this is the best that we can do until Candid supports generic types: dfinity/candid#245
ericswanson-dfinity added a commit to dfinity/agent-rs that referenced this issue Jul 6, 2021
* fix: icx-proxy panic when decoding a response to http_request

Candid 0.7.1 does not support decoding a Record into an IDLValue.  We were using this in an unsupported way with Candid 0.6.x, but it doesn't work anymore.

So, this commit specifies the exact type of the streaming http response token.  Unfortunately, this means that we will no longer support arbitrary token types, but this is the best that we can do until Candid supports generic types: dfinity/candid#245
@matthewhammer
Copy link
Contributor

matthewhammer commented Nov 3, 2021

Going back to the top, the issue's original motivation is:

Candid cannot currently describe services or functions with a type-generic interface

But as reviewers to #292 are saying, this idea has some unresolved issues around the second class nature of argument sequences and response sequences within the Candid type system.

That is, at least in some cases, we want dynamic to be an entire argument sequence or response, not merely one component of them.

But how do we generalize a service's interface if we cannot name or abstract over an entire argument sequence, or response sequence, at the level of Candid types? It seems like we need an additional kind in the Candid type system, for argument sequences, before we can generalize that idea and abstract over it further.

Likewise, Motoko needs some way to intro/elim these sequences, since otherwise, there would be no clear way to define the lifecycle of a value sequence of type dyns (the sequence version) -- unless we overload records in Motoko to play this role, which could also be reasonable and probably preferable. If we do that, though, it seems like we need to distinguish the record as a dyn record versus the record as a dyns argument sequence, each a dyn value. As I recall, the two wire representations are quite different.

@borovan
Copy link

borovan commented Apr 14, 2022

Dragginz team would much appreciate this!

https://gist.github.com/borovan/697dc7c58fccc5bc85f15499e732898b

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants