Quo vadis blob
#966
Comments
BTW, my preference is 3. If we also pack arrays of 0-bit-sized elements, we may even use this to gracefully deal with an IDL bomb of a large …
@eftychis @enzoh @matthewhammer You folks asked for …
Since we don't have …
If only for debugging, we can return …
More elimination forms are defined in #1100. What intro forms do we need? From …
Doesn't option 3 need type passing? What's the representation for `[T]`, and what happens when you instantiate `T` at `Word8` etc.? @nomeata
Option 3 would be a different heap representation (…
That could work, but I'm not sure it fits our KISS ("Keep it slow, stupid") motto.
I think we are fine: it's also slow in its own way, due to the branch on the tag.
Should we revisit that discussion? It irks me that … Oh, but there is …
Maybe we need equality and comparison on arrays? :-)
Now that we have …
Yes, we have … in …
Binary data of all sorts, in compact form? But yes, we should just make …
Nice! I didn't realize this implication when you add …
I think …
Andreas points out that this can be …
@rossberg, as the language design lead, can you make a judgment call?
If it weren't for the Candid mapping, the conservative choice right now would still be to keep Blob as its own type, separate from arrays: unifying them later would only make more programs type-check. However, if we map the Blob type to Candid blob (as seems logical), then we won't be able to make that change anymore without breaking existing canisters. Yet the simplest and most efficient solution is to make Blob its own type. It's not the purist choice, but I've begun to think that it's the right pragmatic one. So unless anybody objects, I suggest leaving it at that. FWIW, somewhat tangentially, I've grown skeptical of having regular equality extend to types like Text or Blob, where it's not constant time. I guess it's just too damn convenient for Text, but is it ever advisable for blobs? Edit: Fixed some confusing typos.
Why? In Candid, …
Works for me. Does this mean we have to add indexing? What else?
Not sure if it makes sense to distinguish the two. I don't think that …
Ah right, thinko. So even more reason to leave it as is.
We could add the same pseudo-methods as for arrays (`get`, `keys`), but that does not seem urgent.
While there may be long instances of both, equality on strings is typically applied only to short fragments, like individual words or names. There is no obvious analogue for blobs.
We already have …
Sure there are: comparing hashes would be a pretty common example. Sequences of bytes and sequences of Unicode characters seem quite similar to me.
Yes, but that corresponds to `vals`, not `keys`. But as I said, it's not urgent.
You mean hashes that are represented as blobs? Fair enough, though it would be more natural to represent these as big numbers.
@nomeata, do you still have appetite for a non-uniform array representation, or should we close this?
I still have an appetite for a more coherent plan for …
Another direction would be to think of …
Until we have the full story for where we go with `Blob` (#966), we should at least provide sufficient ways to create blobs, as requested, for example, in dfinity/motoko-base#242. This adds

```
func blobToArray(b : Blob) : [Nat8]
func blobToArrayMut(b : Blob) : [var Nat8]
func arrayToBlob(a : [Nat8]) : Blob
func arrayMutToBlob(a : [var Nat8]) : Blob
```

to `Prim`, and I plan to expose them in base as `Blob.ofArray(Mut)` and `Blob.toArray(Mut)`. Performance is bad right now, with all the copying (and bounds checks in `Arr.idx`…), but that gives us some low-hanging fruit for later.
I'd still love concatenation. Current use case from the IC Spec:

> **Representation-independent hashing of structured data**
>
> Structured data, such as (recursive) maps, are authenticated by signing a representation-independent hash of the data. This hash is computed as follows (using SHA256 in the steps below):
>
> - For each field that is present in the map (i.e. omitted optional fields are indeed omitted): concatenate the hash of the field's name (in ascii-encoding, without terminal `\x00`) and the hash of the value (with the encoding specified below).
> - Concatenate the sorted elements, and hash the result.
>
> The resulting hash of 256 bits (32 bytes) is the representation-independent hash.
>
> The following encodings of field values as blobs are used:
>
> - Binary blobs (`canister_id`, `arg`, `nonce`, `module`) are used as-is.
> - Strings (`request_type`, `method_name`) are encoded in UTF-8, without a terminal `\x00`.
> - Natural numbers (`compute_allocation`, `memory_allocation`, `ingress_expiry`) are encoded using the shortest form Unsigned LEB128 encoding. For example, `0` should be encoded as a single zero byte `[0x00]` and `624485` should be encoded as byte sequence `[0xE5, 0x8E, 0x26]`.
> - Arrays (`paths`) are encoded as the concatenation of the hashes of the encodings of the array elements.
> - Maps (`sender_delegation`) are encoded by recursively computing the representation-independent hash.
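To make this use case concrete, here is a minimal sketch of the IC spec hashing rules above, in Python rather than Motoko only because Python ships SHA-256 and byte-string concatenation. The helper names `sha256`, `leb128_u`, `encode_value` and `hash_of_map` are hypothetical. Two readings are assumptions of this sketch: a field whose value is `None` counts as an omitted optional field, and "the sorted elements" means a bytewise sort of the per-field concatenations.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leb128_u(n: int) -> bytes:
    """Shortest-form unsigned LEB128: 7 bits per byte, low bits first,
    high bit set on every byte except the last."""
    if n < 0:
        raise ValueError("unsigned LEB128 requires n >= 0")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # final byte
            return bytes(out)


def encode_value(v) -> bytes:
    """Encode a field value as a blob, following the quoted rules."""
    if isinstance(v, dict):  # maps: recursive representation-independent hash
        return hash_of_map(v)
    if isinstance(v, list):  # arrays: concatenation of the element hashes
        return b"".join(sha256(encode_value(x)) for x in v)
    if isinstance(v, (bytes, bytearray)):  # binary blobs: used as-is
        return bytes(v)
    if isinstance(v, str):  # strings: UTF-8, no terminal NUL
        return v.encode("utf-8")
    if isinstance(v, int) and not isinstance(v, bool):  # naturals: LEB128
        return leb128_u(v)
    raise TypeError(f"unsupported field value: {type(v).__name__}")


def hash_of_map(m: dict) -> bytes:
    """Representation-independent hash of a map with ASCII string keys.
    Fields mapped to None are skipped as omitted (assumption of this sketch)."""
    entries = [
        sha256(name.encode("ascii")) + sha256(encode_value(value))
        for name, value in m.items()
        if value is not None
    ]
    # Sort the per-field concatenations bytewise, concatenate, then hash.
    return sha256(b"".join(sorted(entries)))


# Usage with made-up field values:
request = {"request_type": "call", "method_name": "greet",
           "arg": b"\x00\x01", "ingress_expiry": 624485, "nonce": None}
print(hash_of_map(request).hex())
```

Every `b"".join(...)` above is a blob concatenation, which is exactly the operation being asked for in Motoko.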
@skilesare, you never want concatenation for any use case where you have to do it repeatedly, since that results in quadratic cost. I think what you'd want is a Buffer-like thing to construct Blobs.
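The quadratic cost is easy to see by counting bytes: if every append produces a fresh immutable blob, the k-th append copies the whole prefix, so n chunks of c bytes each copy c * n * (n + 1) / 2 bytes in total. The Python sketch below contrasts that with a builder that appends into one growable buffer and copies only once at the end. `BlobBuilder` is a hypothetical name for the suggested Buffer-like thing, not an existing Motoko API.

```python
from typing import List, Tuple


def naive_concat(chunks: List[bytes]) -> Tuple[bytes, int]:
    """Append by building a fresh immutable blob each time.
    Returns the result and the number of bytes copied along the way."""
    acc = b""
    copied = 0
    for chunk in chunks:
        acc = acc + chunk  # new object: old contents plus chunk are copied
        copied += len(acc)
    return acc, copied


class BlobBuilder:
    """Hypothetical Buffer-like builder: appends into one growable buffer
    (amortized O(len(chunk)) per append) and copies once at the end."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def add(self, chunk: bytes) -> None:
        self._buf += chunk

    def to_blob(self) -> bytes:
        return bytes(self._buf)
```

For the hashing use case above, each field entry would be `add`ed to one builder and frozen once before hashing.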
In #963 I am introducing a `Blob` type. This is initially just scaffolding, so that #789 isn't blocked on the question of whether we want a (more) abstract `Entity`/`Principal`/`Caller`/`Canister`/`User` type, and so that we can write some code.

Presumably, the caller will turn into some kind of abstract type, possibly with conversions to and from `Text` and/or `Blob`.

But what about `Blob` itself? I see a few directions this can take:

1. `Blob` as a primitive type, just like `Text`, and add operations (various conversions to and from text, maybe even indexing) as needed.
2. `type Blob = [Word8]`, and live with the size penalty due to the uniform representation.
3. `type Blob = [Word8]`, but change the representation to pack arrays of 8-, 16-, 32- and 64-bit sized types upon construction, with separate heap object types for these. All array access would dynamically dispatch as needed. We'd be trading execution speed for space usage (but that may be a good deal on our platform).

With 2 and 3 there is the problem that we would lose the equality and comparison operators, which are really kinda useful.
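Option 3 can be modeled in a few lines. This Python sketch is not how the Motoko backend would implement it: `Uniform`, `Packed`, `make_array`, `length`, `index` and `payload_bytes` are hypothetical names, and the 4-byte word for the uniform case is an assumption (a 32-bit target). It shows both sides of the trade-off: the packed form stores one byte per `Word8` element, but every access first branches on the representation tag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Uniform:
    """Uniform representation: one slot (a machine word) per element."""
    elems: List[int]


@dataclass
class Packed:
    """Hypothetical packed heap object: unsigned elements of `bits` width,
    stored contiguously in little-endian order."""
    bits: int  # 8, 16, 32 or 64
    data: bytes


def make_array(elems: List[int], bits: Optional[int] = None):
    """Pick the representation once, at construction time."""
    if bits in (8, 16, 32, 64):
        width = bits // 8
        # to_bytes raises OverflowError if an element doesn't fit the width
        return Packed(bits, b"".join(e.to_bytes(width, "little") for e in elems))
    return Uniform(list(elems))


def length(arr) -> int:
    if isinstance(arr, Packed):
        return len(arr.data) // (arr.bits // 8)
    return len(arr.elems)


def index(arr, i: int) -> int:
    """Bounds-check, then branch on the representation tag."""
    if not 0 <= i < length(arr):
        raise IndexError(i)
    if isinstance(arr, Packed):
        width = arr.bits // 8
        return int.from_bytes(arr.data[i * width:(i + 1) * width], "little")
    return arr.elems[i]


def payload_bytes(arr, word_bytes: int = 4) -> int:
    """Payload size; word_bytes=4 assumes a 32-bit target."""
    if isinstance(arr, Packed):
        return len(arr.data)
    return len(arr.elems) * word_bytes
```

For three `Word8` elements this gives 3 payload bytes packed versus 12 uniform, at the cost of the extra `isinstance` branch on every `index` call.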
Somewhat independently, there is the question of whether we want blob literal syntax.