Quo vadis `blob` #966

nomeata · 2019-12-02T16:51:42Z

In #963 I am introducing a Blob type. This is initially just scaffolding so that #789 is not blocked on the question of whether we want a (more) abstract Entity/Principal/Caller/Canister/User type, and so that we can write some code.

Presumably, the caller will turn into some kind of abstract type, possibly with conversions to and from Text and/or Blob.

But what about Blob itself? I see a few directions this can take:

1. We keep Blob as a primitive type, just like Text, and add operations (various conversions to and from text, maybe even indexing) as needed.
2. We define type Blob = [Word8], and live with the size penalty due to the uniform representation.
3. We define type Blob = [Word8], but change the representation to pack arrays of 8, 16, 32 and 64-bit sized types upon construction, with separate heap object types for these. All array access would dynamically dispatch as needed. We’d be trading execution speed for space usage (but that may be a good deal on our platform).

With 2 and 3 there is the problem that we would lose the equality and comparison operators, which are really kinda useful.

Somewhat independently, there is the question of whether we want blob literal syntax.


nomeata · 2019-12-11T15:49:37Z

BTW, my preference is 3.

If we also pack arrays of 0-bit-sized elements we may even use this to gracefully deal with an IDL bomb of a large vec null or vec any array.

ggreif · 2020-01-09T14:22:12Z

@eftychis @enzoh @matthewhammer You folks asked for Blob inspection routines. Here is my current plan for providing that functionality. Please chime in if you need something that cannot be implemented in user space using the operations below.

(short term, implement crc32 hashing on blobs #1089 – done) hashBlob : Blob -> Word32, a standard CRC32 digest
(middle term, Iterate over Blobs #1100 – done) .size() and .bytes() accessors for Blob, similar to Text.
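For readers who want to cross-check digests outside Motoko, here is a minimal sketch in Python. It assumes that "a standard CRC32 digest" means the common CRC-32 variant implemented by zlib; the thread does not pin down the polynomial. The names `hash_blob`, `blob_size` and `blob_bytes` are illustrative stand-ins for `hashBlob`, `.size()` and `.bytes()`, not the actual Prim API.

```python
import zlib


def hash_blob(blob: bytes) -> int:
    """Return a CRC-32 digest as an unsigned 32-bit integer.

    Assumption: the zlib CRC-32 variant is the one meant by "standard CRC32".
    """
    return zlib.crc32(blob) & 0xFFFFFFFF


def blob_size(blob: bytes) -> int:
    """Number of bytes in the blob (stand-in for `.size()`)."""
    return len(blob)


def blob_bytes(blob: bytes):
    """Iterate over the bytes of the blob (stand-in for `.bytes()`)."""
    return iter(blob)
```

With these two accessors, anything else (for example a hex dump for debugging) can be written in user code by folding over the byte iterator.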

Since we don't have Blob literals, I am hesitant to include Blob in debug_show, but I am open to arguments (with a formatting suggestion).

chenyan-dfinity · 2020-01-09T19:02:05Z

If only for debugging, we can return Blob and inspect the IDL bytes.

ggreif · 2020-01-12T02:09:30Z

More elimination forms are defined in #1100. What intro forms do we need? From [Word8] and [var Word8] perhaps? Text? Probably not.

nomeata · 2020-01-12T15:37:35Z

[Word8] is the natural one

crusso · 2020-01-13T14:10:05Z

Doesn't option 3 need type passing? What's the representation for [T] and what happens when you instantiate T at Word8 etc? @nomeata

nomeata · 2020-01-13T14:13:31Z

Option 3 would be different heap representation (ArrayBoxed, ArrayWord8, ArrayWord16, ArrayWord32, ArrayWord64), with different heap object tags, and a dynamic dispatch on every array operation (no matter what the static type). Not quite type passing.
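The tag-dispatch idea can be sketched roughly as follows. This is an illustrative model in Python, not the Motoko runtime: the `Tag` enum, `HeapArray`, `make_array` and `array_get` are hypothetical names, and little-endian packing is an arbitrary choice made for the sketch. The point is only that every read branches on the heap tag, whatever the static type says.

```python
from enum import Enum


class Tag(Enum):
    """Hypothetical heap object tags; the value is the element width in bytes."""
    BOXED = 0
    WORD8 = 1
    WORD16 = 2
    WORD32 = 4
    WORD64 = 8


class HeapArray:
    def __init__(self, tag, payload, length):
        self.tag = tag          # decides how payload is interpreted
        self.payload = payload  # list for BOXED, bytearray for packed variants
        self.length = length


def make_array(values, width_bytes=None):
    """Pack on construction when an element width is known, else store boxed."""
    values = list(values)
    if width_bytes is None:
        return HeapArray(Tag.BOXED, values, len(values))
    if width_bytes not in (1, 2, 4, 8):
        raise ValueError("unsupported element width")
    payload = bytearray()
    for v in values:
        # to_bytes raises OverflowError if v does not fit the width
        payload += v.to_bytes(width_bytes, "little")
    return HeapArray(Tag(width_bytes), payload, len(values))


def array_get(arr, i):
    """Every access dispatches dynamically on the tag."""
    if not 0 <= i < arr.length:
        raise IndexError("array index out of bounds")
    if arr.tag is Tag.BOXED:
        return arr.payload[i]
    w = arr.tag.value
    return int.from_bytes(arr.payload[i * w:(i + 1) * w], "little")
```

In this model a three-element `WORD8` array occupies three payload bytes, while the boxed variant spends one full slot per element; the price is the branch in `array_get`.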

crusso · 2020-01-13T14:22:08Z

That could work, but not sure it fits our KISS (Keep it slow stupid) motto.

nomeata · 2020-01-13T14:34:36Z

I think we are fine: It’s also slow in its own way, due to the branch on the tag

nomeata · 2020-04-16T09:59:09Z

Should we revise that discussion? It irks me that Blob doesn’t round-trip through Candid: We export Blob as [nat8], but import [nat8] as [Nat8], which is quite expensive.

Oh, but there is

> With 2 and 3 there is the problem that we would lose the equality and comparison operators, which are really kinda useful.

Maybe we need equality and comparison on arrays? :-)

chenyan-dfinity · 2020-04-16T17:37:26Z

Now that we have Principal, what's the real use case for Blob? I think we will need an intro form for Principal. This makes testing easy, and we can already round-trip Principal.

nomeata · 2020-04-16T21:26:16Z

Yes, we have

func blobOfPrincipal(id : Principal) : Blob
func principalOfActor(act : actor {}) : Principal

in Prim. So you can already turn a blob into a principal using principalOfActor(actor (textual_representation(blob))) if you implement the textual representation… so there is no reason not to add principalOfBlob directly.

> what's the real use case for Blob?

Binary data of all sorts, in compact form? But yes, we should just make [Nat8] usable, right?

chenyan-dfinity · 2020-04-16T22:05:16Z

> you can already turn a blob into a principal using principalOfActor(actor (textual_representation(blob)))

Nice! I didn't realize this implication when you added principalOfActor :)

> Binary data of all sorts

I think [Nat8] doesn't necessarily mean binary data, so we need a dedicated Blob type. But this argument feels nominal then.

nomeata · 2020-04-20T13:37:45Z

> With 2 and 3 there is the problem that we would lose the equality and comparison operators, which are really kinda useful.

> Maybe we need equality and comparison on arrays? :-)

Andreas points out that this can be Array.eq in the stdlib, so maybe not blocked on this.

nomeata · 2020-06-07T16:37:35Z

@rossberg, as the language design lead, can you make a judgment call?

rossberg · 2020-06-08T14:56:40Z

If it wasn't for the Candid mapping, then the conservative choice right now would still be to keep Blob its own type, separate from arrays. Unifying them later would only make more programs type-check.

However, if we map the Blob type to Candid blob (as seems logical), then we won't be able to do that change anymore without breaking existing canisters.

Yet, the simplest and most efficient solution is to make Blob its own type. It is not absolute purism, but I have begun to think that it's the right pragmatic choice. So unless anybody objects, I suggest leaving it at that.

FWIW, somewhat tangential, I started growing skeptical of having regular equality extend to types like Text or Blob, where it's not constant time. I guess it's just too damn convenient for Text, but is it ever advisable for blobs?

Edit: Fixed some confusing typos.

nomeata · 2020-06-08T19:48:08Z

> However, if we map the Blob type to Candid blob (as seems logical), then we won't be able to do that change anymore without breaking existing canisters.

Why? In Candid, blob is just a shorthand for vec nat8. So we can still do that without breaking canisters.

> So unless anybody objects, I suggest leaving it at that.

Works for me. This means we have to add indexing? What else?

> I guess it's just too damn convenient for Text, but is it ever advisable for blobs?

Not sure if it makes sense to distinguish the two. I don’t think that Blobs are inherently larger than Texts, or something like that.

rossberg · 2020-06-09T06:19:37Z

> Why? In Candid, blob is just a shorthand for vec nat8.

Ah right, thinko. So even more reason to leave it as is.

> This means we have to add indexing? What else?

We could add the same pseudo methods as for arrays (get, keys), but that does not seem urgent.

> > I guess it's just too damn convenient for Text, but is it ever advisable for blobs?

> Not sure if it makes sense to distinguish the two. I don’t think that Blobs are inherently larger than Texts, or something like that.

While there may be long instances of both, equality on strings is typically applied only to short fragments, like individual words or names. There is no obvious analogue for blobs.

nomeata · 2020-06-09T07:17:56Z

> We could add the same pseudo methods as for arrays (get, keys), but that does not seem urgent.

We already have blob.bytes()

> While there may be long instances of both, equality on strings is typically applied only to short fragments, like individual words or names. There is no obvious analogue for blobs.

Sure there are: comparing hashes would be a pretty common example.

Sequences of bytes and sequences of unicode characters – seems quite similar to me.

rossberg · 2020-06-09T07:47:27Z

> We already have blob.bytes()

Yes, but that corresponds to vals, not keys. But as I said, it's not urgent.

> Sure there are: comparing hashes would be a pretty common example.

You mean hashes that are represented as blobs? Fair enough, though it would be more natural to represent these as big numbers.

rossberg · 2020-10-27T09:18:49Z

@nomeata, do you still have appetite for a non-uniform array representation, or should we close this?

nomeata · 2020-10-27T09:49:37Z

I still have appetite for having a more coherent plan for Blob, and I wouldn’t mind the non-uniform array representation if that’s part of the solution (and since that would also benefit people who use [Nat64] etc.).

nomeata · 2020-10-27T09:55:30Z

Another direction would be to think of Blob less as a variant of [Nat8] and more as a variant of Text (with different “characters”), i.e. no indexing, maybe cheap concatenation, iterators. No strong convictions here at the moment.

Until we have the full story for where we go with `Blob` (#966), we should at least provide sufficient ways to create blobs, as requested, for example, in dfinity/motoko-base#242. This adds

```
func blobToArray(b : Blob) : [Nat8]
func blobToArrayMut(b : Blob) : [var Nat8]
func arrayToBlob(a : [Nat8]) : Blob
func arrayMutToBlob(a : [var Nat8]) : Blob
```

to `Prim`, and I plan to expose them in base as `Blob.ofArray(Mut)` and `Blob.toArray(Mut)`. Performance is bad right now, with all the copying (and bounds checks in `Arr.idx`…), but that gives us some low-hanging fruit for later.

skilesare · 2022-10-25T21:22:33Z

I'd still love concatenation. Current use case from the IC Spec:

Representation-independent hashing of structured data

Structured data, such as (recursive) maps, are authenticated by signing a representation-independent hash of the data. This hash is computed as follows (using SHA256 in the steps below):

1. For each field that is present in the map (i.e. omitted optional fields are indeed omitted), concatenate the hash of the field's name (in ASCII encoding, without terminal \x00) and the hash of the value (with the encoding specified below).
2. Sort these concatenations from low to high.
3. Concatenate the sorted elements, and hash the result.

The resulting hash of 256 bits (32 bytes) is the representation-independent hash.

The following encodings of field values as blobs are used

Binary blobs (canister_id, arg, nonce, module) are used as-is.

Strings (request_type, method_name) are encoded in UTF-8, without a terminal \x00.

Natural numbers (compute_allocation, memory_allocation, ingress_expiry) are encoded using the shortest form Unsigned LEB128 encoding. For example, 0 should be encoded as a single zero byte [0x00] and 624485 should be encoded as byte sequence [0xE5, 0x8E, 0x26].

Arrays (paths) are encoded as the concatenation of the hashes of the encodings of the array elements.

Maps (sender_delegation) are encoded by recursively computing the representation-independent hash.
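To show how much byte-level concatenation the quoted procedure involves, here is a minimal sketch in Python using `hashlib`. The function names are illustrative, not taken from any IC library. Two points are assumptions of the sketch: a field counts as present if its key is in the `dict`, and a nested map's recursive hash is used directly as the hash of that value, which is one reading of the quoted text.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leb128(n: int) -> bytes:
    """Shortest-form unsigned LEB128 encoding of a natural number."""
    if n < 0:
        raise ValueError("natural numbers only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more groups follow
        else:
            out.append(byte)         # final group, high bit clear
            return bytes(out)


def hash_value(v) -> bytes:
    """Hash of a field value, following the encodings listed above."""
    if isinstance(v, bool):
        raise TypeError("booleans are not covered by the quoted encodings")
    if isinstance(v, (bytes, bytearray)):
        return sha256(bytes(v))               # binary blobs are used as-is
    if isinstance(v, str):
        return sha256(v.encode("utf-8"))      # strings as UTF-8
    if isinstance(v, int):
        return sha256(leb128(v))              # naturals as LEB128
    if isinstance(v, (list, tuple)):
        # arrays: concatenation of the hashes of the element encodings
        return sha256(b"".join(hash_value(x) for x in v))
    if isinstance(v, dict):
        # assumption: the recursive hash is used directly as the value hash
        return hash_of_map(v)
    raise TypeError(f"unsupported value type: {type(v).__name__}")


def hash_of_map(m: dict) -> bytes:
    """Representation-independent hash of a map with ASCII string keys."""
    pairs = [sha256(k.encode("ascii")) + hash_value(v) for k, v in m.items()]
    return sha256(b"".join(sorted(pairs)))
```

Every step here joins byte strings: name hash with value hash, then all sorted pairs together. That is why some form of blob concatenation or a builder is needed to express this in Motoko.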

rossberg · 2022-11-02T11:41:10Z

@skilesare, you never want concatenation for any use case where you have to do it repeatedly, since that results in quadratic cost. I think what you'd want is a Buffer-like thing to construct Blobs.
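The quadratic cost can be made concrete with a small model, sketched here in Python. `concat_repeatedly` counts the bytes that repeated immutable concatenation has to write, and `BlobBuilder` is a hypothetical Buffer-like builder that appends in place and freezes once at the end. The copy count is a model of the work, not a measurement, and neither name is an existing Motoko or base-library API.

```python
def concat_repeatedly(chunks):
    """Build a blob by repeated immutable concatenation.

    Returns the result and the number of bytes written along the way:
    each step allocates a fresh object holding everything built so far.
    """
    result = b""
    copied = 0
    for chunk in chunks:
        result = result + chunk
        copied += len(result)
    return result, copied


class BlobBuilder:
    """Hypothetical Buffer-like builder: append in place, freeze once."""

    def __init__(self):
        self._buf = bytearray()

    def append(self, chunk: bytes) -> None:
        self._buf += chunk  # amortized cost proportional to len(chunk)

    def freeze(self) -> bytes:
        return bytes(self._buf)  # one final copy into an immutable blob
```

For 100 chunks of 2 bytes each, the repeated concatenation writes 10100 bytes to produce a 200-byte result, while the builder appends each byte once and copies the 200 bytes a single time in `freeze`.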

nomeata changed the title ~~Quot valid blob~~ Quot vadis blob Dec 2, 2019

nomeata changed the title ~~Quot vadis blob~~ Quo vadis blob Dec 2, 2019

nomeata added the feature New feature or request label Dec 11, 2019

nomeata mentioned this issue Jan 7, 2020

feat(dfx/build): insert assets into canisters with frontend dfinity/sdk#286

Merged

nomeata mentioned this issue Jan 22, 2020

Rust-like include_str macro #892

Open

rossberg added library Base or other libraries language design Requires design work P2 medium priority, resolve within a couple of milestones labels Apr 23, 2020

ghost added P1 high priority, resolve before the next milestone and removed P2 medium priority, resolve within a couple of milestones labels May 5, 2020

chenyan-dfinity mentioned this issue May 13, 2020

Need Prim.principalToWord8Array. dfinity/motoko-base#21

Closed

nomeata mentioned this issue Jun 7, 2020

Motoko Source: Blob literals #1581

Closed

nomeata assigned rossberg Jun 7, 2020

nomeata mentioned this issue Jun 15, 2020

Candid→Motoko: Map blob to Blob? #1618

Closed

rossberg mentioned this issue Feb 2, 2021

Meta-issue: Performance improvements #2302

Open

11 tasks

nomeata mentioned this issue Feb 23, 2021

Prim: Expose raw access to certified data api #2374

Merged

nomeata mentioned this issue Mar 9, 2021

Conversions between Blob and Text #2406

Closed

nomeata mentioned this issue Apr 8, 2021

Prim: Conversion between Blob and [Nat8] #2480

Merged

nomeata mentioned this issue Dec 16, 2021

GC optimization: don't scan immutable arrays of scalars. #3011

Open

nomeata mentioned this issue May 5, 2023

feat: add blob get and slice prim functions #3965

Closed
