-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
experiment: precise heap tags #4544
base: master
Are you sure you want to change the base?
Conversation
…t.Array in compile.ml
…stream_write_text to accept blob or text
pub const TAG_NULL: Tag = 29; | ||
pub const TAG_ONE_WORD_FILLER: Tag = 31; | ||
pub const TAG_FREE_SPACE: Tag = 33; | ||
pub const TAG_ARRAY_I: Tag = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Only a very little stylistic thought: Maybe we could name the tags with a unabbreviated suffix like TAG_ARRAY_IMMUTABLE
etc. or just have a short comment here at the declaration.)
pub fn slice_tag(array_tag: Tag, slice_start: u32) -> Tag { | ||
debug_assert!(array_tag == TAG_ARRAY_I || array_tag == TAG_ARRAY_M || array_tag == TAG_ARRAY_T); | ||
debug_assert!(slice_start >= TAG_ARRAY_SLICE_MIN && slice_start < (1 << 30)); | ||
(((array_tag - TAG_ARRAY_I) / 2) << 30) | slice_start |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, maybe, a debug assertion could be helpful, to ensure that array_tag - TAG_ARRAY
is even and has a distance of maximum 3 (to be encoded in two bits).
The 30 bits is also related to MAX_ARRAY_SIZE
in constant.rs
that it must cover the maximum size of the array, i.e. MAX_ARRAY_SIZE < (1 << 30)
. Maybe, subsummarizing it as a local constant could be helpful (just that I catch all occurrences for the 64-bit porting, where maximum array size is larger).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a thought: Maybe the odd numbering is no longer needed in future if we focus on incremental GC. In this case the division and multiplication by 2 would no longer be necessary...
|| tag >= TAG_ARRAY_SLICE_MIN | ||
); | ||
if tag >= TAG_ARRAY_SLICE_MIN { | ||
(TAG_ARRAY_I + (tag >> 30) * 2, tag << 2 >> 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, the the second and third occurrence of 2 could be expressed as usize::BITS - 30
(where 30 is the constant of the shift to encode the tag in the slicing info)
|| tag >= TAG_ARRAY_SLICE_MIN | ||
); | ||
if tag >= TAG_ARRAY_SLICE_MIN { | ||
TAG_ARRAY_I + (tag >> 30) * 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a possible sanity check: The result would be tag == TAG_ARRAY_I || tag == TAG_ARRAY_M || tag == TAG_ARRAY_T
(based on the assumption that the three array tags are consecutively numbered and odd).
@@ -506,7 +558,9 @@ impl Obj { | |||
} | |||
|
|||
pub unsafe fn as_blob(self: *mut Self) -> *mut Blob { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether it could make sense to use dedicated as_blob
functions, depending on the context, e.g. for texts etc.
} | ||
|
||
pub unsafe fn restore_tag(self: *mut Self, array_tag: Tag) -> () { | ||
(*self).header.tag = array_tag; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the restored tag would fulfil array_tag == TAG_ARRAY_I || arraty_tag == TAG_ARRAY_M || array_tag == TAG_ARRAY_T
(no slicing any longer)? Not sure, if a debug assertion would make sense.
@@ -389,7 +399,7 @@ impl Value { | |||
/// In debug mode panics if the value is not a pointer or the | |||
/// pointed object is not a `Blob`. | |||
pub unsafe fn as_stream(self) -> *mut Stream { | |||
debug_assert_eq!(self.tag(), TAG_BLOB); | |||
debug_assert!(self.is_blob()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the stream would always be a TAG_BLOB_B
? Maybe, a stricter assertion could be applied.
@@ -373,14 +383,14 @@ impl Value { | |||
/// Get the pointer as `Blob` using forwarding. In debug mode panics if the value is not a pointer or the | |||
/// pointed object is not a `Blob`. | |||
pub unsafe fn as_blob(self) -> *const Blob { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an idea: We could expose distinct access function, on for pure blob and one for text (maybe also for principal) with corresponding specific tag checks, and then call the corresponding specific function depending of the context. This would only serve for making sanity checks stricter.
|
||
use motoko_rts_macros::ic_mem_fn; | ||
|
||
// CRC32 for blobs. Loosely based on https://rosettacode.org/wiki/CRC-32#Implementation_2 | ||
|
||
#[no_mangle] | ||
pub unsafe extern "C" fn compute_crc32(blob: Value) -> u32 { | ||
if blob.tag() != TAG_BLOB { | ||
if !blob.is_blob() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be a case where the blob is a principal? The check could then be constrained to TAG_BLOB_P
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am probably wrong. Below it seems to be a TAG_BLOB_B
that is passed to this function. It is hard to follow...
let r = alloc_blob(mem, Bytes((n.as_u32() + 4 + 4) / 5 * 8)); // contains padding | ||
let r = alloc_blob( | ||
mem, | ||
TAG_BLOB_T, /*?*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good question. The base32 would be a textual encoding?
@@ -225,7 +229,7 @@ unsafe fn base32_to_principal<M: Memory>(mem: &mut M, b: Value) -> Value { | |||
let mut data = blob.payload_const(); | |||
|
|||
// Every group of 5 characters will yield 6 bytes (due to the hypen) | |||
let r = alloc_blob(mem, Bytes(((n.as_u32() + 4) / 5) * 6)); | |||
let r = alloc_blob(mem, TAG_BLOB_T, Bytes(((n.as_u32() + 4) / 5) * 6)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure: Would the result maybe be a TAG_BLOB_P
? (I am just trying to infer it by the function name.)
@@ -172,7 +174,7 @@ unsafe extern "C" fn text_to_buf(mut s: Value, mut buf: *mut u8) { | |||
unsafe extern "C" fn stream_write_text(stream: *mut Stream, mut s: Value) { | |||
loop { | |||
let s_ptr = s.as_obj(); | |||
if s_ptr.tag() == TAG_BLOB { | |||
if s_ptr.tag() == TAG_BLOB_B || s_ptr.tag() == TAG_BLOB_T { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious: The TAG_BLOB_B
would occur for the Candid streaming. And the TAG_BLOB_T
because of the principal to text conversions?
@@ -570,7 +570,7 @@ fn create_static_heap( | |||
// root. | |||
let array_addr = u32::try_from(heap.as_ptr() as usize).unwrap(); | |||
let mut offset = 0; | |||
write_word(heap, offset, TAG_ARRAY); | |||
write_word(heap, offset, TAG_ARRAY_M); // TODO: TBR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, we could also use TAG_ARRAY_I
for the static roots. But it is quite uncritical as it is not scanned by graph stabilization.
|
||
let text = text_of_str(&mut heap, "bfozs-kwa73-7nadi"); | ||
assert_eq!( | ||
text_compare( | ||
blob_compare( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I am not mistaken, this compares a TAG_BLOB_B
with TAG_BLOB_T
. Perhaps, the text_of_str()
is here more like a blob encoder in this specific test...
@@ -1996,18 +1999,22 @@ module Tagged = struct | |||
so update all! | |||
*) | |||
|
|||
type bits_sort = U | S | F |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Just cosmetics: Maybe a brief comment could be helpful saying what the abbreviations stand for.)
@@ -2197,7 +2212,7 @@ module Tagged = struct | |||
let (set_o, get_o) = new_local env "o" in | |||
let prep (t, code) = (t, get_o ^^ code) | |||
in set_o ^^ get_o ^^ branch_default env retty def (List.map prep cases) | |||
|
|||
(* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good that you noticed. Maybe we can delete this dead code...
@@ -2571,7 +2587,11 @@ module BoxedWord64 = struct | |||
|
|||
let payload_field = Tagged.header_size | |||
|
|||
let heap_tag env pty = Tagged.Bits64 (* TODO *) | |||
let heap_tag env pty = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice: The TODO is now addressed.
G.if1 I32Type (Opt.inject_simple env get_blob) (Opt.null_lit env) | ||
G.if1 I32Type | ||
(get_blob ^^ Blob.as_ptr_len env ^^ | ||
of_ptr_size env ^^ (* creates text blob *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good that you covered this case: We indeed need to create a copy of the blob, as it has a different purpose (and tag)...
let alloc env array_sort len = | ||
compile_unboxed_const Tagged.(int_of_tag (Array array_sort)) ^^ | ||
len ^^ | ||
E.call_import env "rts" "alloc_array" (* TODO *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious about the TODO
: Do you have a specific open aspect in mind?
@@ -7519,11 +7583,11 @@ module MakeSerialization (Strm : Stream) = struct | |||
*) | |||
get_thing ^^ set_result ^^ | |||
get_memo ^^ get_result ^^ store_unskewed_ptr ^^ | |||
get_memo ^^ compile_add_const 4l ^^ Blob.lit env (typ_hash t) ^^ store_unskewed_ptr | |||
get_memo ^^ compile_add_const 4l ^^ Blob.lit env Tagged.T (typ_hash t) ^^ store_unskewed_ptr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, we could also consider the hash fingerprint (to double check the aliasing deserialization mechanism) as pure blob?
@@ -8639,7 +8703,7 @@ module GCRoots = struct | |||
}) | |||
|
|||
let store_static_roots env = | |||
Arr.vanilla_lit env (E.get_static_roots env) | |||
Arr.vanilla_lit env Tagged.M (E.get_static_roots env) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether the static roots could be tagged as immutable, as they reside in static space and should not change at runtime (references to static MutBoxes that are not moved by the GC). This would be different with EOP.
@@ -8747,7 +8812,10 @@ module StackRep = struct | |||
| Const.Unit -> Tuple.unit_vanilla_lit env | |||
| Const.Array cs -> | |||
let ptrs = List.map (materialize_const_t env) cs in | |||
Arr.vanilla_lit env ptrs | |||
Arr.vanilla_lit env Tagged.I ptrs | |||
| Const.Tuple cs -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, as constant tuples are now generated (I noticed it below).
Implement precise tagging of heap allocated objects.
Splits
TAG_ARRAY
into consecutiveTAG_ARRAY_I
,TAG_ARRAY_M
andTAG_ARRAY_T
(immutable array, array, tuple).TAG_BLOB
into consecutiveTAG_BLOB_B,
TAB_BLOB_Tand TAG_BLOB_P
(blob, text, principal).TAG_BITS_32
into consecutiveTAG_BITS32_U
,TAG_BITS32_S
andTAG_BITS32_F
((boxed) Nat32/Int32, Float32 (reserved for future use).TAG_BITS_64
into consecutiveTAG_BITS64_U
,TAG_BITS64_S
andTAG_BITS64_F
((boxed) Nat64/Int32, Float64).Modifies array slicing logic to preserve the base tag of the underlying array, by stealing the top 2 bits of the slice start field, relying on the fact that array indices are < 2^30. For clarity, the tag in decompressed on each iteration. For efficiency, it could actually be maintained during slicing compressed form as a sort ( see 448962f).
TODO:
[ ] print refined tags in debug output?
TBD:
[ ] should we also reserve TAG_BLOB_M for (future) mutable blobs and maybe TAB_BLOB_X for internal blobs?
[ ] should we also reserve TAG_ARRAY_X for things like roots?
[ ] I currently just added a tag argument to rts.alloc_blob and rts.alloc_array. Would it be preferable to have six allocation functions?