experiment: custom RTS functions #4438

Draft: wants to merge 77 commits into base: master


rvanasa
Contributor

@rvanasa rvanasa commented Mar 9, 2024

This PR adds utility macros, traits, and functions which can be used to implement Rust FFI bindings in the RTS.

I'm currently focusing on just a few Motoko types (primarily Blob, Array, tuples, and numeric primitives). It's relatively simple to expand support by implementing FromValue and IntoValue for other Rust types.
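As a rough illustration of what implementing `FromValue` and `IntoValue` for another Rust type might involve, here is a self-contained sketch. The `Value` struct, the trait signatures, and the `Percent` type are stand-ins invented for this example; the real traits in the RTS also receive a memory argument (`mem`) and may differ in shape.

```rust
// Stand-in for the RTS `Value` type (assumption: a plain 32-bit word).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Value(pub u32);

// Hypothetical, simplified versions of the conversion traits.
pub trait FromValue: Sized {
    fn from_value(value: Value) -> Result<Self, &'static str>;
}

pub trait IntoValue {
    fn into_value(self) -> Result<Value, &'static str>;
}

// Example user type: a percentage restricted to 0..=100.
#[derive(Debug, PartialEq)]
pub struct Percent(pub u8);

impl FromValue for Percent {
    fn from_value(value: Value) -> Result<Self, &'static str> {
        // Reject anything outside the valid range instead of truncating.
        if value.0 <= 100 {
            Ok(Percent(value.0 as u8))
        } else {
            Err("percent out of range")
        }
    }
}

impl IntoValue for Percent {
    fn into_value(self) -> Result<Value, &'static str> {
        Ok(Value(u32::from(self.0)))
    }
}
```

Returning a `Result` from both directions keeps invalid inputs from being silently truncated, which matches the `?` usage visible in the review snippets further down.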

Rust functions (included in PR for now):

#[motoko]
unsafe fn empty() {}

#[motoko]
unsafe fn identity(value: Value) -> Value {
    value
}

#[motoko]
unsafe fn div_rem(a: u32, b: u32) -> (u32, u32) {
    (a / b, a % b)
}

#[motoko]
unsafe fn array_concat(a: Vec<Value>, b: Vec<Value>) -> Vec<Value> {
    [a, b].concat()
}

#[motoko]
unsafe fn blob_modify(mut blob: BlobVec) -> BlobVec {
    blob.0.push('!' as u8);
    blob
}

#[motoko]
unsafe fn manual_alloc(#[memory] mem: &mut impl Memory) -> Value {
    // Low-level access to memory allocation
    let value = alloc_blob(mem, Bytes(3 as u32));
    let blob = value.as_blob_mut();
    let mut dest = blob.payload_addr();
    for i in 0..3 {
        *dest = (i + 1) * 0x11;
        dest = dest.add(1);
    }
    allocation_barrier(value)
}

#[motoko]
unsafe fn bool_swap(a: bool, b: bool) -> (bool, bool) {
    (b, a)
}

#[motoko]
unsafe fn check_numbers(
    a: u8,
    b: i8,
    c: u16,
    d: i16,
    e: u32,
    f: i32,
    g: u64,
    h: i64,
) -> (u8, i8, u16, i16, u32, i32, u64, i64) {
    (a, b, c, d, e, f, g, h)
}
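Note that `div_rem` as written panics when `b == 0` (integer division by zero in Rust), which would presumably surface as a trap. A minimal sketch of a checked variant in plain Rust, outside the `#[motoko]` macro; the name `checked_div_rem` is hypothetical, and how an `Option` would map to a Motoko value is not covered by this PR description:

```rust
/// Returns `None` instead of panicking when `b` is zero.
pub fn checked_div_rem(a: u32, b: u32) -> Option<(u32, u32)> {
    // `checked_div` and `checked_rem` both yield `None` for a zero divisor.
    Some((a.checked_div(b)?, a.checked_rem(b)?))
}
```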

Motoko usage (ffi.mo):

import Prim "mo:prim";
import Array "mo:base/Array";
import Blob "mo:base/Blob";

// Rust bindings
func empty() : () = (prim "rts:empty" : () -> ())();
func identity<T>(value : T) : T = (prim "rts:identity" : T -> T)(value);
func blob_modify(value : Blob) : Blob = (prim "rts:blob_modify" : Blob -> Blob)(value);
func array_concat<T>(a : [T], b : [T]) : [T] = (prim "rts:array_concat" : ([T], [T]) -> [T])(a, b);
func manual_alloc() : Blob = (prim "rts:manual_alloc" : () -> Blob)();
func div_rem(a : Nat32, b : Nat32) : (Nat32, Nat32) = (prim "rts:div_rem" : (Nat32, Nat32) -> (Nat32, Nat32))(a, b);
func bool_swap(a : Bool, b : Bool) : (Bool, Bool) = (prim "rts:bool_swap" : (Bool, Bool) -> (Bool, Bool))(a, b);
type Numbers = (Nat8, Int8, Nat16, Int16, Nat32, Int32, Nat64, Int64);
func check_numbers(a : Nat8, b : Int8, c : Nat16, d : Int16, e : Nat32, f : Int32, g : Nat64, h : Int64) : Numbers = (prim "rts:check_numbers" : Numbers -> Numbers)(a, b, c, d, e, f, g, h);

// `empty`
assert empty() == ();

// `identity`
let echoValue = identity(5);
Prim.debugPrint(debug_show echoValue);
assert echoValue == 5;

// `div_rem`
let (div, rem) = div_rem(7, 2);
assert (div, rem) == (3, 1);

// `array_concat`
let a = Array.freeze(Array.init<Nat8>(10_000_000, 123 : Nat8));
let b = Array.freeze(Array.init<Nat8>(500_000, 234 : Nat8));
let concat = array_concat(a, b);
assert concat.size() == a.size() + b.size();
assert concat[0] == 123;
assert concat[concat.size() - 1] == 234;

// `blob_modify`
let inputBlob = Blob.fromArray(Array.freeze(Array.init<Nat8>(10_000_000, 123 : Nat8)));
let blob = blob_modify(inputBlob);
let array = Blob.toArray(blob);
assert array[0] == 123;
assert array[array.size() - 1] == 33; // '!'
assert blob.size() == inputBlob.size() + 1;

// `manual_alloc`
let allocValue = manual_alloc();
assert Blob.toArray(allocValue) == [0x11, 0x22, 0x33];

// `bool_swap`
for (a in [true, false].vals()) {
    for (b in [true, false].vals()) {
        assert bool_swap(a, b) == (b, a);
    }
};

// `check_numbers`
let numbers: Numbers = (1, -2, 3333, -4444, 5_000_000, -5_000_000, 0, -1_000_000_000_000_000);
assert check_numbers(numbers) == numbers;

Try this yourself (with placeholders ffi.mo and ../motoko-base):

MOC_UNLOCK_PRIM=1 moc -c ffi.mo -wasi-system-api --package base ../motoko-base/src && wasmtime ffi.wasm

Changes:

  • rts_sections in Wasm module decoder
  • custom_rts_functions field in compilation environment
  • "rts:*" primitive functions which refer to names in the custom section
  • Bugfix for decoding custom sections with UTF-8 content
  • FromValue and IntoValue traits in RTS
  • #[motoko] procedural macro attribute which wraps #[ic_mem_fn] and generates a custom section
  • Bump proc-macro2 and syn in the motoko-rts-macros crate
  • Example RTS functions using #[motoko] attribute
  • Macro to implement FromValue and IntoValue for tuples
  • Re-vendor Cargo dependencies in Nix (are these instructions up to date?)
  • Type checking or runtime error for unknown "rts:*" primitive expressions?
  • Convert examples into tests

@@ -0,0 +1,326 @@
// Custom RTS function utilities

use alloc::vec::Vec;

Contributor

This would potentially involve dynamic allocation using the Rust-interop EphemeralAllocator in allocator.rs, which allocates Blobs behind the scenes. Did you check how many implicit allocations happen? I wonder whether we could avoid the Vec and instead use a custom convenience wrapper around Blob and Array for implementing FFI functions.
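One way to answer the "how many implicit allocations" question outside the RTS is to count calls into the global allocator. The sketch below wraps the standard library's `System` allocator on a host target as a stand-in; it does not exercise the RTS allocator itself, and `CountingAllocator` and `allocations_during` are names made up for the example.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Number of `alloc` calls observed so far (process-wide).
static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

// Wraps the system allocator and counts every allocation request.
struct CountingAllocator;

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Runs `f` and returns its result together with the number of
/// allocations that happened while it ran. Allocations made by other
/// threads in the meantime are counted as well.
pub fn allocations_during<T>(f: impl FnOnce() -> T) -> (T, usize) {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    let result = f();
    let after = ALLOCATIONS.load(Ordering::Relaxed);
    (result, after - before)
}
```

Wrapping a conversion such as `(0u8..16).collect::<Vec<u8>>()` in `allocations_during` should report at least one allocation for the `Vec`. Optimized builds may elide allocations whose results are unused, so counts are best taken on values that are actually consumed.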

Contributor Author

We can definitely do this using structs which implement the FromValue and IntoValue traits. I'll add a partial implementation for Blob and Array<T> as a starting point.

fn u16_from_nat16(value: Value) -> u16;
fn u32_from_nat32(value: Value) -> u32;
fn u64_from_nat64(value: Value) -> u64;


Contributor

Maybe we could also add float conversions?

Contributor

'char' would be useful too.
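A `char` conversion would need validation, since not every 32-bit word is a Unicode scalar value. A minimal sketch, assuming the code point has already been extracted from the Motoko value as a raw `u32`; the function name `char_from_code_point` is hypothetical and not part of the PR:

```rust
/// Converts a raw code point to `char`, rejecting surrogates and
/// values above U+10FFFF.
pub fn char_from_code_point(raw: u32) -> Result<char, &'static str> {
    char::from_u32(raw).ok_or("not a Unicode scalar value")
}
```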

(0..len)
.into_iter()
.map(|i| T::from_value(array.get(i), mem))
.collect()

Contributor

I am not sure whether the dynamic allocation here happens via our EphemeralAllocator.

Contributor

Good point. Do we want to prevent that or just live with the unintended consequences?

Contributor

If we don't define this trait, do we steer people away from arguably dubious examples like array_concat?

Contributor Author

Replaced this with two examples named blob_alloc_fast and blob_alloc_slow. I agree that the example should clearly indicate that this approach is slower than working directly on the Motoko values themselves.

Comment on lines 260 to 261
let dest = array.payload_addr();
$(*(dest.add($index)) = $name::into_value(self.$index, mem)?;)+

Contributor

I believe it should be (not sure whether it compiles, better double check):

Suggested change
- let dest = array.payload_addr();
- $(*(dest.add($index)) = $name::into_value(self.$index, mem)?;)+
+ $(array.initialize($index, $name::into_value(self.$index, mem)?, mem);)+

Contributor Author

@rvanasa rvanasa Mar 19, 2024

That worked; thanks! Also updated everywhere else that was allocating an array.

Feel free to resolve this discussion if everything looks right.

};
}

// Temporary examples

Contributor

I am not sure what to ship as built-in examples. Some cases look general, others less so (e.g. manual_alloc). Maybe we could restrict the predefined set to the general cases.

Contributor Author

The intention is to move these out of this file and into test cases rather than including these in the RTS by default. This is next on the priority list now that we're planning to potentially merge this PR.


export_fun env "i8_from_int8" (
Func.of_body env ["v", I32Type] [I32Type] (fun env ->
G.i (LocalGet (nr 0l)) ^^ TaggedSmallWord.lsb_adjust Type.Int8));

Contributor

TaggedSmallWord.untag should probably also be applied before lsb_adjust, e.g. to sanity-check that the tags are correct. The same applies to the subsequent conversions.
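To illustrate the suggested order (check the tag first, then shift the payload down), here is a sketch in Rust rather than the compiler's OCaml. The bit layout is an assumption made up for this example: an 8-bit payload in the most significant bits and a type tag in the remaining low 24 bits. `INT8_TAG`, `tag_int8`, and `i8_from_int8_checked` are hypothetical names, and the compiler's actual encoding may differ.

```rust
// Assumed layout for this example only: [ 8-bit payload | 24-bit type tag ].
const PAYLOAD_SHIFT: u32 = 24;
const TAG_MASK: u32 = (1 << PAYLOAD_SHIFT) - 1;
const INT8_TAG: u32 = 0x00_C0_FF; // arbitrary tag value chosen for the sketch

/// Packs an `i8` into a tagged word (inverse of the function below).
pub fn tag_int8(value: i8) -> u32 {
    ((value as u8 as u32) << PAYLOAD_SHIFT) | INT8_TAG
}

/// Checks the tag first, then moves the payload to the least significant
/// bits with an arithmetic shift so that the sign is preserved.
pub fn i8_from_int8_checked(word: u32) -> Result<i8, &'static str> {
    if (word & TAG_MASK) != INT8_TAG {
        return Err("unexpected tag for Int8");
    }
    Ok(((word as i32) >> PAYLOAD_SHIFT) as i8)
}
```

Without the tag check, a word carrying a different type's tag would still be shifted and returned as if it were a valid `Int8`.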

Comment on lines 11822 to 11830
(* TODO: type checking error *)
Printf.printf "%s" Diag.(string_of_message {
sev = Error;
code = "M0199";
at;
cat = "RTS";
text = Printf.sprintf "custom function '%s' not found" s';
});
exit 1)

Contributor

As mentioned in the TODO, raising an error would be the nicer way. This would need some handling by the compiler pipeline, e.g. similar to errors raised during linking.

Contributor Author

Very much agreed. I investigated this with @crusso earlier this week, and it seems that getting this to work may require significant refactoring (at least as part of the type checking phase).

Using the pipeline error handling would also be important for moc.js if someone wanted to compile Motoko programs with a custom RTS in the browser.

.collect();

// Motoko return value
let ret = quote!(crate::types::Value);

Contributor

I am not sure whether all FFI functions would have a Value return type. Maybe there are some functions without a return type (e.g. ones that implement checks with traps or perform updates).

Contributor Author

I originally had IntoArgs and FromArgs traits to account for unit and tuple return types, but it turns out that the current implementation actually works better in these cases due to how the RTS functions add return values to the stack. We could potentially optimize this to use a variable number of Wasm function outputs, although I see this as probably out of scope for this PR.

@luc-blaeser
Contributor

luc-blaeser commented Mar 19, 2024

Very nice PR, Ryan. Thanks a lot. This offers a well-structured, convenient small framework for FFI implementations. As you say, it would still require advanced knowledge to implement FFI functions, especially the GC aspects (e.g. keeping Rust pointers only temporarily, applying the right GC barriers, etc.).
The only worry I have is that users could easily break memory safety and that we would then get issue reports of memory corruption in Motoko (which could be time-consuming to investigate and might also affect Motoko's reputation for safety). I guess people could still do this today by adjusting the Motoko compiler/RTS on their own, but I believe it would now be easier. I wonder if we could reduce this risk, i.e. instruct users about all the safety/security rules and aspects of FFI functions, have an explicit opt-in for this, and/or add steps when triaging issue reports so that we can filter out Motoko code that uses FFI functions (e.g. a question before reporting that asks whether FFI was used). Maybe my worry is exaggerated. I am interested in what other team colleagues think, @crusso, @ggreif, @chenyan-dfinity.

@luc-blaeser
Contributor

PS: I believe we could add some more tests for the FFI. I could also do some more stress testing with the GC, e.g. composing an additional GC random test or benchmark case that makes use of FFI.

TAG_BLOB => {
let blob = value.as_blob();
let len = blob.len().as_u32();
Ok(BlobVec((0..len).into_iter().map(|i| blob.get(i)).collect()))

Contributor

This is probably also hiding uses of the ephemeral allocator. I assume collect is allocating a Vec?

Contributor Author

Yep, that's correct. This is equivalent to something along the lines of let vec = Vec::with_capacity(len); for i in 0..len { ... }; vec.
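Spelled out, the equivalence looks roughly like this. A byte slice stands in for the Motoko blob (in the RTS the bytes would be read with `blob.get(i)`), and the function names are made up for the comparison. In both versions the `Vec` is the hidden allocation the reviewer is asking about.

```rust
/// Iterator version, mirroring the `collect()` call in the PR.
pub fn copy_with_collect(blob: &[u8]) -> Vec<u8> {
    let len = blob.len();
    (0..len).map(|i| blob[i]).collect()
}

/// Hand-written loop that the iterator version roughly corresponds to.
pub fn copy_with_loop(blob: &[u8]) -> Vec<u8> {
    let len = blob.len();
    let mut vec = Vec::with_capacity(len);
    for i in 0..len {
        vec.push(blob[i]);
    }
    vec
}
```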

@rvanasa
Contributor Author

rvanasa commented Mar 19, 2024

The only worry I have is that users could easily break memory safety and that we would then get issue reports of memory corruption in Motoko (which could be time-consuming to investigate and might also affect Motoko's reputation for safety).

This is a really good point @luc-blaeser. Because safety is a key part of Motoko's brand, this by itself makes a fairly strong case for developers to avoid this FFI approach when possible. One possibility could be to repurpose this PR as an internal refactor (implementing the built-in RTS functions using the #[motoko] macro). We could potentially keep the Wasm custom section to give advanced developers the option to extend the RTS where it would otherwise be impossible to use Motoko for their use case.

While this functionality is currently opt-in via the MOC_*_RTS environment variables, I suppose we could also include a compiler flag or something that explicitly allows custom RTS functions (or maybe even switch back to using the original logic in #4413). Also interested to hear more opinions from the rest of the team about how we could address this.


3 participants