Home
Welcome to the wasm-suricata wiki!
Disclaimer: this is an experimental project, and the notes here describe what I have found or encountered during testing. I'm not an expert in all the projects and technologies cited here and may have missed something or made errors. If you have any comment or mistake to report, please open an issue, it will be very welcome.
WebAssembly, aka wasm, is neither web nor assembly, but is a portable bytecode intended for fast and secure execution of programs across different systems.
WebAssembly has a binary format and a text format. The binary format (.wasm) is a compact binary instruction format for a stack-based virtual machine and is designed to be a portable compilation target for higher-level languages such as C, C++, Rust, C#, Go, Python and many more. The text format (.wat) is a human-readable format designed to help developers view the source of a WebAssembly module.
There are many projects and acronyms, so here are those that are relevant to this project.
- wasm: "machine code". A low-level bytecode designed for portable, fast and easy execution. An open standard being built by the WebAssembly WG of the W3C.
- wat: "WebAssembly text format". A human-readable format, similar to assembly.
- wasi: "system calls". An API for doing basic system stuff, mainly I/O.
We currently do not use wasi, but only wasm.
wasm in itself is not related to the web or browsers, but its main use cases today are compiled modules for web pages, designed to be executed by the browser and to communicate with JavaScript.
This is not the case here, as the intended use is to provide plugins for suricata parsers/output modules/detection plugins. This is important though, because many tools are designed for the 'in-browser' case, and rely on objects/data types/APIs that will not be provided by suricata.
Main Interpreter/JIT projects:
- wasmer: wasmer.io. Good documentation and examples, and has support for several backends (Cranelift, LLVM, standalone).
- wasmtime: wasmtime.dev. Not so many examples, but a clean API and good compilation and execution speed. Seems to host the cranelift compiler development.
- wasm3: a high performance WebAssembly interpreter written in C.
- wamr and others.
- lucet: a native WebAssembly compiler and runtime.
Only wasmer and wasmtime provide a nice integration/embedding with Rust code. See below for design choices.
lucet looks really interesting (e.g. speed, code signing features, etc.), but was rather complex to install and does not work "out of the box". It may still be interesting to follow, and to test again later.
This part is related to "source -> WASM".
Tests were successful with modules written in:
- Rust (native support)
- C (built with emscripten)
- AssemblyScript (built with npm)
Other experiments failed (which does not mean it is not possible, but it may take some more time):
- Go (native support, GOOS=js GOARCH=wasm go build): produced a huge file, and wasmer took too long to compile it in debug mode. Ok-ish in release mode (10s to build), but it requires a full Go runtime and many imports that are not provided.
- Tinygo (tinygo): works, but requires wasi by default (https://github.com/tinygo-org/tinygo/issues/1383)
This part is related to "WASM -> internal bytecode".
The runtime provides an engine. To execute a WASM file, the following steps must be executed:
- load the file to memory (fast)
- compile the file to a module (slow)
- create one or more instances of this module (fast)
Compilation happens when suricata starts. Even though it is done only once per run, it can take some time. To speed this up, a cache mechanism has been added to reuse pre-compiled files:
- After the file is loaded to memory, a fast hash of its content is computed using fxhash
- If a file with the same name as the hex value of the hash exists, it is de-serialized as the module
- Otherwise, the module is compiled, and then serialized to the cache file.
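The lookup logic above can be sketched as follows. This is only an illustration under stated assumptions, not the project's actual code: `cache_key` and `load_or_compile` are hypothetical names, the standard library's `DefaultHasher` stands in for fxhash (its output is not guaranteed to be stable across Rust releases, so a real cache would want a fixed algorithm), and the `compile` callback stands for "compile the module, then serialize it".

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::Hasher;
use std::io;
use std::path::Path;

/// Hex name of the cache entry for the content of a WASM file.
/// Assumption: DefaultHasher replaces fxhash to keep the sketch std-only.
fn cache_key(wasm_bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    hasher.write(wasm_bytes);
    format!("{:016x}", hasher.finish())
}

/// Return the serialized module for `wasm_bytes`, reusing `cache_dir` when possible.
/// `compile` is a hypothetical callback standing for "compile + serialize".
fn load_or_compile<F>(cache_dir: &Path, wasm_bytes: &[u8], compile: F) -> io::Result<Vec<u8>>
where
    F: FnOnce(&[u8]) -> Vec<u8>,
{
    let path = cache_dir.join(cache_key(wasm_bytes));
    if path.exists() {
        // Cache hit: the expensive compilation step is skipped.
        return fs::read(&path);
    }
    // Cache miss: compile, then store the result for the next start.
    let serialized = compile(wasm_bytes);
    fs::write(&path, &serialized)?;
    Ok(serialized)
}
```

Because the key depends only on the file content, renaming a module does not invalidate its cache entry, while any change to its bytes produces a new one.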
To enable the cache file, the cache_dir value must be set in suricata's configuration file.
The WASM virtual machine memory is entirely isolated. This has several important consequences:
- data has to be copied to/from guest memory to be used
- the host can easily read/write in the guest memory. However, it does not know of its layout, nor what is currently allocated or free.
To solve this problem, the host requires that the guest provides 2 functions: sc_allocate and sc_free. These are usually easy to map to the language's own allocation functions (malloc/free in C, Vec/Box allocation in Rust, etc.)
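As an illustration, a Rust guest could implement the two functions roughly as below. The signatures are an assumption (the notes above only give the names): here `sc_allocate` takes a size and returns a pointer, and `sc_free` takes the pointer together with the size used at allocation time, because Rust's allocator needs the layout to release memory.

```rust
use std::alloc::{alloc, dealloc, Layout};

// When built as a guest module, both functions also need an export attribute
// (`#[no_mangle]`, spelled `#[unsafe(no_mangle)]` in the 2024 edition) so that
// the host can find them by name. It is left out here to keep the sketch
// edition-independent.

/// Allocate `size` bytes in guest memory and return the address to the host.
/// Returns a null pointer if `size` is 0 or if the allocation fails.
pub extern "C" fn sc_allocate(size: u32) -> *mut u8 {
    if size == 0 {
        return std::ptr::null_mut();
    }
    match Layout::from_size_align(size as usize, 1) {
        Ok(layout) => unsafe { alloc(layout) },
        Err(_) => std::ptr::null_mut(),
    }
}

/// Release a buffer previously returned by `sc_allocate`.
/// Safety: `ptr` must come from `sc_allocate(size)` and must not be freed twice.
pub unsafe extern "C" fn sc_free(ptr: *mut u8, size: u32) {
    if ptr.is_null() || size == 0 {
        return;
    }
    if let Ok(layout) = Layout::from_size_align(size as usize, 1) {
        unsafe { dealloc(ptr, layout) };
    }
}
```

With this pair, the host can ask the guest for a buffer, copy its data into it, call the guest function, and then hand the buffer back, without knowing anything about the guest's heap layout.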
- A host call is a function (service) provided by the host, and called by the guest
- A guest call is a guest function called by the host
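Wasm allows only the primitive numeric types i32, i64, f32 and f64 for function arguments and return values. There are no separate unsigned types at the wasm level: a u32 or u64 in the source language is passed as an i32 or i64. Pointers are usually a u32 value, relative to guest memory.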
Because of this, passing other types (like strings) is more complicated.
Usually, for arguments, a string is passed as a pointer (to guest memory) and a length. For return values, this is not possible, so depending on the case another solution must be used (a single pointer to a zero-terminated string, a serialized pointer+length pair, etc.).
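One way to return a pointer and a length through a single value is to pack them into a 64-bit integer. The sketch below is an assumption about how this could be done, not the encoding used by the project: the helper names are hypothetical, the pointer goes in the high 32 bits and the length in the low 32 bits, and guest memory is modelled as a plain byte slice.

```rust
/// Pack a guest pointer and a length into one 64-bit return value:
/// pointer in the high 32 bits, length in the low 32 bits.
fn pack_ptr_len(ptr: u32, len: u32) -> u64 {
    ((ptr as u64) << 32) | (len as u64)
}

/// Inverse of `pack_ptr_len`: returns `(ptr, len)`.
fn unpack_ptr_len(packed: u64) -> (u32, u32) {
    ((packed >> 32) as u32, packed as u32)
}

/// Copy a UTF-8 string out of guest memory (modelled here as a byte slice).
/// Returns None if the range is out of bounds or the bytes are not valid UTF-8.
fn read_guest_str(memory: &[u8], ptr: u32, len: u32) -> Option<String> {
    let start = ptr as usize;
    let end = start.checked_add(len as usize)?;
    let bytes = memory.get(start..end)?;
    String::from_utf8(bytes.to_vec()).ok()
}
```

The bounds check matters on the host side: the pointer and length come from the guest and must never be trusted to stay inside its memory.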
Some special care has to be taken for strings:
- encoding must be UTF-8 (AssemblyScript uses UTF-16 by default)
- some calls require the string to be NULL-terminated
- if a host call returns a newly allocated buffer, it must be freed to avoid memory leaks. The choice of who (guest or host) must free memory depends on the situation (so docs should clearly state it).
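The first two points can be illustrated with a few small helpers. These are hypothetical functions written for this page, with guest memory again modelled as a byte slice; they only show the kind of checks involved.

```rust
/// Read a NUL-terminated UTF-8 string starting at `ptr` in guest memory.
/// Returns None if `ptr` is out of bounds, if no terminator is found,
/// or if the bytes are not valid UTF-8.
fn read_c_str(memory: &[u8], ptr: u32) -> Option<&str> {
    let start = ptr as usize;
    let tail = memory.get(start..)?;
    let nul = tail.iter().position(|&b| b == 0)?;
    std::str::from_utf8(&tail[..nul]).ok()
}

/// Convert UTF-16 code units (for example from an AssemblyScript string)
/// to a UTF-8 String. Returns None on unpaired surrogates.
fn utf16_to_utf8(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}

/// Build a NUL-terminated buffer for calls that require one.
/// Returns None if the string already contains a NUL byte, since the
/// receiver would see it as the end of the string.
fn to_nul_terminated(s: &str) -> Option<Vec<u8>> {
    if s.as_bytes().contains(&0) {
        return None;
    }
    let mut buf = Vec::with_capacity(s.len() + 1);
    buf.extend_from_slice(s.as_bytes());
    buf.push(0);
    Some(buf)
}
```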
The output module is instantiated in the C code in src/output-wasm.c. The C part is mostly a passthrough to Rust code.
The Rust wasm module contains all the interesting files:
- compile.rs: WASM compiler calls
- output.rs: output module calls, modules/instance initialization and function calls
- runtime.rs: host calls
- runtime_util.rs: helper functions to simplify wrapping host call arguments/return values, accessing memory, etc.
TBD: I have not run any benchmarks at the moment
- Choice 1: use wasi or not
wasi is interesting if file I/O has to be performed directly by modules. This is not required, IMHO, and would add lots of code and useless complexity.
However, certain compilers like tinygo always target wasi, so the modules they produce cannot be run if it is not provided.
- Choice 2: wasmer or wasmtime
Difficult choice, both are interesting. I've done 2 different tests (one with each); both work and did not raise any blocking issue. This is totally subjective, but it seems wasmtime provides a nicer API and is a bit faster, so I went for wasmtime.
- WASM Code Explorer: useful to explore/decompile a binary WASM file