Pierre Chifflier edited this page Dec 22, 2020 · 6 revisions

Welcome to the wasm-suricata wiki!

disclaimer: this is an experimental project, and the notes here describe what I have found or encountered during testing. I'm not an expert in all the projects and technologies cited here and may have missed something or made errors. If you have any comments or mistakes to report, please open an issue; it will be very welcome.


WASM

WebAssembly, aka wasm, is neither web nor assembly, but is a portable bytecode intended for fast and secure execution of programs across different systems.

WebAssembly has a binary format and a text format. The binary format (.wasm) is a compact binary instruction format for a stack-based virtual machine and is designed to be a portable compilation target for higher-level languages such as C, C++, Rust, C#, Go, Python and many more. The text format (.wat) is a human-readable format designed to help developers view the source of a WebAssembly module.
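The two formats are easy to tell apart: every binary module starts with the same 8-byte preamble (the magic bytes `\0asm` followed by the format version, 1, as a little-endian u32), while a .wat file is plain text. A minimal Rust sketch of such a check follows; the helper name is illustrative, and it only looks at the preamble without validating the sections that follow.

```rust
/// Magic number that starts every binary WebAssembly module: "\0asm".
const WASM_MAGIC: [u8; 4] = [0x00, 0x61, 0x73, 0x6D];
/// Binary format version 1, encoded as a little-endian u32.
const WASM_VERSION_1: [u8; 4] = [0x01, 0x00, 0x00, 0x00];

/// Returns true if `bytes` starts with the preamble of a version 1 binary
/// module. Illustrative helper: it does not validate the rest of the module.
fn has_wasm_header(bytes: &[u8]) -> bool {
    bytes.len() >= 8 && bytes[0..4] == WASM_MAGIC && bytes[4..8] == WASM_VERSION_1
}

fn main() {
    // The smallest valid binary module: just the preamble, with no sections.
    let empty_module: [u8; 8] = [0x00, 0x61, 0x73, 0x6D, 0x01, 0x00, 0x00, 0x00];
    // The text format is plain text and has no binary preamble.
    let wat_source = b"(module)";
    println!("{}", has_wasm_header(&empty_module)); // true
    println!("{}", has_wasm_header(wat_source)); // false
}
```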

Ecosystem

There are many projects and acronyms, so here are those that are relevant to this project.

  • wasm: the "machine code". A low-level bytecode designed for portable, fast and easy execution; an open standard developed by the WebAssembly Working Group of the W3C.
  • wat: the "WebAssembly text format", a human-readable format similar to assembly.
  • wasi: the "system calls" (WebAssembly System Interface). An API for basic system operations, mainly I/O. We currently do not use wasi, only wasm.

wasm itself is not tied to the web or to browsers, but its main use case today is compiled modules for web pages, executed by the browser and communicating with JavaScript.

This is not the case here: the intended use is to provide plugins for suricata parsers, output modules and detection. This matters, because many tools are designed for the in-browser case and rely on objects, data types or APIs that suricata will not provide.

Main Interpreter/JIT projects:

  • wasmer: wasmer.io. Good documentation and examples, and has support for several backends (Cranelift, LLVM, standalone).
  • wasmtime: wasmtime.dev. Not so many examples, but a clean API and good compilation and execution speed. It also seems to host the development of the Cranelift compiler.
  • wasm3: a high-performance WebAssembly interpreter written in C.
  • wamr and others
  • lucet: a native WebAssembly compiler and runtime.

Only wasmer and wasmtime provide a nice integration/embedding with Rust code. See below for design choices. lucet looks really interesting (e.g. speed, code signing features), but it was rather complex to install and did not work "out of the box". It may still be worth following, and testing again later.

Languages

This part is related to "source -> WASM".

Tests were successful with modules written in:

  • Rust (native support)
  • C (built with emscripten)
  • AssemblyScript (built with npm)

Other experiments failed (which does not mean it is not possible, but it may take some more time):

  • Go (native support, GOOS=js GOARCH=wasm go build): produced a huge file, and wasmer took too long to compile it in debug mode. OK-ish in release mode (10s to build), but it requires a full Go runtime and many imports that are not provided.
  • TinyGo (tinygo): works, but requires wasi by default (https://github.com/tinygo-org/tinygo/issues/1383)

Compilation and cache

This part is related to "WASM -> internal bytecode".

The runtime provides an engine. To execute a WASM file, the following steps must be performed:

  • load the file to memory (fast)
  • compile the file to a module (long)
  • create one or more instances of this module (fast)

The compilation happens at suricata startup. Although it is done only once per run, it can take some time. To speed up this step, a cache mechanism has been added to reuse pre-compiled files:

  • After the file is loaded into memory, a fast hash of its content is computed using fxhash
  • If a file named after the hex value of the hash exists in the cache directory, it is de-serialized as the module
  • Otherwise, the module is compiled and then serialized to the cache file.

To enable the cache file, the cache_dir value must be set in suricata's configuration file.
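The cache lookup can be sketched in Rust with only the standard library. This is an illustration under assumptions, not the project's code: FNV-1a stands in for fxhash, and `compile_and_serialize` is a hypothetical closure standing in for the runtime's compile and serialize calls. The key is the hex value of the hash of the file content, so a modified module gets a new cache entry.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// 64-bit FNV-1a hash, used here as a stand-in for fxhash.
fn fnv1a64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &byte in data {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Cache file path: the hex value of the hash, inside `cache_dir`.
fn cache_path(cache_dir: &Path, wasm_bytes: &[u8]) -> PathBuf {
    cache_dir.join(format!("{:016x}", fnv1a64(wasm_bytes)))
}

/// Returns the serialized compiled module, reading it from the cache when
/// possible. `compile_and_serialize` is hypothetical: it stands for the
/// runtime's "compile to a module, then serialize it" step.
fn load_or_compile<F>(cache_dir: &Path, wasm_bytes: &[u8], compile_and_serialize: F) -> io::Result<Vec<u8>>
where
    F: FnOnce(&[u8]) -> Vec<u8>,
{
    let path = cache_path(cache_dir, wasm_bytes);
    if path.exists() {
        // Cache hit: the caller de-serializes these bytes as the module.
        return fs::read(&path);
    }
    // Cache miss: compile (slow), then store the result for the next run.
    let compiled = compile_and_serialize(wasm_bytes);
    fs::create_dir_all(cache_dir)?;
    fs::write(&path, &compiled)?;
    Ok(compiled)
}

fn main() -> io::Result<()> {
    let cache_dir = std::env::temp_dir().join("wasm-cache-sketch");
    let wasm = b"\0asm\x01\0\0\0"; // minimal module, used as input
    let mut compilations = 0;
    for _ in 0..2 {
        // Only the first call (on a fresh cache dir) invokes the closure.
        load_or_compile(&cache_dir, wasm, |bytes| {
            compilations += 1;
            bytes.to_vec()
        })?;
    }
    println!("compiled {} time(s)", compilations);
    Ok(())
}
```

Note that the key covers only the module bytes: if the runtime version or its settings change, a previously serialized module may no longer be loadable, so mixing such a version string into the hash may be worth considering.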

Memory

The WASM virtual machine memory is entirely isolated. This has several important consequences:

  • data has to be copied to/from guest memory to be used
  • the host can easily read/write the guest memory. However, it does not know its layout, nor what is currently allocated or free. To solve this problem, the host requires the guest to provide 2 functions: sc_allocate and sc_free. These usually map easily to the language's allocation functions (malloc/free in C, Vec/Box allocation in Rust, etc.)
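On the guest side, a Rust module could implement the two functions roughly as follows. The signatures (a size in bytes for sc_allocate, and the pointer plus that same size for sc_free) are assumptions, since only the function names are given here. The sketch also compiles natively, so `main` plays the role of the host writing into an allocated buffer.

```rust
// Guest-side sketch. Assumed signatures: only the names sc_allocate and
// sc_free come from the page. In a real wasm32 guest these functions would
// also need to be exported unmangled (e.g. with the no_mangle attribute).
// On wasm32, usize and pointers are 32 bits wide.

/// Allocates `size` zero-filled bytes and returns a pointer the host can write to.
pub extern "C" fn sc_allocate(size: usize) -> *mut u8 {
    // A boxed slice has exactly `size` bytes, so sc_free can rebuild it.
    let buf: Box<[u8]> = vec![0u8; size].into_boxed_slice();
    Box::into_raw(buf) as *mut u8
}

/// Frees a buffer returned by sc_allocate(size).
///
/// # Safety
/// `ptr` must come from sc_allocate with the same `size`, and must not be
/// freed twice.
pub unsafe extern "C" fn sc_free(ptr: *mut u8, size: usize) {
    if ptr.is_null() {
        return;
    }
    // Rebuild the Box<[u8]> so that dropping it releases the memory.
    let slice: *mut [u8] = std::ptr::slice_from_raw_parts_mut(ptr, size);
    unsafe { drop(Box::from_raw(slice)) };
}

fn main() {
    // Host-like usage: allocate, copy data in, read it back, free.
    let msg = b"hello";
    let ptr = sc_allocate(msg.len());
    unsafe {
        std::ptr::copy_nonoverlapping(msg.as_ptr(), ptr, msg.len());
        let written = std::slice::from_raw_parts(ptr, msg.len());
        println!("{}", std::str::from_utf8(written).unwrap()); // hello
        sc_free(ptr, msg.len());
    }
}
```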

Host/Guest calls

  • A host call is a function (service) provided by the host, and called by the guest
  • A guest call is a guest function called by the host

Wasm functions can only take and return basic numeric values: i32, i64, f32 and f64. There are no separate unsigned types (signedness is decided by the instructions), so a u32 is simply passed as an i32. Pointers are usually 32-bit offsets into guest memory. Because of this, passing other types (like strings) is more complicated.

Usually, a string argument is passed as a pointer (into guest memory) and a length. For return values this is not possible, so depending on the case another solution must be used (a single pointer to a NUL-terminated string, a pointer+length pair serialized into a single value, etc.).

Some special care has to be taken for strings:

  • encoding must be UTF-8 (AssemblyScript uses UTF-16 by default)
  • some calls require the string to be NULL-terminated
  • if a host call returns a newly allocated buffer, it must be freed to avoid memory leaks. Whether the guest or the host must free it depends on the situation, so the documentation should state it clearly.
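On the host side, these rules translate into a few checks before a guest string can be used: bounds, UTF-8 validity and, for NUL-terminated strings, the presence of the terminator. In this sketch the guest memory is modelled as a plain byte slice (how the host obtains that view depends on the runtime's API), and the helper names and error type are hypothetical.

```rust
use std::str;

/// Reasons why a guest string cannot be read (hypothetical error type).
#[derive(Debug, PartialEq)]
enum GuestStrError {
    OutOfBounds,
    MissingNul,
    InvalidUtf8,
}

/// Reads `len` bytes at offset `ptr` in guest memory and checks they are UTF-8.
fn read_guest_str(memory: &[u8], ptr: u32, len: u32) -> Result<&str, GuestStrError> {
    let start = ptr as usize;
    // checked_add guards against ptr + len overflowing.
    let end = start.checked_add(len as usize).ok_or(GuestStrError::OutOfBounds)?;
    let bytes = memory.get(start..end).ok_or(GuestStrError::OutOfBounds)?;
    str::from_utf8(bytes).map_err(|_| GuestStrError::InvalidUtf8)
}

/// Reads a NUL-terminated string starting at offset `ptr` (NUL excluded).
fn read_guest_cstr(memory: &[u8], ptr: u32) -> Result<&str, GuestStrError> {
    let tail = memory.get(ptr as usize..).ok_or(GuestStrError::OutOfBounds)?;
    let nul = tail.iter().position(|&b| b == 0).ok_or(GuestStrError::MissingNul)?;
    str::from_utf8(&tail[..nul]).map_err(|_| GuestStrError::InvalidUtf8)
}

fn main() {
    // Fake guest memory: "hello" at offset 4, "suricata\0" at offset 16.
    let mut memory = vec![0u8; 32];
    memory[4..9].copy_from_slice(b"hello");
    memory[16..25].copy_from_slice(b"suricata\0");
    println!("{:?}", read_guest_str(&memory, 4, 5)); // Ok("hello")
    println!("{:?}", read_guest_cstr(&memory, 16)); // Ok("suricata")
    println!("{:?}", read_guest_str(&memory, 30, 8)); // Err(OutOfBounds)
}
```

UTF-8 validation alone does not catch every UTF-16 string: UTF-16 text containing only ASCII characters is also valid UTF-8 (with a NUL byte after each character), so the encoding still has to be agreed upon between host and guest.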

Suricata/WASM output module

The output module is instantiated in the C code in src/output-wasm.c. The C part is mostly a passthrough to Rust code.

The Rust wasm module contains all the interesting files:

  • compile.rs: WASM compiler calls
  • output.rs: output module calls, modules/instance initialization and function calls
  • runtime.rs: host calls
  • runtime_util.rs: helper functions to simplify wrapping host call arguments/return values, accessing memory, etc.

Performance

TBD: I have not run any benchmarks yet.

Design choices and questions

  • Choice 1: use wasi or not

wasi is useful if modules need to perform file I/O directly. This is not required, IMHO, and would add a lot of code and unnecessary complexity. However, certain compilers like TinyGo always target wasi, so the modules they produce cannot be run unless wasi is provided.

  • Choice 2: wasmer or wasmtime

Difficult choice, both are interesting. I did 2 different tests (one with each); both worked and did not raise any blocking issue. This is totally subjective, but wasmtime seems to provide a nicer API and to be a bit faster, so I went for wasmtime.

Links/Other Resources