Major refactoring (#125)

deislabs · Oct 6, 2021 · c48bdd6
1 parent 9abecad

Showing 48 changed files with 3,790 additions and 2,852 deletions.
diff --git a/.gitignore b/.gitignore
@@ -3,3 +3,4 @@ wagi-cache/
 /ssl-example.*
 .vscode/
 _scratch/
+tests_working_dir/
diff --git a/Cargo.lock b/Cargo.lock
diff --git a/Cargo.toml b/Cargo.toml
@@ -35,3 +35,4 @@ wasmtime                        = "0.30"
 wasmtime-wasi                   = "0.30"
 wasmtime-cache                  = "0.30"
 wat                             = "1.0.37"
+chrono = "0.4.19"
diff --git a/docs/architecture.md b/docs/architecture.md
@@ -55,6 +55,85 @@ In the future, as WASI matures, we will relax the restrictions on outbound netwo
 - WAGI does NOT support NPH (Non-Parsed Header) mode
 - The value of `args` is NOT escaped for Bourne-style shells (See section 7.2 of CGI spec)
 
-It should be noted that while the daemon (the WAGI server) runs constantly, both the `modules.toml` and the `.wasm` file are loaded for each request, much as they were for CGI.
-In the future, the WAGI server may cache the WASM modules to speed loading.
-But in the near term, we are less concerned with performance and more concerned with debugging.
+In previous releases, although the daemon (the WAGI server) ran constantly,
+both the `modules.toml` and the `.wasm` file were loaded from disk on each request, much as they were for CGI.
+As of the time of writing, the WAGI server reads the WASM modules at startup and keeps
+them in memory.  This abstracts serving code away from filesystem interactions, and also
+improves performance.
+
+## Design notes
+
+This implementation of WAGI falls into two parts:
+
+* Initialisation (the bulk of `main.rs`)
+* Request serving (`wagi_server` and the components it calls)
+
+After initialisation we should know everything we need to know to handle requests,
+and we should have failed if we can determine that anything is missing or
+invalid. All configuration files have been parsed and validated, all modules have
+been downloaded and read, all dependencies have been readied, etc.  If initialisation
+fails, WAGI stops rapidly with an error message.
+
+**Caveat:** We could probably perform even more validation during the initialisation
+phase. For example, at the time of writing, we don't check whether route entry points
+exist.
+
+Because any failure during initialisation should cause an immediate exit, we do
+only minimal tracing during this phase; the exit and error message should provide
+enough information to diagnose any problems.
+
+### Principles of the initialisation phase
+
+* Parse, don't validate.  That is, convert raw data such as config files into a
+  form that minimises further checking or special case handling later on.
+* Don't make downstream components care about how upstream components got their
+  data.  This is not always practical, but the idea is to minimise how much, say,
+  the route builder needs to care about whether it is dealing with an OCI reference
+  in a `modules.toml` or a parcel in a local standalone bindle. Separate the stages;
+  keep `main()` as simple and as linear as possible.
+* Fail fast.  Related to the above, check that everything
+  you need is present, in the right place, and usable.  Ideally parse it into
+  a form such that the next stage doesn't need to repeat the checks.
+* Fail informatively.  Be generous with error context and values.  Rust has
+  an awful habit of reporting things like "key not in dictionary" and "file
+  or directory does not exist."  Err on the side of saying _which_ thing
+  went wrong.
+* Provide entry points for automated testing.
+
+### Key types and function groups
+
+* Initialisation is geared to producing a `RoutingTable` which maps routes to handlers.
+  A `RoutingTable` consists primarily of a vector of `RoutingTableEntry`. ('Map' is
+  a slight misnomer here, because of ordering and wildcard routes.)
+* `RoutingTableEntry` contains a route (represented by `RoutePattern`) and all the data
+  required to handle that route (represented by the `RouteHandler` enum).
+* The types with "handler" in the name can be a bit confusing.  We need them because
+  we have different representations of handlers as we assemble the data we need to
+  run them.
+  - `RouteHandler` is the final, "runnable" form of handler.
+  - `WasmRouteHandler` is the data for the interesting case of `RouteHandler`.
+  - `WagiHandlerInfo` aggregates the information about a route and associated parcels
+    specified in a bindle.
+  - `HandlerConfigurationSource` represents the combination of flags passed on the
+    command line to say where routing and handling is specified, e.g. a `modules.toml`
+    file or a bindle.
+  - `HandlerConfiguration` represents the parsed form of whatever the
+    `HandlerConfigurationSource` points to. Note that `HandlerConfigurationSource` is the
+    _reference_ to the source (e.g. file path or bindle ID); `HandlerConfiguration` is
+    _the content of that file or bindle_.
+  - `LoadedHandlerConfiguration` is a `HandlerConfiguration` augmented with the binary
+    content of the Wasm modules specified in that configuration.
+  - Note that all those last three are different _again_ from `WagiConfiguration`
+    which contains a whole bunch of other configuration like TLS and stuff.
+  - I am very very sorry for everything.
+* `WasmModuleSource` represents data that can be instantiated as a Wasm module. At the
+  time of writing, the only case is `Blob`, which is the raw bytes of the Wasm binary.
+  In future, this could have an additional case (or have a single different case!) of
+  a pre-instantiated module - the point of the type is to insulate other code from making
+  assumptions about the representation.
+* The `wasm_runner` module provides services for executing Wasm modules that communicate
+  via stdin/stdout.  This allows commonality between dynamic route discovery and handler
+  execution.  There is scope for more encapsulation here though!
+
+We welcome improvements to and tidying of the module structure and placement of
+functions.
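
The design notes added in `docs/architecture.md` can be illustrated with a minimal, self-contained Rust sketch. It is not the code from this commit: the type names (`RoutingTable`, `RoutingTableEntry`, `RoutePattern`, `RouteHandler`, `WasmRouteHandler`, `WasmModuleSource::Blob`) come from the notes, but every field, the `parse`, `build` and `route_for` functions, the handling of `/...` wildcard routes, and the Wasm magic-number check are assumptions made for illustration. It shows raw route strings being parsed once into a typed pattern, initialisation failing fast with an error that names the offending route or module, and lookup walking an ordered vector rather than a map.

```rust
// Illustrative sketch only. Type names follow the design notes; all fields,
// methods, and the route syntax handling are assumptions, not WAGI's actual code.

#[derive(Debug, PartialEq)]
enum RoutePattern {
    Exact(String),
    Prefix(String), // parsed from a route ending in "/..."
}

impl RoutePattern {
    // "Parse, don't validate": turn the raw string into a typed pattern once.
    fn parse(raw: &str) -> Result<Self, String> {
        if !raw.starts_with('/') {
            // "Fail informatively": say which route was rejected.
            return Err(format!("route '{}' must start with '/'", raw));
        }
        Ok(match raw.strip_suffix("/...") {
            Some(prefix) => RoutePattern::Prefix(prefix.to_owned()),
            None => RoutePattern::Exact(raw.to_owned()),
        })
    }

    fn is_match(&self, path: &str) -> bool {
        match self {
            RoutePattern::Exact(p) => path == p.as_str(),
            RoutePattern::Prefix(prefix) => match path.strip_prefix(prefix.as_str()) {
                Some(rest) => rest.is_empty() || rest.starts_with('/'),
                None => false,
            },
        }
    }
}

#[derive(Debug)]
enum WasmModuleSource {
    Blob(Vec<u8>), // raw bytes of the Wasm binary
}

#[derive(Debug)]
struct WasmRouteHandler {
    module_name: String, // hypothetical field
    source: WasmModuleSource,
}

#[derive(Debug)]
enum RouteHandler {
    Wasm(WasmRouteHandler),
}

#[derive(Debug)]
struct RoutingTableEntry {
    pattern: RoutePattern,
    handler: RouteHandler,
}

#[derive(Debug)]
struct RoutingTable {
    entries: Vec<RoutingTableEntry>, // ordered, so not really a map
}

impl RoutingTable {
    // "Fail fast": reject bad routes and non-Wasm blobs during initialisation.
    fn build(raw: Vec<(&str, &str, Vec<u8>)>) -> Result<Self, String> {
        let mut entries = Vec::new();
        for (route, module_name, bytes) in raw {
            let pattern = RoutePattern::parse(route)?;
            if !bytes.starts_with(b"\0asm") {
                return Err(format!(
                    "module '{}' for route '{}' is not a Wasm binary",
                    module_name, route
                ));
            }
            let handler = RouteHandler::Wasm(WasmRouteHandler {
                module_name: module_name.to_owned(),
                source: WasmModuleSource::Blob(bytes),
            });
            entries.push(RoutingTableEntry { pattern, handler });
        }
        Ok(RoutingTable { entries })
    }

    // The first matching entry wins, which is why ordering matters.
    fn route_for(&self, path: &str) -> Option<&RoutingTableEntry> {
        self.entries.iter().find(|e| e.pattern.is_match(path))
    }
}

fn main() {
    let wasm = b"\0asm\x01\0\0\0".to_vec(); // Wasm header only, enough for the check
    let table = RoutingTable::build(vec![
        ("/hello", "hello.wasm", wasm.clone()),
        ("/static/...", "fileserver.wasm", wasm),
    ])
    .expect("initialisation failed");

    for path in ["/hello", "/static/css/site.css", "/missing"] {
        match table.route_for(path) {
            Some(RoutingTableEntry { handler: RouteHandler::Wasm(h), .. }) => {
                let WasmModuleSource::Blob(bytes) = &h.source;
                println!("{} -> {} ({} bytes)", path, h.module_name, bytes.len());
            }
            None => println!("{} -> no route", path),
        }
    }
}
```

Because `build` returns a `Result`, a caller in the position of `main()` can stop with a single error message before any request is served, and the serving side only ever sees entries that have already been checked.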