Bacalhau project report 20220624

lukemarsden edited this page Jun 24, 2022 · 12 revisions

Progress on WASM, usability & production-readiness

We made progress in several areas this week.

CLI UX design

We've continued to develop the CLI UX design and achieved consensus (I think!) on the design doc.

The first change to "make room" for new user experiences in the CLI has been made: bacalhau run is now bacalhau docker run.

This change has not been released yet; it will ship in the next release.

This makes room for other experiences, such as running Python natively with bacalhau run python -c "print(1+1)" or bacalhau run python -r requirements.txt main.py.

WASM

We've got the test infrastructure in place for the WASM changes, and started work on the implementation.

The first command to use WASM under the hood will be bacalhau run python --deterministic, which will force deterministic execution of a Python script in a locked-down WASM environment.

This requires a new concept of "context" (like a Docker build context): a tar file of the local Python files and requirements.txt. We don't want to broadcast these contexts on libp2p, since they might be large, so we are going to POST them to the requestor node's REST API and have the requestor node pin them to IPFS for the duration of the job execution. The WASM executor can then mount them into the executor's context at runtime.
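As a rough illustration of the "context" idea, here is a minimal Python sketch that packs a set of files into an in-memory tar archive. The function names build_context and list_context are hypothetical and are not part of Bacalhau. The sketch normalises file metadata and sorts entries so that the same inputs always yield the same bytes, which is an assumption about what a reproducible context might want rather than a description of the actual implementation. Uploading the archive to the requestor node is left out, since the REST endpoint is not specified here.

```python
import io
import tarfile


def build_context(files: dict[str, bytes]) -> bytes:
    """Pack files into an uncompressed, in-memory tar archive.

    Hypothetical helper: entries are sorted and metadata is normalised so
    that identical inputs always produce byte-identical archives.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            # Fixed metadata: no timestamps or user details leak into the archive.
            info.mtime = 0
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_context(archive: bytes) -> list[str]:
    """Return the member names stored in a context archive."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return tar.getnames()
```

For example, build_context({"main.py": b"print(1+1)\n", "requirements.txt": b""}) returns the archive bytes that a client could then send to the requestor node.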

For the Pyodide Python runtime in WASM, we'll also need to run a local webserver inside the Pyodide container to serve a pre-selected set of dependencies that the user can choose to install (see loading Pyodide packages). Remember that, from the workload's perspective, the network is disabled for security and reproducibility reasons.

A nice side effect of pinning contexts to IPFS is that the programs themselves will be content-addressed.
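To show what content addressing means in practice, here is a small Python sketch that derives an identifier purely from a program's bytes. It uses a plain SHA-256 digest as a stand-in; real IPFS identifiers (CIDs) are constructed differently, so the content_address function below is a hypothetical illustration of the principle and not a way to compute IPFS addresses.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return an identifier derived only from the content itself.

    Illustrative stand-in: a SHA-256 hex digest, not a real IPFS CID.
    """
    return "sha256:" + hashlib.sha256(data).hexdigest()


program = b"print(1+1)\n"
same_program = b"print(1+1)\n"
changed_program = b"print(1+2)\n"

# Identical bytes always map to the same address, wherever they are stored.
print(content_address(program) == content_address(same_program))
# Any change to the program yields a different address.
print(content_address(program) == content_address(changed_program))
```

The useful property is that anyone holding the address can check that the program they fetched is exactly the one the job referred to.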

Initially WASM workloads will be run in a Node.js context inside a Docker container.

Stress testing

Work has continued on the stress testing framework. This is essential for understanding where we stand against our scale/reliability goals. Looking forward to getting results soon, probably next week!

Code cleanup and strong types

We've switched to strongly typed enums for the types in the system. Guy picked this one up.
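Bacalhau itself is written in Go, so the following Python sketch only illustrates the general idea of replacing free-form strings with a typed enum. JobState, its members, parse_job_state and is_terminal are hypothetical names chosen for this example and are not taken from the codebase.

```python
from enum import Enum


class JobState(Enum):
    """Hypothetical set of job states, used only for illustration."""

    RUNNING = "running"
    COMPLETE = "complete"
    ERROR = "error"


def parse_job_state(raw: str) -> JobState:
    """Convert an untyped string into a JobState, rejecting unknown values."""
    try:
        return JobState(raw)
    except ValueError:
        raise ValueError(f"unknown job state: {raw!r}") from None


def is_terminal(state: JobState) -> bool:
    """Report whether a job in this state has finished."""
    if not isinstance(state, JobState):
        raise TypeError("state must be a JobState, not a raw string")
    return state in (JobState.COMPLETE, JobState.ERROR)
```

The benefit is that a typo such as "complet" fails at the boundary where the string is parsed, instead of silently flowing through the system.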

UX feedback!

We've iterated on UX feedback from a user in the channel who was trying out the tool. They didn't have IPFS running locally, and struggled to keep the ipfs daemon running in the background on the machine from which they were submitting jobs to the Bacalhau network with the CLI. Now, with bacalhau get, the Bacalhau CLI itself temporarily connects to the IPFS network to download result files, which dramatically simplifies the UX for users! Guy did a great job with this one.

Terraform improvements

We've got a staging cluster now with its own DNS name, with continuous delivery via Terraform and state properly stored in a GCS bucket, so every commit to main redeploys the VMs on Google Cloud. Continuous delivery for the win!

Production monitoring

We've made progress on getting Grafana Cloud and OpenTelemetry running for the production network.

What's next

  • Land Python/WASM demo
  • Get scaling numbers!
  • More CLI improvements
  • Scale up the production cluster as we start to get more usage