
Bacalhau project report 20220516


Big refactor

Great progress this week on the big refactor: we now have the entire test suite passing against the new codebase!

The new codebase supports arbitrary Docker images, but has also had a significant rewrite so that objects have in-memory mocks, which opens the door to unit testing and integration testing as well as end-to-end testing.
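
To give a flavour of what the in-memory mocks enable (Bacalhau itself is written in Go; the names below are hypothetical and purely illustrative), a unit test can hand a component a fake storage object instead of standing up a real IPFS daemon or Docker engine:

```python
# Illustrative sketch only -- Bacalhau's real interfaces are in Go, and these
# names are made up. The point is that each subsystem can be swapped for an
# in-memory fake, so unit and integration tests don't need real daemons.

class InMemoryStorage:
    """Fake storage provider that serves 'CIDs' from a dict."""

    def __init__(self):
        self._objects = {}

    def put(self, cid, data):
        self._objects[cid] = data

    def get(self, cid):
        return self._objects[cid]


def test_reads_job_input():
    storage = InMemoryStorage()
    storage.put("QmExampleCid", b"hello world")
    # The component under test receives the mock instead of a real driver.
    assert storage.get("QmExampleCid") == b"hello world"
```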

The IPFS-with-Docker storage driver has been battle-hardened. After fighting significant flakiness in the interaction between Docker networking and IPFS, we believe the remaining daemons lurking in that particular machine have been exorcised by using the host network, along with a handful of other tweaks.

We also discovered that the IPFS FUSE driver actually copies the file locally before making it available over FUSE, which rather obviates the value of using FUSE at all. So instead we implemented an equivalent but simpler "IPFS API copy" storage driver, which just uses the IPFS API to copy a CID into temporary storage before running the job on it. We'll implement a more performant streaming interface later, when we get onto the performance milestones.
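
As a rough sketch of the mechanism (not the actual Go implementation; this uses the IPFS CLI rather than the API for brevity), the copy driver essentially does the following:

```python
# Sketch only: the real driver is implemented in Go against the IPFS API.
# The mechanism: copy the content behind a CID into temporary local storage,
# then hand that path to the executor as the job's input.
import os
import subprocess
import tempfile

def prepare_input(cid: str) -> str:
    """Copy a CID into local temporary storage and return the local path."""
    tmpdir = tempfile.mkdtemp(prefix="bacalhau-input-")
    dest = os.path.join(tmpdir, cid)
    # `ipfs get <cid> -o <path>` fetches the file or directory behind the CID.
    subprocess.run(["ipfs", "get", cid, "-o", dest], check=True)
    return dest
```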

Wiring up the CLI is probably 1-2 days away; once that's done we can merge the branch and have the new system ready for folks to play with, demo, and contribute to.

Storage interface

We now have a storage interface as well as a compute interface.

One important point about the storage interface is that we have decided the concept of a generic storage driver doesn't make sense: storage drivers only make sense in terms of a particular execution interface. So, for example, we have an "IPFS-with-Docker" storage engine rather than a generic "IPFS" storage engine. There's a new distinction between the "backing store" and the "mounting mechanism". This distinction means that the user can specify storage as IPFS, and multiple storage-with-compute implementations can declare that they support the IPFS backing store, even though their actual mounting mechanisms differ.
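
Purely as an illustrative sketch of that distinction (the real interfaces are in Go and the names here are hypothetical): the user's job spec names a backing store, and each storage-with-compute implementation declares which backing stores it can handle while its mounting mechanism differs:

```python
# Hypothetical sketch, not the real Go interfaces. Two implementations share
# the "ipfs" backing store but mount it in completely different ways.

class IpfsWithDocker:
    backing_store = "ipfs"  # what the user asks for in the job spec

    def prepare(self, cid: str) -> str:
        # mount the CID via an IPFS daemon on the host network ...
        raise NotImplementedError

class IpfsApiCopy:
    backing_store = "ipfs"  # same backing store, different mounting mechanism

    def prepare(self, cid: str) -> str:
        # copy the CID into temporary storage via the IPFS API ...
        raise NotImplementedError

def candidates(requested_store, implementations):
    """Every implementation that supports the requested backing store."""
    return [i for i in implementations if i.backing_store == requested_store]
```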

We'll add a diagram of the updated interfaces to the docs before we launch at the end of the month.

WASM / Python FaaS

We've done further investigation into the "deterministic Python function-as-a-service" idea and have a plan to package it as simply a function.py plus a requirements.txt file. The function.py entrypoint takes a defined signature, one argument of which is a file-like input: the raw bytes of the file if the CID is a file, or a tar stream if it's a directory CID.

The Python standard library includes support for tar streams, so it should be natural to write code which, for example, iterates over the files in the CID using just this file-descriptor approach, as long as the stream is seekable. In other words, we'll be using language primitives rather than exposing a Bacalhau-specific Python SDK, which will make code more portable.
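
The exact signature is still being designed, but as a hedged sketch of what a function.py could look like when the input CID is a directory delivered as a seekable tar stream, using only the standard library:

```python
# function.py -- illustrative only; the real signature hasn't been finalised.
import tarfile

def main(input_file):
    """input_file is a file-like object: the raw bytes of the file for a
    file CID, or a tar stream for a directory CID."""
    sizes = {}
    # The stream is seekable, so we can open it as an ordinary tar archive
    # and iterate over the files inside the directory CID.
    with tarfile.open(fileobj=input_file, mode="r") as archive:
        for member in archive:
            if member.isfile():
                sizes[member.name] = len(archive.extractfile(member).read())
    return sizes
```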

As soon as the big refactor branch lands, we'll start working on an execution implementation for deterministic Python FaaS in WASM.

The Python FaaS executor is a stretch goal for the launch at the end of the month.

Next

  • Wire up CLI and land big refactor branch
  • Docs for bacalhau.org
  • DevOps to get our first 10 nodes deployed to GCP
  • Bootstrapping list
  • Manual QA
  • Implement Python FaaS in beta