
Bacalhau project report 20220715

lukemarsden edited this page Jul 15, 2022 · 7 revisions

Lots of little improvements

CI was a right pain in the ass this week: we had issues both with test flakiness and with CircleCI giving us grief over failing tag builds, which stopped us from releasing.

CI is GREEN baby 🍏🟢💚🍏

The good news is that Guy made a heroic effort to fix test flakiness, and CI is now much, much more reliable than it was before:

image

And the release issue is fixed (gives CircleCI a funny look: persisting files via workspaces was randomly giving us old versions of them), so we were able to release to production again!

Traces in production

We did the final piece of DevOps work to enable production metrics and traces in Honeycomb:

[Two screenshots, 2022-07-15: production traces in Honeycomb]

This will be of great value to us in investigating performance and behavior on the production network. We also have Prometheus instrumentation ready in the code, and a plan for deploying Prometheus to each production node and using remote_write to send metrics to Grafana Cloud.

Input/output volumes in Python/WASM

You can now use -v and -o to mount input and output volumes into Python code running in the WASM runtime (with bacalhau run python --deterministic). These volumes are now plumbed all the way through to the virtual filesystem of the CPython instance running inside the Node.js WebAssembly runtime. This opens the door to deterministic, and therefore verifiable, data processing on content-addressable data in IPFS. That is very exciting, as it starts to build out a global graph of verified y=f(x), where x, f and y are content-addressed, f is deterministic and y is verified. For all human knowledge!
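To make the y=f(x) idea concrete, here is a minimal sketch in plain Python. It is not Bacalhau code: content_id is a hypothetical stand-in that uses a bare SHA-256 digest instead of a real IPFS CID, and F_SOURCE is a made-up identifier for the function's code. It only shows why determinism makes a result checkable: anyone can re-run f on the same x and compare content addresses.

```python
import hashlib

# Hypothetical identifier for the job's code; a real system would hash the actual source.
F_SOURCE = b"word_count_v1"


def content_id(data: bytes) -> str:
    """Stand-in for an IPFS CID (assumption): a plain SHA-256 hex digest."""
    return hashlib.sha256(data).hexdigest()


def f(x: bytes) -> bytes:
    """Example deterministic job: count whitespace-separated words in the input."""
    return str(len(x.split())).encode()


def run_job(func_source: bytes, x: bytes) -> dict:
    """Run f on x and record the content addresses of f, x and y."""
    y = f(x)
    return {
        "f": content_id(func_source),
        "x": content_id(x),
        "y": content_id(y),
    }


def verify(claim: dict, func_source: bytes, x: bytes) -> bool:
    """Re-run the job independently and compare the result with the claimed record."""
    return run_job(func_source, x) == claim
```

Because f is deterministic, verify(run_job(F_SOURCE, x), F_SOURCE, x) holds for any x, while a claim made for one input fails verification against a different input.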

Datastore interface

Kai is heads down on a big change under the banner of the "datastore interface". This unifies state in the system behind a single local interface, removing race conditions in updates to local knowledge that were starting to be problematic and would have slowed us down in the future. It will also unlock persistence across Bacalhau node restarts, so, for example, we'll eventually stop forgetting about old jobs every time we upgrade the production network. This is a good foundational improvement to the codebase.

The datastore interface is a prerequisite to implementing the parallelism/sharding design.
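As a rough illustration of why putting state behind one local interface helps with races, here is a small Python sketch. It is an assumption-laden toy, not the actual datastore interface: the class InMemoryJobStore and its get/update methods are hypothetical names invented for this example.

```python
import threading
from typing import Callable, Dict, Optional


class InMemoryJobStore:
    """Hypothetical local store: all reads and updates of job state share one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._jobs: Dict[str, dict] = {}

    def get(self, job_id: str) -> Optional[dict]:
        """Return a copy of the job's state, or None if the job is unknown."""
        with self._lock:
            job = self._jobs.get(job_id)
            return dict(job) if job is not None else None

    def update(self, job_id: str, mutate: Callable[[dict], None]) -> dict:
        """Apply mutate to the job's state as a single atomic read-modify-write."""
        with self._lock:
            job = self._jobs.setdefault(job_id, {})
            mutate(job)
            return dict(job)
```

Since callers only touch state through get and update, concurrent updates cannot interleave halfway. A persistent implementation could presumably sit behind the same two methods, which is the kind of swap that would let job state survive a restart.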

Fixed some bugs!

One fun bug was that running the same job twice on the same node didn't work because the CID was already downloaded. We fixed that, taking care to ensure atomicity under concurrent use, and also fixed bacalhau get repeatedly downloading the same CID at the same time!
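One common way to get this kind of behaviour is a per-CID lock plus a write-to-temp-then-rename step. The Python sketch below illustrates that general technique only. It is not the fix that landed in Bacalhau, and ensure_downloaded, cache_dir and fetch are hypothetical names introduced for the example.

```python
import os
import tempfile
import threading
from pathlib import Path
from typing import Callable, Dict

_locks_guard = threading.Lock()
_cid_locks: Dict[str, threading.Lock] = {}


def _lock_for(cid: str) -> threading.Lock:
    """Return the lock for this CID, creating it on first use."""
    with _locks_guard:
        return _cid_locks.setdefault(cid, threading.Lock())


def ensure_downloaded(cid: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> Path:
    """Download cid at most once; fetch is a hypothetical callable returning the content."""
    target = cache_dir / cid
    with _lock_for(cid):
        if target.exists():
            # Already downloaded (e.g. by an earlier run of the same job): reuse it.
            return target
        cache_dir.mkdir(parents=True, exist_ok=True)
        fd, tmp_name = tempfile.mkstemp(dir=cache_dir, prefix=cid + ".", suffix=".tmp")
        try:
            with os.fdopen(fd, "wb") as tmp:
                tmp.write(fetch(cid))
            # Atomic rename: readers see either no file or the complete file.
            os.replace(tmp_name, target)
        except BaseException:
            if os.path.exists(tmp_name):
                os.unlink(tmp_name)
            raise
    return target
```

The lock here only coordinates threads inside one process. Separate processes sharing a cache directory would need something extra, such as a file lock, though the rename still keeps partially written files out of sight.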

Changing of the guard

This week we were sad to say goodbye to the most excellent Guy Paterson-Jones, who is off to try his hand at being a rock-climbing instructor. Guy's contributions have been epic, he's awesome, and he will be missed.

We are also thrilled to welcome Phil Winder and Enrico Rotundo to the team. Kai and I have worked with Phil and Enrico on several other projects and we know they'll be able to start making productive contributions to the project in no time! The first issues they'll pick up are GPU support and loading external files via HTTP (including e.g. large lists of URLs) as inputs into Bacalhau jobs, both of which will directly drive user success.

So: quite a bit of my time was spent doing handover with Guy and onboarding Phil and Enrico. Onwards and upwards!

What's next?

  • Land datastore interface
  • More scaling work
  • Implement parallelism/sharding
  • GPUs
  • HTTP input files