
Bacalhau project report 20220808

lukemarsden edited this page Aug 8, 2022 · 4 revisions

Sharding, globbing & parallelism! ⛓️

We now support parallelizing a job across many nodes.

If your input is an IPFS CID that points to a directory of many files, you can submit the job so that Bacalhau automatically splits the work across multiple servers. The CLI combines the results of the sub-jobs when it downloads them. This assumes the input files are separable, i.e. each input file is processed independently into its own corresponding output file.
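To make the splitting step concrete, here is a minimal Go sketch of how a directory listing might be filtered by a glob pattern and cut into fixed-size shards, one per sub-job. The shardFiles helper, its batchSize parameter and the /inputs paths are hypothetical illustrations, not Bacalhau's actual API. The real mechanism is described in the pull request linked in this report.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// shardFiles is a HYPOTHETICAL helper (not Bacalhau's API). It keeps only the
// files whose paths match the glob pattern, sorts them for a deterministic
// order, and splits them into shards of at most batchSize files each. Each
// shard could then be handed to a different node as an independent sub-job.
func shardFiles(files []string, pattern string, batchSize int) ([][]string, error) {
	if batchSize < 1 {
		return nil, fmt.Errorf("batchSize must be at least 1, got %d", batchSize)
	}
	var matched []string
	for _, f := range files {
		ok, err := path.Match(pattern, f)
		if err != nil {
			return nil, err // malformed glob pattern
		}
		if ok {
			matched = append(matched, f)
		}
	}
	sort.Strings(matched)

	var shards [][]string
	for start := 0; start < len(matched); start += batchSize {
		end := start + batchSize
		if end > len(matched) {
			end = len(matched)
		}
		shards = append(shards, matched[start:end])
	}
	return shards, nil
}

func main() {
	// Hypothetical listing of an input directory.
	files := []string{"/inputs/c.png", "/inputs/a.png", "/inputs/notes.txt", "/inputs/b.png"}
	shards, err := shardFiles(files, "/inputs/*.png", 2)
	if err != nil {
		panic(err)
	}
	for i, s := range shards {
		fmt.Println(i, s)
	}
	// Output:
	// 0 [/inputs/a.png /inputs/b.png]
	// 1 [/inputs/c.png]
}
```

Sorting before slicing keeps shard boundaries deterministic, so every node that computes the split from the same listing agrees on which files belong to which shard.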

Demo here: https://drive.google.com/file/d/1eSaECJ4IT5mEk_sWwmMuxdKJhzb5KVq1/view?usp=sharing

This was a major goal for July, so we are glad to have shipped it :-) https://github.com/filecoin-project/bacalhau/blob/main/ROADMAP.md#july

More detail on how this works here: https://github.com/filecoin-project/bacalhau/pull/442

Performance improvements in large networks ⚡

This is still a work in progress, but we are improving performance in large networks. So far, we've achieved a 3x performance improvement on 250-node clusters, and there is still a lot to do here.

The 3x improvement came from having nodes stop bidding on a job once they have seen concurrency-many BidAccepted messages for it, or 1.5 × concurrency-many Bid messages. As a result, the number of messages exchanged for a small job in a large network has dropped from O(250) to O(15).
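The bid-suppression rule above can be sketched in Go as follows. The jobBidState type and shouldBid function are hypothetical, not taken from the Bacalhau codebase. The sketch assumes that "seen concurrency-many" means "at least concurrency", and it expresses the 1.5x threshold as 2 × bids ≥ 3 × concurrency to stay in integer arithmetic.

```go
package main

import "fmt"

// jobBidState is a HYPOTHETICAL per-job record of the gossip a node has observed.
type jobBidState struct {
	bidsSeen        int // Bid messages observed for this job
	bidAcceptedSeen int // BidAccepted messages observed for this job
}

// shouldBid sketches the rule: a node skips bidding once it has seen
// concurrency-many BidAccepted messages, or 1.5x concurrency-many Bid
// messages, for the job. Threshold semantics (>=) are an assumption.
func shouldBid(s jobBidState, concurrency int) bool {
	if s.bidAcceptedSeen >= concurrency {
		return false // enough nodes have already been accepted
	}
	// bidsSeen >= 1.5*concurrency, written without floating point.
	if 2*s.bidsSeen >= 3*concurrency {
		return false // enough competing bids are already in flight
	}
	return true
}

func main() {
	const concurrency = 10 // hypothetical job concurrency
	fmt.Println(shouldBid(jobBidState{bidsSeen: 3}, concurrency))         // true
	fmt.Println(shouldBid(jobBidState{bidsSeen: 15}, concurrency))        // false: 15 >= 1.5 * 10
	fmt.Println(shouldBid(jobBidState{bidAcceptedSeen: 10}, concurrency)) // false: 10 >= 10
}
```

With a concurrency of 10, for example, a node following this sketch stops bidding once it has observed 15 Bid messages or 10 BidAccepted messages, regardless of how many nodes are in the network.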

We're still seeing a fairly high error rate (timeouts) in large networks, so addressing this is the next priority.

Design for Filecoin integration

We have started designing how Bacalhau will integrate with Filecoin. The main initial idea is to support writing job outputs to Filecoin via the lotus CLI. This way, Bacalhau can act as a bridge: it takes datasets from IPFS (or HTTP!), processes them, and publishes the results to Filecoin as well.

In the future, we'll also look at making it easy to integrate with gateways such as Web3.storage and Estuary.

Various bug fixes

There were various bug fixes and improvements to bacalhau list and bacalhau describe, e.g. support for short IDs. We also added support for the -w (working directory) argument to bacalhau docker run.
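One common way to support short IDs, as git does for abbreviated commit hashes, is unique-prefix matching. Below is a minimal Go sketch assuming a short ID is any non-empty prefix that matches exactly one full job ID. The resolveShortID function and the sample IDs are hypothetical and not taken from the Bacalhau codebase.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// resolveShortID is a HYPOTHETICAL helper: it expands a short ID to the single
// full job ID that starts with it, failing if no ID or more than one ID matches.
func resolveShortID(short string, ids []string) (string, error) {
	if short == "" {
		return "", errors.New("short id must not be empty")
	}
	var match string
	count := 0
	for _, id := range ids {
		if strings.HasPrefix(id, short) {
			match = id
			count++
		}
	}
	switch count {
	case 0:
		return "", fmt.Errorf("no job matches %q", short)
	case 1:
		return match, nil
	default:
		return "", fmt.Errorf("short id %q is ambiguous: %d jobs match", short, count)
	}
}

func main() {
	// Hypothetical full job IDs, for illustration only.
	ids := []string{
		"3f2a9c1e-7b4d-4e8a-9c21-5d6f7a8b9c0d",
		"3fb04e77-1a2b-4c3d-8e9f-0a1b2c3d4e5f",
		"a71d0c55-9e8f-4a7b-b6c5-d4e3f2a1b0c9",
	}
	id, err := resolveShortID("a71d", ids)
	fmt.Println(id, err) // a71d0c55-9e8f-4a7b-b6c5-d4e3f2a1b0c9 <nil>
	_, err = resolveShortID("3f", ids)
	fmt.Println(err) // short id "3f" is ambiguous: 2 jobs match
}
```

Rejecting ambiguous prefixes, rather than picking the first match, avoids silently describing the wrong job.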

Next

  • Continue performance work
  • Filecoin integration implementation