
Bacalhau project report 20220815


Design- and WIP-heavy week ✍️

We've been doing a lot of design work and have quite a few WIP branches this week. That's fine, it's the middle of the month :-)

We are also working more on the "master plan part 2" and recruiting - more soon on those items!

Filecoin integration design & WIP

Our new publisher interface is how we will integrate with Filecoin.

This will allow us to persist results more permanently in upstream storage.

We are collaborating with the Estuary folks and have several storage provider (SP) conversations lined up.

Phases

  1. Add support for >1 publish driver
    • Consider how this relates to the verifier (split verifier into output driver and actual verifier)
  2. Write output driver for Lotus
    • Shell out to the Lotus CLI (sketched below)
  3. Write output driver for Web3.storage/Estuary
    • Simple API driver
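
For phase 2, here is a minimal sketch in Go of what shelling out to the Lotus CLI could look like. The subcommand ("client import") and its output format are assumptions, not confirmed details of our driver; check them against your Lotus version.

```go
package lotus

import (
	"fmt"
	"os/exec"
	"strings"
)

// Publish sketches the "shell out to CLI" approach: import a local
// results folder via the lotus CLI and return the data CID it reports.
// NOTE: the subcommand and its output format are assumptions here;
// verify against your Lotus version.
func Publish(resultsPath string) (string, error) {
	out, err := exec.Command("lotus", "client", "import", resultsPath).CombinedOutput()
	if err != nil {
		return "", fmt.Errorf("lotus import failed: %w: %s", err, out)
	}
	// Illustrative parsing only: assume the CID is the last
	// whitespace-separated field of the output.
	fields := strings.Fields(string(out))
	if len(fields) == 0 {
		return "", fmt.Errorf("unexpected lotus output: %q", out)
	}
	return fields[len(fields)-1], nil
}
```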

Design

  • Results publishing interface
    • Executor finishes job
    • Executor tells the network it has finished, with enough information to be verified
    • Verification happens, it’s ok
    • Compute nodes publish local folder somewhere
    • [Create] -> [Shard] -> [Bid] -> [Lots of sub-jobs run] -> [Verify] -> [Publish]
  • Key question: should the requestor node or the compute node do the publishing?
    • NB: Publishing is really backing up
    • Lazy reassembling
    • Can we avoid any one machine having to have a big enough disk to reassemble the whole result?
    • Avoid reassembling!
    • Therefore, the compute nodes HAVE to do the publishing (they already have the data)
  • Example: S3 publish drivers → 100 nodes publish individual objects to a bucket (sketched after this list)
  • For now, compute nodes will have to hold the credentials (e.g. keys) for any publish endpoint - we have no way to securely transmit secrets from clients to compute nodes, given that we can't trust the compute nodes
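
To make the "avoid reassembling" point concrete, here is a minimal sketch (all names hypothetical) of per-shard publishing: each compute node uploads only the shard it already holds, under its own object key, so no single machine ever needs enough disk to hold the whole result.

```go
package publish

import (
	"context"
	"fmt"
)

// ObjectStore is a hypothetical publish endpoint, e.g. an S3 bucket.
type ObjectStore interface {
	PutObject(ctx context.Context, key, localPath string) error
}

// publishShard runs independently on each compute node: it uploads only
// the shard that node already has locally, so the full result is never
// reassembled on any single machine.
func publishShard(ctx context.Context, store ObjectStore, jobID string, shardIndex int, localResultsPath string) error {
	key := fmt.Sprintf("%s/shard-%d", jobID, shardIndex)
	return store.PutObject(ctx, key, localResultsPath)
}
```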

Implementation so far

  • Improve verifier interface to include verification steps
  • Add "publisher" interface that knows how to upload results of jobs once verified

Performance WIP

We are developing a "consistent hashing" style approach to picking a subset of nodes to bid on a given job. This should reduce network traffic still further, allowing us to scale performance much more easily. Only once we have non-O(N^2) network traffic should we start thinking about optimizing the transport.
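
One way to realise this, sketched below under the assumption of rendezvous-style (highest-random-weight) hashing: hash each (job ID, node ID) pair and let only the top-k scoring nodes bid. This is a sketch of the general technique, not necessarily what we will ship.

```go
package selection

import (
	"crypto/sha256"
	"encoding/binary"
	"sort"
)

// score deterministically ranks a node for a job by hashing the pair.
func score(jobID, nodeID string) uint64 {
	h := sha256.Sum256([]byte(jobID + "/" + nodeID))
	return binary.BigEndian.Uint64(h[:8])
}

// topKNodes returns the k nodes that should bid on jobID. Every node can
// run this locally over the known node list and check whether it is
// included, so no coordination traffic is needed.
func topKNodes(jobID string, nodes []string, k int) []string {
	sorted := append([]string(nil), nodes...)
	sort.Slice(sorted, func(i, j int) bool {
		return score(jobID, sorted[i]) > score(jobID, sorted[j])
	})
	if k > len(sorted) {
		k = len(sorted)
	}
	return sorted[:k]
}
```

Because the ranking depends only on the job and node IDs, any node with the same view of the network computes the same subset, so bids stay bounded at k per job regardless of network size.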

Dashboard WIP

Enrico is working on the dashboard to show network stats over time:

[Screenshot: dashboard showing network stats over time]

This is still a WIP, more soon! This is the tracking issue.

What's next

  • Implement "lotus" and "estuary" publisher interfaces
  • Implement and test consistent hashing approach