
Bacalhau project report 20220311


Scheduler interface success

We started this week by landing a major PR that finishes porting our existing code over to the new Scheduler interface. This proves the scheduler interface is usable, and we've convinced ourselves it can be replaced with a smart contract in due course.

This work:

  • Introduced a "Requester" node
  • Implemented the Scheduler interface with a libp2p messaging backend (see the sketch after this list)
  • Hooked up compute node and requester node subscriptions to the scheduler interface
  • Implemented the end-to-end job lifecycle:
    • Submit job -> compute node considers the job
    • Bid on job -> requester node accepts the bid (in the future, weighing the compute node's reputation)
    • Bid accepted -> compute node runs the job
    • Result submitted -> results are accepted or rejected based on the validation framework
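
As a rough illustration of what an interface like this could look like in Go, here is a sketch; the type and method names are our paraphrase of the lifecycle above, not the exact interface that landed in the PR.

```go
package scheduler

// Job, JobBid, and JobResult are placeholder types standing in for the
// real data structures; they are assumptions for this sketch.
type Job struct{ ID, Spec string }
type JobBid struct{ JobID, NodeID string }
type JobResult struct{ JobID, NodeID, ResultsID string }

// Scheduler is a hypothetical version of the interface described above.
// Each method corresponds to one transition in the job lifecycle; a libp2p
// backend would implement it by broadcasting events to subscribers.
type Scheduler interface {
	// requester node -> network: a new job is available
	SubmitJob(job Job) error
	// compute node -> requester node: offer to run the job
	BidJob(bid JobBid) error
	// requester node -> compute node: the bid was accepted
	AcceptBid(bid JobBid) error
	// compute node -> requester node: the job ran, here are the results
	SubmitResult(result JobResult) error
	// requester node -> network: results validated (or not)
	AcceptResult(result JobResult) error
	RejectResult(result JobResult) error

	// Subscribe lets compute and requester nodes react to lifecycle events.
	Subscribe(handler func(event string, job Job)) error
}
```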

Feature work

Then we got back to feature work, exciting! In particular, we worked on the Nondeterministic Tracing + Reputation Verifier prototype (milestones 13, 14, 15).

CLI changes:

  • Clients can specify concurrency with job submit --concurrency=N, so the job is run by exactly that many nodes.
  • Clients can specify confidence with job submit --confidence=M, where M < N (e.g. 3 nodes can run the job but only 2 need to agree); see the validation sketch after this list.
  • Clients can specify tolerance with job submit --tolerance=Q, a memory-MB-like threshold for how much variation to allow between traces when clustering them.
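
To make the relationship between these flags concrete, here is a minimal Go sketch of the kind of validation the submit path implies; the SubmitOptions type and its fields are hypothetical stand-ins, not Bacalhau's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// SubmitOptions mirrors the three new flags; the struct itself is a
// hypothetical stand-in for whatever the real CLI uses internally.
type SubmitOptions struct {
	Concurrency int     // --concurrency=N: how many nodes run the job
	Confidence  int     // --confidence=M: how many results must agree
	Tolerance   float64 // --tolerance=Q: allowed distance between traces (MB-like units)
}

// Validate enforces the M < N relationship described above: you cannot
// require as many (or more) agreeing results as there are runs.
func (o SubmitOptions) Validate() error {
	if o.Concurrency < 1 {
		return errors.New("--concurrency must be at least 1")
	}
	if o.Confidence < 1 || o.Confidence >= o.Concurrency {
		return errors.New("--confidence must satisfy 1 <= M < N")
	}
	if o.Tolerance < 0 {
		return errors.New("--tolerance must be non-negative")
	}
	return nil
}

func main() {
	// e.g. "3 nodes can run the job but only 2 need to agree"
	opts := SubmitOptions{Concurrency: 3, Confidence: 2, Tolerance: 0.1}
	fmt.Println(opts.Validate()) // <nil>
}
```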

Devstack changes:

  • When starting devstack (the development playground for demos and testing networks on a single machine), you can specify --bad-actors=N to make N of the started nodes do no work when asked to run a job, while still claiming they did the work 😈 (sketched below)
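
One way to picture the bad-actor behaviour is as an executor that skips the work but still reports success. The Executor interface and badActorExecutor below are purely illustrative, not devstack's implementation.

```go
package main

import "fmt"

// Executor is a hypothetical job-running interface for this sketch.
type Executor interface {
	Run(jobSpec string) (resultsID string, err error)
}

// badActorExecutor does no work but claims it did, which is the
// behaviour --bad-actors=N turns on for N of the devstack nodes.
type badActorExecutor struct{}

func (badActorExecutor) Run(jobSpec string) (string, error) {
	// no computation happens; a fabricated results ID is returned
	return "fake-results-id", nil
}

func main() {
	var e Executor = badActorExecutor{}
	id, _ := e.Run("example job")
	fmt.Println(id) // fake-results-id
}
```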

Rooting out bad actors with K-means clustering, oh my

In particular, we added the ability for Bacalhau to start distinguishing bad actors from good actors on the network by comparing traces. Assuming only a minority of the nodes running any given job are bad actors, we can tell them apart by clustering the traces of the results. The memory usage trace for each run of a job is compared by measuring its distance from the mean at every 1/10th of the time period, which gives each run a score for how different it was from the others. We then use 1-dimensional K-means clustering to group these scores into two buckets, and we assume the larger bucket is the honest majority.
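
Here is a simplified, self-contained Go sketch of the idea (our own illustration, not the code that landed): score each memory trace by its distance from the pointwise mean at 10 sample points, run 1-dimensional K-means with k=2, and treat the larger cluster as the honest majority.

```go
package main

import (
	"fmt"
	"math"
)

// score measures how far one trace is from the mean trace,
// summing the distance at each of the sampled points.
func score(trace, mean []float64) float64 {
	total := 0.0
	for i := range trace {
		total += math.Abs(trace[i] - mean[i])
	}
	return total
}

// kmeans1D groups scores into two clusters and returns the
// assignment of each score to cluster 0 or 1.
func kmeans1D(scores []float64) []int {
	// initialise the two centroids at the min and max scores
	lo, hi := scores[0], scores[0]
	for _, s := range scores {
		lo, hi = math.Min(lo, s), math.Max(hi, s)
	}
	centroids := [2]float64{lo, hi}
	assign := make([]int, len(scores))
	for iter := 0; iter < 100; iter++ {
		// assignment step: each score joins its nearest centroid
		for i, s := range scores {
			if math.Abs(s-centroids[0]) <= math.Abs(s-centroids[1]) {
				assign[i] = 0
			} else {
				assign[i] = 1
			}
		}
		// update step: recompute each centroid as its cluster mean
		var sum, count [2]float64
		for i, s := range scores {
			sum[assign[i]] += s
			count[assign[i]]++
		}
		for c := 0; c < 2; c++ {
			if count[c] > 0 {
				centroids[c] = sum[c] / count[c]
			}
		}
	}
	return assign
}

func main() {
	// memory traces sampled at 10 points: three honest nodes, one bad actor
	traces := [][]float64{
		{10, 20, 30, 40, 50, 50, 40, 30, 20, 10},
		{11, 19, 31, 39, 52, 49, 41, 29, 21, 9},
		{10, 21, 29, 41, 50, 51, 39, 31, 19, 10},
		{0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, // did no work
	}
	// pointwise mean trace across all runs
	mean := make([]float64, len(traces[0]))
	for _, t := range traces {
		for i, v := range t {
			mean[i] += v / float64(len(traces))
		}
	}
	scores := make([]float64, len(traces))
	for i, t := range traces {
		scores[i] = score(t, mean)
	}
	clusters := kmeans1D(scores)
	// the larger cluster is assumed to be the honest majority
	sizes := [2]int{}
	for _, c := range clusters {
		sizes[c]++
	}
	majority := 0
	if sizes[1] > sizes[0] {
		majority = 1
	}
	for i, c := range clusters {
		fmt.Printf("node %d: score=%.1f honest=%v\n", i, scores[i], c == majority)
	}
}
```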

Demo! https://user-images.githubusercontent.com/264658/157901296-2443fb79-0413-4903-8c75-0ebb2fddadaf.mp4

Getting Bacalhau into production

We have also done a ton of design and planning work on how to get Bacalhau into production, and we now have a 30-phase plan for getting there. Expect to hear more about this soon, and in Paris!
