
Bacalhau project report 20221003


List performance now great! 🚀

(Screenshot, 2022-09-29)

In what turned out to be a major change to the codebase, and Enrico's first large backend change (go Enrico! 🎉), we now have a high-performance backend implementation of bacalhau list, rather than the very slow client-side implementation we were using previously (which was linear time in the number of jobs).

What's more, it now only returns your own jobs by default, based on the public key identity generated on the client. Jobs are still public on the network and anyone can see anyone's jobs with the --all flag, but this change is nice because it removes a lot of noise that you'd see otherwise.

Technical details:

  • Thinner client: it now makes only one List call to the server, which always returns a sorted, filtered job list.
  • Changed the List endpoint return type from map[string]model.Job to []model.JobWithInfo; JobWithInfo embeds Job and JobState.
  • JobQuery in localdb includes new options for fine-grained querying, such as ClientID, Limit, and SortBy (see the sketch after this list).
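
To make that concrete, here is a minimal Go sketch of what the new types might look like. Only the JobWithInfo embedding of Job and JobState, and the ClientID/Limit/SortBy query options, come from the change described above; every other field name here is an illustrative assumption rather than the real Bacalhau definition.

```go
// Hypothetical sketch of the new List return type and query options;
// field names beyond those mentioned in the bullets above are illustrative.
package model

// Minimal stand-ins for the real Job and JobState types.
type Job struct {
	ID       string
	ClientID string
}

type JobState struct {
	Nodes map[string]string // nodeID -> execution state, simplified
}

// JobWithInfo embeds Job and JobState so a single List call can return
// everything the CLI needs to render the table.
type JobWithInfo struct {
	Job
	JobState
}

// JobQuery gives localdb what it needs to filter, sort and limit server side
// instead of the client walking every job on the network.
type JobQuery struct {
	ClientID    string // only this client's jobs (the new default behaviour)
	Limit       int    // cap on the number of jobs returned
	SortBy      string // e.g. "created_at" (assumed name)
	SortReverse bool   // newest first when true (assumed field)
	ReturnAll   bool   // the --all flag (assumed field)
}
```

Pushing the filtering and sorting behind a query like this is what lets the server answer a List call in one round trip, rather than the client fetching and sorting everything itself.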

This also paves the way for persistence, where the nodes remember their jobs on restart, because the performant backend implementation will be able to efficiently query a large number of jobs.

Other nice UX improvements 👩‍💻

Mainly due to Dave, thanks Dave!

  • bacalhau describe output is much, much cleaner and uses proper capitalization now
  • bacalhau describe --include events if you want noise
  • bacalhau validate thanks to Vedant
  • We validate your job before you submit it
  • Want to use another tool (like VS Code) to validate your job? bacalhau validate --output-schema gives you a JSON schema (useful for IDE debugging of your jobs)
  • bacalhau describe output can be used as input to bacalhau create, making copying jobs much easier (we ignore ID, client ID, etc.)
  • Lots of variable-name fixes; please don't use a variable name that's the same as an imported module
  • Merged job spec and job deal for most commands where it makes sense (one less parameter, one less return).

Examples 📖

We now have a total of four examples that have been recast into the new notebook format, meaning you can run them interactively online in Colab!

New oceanography example 🌊

In particular, the Surface Ocean CO₂ Atlas (SOCAT) example is a brilliant example of using Bacalhau for a real scientific workload!

Progress on smart contract & simulator 📜

(Screenshot: simulator architecture, with the central API websocket smart contract mock at top right)

As shown in the above screenshot, we now have a "central API" websocket implementation of a smart contract mock (top right), implementing the Transport interface. This turned out to be the key first step in both the Smart Contract and Simulator epics.
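
For a flavour of what such a mock might look like, here is a simplified Go sketch: a websocket client that forwards events to a central API playing the role of the smart contract. The Transport-like interface, method names, event shape, and endpoint below are all assumptions for illustration, not the actual Bacalhau Transport interface.

```go
// Sketch of a websocket-backed "central API" transport mock.
// Interface, event shape and endpoint are simplified assumptions.
package simulator

import (
	"context"
	"log"

	"github.com/gorilla/websocket"
)

// Event is a placeholder for a job lifecycle event.
type Event struct {
	JobID string `json:"job_id"`
	Name  string `json:"name"`
}

// Transport is a cut-down stand-in for the interface the mock implements.
type Transport interface {
	Publish(ctx context.Context, ev Event) error
	Subscribe(ctx context.Context, fn func(Event))
}

// CentralAPITransport sends every event over a websocket to a central
// server, which acts as the smart contract in the simulator.
type CentralAPITransport struct {
	conn *websocket.Conn
}

var _ Transport = (*CentralAPITransport)(nil)

func NewCentralAPITransport(url string) (*CentralAPITransport, error) {
	conn, _, err := websocket.DefaultDialer.Dial(url, nil)
	if err != nil {
		return nil, err
	}
	return &CentralAPITransport{conn: conn}, nil
}

// Publish writes the event to the central API instead of a libp2p topic.
func (t *CentralAPITransport) Publish(ctx context.Context, ev Event) error {
	return t.conn.WriteJSON(ev)
}

// Subscribe reads events pushed back by the central API and hands them
// to the callback, mirroring how subscribers are notified today.
func (t *CentralAPITransport) Subscribe(ctx context.Context, fn func(Event)) {
	go func() {
		for {
			var ev Event
			if err := t.conn.ReadJSON(&ev); err != nil {
				log.Println("read:", err)
				return
			}
			fn(ev)
		}
	}()
}
```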

The current system, where events are written to the transport and then applied to (a) the localdb and (b) the subscribers, is shown in the first diagram (Simulator design doc).

The new system, where we interact with a smart contract, is shown in the second diagram (Simulator design doc (1)).

Now that we have split this out, we can start implementing the simulator by adding in-memory wallet balances to the simulator API. We also see a possible world where we run a centralized accounting system for a distributed computation network, as a step towards a fully distributed system. This should allow us to create incentives and user-facing benefits ahead of having a performant smart contract implementation in place. More on this soon in the updated user-facing roadmap...
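
As a rough illustration of what in-memory wallet balances in the simulator API could look like, here is a small Go sketch; the Wallets type, its methods, and the token unit are hypothetical, not the actual implementation.

```go
// Hypothetical sketch of in-memory wallet balances for the simulator API.
package simulator

import (
	"fmt"
	"sync"
)

// Wallets tracks per-client balances entirely in memory, standing in for
// what a smart contract would eventually do on chain.
type Wallets struct {
	mu       sync.Mutex
	balances map[string]int64 // clientID -> balance in simulated tokens
}

func NewWallets() *Wallets {
	return &Wallets{balances: map[string]int64{}}
}

// Credit adds funds, e.g. when a compute node is paid for a completed job.
func (w *Wallets) Credit(clientID string, amount int64) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.balances[clientID] += amount
}

// Debit removes funds, e.g. when a requester escrows payment for a job,
// and fails if the balance would go negative.
func (w *Wallets) Debit(clientID string, amount int64) error {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.balances[clientID] < amount {
		return fmt.Errorf("insufficient balance for %s", clientID)
	}
	w.balances[clientID] -= amount
	return nil
}
```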

Team onboarding and ramping up ❤️

  • I'm really proud of Enrico for landing his first major backend change. Enrico is also taking responsibility for the DAG work on the roadmap since it aligns with his skill set as a data engineer & data scientist.
  • Simon is ramping up really effectively and has already done a major design cycle and is onto prototyping WASM work.
  • Will has already PR'd his first example and has provided valuable user feedback with fresh eyes.

I'm excited to be working with such a productive team!

FIL+ design 👀

  • write the block explorer server so we have a server that can answer "is this CID a FIL+ CID?" (see the sketch after this list)
    • this will be a long-running service deployed with Terraform, but separately from our prod cluster
    • all clusters (even dev ones) can use the singleton service
  • by doing the above, understand the notary flow of blessing a deal with DataCap and how that all works
  • take a good hard look at the Filecoin docker image that Prash set up
    • understand how we might hook into deals if we have DataCap to assign to them
  • write a design doc based on above
    • variations on how the "notary providing datacap" will work
        • when the bacalhau team is a notary (so we can dish out DataCap ourselves)
      • when we are not a notary but feed requests to another notary to bless deals
    • the design of the workflow
      • job is submitted and we notice there is a FIL+ CID mentioned
      • alerts are triggered to let human notaries know there is a thing to look at
      • dashboard that lets the notaries inspect the job
      • dashboard that lets the notaries accept or reject
      • if accepted, trigger the "notary providing datacap" flow so the publisher (a.k.a. the compute node) is given the DataCap
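
To make the first item above concrete, here is a minimal Go sketch of what the block explorer service's "is this CID a FIL+ CID?" endpoint could look like; the route, response shape, and in-memory lookup are hypothetical, and a real service would index FIL+ (DataCap) deals from the chain instead.

```go
// Hypothetical sketch of the block explorer service answering
// "is this CID a FIL+ CID?"; route and lookup are illustrative only.
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"sync"
)

// filPlusCIDs would in reality be populated by indexing FIL+ deals from the
// chain; here it is just an in-memory set protected by a mutex.
var (
	mu          sync.RWMutex
	filPlusCIDs = map[string]bool{}
)

// isFilPlusHandler answers GET /filplus?cid=<cid> with whether the CID is
// covered by a FIL+ (DataCap) deal.
func isFilPlusHandler(w http.ResponseWriter, r *http.Request) {
	cid := r.URL.Query().Get("cid")
	mu.RLock()
	verified := filPlusCIDs[cid]
	mu.RUnlock()
	json.NewEncoder(w).Encode(map[string]interface{}{
		"cid":     cid,
		"filplus": verified,
	})
}

func main() {
	http.HandleFunc("/filplus", isFilPlusHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Because this is a singleton service separate from the prod cluster, any cluster (even a dev one) could query it before deciding how to treat a job that mentions a FIL+ CID.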

Another outage, and learnings from it 🔥

We had an outage yesterday, which Walid dealt with excellently; incident report here.

At this stage I'm OK with the network having outages; they are excellent ways to learn and improve the quality of the software. Forged in the fires of production, and all that.

What's next? ⏭️

  • FIL+! FIL+! FIL+!
  • Wallet balances in the simulator
  • More examples landing
  • Native WASM prototype executor
  • State machines in requestor node for retries/resilience
  • bacalhau cat to get a specific file out of a large results set
  • expanding prod network to include huge nodes, testing & refining SP onboarding docs as a result
  • other bugfixes