lukemarsden edited this page Jan 28, 2022 · 3 revisions

Latest Demo:

Current Status:

  • Investigate IPFS FUSE mounts as a way to efficiently mount a dataset inside a Firecracker VM running the Docker container that holds the job, avoiding any copying

Questions:

  • Where should we discuss this?

  • What are you working on right now?

  • Is Bacalhau intended to run on top of IPFS storage or Filecoin storage?

    • Background:
      • To be clear, IPFS doesn’t have storage; it is a protocol to transfer and index data.
      • Filecoin is the storage (and permanence) story.
      • A very simple way to think of it: IPFS -> Distributed HTTP; Filecoin -> Decentralized S3
    • There are a multitude of efforts to ensure that IPFS can index and interact with Filecoin-stored data
    • We are building decentralized Lambda (or Fargate, etc.) to operate on decentralized S3
    • Bacalhau should focus on interacting with Filecoin, not IPFS
  • Can you actually “ignore IPFS completely” for Bacalhau?

  • Has IPFS thought about Compute Over Data before?

  • In the context of this statement "You will use libp2p to connect with Filecoin nodes, which is the largest chunk of IPFS codebase", how will Bacalhau know which node to execute compute on?

    • Specifically:
      • Which blocks have most (or all) of the data?
      • How will it get at the plaintext (i.e. unsealed) versions of those files?
    • OPEN Could we allow nodes to hear about all jobs and then ignore the ones they don't want to take on (i.e. self-selection)?
    • OPEN Do we envision that we are using libp2p to get the contents of files needed by a job or just to ask which nodes have that file?
    • OPEN How does a Bacalhau node know whether it has the data locally, and what API will it use to find out?
  • What actually runs a Bacalhau process?

    • Current assumption: Bacalhau runs on the same machine as Lotus, so we can get at plaintext versions of the files on disk without having to copy them across the network
  • Should we interact with FUSE?

  • How do you run jobs in a "disconnected" mode, where you're separated from the main chain?

    • See https://cluster.ipfs.io, which is designed not for computation but for data storage across multiple nodes, and has a bunch of primitives you can reuse (RPC, Raft consensus over libp2p, and other things)
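
To make the open self-selection question above more concrete, here is a minimal sketch of what "nodes hear about all jobs and ignore the ones they don't want" could look like. Everything in it is an assumption for illustration: the `Job` and `Node` types, the `wants` rule (accept only when every input CID is already local), and the in-process `broadcast` function, which stands in for whatever libp2p mechanism would actually carry job announcements. It is not Bacalhau's design or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Job:
    """Hypothetical job announcement; input_cids are the content IDs it reads."""
    job_id: str
    input_cids: frozenset


@dataclass
class Node:
    """Hypothetical compute node that tracks which CIDs it holds locally."""
    name: str
    local_cids: set = field(default_factory=set)

    def wants(self, job: Job) -> bool:
        # Assumed self-selection rule: opt in only if every input is local,
        # so no data has to be copied across the network.
        return job.input_cids <= self.local_cids


def broadcast(job: Job, nodes: list) -> list:
    """Announce the job to every node; return the names of nodes that opt in."""
    return [node.name for node in nodes if node.wants(job)]


nodes = [
    Node("node-a", {"cid-1", "cid-2"}),
    Node("node-b", {"cid-2"}),
    Node("node-c", set()),
]
job = Job("job-42", frozenset({"cid-1", "cid-2"}))
accepted = broadcast(job, nodes)  # only node-a holds both inputs
```

Under this rule the scheduler never needs a global view of data placement: each node answers the "do I have the data locally?" question for itself. How a real node would answer it (for example, by querying its local Lotus or IPFS instance) is left open here, as it is in the questions above.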