Section 3 Disagreement #30
I disagree with the intro paragraphs of section 3.
This is no different from running Dropbox or any other cloud-syncing tool. Let's take Transmission (a popular BitTorrent client) as an example. I run it on my Linux NAS. What actually runs in the background is Transmission's headless daemon process.
When I want to share or download a torrent, I use a client front end that talks to that daemon.
I see Storj (from the client/user perspective, not the farmer's) no differently. Even on Mac/Windows, a process (probably a few threads, really) lives in the background, joining with other nodes, doing PING/PONGs, participating in the DHT, etc.
When I want to store a file, I interact with the GUI, and the app slices up the file, encrypts it, stores audit/challenge information locally, sends out contract requests, and uploads to farmers. How many mirror copies the user wants living on the network would be an app preference, and it would be the app's responsibility to ensure this.
Since the app is intended to always be running (like DB/ACD/etc.), it would also be the app's responsibility to do hourly/daily/periodic audits of all files.
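The slice/challenge/audit flow described above can be sketched as a simple challenge-response scheme. This is a minimal illustration under assumed names, not Storj's actual audit protocol (which organizes pre-generated challenges into Merkle trees): the client keeps only salts and expected digests locally, and a farmer who still holds the shard can answer a challenge.

```python
import hashlib
import os


def make_challenges(shard: bytes, n: int = 12):
    """Pre-generate salted audit challenges for one shard.

    Each challenge is a random salt; the expected answer is
    SHA-256(salt || shard). Only the salts and expected digests are
    kept locally -- the shard itself lives on a remote farmer.
    """
    challenges = []
    for _ in range(n):
        salt = os.urandom(16)
        expected = hashlib.sha256(salt + shard).hexdigest()
        challenges.append((salt, expected))
    return challenges


def audit(farmer_response: str, expected: str) -> bool:
    """An audit passes only if the farmer's answer matches the
    locally stored digest, proving it still holds the shard."""
    return farmer_response == expected


# Client side: a stand-in for one encrypted shard, plus its challenges.
shard = os.urandom(16 * 1024)
challenges = make_challenges(shard)

# Later, during an hourly/daily audit pass, the farmer recomputes the
# digest over the shard it is storing and sends it back.
salt, expected = challenges[0]
farmer_response = hashlib.sha256(salt + shard).hexdigest()
assert audit(farmer_response, expected)
```

Each challenge is single-use: once a salt has been revealed to the farmer, a dishonest farmer could cache the answer, which is why a batch of challenges is generated up front and consumed over time.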
As it's written, the paper seems to lean towards the bridge being a required piece to this puzzle. Yet, this creates a dependency and a single-point-of-failure and defeats the whole purpose behind a distributed/shared-nothing storage platform.
I may have jumped the gun. I just read 3.1, and it indicates that I, an end user, can run my own bridge "daemon" and have my client communicate with it. This seems to fit the Transmission model I described above. I guess I was under the impression that only "Storj, Inc" would ever run bridge nodes. Am I reading that correctly? I can run my own storj-bridge, which connects to the network like any other node?
Hey thanks for your comments. I'll try to provide some explanation here, and get helpful revisions into the next version.
It's a bit different in that with Dropbox you trust the infrastructure and the risks are low. One of the design goals is resiliency to unreliable farmers, which is not a goal of Dropbox.
We've considered the daemon model for a Bridge client, and may do that in the future. If you were also running a local Bridge it'd be a complete system.
Dropbox is still designed to allow the computer to turn off, while being offline for an extended period could have very bad results for a hypothetical Storjbox that incorporates its own audits and file state management.
More like you can run your own Bridge server. Code is all AGPL licensed, go wild.
It's a little difficult right now, but the tooling is getting better. See the following repos:
This is a confusion of terminology. The Bridge described in the whitepaper is a system composed of several parts, including the Bridge API, Complex, Audits, and a shared Mongo document store. The Bridge API cannot function without Complex. We should clarify that.
Also, I really like the idea of a self-managing file storage app. At some point I'd like to figure out a good way to solve the problems involved. Maybe increased redundancy would be good enough?
Thinking about this overall: at some point the end user has to place their trust somewhere. My issue is, and has been, the notion that Storj is supposed to be "distributed" and "fault-tolerant", where any node can go offline at any time and your data is still safe.
Yet the requirement of having a Bridge is not distributed and is a SPOF. A DDoS attack on "Storj headquarters" could result in a Bridge outage: unable to download/upload files, unable to run audits, unable to make or receive payments, etc.
This is what confuses me. How can you tout "distributed" to the world, yet be 100% dependent on this one piece of software "controlling" the entire operation?
In the BitTorrent world, you don't technically need trackers. You can use the DHT and PEX to discover peers/seeds of a particular torrent. The trackers can help by giving you an initial seed of information; this is no different from a first-launch of a bitcoin client. But eventually you (the client/end-user) maintain your own local cache of all nodes you've interacted with and when you (re)connect to them, you get a list of their connected nodes too.
I was expecting/thinking Storj would work similarly. On first launch of the client-daemon, I would connect to an initial seed node, controlled/owned by Storj corp, to get a list of several hundred nodes connected to it. I'd connect to all of those and get their peers too. Now I have a local list of thousands(?) of other nodes on the network.
Any "commands" I need to send to the network, such as uploading, downloading, contracts, ping/pong, etc would be sent to my "local" nodes (neighbors) and potentially be re-broadcast to their peers looking for the information requested.
But since we can't trust that each node stays online all the time, to create a storage network, we have to have something that is "guaranteed" to be online all the time. I guess that is where we (the users/clients) must place all the trust with Storj corp.
If I wanted to write my own software, say a backup/dropbox-like client, utilizing Storj as the backend storage, I'd have to also create my own Bridge servers and provide that guarantee to the purchasers of my software.
This does happen. We use a Kademlia DHT, like BitTorrent. Technically torrents use sloppy Kad, but it's not a huge difference.
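For readers unfamiliar with Kademlia: its routing is built on an XOR distance metric over node IDs (160-bit IDs in practice; the tiny IDs below are just for illustration). A minimal sketch of the metric and the k-bucket index it implies:

```python
def kad_distance(a: bytes, b: bytes) -> int:
    """Kademlia's distance metric: XOR the two node IDs and read the
    result as an unsigned integer. It is symmetric, and zero only
    when the IDs are identical."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def bucket_index(a: bytes, b: bytes) -> int:
    """Which k-bucket a contact falls into: the position of the
    highest differing bit (i.e., floor(log2) of the XOR distance)."""
    return kad_distance(a, b).bit_length() - 1


# Nodes that share a long ID prefix are "close" and land in
# low-numbered buckets; nodes differing in a high bit are "far".
near = bucket_index(b"\x80", b"\x81")  # differ only in the last bit
far = bucket_index(b"\x80", b"\x00")   # differ in the top bit
assert near < far
```

This metric is what lets lookups converge in O(log n) hops: each hop moves to a peer whose ID shares a longer prefix with the target.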
We're not relying on a single instance of a piece of software to control the network. Anyone can run their own Bridge or Bridges. If you're not comfortable using a Bridge we run, you can run your own. Run 2 or 200. As many as it takes. It's 100% free software. You don't have to place any trust in Storj Labs Inc.'s infrastructure. Take the code and run a node on the distributed network.
Had a chat with a developer. Maybe the disconnect here is that there's a critical difference between Storj and other DHTs/distributed storage systems.
With BitTorrent, you expect to be able to find anything on the network and know its state. With Storj, you expect to be able to find and know about only your own things.
If you run your own Bridge, you have no interaction with Storj Labs or our Bridges. You have all your file metadata, and you manage your file state, and nobody else's. We have only our file metadata (and our customers') and manage state for those files only.
There's no reliance on a single central node for the whole network. Everyone manages their own stuff in a shared ecosystem.
Does running multiple bridges help with anything other than high-availability? If they all connect to the same Landlord (which is just an RPC interface to mongo and rabbitmq), then they share all information, right?
I would imagine that api.storj.io is actually several bridges behind a load-balancer?