Section 3 Disagreement #30

Closed
utdrmac opened this Issue Dec 29, 2016 · 8 comments

@utdrmac

utdrmac commented Dec 29, 2016

I disagree with the intro paragraphs of section 3.

Many of these functions require high uptime and significant infrastructure, especially for an active set of files. User run applications, like a file syncing application, cannot be expected to efficiently manage files on the network.

This is no different from running Dropbox or another cloud-syncing tool. Let's take Transmission (a popular BitTorrent client) as an example. I run it on my Linux NAS. What actually runs is transmission-daemon, a background process that runs constantly, persisting across crashes, reboots, etc. Even if I am not downloading or seeding a torrent, it keeps running and my node actively participates in DHT/PEX.

When I want to share or download a torrent file, I use transmission-client to make RPC calls to the daemon and provide the file or magnet URL. If I want to stop a download, or delete a torrent from my offering, I use the client.

I see Storj (from the client/user perspective, not the farmer's) no differently. Even on Mac/Windows, a process (actually, probably a couple of threads) lives in the background, joining with other nodes, doing PING/PONGs, participating in the DHT, etc.

When I want to store a file, I interact with the GUI and the app slices up the file, encrypts it, stores audit/challenge information locally, sends out contract requests, and uploads to farmers. It would be an app preference as to how many mirror copies the user wants living on the network and would be the responsibility of the app to ensure this.

Since the app is intended to always be running (like DB/ACD/etc.), it would also be the app's responsibility to do hourly/daily/periodic audits of all files.
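The slice-then-audit flow described above can be sketched roughly as follows. This is a minimal illustration, not Storj's actual protocol: the shard size, the helper names (`shard`, `make_challenges`, `audit`), and the nonce-plus-hash challenge scheme are all assumptions for demonstration, and encryption is omitted for brevity.

```python
import hashlib
import os

SHARD_SIZE = 8  # tiny for illustration; real clients use MB-scale shards


def shard(data: bytes, size: int = SHARD_SIZE):
    # split the file into fixed-size pieces for distribution to farmers
    return [data[i:i + size] for i in range(0, len(data), size)]


def make_challenges(shard_bytes: bytes, n: int = 4):
    # before upload, pre-generate random nonces and the expected
    # responses, stored locally by the client for later audits
    challenges = []
    for _ in range(n):
        nonce = os.urandom(16)
        expected = hashlib.sha256(nonce + shard_bytes).hexdigest()
        challenges.append((nonce, expected))
    return challenges


def farmer_respond(nonce: bytes, stored: bytes) -> str:
    # the farmer proves possession by hashing the nonce with the shard
    return hashlib.sha256(nonce + stored).hexdigest()


def audit(challenges, stored: bytes) -> bool:
    # the client issues one unused challenge and checks the response
    nonce, expected = challenges.pop()
    return farmer_respond(nonce, stored) == expected
```

Each challenge is single-use, which is why the client has to pre-generate a batch of them and keep count of how many remain before re-sharding.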

As it's written, the paper seems to lean towards the Bridge being a required piece of this puzzle. Yet this creates a dependency and a single point of failure, defeating the whole purpose behind a distributed, shared-nothing storage platform.

@utdrmac


utdrmac commented Dec 29, 2016

I may have jumped the gun. I just read 3.1, and it indicates that I, an end user, can run my own bridge "daemon" and have my client communicate with it. This seems to fit the Transmission model I described above. I guess I was under the impression that only "Storj, Inc." would ever run bridge nodes. Am I reading that correctly? I can run my own storj-bridge, which connects to the network like any other node?

@utdrmac


utdrmac commented Dec 29, 2016

Bummer. It just got more complicated.

Note: Storj Bridge cannot communicate with the network on its own, but instead must communicate with a running Storj Complex instance.

@prestwich


Contributor

prestwich commented Dec 29, 2016

Hey, thanks for your comments. I'll try to provide some explanation here and get helpful revisions into the next version.

This is no different than running Dropbox or other cloud-syncing tool.

It's a bit different in that with Dropbox you trust the infrastructure and the risks are low. One of the design goals is resiliency to unreliable farmers, which is not a goal of Dropbox.

We've considered the daemon model for a Bridge client, and may do that in the future. If you were also running a local Bridge it'd be a complete system.

Since the app is intended to always be running (like DB/ACD/etc), it would also be the apps responsibility to do hourly/daily/periodic audits of all files.

Dropbox is still designed to let the computer turn off, while being offline for an extended period could have very bad results for a hypothetical Storjbox that incorporates its own audits and file-state management.

run my own bridge "daemon" and my client communicates with it.
I guess I was under this impression that only "Storj, Inc" would ever run bridge nodes.

More like you can run your own Bridge server. Code is all AGPL licensed, go wild. 👍

It's a little difficult right now, but the tooling is getting better. See the following repos:
https://github.com/storj/complex
https://github.com/storj/bridge
https://github.com/storj/storage-service-models

Note: Storj Bridge cannot communicate with the network on its own, but instead must communicate with a running Storj Complex instance.

This is a confusion of terminology. The Bridge as described in the whitepaper is a system composed of several parts, including the Bridge API, Complex, Audits, and a shared Mongo document store. The Bridge API cannot function without Complex. We should clarify that.

Also, I really like the idea of a self-managing file storage app. Sometime I'd like to figure out a good way to solve the problems for this. Maybe increased redundancy would be good enough?

@utdrmac


utdrmac commented Jan 2, 2017

Thinking about this overall: at some point, the end user has to place their trust somewhere. My issue is, and has been, the notion that Storj is supposed to be "distributed" and "fault-tolerant," where any node could go offline at any time and your data would still be safe.

Yet the requirement of having a Bridge is not distributed and is a SPOF. A DDoS attack on "Storj headquarters" could result in a Bridge outage: unable to download/upload files, unable to do audits, unable to take/give payments, etc.

This is what confuses me. How can you tout "distributed" to the world, yet be 100% dependent on this 1 piece of software "controlling" the entire operation?

In the BitTorrent world, you don't technically need trackers. You can use the DHT and PEX to discover peers/seeds of a particular torrent. The trackers can help by giving you an initial seed of information; this is no different from the first launch of a Bitcoin client. But eventually you (the client/end user) maintain your own local cache of all nodes you've interacted with, and when you (re)connect to them, you get a list of their connected nodes too.

I was expecting/thinking Storj to work similar. On first launch of the client-daemon, I would connect to an initial seed node, controlled/owned by Storj corp, to get a list of several hundred nodes all connected to it. I'd connect to all those, and get their peers too. Now I have this local list of thousands(?) of other nodes on the network.

Any "commands" I need to send to the network, such as uploading, downloading, contracts, ping/pong, etc would be sent to my "local" nodes (neighbors) and potentially be re-broadcast to their peers looking for the information requested.
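The bootstrap-then-widen discovery described above could be sketched like this. The in-memory `NETWORK` table and the `bootstrap` helper are hypothetical stand-ins for the real DHT RPCs a client would make:

```python
# hypothetical in-memory view of the network: node id -> its known peers
NETWORK = {
    "seed": {"a", "b"},
    "a": {"seed", "c"},
    "b": {"seed", "d"},
    "c": {"a"},
    "d": {"b"},
}


def bootstrap(seed: str, depth: int = 2):
    # start from the seed node, then ask each newly discovered peer
    # for its own peers, widening the local node list each round
    known = {seed}
    frontier = {seed}
    for _ in range(depth):
        found = set()
        for node in frontier:
            found |= NETWORK.get(node, set())
        frontier = found - known
        known |= found
    return known
```

After a couple of rounds the client has its own local peer cache and no longer depends on the seed node, which matches the tracker-optional behavior described for BitTorrent.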

But since we can't trust that each node stays online all the time, to create a storage network, we have to have something that is "guaranteed" to be online all the time. I guess that is where we (the users/clients) must place all the trust with Storj corp.

If I wanted to write my own software, say a backup/Dropbox-like client utilizing Storj as the backend storage, I'd also have to create my own Bridge servers and provide that guarantee to the purchasers of my software.

@prestwich


Contributor

prestwich commented Jan 5, 2017

On first launch of the client-daemon, I would connect to an initial seed node, controlled/owned by Storj corp, to get a list of several hundred nodes all connected to it. I'd connect to all those, and get their peers too. Now I have this local list of thousands(?) of other nodes on the network.

This does happen. We use a Kademlia DHT, like BitTorrent. Technically, torrents use sloppy Kademlia, but it's not a huge difference.
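For reference, Kademlia's notion of "closeness" is XOR distance over node IDs. A toy sketch with hypothetical helper names and tiny integer IDs in place of real 160-bit ones:

```python
def xor_distance(a: int, b: int) -> int:
    # Kademlia measures distance as the bitwise XOR of two node IDs
    return a ^ b


def k_closest(target: int, node_ids, k: int = 3):
    # the k nodes whose IDs are XOR-closest to the target;
    # lookups iteratively query these to converge on the target
    return sorted(node_ids, key=lambda n: xor_distance(n, target))[:k]
```

Lookups repeat this "find the k closest, ask them for closer nodes" step, which is what lets any client locate peers without a central tracker.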

This is what confuses me. How can you tout "distributed" to the world, yet be 100% dependent on this 1 piece of software "controlling" the entire operation?

We're not relying on a single instance of a piece of software to control the network. Anyone can run their own Bridge or Bridges. If you're not comfortable using a Bridge we run, you can run your own. Run 2 or 200. As many as it takes. It's 100% free software. You don't have to place any trust in Storj Labs Inc.'s infrastructure. Take the code and run a node on the distributed network.

@prestwich


Contributor

prestwich commented Jan 5, 2017

Had a chat with a developer. Maybe the disconnect here is that there's a critical difference between Storj and other DHTs/distributed storage systems.

With BitTorrent, you expect to be able to find anything on the network and know its state. With Storj, you expect to be able to find and know about only your own things.

If you run your own Bridge, you have no interaction with Storj Labs or our Bridges. You have all your file metadata, and you manage your file state, and nobody else's. We have only our file metadata (and our customers') and manage state for those files only.

There's no reliance on a single central node for the whole network. Everyone manages their own stuff in a shared ecosystem.

@utdrmac


utdrmac commented Jan 5, 2017

Does running multiple Bridges help with anything other than high availability? If they all connect to the same Landlord (which is just an RPC interface to MongoDB and RabbitMQ), then they share all information, right?

I would imagine that api.storj.io is actually several bridges behind a load-balancer?

@RichardLitt


Member

RichardLitt commented Oct 30, 2018

This work is being continued in our forthcoming whitepaper. Follow along at storj/storj.
