
Distributed video storage #2115

Closed
poperigby opened this issue Sep 7, 2019 · 23 comments


@poperigby

It seems video storage is a bit of a problem for instance administrators. They have to store all videos that get uploaded to their instance, and this leads a lot of them to set a cap on how many GBs of video you can upload. YouTube has no such limit, and this gives them a pretty big advantage over PeerTube. Could a peer-to-peer solution solve this?

@joas8211

joas8211 commented Sep 7, 2019

I think it's a feature that's required for instance maintainers to administer their servers.

@ivan386

ivan386 commented Sep 8, 2019

@poperigby Maybe use IPFS for that?

@poperigby
Author

@rigelk what is this a duplicate of?

@6543

6543 commented Sep 18, 2019

P2P or not, it has to be stored somewhere...

@poperigby
Author

poperigby commented Sep 18, 2019

Of course, but that's the point of P2P. It's distributed so the admin doesn't have to store all the videos by themselves.

@6543

6543 commented Sep 19, 2019

Nice idea, but I wouldn't focus on it too much until PT has become a competitor to Vimeo/YouTube/...

For example, I use IPFS a lot, but fetching big files can take hours (or minutes, even when you know the data is on the next node) -> it's not reliable yet.

@poperigby
Author

It doesn't have to be IPFS. That's just what @ivan386 suggested. Do you know of any other P2P solutions that would work better?

@6543

6543 commented Sep 19, 2019

Then I would suggest a feature request for customizable storage backends via plugins, if that's not already possible, and then develop one for ... example IPFS, Dat, ...
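To make the idea concrete, here is a minimal sketch of what such a storage-backend plugin contract could look like. This is purely hypothetical: none of these names exist in PeerTube, and a real design would stream data rather than buffer whole files.

```ts
import { promises as fs } from 'fs'

// Hypothetical plugin contract: a backend persists a transcoded file
// and hands back a URL the player can fetch it from.
interface VideoStorageBackend {
  store (videoId: string, data: Buffer): Promise<string>
  remove (videoId: string): Promise<void>
}

// An IPFS, Dat, or S3 plugin would implement the same interface and
// register itself with the server at startup. Local disk shown here.
class LocalDiskBackend implements VideoStorageBackend {
  constructor (private rootDir: string) {}

  async store (videoId: string, data: Buffer): Promise<string> {
    await fs.writeFile(`${this.rootDir}/${videoId}.mp4`, data)
    return `/static/videos/${videoId}.mp4`
  }

  async remove (videoId: string): Promise<void> {
    await fs.unlink(`${this.rootDir}/${videoId}.mp4`)
  }
}
```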

@joas8211

So every user would act as a server for the videos they have watched... like BitTorrent. BitTorrent can actually already be used as a video streaming platform, if @poperigby is interested in such a thing. PopcornTime is a great example of what can be achieved with BitTorrent.

I'm personally looking at PeerTube as a YouTube replacement, so the current server -> client -> client relationship works perfectly for high-bitrate video distribution.

@poperigby
Author

poperigby commented Sep 20, 2019

Something like that. Then small parts of every video on the instance would be stored on each user's computer, alleviating the admins' storage problem. I'm not an expert in P2P, but something like that would work, right?

@Findus23
Contributor

The issues I see are:

  • nearly everyone uses PeerTube in a browser, and browsers can't just store gigabytes of video data on users' disks (at least not in a way that lets them upload that data again)
  • what happens when the server deletes the file and no one watches the video? How can one make sure the video isn't accidentally deleted from the last node (after all, everyone can easily delete their local copy thinking that everyone else still has one)?
  • by storing the data on many clients (since not all of them are always online), one needs more disk space globally than now, which works against the goal of saving space
  • disk space is cheaper and network throughput is better on servers, and the current replication system mostly solves this issue already (apart from allowing the original server to delete the file)

@6543

6543 commented Sep 20, 2019

@Findus23 ...
... disk space is cheaper and has a better network throughput on servers ...

That's the reason I would always prefer the server-side solution for my instance(s).

But if someone would like to use another storage backend - as long as it's optional I see no issue ;)

@swedneck

swedneck commented Sep 26, 2019

@Findus23

  • Viewers don't need to store the entire video in their browser, and they don't need to store any data permanently either.
  • It's not like you can guarantee this now. At least with IPFS a video can be re-uploaded in the future, and as long as it's identical, all references to it will start working again, thanks to the content-addressed nature of IPFS.
  • Once again, there's no need for viewers to store data permanently; they can just seed whatever data they currently have cached. However, people who want to make sure a video isn't lost can choose to mirror it.

Also, one major benefit of IPFS is that it acts as a CDN, which means people fetch the data from the closest (and thus fastest) peer that has it. Say you're watching a video and someone else in your city starts watching it at the same time: you can give them the data, so they don't even need to connect outside of that local network.
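For what it's worth, the content-addressing property is easy to demonstrate. A minimal sketch, assuming the ipfs-http-client package and a local IPFS daemon on its default API port (both assumptions, not anything PeerTube ships):

```ts
import { create } from 'ipfs-http-client'

// Assumes a local IPFS daemon exposing its HTTP API on the default port.
const ipfs = create({ url: 'http://127.0.0.1:5001' })

async function demo () {
  const video = Buffer.from('...video bytes...') // placeholder content

  // The CID is derived from the bytes themselves.
  const first = await ipfs.add(video)
  console.log(first.cid.toString())

  // Re-adding identical bytes - from any machine, at any later time -
  // yields the exact same CID, so old references start working again.
  const again = await ipfs.add(video)
  console.log(again.cid.toString()) // same CID as above
}

demo().catch(console.error)
```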

@rigelk
Collaborator

rigelk commented Sep 26, 2019

@swedneck right now having a P2P storage adapter only means that the storage of the server is put on some P2P network. It doesn't automatically mean this will act as a CDN, which would require the client to be able to join that network via yet another client library.

@swedneck

@rigelk yes, that would require running a node in the viewer's browser, which isn't much different from running WebRTC.

@rigelk
Collaborator

rigelk commented Sep 27, 2019

@rigelk yes, that would require running a node in the viewer's browser, which isn't much different from running WebRTC.

no, that is very different: WebRTC is most likely already supported by your browser. There is no shim, no polyfill needed to bring that functionality to the client.

What I guess you're referring to is the application layer on top of that. That would be WebTorrent.

Chocobozzz changed the title from "P2P video storage?" to "Distributed video storage" on Oct 18, 2019
@grimmthetallest

First we should probably clarify where the video is being distributed: on the PeerTube network, or on a separate network? Since it's already possible to use a storage backend like Sia or IPFS for large-scale video storage server-side, we can assume this is an effort to store content on viewer devices. Unfortunately this runs into the issue of WebTorrent storing data in volatile memory, and there's no good way to write it out to persistent disk: you'd need to store not just the chunks of the files as they were downloaded, but also persist the full metadata needed to reassemble those chunks. You would also need to validate that hosts actually have the files/chunks they advertise, due to intentional or unintentional data manipulation (corruption, malicious alteration of data, or attempts to poison the network). By the time you've solved those issues, you've just reinvented most of the BitTorrent protocol...

Likely a better solution would be to make it easier for clients to share copies of files they have already downloaded via the existing WebTorrent backend. Since WebTorrent supports the HTTP seeding standards of the BitTorrent protocol, and webseeding is already used to "kickstart" the swarm for each video from the initial instance, it should be possible to adopt a model where peers who don't want to run a full PeerTube instance, or manage individual torrents in a desktop application (webtorrent-desktop or Vuze), could still contribute their copies to the swarm from long-term, persistent storage - as in the sketch below. Consider it a "thin client" for some of the content/functionality of the "full" PeerTube instance/client application.
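A minimal sketch of that re-seeding flow, assuming Node.js and the webtorrent package (the file paths are hypothetical): load the original .torrent so the info-hash matches the swarm's, and point the client at the directory that already holds the downloaded copy; WebTorrent then verifies the existing pieces and seeds instead of downloading again.

```ts
import WebTorrent from 'webtorrent'
import { readFileSync } from 'fs'

const client = new WebTorrent()

// The original .torrent keeps the info-hash identical to the swarm's,
// and `path` points at the directory that already contains the video,
// so the client verifies the pieces on disk and starts seeding.
const torrentFile = readFileSync('./video.torrent')
client.add(torrentFile, { path: './videos' }, torrent => {
  console.log(`Seeding ${torrent.name} (${torrent.infoHash})`)
})
```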

It would be ideal if there was a way for the CLIENT to advertise a webseed (on a server supporting byte-range requests as per BEP-19) where one or many videos are available under the same HTTP root (e.g. http://mysite/videos/[filename.vid]). When the server receives a request to serve a video from the BitTorrent/WebTorrent swarm in the background, it could include, alongside its own webseed, a list of HTTP endpoints as available seeders. In this way a "master" PeerTube server could have "thin" PeerTube clients/hosts for some content. There may be many peers willing to administer an HTTP root where they save finished videos, who would not want to load a torrent client and then load and manage torrents in it. This would also give admins clustering and caching opportunities, as the files can be mirrored to multiple webservers, or even to simple file storage exposed over S3. Adding multiple webseed directories to each connection then becomes like a list of trackers for files in a traditional, non-WebTorrent torrent client.
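A sketch of the client side of that idea, again assuming the webtorrent package; the mirror URL and the mechanism by which the server hands out the endpoint list are hypothetical:

```ts
import WebTorrent from 'webtorrent'

const client = new WebTorrent()
const magnetURI = 'magnet:?xt=urn:btih:...' // placeholder info-hash

client.add(magnetURI, torrent => {
  // Hypothetical: the "master" instance returns a list of HTTP roots
  // (BEP-19 web seeds) that also host this file.
  const extraSeeds = ['https://mirror.example.com/videos/']
  for (const url of extraSeeds) {
    // Pieces are then fetched from the mirror via HTTP range requests.
    torrent.addWebSeed(url)
  }
})
```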

Lastly, with support for external webseed sources, a server could be configured to store original-quality content elsewhere and keep lower-bandwidth transcodes on its own storage, to be pruned after a certain period of inactivity. There are modes for ffmpeg that allow for deterministic transcoding, so a server with surplus CPU but limited storage could recreate intermediate quality options from a remote source in such a way that torrent hashes remain reusable. This would greatly alleviate the storage demands of running a PeerTube instance, as fewer transcoded copies would need to be kept for each uploaded video.
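A hedged sketch of such a re-transcode, spawning ffmpeg from Node. ffmpeg's bitexact flags strip volatile timestamps and encoder version strings from the output; whether two runs are actually byte-identical also depends on pinning the exact ffmpeg/x264 build and every encoder setting, which is assumed here, not guaranteed.

```ts
import { execFile } from 'child_process'

execFile('ffmpeg', [
  '-i', 'source.mp4',       // original-quality file fetched from remote storage
  '-map_metadata', '-1',    // drop source metadata
  '-flags:v', '+bitexact',  // bit-exact codec paths
  '-fflags', '+bitexact',   // bit-exact muxer output (no volatile timestamps)
  '-c:v', 'libx264', '-preset', 'medium', '-crf', '23',
  '-c:a', 'aac', '-b:a', '128k',
  'out-720p.mp4'
], err => {
  if (err) throw err
  // If the output is byte-identical to the pruned transcode,
  // the existing torrent hashes remain valid.
  console.log('transcode finished')
})
```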

@shleeable

I'd love to see support for multiple S3 backends. I could easily imagine the transcoding happening locally, then uploading to N+1 S3-ish places...

e.g. I'd love to upload to AWS S3, DigitalOcean Spaces, Wasabi, and X number of other S3-compatible services.
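A sketch of that fan-out, assuming the @aws-sdk/client-s3 package; the endpoints, bucket name, and file paths are placeholders, and credentials are taken from the environment:

```ts
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'
import { readFile } from 'fs/promises'

// Any S3-compatible provider works the same way; only the endpoint differs.
const targets = [
  new S3Client({ region: 'us-east-1' }), // AWS S3 itself
  new S3Client({ region: 'us-east-1', endpoint: 'https://nyc3.digitaloceanspaces.com' }),
  new S3Client({ region: 'us-east-1', endpoint: 'https://s3.wasabisys.com' })
]

// Transcode locally once, then push the result to every backend.
async function uploadEverywhere (key: string, path: string) {
  const Body = await readFile(path)
  await Promise.all(targets.map(s3 =>
    s3.send(new PutObjectCommand({ Bucket: 'peertube-videos', Key: key, Body }))
  ))
}

uploadEverywhere('videos/abc-720p.mp4', './out-720p.mp4').catch(console.error)
```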

@ghost

ghost commented Nov 6, 2019

For cheap and huge storage, things like Siacoin, sponsored storage, and self-hosted (home) storage are probably the most realistic. I don't see how P2P in PeerTube would solve the issue of limited storage.

Not only will storage be a problem, but also distribution. Eventually, a single host won't be able to deal with the bandwidth required, so the whole federation depends on instances supporting each other with redundancy.

@gustavklopp

gustavklopp commented Dec 28, 2019

Thanks @Findus23, it's indeed the same discussion, so I'll continue here.

The main issue is that browsers can't arbitrarily store multiple gigabytes of data and distribute them further just because a website (in this case PeerTube) tells them to. That would be a huge security issue and would upset a ton of users (those on metered internet connections). One could argue that these users could use special clients instead of a browser, but at that point one could argue that those special clients should be put on computers with fast upload speeds - and one has recreated the existing PeerTube redundancy feature.

Yes, a client, not simply code running inside the browser, certainly.
I don't think it's a big change from the BitTorrent network that is already running - and running very well. So it's proof that there are a lot of people with fast enough internet speeds who store many GB (even TB) of data on their drives just for torrenting.

But I get the difference in incentive: torrenting the latest Marvel movie (even if it's 3 GB) and watching the latest cute cat video are different things. People don't expect to store these videos; the whole process should be almost invisible. Thanks to a capped space on their drive which is dynamically erased, the load for each peer is adapted to their available bandwidth/space. Still, there should be an advantage to dedicating part of your storage, for example a download speed bonus that grows with the space you offer.

Splitting up like this increases the chance of failure instead of decreasing it, because now just one person going offline (or deleting their cache because they need the disk space) will result in the video no longer being complete.

No, I meant: in addition to the original server, which keeps the full video. So it's added redundancy in case the original server goes down.

@gustavklopp

gustavklopp commented Dec 28, 2019

You forgot that there must be a torrent tracker. If the server hosting the video goes down, the tracker does too.

They can be trackerless torrents (using DHT).

And I'm not sure I understand your point when you say «The level of centralization is increased by the need to create a server yourself and then host a PeerTube instance». Not at all. You don't have to create a server yourself.

Somebody had to create a server in the first place, if you don't want to use the ones provided by YouTube, BitChute, etc. And since that's not easy/cheap, it limits the total number of servers.
The idea here is that as soon as a user publishes a video, parts of it propagate to each viewer, so that even if the original disappears, we can reconstruct the whole video if we want.

@KeithCu

KeithCu commented Mar 27, 2020

I'd also like to be able to run my PeerTube instance on an object-store backend, which is cheaper than block storage. For example, Linode offers 1 TB of object storage for $20/month, whereas block storage for the same amount of space is $100/month. In addition, the object storage gets replicated to multiple datacenters for extra redundancy and lower latency, whereas a volume is only accessible from one machine and needs to be backed up.

There is a library called LibCloud which works with multiple storage backends:
https://libcloud.readthedocs.io/en/stable/storage/supported_providers.html

Maybe this could be used or borrowed from.

@Chocobozzz
Owner

We already have an issue for IPFS (#494), and others for external storage (#2232, #147), so I think we can close this discussion.
