Reduce file padding #1921
nitronick600
Jun 12, 2017
I just found the Trello board and it appears that's being scheduled soon. Thanks!
lukechampine
Jun 12, 2017
Member
In case you're curious, here's the technical breakdown:
Sia operates on 4MB "sectors": the minimum you can upload to a host is 4MB. The 40MB padding comes from the fact that we upload redundantly across many hosts. So even if you're uploading a 100KB file, it will be padded to 4MB, and the padded version will be uploaded to many hosts. This also affects downloading: when you download the 100KB file, you have to download the full 4MB sector.
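To make the arithmetic concrete, here is a small Go sketch of the simplified model described above, in which each host receives its own padded copy. The 4 MiB sector size (`1 << 22`) and the count of 10 hosts are assumptions chosen to line up with the 4MB and 40MB figures; `paddedSize` and `uploadedBytes` are hypothetical helpers, not Sia APIs.

```go
package main

import "fmt"

// sectorSize is the smallest unit a host stores. Assumed here to be
// 4 MiB (1 << 22 bytes) to stand in for the "4MB" above.
const sectorSize = 1 << 22

// paddedSize rounds a file size up to a whole number of sectors.
func paddedSize(n int64) int64 {
	sectors := (n + sectorSize - 1) / sectorSize
	return sectors * sectorSize
}

// uploadedBytes follows the simplified model above: every host receives
// its own padded copy. hosts is a hypothetical redundancy parameter.
func uploadedBytes(n int64, hosts int) int64 {
	return paddedSize(n) * int64(hosts)
}

func main() {
	const fileSize = 100_000 // a 100KB file
	const hosts = 10         // hypothetical; 10 x 4 MiB is roughly the 40MB figure
	fmt.Println(paddedSize(fileSize))           // 4194304
	fmt.Println(uploadedBytes(fileSize, hosts)) // 41943040
	fmt.Printf("overhead: %.0fx\n", float64(uploadedBytes(fileSize, hosts))/fileSize)
}
```

Under these assumptions a 100KB file costs roughly 400 times its own size in upload bandwidth, which is the motivation for the fixes discussed next.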
There are a few ways of fixing this. One is to "pack" files together during upload. For example, if you're uploading a whole folder of photos, they could all be packed into a single 4MB sector before being sent off to the host. The obvious downside of this approach is that you need to have all the files grouped for upload in advance, but it has the advantage of being possible today without any modifications to the host or the upload protocol.
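A minimal sketch of the packing idea, assuming a first-fit strategy (one reasonable choice, not necessarily what Sia would use): each file is placed at an offset inside the first existing sector with room for it, otherwise a new sector is opened. The `file` and `placement` types and `packFiles` are hypothetical, and the 4 MiB sector size is assumed.

```go
package main

import "fmt"

const sectorSize = 1 << 22 // assumed 4 MiB sector

// file is a hypothetical input record: a name and a size in bytes.
type file struct {
	Name string
	Size int64
}

// placement records where one file lives inside a packed sector, which is
// what the renter would need to remember to read the file back later.
type placement struct {
	Name   string
	Sector int   // index of the sector the file was packed into
	Offset int64 // byte offset of the file within that sector
	Length int64
}

// packFiles assigns each file to the first sector with enough free space
// (first-fit). Each sector is padded once, however many files share it.
func packFiles(files []file) ([]placement, int, error) {
	var used []int64 // bytes already used in each sector
	var out []placement
	for _, f := range files {
		if f.Size > sectorSize {
			return nil, 0, fmt.Errorf("file %s is larger than one sector", f.Name)
		}
		idx := -1
		for i, u := range used {
			if u+f.Size <= sectorSize {
				idx = i
				break
			}
		}
		if idx == -1 { // no room anywhere: open a new sector
			used = append(used, 0)
			idx = len(used) - 1
		}
		out = append(out, placement{Name: f.Name, Sector: idx, Offset: used[idx], Length: f.Size})
		used[idx] += f.Size
	}
	return out, len(used), nil
}

func main() {
	photos := []file{{"a.jpg", 100_000}, {"b.jpg", 2_500_000}, {"c.jpg", 2_000_000}, {"d.jpg", 50_000}}
	placements, sectors, err := packFiles(photos)
	if err != nil {
		panic(err)
	}
	for _, p := range placements {
		fmt.Printf("%s -> sector %d, offset %d, length %d\n", p.Name, p.Sector, p.Offset, p.Length)
	}
	fmt.Println("sectors needed:", sectors)
}
```

With the four example files, first-fit needs two sectors instead of the four that padding each file separately would cost; the trade-off is that the renter must store each file's sector, offset, and length.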
Another approach is to allow the uploader to modify the 4MB sector by sending additional data. Then you could store multiple files in the same sector via a series of modifications. This is actually specified in our protocol, but it isn't currently used. The downside of this approach is that it's taxing for the host; since the Merkle root of the sector changes, they may have to shuffle things around in their database. If this is coded poorly, it could be a DoS vector. @DavidVorick can speak to this aspect better than I.
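The host-side cost can be illustrated with a toy Merkle tree. This sketch assumes 64-byte leaves and uses SHA-256 as a stand-in hash; `hostIndex` is a hypothetical table keyed by sector root, not the host's actual storage layout. The point it shows: any write into the sector yields a new root, so the whole sector is rehashed (in this naive version) and its entry has to move to a new key.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

const segmentSize = 64 // assumed leaf size in bytes

// merkleRoot is a simple binary Merkle tree over segmentSize leaves.
// SHA-256 is a stand-in hash; leaves and interior nodes are prefixed with
// 0x00 and 0x01 so they cannot be confused with each other.
func merkleRoot(data []byte) [32]byte {
	var level [][32]byte
	for i := 0; i < len(data); i += segmentSize {
		end := i + segmentSize
		if end > len(data) {
			end = len(data)
		}
		level = append(level, sha256.Sum256(append([]byte{0x00}, data[i:end]...)))
	}
	if len(level) == 0 {
		return sha256.Sum256([]byte{0x00})
	}
	for len(level) > 1 {
		var next [][32]byte
		for i := 0; i < len(level); i += 2 {
			if i+1 == len(level) { // odd node out is carried up unchanged
				next = append(next, level[i])
				continue
			}
			node := append([]byte{0x01}, level[i][:]...)
			node = append(node, level[i+1][:]...)
			next = append(next, sha256.Sum256(node))
		}
		level = next
	}
	return level[0]
}

// hostIndex is a hypothetical host-side table that stores sectors by root.
type hostIndex map[[32]byte][]byte

// modify overwrites part of a stored sector. Because the root changes, the
// entry has to be moved to a new key: the "shuffling" described above.
func (h hostIndex) modify(oldRoot [32]byte, offset int, patch []byte) ([32]byte, bool) {
	sector, ok := h[oldRoot]
	if !ok || offset < 0 || offset+len(patch) > len(sector) {
		return oldRoot, false
	}
	copy(sector[offset:], patch)
	newRoot := merkleRoot(sector) // naive: rehashes the whole sector
	delete(h, oldRoot)
	h[newRoot] = sector
	return newRoot, true
}

func main() {
	sector := make([]byte, 4096) // small for the demo; a real sector is far larger
	copy(sector, "first file")
	h := hostIndex{}
	root1 := merkleRoot(sector)
	h[root1] = sector

	root2, ok := h.modify(root1, 1024, []byte("second file"))
	_, oldKeyPresent := h[root1]
	fmt.Println(ok, root1 != root2, oldKeyPresent, len(h)) // true true false 1
}
```

Every modification repeats this rehash-and-rekey work, which is why an uploader sending many small modifications could put disproportionate load on a host.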
A final consideration here is that storing "partial sector" files will require adding checksums to the download code. Currently, since we always download a full sector, the Merkle root of the sector can double as a checksum. But if we start downloading less than one sector, we need another way to verify the integrity of the data. (Note that we can't simply use a single checksum for the entire file, because if the checksum failed, we wouldn't know which host was at fault.) So we need to store the checksums in the new .sia file format.
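The fault-localization argument in the parenthetical can be sketched with one checksum per host piece. SHA-256 and the helpers `pieceChecksums` and `faultyHosts` are assumptions for illustration only; note that the follow-up comment further down revises whether separate checksums are needed at all.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// pieceChecksums computes one SHA-256 per host piece. In this hypothetical
// layout, these would be stored in the renter's file metadata.
func pieceChecksums(pieces [][]byte) [][32]byte {
	sums := make([][32]byte, len(pieces))
	for i, p := range pieces {
		sums[i] = sha256.Sum256(p)
	}
	return sums
}

// faultyHosts returns the indices of hosts whose downloaded piece does not
// match its stored checksum. A single whole-file checksum could only say
// "something is wrong", not which host to blame.
func faultyHosts(downloaded [][]byte, sums [][32]byte) []int {
	var bad []int
	for i, p := range downloaded {
		if sha256.Sum256(p) != sums[i] {
			bad = append(bad, i)
		}
	}
	return bad
}

func main() {
	pieces := [][]byte{[]byte("piece from host 0"), []byte("piece from host 1"), []byte("piece from host 2")}
	sums := pieceChecksums(pieces)

	// Simulate host 1 returning corrupted data.
	downloaded := [][]byte{pieces[0], []byte("corrupted!"), pieces[2]}
	fmt.Println(faultyHosts(downloaded, sums)) // [1]
}
```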
Bonus consideration: downloading partial sectors leaks metadata to the host about what you're storing. For example, there aren't many files with exactly 1,193,254,020 bytes in them. So for better privacy, you'd want to download a little more than you need, and strip off the extra afterward.
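One way to "download a little more than you need" is to widen every read to aligned boundaries and trim the result locally. In this sketch the 64 KiB alignment, the 4 MiB sector size, and the helpers `alignedRange` and `strip` are all hypothetical; the right granularity is a trade-off between privacy and wasted bandwidth.

```go
package main

import (
	"bytes"
	"fmt"
)

const sectorSize = 1 << 22 // assumed 4 MiB sector

// alignedRange widens the requested byte range [off, off+n) outward to
// multiples of align, clamped to the sector, so the host only ever sees
// coarse, aligned requests rather than the exact size being read.
func alignedRange(off, n, align int64) (start, end int64) {
	start = (off / align) * align
	end = ((off + n + align - 1) / align) * align
	if end > sectorSize {
		end = sectorSize
	}
	return start, end
}

// strip trims an over-fetched buffer back to the bytes actually requested.
func strip(fetched []byte, fetchStart, off, n int64) []byte {
	return fetched[off-fetchStart : off-fetchStart+n]
}

func main() {
	sector := make([]byte, sectorSize)
	for i := range sector {
		sector[i] = byte(i % 251)
	}
	const align = 64 << 10 // hypothetical 64 KiB granularity
	off, n := int64(1000), int64(5000)

	start, end := alignedRange(off, n, align)
	fetched := sector[start:end] // stands in for the bytes returned by the host
	want := sector[off : off+n]
	fmt.Println(start, end, bytes.Equal(strip(fetched, start, off, n), want)) // 0 65536 true
}
```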
lukechampine
Jul 25, 2017
Member
> A final consideration here is that storing "partial sector" files will require adding checksums to the download code. Currently, since we always download a full sector, the Merkle root of the sector can double as a checksum. But if we start downloading less than one sector, we need another way to verify the integrity of the data. (Note that we can't simply use a single checksum for the entire file, because if the checksum failed, we wouldn't know which host was at fault.) So we need to store the checksums in the new .sia file format.
This is actually incorrect: we don't need to store extra checksums. We're using AEAD to encrypt the sector data, so we get authentication "for free": the checksum (the AEAD authentication tag) of the data is stored on the host, prepended to the data itself.
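A minimal sketch of the AEAD point using Go's standard `crypto/cipher` package. AES-256-GCM is a stand-in here and may not be the cipher Sia uses, and the layout (random nonce prepended, with `Seal` appending the tag) is this sketch's own choice rather than the on-host format described above; `newAEAD`, `seal`, and `open` are hypothetical helpers. The property it demonstrates holds for any AEAD: if a host alters even one byte, `Open` fails, so bad data is attributed to the host that served it without any separately stored checksum. The sketch seals and verifies one whole unit and does not show how partial-sector reads would map onto sealed units.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newAEAD builds an AES-256-GCM instance (a stand-in cipher for this sketch).
func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts a piece and returns nonce || ciphertext || tag.
func seal(aead cipher.AEAD, plaintext []byte) ([]byte, error) {
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return aead.Seal(nonce, nonce, plaintext, nil), nil
}

// open verifies and decrypts what a host returned. Any modification makes
// Open fail, so a bad piece is pinned on the host that served it.
func open(aead cipher.AEAD, stored []byte) ([]byte, error) {
	ns := aead.NonceSize()
	if len(stored) < ns+aead.Overhead() {
		return nil, errors.New("piece too short")
	}
	return aead.Open(nil, stored[:ns], stored[ns:], nil)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	aead, err := newAEAD(key)
	if err != nil {
		panic(err)
	}

	stored, err := seal(aead, []byte("sector data"))
	if err != nil {
		panic(err)
	}
	pt, err := open(aead, stored)
	fmt.Println(string(pt), err) // sector data <nil>

	stored[len(stored)-1] ^= 0xff // simulate a host flipping bits
	_, err = open(aead, stored)
	fmt.Println(err != nil) // true
}
```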
lukechampine added this to Under Consideration in Renter overhaul on Jul 28, 2017
lukechampine referenced this issue on Jul 28, 2017 in #2053, "When you have lots of files, it tries to upload ALL of them at once" (closed)
lukechampine added the Feature Request label on Oct 16, 2017
lukechampine moved this from Under Consideration to Planned in Renter overhaul on Oct 16, 2017
calchulus
Jul 30, 2018
What's the current status on this?
nitronick600
Jun 12, 2017
I don't know the history of why Sia had to implement 40MB padding, but reducing it would open up many more possibilities for application developers.