
Resumable download #21

Open
dmkl opened this issue Jan 29, 2024 · 5 comments

Comments


dmkl commented Jan 29, 2024

Hi Anantha, I have an interesting question. Let's imagine the following case:

We have hundreds of files that we stream into a zip archive from cloud locations. We know each file's size and CRC upfront, and the file order within the stream is always the same. The connection gets interrupted, and the browser wants to resume the download by sending us a Range header. So we know that the browser has received the first X bytes of the stream. (Very theoretically) we could calculate all the file/header/descriptor offsets and continue sending bytes starting from a particular byte of a particular file.

Do you think it's doable? What would it take to implement it?
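(For context, the resume offset would come from the request's Range header. A minimal sketch of extracting it, in Python for illustration since the thread doesn't pin down the server stack; the function name is made up, and only the simple open-ended `bytes=X-` form is handled.)

```python
import re

def parse_resume_offset(range_header):
    """Extract the start offset from an open-ended Range header
    like 'bytes=12345-'. Returns None for a missing or unsupported
    header; a real server should then fall back to a full 200
    response. 'bytes=a-b' and multi-range requests are not handled."""
    if not range_header:
        return None
    m = re.fullmatch(r"bytes=(\d+)-", range_header.strip())
    return int(m.group(1)) if m else None
```

Since all sizes and names are known upfront, the total archive length is computable too, which is what would let the server answer with a proper 206 Partial Content and a Content-Range header.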


ananthakumaran commented Jan 29, 2024

This is an interesting problem to solve, though it might not be a common use case. There is an open PR (#19) that allows the user to bring their own CRC and size, but that's the extent of the support today.

It's definitely possible, but I'm not sure about the amount of work involved. It's been more than a couple of years since my last commit to master, so the code structure is not very fresh in my mind.

A very hacky (and bad) way to do this would be to write a wrapper stream that skips the first X bytes. That would be very easy to implement. The downside is that you end up regenerating all the data anyway, and there might be a long initial delay before the client receives the first byte.
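That hacky wrapper could look roughly like this (an illustrative Python sketch, not code from this library; the helper name is hypothetical):

```python
def skip_bytes(chunks, offset):
    """Yield byte chunks from `chunks`, dropping the first `offset`
    bytes. The whole archive is still generated upstream; we merely
    discard the prefix before it reaches the client."""
    remaining = offset
    for chunk in chunks:
        if remaining >= len(chunk):
            remaining -= len(chunk)  # drop this chunk entirely
        elif remaining:
            yield chunk[remaining:]  # partial chunk at the cut point
            remaining = 0
        else:
            yield chunk

# Resuming from byte 5 of a 9-byte stream split into 3-byte chunks:
resumed = b"".join(skip_bytes([b"abc", b"def", b"ghi"], 5))
```

This makes the "regenerate everything" cost obvious: every chunk before the cut point is still produced and then thrown away.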

Adding the capability to Zstream itself would make it possible to avoid a lot of that waste (skipping all the unnecessary file reads), but it might also require a lot of work. I don't know whether the current code structure would accommodate it; it might need a rearchitecture/refactor first. Hard to say without looking at the code and spending more time on it.

Also, features like compression and encryption won't work with this approach, because they make the output size unpredictable.


dmkl commented Jan 29, 2024

Yes, it's not for compression/encryption but for plain "stored" entries only. Our customers download big archives (a few GBs sometimes) with lots of photos in them. Sometimes they have a slow connection and the download stops. Starting from scratch is very painful for them. We could improve their experience a lot if we supported resumable downloads.

I'm ready to contribute to the gem, but I'd need some guidance from you. I have no idea how much refactoring the current code structure might need. Would you have any time for the initial investigation and then for guiding me through the calculations?

ananthakumaran commented

> Would you have any time for the initial investigation and then for guiding me through the calculations?

I will try, maybe on weekends (can't guarantee a timeline). I'd suggest you start independently without waiting for me; I'll be able to review once you have something.


dmkl commented Feb 2, 2024

Can you please point me to an article describing the zip-file structure that matches the current implementation? I've found a few different descriptions; maybe that's related to zip-format versioning. Is this one accurate?


If I understand correctly, the cut after X bytes can fall on any byte, including the middle of a local or central header or a data descriptor. Calculating the correct offsets shouldn't be too hard, I guess, but it's not clear to me how to start a new stream from the middle of, say, a local file header.

I'd be very thankful if you could give me some directions on what to start looking at or just share your thoughts on possible implementation.
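(For what it's worth, the offset arithmetic might look roughly like the sketch below. It assumes each entry is written as a 30-byte local file header plus the file name with no extra field, the stored data, and a 16-byte data descriptor with the optional signature; these constants must be verified against what the library actually emits, and zip64 archives would differ. All names are illustrative.)

```python
# Assumed per-entry layout (verify against the library's real output):
#   local file header: 30 fixed bytes + file-name length, no extra field
#   file data: stored as-is (no compression)
#   data descriptor: 16 bytes (PK\x07\x08 signature + CRC + sizes), non-zip64
LOCAL_HEADER_FIXED = 30
DATA_DESCRIPTOR = 16

def entry_offsets(entries):
    """entries: list of (name, size). Returns (start, data_start, end)
    byte offsets for each entry in stream order."""
    offsets, pos = [], 0
    for name, size in entries:
        data_start = pos + LOCAL_HEADER_FIXED + len(name.encode("utf-8"))
        end = data_start + size + DATA_DESCRIPTOR
        offsets.append((pos, data_start, end))
        pos = end
    return offsets

def locate(offsets, resume_at):
    """Map a resume offset to (entry_index, bytes_into_entry_region),
    or None if it falls past the last entry (central directory area)."""
    for i, (start, _, end) in enumerate(offsets):
        if resume_at < end:
            return i, resume_at - start
    return None

offs = entry_offsets([("a.jpg", 1000), ("b.jpg", 2000)])
```

The central directory and end-of-central-directory records at the tail would need the same treatment, since a cut can land there too.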

ananthakumaran commented

You can find the full spec here. The cut could happen anywhere. You likely need to decide on the granularity of the skip, because it might not be possible to skip to the exact byte; you might have to generate some blocks and skip the bytes at another layer. As for advice, I have none as of now; I need to spend some time thinking it through to figure out an approach.
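One way to picture the "skip the bytes at another layer" idea: restart generation at the nearest clean boundary (e.g. an entry start) at or before the requested offset, then drop only the residual bytes. A toy Python sketch, with all names hypothetical:

```python
def resume_from_block(block_starts, stream_from_block, resume_at):
    """block_starts: sorted byte offsets at which generation can be
    restarted cleanly (e.g. entry boundaries). stream_from_block(i)
    must yield chunks exactly as originally written, starting at
    block_starts[i]. Residual bytes before `resume_at` are dropped
    here, at the byte layer, instead of regenerating the full stream."""
    i = max(j for j, s in enumerate(block_starts) if s <= resume_at)
    residual = resume_at - block_starts[i]
    for chunk in stream_from_block(i):
        if residual >= len(chunk):
            residual -= len(chunk)  # still inside the skipped prefix
            continue
        yield chunk[residual:]      # partial chunk at the cut point
        residual = 0

# Toy demo: three 10-byte "entries", resume from byte 13.
blocks = [bytes([65 + i]) * 10 for i in range(3)]  # A*10, B*10, C*10
out = b"".join(resume_from_block([0, 10, 20], lambda i: iter(blocks[i:]), 13))
```

Only the one entry containing the cut is re-read; everything before its boundary is never generated at all, which is the saving the coarse granularity buys.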
