
Resumable download #21

Open
dmkl opened this issue Jan 29, 2024 · 5 comments

Comments


dmkl commented Jan 29, 2024

Hi Anantha, I have an interesting question. Let's imagine the following case:

We have hundreds of files that we stream into a zip archive from cloud locations. We know each file's size and CRC upfront, and the file order within the stream is always the same. The connection gets interrupted, and the browser wants to resume the download by sending us a Range header. So we know that the browser has received the first X bytes of the stream. (Very theoretically) we could calculate all the file/header/descriptor offsets and continue sending bytes starting from a particular byte of a particular file.

Do you think it's doable? What would it take to implement it?
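(For context, the resume offset would come from the request's Range header. A minimal sketch of extracting it, in Python for illustration since the thread doesn't pin down the server stack; the function name is made up, and only the simple open-ended `bytes=X-` form is handled.)

```python
import re

def parse_resume_offset(range_header):
    """Extract the start offset from an open-ended Range header
    like 'bytes=12345-'. Returns None for a missing or unsupported
    header; a real server should then fall back to a full 200
    response. 'bytes=a-b' and multi-range requests are not handled."""
    if not range_header:
        return None
    m = re.fullmatch(r"bytes=(\d+)-", range_header.strip())
    return int(m.group(1)) if m else None
```

Since all sizes and names are known upfront, the total archive length is computable too, which is what would let the server answer with a proper 206 Partial Content and a Content-Range header.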


ananthakumaran commented Jan 29, 2024

This is an interesting problem to solve, though it might not be a common use case. There is an open PR (#19) that allows the user to bring their own CRC and size, but that's the extent of the support today.

It's definitely possible, but I'm not sure about the amount of work involved. It's been more than a couple of years since my last commit to master, so the code structure is not very fresh in my mind.

A very hacky (and bad) way to do this would be to write a wrapper stream that skips the first X bytes. That would be very easy to implement. The downside is that you end up regenerating all the data anyway, and there might be a long initial delay before the client receives the first byte.
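That hacky wrapper could look roughly like this (an illustrative Python sketch, not code from this library; the helper name is hypothetical):

```python
def skip_bytes(chunks, offset):
    """Yield byte chunks from `chunks`, dropping the first `offset`
    bytes. The whole archive is still generated upstream; we merely
    discard the prefix before it reaches the client."""
    remaining = offset
    for chunk in chunks:
        if remaining >= len(chunk):
            remaining -= len(chunk)  # drop this chunk entirely
        elif remaining:
            yield chunk[remaining:]  # partial chunk at the cut point
            remaining = 0
        else:
            yield chunk

# Resuming from byte 5 of a 9-byte stream split into 3-byte chunks:
resumed = b"".join(skip_bytes([b"abc", b"def", b"ghi"], 5))
```

This makes the "regenerate everything" cost obvious: every chunk before the cut point is still produced and then thrown away.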

Adding the capability to Zstream itself would make it possible to avoid a lot of that waste (skipping all the unnecessary file reads), but it might also require a lot of work. I don't know whether the current code structure would accommodate it; it might need a rearchitecture/refactor first. Hard to say without looking at the code and spending more time on it.

Also, features like compression and encryption won't work with this approach, because they make the output size unpredictable.


dmkl commented Jan 29, 2024

Yes, it's not for compression/encryption but for plain "stored" entries only. Our customers download big archives (a few GBs sometimes) with lots of photos in them. Sometimes they have a slow connection and the download stops. Starting from scratch is very painful for them. We could improve their experience a lot if we supported resumable downloads.

I'm ready to contribute to the gem, but I'd need some guidance from you. I have no idea how much refactoring the current code structure might need. Would you have any time for the initial investigation and then for guiding me through the calculations?

ananthakumaran commented

> Would you have any time for the initial investigation and then for guiding me through the calculations?

I will try, maybe on weekends (can't guarantee a timeline). I'd suggest you start independently without waiting for me; I'll be able to review once you have something.


dmkl commented Feb 2, 2024

Can you please point me to an article describing the zip-file structure that matches the current implementation? I've found a few different descriptions; maybe that's related to zip-format versioning. Is this one accurate?


If I understand correctly, the cut after X bytes can fall on any byte, including the middle of a local or central header or a data descriptor. Calculating the correct offsets shouldn't be too hard, I guess, but it's not clear to me how to start a new stream from the middle of, say, a local file header.

I'd be very thankful if you could give me some directions on what to start looking at or just share your thoughts on possible implementation.
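(For what it's worth, the offset arithmetic might look roughly like the sketch below. It assumes each entry is written as a 30-byte local file header plus the file name with no extra field, the stored data, and a 16-byte data descriptor with the optional signature; these constants must be verified against what the library actually emits, and zip64 archives would differ. All names are illustrative.)

```python
# Assumed per-entry layout (verify against the library's real output):
#   local file header: 30 fixed bytes + file-name length, no extra field
#   file data: stored as-is (no compression)
#   data descriptor: 16 bytes (PK\x07\x08 signature + CRC + sizes), non-zip64
LOCAL_HEADER_FIXED = 30
DATA_DESCRIPTOR = 16

def entry_offsets(entries):
    """entries: list of (name, size). Returns (start, data_start, end)
    byte offsets for each entry in stream order."""
    offsets, pos = [], 0
    for name, size in entries:
        data_start = pos + LOCAL_HEADER_FIXED + len(name.encode("utf-8"))
        end = data_start + size + DATA_DESCRIPTOR
        offsets.append((pos, data_start, end))
        pos = end
    return offsets

def locate(offsets, resume_at):
    """Map a resume offset to (entry_index, bytes_into_entry_region),
    or None if it falls past the last entry (central directory area)."""
    for i, (start, _, end) in enumerate(offsets):
        if resume_at < end:
            return i, resume_at - start
    return None

offs = entry_offsets([("a.jpg", 1000), ("b.jpg", 2000)])
```

The central directory and end-of-central-directory records at the tail would need the same treatment, since a cut can land there too.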

ananthakumaran commented

You can find the full spec here. The cut could happen anywhere. You likely need to decide on the granularity of the skip, because it might not be possible to skip to the exact byte; you might have to generate some blocks and skip the bytes at another layer. As for advice, I have none as of now; I need to spend some time thinking it through to figure out an approach.
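One way to picture the "skip the bytes at another layer" idea: restart generation at the nearest clean boundary (e.g. an entry start) at or before the requested offset, then drop only the residual bytes. A toy Python sketch, with all names hypothetical:

```python
def resume_from_block(block_starts, stream_from_block, resume_at):
    """block_starts: sorted byte offsets at which generation can be
    restarted cleanly (e.g. entry boundaries). stream_from_block(i)
    must yield chunks exactly as originally written, starting at
    block_starts[i]. Residual bytes before `resume_at` are dropped
    here, at the byte layer, instead of regenerating the full stream."""
    i = max(j for j, s in enumerate(block_starts) if s <= resume_at)
    residual = resume_at - block_starts[i]
    for chunk in stream_from_block(i):
        if residual >= len(chunk):
            residual -= len(chunk)  # still inside the skipped prefix
            continue
        yield chunk[residual:]      # partial chunk at the cut point
        residual = 0

# Toy demo: three 10-byte "entries", resume from byte 13.
blocks = [bytes([65 + i]) * 10 for i in range(3)]  # A*10, B*10, C*10
out = b"".join(resume_from_block([0, 10, 20], lambda i: iter(blocks[i:]), 13))
```

Only the one entry containing the cut is re-read; everything before its boundary is never generated at all, which is the saving the coarse granularity buys.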
