HttpZipStream

A simple library to extract specific entries from a remote HTTP zip archive without downloading the entire file.

Understanding the magic

When you open a zip archive from a remote URL, a typical zip library needs to download the entire file before it can read the contents. So if you had a 90 MB zip file and wanted only a 100 KB file from within it, you would end up downloading all 90 MB anyway.
The zip format defines a directory that points to all of its inner entries and holds properties such as name, starting offset, and size. This directory is small compared to the archive and sits at the very end of the file. So if we could read just this directory, we would know where in the archive the file we want is stored.
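To make the directory idea concrete, here is a minimal sketch of how the tail of an archive could be inspected. It relies on the zip format's End of Central Directory record (signature 0x06054b50) and is not taken from this library's source; `ZipTail` and `FindCentralDirectory` are hypothetical names, and the sketch assumes a plain (non-Zip64) archive.

```csharp
using System;
using System.Buffers.Binary;

public static class ZipTail
{
    // Signature of the End of Central Directory (EOCD) record, stored little-endian.
    private const uint EocdSignature = 0x06054b50;
    // Size of the EOCD record when the archive has no trailing comment.
    private const int EocdMinSize = 22;

    // Scans the last bytes of an archive backwards for the EOCD record and returns
    // where the central directory starts, how long it is, and how many entries it lists.
    // Assumption: a plain (non-Zip64) archive.
    public static (uint Offset, uint Size, ushort EntryCount) FindCentralDirectory(ReadOnlySpan<byte> tail)
    {
        for (int i = tail.Length - EocdMinSize; i >= 0; i--)
        {
            if (BinaryPrimitives.ReadUInt32LittleEndian(tail.Slice(i)) != EocdSignature)
                continue;

            ushort entryCount = BinaryPrimitives.ReadUInt16LittleEndian(tail.Slice(i + 10));
            uint size = BinaryPrimitives.ReadUInt32LittleEndian(tail.Slice(i + 12));
            uint offset = BinaryPrimitives.ReadUInt32LittleEndian(tail.Slice(i + 16));
            return (offset, size, entryCount);
        }
        throw new InvalidOperationException("End of central directory record not found.");
    }
}
```

The returned offset and size say exactly which byte range of the archive holds the directory, so only that range has to be fetched to list the entries.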
And if we could request just that part of the content from the remote URL, the download would be much smaller, containing only what we need.
It turns out that the HTTP protocol supports a technique called byte serving: a request can carry a Range header specifying which byte ranges of the resource it wants.
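As an illustration of byte serving, the sketch below builds a ranged request with the standard `HttpClient` types. `RangeRequests` and `ForRange` are hypothetical names chosen for this example; they are not part of the library's API.

```csharp
using System.Net.Http;
using System.Net.Http.Headers;

public static class RangeRequests
{
    // Builds a GET request for the inclusive byte range [from, to] of a remote file.
    public static HttpRequestMessage ForRange(string url, long from, long to)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        // Serialized on the wire as "Range: bytes=from-to".
        request.Headers.Range = new RangeHeaderValue(from, to);
        return request;
    }
}
```

A server that supports ranges answers with 206 Partial Content and only the requested bytes. A server that ignores the header answers with 200 and the full body, so it is worth checking the status code before trusting the payload.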
With that in mind, what we do is pretty simple. We make a first HTTP request asking just for the headers (not the content), and from that we learn the content size. Then we make small range requests at the end of the file to extract all the directory info. Then, for the entries we want, we make requests for just those ranges. Decompress them with the deflate algorithm and it's done.
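The last step above can be sketched with the framework's own `DeflateStream`, which reads the raw deflate data that zip entries store. This is an illustration under that assumption, not the library's implementation; `EntryInflater` and `Inflate` are hypothetical names.

```csharp
using System.IO;
using System.IO.Compression;

public static class EntryInflater
{
    // Decompresses the raw deflate bytes of a single zip entry
    // (the bytes that follow the entry's local file header).
    public static byte[] Inflate(byte[] compressed)
    {
        using var input = new MemoryStream(compressed);
        using var deflate = new DeflateStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        deflate.CopyTo(output);
        return output.ToArray();
    }
}
```

Entries saved with the "stored" method are not compressed at all, so their bytes can be used as they are without this step.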
With this approach we end up making more HTTP requests, so it is only worth using when the desired content is a small part of the entire zip archive.
More on this can be found in my Medium article.

Install instructions

You can add the library to your project using the NuGet package:

dotnet add package HttpZipStream

Sample of how to use the library

Extracting just the first entry from a remote zip archive:

   using System.Linq; // needed for FirstOrDefault

   var httpUrl = "http://MyRemoteFile.zip";
   using (var zipStream = new System.IO.Compression.HttpZipStream(httpUrl))
   {
      // Reads only the zip directory, not the whole archive
      var entryList = await zipStream.GetEntriesAsync();
      var entry = entryList.FirstOrDefault();
      // Downloads and decompresses just this entry
      byte[] entryContent = await zipStream.ExtractAsync(entry);
      /* do what you want with the entry content */
   }

Build using

Changelog

v0.1.*

Minor documentation adjustments.
Proper naming convention for async methods.
Preparing projects to be built, packed, and deployed by the server.

v0.2.*

Implementing an ExtractAsync overload that returns just the entry content as a byte array.
BUG #13: Some entries are not deflated correctly.

v0.3.*

Upgrading dotnet version to 3.1

Authors

Leverson Carlos

License

MIT License - see the LICENSE file for details
