Stream-based zip encoder and decoder #880

Merged 3 commits on Dec 19, 2023


@adamziel (Collaborator) commented Dec 18, 2023

Description

Part of #851.
Depends on #875.

Implements a stream-based ZIP encoder and decoder using the CompressionStream and DecompressionStream classes.
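Here is a minimal sketch of the primitive this relies on. It is not the PR's code. ZIP entries stored with compression method 8 contain raw DEFLATE data with no zlib or gzip wrapper, and that is exactly what the `"deflate-raw"` format of CompressionStream/DecompressionStream reads and writes. The helper names `deflateRaw`, `inflateRaw`, `toStream`, and `collect` are hypothetical. For brevity, the sketch buffers its output instead of streaming it onward. It also assumes a runtime that supports `"deflate-raw"`, such as current browsers or recent Node.js.

```typescript
// Wrap a byte array in a one-chunk ReadableStream.
function toStream(bytes: Uint8Array): ReadableStream<Uint8Array> {
  return new ReadableStream<Uint8Array>({
    start(controller) {
      controller.enqueue(bytes);
      controller.close();
    },
  });
}

// Read every chunk from a byte stream and concatenate them.
async function collect(stream: ReadableStream<Uint8Array>): Promise<Uint8Array> {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  for (;;) {
    const result = await reader.read();
    if (result.done) break;
    chunks.push(result.value);
    total += result.value.length;
  }
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}

// ZIP compression method 8 stores raw DEFLATE (RFC 1951) without a zlib or
// gzip header, which matches the "deflate-raw" format.
export function deflateRaw(bytes: Uint8Array): Promise<Uint8Array> {
  return collect(toStream(bytes).pipeThrough(new CompressionStream("deflate-raw")));
}

export function inflateRaw(bytes: Uint8Array): Promise<Uint8Array> {
  return collect(toStream(bytes).pipeThrough(new DecompressionStream("deflate-raw")));
}
```

A real decoder would not collect the output like this. It would pipe the entry's bytes straight through `new DecompressionStream("deflate-raw")`, and that is what lets unzipping start before the download finishes.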

Here's what we get:

  • Native ZIP support without having to rely on PHP's ZipArchive
  • Download and unzip WordPress plugins at the same time. Before this PR, we had to download the entire bundle, pass it to PHP, and run PHP code; only then would the file be unzipped.
  • Partial download of large zip files.

To that last point:

ZIP as a remote, virtual filesystem

This change enables fast previewing of even 10 GB zipped exports via partial downloads.

Imagine previewing a large site export with many photos and videos. The decodeRemoteZip function can request just the list of files first (a ZIP archive keeps it in the central directory at the end of the file), filter out the large ones, and then issue multiple fetch() requests to download only the bytes of the remaining files.

Effectively, we would download only ~5-10 MB of data for the initial preview, and fetch the larger assets only once they're needed.
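The first step of such a partial download can be sketched as follows, assuming the server honors HTTP Range requests. This illustrates the ZIP format and is not the PR's decodeRemoteZip(). A ZIP archive ends with an End of Central Directory (EOCD) record, and its fields give the size and offset of the central directory, which is the file listing. `findCentralDirectory`, `centralDirectoryRange`, and `fetchCentralDirectory` are hypothetical names. The sketch does not handle ZIP64 archives or an archive comment that happens to contain the EOCD signature bytes.

```typescript
export interface CentralDirectoryLocation {
  entryCount: number; // total number of central directory entries
  size: number;       // size of the central directory in bytes
  offset: number;     // offset of the central directory from the archive start
}

const EOCD_SIGNATURE = 0x06054b50;
const EOCD_MIN_SIZE = 22;             // fixed part of the EOCD record
const EOCD_MAX_SIZE = 22 + 0xffff;    // plus a comment of up to 65535 bytes

// `tail` holds the last bytes of the archive. Scan backwards for the EOCD
// signature; all fields are little-endian.
export function findCentralDirectory(tail: Uint8Array): CentralDirectoryLocation {
  const view = new DataView(tail.buffer, tail.byteOffset, tail.byteLength);
  for (let i = tail.length - EOCD_MIN_SIZE; i >= 0; i--) {
    if (view.getUint32(i, true) === EOCD_SIGNATURE) {
      return {
        entryCount: view.getUint16(i + 10, true),
        size: view.getUint32(i + 12, true),
        offset: view.getUint32(i + 16, true),
      };
    }
  }
  throw new Error("End of Central Directory record not found");
}

// HTTP Range header value covering exactly the central directory.
export function centralDirectoryRange(loc: CentralDirectoryLocation): string {
  return `bytes=${loc.offset}-${loc.offset + loc.size - 1}`;
}

// Two small requests instead of downloading the whole archive.
export async function fetchCentralDirectory(url: string): Promise<Uint8Array> {
  // Suffix range: the last EOCD_MAX_SIZE bytes of the file.
  const tailResponse = await fetch(url, { headers: { Range: `bytes=-${EOCD_MAX_SIZE}` } });
  const tail = new Uint8Array(await tailResponse.arrayBuffer());
  const loc = findCentralDirectory(tail);
  const cdResponse = await fetch(url, { headers: { Range: centralDirectoryRange(loc) } });
  if (cdResponse.status !== 206) {
    throw new Error("Server did not answer with 206 Partial Content");
  }
  return new Uint8Array(await cdResponse.arrayBuffer());
}
```

Each central directory entry then records a file's compressed size and the offset of its local header. The files that pass the filter can therefore be fetched with one Range request each, and the large ones can be skipped.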

Technical details

Here are a few interesting functions shipped by this PR. Note that the links point to a specific commit and may become outdated:

  • nextZipEntry() decodes a single zipped file
  • decodeRemoteZip() lists the files in a remote ZIP archive, filters them, and then downloads just the subset of bytes needed to get those files
  • encodeZip() turns a stream of File objects into a zip archive (as a stream of bytes)
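To illustrate what decoding a single entry involves, here is a hypothetical sketch that parses a ZIP local file header. It is not the PR's nextZipEntry(). A local file header is the fixed 30-byte record, followed by the file name and extra field, that precedes each entry's data. The interface and function names are assumptions.

```typescript
export interface LocalFileHeader {
  flags: number;             // general purpose bit flags
  compressionMethod: number; // 0 = stored, 8 = deflate
  crc32: number;
  compressedSize: number;
  uncompressedSize: number;
  path: string;
  dataOffset: number;        // index where the entry's (compressed) bytes begin
}

const LOCAL_FILE_HEADER_SIGNATURE = 0x04034b50;

// Field offsets follow the ZIP specification; all values are little-endian.
export function parseLocalFileHeader(bytes: Uint8Array, start = 0): LocalFileHeader {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (view.getUint32(start, true) !== LOCAL_FILE_HEADER_SIGNATURE) {
    throw new Error("Not a local file header");
  }
  const nameLength = view.getUint16(start + 26, true);
  const extraLength = view.getUint16(start + 28, true);
  const nameStart = start + 30;
  return {
    flags: view.getUint16(start + 6, true),
    compressionMethod: view.getUint16(start + 8, true),
    crc32: view.getUint32(start + 14, true),
    compressedSize: view.getUint32(start + 18, true),
    uncompressedSize: view.getUint32(start + 22, true),
    // Simplification: always decode the name as UTF-8.
    path: new TextDecoder().decode(bytes.subarray(nameStart, nameStart + nameLength)),
    dataOffset: nameStart + nameLength + extraLength,
  };
}
```

A streaming decoder has to handle one caveat. When bit 3 of `flags` is set, the CRC-32 and sizes in this header are zero, and the real values follow the compressed data in a data descriptor. For deflated entries, the end of the data is then only known once the DEFLATE stream terminates.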

Remaining work

There are a few more things to do here, but I wanted to get some reviews in before spending time on them, in case the API changes substantially:

  • Add unit tests.
  • Resolve conflicts.
  • Get the CI checks to pass.

API changes

Breaking changes

This PR introduces no breaking changes; it only extends the available API. One of the follow-up PRs will very likely propose some breaking changes.

Without this PR

Without this PR, unzipping a file requires writing it to Playground's filesystem, calling the PHP-based unzip() helper, and removing the temporary zip file:

const response = await fetch(remoteUrl);
// Download the entire byte array first
const bytes = new Uint8Array(await response.arrayBuffer());
// Copy those bytes into Playground memory
await writeFile(playground, {
	path: tmpZipPath,
	data: bytes,
});
// Run PHP code and use `ZipArchive` via unzip()
await unzip(playground, {
	zipPath: tmpZipPath,
	extractToPath: targetPath,
});
// Only now is the ZIP file extracted.
// We still need to clean up the temporary file:
await playground.unlink(tmpZipPath);

With this PR

With this PR, unzipping can be done as follows:

const response = await fetch(remoteUrl);
// We can now unzip as we stream response bytes
await decodeZip( response.body )
	// We also write the stream of unzipped files to PHP as new entries become available
	.pipeTo( streamWriteToPhp( playground, targetPath ) );
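The pipeTo() call above only needs a WritableStream sink that accepts one file per write(). This sketch is a hypothetical stand-in for streamWriteToPhp() that writes each file into an in-memory Map instead of PHP's filesystem. For illustration, it also assumes that each file's `name` holds the entry's path relative to the archive root.

```typescript
// Minimal structural type for what the sink needs; a browser or Node.js File
// satisfies it.
export interface NamedFile {
  name: string;
  arrayBuffer(): Promise<ArrayBuffer>;
}

// Hypothetical stand-in for streamWriteToPhp(): stores each file's bytes in
// `files` under `${targetPath}/${file.name}`. A WritableStream calls write()
// for the next chunk only after the previous write() promise settles, so files
// are stored in order and a slow sink applies backpressure to the decoder.
export function streamWriteToMap(
  files: Map<string, Uint8Array>,
  targetPath: string
): WritableStream<NamedFile> {
  return new WritableStream<NamedFile>({
    async write(file) {
      const bytes = new Uint8Array(await file.arrayBuffer());
      files.set(`${targetPath}/${file.name}`, bytes);
    },
  });
}
```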

More examples

Here's what else the streaming API unlocks. Not all of these functions are shipped here, but they are quite easy to implement:

// In the browser, fetch a zip file:
(await fetch(url))
	.body
	.pipeThrough(decodeZip())
	.pipeTo(streamWriteToPhp(php, pluginsDirectory))

// In the browser, install from a VFS directory:
iteratorToStream(iteratePhpFiles(php, path))
	.pipeTo(streamWriteToPhp(php, pluginsDirectory))

// In the browser, install from a .zip inside VFS:
streamReadPhpFile(php, path)
	.pipeThrough(decodeZip())
	.pipeTo(streamWriteToPhp(php, pluginsDirectory))

// Funny way to do a recursive copy
iteratorToStream(iteratePhpFiles(php, fromPath))
	.pipeTo(streamWriteToPhp(php, toPath))

// Process a doubly zipped artifact from GitHub CI
(await fetch(artifactUrl))
	.body
	.pipeThrough(decodeZip())
	.pipeThrough(readBody())
	.pipeThrough(decodeZip())
	.pipeTo(streamWriteToPhp(php, pluginsDirectory))

// Export Playground files as zip
iteratorToStream(iteratePhpFiles(php, fromPath))
	.pipeThrough(encodeZip())
	.pipeThrough(concatBytes())
	.pipeTo(downloadFile('playground.zip'))

// Export Playground files to OPFS
iteratorToStream(iteratePhpFiles(php, fromPath))
	.pipeTo(streamWriteToOpfs('/playground'))

// Compute changeset to export to GitHub
changeset(
	iterateGithubFiles(org, repo, branch, path),
	iteratePhpFiles(php, fromPath)
);

// Read a subdirectory from a GitHub repo
decodeRemoteZip(
	zipballUrl,
	({ path }) => path.startsWith("themes/adventurer")
)
	.pipeThrough(enterDirectory('themes/adventurer'))
	.pipeTo(streamWriteToPhp(php, joinPath(themesPath, 'adventurer')))

// Write a single file from the zip into a path in PHP
decodeRemoteZip(
	artifactUrl,
	({ path }) => path.startsWith("path/to/README.md")
)
	.pipeTo(streamWriteToPhp(php, '/wordpress'))

// In Node.js, install a plugin from a disk directory
iteratorToStream(iteratePhpFiles(php, path))
	.pipeTo(streamWriteToPhp(php, pluginsDir));
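Two of the helpers used above are small enough to sketch here. These are hypothetical implementations, not the package's code. `iteratorToStream()` adapts a sync or async iterator into a pull-based ReadableStream, so the iterator only advances when the consumer asks for the next item. `concatBytes()` buffers every chunk and emits a single Uint8Array when the input ends, for example before handing a whole archive to a download.

```typescript
// Pull-based adapter: next() is called only when the stream's consumer
// requests more data, which gives natural backpressure.
export function iteratorToStream<T>(
  iterable: Iterable<T> | AsyncIterable<T>
): ReadableStream<T> {
  // Normalize sync and async iterables into one async iterator.
  const iterator: AsyncIterator<T> = (async function* () {
    yield* iterable;
  })();
  return new ReadableStream<T>({
    async pull(controller) {
      const result = await iterator.next();
      if (result.done) {
        controller.close();
      } else {
        controller.enqueue(result.value);
      }
    },
    async cancel() {
      // Let the underlying generator run its cleanup (finally blocks).
      await iterator.return?.();
    },
  });
}

// Collects all byte chunks and emits them as one Uint8Array on flush.
export function concatBytes(): TransformStream<Uint8Array, Uint8Array> {
  const chunks: Uint8Array[] = [];
  return new TransformStream<Uint8Array, Uint8Array>({
    transform(chunk) {
      chunks.push(chunk);
    },
    flush(controller) {
      const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
      const out = new Uint8Array(total);
      let offset = 0;
      for (const chunk of chunks) {
        out.set(chunk, offset);
        offset += chunk.length;
      }
      controller.enqueue(out);
    },
  });
}
```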

cc @dmsnell

@adamziel merged commit c363cfb into trunk on Dec 19, 2023 (5 checks passed)
@adamziel deleted the compression-streams branch on December 19, 2023 08:12
@eliot-akira (Collaborator) commented Dec 19, 2023

Impressive! I love that it's published as its own package @wp-playground/stream-compression. I imagine it can be useful by itself, a streaming zip de/compressor using browser-native API with no dependencies.
