Stream-based zip encoder and decoder (#880)
Implements a stream-based ZIP encoder and decoder using the
[CompressionStream](https://developer.mozilla.org/en-US/docs/Web/API/CompressionStream)
and
[DecompressionStream](https://developer.mozilla.org/en-US/docs/Web/API/DecompressionStream)
classes.

Here's what we get:

* Native ZIP support without having to rely on PHP's `ZipArchive`
* Download and unzip WordPress plugins at the same time. Before this PR,
we had to download the entire bundle, pass it to PHP, run PHP code, and
only then would the file be unzipped.
* Partial download of large zip files.

To that last point:

### ZIP as a remote, virtual filesystem

This change enables fast previewing of zipped exports as large as 10 GB
via partial downloads.

Imagine previewing a large site export with many photos and videos. The
`decodeRemoteZip` function knows how to request just the list of files
first, filter out the large ones, and then issue multiple fetch()
requests to download the rest.

Effectively, we would download only ~5–10 MB of data for the initial
preview, and then fetch the larger assets only once they're needed.
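
To make that concrete, here's a rough sketch of such a partial preview. The media-extension filter and the `exportZipUrl` variable are placeholders; `decodeRemoteZip()` and `streamWriteToPhp()` are used as in the examples further down:

```ts
// Rough sketch: preview a large export without downloading its media files.
// The extension list and variable names are illustrative only.
await decodeRemoteZip(
	exportZipUrl,
	({ path }) => !/\.(jpe?g|png|gif|mp4|mov)$/i.test(path)
)
	.pipeTo(streamWriteToPhp(playground, '/wordpress'));
```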

## Technical details

Here are a few interesting functions shipped by this PR; a short usage
sketch follows the list. _Note: the links point to a specific commit and
may become outdated_:

* [nextZipEntry()](https://github.com/WordPress/wordpress-playground/blob/726fc68309b0dcffc40402d7e0e7e68ed5fee01a/packages/playground/stream-compression/src/zip/parse-stream.ts#L88-L94) decodes the next entry from a ZIP byte stream
* [decodeRemoteZip()](https://github.com/WordPress/wordpress-playground/blob/726fc68309b0dcffc40402d7e0e7e68ed5fee01a/packages/playground/stream-compression/src/zip/parse-remote.ts#L86-L92) lists the files in a remote ZIP archive, filters them, and then downloads just the subset of bytes needed to produce those files
* [encodeZip()](https://github.com/WordPress/wordpress-playground/blob/726fc68309b0dcffc40402d7e0e7e68ed5fee01a/packages/playground/stream-compression/src/zip/compress.ts#L34) turns a stream of File objects into a ZIP archive (as a stream of bytes)
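
For a quick feel of the decoding API, here's a minimal sketch based on the unit tests in this PR: the stream returned by `decodeZip()` can also be consumed with `for await`, yielding one `File` object per ZIP entry. The `pluginZipUrl` variable is just a placeholder:

```ts
// Minimal sketch (based on the decodeZip unit tests):
// iterate the decoded entries as regular File objects.
const response = await fetch(pluginZipUrl);
for await (const file of decodeZip(response.body)) {
	console.log(file.name, file.size);
}
```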

## Remaining work

There are a few more things to do here, but I wanted to get some
reviews in before spending time on them, just in case the API changes
substantially:

- [x] Add unit tests.
- [x] Solve conflicts
- [x] Get the CI checks to pass.

## API changes

### Breaking changes

This PR isn't a breaking change **yet**. One of the follow-up PRs will
very likely propose some breaking changes, but this one only extends the
available API.

### Without this PR

Without this PR, unzipping a file requires writing it into Playground,
extracting it with PHP via the `unzip()` helper, and removing the
temporary ZIP file:

```ts
const response = await fetch(remoteUrl);
// Download the entire byte array first
const bytes = new Uint8Array(await response.arrayBuffer());
// Copy those bytes into Playground memory
await writeFile(playground, {
	path: tmpZipPath,
	data: bytes,
});
// Run PHP code that uses `ZipArchive` via unzip()
await unzip(playground, {
	zipPath: tmpZipPath,
	extractToPath: targetPath,
});
// Only now is the ZIP file extracted.
// We still need to clean up the temporary file:
await playground.unlink(tmpZipPath);
```

### With this PR

With this PR, unzipping can be done as follows:

```ts
const response = await fetch(remoteUrl);
// We can now unzip as we stream the response bytes
await decodeZip( response.body )
	// ...and write the stream of unzipped files to PHP as new entries become available
	.pipeTo( streamWriteToPhp( playground, targetPath ) );
```

### More examples

Here's what else the streaming API unlocks. Not all of these functions
are shipped here, but they are quite easy to implement:

```ts
// In the browser, fetch a zip file:
(await fetch(url))
	.body
	.pipeThrough(decodeZip())
	.pipeTo(streamWriteToPhp(php, pluginsDirectory))

// In the browser, install from a VFS directory:
iteratorToStream(iteratePhpFiles(path))
	.pipeTo(streamWriteToPhp(php, pluginsDirectory))

// In the browser, install from a .zip inside VFS:
streamReadPhpFile(php, path)
	.pipeThrough(decodeZip())
	.pipeTo(streamWriteToPhp(php, pluginsDirectory))

// Funny way to do a recursive copy
iteratorToStream(iteratePhpFiles(php, fromPath))
	.pipeTo(streamWriteToPhp(php, toPath))

// Process a doubly zipped artifact from GitHub CI
(await fetch(artifactUrl))
	.body
	.pipeThrough(decodeZip())
	.pipeThrough(readBody())
	.pipeThrough(decodeZip())
	.pipeTo(streamWriteToPhp(php, pluginsDirectory))

// Export Playground files as zip
iteratorToStream(iteratePhpFiles(php, fromPath))
	.pipeThrough(encodeZip())
	.pipeThrough(concatBytes())
	.pipeTo(downloadFile('playground.zip'))

// Export Playground files to OPFS
iteratorToStream(iteratePhpFiles(php, fromPath))
	.pipeTo(streamWriteToOpfs('/playground'))

// Compute changeset to export to GitHub
changeset(
	iterateGithubFiles(org, repo, branch, path),
	iteratePhpFiles(php, fromPath)
);

// Read a subdirectory from a GitHub repo
decodeRemoteZip(
	zipballUrl,
	({ path }) => path.startsWith("themes/adventurer")
)
	.pipeThrough(enterDirectory('themes/adventurer'))
	.pipeTo(streamWriteToPhp(php, joinPath(themesPath, 'adventurer')))

// Write a single file from the zip into a path in PHP
decodeRemoteZip(
	artifactUrl,
	({ path }) => path.startsWith("path/to/README.md")
)
	.pipeTo(streamWriteToPhp(php, '/wordpress'))

// In node.js, install a plugin from a disk directory
iteratorToStream(iteratePhpFiles(php, path))
	.pipeTo(streamWriteToPhp(php, pluginsDir));
```
adamziel committed Dec 19, 2023
1 parent db306f7 commit c363cfb
Showing 38 changed files with 1,894 additions and 0 deletions.
18 changes: 18 additions & 0 deletions packages/playground/stream-compression/.eslintrc.json
@@ -0,0 +1,18 @@
{
"extends": ["../../../.eslintrc.json"],
"ignorePatterns": ["!**/*"],
"overrides": [
{
"files": ["*.ts", "*.tsx", "*.js", "*.jsx"],
"rules": {}
},
{
"files": ["*.ts", "*.tsx"],
"rules": {}
},
{
"files": ["*.js", "*.jsx"],
"rules": {}
}
]
}
11 changes: 11 additions & 0 deletions packages/playground/stream-compression/README.md
@@ -0,0 +1,11 @@
# playground-stream-compression

This library was generated with [Nx](https://nx.dev).

## Building

Run `nx build playground-stream-compression` to build the library.

## Running unit tests

Run `nx test playground-stream-compression` to execute the unit tests via [Jest](https://jestjs.io).
32 changes: 32 additions & 0 deletions packages/playground/stream-compression/package.json
@@ -0,0 +1,32 @@
{
"name": "@wp-playground/stream-compression",
"version": "0.0.1",
"description": "Stream-based compression bindings.",
"repository": {
"type": "git",
"url": "https://github.com/WordPress/wordpress-playground"
},
"homepage": "https://developer.wordpress.org/playground",
"author": "The WordPress contributors",
"contributors": [
{
"name": "Adam Zielinski",
"email": "adam@adamziel.com",
"url": "https://github.com/adamziel"
}
],
"exports": {
".": {
"import": "./index.js",
"require": "./index.cjs"
},
"./package.json": "./package.json"
},
"publishConfig": {
"access": "public",
"directory": "../../../dist/packages/playground/stream-compression"
},
"license": "GPL-2.0-or-later",
"type": "module",
"private": true
}
34 changes: 34 additions & 0 deletions packages/playground/stream-compression/project.json
@@ -0,0 +1,34 @@
{
"name": "playground-stream-compression",
"$schema": "../../../node_modules/nx/schemas/project-schema.json",
"sourceRoot": "packages/playground/stream-compression/src",
"projectType": "library",
"targets": {
"build": {
"executor": "@nx/vite:build",
"outputs": ["{options.outputPath}"],
"options": {
"outputPath": "dist/packages/playground/stream-compression"
}
},
"test": {
"executor": "@nx/vite:test",
"outputs": ["{options.reportsDirectory}"],
"options": {
"passWithNoTests": true,
"reportsDirectory": "../../../coverage/packages/playground/stream-compression"
}
},
"lint": {
"executor": "@nx/linter:eslint",
"outputs": ["{options.outputFile}"],
"options": {
"lintFilePatterns": [
"packages/playground/stream-compression/**/*.ts",
"packages/playground/stream-compression/package.json"
]
}
}
},
"tags": []
}
7 changes: 7 additions & 0 deletions packages/playground/stream-compression/src/index.ts
@@ -0,0 +1,7 @@
import '@php-wasm/node-polyfills';

export { collectBytes } from './utils/collect-bytes';
export { collectFile } from './utils/collect-file';
export { iteratorToStream } from './utils/iterator-to-stream';
export { streamWriteToPhp } from './utils/stream-write-to-php';
export { encodeZip, decodeZip, decodeRemoteZip } from './zip';
@@ -0,0 +1,25 @@
import { appendBytes } from '../utils/append-bytes';

describe('appendBytes', () => {
it('Should append the specified number of bytes', async () => {
const stream = new ReadableStream<Uint8Array>({
type: 'bytes',
start(controller) {
controller.enqueue(new Uint8Array([1, 2, 3]));
controller.close();
},
}).pipeThrough(appendBytes(new Uint8Array([4, 5])));

const reader = stream.getReader();
const result1 = await reader.read();
expect(result1.value).toEqual(new Uint8Array([1, 2, 3]));
expect(result1.done).toBe(false);

const result2 = await reader.read();
expect(result2.value).toEqual(new Uint8Array([4, 5]));
expect(result2.done).toBe(false);

const result3 = await reader.read();
expect(result3.done).toBe(true);
});
});
22 changes: 22 additions & 0 deletions packages/playground/stream-compression/src/test/decode-zip.spec.ts
@@ -0,0 +1,22 @@
import { decodeZip } from '../zip/decode-zip';
import { readFile } from 'fs/promises';

describe('decodeZip', () => {
it('Should uncompress compressed files', async () => {
const zipBytes = await readFile(
__dirname + '/fixtures/hello-dolly.zip'
);
const zipStream = decodeZip(new Blob([zipBytes]).stream());

const files = [];
for await (const file of zipStream) {
files.push(file);
}
expect(files.length).toBe(3);
expect(files[0].name).toBe('hello-dolly/');
expect(files[1].name).toBe('hello-dolly/hello.php');
expect(files[1].size).toBe(2593);
expect(files[2].name).toBe('hello-dolly/readme.txt');
expect(files[2].size).toBe(624);
});
});
47 changes: 47 additions & 0 deletions packages/playground/stream-compression/src/test/encode-zip.spec.ts
@@ -0,0 +1,47 @@
import { collectBytes } from '../utils/collect-bytes';
import { encodeZip } from '../zip/encode-zip';
import { decodeZip } from '../zip/decode-zip';

describe('compressFiles', () => {
it('Should compress files into a zip archive', async () => {
const files: File[] = [
new File(
[new Uint8Array([1, 2, 3, 4, 5])],
'wp-content/plugins/hello.php'
),
new File(
[new Uint8Array([1, 2, 3, 4, 5])],
'wp-content/plugins/hello/hello.php'
),
new File(
[new Uint8Array([1, 2, 3, 4, 5])],
'wp-content/plugins/hello/hello2.php'
),
new File(
[new Uint8Array([1, 2, 3, 4, 5])],
'wp-content/plugins/hello/hello3.php'
),
];

const zipBytes = await collectBytes(
encodeZip(files[Symbol.iterator]())
);
const zipStream = decodeZip(new Blob([zipBytes!]).stream());

const reader = zipStream.getReader();
let i = 0;
for (i = 0; i < files.length; i++) {
const { value: receivedFile, done } = await reader.read();
const receivedBytes = new Uint8Array(
await receivedFile!.arrayBuffer()
);
const expectedBytes = new Uint8Array(await files[i].arrayBuffer());
expect(receivedBytes).toEqual(expectedBytes);
expect(done).toBe(false);
}
expect(i).toBe(files.length);

const { done } = await reader.read();
expect(done).toBe(true);
});
});
Binary file not shown.
@@ -0,0 +1,25 @@
import { prependBytes } from '../utils/prepend-bytes';

describe('prependBytes', () => {
it('Should prepend the specified number of bytes', async () => {
const stream = new ReadableStream<Uint8Array>({
type: 'bytes',
start(controller) {
controller.enqueue(new Uint8Array([4, 5]));
controller.close();
},
}).pipeThrough(prependBytes(new Uint8Array([1, 2, 3])));

const reader = stream.getReader();
const result1 = await reader.read();
expect(result1.value).toEqual(new Uint8Array([1, 2, 3]));
expect(result1.done).toBe(false);

const result2 = await reader.read();
expect(result2.value).toEqual(new Uint8Array([4, 5]));
expect(result2.done).toBe(false);

const result3 = await reader.read();
expect(result3.done).toBe(true);
});
});
@@ -0,0 +1,41 @@
import { skipFirstBytes } from '../utils/skip-first-bytes';

describe('skipFirstBytes', () => {
it('Should skip the specified number of bytes', async () => {
const stream = new ReadableStream<Uint8Array>({
type: 'bytes',
start(controller) {
controller.enqueue(new Uint8Array([1, 2, 3, 4, 5]));
controller.close();
},
}).pipeThrough(skipFirstBytes(3));

const reader = stream.getReader();
const result1 = await reader.read();
expect(result1.value).toEqual(new Uint8Array([4, 5]));
expect(result1.done).toBe(false);

const result2 = await reader.read();
expect(result2.done).toBe(true);
});

it('Should skip the specified number of bytes across multiple pulls', async () => {
const stream = new ReadableStream<Uint8Array>({
type: 'bytes',
start(controller) {
controller.enqueue(new Uint8Array([1]));
controller.enqueue(new Uint8Array([2, 3]));
controller.enqueue(new Uint8Array([4, 5, 6]));
controller.close();
},
}).pipeThrough(skipFirstBytes(4));

const reader = stream.getReader();
const result1 = await reader.read();
expect(result1.value).toEqual(new Uint8Array([5, 6]));
expect(result1.done).toBe(false);

const result2 = await reader.read();
expect(result2.done).toBe(true);
});
});
@@ -0,0 +1,27 @@
import { skipLastBytes } from '../utils/skip-last-bytes';

describe('skipLastBytes', () => {
it('Should skip the specified number of bytes', async () => {
const stream = new ReadableStream<Uint8Array>({
type: 'bytes',
start(controller) {
controller.enqueue(new Uint8Array([1, 2, 3, 4, 5]));
controller.enqueue(new Uint8Array([6, 7]));
controller.enqueue(new Uint8Array([8, 9]));
controller.close();
},
}).pipeThrough(skipLastBytes(3, 9));

const reader = stream.getReader();
const result1 = await reader.read();
expect(result1.value).toEqual(new Uint8Array([1, 2, 3, 4, 5]));
expect(result1.done).toBe(false);

const result2 = await reader.read();
expect(result2.value).toEqual(new Uint8Array([6]));
expect(result2.done).toBe(false);

const result3 = await reader.read();
expect(result3.done).toBe(true);
});
});
@@ -0,0 +1,7 @@
/**
* This file is executed before vitests are run, and ensures
* the required polyfills are loaded.
*
* @see tests.setupFiles in vite.config.ts
*/
import '@php-wasm/node-polyfills';
16 changes: 16 additions & 0 deletions packages/playground/stream-compression/src/utils/append-bytes.ts
@@ -0,0 +1,16 @@
/**
* Appends bytes to a stream.
*
* @param bytes The bytes to append.
* @returns A transform stream that will append the specified bytes.
*/
export function appendBytes(bytes: Uint8Array) {
return new TransformStream<Uint8Array, Uint8Array>({
async transform(chunk, controller) {
controller.enqueue(chunk);
},
async flush(controller) {
controller.enqueue(bytes);
},
});
}
24 changes: 24 additions & 0 deletions packages/playground/stream-compression/src/utils/collect-bytes.ts
@@ -0,0 +1,24 @@
import { concatBytes } from './concat-bytes';
import { limitBytes } from './limit-bytes';

/**
* Collects the contents of the entire stream into a single Uint8Array.
*
* @param stream The stream to collect.
* @param bytes Optional. The number of bytes to read from the stream.
* @returns The collected bytes as a single Uint8Array.
*/
export async function collectBytes(
stream: ReadableStream<Uint8Array>,
bytes?: number
) {
if (bytes !== undefined) {
stream = limitBytes(stream, bytes);
}

return await stream
.pipeThrough(concatBytes(bytes))
.getReader()
.read()
.then(({ value }) => value!);
}
16 changes: 16 additions & 0 deletions packages/playground/stream-compression/src/utils/collect-file.ts
@@ -0,0 +1,16 @@
import { collectBytes } from './collect-bytes';

/**
* Collects the contents of the entire stream into a single File object.
*
* @param stream The stream to collect.
* @param fileName The name of the file
* @returns A File object with the contents of the stream.
*/
export async function collectFile(
fileName: string,
stream: ReadableStream<Uint8Array>
) {
// @TODO: use StreamingFile
return new File([await collectBytes(stream)], fileName);
}
25 changes: 25 additions & 0 deletions packages/playground/stream-compression/src/utils/collect-string.ts
@@ -0,0 +1,25 @@
import { concatString } from './concat-string';
import { limitBytes } from './limit-bytes';

/**
* Collects the contents of the entire stream into a single string.
*
* @param stream The stream to collect.
* @param bytes Optional. The number of bytes to read from the stream.
* @returns The string contents of the stream.
*/
export async function collectString(
stream: ReadableStream<Uint8Array>,
bytes?: number
) {
if (bytes !== undefined) {
stream = limitBytes(stream, bytes);
}

return await stream
.pipeThrough(new TextDecoderStream())
.pipeThrough(concatString())
.getReader()
.read()
.then(({ value }) => value);
}
