
Decompressing gzip stream from node-fetch response leads to different output in Deno vs Node #20456

Closed
dsherret opened this issue Sep 11, 2023 · 7 comments · Fixed by #21297
Labels: bug (Something isn't working correctly), node compat

dsherret commented Sep 11, 2023

package.json

{
  "dependencies": {
    "node-fetch": "^3.3.2"
  }
}

main.mjs

import crypto from "node:crypto";
import fs from "node:fs";
import zlib from "node:zlib";
import { pipeline } from "node:stream";
import { promisify } from "node:util";
import fetch from "node-fetch";

const streamPipeline = promisify(pipeline);

try {
  fs.unlinkSync("./schema-engine.exe");
} catch {}

const response = await fetch('https://binaries.prisma.sh/all_commits/2804dc98259d2ea960602aca6b8e7fdc03c1758f/windows/schema-engine.exe.gz');
const gunzip = zlib.createGunzip();
gunzip.on("error", (err) => console.log(err));
streamPipeline(response.body, gunzip);
const output = fs.createWriteStream("./schema-engine.exe");
const writeStream = gunzip.pipe(output);
writeStream.on("close", () => {
  console.log("Done");

  getHash("./schema-engine.exe").then((hash) => {
    console.log("Expected: d71565ea5e98b3cbbced66f4220d62bd221fa5cebd038d2875d66c83b29643c6")
    console.log("Actual:   " + hash);
  });
});

function getHash(filePath) {
  const hash = crypto.createHash("sha256");
  const input = fs.createReadStream(filePath);
  return new Promise((resolve) => {
    input.on("readable", () => {
      const data = input.read();
      if (data) {
        hash.update(data);
      } else {
        resolve(hash.digest("hex"));
      }
    });
  });
}
> deno run -A main.mjs
Done
Expected: d71565ea5e98b3cbbced66f4220d62bd221fa5cebd038d2875d66c83b29643c6
Actual:   cdc7fc9c9202eddad3ef16dabddc8f7be2876456baef268f4d32c6042fdfe8e8 <-- this changes between runs
> node main.mjs
Done
Expected: d71565ea5e98b3cbbced66f4220d62bd221fa5cebd038d2875d66c83b29643c6
Actual:   d71565ea5e98b3cbbced66f4220d62bd221fa5cebd038d2875d66c83b29643c6

Causes #20098
Causes #19544

dsherret added the bug (Something isn't working correctly) and node compat labels on Sep 11, 2023
dsherret (Member Author) commented:

Interestingly, if you remove the fs.unlinkSync line and run it twice, it returns the correct output in Deno.

littledivy (Member) commented:

Also interesting that this does not happen on Linux (Ubuntu 22).

littledivy (Member) commented:

Aha, so it seems the problem is in node-fetch (probably our node:http impl).

Removing import fetch from "node-fetch"; from the repro (so the built-in fetch is used instead) fixes it.
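
For comparison, a sketch of the same download using the built-in fetch instead of node-fetch (my own variant, not from the thread): the response body is then a web ReadableStream, so Readable.fromWeb bridges it into a Node stream before gunzipping.

import fs from "node:fs";
import zlib from "node:zlib";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

const response = await fetch("https://binaries.prisma.sh/all_commits/2804dc98259d2ea960602aca6b8e7fdc03c1758f/windows/schema-engine.exe.gz");
// response.body is a web ReadableStream here, not a Node stream,
// so bridge it before piping through gunzip into the file.
await pipeline(
  Readable.fromWeb(response.body),
  zlib.createGunzip(),
  fs.createWriteStream("./schema-engine.exe"),
);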

kt3k self-assigned this Oct 11, 2023

kt3k commented Oct 12, 2023

It looks like the broken bytes are only at the beginning of the data. In the case of cdc7fc9c92, only the first 11 KB (0.07%) of the 16 MB file looks broken.
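
One quick way to confirm where the corruption sits (my own helper, not from the thread; the filenames are hypothetical copies of the Node and Deno outputs) is to scan for the first and last differing bytes between a good and a bad file:

import fs from "node:fs";

const good = fs.readFileSync("./good.exe"); // hypothetical: output from Node
const bad = fs.readFileSync("./bad.exe");   // hypothetical: output from Deno
// Scan forward for the first differing byte, then backward for the last.
let start = 0;
while (start < good.length && good[start] === bad[start]) start++;
let end = good.length;
while (end > start && good[end - 1] === bad[end - 1]) end--;
console.log(`outputs differ in byte range [${start}, ${end})`);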


kt3k commented Oct 12, 2023

If I concat all the chunks coming out of the gunzip Transform stream, the data isn't corrupt.

import crypto from "node:crypto";
import fs from "node:fs";
import zlib from "node:zlib";
import { pipeline } from "node:stream";
import { promisify } from "node:util";
import fetch from "node-fetch";
import { Buffer } from "node:buffer";

const streamPipeline = promisify(pipeline);

try {
  fs.unlinkSync("./schema-engine.exe");
} catch {}

const response = await fetch('https://binaries.prisma.sh/all_commits/2804dc98259d2ea960602aca6b8e7fdc03c1758f/windows/schema-engine.exe.gz');
const gunzip = zlib.createGunzip();
streamPipeline(response.body, gunzip);
const output = fs.createWriteStream("./schema-engine.exe");
const writeStream = gunzip.pipe(output);

writeStream.on("close", () => {
  const writtenFile = fs.readFileSync("./schema-engine.exe")
  console.log("Expected: d71565ea5e98b3cbbced66f4220d62bd221fa5cebd038d2875d66c83b29643c6")
  console.log("Actual:   " + sha256(writtenFile));
  console.log("sha1(unzipped data)", sha256(gunzippedBuf));
});

let gunzippedBuf = Buffer.alloc(0);
gunzip.on("data", d => {
  gunzippedBuf = Buffer.concat([gunzippedBuf, d]);
})

function sha256(b) {
  return crypto.createHash("sha256").update(b).digest("hex")
}

This prints:

Expected: d71565ea5e98b3cbbced66f4220d62bd221fa5cebd038d2875d66c83b29643c6
Actual:   cdc7fc9c9202eddad3ef16dabddc8f7be2876456baef268f4d32c6042fdfe8e8
sha256(unzipped data) d71565ea5e98b3cbbced66f4220d62bd221fa5cebd038d2875d66c83b29643c6

I think there's something wrong on the fs.createWriteStream() side.


kt3k commented Nov 22, 2023

I found a smaller reproduction, which depends on neither the gunzip stream nor node-fetch:

import fs from "node:fs";
const writeStream = fs.createWriteStream("out.txt");
for (const i of Array(20).keys()) {
  writeStream.write(i + "\n");
  await new Promise((resolve) => setTimeout(resolve, 0));
}
$ deno run -A x.mjs && cat out.txt
1
0
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19

(note: the first line varies between runs)

It seems that if writeStream.write is called at a certain timing near the end of streamImpl._construct, that chunk overtakes the chunks already buffered in the writable stream.


kt3k commented Nov 22, 2023

Simpler repro (doesn't even depend on node:fs):

import { Writable } from "node:stream";

async function test() {
  const chunks = [];
  const writable = new Writable({
    construct(cb) {
      setTimeout(cb, 10);
    },
    write(chunk, _, cb) {
      chunks.push(chunk);
      cb();
    }
  });

  for (const i of Array(20).keys()) {
    writable.write(Uint8Array.from([i]));
    await new Promise((resolve) => setTimeout(resolve, 1));
  }

  if (chunks[0][0] === 0) {
    console.log("success");
  } else {
    console.log("fail, first chunk is ", chunks[0])
  }
}

for (const _ of Array(10)) {
  await test();
}

If I replace the Writable implementation with the one from npm:readable-stream@4.4.0, it's fixed. If I replace it with the one from npm:readable-stream@4.3.0, it reproduces.
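
The swap in the repro above can be done directly with Deno's npm: specifiers (versions as noted in the comment):

// fixed:
import { Writable } from "npm:readable-stream@4.4.0";
// still reproduces:
// import { Writable } from "npm:readable-stream@4.3.0";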

nodejs/node@355bcbc seems like the fix we need. (Node.js had a very similar issue: nodejs/node#46765.)
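
Roughly, the race looks like this (a simplified sketch for illustration only, not Node's actual source; writeOrBuffer, doWrite, and the state fields are stand-ins for the internals):

function writeOrBuffer(state, chunk) {
  // While _construct is still pending, incoming chunks are queued.
  if (state.writing || !state.constructed) {
    state.buffered.push(chunk);
  } else {
    // Buggy path: once `constructed` flips to true but before the queued
    // chunks have been flushed, a freshly written chunk goes straight to
    // doWrite() and overtakes everything still sitting in state.buffered.
    doWrite(state, chunk);
  }
}
// The fix also queues while anything is still buffered:
//   if (state.writing || !state.constructed || state.buffered.length > 0) { ... }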

kt3k added a commit that referenced this issue Nov 23, 2023
This change applies the same fix as
nodejs/node#46818, and the original example
given in #20456 works as expected.

closes #20456
crowlKats pushed a commit that referenced this issue Nov 24, 2023 (cherry picked from commit bf42467)
bartlomieju pushed a commit that referenced this issue Nov 24, 2023