Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Bug Report: fetch:plain:file fails with stream is not readable #17072

Closed
2 tasks done
punkle opened this issue Mar 24, 2023 · 39 comments · Fixed by #21417
Closed
2 tasks done

🐛 Bug Report: fetch:plain:file fails with stream is not readable #17072

punkle opened this issue Mar 24, 2023 · 39 comments · Fixed by #21417
Labels
backend bug Something isn't working help wanted Help/Contributions wanted from community members

Comments

@punkle
Copy link
Contributor

punkle commented Mar 24, 2023

📜 Description

When I created the following template step it fails with an error:

steps:
    - id: fetch-gitignore
      name: Fetch gitignore
      action: fetch:plain:file
      input:
        url: "https://github.com/AcmeInc/test123/blob/main/.gitignore"
        targetPath: '.gitignore'

👍 Expected behavior

i expect it to fetch the file and write it to the workspace directory at the targetPath

👎 Actual Behavior with Screenshots

It fails with the folling error:

2023-03-24T09:10:24.000Z Beginning step Fetch Repo
2023-03-24T09:10:24.000Z info: Fetching plain content from remote URL
2023-03-24T09:10:25.000Z InternalServerError: stream is not readable
    at readStream (/Users/brianfletcher/git-repos/backstage/node_modules/raw-body/index.js:185:17)
    at executor (/Users/brianfletcher/git-repos/backstage/node_modules/raw-body/index.js:120:5)
    at new Promise (<anonymous>)
    at getRawBody (/Users/brianfletcher/git-repos/backstage/node_modules/raw-body/index.js:119:10)
    at Object.buffer (webpack-internal:///../backend-common/src/reading/ReadUrlResponseFactory.ts:48:73)
    at fetchFile (webpack-internal:///../../plugins/scaffolder-backend/src/scaffolder/actions/builtin/fetch/helpers.ts:67:34)
    at async Object.handler (webpack-internal:///../../plugins/scaffolder-backend/src/scaffolder/actions/builtin/fetch/plainFile.ts:61:13)
    at async NunjucksWorkflowRunner.executeStep (webpack-internal:///../../plugins/scaffolder-backend/src/scaffolder/tasks/NunjucksWorkflowRunner.ts:196:13)
    at async NunjucksWorkflowRunner.execute (webpack-internal:///../../plugins/scaffolder-backend/src/scaffolder/tasks/NunjucksWorkflowRunner.ts:258:17)
    at async TaskWorker.runOneTask (webpack-internal:///../../plugins/scaffolder-backend/src/scaffolder/tasks/TaskWorker.ts:88:33)

👟 Reproduction steps

Create a template with the step shown above and run it.

📃 Provide the context for the Bug.

No response

🖥️ Your Environment

No response

👀 Have you spent some time to check if this bug has been raised before?

  • I checked and didn't find similar issue

🏢 Have you read the Code of Conduct?

Are you willing to submit PR?

Yes I am willing to submit a PR!

@punkle punkle added the bug Something isn't working label Mar 24, 2023
@punkle
Copy link
Contributor Author

punkle commented Mar 24, 2023

Im happy to provide a PR, but I wouldnt know where to start with this.

@punkle
Copy link
Contributor Author

punkle commented Mar 24, 2023

@gavlyukovskiy are you finding it works for you?

@gavlyukovskiy
Copy link
Contributor

@punkle It does work for me:

    - id: fetch-plain
      name: Fetch Plain
      action: fetch:plain:file
      input:
        url: https://github.com/gavlyukovskiy/spring-boot-data-source-decorator/blob/master/.gitignore
        targetPath: .gitignore

I can't access your repository, but for mine it works well.

@gavlyukovskiy
Copy link
Contributor

@punkle well, I tried different combinations and it looks like it fails occasionally, especially first couple of seconds after starting with yarn dev, but not only.

I have tried adding more logging to GithubUrlReader after getting a response in readUrl, to see whether GH returns proper status code and response is readable:

this.logger.info(`Request ${ghUrl} -> ${response.status}, readable: ${response.body.readable}`)

but didn't find anything unusual:

[1] 2023-03-24T16:44:59.345Z backstage info Request https://raw.githubusercontent.com/gavlyukovskiy/spring-boot-data-source-decorator/master/.gitignore -> 200, readable: true 
[1] 2023-03-24T16:44:59.557Z backstage info Request https://raw.githubusercontent.com/gavlyukovskiy/spring-boot-data-source-decorator/master/.gitignore -> 200, readable: true 

It's always 200 and readable true, even when Backstage task fails with

2023-03-24T16:44:59.000Z InternalServerError: stream is not readable
    at readStream (/Users/artur.h/dev/projects/backstage/node_modules/raw-body/index.js:185:17)
    at executor (/Users/artur.h/dev/projects/backstage/node_modules/raw-body/index.js:120:5)
    at new Promise (<anonymous>)
    at getRawBody (/Users/artur.h/dev/projects/backstage/node_modules/raw-body/index.js:119:10)
    at Object.buffer (webpack-internal:///../backend-common/src/reading/ReadUrlResponseFactory.ts:48:73)1
    at fetchFile (webpack-internal:///../../plugins/scaffolder-backend/src/scaffolder/actions/builtin/fetch/helpers.ts:67:34)1
    at async Object.handler (webpack-internal:///../../plugins/scaffolder-backend/src/scaffolder/actions/builtin/fetch/plainFile.ts:62:13)1
    at async NunjucksWorkflowRunner.executeStep (webpack-internal:///../../plugins/scaffolder-backend/src/scaffolder/tasks/NunjucksWorkflowRunner.ts:196:13)1
    at async NunjucksWorkflowRunner.execute (webpack-internal:///../../plugins/scaffolder-backend/src/scaffolder/tasks/NunjucksWorkflowRunner.ts:258:17)1
    at async TaskWorker.runOneTask (webpack-internal:///../../plugins/scaffolder-backend/src/scaffolder/tasks/TaskWorker.ts:88:33)

@gavlyukovskiy
Copy link
Contributor

I've added more debug logs in various places and also close listener

    response.body.on("close", () => {
      getRootLogger().info("buffer close")
    })

and I think I know what is the problem:

[1] 2023-03-24T17:56:23.513Z backstage info Request https://raw.githubusercontent.com/gavlyukovskiy/spring-boot-data-source-decorator/master/.gitignore -> 200, readable: true 
[1] 2023-03-24T17:56:23.518Z backstage info fromNodeJSReadable readable: true 
[1] 2023-03-24T17:56:23.519Z backstage info fromReadable readable: true 
[1] 2023-03-24T17:56:23.526Z backstage info buffer close 
[1] 2023-03-24T17:56:23.527Z backstage info fromReadable.buffer readable: false 

so body is closed before we could read it, but I have no idea what closes it, I didn't find any body.close() or body.destroy() calls. I'm not really node developer, but perhaps this helps somebody to investigate issue further 🤷

@Rugvip Rugvip added help wanted Help/Contributions wanted from community members backend labels Mar 27, 2023
@punkle
Copy link
Contributor Author

punkle commented Mar 27, 2023

@gavlyukovskiy thanks for looking into it. Unfortunately I may not be able to help with identifying what might be causing that.

@fabidick22
Copy link

Hi @gavlyukovskiy, I have the same problem, it seems to be related to private repository files. When I try to get the https://github.com/backstage/backstage/blob/master/mkdocs.yml file everything works fine, but when I try to get a repo from my organization (https://github.com/ORG/REPO/blob/main/mkdocs.yml) I get the same error.

2023-04-19T15:02:29.082Z Beginning step Get repo file
2023-04-19T15:02:29.091Z info: Fetching plain content from remote URL
2023-04-19T15:02:29.256Z InternalServerError: stream is not readable
    at readStream (/app/node_modules/raw-body/index.js:185:17)
    at executor (/app/node_modules/raw-body/index.js:120:5)

@punkle
Copy link
Contributor Author

punkle commented Apr 27, 2023

I wonder would someone in the @backstage/catalog-maintainers group be able to help us here. I believe the issue is being surfaced when accessing the buffer of the URL reader which I would expect would work in this case. Perhaps this task is not using the url reader correctly?

https://github.com/backstage/backstage/blob/master/plugins/scaffolder-backend/src/scaffolder/actions/builtin/fetch/helpers.ts#L79

@eytanhanig
Copy link

Ironically, it appears to work with the URL from the tests, https://github.com/backstage/community/tree/main/backstage-community-sessions/assets/Backstage%20Community%20Sessions.png.

Meanwhile, it fails for https://github.com/backstage/backstage/blob/master/plugins/scaffolder-backend/README.md.

@punkle
Copy link
Contributor Author

punkle commented May 2, 2023

@eytanhanig it is a little odd that it works some of the time. I am not sure yet what the pattern is.

@benjdlambert
Copy link
Member

I wonder if this could be a node-fetch issue? Maybe moving over to using the built in fetch in node, and undici.stream something might be another debugging tool to see if we can work out what's going on here?

@punkle
Copy link
Contributor Author

punkle commented May 2, 2023

@benjdlambert can you elaborate a little? Which import would get us the fetch you are suggesting? At the moment the github url reader is using this: import fetch, { RequestInit, Response } from 'node-fetch';

What are you suggesting to try?

@benjdlambert
Copy link
Member

benjdlambert commented May 3, 2023

I'm suggesting to use the new built in fetch with node, which is shipped as undici. You can see the docs for this https://undici.nodejs.org/#/?id=undicistreamurl-options-factory-promise and you can do undici.stream to basically give you a readable stream I think. It's not on the fetch like in node-fetch so you might have to change a few basic things to get it to work as it does today. Worth a try at least so we can see if it's anything to do with how we're using node-fetch or if it's something else.

require('undici').stream should be it

@benjdlambert
Copy link
Member

I think you can even install that package as a seperate dependency too to backport it to older versions of node if you don't have it already.

@benjdlambert
Copy link
Member

Sorry for the message spam, but there's also this docs here which might be of interest using the streams https://undici.nodejs.org/#/docs/api/Dispatcher?id=example-1-basic-get-stream-request

@punkle
Copy link
Contributor Author

punkle commented May 3, 2023

@benjdlambert something like this #17632

@eytanhanig
Copy link

Have there been any updates on this?

@pcgqueiroz
Copy link

Hey! I had the same issue and I have managed to fix the problem changing the order of calls in the fetchFile function.

    const res = await reader.readUrl(readUrl);
-   await fs.ensureDir(path.dirname(outputPath));
-   const buffer = await res.buffer();
+   const buffer = await res.buffer();
+   await fs.ensureDir(path.dirname(outputPath));    
    await fs.outputFile(outputPath, buffer.toString());

Please comment if this fix also works for you.

@benjdlambert
Copy link
Member

Hmm that's an unexpected fix if that works 🤷 Looks like a race condition to me, I wondered if this was also perhaps related to the #18002 issue that was previously fixed recently?

@pcgqueiroz
Copy link

Hey @benjdlambert , I agree with you that it is an unexpected fix. It took me a while to discover it and, sincerely, I don´t know why this change fixes the problem.
The thing is that it worked for me and changing the order things are called do no harm for the code. I mean, no side effects apparently. I think it's worth a try.

@benjdlambert
Copy link
Member

@eytanhanig @pcgqueiroz can you try the latest version of backstage 1.16 and see if you can reproduce it with this? And if you can, can we get a reproducible example that we can try with so we can dig into this further?

@athifongoqa
Copy link

Hey, has there been any progress/updates on this?

@pedronastasi
Copy link
Contributor

I've got

{
  "version": "1.15.0"
}

and I'm facing the issue

@benjdlambert
Copy link
Member

@pedronastasi can you update to the latest versions and see if the problem still persists? We made some changes in 1.16 but without reproduction case it's hard to see if we've fixed it or not.

@alper98
Copy link
Contributor

alper98 commented Oct 5, 2023

We're on 1.17.5, and have the same issue.

And can confirm that it works fine by fetching from public repositories, but when fetching from our org, it fails with stream is not readable

@benjdlambert
Copy link
Member

Hey @alper98 would it be possible for you to create a reproduction repo or something with a basic create-app? This is hard to debug without a reliable reproduction case.

@brunosantosnet
Copy link

I've used fetch:plain:file several times without any problems. Now I have this problem

@benjdlambert
Copy link
Member

@brunosantosnet is it possible that you can setup a minimal reproduction repository for this so we can dig in further? Also what change caused this to start breaking and failing? 🙏

@brunosantosnet
Copy link

@benjdlambert my scenario is the worst possible. There are repositories that using fetch:plain:file the error does not occur, I don't know why. When I use fetch:plain I only have problems in repositories with many objects :-(

@benjdlambert
Copy link
Member

For people that have this issue, is it Node 20 or are we seeing it on Node 18 too?

@benjdlambert
Copy link
Member

I think I have a fix, it's a similar fix to the other stream bug that we had. I managed to replicate this locally, but it doesn't happen every time, so I think that this works as I've not been able to reproduce it one using this fix.

#21417

@benjdlambert
Copy link
Member

Can we verify that this is fixed by bumping to the latest -next release?

yarn backstage-cli versions:bump --release=next

@pballandras
Copy link
Contributor

Can we verify that this is fixed by bumping to the latest -next release?

yarn backstage-cli versions:bump --release=next

@benjdlambert

Maybe you got your answer a while ago, but I was able to fix this issue by moving on to the 0.20.0-next.3 version of @backstage/backend-common. So it would look like this is fixed.

Cheers!

@benjdlambert
Copy link
Member

Yep I think that this got released in the mainline release of 1.20 so should be fixed for everyone from that version 🤞

@gavlyukovskiy
Copy link
Contributor

Hi @benjdlambert, unfortunately this is still the case on Backstage 1.20.3, but with gitlab url in format https://<url>/<group>/<repo>/blob/main/CODEOWNERS:

2023-12-21T13:34:06.384Z info: Fetching plain content from remote URL
2023-12-21T13:34:06.750Z InternalServerError: stream is not readable
    at readStream (/app/node_modules/@backstage/backend-common/node_modules/raw-body/index.js:185:17)
    at executor (/app/node_modules/@backstage/backend-common/node_modules/raw-body/index.js:120:5)
    at new Promise (<anonymous>)
    at Object.getRawBody [as default] (/app/node_modules/@backstage/backend-common/node_modules/raw-body/index.js:119:10)
    at Object.buffer (/app/node_modules/@backstage/backend-common/dist/index.cjs.js:1232:48)
    at Object.fetchFile (/app/node_modules/@backstage/plugin-scaffolder-node/dist/index.cjs.js:87:30)
    at async Object.handler (/app/node_modules/@backstage/plugin-scaffolder-backend/dist/cjs/router-68c854a3.cjs.js:724:7)
    at async NunjucksWorkflowRunner.executeStep (/app/node_modules/@backstage/plugin-scaffolder-backend/dist/cjs/router-68c854a3.cjs.js:7233:9)
    at async NunjucksWorkflowRunner.execute (/app/node_modules/@backstage/plugin-scaffolder-backend/dist/cjs/router-68c854a3.cjs.js:7306:9)
    at async TaskWorker.runOneTask (/app/node_modules/@backstage/plugin-scaffolder-backend/dist/cjs/router-68c854a3.cjs.js:7528:26)
    at async run (/app/node_modules/p-queue/dist/index.js:163:29)

but it's not consistent, approximately every second request fails.

@benjdlambert
Copy link
Member

@gavlyukovskiy can you create a reproduction repo or something so that we can reproduce this easier? These things are hard to dig into otherwise.

@gavlyukovskiy
Copy link
Contributor

gavlyukovskiy commented Dec 21, 2023

@benjdlambert I've created a reproducer, though I cannot reproduce it with latest version v1.21.0 (or master), but I still can with v1.20.3:

git checkout v1.20.3

# add to remote templates: https://gitlab.com/agavlyukovskiy/backstage-17072/-/blob/main/template.yaml
yq eval '.spec.targets += "https://gitlab.com/agavlyukovskiy/backstage-17072/-/blob/main/template.yaml"' -i plugins/scaffolder-backend/sample-templates/remote-templates.yaml

yarn dev

then after using the template with url https://gitlab.com/agavlyukovskiy/backstage-17072/-/blob/main/CODEOWNERS (default) it returns InternalServerError: stream is not readable error.

We will upgrade our Backstage to the latest version tomorrow and I'll let you know if we still see that issue on our production instance.

@gavlyukovskiy
Copy link
Contributor

@benjdlambert ah, I thought that your fix was in 1.20.3, but it only got into 1.21.0. Looks like it did fix the issue 👍

@benjdlambert
Copy link
Member

Ah apologies - yeah looks like it did go out in 1.21.0, nice at least it's fixed! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend bug Something isn't working help wanted Help/Contributions wanted from community members
Projects
None yet
Development

Successfully merging a pull request may close this issue.