Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fireEvent.sigkill fails on some CI systems #3

Closed
crutchcorn opened this issue Nov 27, 2021 · 23 comments
Closed

fireEvent.sigkill fails on some CI systems #3

crutchcorn opened this issue Nov 27, 2021 · 23 comments
Labels
💎 Bounty help wanted Extra attention is needed

Comments

@crutchcorn
Copy link
Owner

While this code runs on all the same versions of Node on Windows within CI, it seems to error out on platforms like Windows.

Log:

https://github.com/crutchcorn/cli-testing-library/runs/4339953929?check_suite_focus=true

It appears to be a permissions issue of being unable to taskkill (or similar) a PID on Windows machines on GH Actions CI. This makes sense to me, as it may introduce vulnerabilities.

As a temporary workaround, I will be removing that test from CI and simply doing tests manually on Windows myself. Please let us know if there's a better way you know of

@crutchcorn crutchcorn changed the title FireEvent SigKill works test often fails on CI systems FireEvent SigKill works test often fails on some CI systems Dec 2, 2021
@crutchcorn crutchcorn changed the title FireEvent SigKill works test often fails on some CI systems fireEvent.sigkill fails on some CI systems Dec 2, 2021
@crutchcorn
Copy link
Owner Author

OK I'm stumped. I've tried for hours and hours now with no avail. It seems to work fine in CircleCI, but not GitHub actions.

I'm going to mark this as "help needed" and hope the ecosystem has some kind of solution

Note: This library works fine for Windows itself. I will be manually testing everything on my machine to make sure it works and continue to write Windows-specific tests

@crutchcorn crutchcorn added the help wanted Extra attention is needed label Dec 2, 2021
@crutchcorn
Copy link
Owner Author

OK it's not just sigkill and it's not just Windows

https://github.com/crutchcorn/cli-testing-library/actions/runs/1558975085

There is some timing issue somewhere, and I don't think it's our code, but I could be wrong.

@crutchcorn
Copy link
Owner Author

I fixed the timing issue (Node 14 seems to have changed something with spawn behavior that Node 12 doesn't have)

Sure enough, both Windows and Ubuntu tests pass now

https://github.com/crutchcorn/cli-testing-library/runs/4522732155?check_suite_focus=true

https://github.com/crutchcorn/cli-testing-library/runs/4522795843?check_suite_focus=true

@crutchcorn
Copy link
Owner Author

We're running into this issue again in Plop as Windows devices are failing again.

I'm willing to pay a bounty of $50 if anyone is able to solve these issues well enough that Windows CI systems do not fail on GitHub actions on the Plop monorepo 5x in a row.

@crutchcorn
Copy link
Owner Author

/bounty $50

Copy link

algora-pbc bot commented Dec 31, 2023

💎 $50 bounty created by crutchcorn
🙋 If you start working on this, comment /attempt #3 to notify everyone
👉 To claim this bounty, submit a pull request that includes the text /claim #3 somewhere in its body
📝 Before proceeding, please make sure you can receive payouts in your country
💵 Payment arrives in your account 2-5 days after the bounty is rewarded
💯 You keep 100% of the bounty award
🙏 Thank you for contributing to crutchcorn/cli-testing-library!

Attempt Started (GMT+0) Solution
🟢 @smartbizlord Dec 31, 2023, 5:42:05 AM WIP
🟢 @Waldeedle Jan 1, 2024, 8:13:43 PM #26

@smartbizlord
Copy link

smartbizlord commented Dec 31, 2023

/attempt #3

Options

@Drex72
Copy link

Drex72 commented Dec 31, 2023

OK I'm stumped. I've tried for hours and hours now with no avail. It seems to work fine in CircleCI, but not GitHub actions.

I'm going to mark this as "help needed" and hope the ecosystem has some kind of solution

Note: This library works fine for Windows itself. I will be manually testing everything on my machine to make sure it works and continue to write Windows-specific tests

Hey @crutchcorn Is it possible for me to get the config that you used for circle ci?

@crutchcorn
Copy link
Owner Author

@Drex72 it might be in the commit history. Otherwise I wouldn't know where it is :( Sorry I'm not more help

@Drex72
Copy link

Drex72 commented Dec 31, 2023

@Drex72 it might be in the commit history. Otherwise I wouldn't know where it is :( Sorry I'm not more help

Alright, no problem
that's cool
i'd check for it
i cant see the logs because it says it doesnt exist so i want to create a repro :)

@LayZeeDK
Copy link

LayZeeDK commented Dec 31, 2023

Failing job in Plop
https://github.com/plopjs/plop/actions/runs/7304409077/job/19906537229 windows-latest, Node.js 20.x

plop:test
  cache miss, executing b00a87c8e47ca0af
  
  > plop@4.0.1 test:instrument
  > nyc instrument ./bin ./instrumented/bin && nyc instrument ./src ./instrumented/src && cp package.json ./instrumented
  
  Browserslist: caniuse-lite is outdated. Please run:
    npx browserslist@latest --update-db
    Why you should do it regularly: https://github.com/browserslist/browserslist#browsers-data-updating
  Browserslist: caniuse-lite is outdated. Please run:
    npx browserslist@latest --update-db
    Why you should do it regularly: https://github.com/browserslist/browserslist#browsers-data-updating
  
  > plop@4.0.1 vitest
  > vitest run
  
  
   RUN  v1.1.0 D:/a/plop/plop/packages/plop
  
   ✓ tests/actions.spec.js  (6 tests | 4 skipped) 6435ms
   ✓ tests/wrapper.spec.js  (5 tests) 15245ms
   ✓ tests/esm.spec.js  (4 tests) 11460ms
   ✓ tests/action-failure.spec.js  (1 test) 3675ms
   ✓ tests/typescript.spec.js  (1 test) 2179ms
   ✓ tests/input-processing.spec.js  (16 tests | 1 skipped) 35128ms
  
  ⎯⎯⎯⎯⎯⎯ Unhandled Errors ⎯⎯⎯⎯⎯⎯
  
  Vitest caught 1 unhandled error during the test run.
  This might cause false positive tests. Resolve unhandled errors to make sure your tests are not affected.
  
  ⎯⎯⎯⎯ Unhandled Rejection ⎯⎯⎯⎯⎯
  Error: Command failed: taskkill /pid 1544 /T /F
  ERROR: The process with PID 5052 (child process of PID 1544) could not be terminated.
  Reason: The operation attempted is not supported.
  
  ERROR: The process with PID 1544 (child process of PID 5[72](https://github.com/plopjs/plop/actions/runs/7304409077/job/19906537229#step:5:74)8) could not be terminated.
  Reason: The operation attempted is not supported.
  
  
   ❯ ChildProcess.exithandler node:child_process:422:12
   ❯ ChildProcess.emit node:events:517:28
   ❯ maybeClose node:internal/child_process:1098:16
   ❯ Process.ChildProcess._handle.onexit node:internal/child_process:303:5
  
  ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
  Serialized Error: { code: 128, killed: false, signal: null, cmd: 'taskkill /pid 1544 /T /F' }
  This error originated in "tests/input-processing.spec.js" test file. It doesn't mean the error was thrown inside the file itself, but while it was running.
  The latest test that might've caused the error is "Should bypass input prompt with name". It might mean one of the following:
  - The error was thrown, while Vitest was running this test.
  - This was the last recorded test before the error was thrown, if error originated after test finished its execution.
  ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
  
   Test Files  6 passed (6)
        Tests  28 passed | 5 todo (33)
       Errors  1 error
     Start at  22:39:53
     Duration  36.38s (transform 179ms, setup 1.64s, collect 657ms, tests [74](https://github.com/plopjs/plop/actions/runs/7304409077/job/19906537229#step:5:76).12s, environment 2ms, prepare 1.[78](https://github.com/plopjs/plop/actions/runs/7304409077/job/19906537229#step:5:80)s)
  
  npm ERR! Lifecycle script `vitest` failed with error: 
  npm ERR! Error: command failed 
  npm ERR!   in workspace: plop@4.0.1 
  npm ERR!   at location: D:\a\plop\plop\packages\plop 
  Error:  command finished with error: command (D:\a\plop\plop\packages\plop) C:\Users\RUNNER~1\AppData\Local\Temp\xfs-42d[81](https://github.com/plopjs/plop/actions/runs/7304409077/job/19906537229#step:5:83)1ce\yarn.cmd run test exited (1)
Error: plop#test: command (D:\a\plop\plop\packages\plop) C:\Users\RUNNER~1\AppData\Local\Temp\xfs-42d811ce\yarn.cmd run test exited (1)

 Tasks:    1 successful, 2 total
Cached:    0 cached, 2 total
  Time:    54.193s 
Failed:    plop#test

 ERROR  run failed: command  exited (1)
Error: Process completed with exit code 1.

@Waldeedle
Copy link
Contributor

@crutchcorn just want to be clear, you are not seeing this issue for yourself locally? I was able to reproduce it on my local windows 11 desktop

@crutchcorn
Copy link
Owner Author

@Waldeedle INTERESTING. I, in fact, was not able to reproduce it locally even recently on the Plop repo.

@Waldeedle
Copy link
Contributor

Waldeedle commented Jan 1, 2024

@crutchcorn try cloning the repo to a new folder and just run the exact commands from your test workflow:
I think your hunch is right, I wonder if turbo is not handling child processes correctly because I can only reproduce it on the initial clone of the repo, any subsequent run just works.
EDIT: A process that could not be killed from the first run continues to run in the background throughout these subsequent runs:
image

Here's some copy-pasta for your convenience

git clone git@github.com:plopjs/plop.git
cd plop/
yarn install --frozen-lockfile
yarn test

@crutchcorn
Copy link
Owner Author

@Waldeedle it's interesting it mentions Microsoft AV. What if you turn it off (temporarily) for running? [Only if you trust it, feel free to read the source]

I have mine disabled in favor of a different antivirus and maybe that could be part of the root.

@Waldeedle
Copy link
Contributor

@crutchcorn assuming you meant from my screenshot? Sorry should've clarified that this was from my task manager, its going to show all processes, I will switch over to process explorer so I can see the related processes from the test

@Waldeedle
Copy link
Contributor

Waldeedle commented Jan 1, 2024

/attempt #3
I see where it is erroring out, and believe I have a potential solution.
Before I create a PR, could I get context around, this particular if statement, why are you attempting to kill a process that has already exited/doesn't exist?

if (err.message.includes('could not be terminated') && err.message.includes('There is no running instance of the task.'))
Options

Copy link

algora-pbc bot commented Jan 1, 2024

Note

The user @smartbizlord is already attempting to complete issue #3 and claim the bounty. We recommend checking in on @smartbizlord's progress, and potentially collaborating, before starting a new solution.

@crutchcorn
Copy link
Owner Author

I see where it is erroring out, and believe I have a potential solution.
Before I create a PR, could I get context around, this particular if statement, why are you attempting to kill a process that has already exited/doesn't exist?

if (err.message.includes('could not be terminated') && err.message.includes('There is no running instance of the task.'))

Truthfully I don't remember. I think this was an attempt to fix the problem at hand or work around windows specifically but idk

Copy link

algora-pbc bot commented Jan 1, 2024

💡 @Waldeedle submitted a pull request that claims the bounty. You can visit your bounty board to reward.

Copy link

algora-pbc bot commented Jan 2, 2024

@Waldeedle: Your claim has been rewarded! 👉 Complete your Algora onboarding to collect the bounty.

@crutchcorn
Copy link
Owner Author

@Waldeedle thanks so much for contributing to this project and helping us out! Great work!

@Waldeedle
Copy link
Contributor

@crutchcorn anytime, thanks for the brain teaser! 👏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💎 Bounty help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

5 participants