Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Running tests with coverage causes CPU to hit 100% after tests finish #454

Open
achingbrain opened this issue Feb 21, 2023 · 20 comments
Open

Comments

@achingbrain
Copy link

  • Version: v18.14.0
  • Platform: windows-latest, ubuntu-latest and macos-latest GitHub CI images

When running tests with coverage in CI, after the test run completes successfully, CPU spikes to 100% and the process hangs.

It happened in this build: https://github.com/ipfs/aegir/actions/runs/4225847659/jobs/7338650232 and sometimes happens locally too, though it's a lot more frequent in CI.

This started happening with node 18, it didn't seem to happen with node 16.

The command being run is similar to:

c8 --reporter json --report-dir .coverage --temp-directory /tmp/some-random-string --clean mocha test/node.{js,cjs,mjs} test/**/*.spec.{js,cjs,mjs} dist/test/node.{js,cjs,mjs} dist/test/**/*.spec.{js,cjs,mjs} --ui bdd --require source-map-support/register --timeout=60000 --bail
@mmomtchev
Copy link

Same environment, same symptoms here: https://github.com/mmomtchev/sqlite-wasm-http/actions/runs/4377210809/attempts/1
Haven't been able to reproduce locally, only in GA

@mmomtchev
Copy link

Node v19 does not solve it

@AriPerkkio
Copy link
Contributor

How large does the --temp-directory / coverage/tmp grow? Could you check this with du -sh or similar tool? And how many files are there?
This can be checked in local environment as well.

@mmomtchev
Copy link

44 MB in 90 files, all in coverage/tmp

@AriPerkkio
Copy link
Contributor

AriPerkkio commented Mar 14, 2023

That doesn't sound too bad. I've seen cases where CIs get slow when coverage/tmp reaches +450 MB. The process may seem stuck but it's actually just processing a lot of JSON files and takes a while.

But getting a CI stuck with 44MB files sounds like a different issue.

@achingbrain
Copy link
Author

The CI build I linked to was killed after having stalled for six hours. Prior to the appearance of this bug processing the coverage report in CI took seconds so I definitely think it's a different issue!

@bartsteverink
Copy link

bartsteverink commented Mar 14, 2023

I am running into the same issue. Able to run the tests and coverage successfully locally inside the applications' docker container (based on node:18.14.2-bullseye-slim). Using the same container in the Gitlab CICD environment I do get test results on stdout but hangs when generating coverage. Using the following Gitlab config (already commented out most of the post-processing stuff for debugging):

...
unit_test:
  stage: test
  image: $TEST_IMAGE_NAME # Build in the previous step based on the Dockerfile in the repo.
  variables:
    # We do not need GitLab to clone the source code.
    GIT_STRATEGY: none
    NODE_ENV: dev
  before_script:
    - cd /code
    - npm ci
  script:
    - npx c8 mocha
  # after_script:
  #   - mv test-results.xml ${CI_PROJECT_DIR}
  #   - mv coverage ${CI_PROJECT_DIR}
  # artifacts:
  #   when: always
  #   paths:
  #     - test-results.xml
  #     - coverage/cobertura-coverage.xml
  #   expire_in: 10 years
  #   reports:
  #     junit: test-results.xml
  #     coverage_report:
  #       coverage_format: cobertura
  #       path: coverage/cobertura-coverage.xml
  <<: *rules-default
...

My coverage/tmp only gets to 2MB (really small app)

@AriPerkkio
Copy link
Contributor

Try adding NODE_DEBUG: 'c8' environment variable to see if c8 logs any errors.

mmomtchev added a commit to mmomtchev/sqlite-wasm-http that referenced this issue Mar 15, 2023
as suggested in bcoe/c8#454
mmomtchev added a commit to mmomtchev/sqlite-wasm-http that referenced this issue Mar 22, 2023
* add NODE_DEBUG

as suggested in bcoe/c8#454

* try windows

* change the correct job

* use npx

* remove NODE_DEBUG
@mmomtchev
Copy link

mmomtchev commented Mar 22, 2023

@AriPerkkio It seems that Windows is not affected

PS. Finally, Windows is affected too, but the frequency is much lower

@zargarzadehm
Copy link

zargarzadehm commented May 13, 2023

@achingbrain Did you find any solution?
This error reproduced in the local environment for me with node 16.14 and c8 7.11.3 but it's non-deterministic!

@achingbrain
Copy link
Author

No, currently running tests without coverage due to this issue.

@mmomtchev
Copy link

Somebody has to fix it. I can't reproduce it outside of Github Actions. Try attaching the inspector when it happens.

@zargarzadehm
Copy link

Somebody has to fix it. I can't reproduce it outside of Github Actions. Try attaching the inspector when it happens.

What do you mean by attaching in inspector, how can I help you?

I can reproduce this error in the local environment with node 16.14, I tried to find debug logs to report to you but the attempt was not successful! also, my problem was solved with node:18.14.0 in my GitLab pipeline.

@mmomtchev
Copy link

I was able to reproduce it with node:18.14.0, it is not related
https://github.com/mmomtchev/sqlite-wasm-http/actions/runs/4991572755

@platosha
Copy link

The workaround that helped me is to enable the exit mode in Mocha (--exit on command line or "exit": true in the config).

Seems that there is some mutual issue in handling the exit between c8 and Mocha running as a child process. I suspected that there is some loop or wait process on the Mocha side waiting for something to finish after waiting for the tests.

I was unable to attach the debugger to the Mocha child process, Node seems to be in the phase when it does not accept debugger connections anymore.

Then I tried wtfnode in attempt to dump the remaining handlers. The only way of using wtfnode that worked for me is to insert wtf.dump() in the the Mocha's exitMochaLater handler. It gave basically empty result for me:

[WTF Node?] open handles:
- File descriptors: (note: stdio always exists)
  - fd 2 (tty) (stdio)
  - fd 1 (tty) (stdio)
  

If I'm not missing anything, then I guess what happens is that Mocha's process gets stalled waiting until the stdout pipe is read just before exiting, while the c8 process perhaps is not reading enough from this pipe to let Mocha's Node process finish. The workaround helps by making Mocha flush and exit the process explicitly after completing the tests. See the flag description for the dangers of the exit mode: https://mochajs.org/#-exit

I'm using macOS, btw, and the issue is occurring ~10% of time for me locally.

@platosha
Copy link

I also have no issues when running Mocha directly with the same arguments.

@mmomtchev
Copy link

Still having the issue with "exit": true

@bcoe
Copy link
Owner

bcoe commented Oct 25, 2023

I wonder if this could be related to nodejs/node#49344 we should keep an eye and see if the problem continues once patched.

@abcfy2
Copy link

abcfy2 commented Nov 28, 2023

I don't know if my issue is related to this issue.

I use ava with c8. When I use node 18, it's working, but when I upgrade to node 20, c8 will hang forever after ava test finished.

When I use node 18:

$ npm test
...
  ✔ application › should schedule a job (2s)
  ─

  7 tests passed
--------------------------|---------|----------|---------|---------|-------------------------------
File                      | % Stmts | % Branch | % Funcs | % Lines | Uncovered Line #s             
--------------------------|---------|----------|---------|---------|-------------------------------
All files                 |      83 |    86.36 |   70.83 |      83 |                               
 _core                    |   87.73 |    91.66 |      80 |   87.73 |                               
  application.ts          |   87.73 |    91.66 |      80 |   87.73 | 86-94,167-169,177-178,184-195 
 _core/controller         |   76.92 |      100 |      75 |   76.92 |                               
  route.ts                |   76.92 |      100 |      75 |   76.92 | 35-49                         
 _core/controller/actions |   68.75 |       60 |      75 |   68.75 |                               
  health.ts               |     100 |      100 |     100 |     100 |                               
  siweAction.ts           |   63.41 |       50 |   66.66 |   63.41 | 15-18,28-29,33-41             
 _core/services           |    80.3 |       70 |      25 |    80.3 |                               
  dbService.ts            |      75 |      100 |       0 |      75 | 20-22,25-27,30-31             
  securityService.ts      |   85.29 |       70 |     100 |   85.29 | 11-12,16-17,29                
 _core/utils              |     100 |      100 |     100 |     100 |                               
  common.ts               |     100 |      100 |     100 |     100 |                               
--------------------------|---------|----------|---------|---------|-------------------------------

But when I use node 20, the same code with test will hang:

$ npm test

> sln-a@0.1.0 test
> c8 ava

...
 ✔ application › should schedule a job (2s)

# <==== hang forever and I have to kill node process to stop it
# no tests result output and ctrl-c is not working

@mmomtchev
Copy link

Node.js 21.6.0 which includes the fix for nodejs/node#47748 resolved the issue for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants