Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows Runner Drops CTest Failure Output #1206

Open
jgehrig opened this issue Jul 17, 2021 · 20 comments
Open

Windows Runner Drops CTest Failure Output #1206

jgehrig opened this issue Jul 17, 2021 · 20 comments
Labels
bug Something isn't working Runner Bug Bug fix scope to the runner

Comments

@jgehrig
Copy link

jgehrig commented Jul 17, 2021

Describe the bug

On Windows, runner drops terminal output from the test scenarios spawned by ctest.

Working on Actions + Linux:
https://github.com/equalsraf/neovim-qt/pull/885/checks?check_run_id=3094273077

Working on AppVeyor + Win:
https://ci.appveyor.com/project/equalsraf/neovim-qt/builds/40031037/job/ddhiyj6vtqyaclt9

No Output on Actions + Win:
https://github.com/equalsraf/neovim-qt/pull/885/checks?check_run_id=3094273084

You can RDP into the Windows runner, manually run the test and observe the correct output.

The output is being dropped by the GitHub Actions Runner.

See equalsraf/neovim-qt#884 for more details.

To Reproduce

Example scenario: equalsraf/neovim-qt#885

  1. Run ctest within the action containing failures on Linux and Windows
  2. Notice failure output is present on Linux: FAIL! : ...
  3. Notice Windows drops failure output only when run inside GitHub Actions. No FAIL!: ... or PASS: ...

Expected behavior

GitHub actions should pipe output from ctest, and output from the tests it spawns.

You should see output for the failing tests such as tst_encoding. FAIL!: ...

Runner Version and Platform

Latest Github Actions hosted runner.

Does not work with windows-latest.

Works with ubuntu-latest and macos-latest.

AppVeyor + Windows example provided to demonstrate the underlying code works on Windows.

What's not working?

Please include error messages and screenshots.

Output from the failing test cases is missing. Only for GitHub Actions hosted instances running Windows.

Linux/MacOS work fine. Other CI tools work fine.

Actions + Linux (Good):

12/19 Test #12: tst_input_common_unix ............***Failed    0.05 sec
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-runner'
********* Start testing of TestInputCommon *********
Config: Using QtTest library 5.12.8, Qt 5.12.8 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 9.3.0)
PASS   : TestInputCommon::initTestCase()
FAIL!  : TestInputCommon::LessThanKey() 'false' returned FALSE. ()
   Loc: [../test/tst_input_common.cpp(32)]
PASS   : TestInputCommon::ModifierOnlyKeyEventsIgnored()
PASS   : TestInputCommon::VolumeKeysIgnored()
PASS   : TestInputCommon::ShiftKeyEventWellFormed()
PASS   : TestInputCommon::CapsLockIgnored()
PASS   : TestInputCommon::AltGrAloneIgnored()
PASS   : TestInputCommon::AltGrKeyEventWellFormed()
PASS   : TestInputCommon::ShiftSpaceWellFormed()
PASS   : TestInputCommon::ShiftBackSpaceWellFormed()
PASS   : TestInputCommon::MouseLeftClick()
PASS   : TestInputCommon::MouseRightClick()
PASS   : TestInputCommon::MouseMiddleClick()
PASS   : TestInputCommon::cleanupTestCase()
Totals: 13 passed, 1 failed, 0 skipped, 0 blacklisted, 1ms
********* Finished testing of TestInputCommon *********

      Start 13: tst_input_common_win32

AppVeyor + Windows (Good):

12: Test command: C:\projects\neovim-qt\build\bin\tst_input_common_unix.exe
12: Test timeout computed to be: 10000000
12: ********* Start testing of TestInputCommon *********
12: Config: Using QtTest library 5.15.2, Qt 5.15.2 (x86_64-little_endian-llp64 shared (dynamic) release build; by GCC 8.1.0), windows server2016
12: PASS   : TestInputCommon::initTestCase()
12: FAIL!  : TestInputCommon::LessThanKey() 'false' returned FALSE. ()
12: C:\projects\neovim-qt\test\tst_input_common.cpp(32) : failure location
12: PASS   : TestInputCommon::ModifierOnlyKeyEventsIgnored()
12: PASS   : TestInputCommon::VolumeKeysIgnored()
12: PASS   : TestInputCommon::ShiftKeyEventWellFormed()
12: PASS   : TestInputCommon::CapsLockIgnored()
12: PASS   : TestInputCommon::AltGrAloneIgnored()
12: PASS   : TestInputCommon::AltGrKeyEventWellFormed()
12: PASS   : TestInputCommon::ShiftSpaceWellFormed()
12: PASS   : TestInputCommon::ShiftBackSpaceWellFormed()
12: PASS   : TestInputCommon::MouseLeftClick()
12: PASS   : TestInputCommon::MouseRightClick()
12: PASS   : TestInputCommon::MouseMiddleClick()
12: PASS   : TestInputCommon::cleanupTestCase()
12: Totals: 13 passed, 1 failed, 0 skipped, 0 blacklisted, 1ms
12: ********* Finished testing of TestInputCommon *********
12/19 Test #12: tst_input_common_unix ............***Failed    0.05 sec
test 13
      Start 13: tst_input_common_win32

Actions + Windows (Bad):

12/19 Test #12: tst_input_common_unix ............***Failed    0.04 sec

      Start 13: tst_input_common_win32

Runner and Worker's Diagnostic Logs

Here are logs from the example above:

logs_269.zip

@jgehrig jgehrig added the bug Something isn't working label Jul 17, 2021
@TingluoHuang
Copy link
Member

@jgehrig I saw you are using an actions GabrielBB/xvfb-action@v1, have you check whether that action is doing something differently across Windows/Linux?

The log shows:

  • Actions + Linux: /usr/bin/xvfb-run --auto-servernum ctest --output-on-failure -C Debug
  • Actions + Windows: "C:\Program Files\CMake\bin\ctest.exe" --output-on-failure -C Release
  • AppVeyor + Windows: ctest -VV -C Release

Since they all running different commands, I am not convinced that the runner is losing certain output.

@jgehrig
Copy link
Author

jgehrig commented Jul 27, 2021

@TingluoHuang

GabrielBB/xvfb-action@v1 is required on Linux. Qt apps won't run without $DISPLAY.

I've done a Windows run without that plugin, and it has no effect.

I've also done runs with -VV instead of --output-on-failure, also no effect.

Let me know what configuration you'd like to see across all three runs, and I will quickly switch them over.


The only answer I can come up with is output is being dropped by runner.

I have RDP-ed into the machine, and run both the ctest and tst_input_common commands manually; the output is correct.

When the same commands are run through Actions, output is missing. It looks like the missing output is from the spawned test executables (tst_input_common).

@TingluoHuang
Copy link
Member

The runner launch ctest and redirects all STDOUT/STDERR, we might need to figure out who is printing ********* Start testing of and why it didn't get captured and forward by ctest.

@jgehrig
Copy link
Author

jgehrig commented Jul 27, 2021

ctest is a CMake wrapper that runs test executables. In this case, the test executables are all named tst_* or test_*.

The output that appears is from ctest. The ********* Start testing of TestNeovimObject ********* is from the test executable.

Here is an example run on my local machine:

$ ctest -VV
UpdateCTestConfiguration  from :/home/jgehrig/Development/neovim-qt/build/DartConfiguration.tcl
UpdateCTestConfiguration  from :/home/jgehrig/Development/neovim-qt/build/DartConfiguration.tcl
Test project /home/jgehrig/Development/neovim-qt/build
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 1
      Start  1: tst_neovimobject

1: Test command: /home/jgehrig/Development/neovim-qt/build/bin/tst_neovimobject
1: Test timeout computed to be: 10000000
1: ********* Start testing of TestNeovimObject *********
1: Config: Using QtTest library 5.15.2, Qt 5.15.2 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 9.3.0), gentoo unknown
1: QDEBUG : TestNeovimObject::initTestCase() Neovim API version compatible with 0 supported 7
1: PASS   : TestNeovimObject::initTestCase()
1: PASS   : TestNeovimObject::eventTypes()
1: PASS   : TestNeovimObject::extDecodeApi0()
1: PASS   : TestNeovimObject::extDecodeApi1()
1: PASS   : TestNeovimObject::extDecodeApi2()
1: PASS   : TestNeovimObject::cleanupTestCase()
1: Totals: 6 passed, 0 failed, 0 skipped, 0 blacklisted, 1503ms
1: ********* Finished testing of TestNeovimObject *********
 1/18 Test  #1: tst_neovimobject .................   Passed    1.54 sec
$ bin/tst_neovimobject 
********* Start testing of TestNeovimObject *********
Config: Using QtTest library 5.15.2, Qt 5.15.2 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 9.3.0), gentoo unknown
QDEBUG : TestNeovimObject::initTestCase() Neovim API version compatible with 0 supported 7
PASS   : TestNeovimObject::initTestCase()
PASS   : TestNeovimObject::eventTypes()
PASS   : TestNeovimObject::extDecodeApi0()
PASS   : TestNeovimObject::extDecodeApi1()
PASS   : TestNeovimObject::extDecodeApi2()
PASS   : TestNeovimObject::cleanupTestCase()
Totals: 6 passed, 0 failed, 0 skipped, 0 blacklisted, 1502ms
********* Finished testing of TestNeovimObject *********

I can do the same manually on Windows while RDPed into the GH Action runner. The correct (full) output appears.

@jgehrig
Copy link
Author

jgehrig commented Jul 27, 2021

Let me know if there is anything I can do to help debug.

@jgehrig
Copy link
Author

jgehrig commented Aug 2, 2021

@TingluoHuang

Can you point me in the direction of documentation that would describe how to debug an issue like this?

I am reasonably confident the output is present, but not getting passed through to the GH Actions Runner logs...

Thanks!

@jgehrig
Copy link
Author

jgehrig commented Sep 8, 2021

Still looking for a solution here.

I am not sure what is causing the output to drop from the Actions logs.

@d1vanov
Copy link

d1vanov commented Oct 9, 2021

Have this issue with a number of my repositories as well.

@exactly-one-kas
Copy link

I can confirm this issue for me as well

@jgehrig
Copy link
Author

jgehrig commented Nov 29, 2021

Ping @TingluoHuang

A few users have expressed they also experience this issues. Can someone take a look?

I'm happy to help in any way:

  • Provide an example from my repository.
  • Provide a dummy-repository with a minimal repro case.
  • Something else?

Let me know if I can help. Thanks!

jgehrig added a commit to jgehrig/ctest-actions-bug that referenced this issue Mar 16, 2022
The CTest output is dropped by the GitHub Actions Runner. This is a
minimal repro case for demonstration.

See: actions/runner#1206
@jgehrig
Copy link
Author

jgehrig commented Mar 16, 2022

@TingluoHuang

Here is a minimal example of the issue:
https://github.com/jgehrig/ctest-actions-bug

How can we debug this issue? What can we do to get ctest failure output?

Let me know how I can help.

@nikola-jokic nikola-jokic added the Runner Bug Bug fix scope to the runner label Jul 15, 2022
@silverqx
Copy link

I can confirm this, Ubuntu image works correctly, but Windows-2019 and Windows-2022 images don't.

The --output-on-failure and also the CTEST_OUTPUT_ON_FAILURE=1 doesn't work correctly which means the output is empty.

The problem will be somewhere between how the ctest is outputting this output (I don't know if it's outputting to the stderr or stdout) and how the runner catches this output.
It works correctly on my desktop with the pwsh 7.

My current workaround for this is the following snippet:

- name: Setup tmate session
  if: ${{ failure() }}
  uses: mxschmitt/action-tmate@v3
  with:
    limit-access-to-actor: true
- run: exit 1

And then invoke the ctest --output-on-failure manually in the tmate session.

@oblivioncth
Copy link

oblivioncth commented Jun 28, 2023

Still an issue for me as well with the 2022 runner image and ctest -C Release --output-on-failure

Seems this was an issue for CLion in the past so I suppose there might be something slightly unusual about how ctest invokes the child test processes and configures their output.

EDIT
Interesting. This was briefly touched on by the OP but I did not realize this at first:

This issues seems to have nothing to do with CTest nor the arguments passed to it. Updating one of my workflow jobs to directly run a failing test executable like so (echo used as a sanity check)

echo "It's me, powershell"
./pxcrypt_tst_consistent_rng

Simply results in:

It's me, powershell
Error: Process completed with exit code 1.

The entirety of the test executable's output is dropped : /

@louis-langholtz
Copy link

louis-langholtz commented Jul 3, 2023

I'm seeing this issue too. See https://github.com/louis-langholtz/PlayRho/actions/runs/5426258835/jobs/9868070006?pr=483 for an example. Search for "Failed".

I'd even added the --verbose flag, hoping that might help get some signal on this (to ctest --output-on-failure --build-config Debug). It does appear more verbose, but still isn't showing any explanation for why the tests fail nor why output on that might not be getting logged.

Making matters worse, I cannot get the same tests to fail on my local Windows system but suspect that the failures indicate a real issue.

As a potentially helpful data point, this seems like a new problem - at least that I'm seeing for the project - in the sense that perhaps something changed on the runner or workflow environment within the last few months. I think I recall prior to that time, with different pull requests, seeing unit test failures on the Windows Debug runs that had the expected details.

@silverqx
Copy link

silverqx commented Jul 3, 2023

Making matters worse, I cannot get the same tests to fail on my local Windows system but suspect that the failures indicate a real issue.

@louis-langholtz I was exactly in the same situation a few months ago, on localhost ok but on GitHub Action was failing. The big problem was that there was no way for me to investigate why it was failing because of this bug.

This problem should be fixed because it is causing bad headaches 😵‍💫

For me the solution was the mxschmitt/action-tmate@v3, connect directly to the runner, invoke ctest manually, and obtain the output.

@louis-langholtz
Copy link

louis-langholtz commented Jul 3, 2023

I'm gonna call the mystery of dropping failure output that I was dealing with, solved. I don't believe in my case that the issue was due to the Windows runner as much as it has to do with the default Windows C++ STL runtime environment. No idea though whether what I discovered explains anyone else's missing test failure output.

In my particular case, it appears there's no output by default under Windows default std::terminate handler and that was getting invoked in my test runs for shared library builds of my project. This was due to an unexpected state that occurs causing an exception to get thrown during destruction which presumably invokes std::terminate.

Data points...

Tried adding 2>&1 as in:

ctest --output-on-failure --verbose --build-config Debug 2>&1

That seems to have made no difference.

Tried setting shell to bash as in adding

shell: bash

to the Test step. No difference I could see with that either.

Replaced --output-on-failure With Artifact

Removed:

  • --output-on-failure

Added:

  • continue-on-error: true
  • Archive test results step w/ name: test-output-${{ matrix.build_type }}-${{ matrix.platform }} for path: D:/a/PlayRho/PlayRho/Build/Testing/Temporary/LastTest.log.

Downloading the Debug test run results, I see the failures as expected, but still not the expected failure output info.

Failing test exiting status of 1/3

Windows C++ standard library has more checking support it turns on for debug builds that exercises and catches issues gcc/clang doesn't. I see now that at least the first test failing for my project is exiting with code 1 - an unsuccessful exit status. So perhaps what I'm encountering is that the runner isn't helping to recognize crashing/aborting tests in this context. Running the unit tests executable directly on the runner, I'm seeing it's exiting with status 3. That's what Windows uses as the abort status code.

Reproduced Locally

I was able to locally reproduce the issue I was seeing by building my library as a shared library like was occurring in the Windows runner environment. Building my library static, was otherwise the default and all my unit tests run fine in that configuration just like they have on the runner.

Catching SIGABRT

I installed a signal handler for SIGABRT which I found was being invoked for that signal. This confirms for me that the exit status of 3 I have seen is indeed from abort getting called.

Isolated The Abort As Occurring Within Stack Unwinding

Using the Visual Studio debugger, I was able to reproduce the aborts during stack unwinding. The library I'm working on does some work in a destructor in this context that was calling functionality which was throwing an exception. Removing the throw, made the aborts go away.

Installed Custom Terminate Handler

My test application had been using the pre-existing/pre-installed terminate handler. Setting my own terminate handler (via std::set_terminate) which writes to std::cerr, I'm able to see output that confirms the exception described above and its message as occurring in a terminating context. So presumably, the default Windows terminate handler is just calling std::abort and not showing a message like is done in LLVM's STL implementation that calls demangling_terminate_handler.

Source code for the Windows default terminate handler appears to be available under /c/Program Files (x86)/Windows Kits/10/Source. It just calls abort by default. And quietly cause program termination; unlike LLVM's implementation. Haven't checked GCC's implementation.

@silverqx
Copy link

silverqx commented Jul 4, 2023

We can cry here maybe forever, until it will be reported at CMake issue tracker, than maybe somebody fixes it, and this could be why this problem exists so long unfixed. I wrote it like mental, but it's early morning and I'm still in half sleep 🙂

@louis-langholtz
Copy link

louis-langholtz commented Jul 4, 2023

@silverqx,

I just noticed that @oblivioncth earlier reported that:

This issues seems to have nothing to do with CTest nor the arguments passed to it.

Also @jgehrig reported that:

The only answer I can come up with is output is being dropped by runner.

@silverqx
Copy link

silverqx commented Jul 4, 2023

I don't think that GitHub Actions are doing anything special than redirecting stderr and stdout outputs, I think the problem will be on CMake's side but I'm just guessing, the problem can be anywhere between or on any side.

Maybe the problem is the CMake is outputting to some other output or pipe channel than the GitHub Actions are expecting. I had a similar problem with the scoop, it was outputting to some info output (7) so after the redirection I had an empty log, but it was pwsh related.

louis-langholtz added a commit to louis-langholtz/PlayRho that referenced this issue Jul 5, 2023
Restores windows workflow to a closer semblance of what it used to be.
Seems there's an issue with the github runner for windows that's not
passing along the ctest failure info. See:
actions/runner#1206
@louis-langholtz
Copy link

louis-langholtz commented Jul 5, 2023

Update: I just opened a discussion in the community forum for this at https://github.com/orgs/community/discussions/60149. I'd searched for any similar discussions, but found none. Which is odd seeming to me. I mean why might we be the only ones seeing this problem? Or why is my project only recently seeing this problem on Windows when @jgehrig seems to have first seen this in 2021?

Update 2023/07/16: I've closed that discussion. I've tracked down the problem I've been seeing to the pre-installed terminate handler; I don't believe that Windows runner would itself be customizing that handler. See my other comment for explanation of what was occurring in my particular situation.

louis-langholtz added a commit to louis-langholtz/PlayRho that referenced this issue Jul 14, 2023
Restores windows workflow to a closer semblance of what it used to be.
Seems there's an issue with the github runner for windows that's not
passing along the ctest failure info. See:
actions/runner#1206
louis-langholtz added a commit to louis-langholtz/PlayRho that referenced this issue Jul 15, 2023
- Updates to windows-2022 runner and accompanying MSVC updates.
- Adds step to archive build results.

There may be an issue with windows that's not showing info like from
exception messages that terminate the application.
See: actions/runner#1206
A particular issue that was found under windows, were aborts generating
SIGABRT that apparently were due to exceptions propogating out from
a destructor without any output messages containing any of the exception
text info.
louis-langholtz added a commit to louis-langholtz/PlayRho that referenced this issue Jul 21, 2023
- Updates to windows-2022 runner and accompanying MSVC updates.
- Adds step to archive build results.

There may be an issue with windows that's not showing info like from
exception messages that terminate the application.
See: actions/runner#1206
A particular issue that was found under windows, were aborts generating
SIGABRT that apparently were due to exceptions propogating out from
a destructor without any output messages containing any of the exception
text info.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Runner Bug Bug fix scope to the runner
Projects
None yet
Development

No branches or pull requests

8 participants