Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Colcon does not show failed build output if using Ninja generator #67

Closed
rotu opened this issue Mar 13, 2020 · 22 comments
Closed

Colcon does not show failed build output if using Ninja generator #67

rotu opened this issue Mar 13, 2020 · 22 comments
Labels
question Further information is requested

Comments

@rotu
Copy link
Contributor

rotu commented Mar 13, 2020

When using Ninja as CMake's build system generator, colcon does not show error information:

colcon build --cmake-args " -GNinja" --packages-select rmw_cyclonedds_cpp
Starting >>> rmw_cyclonedds_cpp
Failed   <<< rmw_cyclonedds_cpp	[ Exited with code 1 ] 

Summary: 0 packages finished [3.15s]
  1 package failed: rmw_cyclonedds_cpp

These errors do show up in build logs:

build_2020-03-13_15-32-21.zip

@dirk-thomas dirk-thomas added the question Further information is requested label Mar 13, 2020
@dirk-thomas
Copy link
Member

Does Ninja output the error messages to stderr? If not, colcon doesn't know about them and by default hides stdout output.

@rotu
Copy link
Contributor Author

rotu commented Mar 13, 2020

As you can see in the attached log files, Ninja redirects the stderr of subprocesses to its stdout.

@dirk-thomas
Copy link
Member

dirk-thomas commented Mar 13, 2020

This seems similar to colcon/colcon-core#236. There would need to be custom logic to redirect the specific Ninja error message to stderr. Otherwise colcon can't do anything about it.

So the question is are you interested in implementing custom logic for this? If yes, I would be happy to review a pull request.

@rotu
Copy link
Contributor Author

rotu commented Mar 14, 2020

Oof. I just looked into what it would take. The assumption that "the only feedback from build tools the user is immediately interested in is stderr" seems buried deep in the architecture of colcon_core.subprocess.run/check_output. So this is going to be less trivial that I assumed.

Before I start:

  1. Why does colcon run subprocesses in pseudoterminals?
  2. How does colcon prevent stderr output from subprocesses being interleaved in its own stdout? It seems like there are no precautions.

@dirk-thomas
Copy link
Member

Why does colcon run subprocesses in pseudoterminals?

Because invoked tools operate differently when being invoked in a shell / pseudoterminal rather than without. One example is colorization of compiler warnings. Without using a pseudoterminal the compiler coming through e.g. make is not colorized anymore.

@dirk-thomas
Copy link
Member

How does colcon prevent stderr output from subprocesses being interleaved in its own stdout? It seems like there are no precautions.

There are only two possible options:

  1. redirect both streams to a single stream and with that loose the ability to distinguish what output came via stdout and stderr or
  2. capture both streams separately in order to handle them differently but with the downside that there is no way to synchronize them - the best you can do is read / handle them as fast as possible to introduce as little as possible "order change".

@rotu
Copy link
Contributor Author

rotu commented Mar 25, 2020

Because invoked tools operate differently when being invoked in a shell / pseudoterminal rather than without. One example is colorization of compiler warnings. Without using a pseudoterminal the compiler coming through e.g. make is not colorized anymore.

Okay. So $MAKE_TERMOUT and $MAKE_TERMERR would fix the output for Make, if the output of Colcon is indeed going to a terminal and not a log file. It would also work on Windows, where PTYs are not supported. Is there another place this breaks functionality?

@dirk-thomas
Copy link
Member

So $MAKE_TERMOUT and $MAKE_TERMERR would fix the output for Make,

What about any other invoked command which might have shell/pseudoterminal conditional behavior? The current approach of using PTYs works for all of them - I don't see the reason to replace it with an alternative approach which as in this case is tool specific and we don't know for which other tools we would need something similar.

@rotu
Copy link
Contributor Author

rotu commented Mar 25, 2020

Better yet, CLICOLOR_FORCE and CLICOLOR, which CMake respects already and should pass on to makefile.

The reason I propose this is because the pty code is fragile. It uses limited resources (pty's are finite), it prevents the process from exiting (this patch makes ctrl-c much more responsive), it causes hangs due to async issues (see comment in code about having to put in an endline to work around bugs), and it makes flow control weird (when a process ends, it should release its resources as it unwinds the stack, but because pty's are conditionally used, there isn't a finally block to put this cleanup in).

@dirk-thomas
Copy link
Member

Better yet, CLICOLOR_FORCE and CLICOLOR, which CMake respects already and should pass on to makefile.

And this that doesn't address the various other tools invoked: e.g. bazel, gradle, etc.,

@rotu
Copy link
Contributor Author

rotu commented Mar 25, 2020

I think I'll need to ponder on this. While using a pty will make Bazel use color, it might also make it use curses, which seems like an accident. It also seems like a bad idea in Gradle to use rich output until this is resolved gradle/gradle#6843

@rotu
Copy link
Contributor Author

rotu commented Mar 25, 2020

We also don’t want build tools to see that a terminal is interactive and wait for user input from a user who never sees the prompt (that may be why CMake ExternalProjects can take so long before failing... because they’re waiting on interactive input)

@dirk-thomas
Copy link
Member

dirk-thomas commented Mar 26, 2020

The check_call() API already exposes the use_pty option. I would suggest to update code which uses that API (e.g. the CMake build task) to explicitly pass False while also setting the necessary environment variables to ensure the current behavior is maintained. Once all existing code has been migrated to use use_pty=False the option can be deprecated.

@dirk-thomas
Copy link
Member

@rotu Since the discussion has been going a bit of topic (about use_pty) what about the topic in the title of the ticket and my original question about how to proceed?

This seems similar to colcon/colcon-core#236. There would need to be custom logic to redirect the specific Ninja error message to stderr. Otherwise colcon can't do anything about it.

So the question is are you interested in implementing custom logic for this? If yes, I would be happy to review a pull request.

@rotu
Copy link
Contributor Author

rotu commented Apr 1, 2020

I don't know the right approach to fixing this but I don't think it should be special treatment for Ninja. I think the best solution would be, when a subprocess fails, to print a postmortem with both the subprocess's stdout and stderr.

I'm not comfortable doing it before use_pty goes away because I think that code is too fragile.

@dirk-thomas
Copy link
Member

I think the best solution would be, when a subprocess fails, to print a postmortem with both the subprocess's stdout and stderr.

Atm when a task fails the stderr is being shown which seems to be the information you would want to see (rather than all of stdout which can be very verbose). That approach works very well in many use cases - not for Ninja obviously.

I don't think the behavior should change to accommodate the fact that Ninja doesn't print error message to stderr. Maybe reporting this upstream at the Ninja project might be an option.

@dirk-thomas
Copy link
Member

Te question about the Ninja output (which doesn't print error message to stderr, same as e.g. msbuild) has been answered and needs to be either:

  • addressed upstream in Ninja or
  • handled with custom logic to extract errors from Ninja (which I don't think is something we should do since it is likely fragile)

The pty discussion was off topic anyway. Please feel free to follow up on those independently.

So I am going to close the ticket since there is nothing to be done in this repo atm. @rotu Please post a link to an upstream ticket as soon as you have created one so future readers can check on the status.

@tylerjw
Copy link

tylerjw commented Jul 8, 2020

Sadly it looks like the ninja project itself considers "feature" to be not-useful: ninja-build/ninja#1537

Their reasoning is that the error message is not useful without the context of what command caused it and therefore if you care about why it failed, you also care about what command was called when it failed.

The suggested solution there is this:

NINJA_STATUS="[ninja][%f/%t]
filter() {
  local l n
  while read -r l; do
    n=${l#\[ninja\]}
    [ "x$l" != "x$n" ] && printf "%s\n" "$n" || printf "%s\n" "$l" >&2
  done
}

NINJA_STATUS="[ninja][%f/%t] " ninja | filter

I don't know how difficult this would be to implement (here) but it is simple enough. I'll see if I can figure out how to add this logic.

@rotu
Copy link
Contributor Author

rotu commented Sep 1, 2020

This ninja PR seems like we could hook into it to get failure output: ninja-build/ninja#1600

@alecGraves
Copy link

BTW, if anyone is looking for a way to use ninja builds and still see build errors, colcon can output stdout for packages during a build:

colcon build --event-handlers console_cohesion+

or for immediate build output (lines interleaved for parallel builds):

colcon build --event-handlers console_direct+

Taken from: https://colcon.readthedocs.io/en/released/user/how-to.html
Also mentioned here: https://answers.ros.org/question/363112/how-to-see-compiler-invocation-in-colcon-build/

@rsarrazin2
Copy link

@alecGraves I took the liberty of referencing your workaround in How can I make colcon output a plain CMake project's output with the ninja generator?. If you have a stackoverflow account I'd be happy to tag you explicitly.

@Ryanf55
Copy link

Ryanf55 commented Feb 10, 2024

Hi there,

Would to colcon team accept a contribution that enables event-handlers=console_cohesion+ when CMAKE_GENERATOR=Ninja?
We switched to using Ninja because it was faster on most of our builds, but if you forget to add this and the build fails, you don't get the output.

I also don't think this issue is solved; the workaround is not a solution as this behavior is not expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Development

No branches or pull requests

6 participants