Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include phase reporting in log file for ssh-ng builds #9280

Merged

Conversation

r-vdp
Copy link
Contributor

@r-vdp r-vdp commented Nov 2, 2023

Motivation

When doing local builds, we get phase reporting lines in the log file, they look like

@nix {"action":"setPhase","phase":"unpackPhase"}

With the ssh-ng protocol, we do have access to these messages, but since we are only including messages of type resBuildLogLine in the logs, the phase information does not end up in the log file.

The phase reporting could probably be improved altoghether (it looks like it is kind of accidental that these JSON messages for phase reporting show up but others don't, just because they are actually emitted by nixpkgs' stdenv), but as a first step I propose to make ssh-ng behave in the same way as local builds do.

I looked at legacy ssh remote builds as well, but fixing those would take some more effort and I'm not sure that it's worth it.

Context

Priorities

Add 👍 to pull requests you find important.

@r-vdp r-vdp force-pushed the rvdp/fix_remote_logging_phase_reporting branch from a39f1d2 to d032675 Compare November 2, 2023 16:37
@r-vdp r-vdp force-pushed the rvdp/fix_remote_logging_phase_reporting branch 2 times, most recently from c5e5367 to 694cc2d Compare November 3, 2023 10:08
@github-actions github-actions bot added documentation new-cli Relating to the "nix" command with-tests Issues related to testing. PRs with tests have some priority store Issues and pull requests concerning the Nix store fetching Networking with the outside (non-Nix) world labels Nov 3, 2023
@r-vdp r-vdp force-pushed the rvdp/fix_remote_logging_phase_reporting branch from 694cc2d to 17f6204 Compare November 3, 2023 10:09
@github-actions github-actions bot removed documentation new-cli Relating to the "nix" command with-tests Issues related to testing. PRs with tests have some priority store Issues and pull requests concerning the Nix store fetching Networking with the outside (non-Nix) world labels Nov 3, 2023
When doing local builds, we get phase reporting lines in the log file,
they look like '@nix {"action":"setPhase","phase":"unpackPhase"}'.
With the ssh-ng protocol, we do have access to these messages, but since we
are only including messages of type resBuildLogLine in the logs, the phase
information does not end up in the log file.

The phase reporting could probably be improved altoghether (it looks like it
is kind of accidental that these JSON messages for phase reporting show up
but others don't, just because they are actually emitted by nixpkgs' stdenv),
but as a first step I propose to make ssh-ng behave in the same way as local builds do.
@r-vdp r-vdp force-pushed the rvdp/fix_remote_logging_phase_reporting branch from 17f6204 to 60b3639 Compare November 3, 2023 11:30
@fricklerhandwerk
Copy link
Contributor

fricklerhandwerk commented Nov 4, 2023

With the ssh-ng protocol, we do have access to these messages, but since we are only including messages of type resBuildLogLine in the logs, the phase information does not end up in the log file.

Without knowing any details, it seems weird to me that there is any distinction of log messages to begin with. I would have assumed stdout and stderr from the derivation builder are simply captured as they are. Why would Nix care about the concrete message contents?

What I'm trying to say is that I wonder which exact problem this change is trying to solve, and that we should clarify that and make sure it's the right one.

@r-vdp
Copy link
Contributor Author

r-vdp commented Nov 6, 2023

With the ssh-ng protocol, we do have access to these messages, but since we are only including messages of type resBuildLogLine in the logs, the phase information does not end up in the log file.

Without knowing any details, it seems weird to me that there is any distinction of log messages to begin with. I would have assumed stdout and stderr from the derivation builder are simply captured as they are. Why would Nix care about the concrete message contents?

What I'm trying to say is that I wonder which exact problem this change is trying to solve, and that we should clarify that and make sure it's the right one.

The message types are part of the remote build protocol, which uses JSON to communicate between the remote builder and the machine initiating the build. One of those message types contains the stdout and stderr output indeed, but there are other things that are communicated between the two machines, which is why we have other types of messages as well. Those other messages are not generated from the build output, they are generated by nix directly.

There is one big exception to this (AFAIK), which is the JSON messages signaling phase starts. Phases are a nixpkgs stdenv concept, but this concept partly leaked into nix. Even though nix is mostly unaware of phases.
However, there is a type of JSON message to signal phase starts, which is used for instance for the progress bar in nix3 commands.

The nixpkgs stdenv does not have a way of sending messages through a separate channel to the nix daemon though, and so it abuses the fact that when nix looks at the build log, and sees a message that looks like the nix JSON format (something like @nix { ... }, it interpretes that as a protocol message and not as normal build output. So the stdenv generates such lines as part of the build output and those are then intercepted by nix.

All of this is rather tricky and it would probably be a good idea at some point to give a build script (like the stdenv builder) a separate channel to communicate this type of information back to nix. But that would be a big change that would probably require more upfront discussion to agree on the desired design.

What this PR is trying to solve, is an incoherence between local builds and remote ssh-ng builds:

  1. For local builds, all build output generated by the build script including such JSON messages for phase signaling are written to the build log on disk as-is (but they are not included in the stderr of the nix build command).
  2. For remote ssh-ng builds however, the type confusion happens on the remote builder and so those JSON messages generated by stdenv are sent to the initiating machine as a different type of message, not a normal build output line, and therefore they do not end up in the log file on disk.

This is annoying, because whether we want to eventually redesign all of this or not, currently we have a difference in behaviour between remote and local builds, even though I think it would be fair to assume that the build log for a local vs a remote build should be the same.

So in order to align the behaviour, this PR proposes to re-insert those messages, that were originally written to stderr by the nixpkgs stdenv, into the build log that's written to disk, so that we get exactly the same build log.

@fricklerhandwerk
Copy link
Contributor

Thank you so much for this detailed explanation! That should be in the relevant docstring almost verbatim – it's not prominent enough in the commit message. I'll merge a PR without nitpicking.

The situation makes my toe nails curl up. And it feels very wrong to further cement it by encoding even more stdenv-specific knowledge into the deepest internals of Nix.

I'm strongly against this change, because I fear it will a major headache for whoever will touch that code, a few months or years in the future. It does not make ssh-ng behave like local builds at all, only makes it look like it. I don't think we should support such a hack. This should not be a concern at all at the protocol level. If @nixos/nix-team still really badly wants that in for some reason, then only if that chunk of code is plastered with apologies and lots of useful suggestions how to get out of that mess again.

nix build can still be changed, the progress indicator output can still be changed. All of this is superficial anyway. The remote builder protocol can at least be somewhat contained, and surely it will be annoying at best to transition to a different one that supports encoding message types outside of their contents. But that's the stuff we should be focusing on in my opinion, more than anything else.

@r-vdp
Copy link
Contributor Author

r-vdp commented Nov 8, 2023

I don't think I agree with the analysis. We are not further cementing anything, this is a pretty minor change that will simply include a couple of additional lines in the log file.

And I didn't claim that we should make ssh-ng behave like local builds, which is difficult for several reasons, but we can try and make the fact that a derivation was built remotely as transparent as possible, and this PR is removing one observable inconsistency in the resulting build artifacts.

I do agree that the coupling of nix to the nixpkgs stdenv is not ideal, but I don't think that this PR makes it any harder to refactor that in the future.

{"action", "setPhase"},
{"phase", phase}
});
(*logSink)("@nix " + logLine.dump(-1, ' ', false, nlohmann::json::error_handler_t::replace) + "\n");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't

(*logSink)(data);

work as well? Since it seems like we're just printing the JSON message that we just parsed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah no, it's not the same format. We want to print the JSON message as it was emitted by the build script, which looks like @nix { "action": "setPhase", "phase": "unpackPhase" }, but what we receive in DerivationGoal::handleChildOutput looks like @nix {"action":"result","fields":["unpackPhase"],"id":810782451302409,"type":104}.
That's why we parse the internal JSON message and then write it out in the same format as the build script.

Copy link
Member

@edolstra edolstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Longer term, it would be better to have a way to tell a remote builder to pass through the raw build output verbatim and disable it from interpreting the @nix messages itself. But this PR looks good to me in the interim.

@thufschmitt thufschmitt merged commit b1e7d7c into NixOS:master Nov 17, 2023
8 checks passed
@r-vdp r-vdp deleted the rvdp/fix_remote_logging_phase_reporting branch November 17, 2023 13:48
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-11-17-nix-team-meeting-minutes-104/35753/1

@fricklerhandwerk fricklerhandwerk added the error-messages Confusing messages and better diagnostics label Jan 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
error-messages Confusing messages and better diagnostics
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

6 participants