Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ssh-ng: Set log-fd for ssh to 4 by default #7659

Closed
wants to merge 1 commit into from

Conversation

Ma27
Copy link
Member

@Ma27 Ma27 commented Jan 22, 2023

Motivation

That's expected by build-remote and makes sure that errors are correctly forwarded to the user.

For instance, let's say that the host-key of example.org is unknown and

nix-build ../nixpkgs -A hello -j0 --builders 'ssh-ng://example.org'

is issued, then you get the following, somewhat cryptic error:

error: cannot open connection to remote store 'ssh-ng://example.org': error: unexpected end-of-file

The relevant information (Host key verification failed) ends up in the daemon's log, but that's not very obvious considering that the daemon isn't very chatty normally.

This can be fixed - the same way as its done for legacy-ssh - by passing fd 4 to the SSH wrapper. Now you'd get the following error:

error: cannot open connection to remote store 'ssh-ng://example.org': error: unexpected end-of-file: Host key verification failed.

...and now it's clear what's wrong.

Context

See feefcb3 for the corresponding fix in ssh://.

cc @edolstra @thufschmitt

Checklist for maintainers

Maintainers: tick if completed or explain if not relevant

  • agreed on idea
  • agreed on implementation strategy
  • tests, as appropriate
    • functional tests - tests/**.sh
    • unit tests - src/*/tests
    • integration tests
  • documentation in the manual
  • code and comments are self-explanatory
  • commit message explains why the change was made
  • new feature or bug fix: updated release notes

@Ericson2314
Copy link
Member

On the other hand, I recall that this wasn't done before because ssh-ng doesn't "normally" use stderr, but instead interleaves log messages in stdout with a protocol. We indeed should have some solution for dealing with these "pre handshake established" issues, but it is an awkward businesses.

I am correct about that, it may make sense to coordinate with @balsoft who is working on that handshaking in #6628. As I said there, I hope we also get #5265 which will allow more robust remote servers long term.

@Ma27
Copy link
Member Author

Ma27 commented Jan 23, 2023

@Ericson2314 but doesn't build-remote need to pick up these messages anyways somehow? Not sure if I understand what you mean.

However: I agree that the solution may not be pretty, but I think that getting rid of that cryptic error message is a fairly useful improvement on its own :)

@Ericson2314
Copy link
Member

Ericson2314 commented Jan 23, 2023

@Ma27 so remember ssh-ng:// is the same protocol as unix://, but the later is just a single stream over a single file descriptor. So multiplexiing is necessary. Log messages only go on just stderrr before the connection is established, during the handshake.

@edolstra
Copy link
Member

I think this is wrong, because (IIRC) it will cause ssh errors/warnings to end up in the derivation log. The intent is that ssh's stderr goes to the hook's stderr, which gets processed in DerivationGoal::handleChildOutput(). But that function might be ignoring non-JSON messages at the moment.

@Ericson2314
Copy link
Member

@edolstra yeah if nothing else I find the SSH situation very complex and confusing. I hope we can merge your #5265 and also do #5025 (comment), and then we will have a baseline thing to test which is far simpler (less indirection, just 1 file descriptor, etc.)

@Ma27
Copy link
Member Author

Ma27 commented Jan 28, 2023

@edolstra not sure if I missed something, but I don't think so. After removing the relevant entries from ~root/.ssh/known_hosts I get the following error when e.g. trying to build grafana remotely:

❯ ~/Projects/nixpkgs master* → nix-build -A grafana -j0
this derivation will be built:
  /nix/store/d07khgr5bkj9gabll8q0qq8llbxs283b-grafana-9.3.6.drv
cannot build on 'ssh-ng://builder-ng': error: cannot open connection to remote store 'ssh-ng://builder-ng': error: unexpected end-of-file: Host key verification failed.
Failed to find a machine for remote build!
derivation: d07khgr5bkj9gabll8q0qq8llbxs283b-grafana-9.3.6.drv
required (system, features): (x86_64-linux, [])
1 available machines:
(systems, maxjobs, supportedFeatures, mandatoryFeatures)
([x86_64-linux], 10, [big-parallel, kvm, nixos-test], [])
error: unable to start any build; remote machines may not have all required system features.
       https://nixos.org/manual/nix/stable/advanced-topics/distributed-builds.html

However when I run nix log /nix/store/d07khgr5bkj9gabll8q0qq8llbxs283b-grafana-9.3.6.drv I get the log from the previous (failed) build. When I alter the derivation (and no build-attempt has happened), I get the following error when trying to do nix log:

❯ ~/Projects/nixpkgs master* → nix log /nix/store/v54lcx5rp0jbrlvvx3zpg9kwcj46h6s8-grafana-9.3.6.drv
error: build log of '/nix/store/v54lcx5rp0jbrlvvx3zpg9kwcj46h6s8-grafana-9.3.6.drv!*' is not available

@Ma27
Copy link
Member Author

Ma27 commented Feb 14, 2023

ping @edolstra did I do anything wrong with my example, AFAIU this doesn't write handshake errors into the derivation's log.

Log messages only go on just stderrr before the connection is established, during the handshake.

@Ericson2314 Well they do (for ssh-ng://) now because stderr is dup2-ed from logFD (4) which is where that ends up in build-remote? Or can you give me an example where this is problematic (sorry, haven't hacked too much in this area of Nix, so sorry for needing some time to understand 😅 )

@Ericson2314
Copy link
Member

@Ma27 I don't know all the details myself, but check out RemoteStore::Connection::processStderr. This is used both for the unix:// (including regular daemon) and ssh-ng:// remote stores.

@Ma27
Copy link
Member Author

Ma27 commented Feb 15, 2023

@Ericson2314 then this shouldn't be a problem, right? The from in RemoteStore::Connection::processStderr where messages are read from is the stdout of the SSH connection. This change only dup2()es stderr to 4, so messages on that fd shouldn't be affected at all.

Relevant code from ssh.cc:

        if (dup2(out.writeSide.get(), STDOUT_FILENO) == -1)
            throw SysError("duping over stdout");
        if (logFD != -1 && dup2(logFD, STDERR_FILENO) == -1)
            throw SysError("duping over stderr");

@Ericson2314
Copy link
Member

Ericson2314 commented Feb 15, 2023

@Ma27 the issue is not that this PR changes things, but that we have two sources of truth as to error messages -- stdout errors part of the protocol and the other fd out of band.

@Ma27
Copy link
Member Author

Ma27 commented Feb 16, 2023

@Ericson2314 ok, now we're talking 😄

I agree, this should be fixed properly, however I think that this is a usability issue IMHO, so we could merge this as workaround now and make it prettier in #6628?

@Ma27
Copy link
Member Author

Ma27 commented Feb 23, 2023

also gentle ping @edolstra for opinions :)

@Ericson2314
Copy link
Member

Yeah I am happy to defer to others about whether this is a good temporary solution.

@Ma27
Copy link
Member Author

Ma27 commented Feb 25, 2023

hmm, I guess it makes sense to get input from e.g. @NixOS/nix-team

@fricklerhandwerk
Copy link
Contributor

Put it on the project board for triage.

@fricklerhandwerk fricklerhandwerk added UX The way in which users interact with Nix. Higher level than UI. error-messages Confusing messages and better diagnostics labels Mar 3, 2023
@fricklerhandwerk
Copy link
Contributor

fricklerhandwerk commented Mar 22, 2023

Triaged in the Nix team meeting 2023-03-17:

Assigned to @edolstra for another review, asked for by @Ericson2314.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-03-17-nix-team-meeting-minutes-41/26614/1

That's expected by `build-remote` and makes sure that errors are
correctly forwarded to the user.

For instance, let's say that the host-key of `example.org` is unknown
and

    nix-build ../nixpkgs -A hello -j0 --builders 'ssh-ng://example.org'

is issued, then you get the following, somewhat cryptic error:

    error: cannot open connection to remote store 'ssh-ng://example.org': error: unexpected end-of-file

The relevant information (`Host key verification failed`) ends up in the
daemon's log, but that's not very obvious considering that the daemon
isn't very chatty normally.

This can be fixed - the same way as its done for legacy-ssh - by passing
fd 4 to the SSH wrapper. Now you'd get the following error:

    error: cannot open connection to remote store 'ssh-ng://example.org': error: unexpected end-of-file: Host key verification failed.

...and now it's clear what's wrong.
@github-actions github-actions bot added the store Issues and pull requests concerning the Nix store label Jun 18, 2023
@Ma27
Copy link
Member Author

Ma27 commented Jun 18, 2023

Rebased against latest master @fricklerhandwerk @Ericson2314 @edolstra

@Ma27
Copy link
Member Author

Ma27 commented Jul 25, 2023

friendly ping @edolstra

@Ma27
Copy link
Member Author

Ma27 commented Sep 24, 2023

gentle ping @edolstra @thufschmitt @fricklerhandwerk

@Ma27
Copy link
Member Author

Ma27 commented Jan 20, 2024

ping @NixOS/nix-team even though this may not be perfect, I still think it's a step forward in terms of useful error messages, so I'd really appreciate this being considered to be merged.

@edolstra edolstra removed their assignment Apr 26, 2024
@Ma27
Copy link
Member Author

Ma27 commented Apr 26, 2024

Closing due to lack of interest.

@Ma27 Ma27 closed this Apr 26, 2024
@Ma27 Ma27 deleted the fix-log-fd-for-ssh-ng branch April 26, 2024 20:42
lf- pushed a commit to lix-project/lix that referenced this pull request Apr 28, 2024
That's expected by `build-remote` and makes sure that errors are
correctly forwarded to the user. For instance, let's say that the
host-key of `example.org` is unknown and

    nix-build ../nixpkgs -A hello -j0 --builders 'ssh-ng://example.org'

is issued, then you get the following output:

    cannot build on 'ssh-ng://example.org?&': error: failed to start SSH connection to 'example.org'
    Failed to find a machine for remote build!
    derivation: yh46gakxq3kchrbihwxvpn5bmadcw90b-hello-2.12.1.drv
    required (system, features): (x86_64-linux, [])
    2 available machines:
    [...]

The relevant information (`Host key verification failed`) ends up in the
daemon's log, but that's not very obvious considering that the daemon
isn't very chatty normally.

This can be fixed - the same way as its done for legacy-ssh - by passing
fd 4 to the SSH wrapper. Now you'd get the following error:

    cannot build on 'ssh-ng://example.org': error: failed to start SSH connection to 'example.org': Host key verification failed.
    Failed to find a machine for remote build!
    [...]

...and now it's clear what's wrong.

Please note that this is won't end up in the derivation's log.

For previous discussion about this change see
NixOS/nix#7659.

Change-Id: I5790856dbf58e53ea3e63238b015ea06c347cf92
lf- pushed a commit to lix-project/lix that referenced this pull request May 4, 2024
That's expected by `build-remote` and makes sure that errors are
correctly forwarded to the user. For instance, let's say that the
host-key of `example.org` is unknown and

    nix-build ../nixpkgs -A hello -j0 --builders 'ssh-ng://example.org'

is issued, then you get the following output:

    cannot build on 'ssh-ng://example.org?&': error: failed to start SSH connection to 'example.org'
    Failed to find a machine for remote build!
    derivation: yh46gakxq3kchrbihwxvpn5bmadcw90b-hello-2.12.1.drv
    required (system, features): (x86_64-linux, [])
    2 available machines:
    [...]

The relevant information (`Host key verification failed`) ends up in the
daemon's log, but that's not very obvious considering that the daemon
isn't very chatty normally.

This can be fixed - the same way as its done for legacy-ssh - by passing
fd 4 to the SSH wrapper. Now you'd get the following error:

    cannot build on 'ssh-ng://example.org': error: failed to start SSH connection to 'example.org': Host key verification failed.
    Failed to find a machine for remote build!
    [...]

...and now it's clear what's wrong.

Please note that this is won't end up in the derivation's log.

For previous discussion about this change see
NixOS/nix#7659.

Change-Id: I5790856dbf58e53ea3e63238b015ea06c347cf92
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
error-messages Confusing messages and better diagnostics store Issues and pull requests concerning the Nix store UX The way in which users interact with Nix. Higher level than UI.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

5 participants