Removing the reliance on netcat #374

Closed
evanfarrar opened this issue Dec 21, 2017 · 17 comments
@evanfarrar
Member

The functionality we need from netcat is almost identical to what the Go networking standard library provides, yet when users supply a proxy to bosh ssh, the CLI shells out to ssh, which in turn shells out to nc.

Unfortunately, netcat is a very old and storied program, and as a result it is not consistent which flavor of nc a user will have installed on their system. Additionally, though Windows users can now be expected to have a reasonable version of ssh, it is still uncommon for them to have nc.

I propose that we make a new bosh subcommand, bosh nc, and when a SOCKS5 proxy is supplied to bosh ssh then this command is supplied as the ProxyCommand to OpenSSH instead of nc. We could reflect on what the name of the command we used for bosh ssh was (e.g. bosh or bosh2), and consistently use that same invocation name for ProxyCommand.
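A minimal sketch of how that ProxyCommand could be assembled, assuming the CLI takes its own invocation name from os.Args[0]. The nc subcommand and its -x flag are the proposal above, not an existing bosh command, and proxyCommandOption is a hypothetical helper. Note that %h and %p have to be escaped as %%h and %%p when the string passes through fmt; otherwise fmt prints them as %!h(MISSING) and %!p(MISSING), which looks like what appears in the error output quoted in this thread.

```go
package main

import (
	"fmt"
	"os"
)

// proxyCommandOption builds the value handed to `ssh -o`.
// self is the name the CLI was invoked as (for example "bosh" or "bosh2").
// The "nc" subcommand and its -x flag are hypothetical: they are the proposal
// in this issue, not something the CLI is known to ship.
func proxyCommandOption(self, socksAddr string) string {
	// %%h and %%p are escaped so fmt emits the literal OpenSSH tokens %h and %p,
	// which ssh itself expands to the target host and port.
	return fmt.Sprintf("ProxyCommand=%s nc -x %s %%h %%p", self, socksAddr)
}

func main() {
	// os.Args[0] reflects how this process was started, so a binary installed
	// as bosh2 would ask ssh to call bosh2 again rather than a missing "bosh".
	fmt.Println(proxyCommandOption(os.Args[0], "localhost:56164"))
}
```

The resulting string would be passed to ssh as a single -o argument, in the same position where nc -x is supplied today.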

@cppforlife
Contributor

cppforlife commented Dec 21, 2017 via email

@evanfarrar
Member Author

evanfarrar commented Dec 21, 2017

https://cloudfoundry.slack.com/archives/C2DBC3YGZ/p1513631526000248

I believe this nc runs locally; it is the proxy client in this case:

Running SSH:
  1 error(s) occurred:

* Running command: 'ssh -tt -o ServerAliveInterval=30 -o ForwardAgent=no -o PasswordAuthentication=no -o IdentitiesOnly=yes -o IdentityFile=/Users/jtarchie/.bosh/tmp/ssh-priv-key848031265 -o StrictHostKeyChecking=yes -o UserKnownHostsFile=/Users/jtarchie/.bosh/tmp/ssh-known-hosts023177484 -o ProxyCommand=nc -x localhost:56164 %!h(MISSING) %!p(MISSING) 10.0.31.190 -l bosh_697c0a05ff9240a', stdout: '', stderr: '': exit status 255

Exit code 1

@evanfarrar
Member Author

I encountered this again today with a user. They preferred Windows over Linux for their own machine, so they stood up a jumpbox on GCP to run bbl from, and because they preferred CentOS over Ubuntu, it was a CentOS box. After deploying BOSH with bbl successfully, they attempted to deploy Concourse, and we attempted to debug it with bosh ssh web/0. We got: nc command not found. So, install it, right? The yum repos only contain nmap-ncat, which installs an alias for ncat as nc, so we installed it, and that also didn't work.

Technically we could introspect the help for netcat to figure out which flavor of netcat is installed, but there are THREE popular variants of netcat: BSD, nmap, and GNU. We're not doing a lot with netcat that Go couldn't do just as well, and in a way that is not just *nix agnostic but also platform agnostic.
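For illustration, here is a rough sketch of what introspecting the help text could look like. The marker strings are assumptions based on output commonly printed by these variants (for example by nc -h or nc --version); they are not verified against every build, and classifyNetcat is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// classifyNetcat guesses the netcat flavor from help or version text.
// The substrings checked here are assumptions about typical output and
// may need adjusting for a particular distribution.
func classifyNetcat(helpText string) string {
	t := strings.ToLower(helpText)
	switch {
	case strings.Contains(t, "nmap"):
		return "nmap"
	case strings.Contains(t, "gnu netcat"):
		return "gnu"
	case strings.Contains(t, "openbsd") || strings.Contains(t, "-x proxy_address"):
		return "bsd"
	default:
		return "unknown"
	}
}

func main() {
	// Sample strings only; a real check would capture the output of the nc on PATH.
	samples := []string{
		"Ncat 7.50 ( https://nmap.org/ncat )",
		"netcat (The GNU Netcat) 0.7.1",
		"usage: nc [-46] [-x proxy_address[:port]] [destination] [port]",
	}
	for _, s := range samples {
		fmt.Printf("%-6s <- %s\n", classifyNetcat(s), s)
	}
}
```

Even in this small form the heuristic is brittle, which is part of the argument for doing the proxying in Go instead.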

I'd even be open to making the command flags the same, so that bosh nc -x localhost:51234 remotehost 22 works, then not changing a thing about bosh ssh but instead documenting the fact that "you can run alias nc='bosh nc' if you don't have the BSD flavor of netcat installed".

Would you accept this as a pull request?

@cppforlife
Contributor

cppforlife commented Jan 24, 2018 via email

@cppforlife
Contributor

cc @genevieve this is another topic we should discuss

@evanfarrar
Member Author

Given that I was totally stumped by this for days, I'm pretty motivated to solve this.

Here is the story I want to prioritize for our team:

GIVEN I have a jumpbox and director created by bbl
AND I have installed bosh as mybosh
AND I have a job named router running at 10.0.16.13
AND I have a tunnel created with ssh -D 56789 -f -C -q -N -p 22 jumpbox@bbl jumpbox-address -i /path/to/jumpbox.key
WHEN I run mybosh nc --proxy-host 127.0.0.1 --proxy-port 56789 10.0.16.13:22
THEN I should see SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.1
AND WHEN I run BOSH_ALL_PROXY=socks5://localhost:56789 mybosh ssh router/0
THEN I should see Welcome to Ubuntu 14.04.5 LTS
(which implies that mybosh nc --proxy-host 127.0.0.1 --proxy-port 56789 10.0.16.13:22 was supplied as the ProxyCommand for ssh)

Some design reasoning:

  • it works the same on any platform Go supports
  • it mostly tries to conform to the CLI of "netcat" in case we need to reimplement more bits of the functionality of netcat later
  • but it also tries to conform to what appears to be the BOSH cli convention of <verb> [options] <object>
  • It eliminates a soft dependency that is awkward to explain
  • It still uses OpenSSH, which we still rely on very heavily for bosh ssh
  • It opens up a little more room for helpful error messages for a very difficult to debug portion of the CLI, even though we don't control SSH

We will likely PR first, then ask to cross-team pair with BOSH if we need to rebase or refactor. This does not block us in any way, so we won't really be worrying about when this release is cut. It just eliminates a soft dependency of the BOSH CLI that is awkward to document, that almost always works for core devs, and that almost never works well for users trying to use BOSH by themselves.

This would also resolve #328

@krishicks
Contributor

I ran into an issue today with nc not working on a Debian box. bosh ssh wants to provide -x to nc, but that isn't available on my version of netcat:

Running command: 'ssh -tt -o ServerAliveInterval=30 -o ForwardAgent=no -o PasswordAuthentication=no -o IdentitiesOnly=yes -o IdentityFile=/home/hicks/.bosh/tmp/ssh-priv-key997237017 -o StrictHostKeyChecking=yes -o UserKnownHostsFile=/home/hicks/.bosh/tmp/ssh-known-hosts704100772 -o ProxyCommand=nc -x localhost:25555 %!h(MISSING) %!p(MISSING) 10.150.1.2 -l bosh_9c1363f41bcf4af', stdout: '', stderr: '': exit status 255

I tried this with bosh-cli 5.4.0.

@Houlistonm

I'd like to add to this issue.
I'm on an Amazon-Linux Bastion/Jumpbox that doesn't have a BSD compatible version of nc available.
When I do this

om --version
3.0.0
eval "$( om -e env.yml bosh-env -i om_key.pem )"
it sets a bunch of env variables for me
export BOSH_ALL_PROXY=ssh+socks5://ubuntu@my_foundry.com:22?private-key=/home/ec2-user/om_key.pem
export BOSH_CLIENT=ops_manager
export BOSH_CLIENT_SECRET=<SECRET REDACTED>
export BOSH_ENVIRONMENT=<IP REDACTED>
export BOSH_CA_CERT=<CERT REDACTED>

Other bosh commands work (vms, instances, etc.), but bosh ssh fails:

bosh -d <DEPLOYMENT> ssh <INSTANCE>

Running SSH:
  1 error occurred:
        * Running command: 'ssh -tt -o ServerAliveInterval=30 -o ForwardAgent=no -o PasswordAuthentication=no -o IdentitiesOnly=yes -o IdentityFile=/home/ec2-user/.bosh/tmp/ssh-priv-key771013002 -o StrictHostKeyChecking=yes -o UserKnownHostsFile=/home/ec2-user/.bosh/tmp/ssh-known-hosts217309025 -o ProxyCommand=nc -x 127.0.0.1:36103 %!h(MISSING) %!p(MISSING) <IP_REDACTED> -l bosh_b74b80fcfeb6433', stdout: '', stderr: '': exit status 255

We have a workaround for the failure: set three new variables and override another when invoking ssh.

eval "$( om -e env.yml bosh-env -i om_key.pem )"

# BOSH_ALL_PROXY=ssh+socks5://ubuntu@my_foundry.com:22?private-key=/home/ec2-user/om_key.pem
# The user sits between "://" and "@"; the host sits between "@" and the ":" before the port.
export BOSH_GW_USER=$( grep -Po '(?<=://)([^@]+)(?=@)' <<< "$BOSH_ALL_PROXY" )
export BOSH_GW_HOST=$( grep -Po '(?<=@)([^:]+)(?=:)' <<< "$BOSH_ALL_PROXY" )
export BOSH_GW_PRIVATE_KEY=$( grep -Po '(?<=key=)(.*)' <<< "$BOSH_ALL_PROXY" )

Now this command works.

bosh -d <DEPLOYMENT> ssh <INSTANCE> --gw-socks5=
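The same extraction can be sketched in Go with net/url instead of grep lookarounds. This is only an illustration using the example URL from this comment; the gateway type and parseAllProxy are hypothetical names and are not part of the bosh CLI.

```go
package main

import (
	"fmt"
	"net/url"
)

// gateway holds the pieces the BOSH_GW_* variables expect.
type gateway struct {
	User, Host, PrivateKey string
}

// parseAllProxy splits a BOSH_ALL_PROXY style URL into user, host and key path.
func parseAllProxy(raw string) (gateway, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return gateway{}, err
	}
	if u.User == nil || u.Hostname() == "" {
		return gateway{}, fmt.Errorf("proxy URL is missing a user or host")
	}
	return gateway{
		User:       u.User.Username(),
		Host:       u.Hostname(), // host without the :22 port suffix
		PrivateKey: u.Query().Get("private-key"),
	}, nil
}

func main() {
	gw, err := parseAllProxy("ssh+socks5://ubuntu@my_foundry.com:22?private-key=/home/ec2-user/om_key.pem")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("export BOSH_GW_USER=%s\n", gw.User)
	fmt.Printf("export BOSH_GW_HOST=%s\n", gw.Host)
	fmt.Printf("export BOSH_GW_PRIVATE_KEY=%s\n", gw.PrivateKey)
}
```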

I would like to have the reliance on nc removed (most desirable) OR have bosh ssh updated to support the other nc variants.

@RoboMWM

RoboMWM commented Feb 18, 2020

Any progress on this? bosh ssh is impossible on our Windows clients, and it's a bit difficult to find a Windows version of netcat that an antivirus doesn't complain about...

@bosh-admin-bot

This issue was marked as Stale because it has been open for 21 days without any activity. If no activity takes place in the coming 7 days, it will automatically be closed. To prevent this from happening, remove the Stale label or comment below.

@Houlistonm

I still believe this is a valid request.

@bosh-admin-bot

This issue was marked as Stale because it has been open for 21 days without any activity. If no activity takes place in the coming 7 days, it will automatically be closed. To prevent this from happening, remove the Stale label or comment below.

@bosh-admin-bot

This issue was closed because it has been labeled Stale for 7 days without subsequent activity. Feel free to re-open this issue at any time by commenting below.

@RoboMWM

RoboMWM commented Aug 31, 2021

stale bots succ

@jpalermo
Member

jpalermo commented Nov 4, 2021

For reference, this does still seem to be a problem without a lot of great solutions. Unfortunately it doesn't cause problems for most users.

But if somebody wants to take a crack at fixing it, we'd be more than happy to review a PR and provide any guidance we can.

@risicle

risicle commented May 17, 2022

Multiple people on my team have run into this, but it's made worse by the fact that the error messages it results in are so opaque. Even adding a check for the netcat variant and emitting a helpful error message would improve things slightly.

@beyhan
Member

beyhan commented May 23, 2022

@risicle we will review a PR in case you would like to contribute this.


9 participants