Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

curl ignores SSH_KNOWNHOSTS file #4953

Closed
duboism opened this issue Feb 19, 2020 · 11 comments
Closed

curl ignores SSH_KNOWNHOSTS file #4953

duboism opened this issue Feb 19, 2020 · 11 comments
Labels

Comments

@duboism
Copy link

@duboism duboism commented Feb 19, 2020

I did this

I ran the following script several times:

import pycurl


# Accept new hosts and add them to HOSTS_FILE
def accept_new_hosts(known_key, found_key, match):
    print((known_key, found_key, match))
    return pycurl.KHSTAT_FINE_ADD_TO_FILE


# Configure & download
crl = pycurl.Curl()
crl.setopt(pycurl.SSH_KNOWNHOSTS, "/tmp/known_hosts")
crl.setopt(pycurl.SSH_KEYFUNCTION, accept_new_hosts)

crl.setopt(pycurl.URL, "sftp://test.rebex.net/readme.txt")
crl.setopt(pycurl.USERPWD, "demo:password")

fp = open("/tmp/readme.txt", "wb")
crl.setopt(pycurl.WRITEFUNCTION, fp.write)

crl.perform()

# Re-read hosts file & print length
with open("/tmp/known_hosts") as f:
    print(len(f.readlines()))

I noticed that after each run the known hosts file (/tmp/known_hosts) get larger and the key is never found. For instance after 3 runs the file contains:

test.rebex.net ssh-rsa AAAAB3NzaC1yc2EAAAABJQAAAQEAkRM6RxDdi3uAGogR3nsQMpmt43X4WnwgMzs8VkwUCqikewxqk4U7EyUSOUeT3CoUNOtywrkNbH83e6/yQgzc3M8i/eDzYtXaNGcKyLfy3Ci6XOwiLLOx1z2AGvvTXln1RXtve+Tn1RTr1BhXVh2cUYbiuVtTWqbEgErT20n4GWD4wv7FhkDbLXNi8DX
07F9v7+jH67i0kyGm+E3rE+SaCMRo3zXE6VO+ijcm9HdVxfltQwOYLfuPXM2t5aUSfa96KJcA0I4RCMzA/8Dl9hXGfbWdbD2hK1ZQ1pLvvpNPPyKKjPZcMpOznprbg+jIlsZMWIHt7mq2OJXSdruhRrGzZw==
test.rebex.net ssh-rsa AAAAB3NzaC1yc2EAAAABJQAAAQEAkRM6RxDdi3uAGogR3nsQMpmt43X4WnwgMzs8VkwUCqikewxqk4U7EyUSOUeT3CoUNOtywrkNbH83e6/yQgzc3M8i/eDzYtXaNGcKyLfy3Ci6XOwiLLOx1z2AGvvTXln1RXtve+Tn1RTr1BhXVh2cUYbiuVtTWqbEgErT20n4GWD4wv7FhkDbLXNi8DX
07F9v7+jH67i0kyGm+E3rE+SaCMRo3zXE6VO+ijcm9HdVxfltQwOYLfuPXM2t5aUSfa96KJcA0I4RCMzA/8Dl9hXGfbWdbD2hK1ZQ1pLvvpNPPyKKjPZcMpOznprbg+jIlsZMWIHt7mq2OJXSdruhRrGzZw==
test.rebex.net ssh-rsa AAAAB3NzaC1yc2EAAAABJQAAAQEAkRM6RxDdi3uAGogR3nsQMpmt43X4WnwgMzs8VkwUCqikewxqk4U7EyUSOUeT3CoUNOtywrkNbH83e6/yQgzc3M8i/eDzYtXaNGcKyLfy3Ci6XOwiLLOx1z2AGvvTXln1RXtve+Tn1RTr1BhXVh2cUYbiuVtTWqbEgErT20n4GWD4wv7FhkDbLXNi8DX
07F9v7+jH67i0kyGm+E3rE+SaCMRo3zXE6VO+ijcm9HdVxfltQwOYLfuPXM2t5aUSfa96KJcA0I4RCMzA/8Dl9hXGfbWdbD2hK1ZQ1pLvvpNPPyKKjPZcMpOznprbg+jIlsZMWIHt7mq2OJXSdruhRrGzZw==

I expected the following

I expected the key to be found after first run and the known host file to contain one key.

curl/libcurl version

$ curl --version
curl 7.66.0 (x86_64-redhat-linux-gnu) libcurl/7.66.0 OpenSSL/1.1.1d-fips zlib/1.2.11 brotli/1.0.7 libidn2/2.3.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0
Release-Date: 2019-09-11
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS brotli GSS-API HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz Metalink NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP UnixSockets

operating system

$ uname -a
Linux xxx 5.4.19-200.fc31.x86_64 #1 SMP Wed Feb 12 15:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
@duboism

This comment has been minimized.

Copy link
Author

@duboism duboism commented Feb 19, 2020

In a discussion on the ML, @bagder suggested that the bug comes form libssh or libcurl.

Here are more information/tests:

  • Setting SSH_KNOWNHOST to ~/.ssh/known_hosts works: the ecdsa-sha2-nistp256 key is added (and not ssh-rsa).
  • Compiling curl with libssh2 (and reinstalling pycurl on top of it) solves the problem (it also uses ecdsa-sha2-nistp256). After using this version, the libssh based version (from Fedora) works.
@bagder bagder added the SCP/SFTP label Feb 19, 2020
@bagder

This comment has been minimized.

Copy link
Member

@bagder bagder commented Feb 19, 2020

ping @nmav, any immediate thoughts?

@nmav

This comment has been minimized.

Copy link
Contributor

@nmav nmav commented Feb 19, 2020

None unfortunately. Maybe @ansasaki or @cryptomilk have a better understanding of the issue.

@ansasaki

This comment has been minimized.

Copy link
Contributor

@ansasaki ansasaki commented Feb 20, 2020

I'll take a look. At first glance, I saw libcurl is using a deprecated API to check for known hosts (ssh_is_server_known() instead of the recommended ssh_session_is_known_server()). I'll investigate further if this is the cause of the bug.

@ansasaki

This comment has been minimized.

Copy link
Contributor

@ansasaki ansasaki commented Feb 21, 2020

@duboism Hello, could you try the patch I proposed on #4962?

You'll need libssh >= 0.9.0 for the behaviour to be improved.

But there is one more thing: libssh will append the entry to the know hosts file when ssh_write_knownhost() or ssh_session_update_known_hosts() are called without checking if the entry is already there. Whenever you return pycurl.KHSTAT_FINE_ADD_TO_FILE in the callback, the entry is appended to the known hosts file.

You should change your callback to check if the entry should or not be appended to the file and return your decision accordingly.

@duboism

This comment has been minimized.

Copy link
Author

@duboism duboism commented Feb 21, 2020

@ansasaki I have applied your patch against curl 7.68.0, compiled and installed it (Fedora 31 has libssh 0.9.3) and installed pycurl with it:

$ curl --version
curl 7.69.0-DEV (x86_64-pc-linux-gnu) libcurl/7.69.0-DEV OpenSSL/1.1.1d-fips zlib/1.2.11 libssh/0.9.3/openssl/zlib
Release-Date: [unreleased]
Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS HTTPS-proxy IPv6 Largefile libz NTLM NTLM_WB SSL TLS-SRP UnixSockets

If I understood correctly, the accept_new_hosts should be:

def accept_new_hosts(known_key, found_key, match):
    print((known_key, found_key, match))
    if match != pycurl.CURLKHMATCH_OK:
        return pycurl.KHSTAT_FINE_ADD_TO_FILE
    else:
        return pycurl.KHSTAT_FINE

With this setup, I have segmentation errors before perform.

@ansasaki

This comment has been minimized.

Copy link
Contributor

@ansasaki ansasaki commented Feb 21, 2020

@ansasaki I have applied your patch against curl 7.68.0, compiled and installed it (Fedora 31 has libssh 0.9.3) and installed pycurl with it:

$ curl --version
curl 7.69.0-DEV (x86_64-pc-linux-gnu) libcurl/7.69.0-DEV OpenSSL/1.1.1d-fips zlib/1.2.11 libssh/0.9.3/openssl/zlib
Release-Date: [unreleased]
Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS HTTPS-proxy IPv6 Largefile libz NTLM NTLM_WB SSL TLS-SRP UnixSockets

If I understood correctly, the accept_new_hosts should be:

def accept_new_hosts(known_key, found_key, match):
    print((known_key, found_key, match))
    if match != pycurl.CURLKHMATCH_OK:
        return pycurl.KHSTAT_FINE_ADD_TO_FILE
    else:
        return pycurl.KHSTAT_FINE

With this setup, I have segmentation errors before perform.

I'm looking into it because I get segmentation faults even without my changes with this new calback.

@fds242

This comment has been minimized.

Copy link

@fds242 fds242 commented Feb 23, 2020

The segfault looks like a pycurl issue to me. It crashes in pycurl code, and goes away when I replace CURLKHMATCH_OK with the numerical equivalent, 0:

if match != 0:

It should probably be reported to the pycurl developers.
It also appears to me that released versions of curl, including as shipped in Fedora 31, already function as expected here. Notice that on first run, when the key wasn't in the known hosts file yet, match had the value of 2 (CURLKHMATCH_MISSING). While on subsequent runs that changes to 0 (CURLKHMATCH_OK). So your add-or-don't-add logic can already work properly, as long as you manage to avoid that pycurl segfault. I'd definitely urge to reply with KHSTAT_REJECT on at least CURLKHMATCH_MISMATCH (numerical value 1).

The very welcome patch from @ansasaki removes deprecated libssh API use, and rectifies an old deficiency in that the "known key" was never previously supplied to the libcurl callback function. Ultimately that doesn't make a difference here though, as libssh was already correctly reading and verifying the key, and the result of that verification was also already passed along.

That libssh, like OpenSSH, can accept and verify any of the host keys it understands is a great benefit actually. Most ssh servers have a variety of host keys configured, one each for ssh-rsa, ecdsa-sha2-nistp256, ssh-ed25519, etc. Your known hosts file can list either single one of them, and it will be accepted. Contrast that with curl built libssh2 where it fixates on one specific host key type. Build with libssh2/1.8.2 and it will accept and verify the rsa key only. Upgrade to libssh2/1.9.0 and now it will insist on only matching the ecdsa key. None of that pain with libssh.

Additionally, if your aim really is just to indiscriminately accept any host key without verification, you don't need this entire callback, host key saving dance. Set CURLOPT_SSL_VERIFYHOST to 0 and you're done. This is another libssh/libssh2 difference: when built with libssh2, the server host key is not verified by default, and the VERIFYHOST option makes no difference whatsoever.

@nmav

This comment has been minimized.

Copy link
Contributor

@nmav nmav commented Feb 24, 2020

@bagder would it make sense to switch the main contact person for libssh to @ansasaki, if that's also ok with him? (I already asked him, but just for the record :)

@ansasaki

This comment has been minimized.

Copy link
Contributor

@ansasaki ansasaki commented Feb 24, 2020

@bagder would it make sense to switch the main contact person for libssh to @ansasaki, if that's also ok with him? (I already asked him, but just for the record :)

I am happy to help with libssh related issues!

@bagder bagder closed this in 507cf6a Feb 27, 2020
@duboism

This comment has been minimized.

Copy link
Author

@duboism duboism commented Mar 2, 2020

Sorry for the long delay I needed to finish some important stuff.

For the record, I think that the segfault was because the corresponding constant in pycurl is KHMATCH_OK (not CURLKHMATCH_OK). Therefore the callback failed which caused the segmentation fault.

To sum up:

  • The fact that the file gets longer and longer was due to the logic error (i.e. don't add the key in file if match == KHSTAT_OK)
  • Before this patch, the known key was never passed to the callback (which explained why it was never printed)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

5 participants
You can’t perform that action at this time.