Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault in openldap while using ldaps #6676

Closed
monnerat opened this issue Mar 1, 2021 · 15 comments
Closed

Segfault in openldap while using ldaps #6676

monnerat opened this issue Mar 1, 2021 · 15 comments

Comments

@monnerat
Copy link
Contributor

@monnerat monnerat commented Mar 1, 2021

I did this

curl -k --cert-type P12 --cert mycert.p12:password 'ldaps://localhost/?supportedSASLMechanisms?base?(objectclass=*)'
After correct output, curl segfaults while in ldap_unbind_ext called by ldap_disconnect.
This is the ldapsb_tls problem we shortly discussed at #6518 (comment) and is related to the conn->data removal.

I expected the following

Clean termination

curl/libcurl version

curl 7.76.0-DEV (x86_64-pc-linux-gnu) libcurl/7.76.0-DEV OpenSSL/1.1.1i-fips zlib/1.2.11 brotli/1.0.9 zstd/1.4.7 libidn2/2.3.0 libpsl/0.21.1 (+libidn2/2.3.0) iconv libssh/0.9.5/openssl/zlib nghttp2/1.43.0 librtmp/2.3 libgsasl/1.8.1
Release-Date: [unreleased]
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS brotli CharConv Debug gsasl GSS-API HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP TrackMemory UnixSockets zstd

operating system

Linux patrick.monnerat 5.10.18-200.fc33.x86_64 #1 SMP Tue Feb 23 22:06:05 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

@emilengler
Copy link
Contributor

@emilengler emilengler commented Mar 1, 2021

I can reproduce with 24f850f on OpenBSD-current

eengler@openbsd:~/src/curl$ src/curl ldaps://db.debian.org/ 
DN: 
        objectClass: top
        objectClass: OpenLDAProotDSE


Segmentation fault (core dumped) 

@emilengler
Copy link
Contributor

@emilengler emilengler commented Mar 4, 2021

I did some investigation on it and it turns out that the following line tries to access an item from a NULL pointer:

struct ldapconninfo *li = conn->proto.ldapc;

The problem is that conn (data->conn) is NULL. However data itself is set, just not conn.

I might find time to write a patch for it this weekend but I can't promise anything. Will post more updates regarding it soon.

@monnerat
Copy link
Contributor Author

@monnerat monnerat commented Mar 4, 2021

I might find time to write a patch for it this weekend

I chased this down too and IMHO it's not as simple as it looks: data passed to ldap_disconnect is not the same as the one latched for ldapsb_tls. Daniel and me faced the same kind of problem in openssl while removing conn->data occurrences and we've made several unsuccessful trials before Daniel finally resolved it by a hack.

See #6518 (comment), #6446 (comment) and f2f91ac.

@emilengler
Copy link
Contributor

@emilengler emilengler commented Mar 5, 2021

Okay I spent some time this morning inspecting it but I see myself unable to do it unfortunately. The problem is indeed much bigger

bagder added a commit that referenced this issue Mar 23, 2021
Follow-up to a59c33c
Reported-by: Patrick Monnerat
Fixes #6676
Closes #[fill in]
@bagder bagder closed this in e467ea3 Mar 23, 2021
@monnerat
Copy link
Contributor Author

@monnerat monnerat commented Mar 24, 2021

@bagder : thanks for fix. This effectively suppresses the segfault, but I wonder if the way it's done would'nt close the connection prematurely with regards to the LDAP protocol. I did not check, but the unbindRequest is probably not transmitted.

@bagder
Copy link
Member

@bagder bagder commented Mar 24, 2021

(This doesn't just suppress the segfault, it fixes the function to use the correct pointer and only use it if it is (still) valid.)

That's a slightly different but possibly still important issue: since the connection is already gone when ldap_disconnect is called the final actions are probably never done. I honestly don't know how important that is for users or servers.

The only proper way we can get this in order is to introduce an LDAP test server to the test suite and a few tests to make sure that curl speaks the protocol correctly. Easier said than done.

@monnerat
Copy link
Contributor Author

@monnerat monnerat commented Mar 24, 2021

(This doesn't just suppress the segfault, it fixes the function to use the correct pointer and only use it if it is (still) valid.)

It seems you set the pointer to NULL: in ldap_disconnect:
ber_sockbuf_add_io(sb, &ldapsb_tls, LBER_SBIOD_LEVEL_TRANSPORT, NULL);
This disables effective io.

The only proper way we can get this in order is to introduce an LDAP test server to the test suite and a few tests to make sure that curl speaks the protocol correctly. Easier said than done.

I already dreamed of it: that's another story!

@bagder
Copy link
Member

@bagder bagder commented Mar 24, 2021

It seems you set the pointer to NULL: in ldap_disconnect:

Yes, because when we disconnect LDAP there is no longer any transfer associated to it...

@monnerat
Copy link
Contributor Author

@monnerat monnerat commented Mar 24, 2021

Yes, because when we disconnect LDAP there is no longer any transfer associated to it...

Sure, but we're still having a (dummy?) data passed to curl_disconnect().

I had good results trying the following method:

  • As you did for openssl trace, save data in the ldapconninfo new field io, revert the ldasb_tls stuff to use a conn pointer.
  • In the ldapsb_tls_*() functions, use li->io as data (mimic conn->data).
  • In ldap_disconnect(), save data->conn, set it to conn and set li->io to data.
  • Restore data->conn before return.

Hacky-tricky, but it seems to work.

I can provide a PR if you like it.

@bagder
Copy link
Member

@bagder bagder commented Mar 24, 2021

Sure, but we're still having a (dummy?) data passed to curl_disconnect()

Curl_disconnect() gets the transfer passed in for which the associated connection should get disconnected. It's not a dummy.

But I was slightly wrong. The connection is actually still associated with the transfer when ldap_diconnect is called so it shouldn't have to be NULLed. I did that because the openldap callback got a data pointer to an already freed easy handle... I just draw the wrong conclusion. The correct fix there should be to update to the (new) transfer by updating the callback pointer to the new transfer:

diff --git a/lib/openldap.c b/lib/openldap.c
index 066c0fd73..c80ac8b07 100644
--- a/lib/openldap.c
+++ b/lib/openldap.c
@@ -369,11 +369,11 @@ static CURLcode ldap_disconnect(struct Curl_easy *data,
 
   if(li) {
     if(li->ld) {
       Sockbuf *sb;
       ldap_get_option(li->ld, LDAP_OPT_SOCKBUF, &sb);
-      ber_sockbuf_add_io(sb, &ldapsb_tls, LBER_SBIOD_LEVEL_TRANSPORT, NULL);
+      ber_sockbuf_add_io(sb, &ldapsb_tls, LBER_SBIOD_LEVEL_TRANSPORT, data);
       ldap_unbind_ext(li->ld, NULL, NULL);
       li->ld = NULL;
     }
     conn->proto.ldapc = NULL;
     free(li);

@bagder
Copy link
Member

@bagder bagder commented Mar 24, 2021

see #6787

@monnerat
Copy link
Contributor Author

@monnerat monnerat commented Mar 24, 2021

I tried that alone, but it does not work, since data->conn is NULL.

@bagder
Copy link
Member

@bagder bagder commented Mar 24, 2021

For the original data, yes. This is not the same transfer object. The connection was moved to a new transfer.

@bagder
Copy link
Member

@bagder bagder commented Mar 24, 2021

Thread 1 "curl" hit Breakpoint 1, ldap_disconnect (data=0x5555556a9a18, 
    conn=0x5555556ac028, dead_connection=false) at openldap.c:366
366       struct ldapconninfo *li = conn->proto.ldapc;
(gdb) p data->conn
$1 = (struct connectdata *) 0x5555556ac028
(gdb) n
370       if(li) {
(gdb) 
371         if(li->ld) {
(gdb) 
373           ldap_get_option(li->ld, LDAP_OPT_SOCKBUF, &sb);
(gdb) 
374           ber_sockbuf_add_io(sb, &ldapsb_tls, LBER_SBIOD_LEVEL_TRANSPORT, data);
(gdb) p data->conn
$2 = (struct connectdata *) 0x5555556ac028

@monnerat
Copy link
Contributor Author

@monnerat monnerat commented Mar 24, 2021

For the original data, yes. This is not the same transfer object. The connection was moved to a new transfer.

Yes, you're right. I just tried #6787 with removals of null tests in tls io functions and it works.

I think we've finally got it. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

4 participants