Skip to content
This repository has been archived by the owner on Apr 21, 2023. It is now read-only.

Segfault in https fetching #992

Closed
jeffkaufman opened this issue Jul 15, 2015 · 10 comments
Closed

Segfault in https fetching #992

jeffkaufman opened this issue Jul 15, 2015 · 10 comments

Comments

@jeffkaufman
Copy link
Contributor

Reported on the mailing list:

I am getting a few core dumps from nginx daily and after finally capturing one of these it looks to me (might be completely wrong here though) that its coming from the nginx pagespeed module when using ssl.

gdb /usr/sbin/nginx /tmp/core-nginx-11-33-33-28969-1436943894
GNU gdb (GDB) 7.4.1-debian
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/nginx...done.
[New LWP 28988]
[New LWP 28984]
[New LWP 28969]
[New LWP 28976]
[New LWP 28987]
[New LWP 28985]

warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `nginx: worker pr'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000b5f4e1 in SSL_set_tlsext_host_name (ssl=0x0, name=0x24aa828 "google-maps-utility-library-v3.googlecode.com") at third_party/boringssl/src/ssl/s3_lib.c:355
355    third_party/boringssl/src/ssl/s3_lib.c: No such file or directory.
(gdb) bt
#0  0x0000000000b5f4e1 in SSL_set_tlsext_host_name (ssl=0x0, name=0x24aa828 "google-maps-utility-library-v3.googlecode.com") at third_party/boringssl/src/ssl/s3_lib.c:355
#1  0x0000000000aab1e1 in serf_ssl_set_hostname (context=0x7f4f6c07f368, hostname=0x24aa828 "google-maps-utility-library-v3.googlecode.com") at third_party/serf/instaweb_ssl_buckets.c:1215
#2  0x0000000000929079 in net_instaweb::SerfFetch::ConnectionSetup (socket=0x7f4f781e28c0, read_bkt=0x2673138, write_bkt=0x7f4f7c8db780, setup_baton=0x7f4f78204ff0, pool=0x24aa688)
    at net/instaweb/system/serf_url_async_fetcher.cc:333
#3  0x0000000000aa79e0 in do_conn_setup (conn=0x26730a8) at third_party/serf/instaweb_outgoing.c:550
#4  0x0000000000aa7a75 in prepare_conn_streams (conn=0x26730a8, istream=0x2673138, ostreamt=0x7f4f7c8db820, ostreamh=0x7f4f7c8db818) at third_party/serf/instaweb_outgoing.c:591
#5  0x0000000000aa7cc4 in write_to_connection (conn=0x26730a8) at third_party/serf/instaweb_outgoing.c:696
#6  0x0000000000aa84a1 in serf__process_connection (conn=0x26730a8, events=4) at third_party/serf/instaweb_outgoing.c:1104
#7  0x0000000000aa5ad5 in serf_event_trigger (s=0x22cbe80, serf_baton=0x26730b8, desc=0x22cc068) at third_party/serf/src/context.c:233
#8  0x0000000000aa5c38 in serf_context_run (ctx=0x22cbe80, duration=0, pool=0x22cbdb8) at third_party/serf/src/context.c:296
#9  0x0000000000924dc0 in net_instaweb::SerfFetch::Start (this=0x7f4f702cbad0, fetcher=0x1f12880) at net/instaweb/system/serf_url_async_fetcher.cc:1033
#10 0x0000000000924ef5 in net_instaweb::SerfUrlAsyncFetcher::StartFetch (this=0x1f12880, fetch=0x7f4f702cbad0) at net/instaweb/system/serf_url_async_fetcher.cc:1209
#11 0x0000000000929240 in net_instaweb::SerfThreadedFetcher::TransferFetchesAndCheckDone (this=0x1f12880, block_on_empty=false) at net/instaweb/system/serf_url_async_fetcher.cc:930
#12 0x00000000009292f8 in net_instaweb::SerfThreadedFetcher::SerfThread (this=0x1f12880) at net/instaweb/system/serf_url_async_fetcher.cc:949
#13 0x000000000092938c in net_instaweb::SerfThreadedFetcher::SerfThreadFn (thread_id=0x22cc2b0, context=0x1f12880) at net/instaweb/system/serf_url_async_fetcher.cc:873
#14 0x0000000000a2eebd in dummy_worker (opaque=0x22cc2b0) at third_party/apr/src/threadproc/unix/thread.c:142
#15 0x00007f4f87fd6b50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#16 0x00007f4f856f670d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#17 0x0000000000000000 in ?? ()
(gdb)


My nginx is compiled with:
nginx version: nginx/1.9.2
built by gcc 4.7.2 (Debian 4.7.2-5)
built with OpenSSL 1.0.1e 11 Feb 2013
TLS SNI support enabled
configure arguments: --add-module=headers-more-nginx-module-0.26 --add-module=/root/ngx_pagespeed-release-1.9.32.4-beta --prefix=/usr/share/nginx --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --http-client-body-temp-path=/var/lib/nginx/tmp/client_body --http-proxy-temp-path=/var/lib/nginx/tmp/proxy --http-fastcgi-temp-path=/var/lib/nginx/tmp/fastcgi --http-uwsgi-temp-path=/var/lib/nginx/tmp/uwsgi --http-scgi-temp-path=/var/lib/nginx/tmp/scgi --pid-path=/var/run/nginx.pid --lock-path=/var/lock/subsys/nginx --user=nginx --group=nginx --with-file-aio --with-ipv6 --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_xslt_module --with-http_image_filter_module --with-http_geoip_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_degradation_module --with-http_stub_status_module --with-http_perl_module --with-mail --with-mail_ssl_module --with-debug --with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic' --with-ld-opt=-Wl,-E


So pagespeed version is: ngx_pagespeed-release-1.9.32.4-beta

It looks like we have:

  • SerfFetch::ConnectionSetup calls serf_ssl_set_hostname calls
    SSL_set_tlsext_host_name.
  • SSL_set_tlsext_host_name receives NULL for its first argument, which
    should be a valid SSL (connection) pointer, and then dereferences it,
    hence the segfault.

This means serf_ssl_set_hostname is receiving a serf_ssl_context_t
*context where context->ssl is NULL.

So probably somewhere we're not checking return values properly, and
we're continuing on after a failed call. Candidates? (Most likely
something where BoringSSL tightened up error checking, but I'm not
sure where that is.)

@jeffkaufman
Copy link
Contributor Author

ConnectionSetup should be creating a serf_ssl_context_t* fetch->ssl_context_ where fetch->ssl_context_->ssl isn't null.

@jeffkaufman
Copy link
Contributor Author

ssl_init_context has:

ssl_ctx->ssl = SSL_new(ssl_ctx->ctx);

SSL_new returns NULL on error, and at least in boringssl there are a whole bunch of checks that lead to it returning null (see ssl_lib.c). And our call isn't null-checked.

@jeffkaufman
Copy link
Contributor Author

I just diffed boringssl and openssl SSL_new definitions, and it looks like they have the same checks. I'll repeat this with the version of openssl used in 1.9.23.3 since I just used HEAD and they might have made changes.

@jeffkaufman
Copy link
Contributor Author

Repeating this with revision 063a4b93646788bd883fc0cb1b5eafc991ddacc4 of https://chromium.googlesource.com/chromium/deps/openssl.git which is what we were using in 1.9.23.3, I do see an additional check:

if (s->ctx->alpn_client_proto_list) {
    s->alpn_client_proto_list = BUF_memdup(s->ctx->alpn_client_proto_list,
                                           s->ctx->alpn_client_proto_list_len);
    if (s->alpn_client_proto_list == NULL) {
      goto err;
    }
    s->alpn_client_proto_list_len = s->ctx->alpn_client_proto_list_len;
}

I also found a check that I missed before, which was added in boringssl:

  s->psk_identity_hint = NULL;
  if (ctx->psk_identity_hint) {
    s->psk_identity_hint = BUF_strdup(ctx->psk_identity_hint);
    if (s->psk_identity_hint == NULL) {
      goto err;
    }
  }

Neither of these seem like the problem, though, since they're both just verifying that a memory copy succeeded, and unless we're running out of memory this shouldn't be active. Doesn't seem worth continuing in this direction.

@jeffkaufman
Copy link
Contributor Author

@crowell is going to look through the code to see if I missed something. If not, we'll build a custom PSOL that does check that ssl_ctx->ssl is non-NULL and logs debugging info, and send that out for testing.

@crowell
Copy link
Contributor

crowell commented Jul 20, 2015

done with the pre-release of 1.9.32.5 from last week, looking at this today

@crowell
Copy link
Contributor

crowell commented Jul 21, 2015

version of PSOL with some extra debug logging where the crashes would have happened, along with a check for null deref which was causing the crash

rebuilding now, built that from the wrong tag 👎

@crowell
Copy link
Contributor

crowell commented Jul 21, 2015

whoops, built that from the wrong tag, rebuilding now.

@crowell
Copy link
Contributor

crowell commented Jul 22, 2015

updated PSOL build here https://github.com/pagespeed/ngx_pagespeed/releases/tag/1.9.32.4-dbg-ssl-crash

Please give this a try and report back.

Thanks,
Jeff

@crowell
Copy link
Contributor

crowell commented Jul 27, 2015

fixed in apache/incubator-pagespeed-mod@7e36d7d

I'd still like to get any feedback from the build in the comment above to get extra debugging information.

@crowell crowell closed this as completed Jul 27, 2015
@jeffkaufman jeffkaufman changed the title segfault in https fetching Segfault in https fetching Jul 27, 2015
@pono pono unassigned crowell Jan 8, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants