Allow DefaultSslEngineFactory subclass customization of the SslContext #1171
private SslContext getSslContext(AsyncHttpClientConfig config) throws SSLException {
private SslContext buildSslContext(AsyncHttpClientConfig config) throws SSLException {
I left this code where it was in the source but if you don't mind the reshuffling, I'll move it down to the bottom with the other private stuff
Headache... Actually, it bothers me if we only detect an SslContext build failure on the first request instead of at client instantiation...
Yeah, that's really not very nice at all. :( If breaking compatibility is out of the question, one thing we could do is add a pre-emptive initialization call in ChannelManager in the typical case where the factory is the default impl (either a custom subclass or not): cast, and then call some method that's not in the interface. Perhaps at the next opportunity to break compatibility we could add some sort of |
Oh wait, this project is Java 8 -- couldn't we just add an |
Oh right! Can you give it a try? |
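For readers following along, here is a minimal sketch of the idea discussed above, assuming it refers to a Java 8 default method (the squashed commit message in this thread mentions adding `init()` to `SslEngineFactory`). Every type below (`SketchConfig`, `SketchSslEngineFactory`, `SketchDefaultFactory`, `SketchCustomFactory`, `SketchClient`) is a hypothetical stand-in, and a plain `String` takes the place of Netty's `SslContext`; this is not the project's actual code.

```java
import javax.net.ssl.SSLException;

// Hypothetical stand-in for the client config (not the real AsyncHttpClientConfig).
final class SketchConfig {
    final boolean failOnBuild;
    SketchConfig(boolean failOnBuild) { this.failOnBuild = failOnBuild; }
}

// Hypothetical stand-in for the factory interface.
interface SketchSslEngineFactory {
    String newSslEngine(String peerHost, int peerPort);

    // Default no-op: implementations written before init() existed still compile.
    default void init(SketchConfig config) throws SSLException {
    }
}

class SketchDefaultFactory implements SketchSslEngineFactory {
    private volatile String sslContext; // a String stands in for SslContext

    @Override
    public void init(SketchConfig config) throws SSLException {
        sslContext = buildSslContext(config);
    }

    // Protected hook: subclasses override this to customize the context.
    protected String buildSslContext(SketchConfig config) throws SSLException {
        if (config.failOnBuild) {
            throw new SSLException("bad key material");
        }
        return "default-context";
    }

    @Override
    public String newSslEngine(String peerHost, int peerPort) {
        return sslContext + " engine for " + peerHost + ":" + peerPort;
    }
}

class SketchCustomFactory extends SketchDefaultFactory {
    @Override
    protected String buildSslContext(SketchConfig config) throws SSLException {
        return "custom(" + super.buildSslContext(config) + ")";
    }
}

class SketchClient {
    final SketchSslEngineFactory factory;

    // Calling init() here surfaces context errors at construction time,
    // not on the first request.
    SketchClient(SketchConfig config, SketchSslEngineFactory factory) {
        try {
            factory.init(config);
        } catch (SSLException e) {
            throw new RuntimeException("Could not initialize sslEngineFactory", e);
        }
        this.factory = factory;
    }
}
```

With this shape, `new SketchClient(new SketchConfig(true), new SketchDefaultFactory())` fails immediately with a `RuntimeException` whose cause is the `SSLException`, while a factory that does not override `init()` is unaffected.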
} catch (SSLException e) {
throw new ExceptionInInitializerError(e);
throw new RuntimeException("Could not initialize sslEngineFactory", e);
I thought this was a more helpful exception
btw, I can't run tests on my Linux box - Specifically, it looks like it makes a request to the proxy, which then requires authorization, so it makes another request, and the proxy tries to forward, and nothing happens:
... and eventually timeout. Watching in Wireshark, nothing is ever sent to the upstream server. I see two GETs to the proxy but nothing to the upstream server. |
Oh my... I run on OSX and was trusting Travis to execute the tests on Linux, but I just realized that Travis skips tests by default!!! Investigating.
Actually, there was an issue with the Travis setup; tests are currently running: https://travis-ci.org/AsyncHttpClient/async-http-client/builds/134807126
FWIW I get the following on a full run:
The last one only shows up sometimes. The second one shows up every time I do a full run, but doesn't fail when run by itself. |
How do you run maven? By any chance, do you use parallel builds? |
I'll see if I can reproduce this in a VM. |
I'll investigate tomorrow at the office too. Those issues are very weird. The |
In case it comes in handy, I pushed https://github.com/marshallpierce/async-http-client/tree/debug-test-failures-vagrant which has a vagrant box. Curiously, I get different issues in the VM:
WAT?! I only see 2 changes that could be related:
Could you try adding the former AND/OR reverting the latter, please? I'll try to reproduce on my side today. |
I could reproduce the 504 on Travis: https://travis-ci.org/AsyncHttpClient/async-http-client#L5629 |
I could get the Travis build to pass if I ask surefire to not fork and execute tests sequentially, see 8c054a3 and https://travis-ci.org/AsyncHttpClient/async-http-client/builds/134987736. And I can't reproduce the issue on my local Linux box. Will try your vagrant. |
I can't reproduce on VirtualBox either :( And actually the build failed again on Travis... I'm at a loss.
It would be awesome if you could get the full stacktrace. In order to get surefire to output logs, you have to enable a profile: This message comes from Netty, and I think the root issue is there, like lacking/leaking native memory. I really need the full stacktrace with the cause Exception.
Pinned it: #1172 |
I'll dig in once I'm at the office (about an hour). The host is an E5-1650 v3 w/ 64GiB, running up-to-date Gentoo Linux with Oracle JDK 1.8u92. It's not running with SELinux or restrictive sysctl params or anything like that. |
Check last comment: issue solved :) |
(cont) At least the I couldn't reproduce the other ones. |
On 9f817eb I get:
Stack traces:
and
The first one I cannot reproduce when running by itself. The latter one I did on the third try. However, that one appears to be hitting the live freakonomics.com site, so it's probably unavoidable without reimplementing using a local http server.
Oh, apparently the hard limit on OS X is unlimited anyway. Re-running on linux with |
Yeah, some tests were originally introduced this way (before I joined here). There are some issues open for refactoring them into proper standalone ones, such as #496. Help welcome. The "freakonomics.com" website is sometimes unstable.
Maybe, but not necessarily. It could be that we have tons of ports that have been opened, but can't be recycled yet.
Sure:
lsof accounts for all open files, not only sockets. What does netstat say?
Is there anything in particular you'd like to look at in netstat output? I don't see anything other than a hundred TIME_WAIT sockets, which since they've already been closed are not associated with the opening process. In my lsof'ing I'm grepping for this (with all other java processes closed):
Or this:
I can't think of a good reason why that should output 121256 eventpoll fds, or 85741 pipes, and yet I've seen those numbers. Even if the sockets were persisting after being properly closed, they shouldn't be tied to that pid anymore. |
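As a rough illustration of the kind of per-process descriptor census described above, the sketch below groups the entries of `/proc/<pid>/fd` by kind (sockets, pipes, anonymous inodes such as `eventpoll`, regular files). It assumes a Linux `/proc` filesystem; the class and method names (`FdCensus`, `kindOf`, `count`) are made up for this example and are not part of the project or of lsof.

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

class FdCensus {
    // Reduce a /proc/<pid>/fd link target to a coarse kind.
    static String kindOf(String target) {
        if (target.startsWith("socket:")) return "socket";
        if (target.startsWith("pipe:")) return "pipe";
        if (target.startsWith("anon_inode:")) return target; // e.g. anon_inode:[eventpoll]
        return "file";
    }

    // Count open descriptors of a process by kind; pid may be "self".
    // Returns an empty map when /proc/<pid>/fd is unavailable (e.g. not on Linux).
    static Map<String, Integer> count(String pid) {
        Map<String, Integer> counts = new TreeMap<>();
        Path fdDir = Paths.get("/proc", pid, "fd");
        if (!Files.isDirectory(fdDir)) {
            return counts;
        }
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                try {
                    String target = Files.readSymbolicLink(fd).toString();
                    counts.merge(kindOf(target), 1, Integer::sum);
                } catch (IOException e) {
                    // The descriptor was closed between listing and reading; skip it.
                }
            }
        } catch (IOException | DirectoryIteratorException e) {
            // Directory not readable (permissions, process exited); return what we have.
        }
        return counts;
    }
}
```

Calling `FdCensus.count("self")` from inside a test (or passing the pid of the forked test JVM) before and after each test would show whether the `anon_inode:[eventpoll]` and `pipe` counts keep growing, which is the symptom reported here.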
I also found one of these in the test log:
So, we've got at least one leak. |
I stuck a Thread.sleep(Long.MAX_VALUE) at the top of testRequestNonProxyHost to give me some time to investigate. After dumping the (live only) heap with jmap, some cursory When I run just that test alone and take a live heap dump while it's sleeping at the beginning of the test, I see 8 instances, so it seems likely that the other 792 are leaked from previous tests. |
This was probably a derived client not being closed: f402385#diff-8bec36202922bc4718c8cabbdf131ea0L276. I did find tons of clients not being closed, but in an extra module that's built downstream.
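A minimal sketch of the usual guard against this kind of leak in tests, assuming the client type is closeable: open it in a try-with-resources block so it is released even when an assertion throws. `LeakSketchClient` and `LeakSketchTest` are hypothetical stand-ins, and the static counter exists only to make the open/close balance observable.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical client that tracks how many instances are still open.
class LeakSketchClient implements AutoCloseable {
    static final AtomicInteger OPEN = new AtomicInteger();

    LeakSketchClient() {
        OPEN.incrementAndGet();
    }

    String get(String url) {
        return "response from " + url;
    }

    @Override
    public void close() {
        OPEN.decrementAndGet();
    }
}

class LeakSketchTest {
    // The client is closed when the block exits, whether normally or by exception.
    static String fetch(String url) {
        try (LeakSketchClient client = new LeakSketchClient()) {
            return client.get(url);
        }
    }
}
```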
I've rebased this on top of latest master. Also, if Travis will let you, it would be interesting to add 'ulimit -H -n 4000' so we have a known config there. The build passes with 4000 for me. (The cutoff appears to be between 3500 and 4000.)
Done |
Is there anything else you'd like me to improve on this, or are you happy with the current approach? |
public DefaultSslEngineFactory(AsyncHttpClientConfig config) throws SSLException {
this.sslContext = getSslContext(config);
}
public DefaultSslEngineFactory() { }
Please remove explicit default constructor.
Could you please address last comment and squash? |
…makes it feasible to provide a subclass of DSEF via config that can customize the SslContext as it is built. Let SSLContext errors be detected early by adding init() to SslEngineFactory.
Thanks! |
@marshallpierce I've just released 2.0.5 |
2.0.5 is working great with a custom subclass of DSEF; thanks! |
Great! Thanks for your help! |
See #1170 for context.
If you have ideas for how to usefully test this, I'm happy to write them up, but it wasn't obvious to me how to usefully test this change.