badarg received when using async #267

Open · priestjim opened this Issue Dec 9, 2015 · 12 comments

priestjim commented Dec 9, 2015

On a server with a very high HTTP call volume, using async to "fire and forget" requests continuously produces this message in the error logger:

2015-12-09 23:56:17 =ERROR REPORT====
Error in process <0.3450.45> on node '[...]' with exit value:
{badarg,[{erlang,port_info,[#Port<0.148406>,connected],[]},{inet,tcp_controlling_process,2,[{file,"inet.erl"},{line,1522}]},{hackney_stream,stream_loop,4,[{file,"[...]hackney/src/hackney_stream.erl"},{line,85}]},{hackney_stream,init,4,[{file,"[...]hackney/src/hackney_stream.erl"},{line,30}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}
2015-12-09 23:56:17 =SUPERVISOR REPORT====
     Supervisor: {local,hackney_sup}
     Context:    child_terminated
     Reason:     shutdown
     Offender:   [{pid,<0.1418.45>},{id,[...]},{mfargs,{hackney_pool,start_link,[[...],[{name,[...]},{timeout,120000},{max_connections,8192}]]}},{restart_type,permanent},{shutdown,10000},{child_type,worker}]

{badarg,[{erlang,port_info,[#Port<0.58183>,connected],[]},{inet,tcp_controlling_process,2,[{file,"inet.erl"},{line,1522}]},{hackney_stream,stream_loop,4,[{file,"[...]/hackney/src/hackney_stream.erl"},{line,85}]},{hackney_stream,init,4,[{file,"[..]/hackney/src/hackney_stream.erl"},{line,30}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}

The call that causes this issue is:

hackney:request(post, URL, Headers, Body, [async, {connect_timeout, 10000}, {pool, PoolName}]),
_ = receive _ -> ok end,

Where the custom pool is defined with the following options:

[{timeout, 120000}, {max_connections, 8192}]
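
For reference, a minimal sketch of how a pool with these options is typically registered through hackney_pool:start_pool/2 (my_pool is a placeholder name; the exact call should be double-checked against the hackney version in use):

%% Register a named connection pool with the options quoted above.
%% my_pool is a placeholder pool name used only for this sketch.
ok = hackney_pool:start_pool(my_pool, [{timeout, 120000}, {max_connections, 8192}]).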

priestjim commented Dec 10, 2015

I'm gonna assume hackney is trying to give the socket back to the parent, but the socket is already closed remotely (by the remote server), hence the badarg. This situation should be handled in the same manner as in the synchronous call.
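
If that is what is happening, the fix would amount to trapping the failure while handing the socket over, instead of letting the stream process crash. A minimal sketch of the idea (this is not hackney's code; the module and function names are made up for illustration):

-module(socket_handoff).
-export([give_back/2]).

%% Hand a TCP socket back to its owner, tolerating a socket that the remote
%% end has already closed: per the trace above, controlling_process/2 can
%% blow up with badarg on a dead port, so turn that into an error tuple
%% instead of crashing the calling process.
give_back(Socket, Owner) ->
    try gen_tcp:controlling_process(Socket, Owner) of
        ok -> ok;
        {error, Reason} -> {error, Reason}
    catch
        error:badarg -> {error, closed}
    end.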

benoitc (Owner) commented Dec 10, 2015

@priestjim

This situation should be managed in the same manner as the synchronous call.

How do you mean? Return an error?

priestjim commented Dec 10, 2015

After further digging, I saw that this error should be getting caught here:

try
    stream_loop(Parent, Owner, Ref, Client#client{parser=Parser,
                                                  response_state=on_status})
catch Class:Reason ->

but it's not. I have no idea why!

jlouis (Contributor) commented Dec 10, 2015

This may not be related to the above at all, but has stream_loop called proc_lib:hibernate/3 at any point? That would explain the problem, since if the process hibernates, that catch is effectively gone from the call stack forever. I don't know exactly what would happen with the backtrace from erlang:get_stacktrace() though, so it could be unrelated. I'm fairly positive that code will never work as intended once the process hibernates, however.
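
A small, self-contained illustration of that effect (the module and function names are made up; this only demonstrates the hibernate behaviour, not hackney's code):

-module(hibernate_demo).
-export([start/0, init/0, resume/0]).

%% Spawn a process that installs a try...catch and then hibernates inside it.
start() ->
    Pid = proc_lib:spawn(?MODULE, init, []),
    Pid ! wake_up,
    Pid.

init() ->
    try
        %% The catch below is on the call stack at this point...
        proc_lib:hibernate(?MODULE, resume, [])
        %% ...but hibernate/3 never returns: when the next message arrives,
        %% the process restarts in resume/0 with an empty call stack.
    catch
        Class:Reason ->
            io:format("caught ~p:~p~n", [Class, Reason])
    end.

resume() ->
    receive wake_up -> ok end,
    %% This error is NOT caught by the try...catch in init/0; that frame was
    %% discarded at hibernation, so the process crashes instead.
    erlang:error(badarg).

Running hibernate_demo:start() produces an uncaught badarg crash report rather than the "caught ..." printout, which matches the badarg escaping the try around stream_loop quoted above.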

priestjim commented Dec 10, 2015

I think it does, at multiple points. The hibernate calls on lines 99, 140 and 241 can all be reached while in the stream_loop function.

@benoitc is there any critical reason to hibernate at those points?
@jlouis you saved the day again :)

jlouis (Contributor) commented Dec 10, 2015

But are they actually called in your case? That could explain the problem you are seeing. Add a helper function to the module that calls hibernate, then use dbg to trace whether that function gets called in production, I'd say. This should at least make it feasible to check for this case.
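
A minimal sketch of that tracing setup, assuming the hibernate calls were funneled through a hypothetical helper hackney_stream:maybe_hibernate/3 (the dbg calls themselves are standard OTP):

-module(trace_hibernate).
-export([start/0, stop/0]).

%% Trace every call to the (hypothetical) hibernate helper across all
%% processes, printing the caller of each hit to the shell tracer.
start() ->
    {ok, _} = dbg:tracer(),
    {ok, _} = dbg:p(all, [call]),
    {ok, _} = dbg:tp(hackney_stream, maybe_hibernate, 3,
                     [{'_', [], [{message, {caller}}]}]),
    ok.

stop() ->
    dbg:stop_clear().

If hits show up right before the badarg reports, that would point to the hibernation theory.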

priestjim commented Dec 10, 2015

I am almost sure it does. The problem is that this only happens under an extremely high number of requests; trying it locally never triggers the issue. I'll give it a shot though.

benoitc (Owner) commented Dec 10, 2015

@priestjim all this code should be simpler. I have some code on my machine for it. I will try to finish the new_pool branch and do a release with it before the weekend. The general idea is to simplify the loop and the flow between the requester and the stream.

@jlouis I guess hackney_trace, when enabled, should be able to catch it. It may be worth a try.

priestjim commented Dec 13, 2015

Please do, we're dying over here - we hit ~350K calls/sec at one point :)

benoitc (Owner) commented Dec 13, 2015

@priestjim it should land on Tuesday; I made good progress on it :)

priestjim commented Jan 4, 2016

@benoitc any progress on the new pool arch? Need a hand?

benoitc (Owner) commented Jan 25, 2016

@priestjim #276 is the ticket to track it. It should land sometime next week. In the meantime I may have a quick fix for the hard limit.

benoitc added the connection label May 19, 2017

benoitc added this to Connection in Development May 19, 2017
