connections not returned to the pool on connection errors #643
Erlang 23 [erts-11.0.2]
Do you read the body or skip it explicitly? If the connection is not closed, or the body is not read (or skipped, which will flush it), the socket will not be released to the pool.
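The read-or-skip requirement can be sketched in Elixir; the URL here is a placeholder, and this assumes hackney's default (non-streaming) response mode:

```elixir
# Sketch, assuming the hackney application is started and the placeholder URL
# is reachable. In hackney's default mode the response body must be consumed
# before the socket can be returned to the pool.
{:ok, _} = Application.ensure_all_started(:hackney)
{:ok, _status, _headers, ref} = :hackney.request(:get, "http://httpbin.org/get")

# Either read the body, which releases the socket back to the pool...
{:ok, _body} = :hackney.body(ref)
# ...or, if the body is not needed, skip it instead, which flushes the socket:
# :ok = :hackney.skip_body(ref)
```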
I believe @InoMurko is saying there is no body to be read because the connection is refused. Should it get released to the pool when the connection is refused?

certifi 2.5.2

code: (sorry for the Elixir code; if you need this translated to Erlang, I will do my best)
output:
What do you mean by connection refused? Socket level or HTTP level?
In that case it should indeed release the socket. Sounds like something new; I will check.
I have similar symptoms after upgrading hackney from 1.15.1 to 1.16.0. I check the pool with hackney_pool:get_stats(default), and the in_use_count matches the number of {error, timeout} results returned by: …
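That kind of check can be sketched in Elixir; the pool name is an assumption here, and the stat keys come from the proplist that hackney_pool:get_stats/1 returns:

```elixir
# Sketch: start a pool and inspect its stats. get_stats/1 returns a proplist
# with keys including :in_use_count and :free_count (hackney must be started).
{:ok, _} = Application.ensure_all_started(:hackney)
:hackney_pool.start_pool(:stats_demo, [])

stats = :hackney_pool.get_stats(:stats_demo)
IO.inspect(stats[:in_use_count], label: "in_use_count")
```

On a freshly started pool with no outstanding requests, in_use_count should be 0; the symptom described above is that it climbs with each connect error and never comes back down.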
I have the same problems as @kape1395 describes.
I think I see the issue. I will have a look at it tomorrow; thanks for the report.
@benoitc an easy way to replicate is to create a pool with max 4-5 connections and then make requests to some servers that are stopped. The first N connections (where N is the pool size) will fail right away with an econnrefused reason, and all subsequent ones with checkout_timeout.
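A minimal reproduction along those lines might look like this in Elixir; the pool name, port, and request count are assumptions, and nothing should be listening on the chosen port:

```elixir
# Sketch: a small pool plus requests to a closed port. Under the affected
# hackney version the econnrefused checkouts are never cancelled, so once the
# pool's slots are exhausted later requests fail with :checkout_timeout.
{:ok, _} = Application.ensure_all_started(:hackney)
:hackney_pool.start_pool(:repro, max_connections: 4)

for _ <- 1..8 do
  :hackney.get("http://127.0.0.1:1/", [], "", pool: :repro)
  |> IO.inspect()
end
```

On a fixed version every request should come back {:error, :econnrefused}; on the broken one, requests after the first four return {:error, :checkout_timeout}.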
I believe we have this same issue. We started running into occasional …. We added logging from …. It appears that the pool is gradually filling (until it drops to …). Our requests are all made via ….
@losvedir right, for now to work around this problem I restart the pool when I receive checkout_timeout.
@silviucpp - How are you restarting the pool? I put up a branch of our app that once a minute logs the pool stats, and stops the pool if the in_use_count crosses a threshold:

stats = :hackney_pool.get_stats(pool)

if stats[:in_use_count] >= @connection_pool_in_use_threshold do
  Logger.warn("stopping_pool pool=#{pool}")
  :hackney_pool.stop_pool(pool)
end

Also, I haven't been able to recreate the behavior in a minimal case. I put up this repo which is an app that starts up …. @benoitc - are you still looking into this? You said "I think I see the issue." If you have an idea what it might be, I'd be happy to try to replicate the behavior in my example repo to try to get a consistently failing example for you to work with. In the meantime, I think my only recourse is to switch back to ….
Hello. In my case, when … I restart the pool and retry the query.

Silviu
I was preparing this comment while testing, so some areas were added during the test; anyway, on to the actual comment. I'm digging into this issue at the moment. What I've found so far is that the connections are not even in the pool: …. I'm looking at the pool checkout process at the moment: https://github.com/benoitc/hackney/blob/master/src/hackney_pool.erl#L86-L133. I inserted some … debug statements. Issuing a request then causes an ….
After some more testing and tweaking, it seems the error-related failures can be fixed by sticking a cancel_checkout call into the error branches:

%% in hackney_pool.erl
do_checkout(Requester, Host, _Port, Transport,
            #client{options=Opts, mod_metrics=Metrics}=Client,
            ConnectTimeout, CheckoutTimeout) ->
    {Connection, ConnectOptions} = hackney_connection:new(Client),
    RequestRef = Client#client.request_ref,
    PoolName = proplists:get_value(pool, Opts, default),
    Pool = find_pool(PoolName, Opts),
    case catch gen_server:call(Pool, {checkout, Connection, Requester, RequestRef}, CheckoutTimeout) of
        {ok, Socket, Owner} ->
            %% stats
            ?report_debug("reuse a connection", [{pool, PoolName}]),
            _ = metrics:update_meter(Metrics, [hackney_pool, PoolName, take_rate], 1),
            _ = metrics:increment_counter(Metrics, [hackney_pool, Host, reuse_connection]),
            {ok, {PoolName, RequestRef, Connection, Owner, Transport}, Socket};
        {error, no_socket, Owner} ->
            ?report_trace("no socket in the pool", [{pool, PoolName}]),
            Begin = os:timestamp(),
            case hackney_connection:connect(Connection, ConnectOptions, ConnectTimeout) of
                {ok, Socket} ->
                    case hackney_connection:controlling_process(Connection, Socket, Requester) of
                        ok ->
                            ?report_trace("new connection", []),
                            ConnectTime = timer:now_diff(os:timestamp(), Begin)/1000,
                            _ = metrics:update_histogram(Metrics, [hackney, Host, connect_time], ConnectTime),
                            _ = metrics:increment_counter(Metrics, [hackney_pool, Host, new_connection]),
                            {ok, {PoolName, RequestRef, Connection, Owner, Transport}, Socket};
                        Error ->
                            'Elixir.IO':inspect({pool, connect_error_controlling, Error}),
                            catch hackney_connection:close(Connection, Socket),
                            cancel_checkout(Pool, Connection, RequestRef),
                            _ = metrics:increment_counter(Metrics, [hackney, Host, connect_error]),
                            Error
                    end;
                {error, timeout} ->
                    _ = metrics:increment_counter(Metrics, [hackney, Host, connect_timeout]),
                    cancel_checkout(Pool, Connection, RequestRef),
                    {error, timeout};
                Error ->
                    ?report_trace("connect error", []),
                    'Elixir.IO':inspect({pool, connect_error, Error}),
                    _ = metrics:increment_counter(Metrics, [hackney, Host, connect_error]),
                    cancel_checkout(Pool, Connection, RequestRef),
                    Error
            end;
        {error, Reason} ->
            {error, Reason};
        {'EXIT', {timeout, _Reason}} ->
            %% the socket will still be checked out, so to avoid a deadlock we send a cancellation
            cancel_checkout(Pool, Connection, RequestRef),
            {error, checkout_timeout}
    end.
cancel_checkout(Pool, Connection, RequestRef) ->
    gen_server:cast(Pool, {checkout_cancel, Connection, RequestRef}).

The above change seems to fix the …. Regarding the valid-request failure: the issue was not reading off the request body to complete the transaction:

for _ <- 1..60 do
  result = {:ok, 200, _headers, body_ref} = :hackney.request("http://localhost:7654")
  _ = :hackney.body(body_ref)  #< the important bit
  IO.inspect {:response, result}
  IO.inspect {:pool_stats, :hackney_pool.get_stats(:default)}
end
…ailed downloads, the pool is (permanently?) exhausted

Without this, after we hit a few HTTP errors (which of course NOAA serves us a lot of; that's the whole point of this proxy), all (?) future downloads fail with this error:

{:error, %HTTPoison.Error{id: nil, reason: :checkout_timeout}}

Hackney issue: edgurgel/httpoison#414
More info: benoitc/hackney#643
Fixed issue #643 and memory leak in hackney_pool
Thanks everyone!!! <3
So firstly, I did the same thing using these:
Connections were returned to the pool correctly (note in_use_count remains 0).
Now I bumped httpoison (and everything else with it).
The story here is that I retry establishing an HTTP connection until I succeed. But in this case, the pool gets maxed out because in_use_count keeps on growing on {:error, :econnrefused} errors.