Asio on Linux stalls in epoll() #180
Comments
What does ThreadSanitizer say?
Good question. I'll give it a shot when I have time to return to this - although I wouldn't be at all surprised if the 5x-10x overhead introduced by ThreadSanitizer prevents the problem from occurring.
I suspect we stumbled onto a similar issue in a totally different context, one not as well instrumented as this:
Thanks @jeremyd-gl for the program to reproduce the issue. This was one tough little nut to crack. I have committed the fix to the asio-1-10-branch (669e6b8) and master branch (47b9319). @pascal-simon, your issue is probably unrelated as you have a single thread. If you can, please try to reduce it to a simple program to reproduce the issue, and then raise a new ticket for it. You might also find asio's handler tracking helpful.
Nice one, will try!
cheers
j
Does this also affect non-standalone Asio? Older versions of Boost, maybe? I'm facing a similar issue with Boost 1.55.0 and I wonder if I've hit the same bug.
It affects Boost 1.55.0 and 1.60.0 (and likely all other versions except 1.65.0, into which the fix has been committed).
[ As posted on StackOverflow - http://stackoverflow.com/questions/41804866/asio-on-linux-stalls-in-epoll ]
We're experiencing a problem with asynchronous operation of standalone (non-Boost) Asio 1.10.6 on Linux, which is demonstrated using the following test app:
We compile as follows (the problem only occurs in optimised builds):
We're running on Debian Jessie. `uname -a` reports: `Linux <hostname> 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u2 (2016-10-19) x86_64 GNU/Linux`. The problem appears under both GCC (`g++ (Debian 4.9.2-10) 4.9.2`) and Clang (`Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0)`).

In summary, the test app does the following:
- We create N connections, each consisting of an inbound (listening) end and an outbound (connecting) end. Each inbound listener is bound to a unique port (starting at 25000), and each outbound connector uses a system-selected originating port.
- The inbound end performs an `async_accept`, and on completion issues an `async_read`. When the read completes it issues another `async_read` that we expect to return `eof`. When that completes, we either free the socket immediately, or leave it as-is (with no pending async operations) to be cleaned up by the relevant destructors at program exit. (Note that the listener socket is always left as-is, with no pending accept, until exit.)
- The outbound end performs an `async_connect`, and on completion issues an `async_write`. When the write completes it issues a `shutdown` (specifically, `shutdown(both)`) followed by an `async_read` that we expect to return `eof`. On completion, we once again either leave the socket as-is, with no pending operations, or we free it immediately.
- Any error or unexpected receive data results in an immediate `abort()` call.
The test app lets us specify the number of worker threads for the `io_service`, as well as the total number of connections to create, as well as flags controlling whether inbound and outbound sockets respectively are freed or left as-is.

We run the test app repeatedly, specifying 50 threads and 1000 connections, i.e.:

```
while ./example 50 1000 n y >out.txt ; do echo -n . ; done
```
If we specify that all sockets are left as-is, the test loop runs indefinitely. To avoid muddying the waters with `SO_REUSEADDR` considerations, we take care that no sockets are in `TIME_WAIT` state from a previous test run before we start the test, otherwise the listens can fail. But with this caveat satisfied, the test app runs literally hundreds, even thousands of times with no error. Similarly, if we specify that inbound sockets (but NOT outbound sockets) should be explicitly freed, all runs fine.

However, if we specify that outbound sockets should be freed, the app stalls after a variable number of executions - sometimes ten or fewer, sometimes a hundred or more, usually somewhere in between.
Connecting to the stalled process with GDB, we see that the main thread is waiting to join the worker threads, all but one of the worker threads are idle (waiting on an Asio internal condition variable), and that one worker thread is waiting in Asio's call to `epoll()`. The internal trace instrumentation verifies that some of the sockets are waiting on async operations to complete - sometimes the initial (inbound) accept, sometimes the (inbound) data read, and sometimes the final inbound or outbound reads that normally complete with `eof`.

In all cases, the other end of the connection has successfully done its bit: if the inbound accept is still pending, we see that the corresponding outbound connect has successfully completed, along with the outbound write; likewise if the inbound data read is pending, the corresponding outbound connect and write have completed; if the inbound EOF read is pending, the outbound shutdown has been performed, and likewise if the outbound EOF read is pending, the inbound EOF read has completed due to the outbound shutdown.
Examining the process's /proc/N/fdinfo shows that the epoll file descriptor is indeed waiting on the file descriptors indicated by the instrumentation.
Most puzzlingly, `netstat` shows nonzero RecvQ sizes for the waiting sockets - that is, sockets for which there is a read operation pending are shown to have receive data or close events ready to read. This is consistent with our instrumentation, in that it shows that write data has been delivered to the inbound socket, but has not yet been read (or alternatively that the outbound shutdown has issued a FIN to the inbound side, but that the EOF has not yet been 'read').

This leads me to suspect that Asio's `epoll` bookkeeping - in particular its edge-triggered event management - is getting out of sync somewhere due to a race condition. Clearly this is more than likely due to incorrect operations on my part, but I can't see where the problem would be.

All insights, suggestions, known issues, and pointing-out-glaring-screwups would be greatly appreciated.