New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
logreader/logwriter: fix race condition of idle timer and scheduled I/O job #2650
Conversation
Build FAILURE |
@kira-syslogng retest this please; |
Build SUCCESS |
579dac4
to
ed7781e
Compare
Build SUCCESS |
@alltilla kindly pointed out that we should make LogReader and LogWriter symmetric. I'll add it soon and rerun some additional tests. marking as WIP until. |
85cb7fa
to
e1ee285
Compare
Build FAILURE |
@kira-syslogng retest this please |
Build SUCCESS |
There is a race condition between scheduled I/O operations in worker threads and closing the idle connections (response/ack_timeout timer). The idle timer is not stopped, just when an I/O operation finished and we would like to re-schedule the next action on source side (update_watches()). The timer can trigger a connection closing, although there is schedule I/O operation. Timer should be stopped when we schedule an I/O. This way we eliminate the race condition between 2 sides. We do stop I/O polling (see calling log_reader_stop_watches() in log_reader_io_handle_in(), also log_writer_stop_watches() in log_writer_io_handler()). About stop_idle_timer cals in update_watches: Initially I thought that calling idle_timer is unnecessary in update_watches, since we stop the timer in io_handle. I've ven put an assertion to the timer registration, disabling timer re-registering. I forgot that in case of suspend, we will call an update_watches again, where an idle_timer registration will fail due to the assertion. I've put bach the idle timer stopping. Signed-off-by: Gabor Nagy <gabor.nagy@balabit.com>
e1ee285
to
6432a7c
Compare
About stop_idle_timer calls in update_watches: |
Build SUCCESS |
Previous patch eliminates race condition of idle_timer and scheduled IO job. Still adding protection can help in future changes. Signed-off-by: Gabor Nagy <gabor.nagy@balabit.com>
Signed-off-by: Gabor Nagy <gabor.nagy@balabit.com>
Build SUCCESS |
This pull request fixes 1 alert when merging b48c6a6 into b1620af - view on LGTM.com fixed alerts:
Comment posted by LGTM.com |
I have executed all additional test cases and they PASSED. |
There is a race condition between scheduled I/O operations in worker threads and closing the idle connections (response/ack_timeout timer).
There is no protection - in afsocket source-driver - to not destroy connections(readers), which have scheduled I/O operation (not like when we reap writers on file dest. side).
The idle timer is not stopped, just when an I/O operation finished and we would like to re-schedule the next action on source side (update_watches() ).
The timer can trigger a connection closing, although there is schedule I/O operation.
Timer should be stopped when we schedule an I/O.
This way we eliminate the race condition between 2 sides.
We do stop I/O polling (see calling log_reader_stop_watches() in log_reader_io_handle_in()), so stopping the idle timer makes sens.
UPDATE:
About stop_idle_timer cals in update_watches:
Initially I thought that calling idle_timer is unnecessary in update_watches,
since we stop the timer in io_handle. I've ven put an assertion to the timer registration,
disabling timer re-registering.
I forgot that in case of suspend, we will call an update_watches again, where an idle_timer
registration will fail due to the assertion.