
Fix hanging Ctrl+C on S3 downloads #673

Merged · jamesls merged 4 commits into aws:develop from s3-hang on Feb 25, 2014

Conversation

jamesls (Member) commented on Feb 24, 2014

This is a regression that was introduced when the IO queue/thread
was introduced (#619).

The current shutdown order of the threads is:

  1. Queue end sentinel for IO thread.
  2. Queue end sentinel for result thread.
  3. Queue end sentinel for worker threads.
  4. .join() threads in this order:
    [io_thread, result_thread [, worker threads]]

Though the actual thread shutdown order is non-deterministic,
it's fairly common for the threads to shut down in roughly the above
order. This means that the IO thread will generally shut down before
all the worker threads have shut down.

However, the download tasks can still be enqueueing writes to the
IO queue. If the IO thread shuts down, there's nothing consuming
writes on the other end of the queue. Given that the queue is
bounded in maxsize, .put() calls to the queue will block until space
becomes available. This will never happen if the IO thread has already
shut down.
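
A minimal sketch (not the actual aws-cli code) of how that deadlock plays out with a bounded queue whose consumer has already exited:

```python
import queue
import threading

io_queue = queue.Queue(maxsize=2)   # bounded, like the real IO queue
SHUTDOWN = object()                 # end sentinel

def io_thread():
    # The consumer exits as soon as it sees the sentinel.
    while True:
        if io_queue.get() is SHUTDOWN:
            return

t = threading.Thread(target=io_thread)
t.start()
io_queue.put(SHUTDOWN)   # sentinel queued first, so the IO thread shuts down
t.join()

# A worker still downloading keeps enqueueing writes; with nothing draining
# the queue, the third .put() below would block forever.
io_queue.put(('chunk', 1))
io_queue.put(('chunk', 2))
# io_queue.put(('chunk', 3))   # deadlock: queue is full and has no consumer
```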

The fix here is to ensure that the IO thread is always the last thing
to shut down. This means any remaining IO requests will be executed
before the IO thread shuts down. This will prevent deadlock.
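
Sketched out, the shutdown ordering the fix describes looks roughly like this (names are illustrative, not the actual aws-cli attributes):

```python
def shutdown(worker_queue, worker_threads, result_queue, result_thread,
             io_queue, io_thread, sentinel):
    # 1. Stop the producers first: worker threads, then the result thread.
    for _ in worker_threads:
        worker_queue.put(sentinel)
    for t in worker_threads:
        t.join()
    result_queue.put(sentinel)
    result_thread.join()
    # 2. Only now signal the IO thread.  Any writes still sitting in the
    #    queue are executed before it exits, so no .put() can block on a
    #    consumer that has already gone away.
    io_queue.put(sentinel)
    io_thread.join()
```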

I've added unit tests demonstrating this issue. I've also added an
integration test that actually sends a SIGINT to the process
and verifies it exits in a timely manner, so we can ensure we
don't regress on this again.
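
A rough sketch of what such an integration test can look like (bucket, key, and timings are illustrative, not the actual test): start an `aws s3 cp` download, send it a SIGINT, and assert that it exits well before a generous deadline.

```python
import os
import signal
import subprocess
import time

# Kick off a download large enough to still be in flight when we interrupt it.
process = subprocess.Popen(
    ['aws', 's3', 'cp', 's3://mybucket/largefile', 'largefile'])
time.sleep(3)                        # let the transfer get going
os.kill(process.pid, signal.SIGINT)  # same signal as Ctrl+C

deadline = time.time() + 30
while time.time() < deadline:
    if process.poll() is not None:
        break                        # process exited after SIGINT
    time.sleep(0.1)
else:
    raise AssertionError("process did not exit within 30s of SIGINT")
```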

Note: some unit/integ tests needed updating because they were
using .call() multiple times.

Fixes #650
Fixes #657

while time.time() < deadline:
    rc = process.poll()
    if rc is not None:
        break
Contributor (commenting on the diff excerpt above):
Does it make sense to put a short time.sleep here? How long does this test typically take to run? Up to 30 seconds seems like a long time to busy loop.
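
One way to address that (illustrative only, not part of this PR): sleep briefly between polls so the test isn't spinning at full CPU while waiting out the deadline.

```python
while time.time() < deadline:
    rc = process.poll()
    if rc is not None:
        break
    time.sleep(0.1)   # short pause instead of a busy loop
```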

@danielgtaylor (Contributor)

Great find and fix. LGTM 🚢-it.

@toastdriven (Contributor)

LGTM as well.

Fallout from removal of .done and .interrupt.
It's possible that between the isdir() check and
the makedirs() call, another thread has come along
and created the directory.

We also need to make sure that any uncaught exception will
cause the download to be cancelled.
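
A hedged sketch of handling that race (illustrative, not the exact aws-cli change): tolerate an "already exists" error from makedirs() instead of assuming the isdir() check is still valid.

```python
import errno
import os

def ensure_dir_exists(dirname):
    if not os.path.isdir(dirname):
        try:
            os.makedirs(dirname)
        except OSError as e:
            # Another thread created the directory after our isdir() check.
            if e.errno != errno.EEXIST:
                raise
```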
@jamesls (Member, Author) commented on Feb 25, 2014

I've added some additional fixes. @toastdriven, mind taking a look?

@toastdriven (Contributor)

Still LGTM.

@jamesls jamesls merged commit d2cb65c into aws:develop Feb 25, 2014
@jamesls jamesls deleted the s3-hang branch June 23, 2014 18:20
Successfully merging this pull request may close these issues.

sync and cp hangs with a large amount of files
Breaking (ctrl+c) a S3 sync fails to clean up
3 participants