
Blocking queue for reader #10206

Merged (8 commits) on Apr 26, 2018

Conversation

@JiayiFeng (Collaborator) commented Apr 25, 2018

This PR replaces the invocations of framework::Channel in readers with a new implementation, reader::BlockingQueue.

Why not keep using framework::Channel?

framework::Channel is a great idea. It provides rich functionality, and using only a small part of it has made developing readers quite easy. However, the implementation of framework::Channel has not been very stable so far: it crashes in the GCC 4.8.2 environment and suffers from occasional deadlocks. So we implemented reader::BlockingQueue, which can be regarded as an extremely simplified framework::Channel. It has interfaces similar to framework::Channel's, but provides only the features that readers really need. Its conciseness makes it easy to maintain, and it also runs a little faster than framework::Channel (892s vs. 910s for 1000 batches in the Transformer job).

Why not merge framework::BlockingQueue and reader::BlockingQueue?

These two blocking queues have different customized features. framework::BlockingQueue is mainly used in the ParallelExecutor, so it supports an extend operation and a timeout mechanism. reader::BlockingQueue is only used in readers; as a replacement for framework::Channel, it supports a capacity limit and a closing mechanism.
It's hard to implement all these features in a single blocking queue.
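
For context, here is a minimal sketch of what such a bounded, closable blocking queue can look like. It is reconstructed from the code fragments quoted in the review below (Send, Receive, Close, capacity_, closed_, send_cv_, receive_cv_); the details are illustrative, not the merged implementation.

#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

namespace reader {

// A bounded blocking queue with a close mechanism: roughly the subset of
// framework::Channel that the readers actually need.
template <typename T>
class BlockingQueue {
 public:
  explicit BlockingQueue(size_t capacity)
      : capacity_(capacity), closed_(false) {}

  // Blocks while the queue is full; returns false if the queue is closed.
  bool Send(const T& elem) {
    std::unique_lock<std::mutex> lock(mutex_);
    send_cv_.wait(lock, [&] { return queue_.size() < capacity_ || closed_; });
    if (closed_) return false;
    queue_.push_back(elem);
    receive_cv_.notify_one();
    return true;
  }

  // Blocks while the queue is empty and still open; returns false once the
  // queue is closed and drained.
  bool Receive(T* elem) {
    std::unique_lock<std::mutex> lock(mutex_);
    receive_cv_.wait(lock, [&] { return !queue_.empty() || closed_; });
    if (queue_.empty()) return false;
    *elem = queue_.front();
    queue_.pop_front();
    send_cv_.notify_one();
    return true;
  }

  void Close() {
    std::lock_guard<std::mutex> lock(mutex_);
    closed_ = true;
    send_cv_.notify_all();
    receive_cv_.notify_all();
  }

 private:
  const size_t capacity_;
  bool closed_;
  std::deque<T> queue_;
  std::mutex mutex_;
  std::condition_variable send_cv_;
  std::condition_variable receive_cv_;
};

}  // namespace reader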

@tpatejko commented Apr 25, 2018

@JiayiFeng just a quick question. I noticed that there is already an implementation of a blocking queue in fluid: fluid/framework/blocking_queue.h.

What is the difference between the blocking queue in this PR and the blocking queue that is already used in PaddlePaddle?

The PR replaces the framework::channel abstraction with a new implementation of a blocking queue. Could this abstraction be implemented with the existing blocking queue plus the additional functionality needed by framework::channel?

@JiayiFeng (Collaborator, Author)

Hi @tpatejko, I have just updated the PR description. It may answer your questions.

@@ -58,7 +58,7 @@ class DoubleBufferReader : public framework::DecoratedReader {
   bool HasNext() const;

   void StartPrefetcher() {
-    channel_ = framework::MakeChannel<size_t>(kChannelSize);
+    channel_ = new reader::BlockingQueue<size_t>(kChannelSize);


If you name the class BlockingQueue, please name the variable blocking_queue_

@JiayiFeng (Collaborator, Author) Apr 26, 2018

I think channel is a better name. It is implemented with a blocking queue, but essentially it is a channel.

@wangkuiyi (Collaborator)

This PR replaces framework::Channel in C++ readers with new implementation of reader::BlockingQueue.

It seems that this PR DOESN'T replace framework::Channel. Instead, it replaces the invocations of framework::Channel in reader/*.

namespace reader {

template <typename T>
class BlockingQueue {
Collaborator

Here we need a class comment, something like

// BlockingQueue is for buffered reading and is supposed to be used only in the reader package. It is true that we could, and should, have been using framework::Channel, but it currently has a deadlock bug. BlockingQueue is a workaround: a simplified version of framework::Channel that doesn't support GPU and only implements a buffered blocking queue.

@JiayiFeng (Collaborator, Author)

Done. Thanks!

return closed_;
}

bool CanSend() {
Collaborator

I am afraid that CanSend is useless. Please correct me if I am wrong.

In my mind, the usage of CanSend is

if (q.CanSend()) {
  q.Send(...);
}

However, between the invocation of CanSend and Send, another thread could write something into the queue, making it no longer possible to send.

If I am right, please delete CanSend.

@JiayiFeng (Collaborator, Author) Apr 26, 2018

It's correct that CanSend and CanReceive are not thread-safe. They were offered for two reasons:

  1. framework::Channel has these two interfaces. As a replacement for framework::Channel, reader::BlockingQueue had better have similar interfaces. It is possible that we will reuse framework::Channel in readers in the future, and similar interfaces would reduce the migration workload.

  2. In the current implementations of all C++ readers, CanSend and CanReceive are invoked single-threaded, so no bug is caused.

However, I agree that removing them is the better choice. Hidden trouble should be eliminated at the very start.
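
As an aside, if a non-blocking probe were ever needed again, the check-then-act race could be avoided by folding the check and the enqueue into a single critical section. A hypothetical TrySend (not part of this PR) could look like:

// Hypothetical alternative, for illustration only: performs the "can send?"
// check and the enqueue under the same lock, so no other thread can change
// the queue state in between.
bool TrySend(const T& elem) {
  std::lock_guard<std::mutex> lock(mutex_);
  if (closed_ || queue_.size() >= capacity_) return false;
  queue_.push_back(elem);
  receive_cv_.notify_one();
  return true;
}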

return !closed_ && queue_.size() < capacity_;
}

bool CanReceive() {
Collaborator

A similar comment to the one I gave on CanSend.

std::unique_lock<std::mutex> lock(mutex_);
send_cv_.wait(lock, [&] { return queue_.size() < capacity_ || closed_; });
if (closed_) {
return false;
Collaborator

Here we might need to VLOG a warning in addition to returning false because sending to a closed channel is very likely a bug.

@JiayiFeng (Collaborator, Author)

Great idea!
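
A minimal sketch of the suggested change, using glog-style VLOG as elsewhere in Paddle (the exact log level and message are assumptions, not the merged code):

// Warn before returning false: sending to a closed queue is very likely a
// bug in the caller, so it should not fail silently.
bool Send(const T& elem) {
  std::unique_lock<std::mutex> lock(mutex_);
  send_cv_.wait(lock, [&] { return queue_.size() < capacity_ || closed_; });
  if (closed_) {
    VLOG(5) << "The BlockingQueue is closed. The element to send is discarded.";
    return false;
  }
  queue_.push_back(elem);
  receive_cv_.notify_one();
  return true;
}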

send_cv_.wait(lock, [&] { return queue_.size() < capacity_ || closed_; });
if (closed_) {
return false;
} else {
Collaborator

It looks to me that we don't need this else.

@JiayiFeng (Collaborator, Author)

Done

EXPECT_FALSE(q.CanSend());
}

void FirstInFirstOut(size_t queue_cap, size_t elem_num, size_t send_time_gap,
Collaborator

What is the point of having send_time_gap and receive_time_gap? Can we remove them?

@JiayiFeng (Collaborator, Author) Apr 26, 2018

In the unit tests, sender threads start before receiver threads. If we don't set the time gaps, all sender threads may have finished before the receiver threads even start, and bugs that only appear when senders and receivers run concurrently would not be found. A sketch of such a test follows.
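
An illustrative sketch of such a test helper, assuming gtest plus <thread> and <chrono> are available (the real test's details may differ):

// The per-element sleeps make the sender and receiver actually overlap in
// time, instead of the sender finishing before the receiver starts.
void FirstInFirstOut(size_t queue_cap, size_t elem_num, size_t send_time_gap,
                     size_t receive_time_gap) {
  reader::BlockingQueue<size_t> q(queue_cap);
  std::thread sender([&] {
    for (size_t i = 0; i < elem_num; ++i) {
      std::this_thread::sleep_for(std::chrono::milliseconds(send_time_gap));
      EXPECT_TRUE(q.Send(i));
    }
    q.Close();
  });
  size_t expected = 0;
  size_t elem = 0;
  while (q.Receive(&elem)) {
    std::this_thread::sleep_for(std::chrono::milliseconds(receive_time_gap));
    EXPECT_EQ(elem, expected++);  // elements must come out in FIFO order
  }
  EXPECT_EQ(expected, elem_num);  // every sent element was received
  sender.join();
}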

send_cv_.notify_one();
return true;
} else {
PADDLE_ENFORCE(closed_);
Collaborator

Is the assumption here that "if the queue is empty, it must already have been closed"?

If so, isn't that assumption unreasonable? The queue is empty right after it is created; at that point, if the reader calls Receive before the writer has written any data, would it just crash?

@JiayiFeng (Collaborator, Author) Apr 26, 2018

bool Receive(T* elem) {
    std::unique_lock<std::mutex> lock(mutex_);
    receive_cv_.wait(lock, [&] { return !queue_.empty() || closed_; });
    if (!queue_.empty()) {
      PADDLE_ENFORCE_NOT_NULL(elem);
      *elem = queue_.front();
      queue_.pop_front();
      send_cv_.notify_one();
      return true;
    } else {
      PADDLE_ENFORCE(closed_);
      return false;
    }
  }

There is a condition-variable wait before the if. So when execution reaches this if...else..., it means "the queue is non-empty, or the queue has been closed". If we then enter the else branch, the queue must be empty at that point, so obviously it has already been closed.

In the case you describe, where Receive is called right away, the queue is empty and not closed, so the call blocks at the earlier condition-variable wait. Therefore the problem you mention does not occur.
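
To make the behavior concrete, a small usage sketch (illustrative only): a Receive on an empty, still-open queue simply blocks in the wait above; it only returns false once Close() has been called and the queue has been drained.

reader::BlockingQueue<int> q(2);
std::thread consumer([&] {
  int v;
  while (q.Receive(&v)) {
    // process v; this loop blocks whenever the queue is empty but still open
  }
});
q.Send(1);
q.Send(2);
q.Close();        // after the remaining elements are drained, Receive returns false
consumer.join();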

Collaborator

Got it!

@wangkuiyi (Collaborator) left a comment

LGTM!

@JiayiFeng JiayiFeng merged commit 9c7fa6f into PaddlePaddle:develop Apr 26, 2018
@tpatejko

@JiayiFeng @wangkuiyi Thanks for clarifying the idea!
