Load-balanced thread pool #5
Conversation
```cpp
while ((envIdx = nextTaskQueue.fetch_add(
            1, std::memory_order_acq_rel)) <
       numEnvs) {
  taskFunc(task, envIdx);
```
The formatter is super aggressive. Probably set to 80 chars per line?
Fantastic! Thanks a lot.
Maybe it does not help much right now, but it may well help in the future when other types of envs are added.
```cpp
if (renderer.isVulkan()) {
  renderer.reset(*envs[envIdx], envIdx);
  renderer.preDraw(*envs[envIdx], envIdx);
}
```
That's good, I guess I never bothered to check. This actually means that we don't even have to fix this, because who cares about the OpenGL renderer.
```cpp
envs[envIdx]->reset();

// The vulkan renderer is fine with being reset in parallel
if (renderer.isVulkan()) {
```
I'd prefer if this were called supportsParallelReset, but there's a 99.99% chance we will never have another renderer :D
```cpp
numReady.store(0, std::memory_order_relaxed);
nextTaskQueue.store(0, std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_release);
```
Is this to use one fence for two instructions?
Yeah, and the fence will also synchronize other writes that aren't part of the atomics, if needed.
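To illustrate the pattern being discussed: both counters are reset with cheap relaxed stores, and one release fence publishes them together, instead of paying for two release stores. The fence takes effect through whichever atomic store a consumer later acquire-loads; the `published` flag below is a hypothetical stand-in for that store, and this is an illustrative sketch, not the actual pool code.

```cpp
#include <atomic>

std::atomic<int> numReady{1};
std::atomic<int> nextTaskQueue{1};
std::atomic<bool> published{false};  // hypothetical flag a consumer acquire-loads

void resetCounters() {
  // Cheap relaxed stores; ordering is provided by the fence below.
  numReady.store(0, std::memory_order_relaxed);
  nextTaskQueue.store(0, std::memory_order_relaxed);
  // One fence orders both relaxed stores (and any preceding plain writes)
  // before any subsequent atomic store the consumer synchronizes with.
  std::atomic_thread_fence(std::memory_order_release);
  published.store(true, std::memory_order_relaxed);
}
```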
```cpp
    if (envs[envIdx]->isDone()) {
      done[envIdx] = true;
      for (int agentIdx = 0; agentIdx < envs[envIdx]->getNumAgents();
           ++agentIdx)
        trueObjectives[envIdx][agentIdx] =
            envs[envIdx]->trueObjective(agentIdx);

      resetEnv(envIdx);
    } else {
      done[envIdx] = false;
    }
  }
}
```
I think this should be in stepEnv()
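A minimal sketch of that refactor, moving the done/objective/reset bookkeeping into a `stepEnv()` helper so callers don't repeat it. `Env` here is a stub standing in for the real environment class; only the members needed for the sketch are assumed, and the episode-end condition is invented for illustration.

```cpp
#include <memory>
#include <vector>

// Stub environment: done after 3 steps, 2 agents (assumed values for the sketch).
struct Env {
  int steps = 0;
  void step() { ++steps; }
  bool isDone() const { return steps >= 3; }
  int getNumAgents() const { return 2; }
  float trueObjective(int) const { return float(steps); }
  void reset() { steps = 0; }
};

struct Pool {
  std::vector<std::unique_ptr<Env>> envs;
  std::vector<bool> done;
  std::vector<std::vector<float>> trueObjectives;

  void resetEnv(int envIdx) { envs[envIdx]->reset(); }

  // All end-of-episode handling lives in one place, per the suggestion above.
  void stepEnv(int envIdx) {
    envs[envIdx]->step();
    if (envs[envIdx]->isDone()) {
      done[envIdx] = true;
      for (int agentIdx = 0; agentIdx < envs[envIdx]->getNumAgents();
           ++agentIdx)
        trueObjectives[envIdx][agentIdx] =
            envs[envIdx]->trueObjective(agentIdx);
      resetEnv(envIdx);
    } else {
      done[envIdx] = false;
    }
  }
};
```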
```cpp
      cvTask.wait(lock, [this, &threadIdx] {
        return currTasks[threadIdx] != Task::IDLE;
      });
      task = currTasks[threadIdx];

      currTasks[threadIdx] = Task::IDLE;
    }

    int envIdx = 0;
    while ((envIdx = nextTaskQueue.fetch_add(
                1, std::memory_order_acq_rel)) <
           numEnvs) {
```
That's pretty cool.
So we're using the expensive mutex/cond only once per cycle, to wake up the thread, and the rest is handled in the while loop. I quite like this.
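A condensed sketch of the pattern being praised, assuming a single-worker pool for brevity: the mutex/condvar pair is touched exactly once per cycle, to hand the worker a task, and all per-env work is then claimed lock-free through the shared atomic counter. Names (`cvTask`, `nextTaskQueue`, `Task`) mirror the diff; the `Worker` struct and `runOnce` are invented scaffolding.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

enum class Task { IDLE, STEP, RESET };

struct Worker {
  std::mutex mtx;
  std::condition_variable cvTask;
  Task currTask = Task::IDLE;
  std::atomic<int> nextTaskQueue{0};

  template <typename F>
  void runOnce(int numEnvs, F&& taskFunc) {
    Task task;
    {
      // Expensive part, paid once per cycle: block until a task is posted.
      std::unique_lock<std::mutex> lock(mtx);
      cvTask.wait(lock, [this] { return currTask != Task::IDLE; });
      task = currTask;
      currTask = Task::IDLE;
    }
    // Cheap part: claim env indices with one atomic fetch_add per env.
    int envIdx = 0;
    while ((envIdx = nextTaskQueue.fetch_add(
                1, std::memory_order_acq_rel)) < numEnvs) {
      taskFunc(task, envIdx);
    }
  }
};
```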
```cpp
while (numReady.load(std::memory_order_acquire) < numThreads - 1)
  asm volatile("pause" ::: "memory");
```
Oh yeah, that's fair.
You can thank Brennan for this one! He taught me this.
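For context, a sketch of that busy-wait as a standalone function: spin until the other workers report ready, issuing the x86 `pause` hint each iteration to ease pressure on the pipeline and sibling hyperthread. The `#if` guard is an addition for portability; on non-x86 targets you'd substitute the platform's equivalent hint. The function name is invented.

```cpp
#include <atomic>

void waitForWorkers(const std::atomic<int>& numReady, int numThreads) {
  // Acquire load pairs with the release publication done by each worker.
  while (numReady.load(std::memory_order_acquire) < numThreads - 1) {
#if defined(__x86_64__) || defined(__i386__)
    // Spin-wait hint: cheaper than a syscall, friendlier than a raw loop.
    asm volatile("pause" ::: "memory");
#endif
  }
}
```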
```cpp
int envIdx = 0;
while ((envIdx = nextTaskQueue.fetch_add(1, std::memory_order_acq_rel)) <
       int(envs.size())) {
  taskFunc(task, envIdx);
}
```
Same loop as in the thread func; perhaps it could be factored into a function.
Use a work queue to better load-balance the thread pool. I also added the ability to reset the envs in parallel and to do the Vulkan rendering in parallel, since it seems to be just OpenGL that doesn't like that. This doesn't really matter for the async RL setting, where the extra latency due to bad load balancing is irrelevant. I made a kind of sync setting by configuring SF with 1 worker and 1 env per worker; there, FPS improves from 4.5k to 4.7k, so this helps, but not much.