[RUNTIME] Support setting CPU affinity #1403
Conversation
src/runtime/thread_pool.cc
Outdated
std::vector<std::pair <unsigned int, int64_t>> max_freqs;
std::vector<unsigned int> sorted_order;

for (unsigned int i = 0; i < threads; i++) {
The frequency query needs to happen in threading_backend.cc.
Restriction: the thread pool cannot contain IO-related code, to accommodate SGX.
src/runtime/thread_pool.cc
Outdated
std::vector<unsigned int> sorted_order;

for (unsigned int i = 0; i < threads; i++) {
snprintf(filepath, sizeof(filepath),
Use std::ostringstream to construct the filename.
src/runtime/thread_pool.cc
Outdated
LOG(WARNING) << "CPU max frequency info empty!";
max_freqs.push_back(std::make_pair(i, cur_freq));
} else {
LOG(WARNING) << "failed to read CPU max frequency!";
If the file is not available, consider adding a default value.
src/runtime/thread_pool.cc
Outdated
ThreadPool::preferred_num = preferred_num;
}

void setNthreadsPref(int nthreads) {
Style: use CamelCase for function names.
src/runtime/thread_pool.cc
Outdated
.set_body([](TVMArgs args, TVMRetValue* rv) {
std::string mode = args[0];
int nthreads = args[1];
if (mode == "big") {
Use integer enums to minimize runtime overhead: 0 -> default, +1 -> big, -1 -> little.
I don't know how enum types work over the Python ffi; do you mean just use an actual integer here?
src/runtime/thread_pool.cc
Outdated
@@ -242,14 +244,18 @@ class SpscTaskQueue {
class ThreadPool {
public:
ThreadPool(): num_workers_(tvm::runtime::threading::MaxConcurrency()) {
if (nthreads != 0) {
num_workers_ = nthreads;
This seems problematic: it does not allow dynamic switching.
This is not necessary; num_workers is set directly, we just need a num_threads_ (restricted to be smaller than num_workers).
(Basically this was because I wasn't sure whether mutating the threads vector was safe, same as the previous case.)
*/
ThreadGroup(int num_workers,
std::function<void(int)> worker_callback,
bool exclude_worker0 = false);
bool exclude_worker0 = false,
If SetAffinity is exposed in some way in ThreadingBackend, we do not need these arguments here. The affinity-setting function can be called in a second round.
same as above, if SetAffinity needs to change (add/remove threads, is that safe?)
src/runtime/thread_pool.cc
Outdated
}

TVM_REGISTER_GLOBAL("runtime.config_threadpool")
.set_body([](TVMArgs args, TVMRetValue* rv) {
There are two parts that matter here: the affinity-order configuration needs to happen inside the threading backend, while the nthreads configuration needs to happen here.
src/runtime/thread_pool.cc
Outdated
}

auto max = [] (std::pair<unsigned int, int64_t> a, std::pair<unsigned int, int64_t> b) {
return a.second > b.second;
No need to have two code paths: sort CPUs by descending frequency and always use that information.
*/
ThreadGroup(int num_workers,
std::function<void(int)> worker_callback,
bool exclude_worker0 = false);
bool exclude_worker0 = false,
std::vector<unsigned int> *affinity_order = NULL);
~ThreadGroup();
Consider adding:
// mode: big, little, default
// nthreads can be 0, which takes the maximum number
// returns the real number of threads needed
int ConfigureAffinity(bool exclude_worker0, int mode, int nthreads);
src/runtime/thread_pool.cc
Outdated
int mode = args[1];
int nthreads = args[2];
std::vector<unsigned int> sorted_order;
unsigned int num_workers_used = threading::configThreadGroup(mode, nthreads, &sorted_order);
This can be merged directly into ThreadLocal->UpdateWorkerConfig(mode, nthreads); which then calls thread_group_->Configure(exclude_worker0, mode, nthreads);
There is no need to pass the sorted list back into the thread pool.
src/runtime/threading_backend.cc
Outdated
// big or LITTLE
if (mode) {
for (unsigned int i = 0; i < threads; ++i) {
read the result once and store the sorted result inside the thread group
src/runtime/threading_backend.cc
Outdated
@@ -124,6 +147,62 @@ int MaxConcurrency() {
return std::max(max_concurrency, 1);
}

unsigned int configThreadGroup(int mode, int nthreads, std::vector<unsigned int> *sorted_order) {
No need to return the sorted order; we can just store the order inside the ThreadGroup and use it to support SetAffinity (Configure).
src/runtime/threading_backend.cc
Outdated
@@ -124,6 +147,62 @@ int MaxConcurrency() {
return std::max(max_concurrency, 1);
}

unsigned int configThreadGroup(int mode, int nthreads, std::vector<unsigned int> *sorted_order) {
Use a CamelCase function name; this can be a member of ThreadGroup and combined with SetAffinity.
src/runtime/thread_pool.cc
Outdated
std::vector<unsigned int> sorted_order;
unsigned int num_workers_used = threading::configThreadGroup(mode, nthreads, &sorted_order);
void UpdateWorkerConfig(int mode, int nthreads) {
unsigned int num_workers_used = threading::ConfigThreadGroup(mode, nthreads, threads_.get());
Again, combine threading::ConfigThreadGroup and SetAffinity into:
// rename num_workers_used_ -> num_active_workers_;
num_active_workers_ = threads_->Configure(exclude_worker0, mode, nthreads);
Change the constructor of ThreadGroup to the simple form, and call threads_->Configure(exclude_worker0, mode, num_workers_); in the constructor.
src/runtime/threading_backend.cc
Outdated
min_count_ = min_count;
}

bool AffinityOrderSet() {
This function is not really necessary.
src/runtime/threading_backend.cc
Outdated
@@ -147,51 +174,55 @@ int MaxConcurrency() {
return std::max(max_concurrency, 1);
}

unsigned int configThreadGroup(int mode, int nthreads, std::vector<unsigned int> *sorted_order) {
unsigned int ConfigThreadGroup(int mode, int nthreads, ThreadGroup *thread_group) {
Move this function into ThreadGroup::Impl
src/runtime/threading_backend.cc
Outdated
bool ThreadGroup::AffinityOrderSet() {
return impl_->AffinityOrderSet();
}
int ThreadGroup::GetPrefCount(bool reverse) {
remove this function, merge most logic into ThreadGroup::Impl
src/runtime/threading_backend.cc
Outdated
void ThreadGroup::SetAffinityOrder(std::vector<unsigned int> order, int max_count, int min_count) {
impl_->SetAffinityOrder(order, max_count, min_count);
}
bool ThreadGroup::AffinityOrderSet() {
remove this function, merge most logic into ThreadGroup::Impl
src/runtime/threading_backend.cc
Outdated
void ThreadGroup::SetAffinity(bool exclude_worker0, bool reverse) {
impl_->SetAffinity(exclude_worker0, reverse);
}
void ThreadGroup::SetAffinityOrder(std::vector<unsigned int> order, int max_count, int min_count) {
remove this function, merge most logic into ThreadGroup::Impl
src/runtime/threading_backend.cc
Outdated
void ThreadGroup::SetAffinity(bool exclude_worker0, const std::vector<unsigned int> *order,
bool reverse) {
impl_->SetAffinity(exclude_worker0, order, reverse);
void ThreadGroup::SetAffinity(bool exclude_worker0, bool reverse) {
remove this function, merge most logic into ThreadGroup::Impl
@@ -66,7 +66,16 @@ class ThreadGroup::Impl {
#endif
#if defined(__linux__) || defined(__ANDROID__)
lazily get the sorted_order here
*/
void SetAffinity(bool exclude_worker0, bool reverse = false);

/*!
Combine all the functions into one:
int ThreadGroup::Configure(bool exclude_worker0, int mode, int nthreads);
which calls into
// put all the logic of the functions into this function
int ThreadGroup::Impl::Configure(bool exclude_worker0, int mode, int nthreads);
src/runtime/thread_pool.cc
Outdated
@@ -297,6 +300,15 @@ class ThreadPool {
return dmlc::ThreadLocalStore<ThreadPool>::Get();
}

void UpdateWorkerConfig(int mode, int nthreads) {
unsigned int num_workers_used = threading::ConfigThreadGroup(mode, nthreads, threads_.get());
Change to:
num_active_worker_ = threads_->Configure(exclude_worker0_, mode, nthreads);
// bind worker threads to disjoint cores
// if worker 0 is offloaded to master, i.e. exclude_worker0 is true,
// the master thread is bound to core 0.
void SetAffinity(bool exclude_worker0) {
void SetAffinity(bool exclude_worker0, bool reverse = false) {
Merge all logic into Configure, and you can directly call Configure in the constructor of ThreadGroup
*
* \return The number of workers to use.
*/
unsigned int ConfigThreadGroup(int mode, int nthreads, bool exclude_worker0);
Returning int is fine, as we use int to store the number of active workers.
Because it is already a member function of ThreadGroup, let us just rename it to Configure.
src/runtime/thread_pool.cc
Outdated
@@ -297,6 +300,14 @@ class ThreadPool {
return dmlc::ThreadLocalStore<ThreadPool>::Get();
}

void UpdateWorkerConfig(int mode, int nthreads) {
// this will also reset the affinity of the ThreadGroup
unsigned int num_workers_used = threads_->ConfigThreadGroup(mode, nthreads,
just do the assignment directly
src/runtime/threading_backend.cc
Outdated
@@ -45,11 +46,71 @@ class ThreadGroup::Impl {
}
}

unsigned int ConfigThreadGroup(int mode, int nthreads, bool exclude_worker0) {
unsigned int threads = std::thread::hardware_concurrency();
std::vector<std::pair <unsigned int, int64_t>> max_freqs;
int64_t>> -> int64_t> > (add a space), for compatibility with some older compilers, which parse >> as the right-shift operator.
src/runtime/threading_backend.cc
Outdated
@@ -45,11 +46,71 @@ class ThreadGroup::Impl {
}
}

unsigned int ConfigThreadGroup(int mode, int nthreads, bool exclude_worker0) {
unsigned int threads = std::thread::hardware_concurrency();
Call MaxConcurrency() because some special rule might apply here
src/runtime/threading_backend.cc
Outdated
ifs.close();
max_freqs.push_back(std::make_pair(i, cur_freq));
if (cur_freq < 0) {
LOG(WARNING) << "failed to read CPU max frequency!";
remove warning and assume the file does not exist
src/runtime/threading_backend.cc
Outdated
if (nthreads)
num_workers_used = nthreads;
// use default
if (!num_workers_used)
move to the previous part
src/runtime/threading_backend.cc
Outdated
@@ -66,7 +127,16 @@ class ThreadGroup::Impl {
#endif
#if defined(__linux__) || defined(__ANDROID__)
for (unsigned i = 0; i < threads_.size(); ++i) {
unsigned core_id = i + exclude_worker0;
unsigned core_id;
if (sorted_order_.size() >= threads_.size()) {
CHECK_EQ(sorted_order_.size(), threads_.size());
// always populate sorted_order_ to match threads_, to allow simplified assumptions
src/runtime/threading_backend.cc
Outdated
}
} else {
LOG(WARNING) << "failed to read CPU max frequency!";
break;
always continue pushing default value
src/runtime/threading_backend.cc
Outdated
if (mode) {
if (sorted_order_.empty()) {
for (unsigned int i = 0; i < threads; ++i) {
std::ostringstream filepath;
#if defined(_WIN32)
// default value
cur_freq = 0;
#else
the file logic
#endif
src/runtime/threading_backend.cc
Outdated
break;
}
}
auto max = [] (std::pair<unsigned int, int64_t> a, std::pair<unsigned int, int64_t> b) {
max -> fcmpbyfreq. When the frequencies are equal, sort by id (to make the ordering stable).
src/runtime/threading_backend.cc
Outdated
for (auto it = max_freqs.begin(); it != max_freqs.end(); it++) {
sorted_order_.push_back(it->first);
if (max_freq == it->second) {
max_count_++;
Consider adding a check here: if max_count_ + min_count_ < num_cores, emit a warning that more than two distinct frequencies were detected.
*
* \return The number of workers to use.
*/
int Config(int mode, int nthreads, bool exclude_worker0);
Config -> Configure
@@ -53,6 +53,13 @@ ThreadGroup::ThreadGroup(int num_workers,
bool exclude_worker0)
: impl_(new ThreadGroup::Impl(num_workers, worker_callback, exclude_worker0)) {}
void ThreadGroup::Join() {}
unsigned int ThreadGroup::ConfigThreadGroup(int mode, int nthreads, bool exclude_worker0) {
Configure?
src/runtime/thread_pool.cc
Outdated
@@ -297,6 +301,13 @@ class ThreadPool {
return dmlc::ThreadLocalStore<ThreadPool>::Get();
}

void UpdateWorkerConfig(int mode, int nthreads) {
Configure
src/runtime/threading_backend.cc
Outdated
if (mode) {
if (sorted_order_.empty()) {
for (unsigned int i = 0; i < threads; ++i) {
int64_t cur_freq = -1;
default to 0?
src/runtime/threading_backend.cc
Outdated
num_workers_used = threading::MaxConcurrency();
}
// if a specific number was given, use that
if (nthreads)
enclose with {}
src/runtime/threading_backend.cc
Outdated
filepath << "/sys/devices/system/cpu/cpu" << i << "/cpufreq/cpuinfo_max_freq";
std::ifstream ifs(filepath.str());
if (!ifs.fail()) {
ifs >> cur_freq;
if (!(ifs >> cur_freq)) {
cur_freq = -1;
}
src/runtime/threading_backend.cc
Outdated
@@ -7,6 +7,7 @@
#include <dmlc/logging.h>
#include <thread>
#include <algorithm>
#include <fstream>
Only include <fstream> on Linux and Android.
some final comments, we are almost good to merge
@@ -53,6 +53,13 @@ ThreadGroup::ThreadGroup(int num_workers,
bool exclude_worker0)
: impl_(new ThreadGroup::Impl(num_workers, worker_callback, exclude_worker0)) {}
void ThreadGroup::Join() {}
unsigned int ThreadGroup::Configure(int mode, int nthreads, bool exclude_worker0) {
new line between functions
@@ -53,6 +53,13 @@ ThreadGroup::ThreadGroup(int num_workers,
bool exclude_worker0)
: impl_(new ThreadGroup::Impl(num_workers, worker_callback, exclude_worker0)) {}
void ThreadGroup::Join() {}
unsigned int ThreadGroup::Configure(int mode, int nthreads, bool exclude_worker0) {
We can just make the return type int.
src/runtime/threading_backend.cc
Outdated
std::vector<std::pair <unsigned int, int64_t> > max_freqs;

// big or LITTLE
if (mode) {
The if (mode) guard can be removed.
src/runtime/threading_backend.cc
Outdated
@@ -66,7 +138,16 @@ class ThreadGroup::Impl {
#endif
#if defined(__linux__) || defined(__ANDROID__)
for (unsigned i = 0; i < threads_.size(); ++i) {
unsigned core_id = i + exclude_worker0;
unsigned core_id;
if (sorted_order_.size() >= threads_.size()) {
Again, this seems unnecessary. Can we simply do CHECK_EQ(sorted_order_.size(), threads_.size()); and always produce sorted_order_ before calling SetAffinity? Possibly do the sorted-order initialization in the construction phase. This will likely simplify this function's logic.
Ok, moved the initialization to the constructor. To preserve the old TVM_BIND_THREADS behavior, the branch in the ThreadGroup constructor is moved to ThreadPool, in case Configure->SetAffinity is called, so it can get the correct number of workers for the new default (prefer big).
one final comment
src/runtime/threading_backend.cc
Outdated
max_freqs.push_back(std::make_pair(i, cur_freq));
}

auto fcmpbyfreq = [] (std::pair<unsigned int, int64_t> a,
prefer const reference: const std::pair<unsigned int, int64_t>&
sorry, more things to be fixed
src/runtime/threading_backend.cc
Outdated
@@ -92,8 +116,57 @@ class ThreadGroup::Impl {
#endif
}

void InitSortedOrder() {
unsigned int threads = threading::MaxConcurrency();
Some follow-ups: we might need to remodel MaxConcurrency a bit. Note that the number of workers can indeed differ from the number of cores; the frequency query always needs to query all the CPU info (using hardware concurrency, without considering the env variable).
src/runtime/threading_backend.cc
Outdated
@@ -26,16 +30,7 @@ class ThreadGroup::Impl {
for (int i = exclude_worker0; i < num_workers_; ++i) {
threads_.emplace_back([worker_callback, i] { worker_callback(i); });
}
const char *val = getenv("TVM_BIND_THREADS");
TVM_BIND_THREADS is an env variable asking whether we should set affinity (or let the threads run as they are). We should take this into consideration: when TVM_BIND_THREADS is defined and is 0, we should not call SetAffinity at all, and Configure should have no effect other than returning the corresponding value.
Not sure what should be done here, as this branch is currently moved to ThreadPool; the behavior should be close to what it was before (now Configure is not called, so SetAffinity is not called either, and the default num_workers_ is used).
Currently CHECK_EQ will break on x86 with hyperthreading, where MaxConcurrency reports #logical cores / 2 while std::thread::hardware_concurrency gives #logical cores.
@@ -53,6 +53,13 @@ ThreadGroup::ThreadGroup(int num_workers,
bool exclude_worker0)
: impl_(new ThreadGroup::Impl(num_workers, worker_callback, exclude_worker0)) {}
void ThreadGroup::Join() {}
int ThreadGroup::Configure(int mode, int nthreads, bool exclude_worker0) {
unsigned int max_conc = (unsigned int) MaxConcurrency();
we can just use int here
src/runtime/threading_backend.cc
Outdated
max_freqs.push_back(std::make_pair(i, cur_freq));
}

auto fcmpbyfreq = [] (std::pair<unsigned int, int64_t> &a,
do const reference instead of reference in here.
src/runtime/threading_backend.cc
Outdated
@@ -65,8 +87,18 @@ class ThreadGroup::Impl {
#endif
#endif
#if defined(__linux__) || defined(__ANDROID__)
CHECK_LE(threads_.size() + exclude_worker0, sorted_order_.size());
if (sorted_order_.size() != threads_.size()) {
LOG(WARNING) << "setting affinity with subset of threads!";
This warning could fire quite frequently; instead, do
CHECK_GE(sorted_order_.size(), threads_.size());
// if a specific number was given, use that
if (nthreads) {
num_workers_used = nthreads;
}
const char *val = getenv("TVM_BIND_THREADS");
if (val == nullptr || atoi(val) == 1) {
// Skip if sorted_order_.size() is bigger than threads_.size()
if (sorted_order_.size() > threads_.size() || not_bind) skip SetAffinity.
re: SGX, this change won't affect threading since the trusted threading backend is basically a no-op that shells out to the (herein improved) untrusted backend
@@ -53,6 +53,13 @@ ThreadGroup::ThreadGroup(int num_workers,
bool exclude_worker0)
: impl_(new ThreadGroup::Impl(num_workers, worker_callback, exclude_worker0)) {}
void ThreadGroup::Join() {}
int ThreadGroup::Configure(int mode, int nthreads, bool exclude_worker0) {
int max_conc = MaxConcurrency();
if (!nthreads || ntheads > MaxConcurrency()) {
if (!nthreads || nthreads > max_conc) {
src/runtime/threading_backend.cc
Outdated
return a.first < b.first;
} else {
return a.second > b.second;
}
return a.second == b.second ? a.first < b.first : a.second > b.second
for brevity
src/runtime/threading_backend.cc
Outdated
}
ifs.close();
}
#else
Is this empty #else necessary?
/*!
* \brief configure the CPU id affinity
*
* \param mode The preferred CPU type (1 = big, -1 = little).
Please use an enum for these, AFFINITY_MODE_(BIG|LITTLE), to take advantage of static checking.
src/runtime/threading_backend.cc
Outdated
}

int Configure(int mode, int nthreads, bool exclude_worker0) {
int num_workers_used = 0;
Set int num_workers_used = nthreads and remove the later if block.
@eqy please rebase against the master.
LGTM. Thanks for the contribution.
How do I configure affinity in a Python script? I have an rk3399 board.
@kaishijeng it is better to ask questions on https://discuss.tvm.ai/
}

private:
// bind worker threads to disjoint cores
// if worker 0 is offloaded to master, i.e. exclude_worker0 is true,
// the master thread is bound to core 0.
void SetAffinity(bool exclude_worker0) {
void SetAffinity(bool exclude_worker0, bool reverse = false) {
Currently, one common use mode of tvm is to use all the available cores on a system, as defined in MaxConcurrency. On asymmetric or heterogeneous multicores (e.g., big.LITTLE), however, we may want to specify a subset of CPU cores, with a preference for the CPU core type.

This PR introduces a new global function, runtime.config_threadpool, to change the runtime configuration; it allows the user to specify a CPU type as well as the number of threads to use. Arguments:
mode: {big, little, default}
nthreads: {0, 1, 2, ...}

Mode selects the preferred CPU type (based on CPU clock rate): "big" prefers higher-clocked cores first, while "little" prefers lower-clocked cores first. Choosing "default" leaves the CPU affinity preference order at the system's CPU id ordering.

Nthreads selects the number of CPUs to use, picking CPUs in the order specified by mode. If a value of 0 is given, all of the cores of the preferred mode are used.

Example: a system with 4x Cortex A-53 (LITTLE) + 2x Cortex A-72 (big):
cpuid 0: LITTLE
cpuid 1: LITTLE
cpuid 2: LITTLE
cpuid 3: LITTLE
cpuid 4: big
cpuid 5: big

config_threadpool("big", 1) -> changes affinity order to [4,5,0,1,2,3], will use CPU 4
config_threadpool("big", 2) -> changes affinity order to [4,5,0,1,2,3], will use CPUs 4,5
config_threadpool("big", 3) -> changes affinity order to [4,5,0,1,2,3], will use CPUs 4,5,0
config_threadpool("big", 0) -> changes affinity order to [4,5,0,1,2,3], will use CPUs 4,5
config_threadpool("little", 0) -> leaves affinity order as [0,1,2,3,4,5], will use CPUs 0,1,2,3
config_threadpool("little", 5) -> leaves affinity order as [0,1,2,3,4,5], will use CPUs 0,1,2,3,4
config_threadpool("default", 0) -> leaves affinity order as [0,1,2,3,4,5], will use CPUs 0,1,2,3,4,5

Note that if config_threadpool is not called, the default behavior remains the same as in current tvm. In a few toy experiments, we observe that restricting the runtime to the 2x A-72 cores can actually outperform using all 4x A-53 plus 2x A-72 cores, due to the current static work partitioning scheme.

misc:
I do not know the current standard for docstrings of TVM_REGISTER_GLOBAL functions; let me know where that should go.
Currently this scheme fixes CPU affinity once the runtime spawns the threads for execution. We could modify the scheme to support switching affinity dynamically within the same runtime instance, but this requires considering how we want to handle migrating existing threads, as the existing threading backend implementation only sets thread affinity at thread creation time.