ARROW-4079: [C++] Add machine benchmark #3225

pitrou · 2018-12-19T15:35:17Z

Right now there is a single memory latency benchmark.

Its output looks like this, showing the different cache levels up to main memory
(this is on a CPU with 16 MB L3 cache):

------------------------------------------------------------------
Benchmark                           Time           CPU Iterations
------------------------------------------------------------------
BM_memory_latency/2048              2 ns          2 ns  406878405   548.706M items/s
BM_memory_latency/4096              2 ns          2 ns  395414303    557.74M items/s
BM_memory_latency/8192              2 ns          2 ns  394141916   560.264M items/s
BM_memory_latency/16384             2 ns          2 ns  401410292   535.202M items/s
BM_memory_latency/32768             2 ns          2 ns  381828811   525.377M items/s
BM_memory_latency/65536             4 ns          4 ns  189027575   262.929M items/s
BM_memory_latency/131072            5 ns          5 ns  150798287    209.01M items/s
BM_memory_latency/262144            5 ns          5 ns  129287045   185.606M items/s
BM_memory_latency/524288            7 ns          7 ns   96543517   132.663M items/s
BM_memory_latency/1048576          11 ns         11 ns   66380535   89.0397M items/s
BM_memory_latency/2097152          12 ns         12 ns   55003164   76.6384M items/s
BM_memory_latency/4194304          13 ns         13 ns   51559443   70.9488M items/s
BM_memory_latency/8388608          28 ns         28 ns   25813875   33.6881M items/s
BM_memory_latency/16777216         66 ns         66 ns   10463216   14.4577M items/s
BM_memory_latency/33554432         90 ns         90 ns    7743594   10.5434M items/s
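For readers unfamiliar with the technique, here is a minimal standalone sketch of a pointer-chasing latency measurement. It is not the code from this PR. It uses `std::chrono` instead of Google Benchmark and builds the random cycle with Sattolo's algorithm, which is a swapped-in technique. It also assumes, as the cache steps in the table suggest, that the benchmark argument is the working-set size in bytes of an `int32_t` array. `SattoloCycle` and `MeasureLatencyNs` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Build one random cycle through all `n` slots with Sattolo's algorithm:
// starting anywhere and following path[i] visits every slot exactly once
// before returning to the start. Hypothetical helper, not the PR's code.
std::vector<int32_t> SattoloCycle(std::size_t n, std::mt19937& gen) {
  std::vector<int32_t> path(n);
  for (std::size_t i = 0; i < n; ++i) path[i] = static_cast<int32_t>(i);
  if (n < 2) return path;
  for (std::size_t i = n - 1; i > 0; --i) {
    // Choosing j strictly below i (never j == i) is what guarantees a
    // single cycle instead of an arbitrary permutation.
    std::uniform_int_distribution<std::size_t> dist(0, i - 1);
    std::swap(path[i], path[dist(gen)]);
  }
  return path;
}

// Average time, in nanoseconds, of one dependent load while chasing pointers
// through an int32_t array of `size_bytes` bytes. Hypothetical helper.
double MeasureLatencyNs(std::size_t size_bytes, std::size_t nloads) {
  const std::size_t n = size_bytes / sizeof(int32_t);
  assert(n > 0 && nloads > 0);
  std::mt19937 gen(42);
  const std::vector<int32_t> path = SattoloCycle(n, gen);

  int32_t index = 0;
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < nloads; ++i) {
    // Each address depends on the previous load, so the loads cannot be
    // overlapped: the loop runs at memory latency, not bandwidth.
    index = path[index];
  }
  const auto stop = std::chrono::steady_clock::now();

  // Consume the result so the compiler cannot drop the loop (a role similar
  // to benchmark::DoNotOptimize in the PR).
  volatile int32_t sink = index;
  (void)sink;

  const std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / static_cast<double>(nloads);
}
```

One could call `MeasureLatencyNs(size, 10000000)` for sizes from 2048 to 33554432 bytes and look for the same staircase as in the table. Unlike Google Benchmark, this sketch does no warm-up or repetition, so the numbers will be noisier and will vary by machine.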

fsaintjacques · 2018-12-19T16:28:52Z

Microbenchmarks got refused by microbenchmarks.

fsaintjacques · 2018-12-19T18:09:39Z

cpp/src/arrow/util/machine-benchmark.cc

+    indices[i] = i;
+  }
+  std::shuffle(indices.begin(), indices.end(), gen);
+  std::vector<int32_t> path(size, -999999);


Indices is already a permutation, can you return it?

No, because it's not a full path. For example, if it's {1,0,2,3}, then following it from 0 gives the cycle 0->1->0, i.e. we're not spanning the whole memory area.
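To make the distinction concrete, here is a small sketch. `CycleLength`, `LinkIntoCycle` and `RandomCyclicPath` are hypothetical helpers. `CycleLength` follows `path` from a start slot until it returns there. Treating the permutation {1,0,2,3} itself as the path gives a cycle of length 2. Linking consecutive entries of the shuffled order (`path[indices[i]] = indices[i + 1]`, wrapping around at the end) always gives one cycle through every slot. The linking step is an assumption about how `path` gets filled: the diff excerpt only shows `path` being initialized to -999999.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Count how many hops it takes to get back to `start` by following `path`.
// Assumes `path` is a permutation, so every slot lies on some cycle.
std::size_t CycleLength(const std::vector<int32_t>& path, int32_t start) {
  std::size_t hops = 0;
  int32_t index = start;
  do {
    index = path[index];
    ++hops;
  } while (index != start);
  return hops;
}

// Turn a shuffled ordering of the slots into a single cycle: each slot
// points to the slot that follows it in the shuffled order, and the last
// one points back to the first.
std::vector<int32_t> LinkIntoCycle(const std::vector<int32_t>& order) {
  const std::size_t n = order.size();
  std::vector<int32_t> path(n, -1);
  for (std::size_t i = 0; i < n; ++i) {
    path[order[i]] = order[(i + 1) % n];
  }
  return path;
}

// Shuffle 0..n-1 (as in the diff) and link the result into one cycle.
std::vector<int32_t> RandomCyclicPath(std::size_t n, std::mt19937& gen) {
  std::vector<int32_t> indices(n);
  std::iota(indices.begin(), indices.end(), 0);
  std::shuffle(indices.begin(), indices.end(), gen);
  return LinkIntoCycle(indices);
}
```

Each n-cycle arises from exactly n rotations of the shuffled order, so the resulting cycle is uniformly random among all single cycles over the array.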

fsaintjacques · 2018-12-19T18:19:52Z

cpp/src/arrow/util/machine-benchmark.cc

+    index = path[index];
+  }
+  benchmark::DoNotOptimize(total);
+  state.SetItemsProcessed(state.iterations());


I'd report the number of bytes with SetBytesProcessed(iterations * size * niters / 4); we could then compare this with the maximum memory bandwidth of the motherboard/CPU.

That's not really interesting, though, because we're latency-limited, not bandwidth-limited. The number to watch is the number of nanoseconds per iteration.
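Here is a back-of-the-envelope check of this point. Every load depends on the previous one, so the benchmark moves at most one cache line per latency period. The hypothetical helper `EffectiveBandwidthBytesPerSec` below computes that bound. The 64-byte cache line is an assumption (it is typical for x86), and 90 ns is the main-memory figure from the first table.

```cpp
#include <cassert>

// Upper bound on throughput when each access must wait for the previous one:
// bytes moved per access divided by the latency of one access (in seconds).
double EffectiveBandwidthBytesPerSec(double bytes_per_access, double latency_ns) {
  return bytes_per_access / (latency_ns * 1e-9);
}
```

With the 4-byte `int32_t` loads used here, 90 ns per load gives about 44 MB/s. Even counting a whole 64-byte line per load, the bound is only about 711 MB/s. Both figures are far below what a DRAM interface can sustain, so a bytes/s figure would mostly restate the latency.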


wesm · 2018-12-20T01:12:20Z

Rebased

wesm

+1. This is cool

Results for my mobile Xeon (3.7 GHz):

$ ./release/arrow-machine-benchmark 
2018-12-20 14:01:17
Running ./release/arrow-machine-benchmark
Run on (8 X 3700 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 8192K (x1)
------------------------------------------------------------------
Benchmark                           Time           CPU Iterations
------------------------------------------------------------------
BM_memory_latency/2048              2 ns          2 ns  456013614    611.84M items/s
BM_memory_latency/4096              2 ns          2 ns  439562057   603.102M items/s
BM_memory_latency/8192              2 ns          2 ns  450940722   608.893M items/s
BM_memory_latency/16384             2 ns          2 ns  428344241   616.558M items/s
BM_memory_latency/32768             2 ns          2 ns  443366563   600.546M items/s
BM_memory_latency/65536             3 ns          3 ns  233006238    321.66M items/s
BM_memory_latency/131072            3 ns          3 ns  213815059   310.851M items/s
BM_memory_latency/262144            5 ns          5 ns  147450485   202.559M items/s
BM_memory_latency/524288            8 ns          8 ns   79725620    116.44M items/s
BM_memory_latency/1048576          11 ns         11 ns   64569151   88.5241M items/s
BM_memory_latency/2097152          12 ns         12 ns   59470080   79.2464M items/s
BM_memory_latency/4194304          13 ns         13 ns   55843009   72.0153M items/s
BM_memory_latency/8388608          28 ns         28 ns   25313659   33.6475M items/s
BM_memory_latency/16777216         57 ns         57 ns   12726900   16.8747M items/s
BM_memory_latency/33554432         72 ns         72 ns    9643933   13.2208M items/s

Author: Antoine Pitrou <antoine@python.org>

Closes apache#3225 from pitrou/ARROW-4079-machine-benchmark and squashes the following commits:

55f6de6 <Antoine Pitrou> ARROW-4079: Add machine benchmark

fsaintjacques reviewed Dec 19, 2018


wesm force-pushed the ARROW-4079-machine-benchmark branch from 3daa3af to 55f6de6 on December 20, 2018 01:12

wesm approved these changes Dec 20, 2018


wesm closed this in 398466e Dec 20, 2018

pitrou deleted the ARROW-4079-machine-benchmark branch December 20, 2018 20:05

asfimport mentioned this pull request Dec 20, 2018 in [C++] Add machine benchmarks #20673 (closed)
