ARROW-4079: [C++] Add machine benchmark #3225
Microbenchmarks got refused by microbenchmarks.
```cpp
  indices[i] = i;
}
std::shuffle(indices.begin(), indices.end(), gen);
std::vector<int32_t> path(size, -999999);
```
`indices` is already a permutation; can you return it directly?
No, because it's not a full path. For example, if it's `{1,0,2,3}`, then the path is 0 -> 1 -> 0, i.e. we're not spanning the whole memory area.
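To make the point concrete: a shuffled permutation used directly as `path` can split into several short cycles, so the chase may only ever touch a few slots. One common fix is to link the slots in shuffled order into a single cycle. The sketch below assumes `path` holds next-indices as `int32_t`; the helper names `cycle_length` and `make_single_cycle` are hypothetical and this is not necessarily how the PR builds its path.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Count the steps needed to get back to `start` when following path[].
// For a permutation every slot lies on some cycle, so this terminates.
int64_t cycle_length(const std::vector<int32_t>& path, int32_t start) {
  int64_t len = 0;
  int32_t index = start;
  do {
    index = path[index];
    ++len;
  } while (index != start);
  return len;
}

// Hypothetical helper: build a path that forms ONE cycle over all `size`
// slots (size > 0), visiting them in a random order.
std::vector<int32_t> make_single_cycle(int32_t size, uint32_t seed) {
  std::vector<int32_t> indices(size);
  std::iota(indices.begin(), indices.end(), 0);
  std::mt19937 gen(seed);
  std::shuffle(indices.begin(), indices.end(), gen);
  std::vector<int32_t> path(size, -999999);  // sentinel, as in the PR snippet
  // Link each slot to its successor in shuffled order; the last wraps around.
  for (int32_t i = 0; i < size; ++i) {
    path[indices[i]] = indices[(i + 1) % size];
  }
  return path;
}
```

With `{1, 0, 2, 3}` used directly as a path, `cycle_length` from slot 0 is 2, whereas a path from `make_single_cycle(1024, 42)` has a cycle length of 1024 from any starting slot.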
```cpp
  index = path[index];
}
benchmark::DoNotOptimize(total);
state.SetItemsProcessed(state.iterations());
```
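The loop above is a pointer chase: each load's address depends on the result of the previous load, so the CPU cannot overlap the accesses and the time per step approximates load latency for the given working-set size. Below is a minimal self-contained sketch of the same idea timed with `std::chrono` instead of Google Benchmark; the function name `chase_ns_per_step` and the `sink` out-parameter are illustrative assumptions, and `path` is assumed to already form a cycle.

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// Follow `path` for `steps` dependent loads and return the average
// wall-clock nanoseconds per load. `path` must describe a cycle.
double chase_ns_per_step(const std::vector<int32_t>& path, int64_t steps,
                         int64_t* sink) {
  int32_t index = 0;
  int64_t total = 0;
  const auto start = std::chrono::steady_clock::now();
  for (int64_t i = 0; i < steps; ++i) {
    index = path[index];  // the next address depends on this load
    total += index;
  }
  const auto stop = std::chrono::steady_clock::now();
  // Publishing the sum keeps the compiler from deleting the loop,
  // similar in spirit to benchmark::DoNotOptimize(total).
  *sink = total;
  const std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / static_cast<double>(steps);
}
```

Counting one item per step, as `SetItemsProcessed(state.iterations())` does in the snippet, means the reported items/s is simply the reciprocal of this per-step time.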
I'd report the number of bytes with `SetBytesProcessed(iterations * size * niters / 4)`; we could then compare this with the maximum memory bandwidth of the motherboard/CPU.
That's not really interesting, though, because we're latency-limited, not bandwidth-limited. The number to watch is the number of nanoseconds per iteration.
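Since each item is one dependent load, the items/s column is the reciprocal of the time per iteration. The outputs posted in this PR appear consistent with the `M` suffix meaning 2^20 rather than 10^6 (our reading is that Google Benchmark versions of that era printed rates with 1024-based prefixes): 10.5434M items/s works out to about 90.45 ns per item, matching the 90 ns Time column, whereas a decimal `M` would give about 94.8 ns. A small helper (hypothetical name) for the conversion, under that assumption:

```cpp
// Convert an "items/s" figure printed with an M suffix into nanoseconds
// per item, ASSUMING the suffix is binary (M = 2^20 = 1048576).
double ns_per_item_from_mega_items(double mega_items_per_s) {
  const double items_per_s = mega_items_per_s * 1024.0 * 1024.0;
  return 1e9 / items_per_s;
}
```

For example, 548.706M items/s gives about 1.74 ns, which the console output rounds to 2 ns.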
Right now there is a single memory latency benchmark. Its output looks like this, showing the different cache levels up to main memory (this is on a CPU with 16 MB L3 cache):

```
------------------------------------------------------------------
Benchmark                           Time           CPU Iterations
------------------------------------------------------------------
BM_memory_latency/2048            2 ns          2 ns  406878405   548.706M items/s
BM_memory_latency/4096            2 ns          2 ns  395414303    557.74M items/s
BM_memory_latency/8192            2 ns          2 ns  394141916   560.264M items/s
BM_memory_latency/16384           2 ns          2 ns  401410292   535.202M items/s
BM_memory_latency/32768           2 ns          2 ns  381828811   525.377M items/s
BM_memory_latency/65536           4 ns          4 ns  189027575   262.929M items/s
BM_memory_latency/131072          5 ns          5 ns  150798287    209.01M items/s
BM_memory_latency/262144          5 ns          5 ns  129287045   185.606M items/s
BM_memory_latency/524288          7 ns          7 ns   96543517   132.663M items/s
BM_memory_latency/1048576        11 ns         11 ns   66380535   89.0397M items/s
BM_memory_latency/2097152        12 ns         12 ns   55003164   76.6384M items/s
BM_memory_latency/4194304        13 ns         13 ns   51559443   70.9488M items/s
BM_memory_latency/8388608        28 ns         28 ns   25813875   33.6881M items/s
BM_memory_latency/16777216       66 ns         66 ns   10463216   14.4577M items/s
BM_memory_latency/33554432       90 ns         90 ns    7743594   10.5434M items/s
```
Updated from 3daa3af to 55f6de6.

Rebased.
+1. This is cool
Results for my mobile Xeon (3.7 GHz):
```
$ ./release/arrow-machine-benchmark
2018-12-20 14:01:17
Running ./release/arrow-machine-benchmark
Run on (8 X 3700 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 8192K (x1)
------------------------------------------------------------------
Benchmark                           Time           CPU Iterations
------------------------------------------------------------------
BM_memory_latency/2048            2 ns          2 ns  456013614    611.84M items/s
BM_memory_latency/4096            2 ns          2 ns  439562057   603.102M items/s
BM_memory_latency/8192            2 ns          2 ns  450940722   608.893M items/s
BM_memory_latency/16384           2 ns          2 ns  428344241   616.558M items/s
BM_memory_latency/32768           2 ns          2 ns  443366563   600.546M items/s
BM_memory_latency/65536           3 ns          3 ns  233006238    321.66M items/s
BM_memory_latency/131072          3 ns          3 ns  213815059   310.851M items/s
BM_memory_latency/262144          5 ns          5 ns  147450485   202.559M items/s
BM_memory_latency/524288          8 ns          8 ns   79725620    116.44M items/s
BM_memory_latency/1048576        11 ns         11 ns   64569151   88.5241M items/s
BM_memory_latency/2097152        12 ns         12 ns   59470080   79.2464M items/s
BM_memory_latency/4194304        13 ns         13 ns   55843009   72.0153M items/s
BM_memory_latency/8388608        28 ns         28 ns   25313659   33.6475M items/s
BM_memory_latency/16777216       57 ns         57 ns   12726900   16.8747M items/s
BM_memory_latency/33554432       72 ns         72 ns    9643933   13.2208M items/s
```
Author: Antoine Pitrou <antoine@python.org>

Closes apache#3225 from pitrou/ARROW-4079-machine-benchmark and squashes the following commits:

55f6de6 <Antoine Pitrou> ARROW-4079: Add machine benchmark