ARROW-4079: [C++] Add machine benchmark #3225

Closed
wants to merge 1 commit into from

pitrou
Member

@pitrou pitrou commented Dec 19, 2018

Right now there is a single memory latency benchmark.

Its output looks like this, showing the different cache levels up to main memory
(this is on a CPU with 16 MB L3 cache):

```
------------------------------------------------------------------
Benchmark                           Time           CPU Iterations
------------------------------------------------------------------
BM_memory_latency/2048              2 ns          2 ns  406878405   548.706M items/s
BM_memory_latency/4096              2 ns          2 ns  395414303    557.74M items/s
BM_memory_latency/8192              2 ns          2 ns  394141916   560.264M items/s
BM_memory_latency/16384             2 ns          2 ns  401410292   535.202M items/s
BM_memory_latency/32768             2 ns          2 ns  381828811   525.377M items/s
BM_memory_latency/65536             4 ns          4 ns  189027575   262.929M items/s
BM_memory_latency/131072            5 ns          5 ns  150798287    209.01M items/s
BM_memory_latency/262144            5 ns          5 ns  129287045   185.606M items/s
BM_memory_latency/524288            7 ns          7 ns   96543517   132.663M items/s
BM_memory_latency/1048576          11 ns         11 ns   66380535   89.0397M items/s
BM_memory_latency/2097152          12 ns         12 ns   55003164   76.6384M items/s
BM_memory_latency/4194304          13 ns         13 ns   51559443   70.9488M items/s
BM_memory_latency/8388608          28 ns         28 ns   25813875   33.6881M items/s
BM_memory_latency/16777216         66 ns         66 ns   10463216   14.4577M items/s
BM_memory_latency/33554432         90 ns         90 ns    7743594   10.5434M items/s
```
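
For readers who want to see where the staircase in these numbers comes from, here is a minimal standalone sketch of a pointer-chasing measurement. It is not the code in this PR: it uses `std::chrono` instead of Google Benchmark, and the names `Chase` and `NanosPerLoad` are hypothetical. The only part taken from the diff is the dependent load `index = path[index]`.

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// Follow `path` for `steps` dependent loads, starting at slot 0, and return
// the final index. Each load address depends on the previous load's result,
// so the loads cannot be overlapped and the loop is bound by access latency.
int32_t Chase(const std::vector<int32_t>& path, int64_t steps) {
  int32_t index = 0;
  for (int64_t i = 0; i < steps; ++i) {
    index = path[index];
  }
  return index;
}

// Average wall-clock nanoseconds per dependent load (hypothetical helper).
double NanosPerLoad(const std::vector<int32_t>& path, int64_t steps) {
  auto start = std::chrono::steady_clock::now();
  // Store the result in a volatile so the compiler must keep the loop.
  volatile int32_t sink = Chase(path, steps);
  auto stop = std::chrono::steady_clock::now();
  (void)sink;
  return std::chrono::duration<double, std::nano>(stop - start).count() /
         static_cast<double>(steps);
}
```

Running `NanosPerLoad` over tables of increasing size should, on most machines, show the time per load stepping up each time the table outgrows a cache level, though the exact figures depend on the hardware.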

@fsaintjacques
Contributor

Microbenchmarks got refused by microbenchmarks.

indices[i] = i;
}
std::shuffle(indices.begin(), indices.end(), gen);
std::vector<int32_t> path(size, -999999);
Contributor

`indices` is already a permutation; can you return it directly?

Member Author

No, because a permutation is not necessarily a single cycle, so it's not a full path. For example, if it's {1,0,2,3}, then the path is 0->1->0, i.e. we're not spanning the whole memory area.
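
To illustrate the distinction, here is a sketch of one way to turn a shuffled visiting order into a table that forms a single cycle over all slots. It assumes the approach of linking each slot to its successor in the shuffled order; the function names and the `seed` parameter are hypothetical, and the PR's actual construction may differ.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: build a pointer-chasing table `path` such that
// repeatedly applying index = path[index] visits every slot exactly once
// before returning to the starting slot.
std::vector<int32_t> MakeSingleCyclePath(int32_t size, uint32_t seed) {
  std::vector<int32_t> order(size);
  std::iota(order.begin(), order.end(), 0);
  std::mt19937 gen(seed);
  std::shuffle(order.begin(), order.end(), gen);

  std::vector<int32_t> path(size, -1);
  for (int32_t i = 0; i < size; ++i) {
    // Link each slot to the next one in the shuffled order, wrapping
    // around at the end so the chain closes into one cycle.
    path[order[i]] = order[(i + 1) % size];
  }
  return path;
}

// Number of steps needed to return to slot 0 when following `path`.
int32_t CycleLength(const std::vector<int32_t>& path) {
  int32_t steps = 0;
  int32_t index = 0;
  do {
    index = path[index];
    ++steps;
  } while (index != 0);
  return steps;
}
```

With this construction `CycleLength` equals the table size, whereas the raw permutation {1,0,2,3} from the example above gives a cycle of length 2 starting from slot 0.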

index = path[index];
}
benchmark::DoNotOptimize(total);
state.SetItemsProcessed(state.iterations());
Contributor

I'd report the number of bytes with `SetBytesProcessed(iterations * size * niters / 4)`. We could then compare this with the maximum memory bandwidth of the motherboard/CPU.

Member Author

That's not really interesting, though, because we're latency-limited, not bandwidth-limited. The number to watch is the number of nanoseconds per iteration.
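
A rough back-of-the-envelope sketch of why a bytes/s figure says little here: with one dependent load in flight at a time, throughput is simply bytes per access divided by latency. The helper name is hypothetical, and the 64-byte cache line size is an assumption about the hardware, not something stated in this thread.

```cpp
// Bytes moved per second when each dependent access takes `ns_per_access`
// nanoseconds and transfers `bytes_per_access` bytes (hypothetical helper).
double BytesPerSecond(double ns_per_access, double bytes_per_access) {
  return bytes_per_access / (ns_per_access * 1e-9);
}

// Using the 90 ns figure from the largest size in the table above:
//   BytesPerSecond(90.0, 4.0)  is about 44 MB/s  (one int32_t per load)
//   BytesPerSecond(90.0, 64.0) is about 711 MB/s (assuming a 64-byte line)
```

Either figure is far below the multi-GB/s bandwidth typical of main memory, so the derived number mostly restates the latency.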

@wesm wesm force-pushed the ARROW-4079-machine-benchmark branch from 3daa3af to 55f6de6 Compare December 20, 2018 01:12
@wesm
Member

wesm commented Dec 20, 2018

Rebased

Member

@wesm wesm left a comment

+1. This is cool

Results on my mobile Xeon (3.7 GHz):

```
$ ./release/arrow-machine-benchmark
2018-12-20 14:01:17
Running ./release/arrow-machine-benchmark
Run on (8 X 3700 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 8192K (x1)
------------------------------------------------------------------
Benchmark                           Time           CPU Iterations
------------------------------------------------------------------
BM_memory_latency/2048              2 ns          2 ns  456013614    611.84M items/s
BM_memory_latency/4096              2 ns          2 ns  439562057   603.102M items/s
BM_memory_latency/8192              2 ns          2 ns  450940722   608.893M items/s
BM_memory_latency/16384             2 ns          2 ns  428344241   616.558M items/s
BM_memory_latency/32768             2 ns          2 ns  443366563   600.546M items/s
BM_memory_latency/65536             3 ns          3 ns  233006238    321.66M items/s
BM_memory_latency/131072            3 ns          3 ns  213815059   310.851M items/s
BM_memory_latency/262144            5 ns          5 ns  147450485   202.559M items/s
BM_memory_latency/524288            8 ns          8 ns   79725620    116.44M items/s
BM_memory_latency/1048576          11 ns         11 ns   64569151   88.5241M items/s
BM_memory_latency/2097152          12 ns         12 ns   59470080   79.2464M items/s
BM_memory_latency/4194304          13 ns         13 ns   55843009   72.0153M items/s
BM_memory_latency/8388608          28 ns         28 ns   25313659   33.6475M items/s
BM_memory_latency/16777216         57 ns         57 ns   12726900   16.8747M items/s
BM_memory_latency/33554432         72 ns         72 ns    9643933   13.2208M items/s
```

@wesm wesm closed this in 398466e Dec 20, 2018
@pitrou pitrou deleted the ARROW-4079-machine-benchmark branch December 20, 2018 20:05
romainfrancois pushed a commit to romainfrancois/arrow that referenced this pull request Jan 3, 2019
Author: Antoine Pitrou <antoine@python.org>

Closes apache#3225 from pitrou/ARROW-4079-machine-benchmark and squashes the following commits:

55f6de6 <Antoine Pitrou> ARROW-4079:  Add machine benchmark