Some C++ collections micromenchmark for vectors and lists.
C++ C Other
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
Source
.gitignore
LICENSE
data.xlsx
readme.md
result-maps.png
results-graph.png
results2-table.png

readme.md

A C++ Collections Benchmark

Also a cross-platform CAtlPlex-like allocator.

The allocator is compatible with node-based STL collections (linked lists, RB trees, hash maps). For vectors, it'll revert to malloc.

If you only interested in the allocator not the benchmarks, grab the header files from the Source/PlexAlloc directory, that’s all you need. To use, #include "PlexAlloc/Allocator.hpp"

The current version only works with C++/17, however I don't expect much problems porting to C++/11.

Vectors and Lists

Inspired by this.

This benchmark tests performance of various collection insert and iterate operations.

Very limited scenario, I only ran each test once. I was using i5-4460 desktop with 16GB RAM. The Linux results are from WSL running on top of the same Windows, uname -r says 4.4.0-43-Microsoft, release says Ubuntu 16.04.2 LTS.

Windows build is made by Visual Studio 2015 update 3. Linux build by clang 4.0 with libc++ 4.0.1.

Only tested Release x64 configuration, tried to keep build options more or less equivalent (Linux version compiles to SSE2 instruction set).

Following test cases are implemented here:

  1. std::vector<int>
  2. std::list<int>
  3. std::list<int> with custom CAtlPlex-like allocator
  4. CAtlArray<int> (windows-only)
  5. CAtlList<int> (windows-only)

And here’s the results.

Following observation can be made from the data.

On Windows, the performance improvement from my custom allocator is huge.

Microsoft’s STL and/or C++ compiler optimizes for very small collections, like 100 elements only. For larger collections, clang with libc++ is much faster. At least for this particular test case.

Even on Linux where STL is faster, the custom allocator I’ve implemented helped a lot, especially the total time (which includes memory allocation and de-allocation).

The Benchmark, Red-Black Trees and Hash Maps

Finally, I've added Red-Black maps and hash maps to the mix. All of them is with int keys and int values.

Here’s the results.

As you see, for red-black “maps”, runtime performance isn’t affected much. The reason is, tests insert random values, so iterating them sequentially still involves a pointer chase. However allocation/deallocation became faster.

OTOH, hash maps benefit from plex allocator on both platforms.