
Documents esl_alloc benchmarking (in

cryptogenomicon committed Feb 12, 2017
1 parent ce5a0cf commit d2b3aed6cb6bef090f57ea3ecb55d9f955a095e7
Showing with 80 additions and 0 deletions.

is working fine, but the unit test fails because `esl_alloc_aligned()`
doesn't work on some system (perhaps because of the unspeakable things
it does).
#### there is no realloc, by design

Aligned realloc() is a problem in general. There's no standard aligned
realloc counterpart to POSIX posix_memalign(), nor to C11 aligned_alloc(),
nor to Intel _mm_malloc().

If we try to write our own realloc, we have a problem: the new
unaligned pointer returned by the system realloc() could, in principle,
need a different offset $r$ to reach the alignment boundary, while
realloc() copies our data to the same offset it had in the old block.
So the system realloc() is not guaranteed to leave our data correctly
aligned. To be sure, we would have to copy our data *again* into the
correct alignment, and to do that we would need to know the size of the
data, not just the pointer to it.
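
To make the offset problem concrete, here is a minimal C sketch of an over-allocating fallback. It assumes the caller keeps both the raw and the aligned pointer in a hypothetical `abuf` struct; this is not how `esl_alloc_aligned()` is implemented, and `abuf`, `abuf_alloc()`, `abuf_realloc()`, `abuf_free()` and `ALIGN` are made-up names for illustration. The `memmove()` in `abuf_realloc()` is the extra copy described above, and it can only be written because the caller passes in the old size `old_n`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ALIGN 64   /* hypothetical alignment in bytes; must be a power of 2 */

/* Hypothetical fallback aligned buffer: over-allocate with malloc(),
 * then hand out raw + r, where r is the offset to the next ALIGN boundary. */
typedef struct {
  char *raw;    /* what malloc()/realloc()/free() see */
  char *data;   /* raw + r, aligned to ALIGN */
} abuf;

/* Offset r (0..ALIGN-1) from p up to the next ALIGN boundary. */
static size_t align_offset(const char *p) {
  uintptr_t a = (uintptr_t) p;
  return (size_t) ((ALIGN - (a % ALIGN)) % ALIGN);
}

static int abuf_alloc(abuf *b, size_t n) {
  b->raw = malloc(n + ALIGN - 1);
  if (b->raw == NULL) return -1;
  b->data = b->raw + align_offset(b->raw);
  assert((uintptr_t) b->data % ALIGN == 0);
  return 0;
}

/* Resize while preserving contents. old_n is required: realloc() keeps
 * our bytes at the OLD offset, which may differ from the NEW offset. */
static int abuf_realloc(abuf *b, size_t old_n, size_t new_n) {
  size_t old_r = (size_t) (b->data - b->raw);
  char  *q     = realloc(b->raw, new_n + ALIGN - 1);
  if (q == NULL) return -1;                 /* old block is still valid */
  size_t new_r = align_offset(q);
  size_t keep  = old_n < new_n ? old_n : new_n;
  if (new_r != old_r)                       /* the second copy */
    memmove(q + new_r, q + old_r, keep);
  b->raw  = q;
  b->data = q + new_r;
  assert((uintptr_t) b->data % ALIGN == 0);
  return 0;
}

static void abuf_free(abuf *b) {
  free(b->raw);              /* free the raw pointer, never b->data */
  b->raw = b->data = NULL;
}
```

The bookkeeping is the point: a realloc-style API that takes only a pointer has nowhere to get `old_n` from, which is why the notes rule it out.
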

Instead, at least for now, we will avoid reallocating aligned memory
altogether; we will free() and do a fresh allocation. Thus we can only
provide `_Reinit()`-style functions, which do not guarantee preservation
of data, not `_Resize()`-style functions, which assume that the data
will be preserved.
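
A minimal sketch of what a `_Reinit()`-style function might look like under this rule, assuming posix_memalign() is available and a 64-byte alignment; `vecbuf`, `vecbuf_Reinit()` and `VEC_ALIGN` are hypothetical names, not Easel or HMMER API. Growing the buffer frees the old block and allocates a fresh one, so no data is copied and none is preserved.

```c
#define _POSIX_C_SOURCE 200112L   /* for posix_memalign() */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define VEC_ALIGN 64   /* hypothetical SIMD alignment, in bytes */

/* Hypothetical vector workspace with aligned storage. */
typedef struct {
  float  *v;        /* VEC_ALIGN-aligned; contents are NOT preserved by Reinit */
  size_t  nalloc;   /* capacity, in floats */
} vecbuf;

/* Reinit-style: ensure room for n floats; old contents may be lost.
 * There is no realloc(): we free() the old block and allocate fresh. */
static int vecbuf_Reinit(vecbuf *b, size_t n) {
  void *p = NULL;
  if (n <= b->nalloc) return 0;              /* big enough already: reuse */
  free(b->v);                                /* discard old data, no copy */
  b->v      = NULL;
  b->nalloc = 0;
  if (posix_memalign(&p, VEC_ALIGN, n * sizeof(float)) != 0) return -1;
  b->v      = p;
  b->nalloc = n;
  assert((uintptr_t) b->v % VEC_ALIGN == 0);
  return 0;
}
```

Memory from posix_memalign() is released with plain free(), so the Reinit path needs no special deallocator; the trade-off is that callers must not count on the contents surviving a call that grows the buffer.
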
### benchmarking

Real (wall-clock) time in seconds for `-L 100 -N 10000`, i.e. $10^6$
reallocations in total, so you can also read each entry as $\mu$sec per
reallocation.
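
As a worked example of that conversion, take the Mac OS/X malloc/realloc entry at M=500000 from the table below, alongside malloc/free/malloc at the same size:

$$
\frac{10.480\ \text{s}}{10^6\ \text{reallocations}} \approx 10.5\ \mu\text{s per reallocation},
\qquad
\frac{0.482\ \text{s}}{10^6\ \text{reallocations}} \approx 0.48\ \mu\text{s per reallocation},
$$

so on that machine, at that size, realloc() costs roughly 22 times as much per call as a fresh free()/malloc().
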

**on Mac OS/X:** timings are essentially the same w/ gcc vs. clang:
_[11 Feb 17 on wumpus. 2.5 GHz Core i7, Mac OS/X 10.10.5 Yosemite, gcc 4.9.3, gcc -O3]_

| time (sec) | M=5000 | M=500000 | M=5000000 |
|---|---:|---:|---:|
| malloc/realloc | 0.159 | **10.480** | **5.009** |
| malloc/free/malloc | 0.136 | 0.482 | 0.897 |
| alloc_aligned_fallback | 0.139 | 0.641 | **26.394** |
| posix_memalign | 0.189 | 0.481 | 0.908 |

**on Linux:**
_[11 Feb 17 on ody eddyfs01. icc -O3]_

| time (sec) | M=5000 | M=500000 | M=5000000 |
|---|---:|---:|---:|
| malloc/realloc | 0.115 | **0.662** | **1.094** |
| malloc/free/malloc | 0.100 | 0.252 | 1.868 |
| alloc_aligned_fallback | 0.106 | 0.249 | 1.877 |
| posix_memalign | 0.206 | 0.366 | 1.944 |

#### dependence on allocation size isn't obvious

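Timings go up and down as the maximum allocation size M changes; they
are not monotonic in M. Maybe the system allocator treats different
sizes with different strategies (for example, glibc's malloc() hands
requests above a size threshold, 128 KiB by default, to mmap() instead
of carving them from the heap).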

#### realloc copies data, so it can be slow

In general, if you don't need the data preserved, allocating fresh
memory (with free()/malloc()) may be faster than realloc(), because
realloc() copies the data whenever it has to move the allocation. On
Mac OS/X the penalty is large (10.480 vs. 0.482 at M=500000). However,
note one case on Linux where realloc() is faster (M=5000000: 1.094
vs. 1.868 for malloc/free/malloc), perhaps because it's smart enough to
recognize cases where it doesn't need to expand an allocation.
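
For anyone wanting to reproduce this kind of comparison, here is a rough C sketch of a timing loop. It is not the Easel benchmark driver: the `bench()` function, the `strategy` names, the fixed xorshift seed, the uniform 1..M size distribution, and the 64-byte alignment are all assumptions made for illustration. It times `n` successive (re)allocations with realloc(), with free()/malloc(), and with free()/posix_memalign().

```c
#define _POSIX_C_SOURCE 200112L   /* clock_gettime(), posix_memalign() */
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical micro-benchmark: n (re)allocations of random sizes in 1..M bytes. */

static unsigned long long rng = 88172645463325252ULL;  /* xorshift64 state */
static size_t rand_size(size_t M) {
  rng ^= rng << 13;
  rng ^= rng >> 7;
  rng ^= rng << 17;
  return (size_t) (rng % M) + 1;
}

static double now_sec(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (double) ts.tv_sec + 1e-9 * (double) ts.tv_nsec;
}

enum strategy { REALLOC, FREE_MALLOC, MEMALIGN };

/* Returns elapsed wall-clock seconds, or -1.0 on allocation failure. */
static double bench(enum strategy s, long n, size_t M) {
  char  *p = NULL;
  double t0;
  assert(M > 0);
  t0 = now_sec();
  for (long i = 0; i < n; i++) {
    size_t sz = rand_size(M);
    if (s == REALLOC) {                     /* may copy old contents */
      char *q = realloc(p, sz);
      if (q == NULL) { free(p); return -1.0; }
      p = q;
    } else if (s == FREE_MALLOC) {          /* fresh block, no copy */
      free(p);
      p = malloc(sz);
      if (p == NULL) return -1.0;
    } else {                                /* fresh 64-byte-aligned block */
      void *q = NULL;
      free(p);
      p = NULL;
      if (posix_memalign(&q, 64, sz) != 0) return -1.0;
      p = q;
    }
    p[0] = (char) (i & 0x7f);               /* touch one byte of each block */
  }
  free(p);
  return now_sec() - t0;
}
```

Calling `bench(REALLOC, 1000000, 500000)` performs $10^6$ reallocations, the same count as the tables above, though absolute numbers will depend on the size distribution and the machine.
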

#### easel's aligned alloc can be slow on OS/X

I ran the -M5000000 case (the 26.394 sec alloc_aligned_fallback entry
above) under Instruments. It spends all its time in free(), inside
madvise(). Not sure why.

#### conclusion

* posix_memalign() is usually available and performs well.
* we'll design HMMER vector code to `_Reinit()` with fresh
  allocations, rather than using reallocation. This may even
  speed things up a bit.
* the `madvise()` stall with the easel fallback code is puzzling
and worrying, though it only happens on MacOS, not Linux.
