libc - Add poor man's cache coloring optimization to nmalloc module.
* A series of large allocations in excess of 32KB will be offset by 4KB from each other. This fixes performance issues on Sandy Bridge and later CPUs related to large matrix operations. It eats an extra 4KB of VM for each such allocation but does not eat any additional real memory.
* Greatly improves large FP matrix benchmarks. Real-world effects are more questionable.
* Sandy Bridge and later CPUs use a virtually indexed, physically tagged L1 cache, and tend to be sensitive to substantially different memory addresses winding up on the same cache line. Matrix operations (primarily benchmarks) can cause these sorts of effects.

Reported-by: alexh
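
The offsetting described above can be sketched roughly as follows. This is a minimal illustration, not the actual nmalloc code: the names `sketch_color_layout`, `struct sketch_layout`, and `sketch_color`, the 4096-byte page size, and the choice to alternate between a 0 and 4KB offset are all assumptions made for the example. Only the 32KB threshold and the one extra 4KB page of VM come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE   ((size_t)4096)       /* assumed page size */
#define SKETCH_COLOR_LIMIT ((size_t)32 * 1024)  /* threshold from the commit message */

/* Hypothetical result type: how much VM to reserve and where the
 * caller-visible pointer would start inside that reservation. */
struct sketch_layout {
    size_t map_size;  /* bytes of virtual memory to reserve */
    size_t offset;    /* bytes to skip from the start of the mapping */
};

/* Hypothetical color toggle. Not thread-safe; a real allocator would
 * need a lock or per-thread state. */
static unsigned sketch_color;

static struct sketch_layout
sketch_color_layout(size_t size)
{
    struct sketch_layout l;

    assert(size != 0);

    /* Round the request up to a whole number of pages. */
    l.map_size = (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
    l.offset = 0;

    if (size > SKETCH_COLOR_LIMIT) {
        /* Reserve one extra page of VM. Whichever page ends up outside
         * the caller's range is never touched, so it costs address
         * space but no real memory. */
        l.map_size += SKETCH_PAGE_SIZE;

        /* Alternate between offset 0 and one page so that consecutive
         * large allocations start 4KB apart relative to their
         * page-aligned bases. */
        l.offset = (size_t)(sketch_color & 1) * SKETCH_PAGE_SIZE;
        sketch_color ^= 1;
    }
    return l;
}
```

Under these assumptions, an allocator would reserve `map_size` bytes and hand back the base address plus `offset`. It would also have to subtract the same offset again before unmapping, which the sketch leaves out.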