Commit fbd0263

netoptimizer authored and torvalds committed
slub: initial bulk free implementation
This implements the SLUB specific kmem_cache_free_bulk().  The SLUB
allocator now has both bulk alloc and free implemented.

Choose to re-enable local IRQs while calling the slowpath __slab_free().
In the worst case, where all objects hit the slowpath call, the
performance should still be faster than the fallback function
__kmem_cache_free_bulk(), because local_irq_{disable+enable} is very
fast (7 cycles), while the fallback invokes this_cpu_cmpxchg() which is
slightly slower (9 cycles).  Nitpicking, this should be faster for
N >= 4, due to the entry cost of local_irq_{disable+enable}.

Do notice that the save+restore variant is very expensive; this is key
to why this optimization works.

CPU: i7-4790K CPU @ 4.00GHz
 * local_irq_{disable,enable}:  7 cycles(tsc) - 1.821 ns
 * local_irq_{save,restore}  : 37 cycles(tsc) - 9.443 ns

Measurements on CPU i7-4790K @ 4.00GHz
Baseline normal fastpath (alloc+free cost): 43 cycles(tsc) 10.834 ns

Bulk -  fallback                   -  this-patch
  1 -  58 cycles(tsc) 14.542 ns  -  43 cycles(tsc) 10.811 ns  improved 25.9%
  2 -  50 cycles(tsc) 12.659 ns  -  27 cycles(tsc)  6.867 ns  improved 46.0%
  3 -  48 cycles(tsc) 12.168 ns  -  21 cycles(tsc)  5.496 ns  improved 56.2%
  4 -  47 cycles(tsc) 11.987 ns  -  24 cycles(tsc)  6.038 ns  improved 48.9%
  8 -  46 cycles(tsc) 11.518 ns  -  17 cycles(tsc)  4.280 ns  improved 63.0%
 16 -  45 cycles(tsc) 11.366 ns  -  17 cycles(tsc)  4.483 ns  improved 62.2%
 30 -  45 cycles(tsc) 11.433 ns  -  18 cycles(tsc)  4.531 ns  improved 60.0%
 32 -  75 cycles(tsc) 18.983 ns  -  58 cycles(tsc) 14.586 ns  improved 22.7%
 34 -  71 cycles(tsc) 17.940 ns  -  53 cycles(tsc) 13.391 ns  improved 25.4%
 48 -  80 cycles(tsc) 20.077 ns  -  65 cycles(tsc) 16.268 ns  improved 18.8%
 64 -  71 cycles(tsc) 17.799 ns  -  53 cycles(tsc) 13.440 ns  improved 25.4%
128 -  91 cycles(tsc) 22.980 ns  -  79 cycles(tsc) 19.899 ns  improved 13.2%
158 - 100 cycles(tsc) 25.241 ns  -  90 cycles(tsc) 22.732 ns  improved 10.0%
250 - 102 cycles(tsc) 25.583 ns  -  95 cycles(tsc) 23.916 ns  improved  6.9%

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
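For comparison, the generic fallback __kmem_cache_free_bulk() mentioned above simply frees one object at a time, so every object pays the normal kmem_cache_free() fastpath cost, including its this_cpu_cmpxchg().  The loop below is a sketch of that fallback as introduced with the bulk API infrastructure; the exact placement and wording may differ from the tree.

/* Sketch of the generic bulk-free fallback: each iteration takes the
 * regular per-object free fastpath (one this_cpu_cmpxchg() per object). */
void __kmem_cache_free_bulk(struct kmem_cache *s, size_t nr, void **p)
{
        size_t i;

        for (i = 0; i < nr; i++)
                kmem_cache_free(s, p[i]);
}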
1 parent ebe909e commit fbd0263

File tree

1 file changed, +33 −1 lines changed


mm/slub.c

Lines changed: 33 additions & 1 deletion
@@ -2753,7 +2753,39 @@ EXPORT_SYMBOL(kmem_cache_free);
 /* Note that interrupts must be enabled when calling this function. */
 void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
 {
-        __kmem_cache_free_bulk(s, size, p);
+        struct kmem_cache_cpu *c;
+        struct page *page;
+        int i;
+
+        /* Debugging fallback to generic bulk */
+        if (kmem_cache_debug(s))
+                return __kmem_cache_free_bulk(s, size, p);
+
+        local_irq_disable();
+        c = this_cpu_ptr(s->cpu_slab);
+
+        for (i = 0; i < size; i++) {
+                void *object = p[i];
+
+                BUG_ON(!object);
+                page = virt_to_head_page(object);
+                BUG_ON(s != page->slab_cache);  /* Check if valid slab page */
+
+                if (c->page == page) {
+                        /* Fastpath: local CPU free */
+                        set_freepointer(s, object, c->freelist);
+                        c->freelist = object;
+                } else {
+                        c->tid = next_tid(c->tid);
+                        local_irq_enable();
+                        /* Slowpath: overhead locked cmpxchg_double_slab */
+                        __slab_free(s, page, object, _RET_IP_);
+                        local_irq_disable();
+                        c = this_cpu_ptr(s->cpu_slab);
+                }
+        }
+        c->tid = next_tid(c->tid);
+        local_irq_enable();
 }
 EXPORT_SYMBOL(kmem_cache_free_bulk);
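As the comment in the hunk notes, interrupts must be enabled when calling kmem_cache_free_bulk().  Below is a minimal, hypothetical caller sketch pairing it with kmem_cache_alloc_bulk(); the cache my_cache and the batch size are illustrative only, and the bool return convention of kmem_cache_alloc_bulk() reflects the bulk API as it stood around this commit.

#include <linux/slab.h>

/* Hypothetical caller (not part of this commit): allocate a batch of
 * objects from an existing cache, use them, then return them in one
 * bulk call.  "my_cache" is assumed to come from kmem_cache_create(). */
static int example_bulk_cycle(struct kmem_cache *my_cache)
{
        void *objs[16];

        /* Bulk alloc; in this era the function returns false on failure. */
        if (!kmem_cache_alloc_bulk(my_cache, GFP_KERNEL, ARRAY_SIZE(objs), objs))
                return -ENOMEM;

        /* ... use the objects ... */

        /* Must be called with local IRQs enabled. */
        kmem_cache_free_bulk(my_cache, ARRAY_SIZE(objs), objs);
        return 0;
}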
