
Commit 810507f

Waiman-Long authored and Ingo Molnar committed
locking/lockdep: Reuse freed chain_hlocks entries
Once a lock class is zapped, all the lock chains that include the zapped class are essentially useless. The lock_chain structure itself can be reused, but not the corresponding chain_hlocks[] entries. Over time, we will run out of chain_hlocks entries while there are still plenty of other lockdep array entries available.

To fix this imbalance, we have to make chain_hlocks entries reusable just like the others. As the freed chain_hlocks entries are in blocks of various lengths, a simple bitmap like the one used in the other reusable lockdep arrays isn't applicable. Instead, the chain_hlocks entries are put into bucketed lists (MAX_CHAIN_BUCKETS) of chain blocks. Bucket 0 is the variable size bucket, which houses chain blocks of size larger than MAX_CHAIN_BUCKETS sorted in decreasing size order. Initially, the whole array is in one chain block (the primordial chain block) in bucket 0.

The minimum size of a chain block is 2 chain_hlocks entries. That will be the minimum allocation size. In other words, an allocation request for one chain_hlocks entry will cause a 2-entry block to be returned, and hence 1 entry will be wasted.

Allocation requests for the chain_hlocks are fulfilled first by looking for a chain block of matching size. If none is found, the first chain block from bucket[0] (the largest one) is split. That can fragment the hlock entries and reduce allocation efficiency if a chain block of size > MAX_CHAIN_BUCKETS is ever zapped and put back after the primordial chain block. So MAX_CHAIN_BUCKETS must be large enough that this seldom happens.

By reusing the chain_hlocks entries, we are able to handle workloads that add and zap a lot of lock classes without the risk of running out of chain_hlocks entries, as long as the total number of outstanding lock classes at any time remains within a reasonable limit.
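To make the allocation policy above concrete, here is a minimal user-space sketch in C. It is a toy model, not the kernel code: the 64-entry pool (`POOL_SIZE`), the helper names (`add_block`, `pop_head`, `alloc_hlocks`, `free_hlocks`, `init_pool`) and the side arrays holding the free lists are hypothetical simplifications, whereas the patch encodes its list links inside chain_hlocks[] itself. It keeps the same rules: requests are rounded up to 2 entries, an exact-size bucket is tried first, bucket 0 is kept sorted largest first and its head is split, and a larger fixed-size bucket is split only as a last resort.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy sizes; free lists live in side arrays, not in the pool. */
#define POOL_SIZE         64
#define MAX_CHAIN_BUCKETS 16
#define MAX_FREE          POOL_SIZE   /* capacity of each toy free list */

struct blk { int off, size; };

static struct blk buckets[MAX_CHAIN_BUCKETS][MAX_FREE];
static int nr_in_bucket[MAX_CHAIN_BUCKETS];
static int nr_free, nr_lost;

/* Sizes 2..MAX_CHAIN_BUCKETS map to bucket size-1; larger sizes share bucket 0. */
static int size_to_bucket(int size)
{
	return size > MAX_CHAIN_BUCKETS ? 0 : size - 1;
}

/* Put a block on its free list; a 1-entry block cannot hold a link and is leaked. */
static void add_block(int off, int size)
{
	int b = size_to_bucket(size), i;

	if (size < 2) {
		nr_lost += size;
		return;
	}
	assert(nr_in_bucket[b] < MAX_FREE);
	nr_free += size;
	i = nr_in_bucket[b];
	/* Bucket 0 is kept sorted largest first; fixed buckets just push on top. */
	while (b == 0 && i > 0 && buckets[0][i - 1].size <= size) {
		buckets[0][i] = buckets[0][i - 1];
		i--;
	}
	buckets[b][i] = (struct blk){ off, size };
	nr_in_bucket[b]++;
}

/* Remove the head of bucket b: top of a fixed stack, or the largest bucket-0 block. */
static struct blk pop_head(int b)
{
	struct blk head;

	if (b) {
		head = buckets[b][--nr_in_bucket[b]];
	} else {
		head = buckets[0][0];
		nr_in_bucket[0]--;
		memmove(&buckets[0][0], &buckets[0][1],
			(size_t)nr_in_bucket[0] * sizeof(head));
	}
	nr_free -= head.size;
	return head;
}

/* Exact-size bucket, then the largest bucket-0 block, then a larger fixed bucket. */
static int alloc_hlocks(int req)
{
	struct blk blk;
	int b, size;

	if (nr_free < req)
		return -1;
	if (req < 2)
		req = 2;                              /* minimum block size */

	b = size_to_bucket(req);
	if (b && nr_in_bucket[b])
		return pop_head(b).off;

	if (nr_in_bucket[0] && buckets[0][0].size >= req) {
		blk = pop_head(0);
		add_block(blk.off + req, blk.size - req); /* remainder goes back */
		return blk.off;
	}

	for (size = MAX_CHAIN_BUCKETS; size > req; size--) {   /* last resort */
		b = size_to_bucket(size);
		if (!nr_in_bucket[b])
			continue;
		blk = pop_head(b);
		add_block(blk.off + req, blk.size - req);
		return blk.off;
	}
	return -1;
}

static void free_hlocks(int off, int size)
{
	add_block(off, size < 2 ? 2 : size);
}

/* Empty all lists and make the whole pool one primordial block in bucket 0. */
static int init_pool(void)
{
	memset(nr_in_bucket, 0, sizeof(nr_in_bucket));
	nr_free = nr_lost = 0;
	add_block(0, POOL_SIZE);
	return nr_free;
}
```

For example, after `init_pool()`, `alloc_hlocks(3)` returns offset 0 by splitting the primordial block, and `alloc_hlocks(1)` is rounded up to a 2-entry block at offset 3. After `free_hlocks(0, 3)`, the next `alloc_hlocks(3)` gets the same 3 entries back from the size-3 bucket. A 1-entry remainder left over by a split is counted in `nr_lost`, much like nr_lost_chain_hlocks in the patch.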
Two new tracking counters, nr_free_chain_hlocks & nr_large_chain_blocks, are added to track the total number of chain_hlocks entries in the free bucketed lists and the number of large chain blocks in buckets[0] respectively. The nr_free_chain_hlocks replaces nr_chain_hlocks. The nr_large_chain_blocks counter enables us to see if we should increase the number of buckets (MAX_CHAIN_BUCKETS) available so as to avoid the fragmentation problem in bucket[0].

An internal nfsd test that ran for more than an hour and kept on loading and unloading kernel modules could cause the following message to be displayed.

  [ 4318.443670] BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!

The patched kernel was able to complete the test with a lot of free chain_hlocks entries to spare:

  # cat /proc/lockdep_stats
     :
   dependency chains:               18867 [max: 65536]
   dependency chain hlocks:         74926 [max: 327680]
   dependency chain hlocks lost:        0
     :
   zapped classes:                   1541
   zapped lock chains:              56765
   large chain blocks:                  1

After changing MAX_CHAIN_BUCKETS to 3 and adding a counter for the size of the largest chain block, the system still worked and we got the following lockdep_stats data:

   dependency chains:               18601 [max: 65536]
   dependency chain hlocks used:    73133 [max: 327680]
   dependency chain hlocks lost:        0
     :
   zapped classes:                   1541
   zapped lock chains:              56702
   large chain blocks:              45165
   large chain block size:          20165

By running the test again, I was indeed able to cause chain_hlocks entries to get lost:

   dependency chain hlocks used:    74806 [max: 327680]
   dependency chain hlocks lost:      575
     :
   large chain blocks:              48737
   large chain block size:              7

Due to the fragmentation, it is possible that the "MAX_LOCKDEP_CHAIN_HLOCKS too low!" error can happen even if a lot of chain_hlocks entries appear to be free. Fortunately, a MAX_CHAIN_BUCKETS value of 16 should be big enough that few variable sized chain blocks, other than the initial one, should ever be present in bucket 0.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200206152408.24165-7-longman@redhat.com
1 parent 797b82e commit 810507f


3 files changed: +255, -15 lines


kernel/locking/lockdep.c

Lines changed: 243 additions & 11 deletions
@@ -1071,13 +1071,15 @@ static inline void check_data_structures(void) { }
 
 #endif /* CONFIG_DEBUG_LOCKDEP */
 
+static void init_chain_block_buckets(void);
+
 /*
  * Initialize the lock_classes[] array elements, the free_lock_classes list
  * and also the delayed_free structure.
  */
 static void init_data_structures_once(void)
 {
-	static bool ds_initialized, rcu_head_initialized;
+	static bool __read_mostly ds_initialized, rcu_head_initialized;
 	int i;
 
 	if (likely(rcu_head_initialized))
@@ -1101,6 +1103,7 @@ static void init_data_structures_once(void)
 		INIT_LIST_HEAD(&lock_classes[i].locks_after);
 		INIT_LIST_HEAD(&lock_classes[i].locks_before);
 	}
+	init_chain_block_buckets();
 }
 
 static inline struct hlist_head *keyhashentry(const struct lock_class_key *key)
@@ -2627,7 +2630,233 @@ struct lock_chain lock_chains[MAX_LOCKDEP_CHAINS];
 static DECLARE_BITMAP(lock_chains_in_use, MAX_LOCKDEP_CHAINS);
 static u16 chain_hlocks[MAX_LOCKDEP_CHAIN_HLOCKS];
 unsigned long nr_zapped_lock_chains;
-unsigned int nr_chain_hlocks;
+unsigned int nr_free_chain_hlocks;	/* Free chain_hlocks in buckets */
+unsigned int nr_lost_chain_hlocks;	/* Lost chain_hlocks */
+unsigned int nr_large_chain_blocks;	/* size > MAX_CHAIN_BUCKETS */
+
+/*
+ * The first 2 chain_hlocks entries in the chain block in the bucket
+ * list contains the following meta data:
+ *
+ *   entry[0]:
+ *     Bit    15 - always set to 1 (it is not a class index)
+ *     Bits 0-14 - upper 15 bits of the next block index
+ *   entry[1]    - lower 16 bits of next block index
+ *
+ *   A next block index of all 1 bits means it is the end of the list.
+ *
+ * On the unsized bucket (bucket-0), the 3rd and 4th entries contain
+ * the chain block size:
+ *
+ *   entry[2] - upper 16 bits of the chain block size
+ *   entry[3] - lower 16 bits of the chain block size
+ */
+#define MAX_CHAIN_BUCKETS	16
+#define CHAIN_BLK_FLAG		(1U << 15)
+#define CHAIN_BLK_LIST_END	0xFFFFU
+
+static int chain_block_buckets[MAX_CHAIN_BUCKETS];
+
+static inline int size_to_bucket(int size)
+{
+	if (size > MAX_CHAIN_BUCKETS)
+		return 0;
+
+	return size - 1;
+}
+
+/*
+ * Iterate all the chain blocks in a bucket.
+ */
+#define for_each_chain_block(bucket, prev, curr)		\
+	for ((prev) = -1, (curr) = chain_block_buckets[bucket];	\
+	     (curr) >= 0;						\
+	     (prev) = (curr), (curr) = chain_block_next(curr))
+
+/*
+ * next block or -1
+ */
+static inline int chain_block_next(int offset)
+{
+	int next = chain_hlocks[offset];
+
+	WARN_ON_ONCE(!(next & CHAIN_BLK_FLAG));
+
+	if (next == CHAIN_BLK_LIST_END)
+		return -1;
+
+	next &= ~CHAIN_BLK_FLAG;
+	next <<= 16;
+	next |= chain_hlocks[offset + 1];
+
+	return next;
+}
+
+/*
+ * bucket-0 only
+ */
+static inline int chain_block_size(int offset)
+{
+	return (chain_hlocks[offset + 2] << 16) | chain_hlocks[offset + 3];
+}
+
+static inline void init_chain_block(int offset, int next, int bucket, int size)
+{
+	chain_hlocks[offset] = (next >> 16) | CHAIN_BLK_FLAG;
+	chain_hlocks[offset + 1] = (u16)next;
+
+	if (size && !bucket) {
+		chain_hlocks[offset + 2] = size >> 16;
+		chain_hlocks[offset + 3] = (u16)size;
+	}
+}
+
+static inline void add_chain_block(int offset, int size)
+{
+	int bucket = size_to_bucket(size);
+	int next = chain_block_buckets[bucket];
+	int prev, curr;
+
+	if (unlikely(size < 2)) {
+		/*
+		 * We can't store single entries on the freelist. Leak them.
+		 *
+		 * One possible way out would be to uniquely mark them, other
+		 * than with CHAIN_BLK_FLAG, such that we can recover them when
+		 * the block before it is re-added.
+		 */
+		if (size)
+			nr_lost_chain_hlocks++;
+		return;
+	}
+
+	nr_free_chain_hlocks += size;
+	if (!bucket) {
+		nr_large_chain_blocks++;
+
+		/*
+		 * Variable sized, sort large to small.
+		 */
+		for_each_chain_block(0, prev, curr) {
+			if (size >= chain_block_size(curr))
+				break;
+		}
+		init_chain_block(offset, curr, 0, size);
+		if (prev < 0)
+			chain_block_buckets[0] = offset;
+		else
+			init_chain_block(prev, offset, 0, 0);
+		return;
+	}
+	/*
+	 * Fixed size, add to head.
+	 */
+	init_chain_block(offset, next, bucket, size);
+	chain_block_buckets[bucket] = offset;
+}
+
+/*
+ * Only the first block in the list can be deleted.
+ *
+ * For the variable size bucket[0], the first block (the largest one) is
+ * returned, broken up and put back into the pool. So if a chain block of
+ * length > MAX_CHAIN_BUCKETS is ever used and zapped, it will just be
+ * queued up after the primordial chain block and never be used until the
+ * hlock entries in the primordial chain block is almost used up. That
+ * causes fragmentation and reduce allocation efficiency. That can be
+ * monitored by looking at the "large chain blocks" number in lockdep_stats.
+ */
+static inline void del_chain_block(int bucket, int size, int next)
+{
+	nr_free_chain_hlocks -= size;
+	chain_block_buckets[bucket] = next;
+
+	if (!bucket)
+		nr_large_chain_blocks--;
+}
+
+static void init_chain_block_buckets(void)
+{
+	int i;
+
+	for (i = 0; i < MAX_CHAIN_BUCKETS; i++)
+		chain_block_buckets[i] = -1;
+
+	add_chain_block(0, ARRAY_SIZE(chain_hlocks));
+}
+
+/*
+ * Return offset of a chain block of the right size or -1 if not found.
+ *
+ * Fairly simple worst-fit allocator with the addition of a number of size
+ * specific free lists.
+ */
+static int alloc_chain_hlocks(int req)
+{
+	int bucket, curr, size;
+
+	/*
+	 * We rely on the MSB to act as an escape bit to denote freelist
+	 * pointers. Make sure this bit isn't set in 'normal' class_idx usage.
+	 */
+	BUILD_BUG_ON((MAX_LOCKDEP_KEYS-1) & CHAIN_BLK_FLAG);
+
+	init_data_structures_once();
+
+	if (nr_free_chain_hlocks < req)
+		return -1;
+
+	/*
+	 * We require a minimum of 2 (u16) entries to encode a freelist
+	 * 'pointer'.
+	 */
+	req = max(req, 2);
+	bucket = size_to_bucket(req);
+	curr = chain_block_buckets[bucket];
+
+	if (bucket) {
+		if (curr >= 0) {
+			del_chain_block(bucket, req, chain_block_next(curr));
+			return curr;
+		}
+		/* Try bucket 0 */
+		curr = chain_block_buckets[0];
+	}
+
+	/*
+	 * The variable sized freelist is sorted by size; the first entry is
+	 * the largest. Use it if it fits.
+	 */
+	if (curr >= 0) {
+		size = chain_block_size(curr);
+		if (likely(size >= req)) {
+			del_chain_block(0, size, chain_block_next(curr));
+			add_chain_block(curr + req, size - req);
+			return curr;
+		}
+	}
+
+	/*
+	 * Last resort, split a block in a larger sized bucket.
+	 */
+	for (size = MAX_CHAIN_BUCKETS; size > req; size--) {
+		bucket = size_to_bucket(size);
+		curr = chain_block_buckets[bucket];
+		if (curr < 0)
+			continue;
+
+		del_chain_block(bucket, size, chain_block_next(curr));
+		add_chain_block(curr + req, size - req);
+		return curr;
+	}
+
+	return -1;
+}
+
+static inline void free_chain_hlocks(int base, int size)
+{
+	add_chain_block(base, max(size, 2));
+}
 
 struct lock_class *lock_chain_get_class(struct lock_chain *chain, int i)
 {
@@ -2828,15 +3057,8 @@ static inline int add_chain_cache(struct task_struct *curr,
 	BUILD_BUG_ON((1UL << 6) <= ARRAY_SIZE(curr->held_locks));
 	BUILD_BUG_ON((1UL << 8*sizeof(chain_hlocks[0])) <= ARRAY_SIZE(lock_classes));
 
-	if (likely(nr_chain_hlocks + chain->depth <= MAX_LOCKDEP_CHAIN_HLOCKS)) {
-		chain->base = nr_chain_hlocks;
-		for (j = 0; j < chain->depth - 1; j++, i++) {
-			int lock_id = curr->held_locks[i].class_idx;
-			chain_hlocks[chain->base + j] = lock_id;
-		}
-		chain_hlocks[chain->base + j] = class - lock_classes;
-		nr_chain_hlocks += chain->depth;
-	} else {
+	j = alloc_chain_hlocks(chain->depth);
+	if (j < 0) {
 		if (!debug_locks_off_graph_unlock())
 			return 0;
 
@@ -2845,6 +3067,13 @@ static inline int add_chain_cache(struct task_struct *curr,
 		return 0;
 	}
 
+	chain->base = j;
+	for (j = 0; j < chain->depth - 1; j++, i++) {
+		int lock_id = curr->held_locks[i].class_idx;
+
+		chain_hlocks[chain->base + j] = lock_id;
+	}
+	chain_hlocks[chain->base + j] = class - lock_classes;
 	hlist_add_head_rcu(&chain->entry, hash_head);
 	debug_atomic_inc(chain_lookup_misses);
 	inc_chains(chain->irq_context);
@@ -2991,6 +3220,8 @@ static inline int validate_chain(struct task_struct *curr,
 {
 	return 1;
 }
+
+static void init_chain_block_buckets(void) { }
 #endif /* CONFIG_PROVE_LOCKING */
 
 /*
@@ -4788,6 +5019,7 @@ static void remove_class_from_lock_chain(struct pending_free *pf,
 	return;
 
 free_lock_chain:
+	free_chain_hlocks(chain->base, chain->depth);
 	/* Overwrite the chain key for concurrent RCU readers. */
 	WRITE_ONCE(chain->chain_key, INITIAL_CHAIN_KEY);
 	dec_chains(chain->irq_context);
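The free-list links in lockdep.c are stored in-band: the first two u16 entries of a free block hold the next block's index with bit 15 set, and bucket-0 blocks also keep their size in entries 2 and 3. The following user-space C sketch reproduces that encoding under stated assumptions: `pool[]` is a hypothetical 64-entry stand-in for chain_hlocks[], the helper names (`encode_next`, `decode_next`, `encode_size`, `decode_size`) are invented for illustration, and `uint16_t` replaces the kernel's `u16`. It relies on the same property the patch enforces with BUILD_BUG_ON: a real class index never has bit 15 set, so a flagged entry can only be a list header. Unlike init_chain_block(), which right-shifts a negative index, the sketch writes the end-of-list marker explicitly to avoid implementation-defined shift behavior.

```c
#include <assert.h>
#include <stdint.h>

#define CHAIN_BLK_FLAG     (1U << 15)   /* same values as the patch */
#define CHAIN_BLK_LIST_END 0xFFFFU

/* Hypothetical stand-in for chain_hlocks[]. */
static uint16_t pool[64];

/* entry[0] = flag | upper 15 bits of next, entry[1] = lower 16 bits. */
static void encode_next(int offset, int next)
{
	if (next < 0) {                 /* end of list: both entries all ones */
		pool[offset] = pool[offset + 1] = CHAIN_BLK_LIST_END;
		return;
	}
	pool[offset] = (uint16_t)((next >> 16) | CHAIN_BLK_FLAG);
	pool[offset + 1] = (uint16_t)next;
}

/* Returns the next block offset, or -1 at the end of the list. */
static int decode_next(int offset)
{
	int next = pool[offset];

	assert(next & CHAIN_BLK_FLAG);  /* a list header, never a class index */
	if (next == CHAIN_BLK_LIST_END)
		return -1;
	next &= ~CHAIN_BLK_FLAG;
	next <<= 16;
	next |= pool[offset + 1];
	return next;
}

/* Bucket-0 blocks also store their size in entries 2 and 3. */
static void encode_size(int offset, int size)
{
	pool[offset + 2] = (uint16_t)(size >> 16);
	pool[offset + 3] = (uint16_t)size;
}

static int decode_size(int offset)
{
	return ((int)pool[offset + 2] << 16) | pool[offset + 3];
}
```

For instance, `encode_next(4, 70000)` stores 0x8001 and 0x1170, and `decode_next(4)` reassembles 70000. Splitting the index across two entries is what lets a 16-bit array address more than 65536 slots, at the cost of the 2-entry minimum block size.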

kernel/locking/lockdep_internals.h

Lines changed: 3 additions & 1 deletion
@@ -140,7 +140,9 @@ extern unsigned long nr_stack_trace_entries;
 extern unsigned int nr_hardirq_chains;
 extern unsigned int nr_softirq_chains;
 extern unsigned int nr_process_chains;
-extern unsigned int nr_chain_hlocks;
+extern unsigned int nr_free_chain_hlocks;
+extern unsigned int nr_lost_chain_hlocks;
+extern unsigned int nr_large_chain_blocks;
 
 extern unsigned int max_lockdep_depth;
 extern unsigned int max_bfs_queue_depth;

kernel/locking/lockdep_proc.c

Lines changed: 9 additions & 3 deletions
@@ -137,7 +137,7 @@ static int lc_show(struct seq_file *m, void *v)
 	};
 
 	if (v == SEQ_START_TOKEN) {
-		if (nr_chain_hlocks > MAX_LOCKDEP_CHAIN_HLOCKS)
+		if (!nr_free_chain_hlocks)
 			seq_printf(m, "(buggered) ");
 		seq_printf(m, "all lock chains:\n");
 		return 0;
@@ -278,8 +278,12 @@ static int lockdep_stats_show(struct seq_file *m, void *v)
 #ifdef CONFIG_PROVE_LOCKING
 	seq_printf(m, " dependency chains: %11lu [max: %lu]\n",
 			lock_chain_count(), MAX_LOCKDEP_CHAINS);
-	seq_printf(m, " dependency chain hlocks: %11u [max: %lu]\n",
-			nr_chain_hlocks, MAX_LOCKDEP_CHAIN_HLOCKS);
+	seq_printf(m, " dependency chain hlocks used: %11lu [max: %lu]\n",
+			MAX_LOCKDEP_CHAIN_HLOCKS -
+			(nr_free_chain_hlocks + nr_lost_chain_hlocks),
+			MAX_LOCKDEP_CHAIN_HLOCKS);
+	seq_printf(m, " dependency chain hlocks lost: %11u\n",
+			nr_lost_chain_hlocks);
 #endif
 
 #ifdef CONFIG_TRACE_IRQFLAGS
@@ -352,6 +356,8 @@ static int lockdep_stats_show(struct seq_file *m, void *v)
 #ifdef CONFIG_PROVE_LOCKING
 	seq_printf(m, " zapped lock chains: %11lu\n",
 			nr_zapped_lock_chains);
+	seq_printf(m, " large chain blocks: %11u\n",
+			nr_large_chain_blocks);
 #endif
 	return 0;
 }
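The reworked lockdep_stats output no longer tracks a used count directly; it derives one, treating every entry that is neither on a free list nor lost as used. Below is a minimal C sketch of that arithmetic, assuming MAX_LOCKDEP_CHAIN_HLOCKS is 327680 as in the `[max: 327680]` figure of the stats output quoted in the commit message. The helper names `chain_hlocks_used` and `chain_hlocks_buggered` are hypothetical. The free count in the second check is back-calculated from the reported 74806 used and 575 lost entries; the commit does not report it.

```c
#include <assert.h>

/* Taken from the "[max: 327680]" figure in the quoted stats output. */
#define MAX_LOCKDEP_CHAIN_HLOCKS 327680UL

/* Entries neither on a free list nor leaked are counted as used. */
static unsigned long chain_hlocks_used(unsigned int nr_free, unsigned int nr_lost)
{
	return MAX_LOCKDEP_CHAIN_HLOCKS - ((unsigned long)nr_free + nr_lost);
}

/* Mirrors the new lc_show() test: "(buggered)" once no free entries remain. */
static int chain_hlocks_buggered(unsigned int nr_free)
{
	return !nr_free;
}
```

Right after init_chain_block_buckets() the whole array sits in the primordial block, so `chain_hlocks_used(327680, 0)` is 0. With the second-run figures, `chain_hlocks_used(252299, 575)` gives 74806. Lost entries are subtracted separately so that leaked single entries do not show up as in use.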
