[scudo] Use a tryLock in secondary release to OS #132827

Merged: 1 commit into llvm:main on Mar 25, 2025

Conversation

cferris1000 (Contributor)

In the caching part of the secondary path, when about to release memory to the OS, we always block while acquiring the lock. However, if multiple threads attempt this at the same time, the threads that waited will likely find nothing left to release once they acquire the lock. Change the algorithm so a thread skips the release if another release is already in progress.

Also, hoist the lock out of the releaseOlderThan function. In the store path we can then use tryLock and skip the release if another thread is already releasing, while a forced release call still waits for the lock, which guarantees that all entries are released by the time that call returns.
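As a quick illustration of the intended locking behavior, here is a minimal standalone sketch of the try-lock-or-skip pattern. It uses std::mutex and std::atomic in place of scudo's HybridMutex, ScopedLock, and atomic_uptr, and the class and method names are made up for the example; the real change is in the diff below.

#include <atomic>
#include <cstdint>
#include <mutex>

// Illustrative only: std::mutex / std::atomic stand in for scudo's
// HybridMutex and atomic_uptr, and the member names are simplified.
class CacheSketch {
public:
  // Store path: opportunistic release. If another thread already holds the
  // lock it is almost certainly releasing the same entries, so skip instead
  // of blocking, and count the skip so it shows up in the stats.
  void maybeReleaseOlderThan(uint64_t TimeNs, int32_t IntervalMs) {
    if (IntervalMs < 0)
      return;
    if (Mutex.try_lock()) {
      releaseOlderThan(TimeNs - static_cast<uint64_t>(IntervalMs) * 1000000);
      Mutex.unlock();
    } else {
      ReleaseToOsSkips.fetch_add(1, std::memory_order_relaxed);
    }
  }

  // Forced path: a full release must never be skipped, so always wait for
  // the lock; when this returns, every eligible entry has been released.
  void releaseToOS() {
    std::lock_guard<std::mutex> L(Mutex);
    releaseOlderThan(UINT64_MAX);
  }

private:
  // Caller must hold Mutex (REQUIRES(Mutex) in the real code).
  void releaseOlderThan(uint64_t Time) { (void)Time; /* walk the LRU list */ }

  std::mutex Mutex;
  std::atomic<size_t> ReleaseToOsSkips{0};
};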

llvmbot commented Mar 24, 2025

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: Christopher Ferris (cferris1000)


Full diff: https://github.com/llvm/llvm-project/pull/132827.diff

1 File Affected:

  • (modified) compiler-rt/lib/scudo/standalone/secondary.h (+24-10)
diff --git a/compiler-rt/lib/scudo/standalone/secondary.h b/compiler-rt/lib/scudo/standalone/secondary.h
index f3f91c40e7a03..286e5d332f57c 100644
--- a/compiler-rt/lib/scudo/standalone/secondary.h
+++ b/compiler-rt/lib/scudo/standalone/secondary.h
@@ -206,11 +206,13 @@ class MapAllocatorCache {
     computePercentage(SuccessfulRetrieves, CallsToRetrieve, &Integral,
                       &Fractional);
     const s32 Interval = atomic_load_relaxed(&ReleaseToOsIntervalMs);
-    Str->append(
-        "Stats: MapAllocatorCache: EntriesCount: %zu, "
-        "MaxEntriesCount: %u, MaxEntrySize: %zu, ReleaseToOsIntervalMs = %d\n",
-        LRUEntries.size(), atomic_load_relaxed(&MaxEntriesCount),
-        atomic_load_relaxed(&MaxEntrySize), Interval >= 0 ? Interval : -1);
+    Str->append("Stats: MapAllocatorCache: EntriesCount: %zu, "
+                "MaxEntriesCount: %u, MaxEntrySize: %zu, ReleaseToOsSkips: "
+                "%zu, ReleaseToOsIntervalMs = %d\n",
+                LRUEntries.size(), atomic_load_relaxed(&MaxEntriesCount),
+                atomic_load_relaxed(&MaxEntrySize),
+                atomic_load_relaxed(&ReleaseToOsSkips),
+                Interval >= 0 ? Interval : -1);
     Str->append("Stats: CacheRetrievalStats: SuccessRate: %u/%u "
                 "(%zu.%02zu%%)\n",
                 SuccessfulRetrieves, CallsToRetrieve, Integral, Fractional);
@@ -343,8 +345,15 @@ class MapAllocatorCache {
       unmapCallBack(EvictMemMap);
 
     if (Interval >= 0) {
-      // TODO: Add ReleaseToOS logic to LRU algorithm
-      releaseOlderThan(Time - static_cast<u64>(Interval) * 1000000);
+      // It is very likely that multiple threads trying to do a release at the
+      // same time will not actually release any extra elements. Therefore,
+      // let any other thread continue, skipping the release.
+      if (Mutex.tryLock()) {
+        // TODO: Add ReleaseToOS logic to LRU algorithm
+        releaseOlderThan(Time - static_cast<u64>(Interval) * 1000000);
+        Mutex.unlock();
+      } else
+        atomic_fetch_add(&ReleaseToOsSkips, 1U, memory_order_relaxed);
     }
   }
 
@@ -488,7 +497,12 @@ class MapAllocatorCache {
     return true;
   }
 
-  void releaseToOS() { releaseOlderThan(UINT64_MAX); }
+  void releaseToOS() EXCLUDES(Mutex) {
+    // Since this is a request to release everything, always wait for the
+    // lock so that we guarantee all entries are released after this call.
+    ScopedLock L(Mutex);
+    releaseOlderThan(UINT64_MAX);
+  }
 
   void disableMemoryTagging() EXCLUDES(Mutex) {
     ScopedLock L(Mutex);
@@ -554,8 +568,7 @@ class MapAllocatorCache {
     Entry.Time = 0;
   }
 
-  void releaseOlderThan(u64 Time) EXCLUDES(Mutex) {
-    ScopedLock L(Mutex);
+  void releaseOlderThan(u64 Time) REQUIRES(Mutex) {
     if (!LRUEntries.size() || OldestTime == 0 || OldestTime > Time)
       return;
     OldestTime = 0;
@@ -573,6 +586,7 @@ class MapAllocatorCache {
   atomic_s32 ReleaseToOsIntervalMs = {};
   u32 CallsToRetrieve GUARDED_BY(Mutex) = 0;
   u32 SuccessfulRetrieves GUARDED_BY(Mutex) = 0;
+  atomic_uptr ReleaseToOsSkips = {};
 
   CachedBlock Entries[Config::getEntriesArraySize()] GUARDED_BY(Mutex) = {};
   NonZeroLengthArray<CachedBlock, Config::getQuarantineSize()>

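For readers less familiar with Clang's thread-safety analysis, the EXCLUDES(Mutex) to REQUIRES(Mutex) change on releaseOlderThan is what shifts the locking responsibility to the callers. Below is a rough sketch of the two annotations written directly against Clang's capability attributes (scudo wraps these in its own macros); the SketchMutex/SketchCache names are made up, and this assumes compilation with -Wthread-safety.

// Sketch only; attribute spellings follow Clang's thread-safety documentation.
#define CAPABILITY(x) __attribute__((capability(x)))
#define ACQUIRE(...) __attribute__((acquire_capability(__VA_ARGS__)))
#define RELEASE(...) __attribute__((release_capability(__VA_ARGS__)))
#define REQUIRES(...) __attribute__((requires_capability(__VA_ARGS__)))
#define EXCLUDES(...) __attribute__((locks_excluded(__VA_ARGS__)))

class CAPABILITY("mutex") SketchMutex {
public:
  void lock() ACQUIRE();
  void unlock() RELEASE();
};

class SketchCache {
  SketchMutex M;

  // Old shape: the function took the lock itself, so callers must not hold M.
  void releaseOlderThanLocked(unsigned long long Time) EXCLUDES(M);

  // New shape: locking is hoisted to the callers (tryLock in the store path,
  // ScopedLock in releaseToOS), so the analysis checks that M is held here.
  void releaseOlderThan(unsigned long long Time) REQUIRES(M);
};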
cferris1000 (Contributor, Author)

I verified that system_server on Android devices does hit this in regular operation, so this seems worthwhile.

ajordanr-google (Contributor) left a comment

This all looks good, fairly clear, and I see the value of this! Also neat that it's represented in the stats. This seems like a strict improvement over the previous setup.

cferris1000 merged commit 134cb88 into llvm:main on Mar 25, 2025
14 checks passed
cferris1000 deleted the secondary_trylock branch on March 25, 2025 at 23:32