Dynamically resize stream info caches based on available memory #3569

Merged: 1 commit, Oct 7, 2022

shaan1337
Member

@shaan1337 shaan1337 commented Aug 18, 2022

Added: Dynamically resize stream info caches based on available memory

This PR adds support for caches that are automatically resized based on free memory. Right now only one cache is dynamic, the StreamInfo cache, but others can be added later on. A user can still make the cache static by specifying --stream-info-cache-capacity.

The rationale behind the changes is that a static cache which is scaled to free memory available on startup would not be able to efficiently deal with these scenarios:

  • growth of the database with time
  • heavy operations like index merges, scavenging etc. (an estimate could be done but it would waste memory during non-peak hours)
  • launching multiple db instances on a single machine (not recommended)
  • growth in the number of clients connected
  • spike in memory usage of other processes on the system

Memory allocation ratio

The StreamInfo cache is a logical cache composed of the LastEventNumber and StreamMetadata caches. Currently 60% of the memory allocated to the logical cache is reserved for the LastEventNumber cache and 40% for the StreamMetadata cache (this split is based on empirical testing).

Cache resizer hierarchy

The cache resizers are modeled as a tree structure. When resizing caches, we specify the capacity that we want to allocate to the root of the tree. The children (and sub-children) then derive their capacities from their weights and the amount allocated to them by their parent.
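
A minimal sketch of how such a weighted tree could distribute a root capacity, written in Python for illustration only. The `CacheResizer` class, its fields, and the use of integer floor division are assumptions, not the PR's actual implementation; only the 60/40 weights and the cache names come from the description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CacheResizer:
    """Hypothetical resizer node: a leaf cache or a logical (composite) cache."""
    name: str
    weight: int = 100
    children: List["CacheResizer"] = field(default_factory=list)
    capacity_bytes: int = 0

    def resize(self, capacity_bytes: int) -> None:
        # Take the capacity handed down by the parent, then split it
        # among the children in proportion to their weights.
        self.capacity_bytes = capacity_bytes
        total_weight = sum(child.weight for child in self.children)
        for child in self.children:
            # Floor division is an assumption about how fractions are handled.
            child.resize(capacity_bytes * child.weight // total_weight)


# Tree shape taken from the description: cache -> StreamInfo -> (60% / 40%).
last_event_number = CacheResizer("LastEventNumber", weight=60)
metadata = CacheResizer("Metadata", weight=40)
stream_info = CacheResizer("StreamInfo", children=[last_event_number, metadata])
root = CacheResizer("cache", children=[stream_info])

root.resize(3_387_220_824)
print(last_event_number.capacity_bytes)  # 2032332494
print(metadata.capacity_bytes)           # 1354888329
```

With a root capacity of 3387220824 bytes, the two leaf capacities come out equal to the `capacityBytes` values shown in the sample stats further down.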

Resizing algorithm

The resizing algorithm is as follows:

  • Every 15 seconds:
    • first check if the amount of memory available for caching has changed by more than 200 MB since the last resize. If this condition is true, continue to the next step, otherwise do nothing.
    • check if there's less than 25% or 6 GiB of free memory. If this condition is true, resize the caches, otherwise go to the next step. The total free memory takes these factors into consideration:
      • free system memory
      • free memory on the .NET heap
      • memory freed by the caches but not yet garbage collected
    • check if 10 minutes have passed since the last resize. If this condition is true, resize the caches, otherwise do nothing.
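
The decision steps above can be sketched as a single predicate that would be evaluated every 15 seconds. This is an illustrative Python sketch, not the PR's code: the function and constant names are hypothetical, "200 MB" is assumed to mean 200 MiB, and the low-memory check is assumed to fire when free memory is below either 25% of total memory or 6 GiB.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

CHECK_INTERVAL_SECONDS = 15          # how often should_resize would be evaluated
MIN_CHANGE_BYTES = 200 * MIB         # assumption: "200 MB" read as 200 MiB
LOW_FREE_PERCENT = 25
LOW_FREE_BYTES = 6 * GIB
PERIODIC_RESIZE_SECONDS = 10 * 60


def total_free_memory(free_system: int, free_heap: int, pending_gc: int) -> int:
    """Sum the three factors listed above (all values in bytes)."""
    return free_system + free_heap + pending_gc


def should_resize(available_now: int,
                  available_at_last_resize: int,
                  free_bytes: int,
                  total_bytes: int,
                  seconds_since_last_resize: float) -> bool:
    # Step 1: ignore small changes in the memory available for caching.
    if abs(available_now - available_at_last_resize) <= MIN_CHANGE_BYTES:
        return False
    # Step 2: resize immediately when free memory is low.
    below_percent = free_bytes * 100 < total_bytes * LOW_FREE_PERCENT
    below_absolute = free_bytes < LOW_FREE_BYTES
    if below_percent or below_absolute:
        return True
    # Step 3: otherwise resize only once the 10-minute period has elapsed.
    return seconds_since_last_resize >= PERIODIC_RESIZE_SECONDS


free = total_free_memory(3 * GIB, 512 * MIB, 512 * MIB)   # 4 GiB in total
print(should_resize(8 * GIB, 10 * GIB, free, 64 * GIB, 30))       # True: low on memory
print(should_resize(8 * GIB, 10 * GIB, 40 * GIB, 64 * GIB, 30))   # False: plenty free, too soon
```

Keeping the change threshold as the first check means the cheaper comparison short-circuits most of the 15-second ticks when nothing much has changed.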

The 200 MB change threshold ensures that:

  • We don't resize too often if not much has changed.

Resizing the caches every 10 minutes ensures that:

  • the caches can grow again when more memory is available
  • we don't abruptly decrease the cache sizes by a large value (unless there's an abrupt free memory increase within the past 10 minutes)

Checking every 15 seconds ensures that:

  • we can quickly act if we exceed the 25% / 6 GiB free memory threshold before it's too late and we run out of memory

New stats

The following stats were added under .stats.es:

  "cache": {
    "StreamInfo": {
      "LastEventNumber": {
        "name": "LastEventNumber",
        "count": 266339,
        "sizeBytes": 45833588,
        "capacityBytes": 2032332494,
        "utilizationPercent": 2.2552209412245907
      },
      "Metadata": {
        "name": "Metadata",
        "count": 133169,
        "sizeBytes": 34591868,
        "capacityBytes": 1354888329,
        "utilizationPercent": 2.553115799995942
      },
      "name": "StreamInfo",
      "count": 399508,
      "sizeBytes": 80425456,
      "capacityBytes": 3387220824,
      "utilizationPercent": 2.3743788840145603
    },
    "name": "cache",
    "count": 399508,
    "sizeBytes": 80425456,
    "capacityBytes": 3387220824,
    "utilizationPercent": 2.3743788840145603
  }

.stats.sys.totalMem - total system memory

.stats.proc.gc.fragmentation - what percent of the heap is fragmented
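
As a reading aid for the cache stats, the sketch below recomputes the parent-level figures from the two leaf entries in the sample. It is an assumption that `utilizationPercent` is simply `sizeBytes / capacityBytes * 100` and that a parent's `count` and `sizeBytes` are the sums of its children's; the helper name and the zero-capacity guard are hypothetical.

```python
def utilization_percent(size_bytes: int, capacity_bytes: int) -> float:
    # Guard against an unallocated cache (hypothetical edge case).
    if capacity_bytes == 0:
        return 0.0
    return 100.0 * size_bytes / capacity_bytes


# Leaf values copied from the sample stats above.
leaves = {
    "LastEventNumber": {"count": 266339, "sizeBytes": 45833588, "capacityBytes": 2032332494},
    "Metadata": {"count": 133169, "sizeBytes": 34591868, "capacityBytes": 1354888329},
}

parent_count = sum(leaf["count"] for leaf in leaves.values())
parent_size = sum(leaf["sizeBytes"] for leaf in leaves.values())
parent_capacity = 3387220824  # taken from the sample, not derived from the leaves

print(parent_count)  # 399508
print(parent_size)   # 80425456
print(utilization_percent(parent_size, parent_capacity))
```

The parent capacity is taken as given because it is the value handed down from the root, so it need not equal the exact sum of the leaf capacities.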

TODOs in later PRs

  • Convert the chunks cache to a dynamic cache (e.g. if ChunksCacheSize is set to -1)
  • Add a StreamInfoCacheSize parameter (measured in bytes) which deprecates StreamInfoCacheCapacity (breaking change)

@shaan1337 shaan1337 force-pushed the dynamic-cache-sizing branch 15 times, most recently from 2b70ba5 to bd84af8 Compare August 25, 2022 07:52
@shaan1337 shaan1337 force-pushed the dynamic-cache-sizing branch 5 times, most recently from 5c8323e to b75cbc6 Compare August 30, 2022 06:47
@shaan1337 shaan1337 changed the title Dynamic resizing of stream info cache [wip] Dynamically resize stream info cache based on available memory Aug 30, 2022
@shaan1337 shaan1337 marked this pull request as ready for review August 30, 2022 11:20
@shaan1337
Member Author

Based on tests with a large instance, it seems that the size calculation is ~25% lower than the actual size. I'll update the PR to make the calculation more precise (LRU cache overhead is not properly accounted for).

@timothycoleman
Contributor

i pushed a bunch of trivial adjustments to a branch, dynamic-cache-sizeing-tim-suggestions. consider these suggestions; you can merge/squash in whichever ones you like

@timothycoleman

This comment was marked as resolved.

@shaan1337
Member Author

hey @timothycoleman thanks for the feedback! will go through it asap.

i've pushed a couple more commits for more precise calculation and also some fixes (where memory wasn't being properly released)

@shaan1337 shaan1337 force-pushed the dynamic-cache-sizing branch 2 times, most recently from 433b1b1 to ae47617 Compare September 8, 2022 12:37
@shaan1337 shaan1337 changed the title Dynamically resize stream info cache based on available memory Dynamically resize stream info caches based on available memory Sep 23, 2022
hayley-jean
hayley-jean previously approved these changes Oct 6, 2022
Member

@hayley-jean hayley-jean left a comment

Will this need to be rebased due to the dotnet6 upgrade?

Co-Authored-By:  Timothy Coleman <timothy.coleman@gmail.com>
@timothycoleman timothycoleman merged commit 3b1bdd7 into master Oct 7, 2022
@timothycoleman timothycoleman deleted the dynamic-cache-sizing branch October 7, 2022 15:44
3 participants