Memory management
This document describes BOINC's mechanisms related to RAM and swap space.
The design is based on the following (simplified) model of virtual memory:
- Each process P has a 'virtual address space'. This can increase over time, e.g. as the program calls malloc().
- At any point, some pages of P's virtual address space are mapped to physical memory (RAM).
- If P references a page that's not currently mapped (a 'page fault'), the OS allocates a page of RAM and maps it.
- If no pages are free, the OS finds the least-recently-used (LRU) page (possibly belonging to another process P2). It writes that page to an area of disk called 'swap space', unmaps it from P2, and maps it in P.
The set of pages mapped in a process P is called its 'working set'. This is a function of the memory accesses both of P and of the other processes. This 'working set size' (WSS) can change up and down over time. For example, if the program scans a large array the WSS might go way up, then gradually go back down as the RAM gets claimed by other processes.
If processes reference lots of memory, the system can enter a state where all RAM is used, and a large fraction of memory references result in page faults. Since disk I/O is far slower than RAM access, this causes all processes to run slowly; this is called 'thrashing'. Some OSs deal with it by swapping processes entirely to disk and suspending them.
BOINC applications run at the lowest CPU priority. However, they can impact user-visible performance because of their memory usage:
- When the system is in use (i.e. when there's mouse/keyboard input), the memory usage of running BOINC apps can cause thrashing.
- If several user apps are open and the system is idle for a long period, the memory usage of BOINC apps may cause the user apps to be paged out. When the user eventually returns, it may take a while (10-20 seconds) for the user apps to get paged back in.
These effects can be minimized by limiting the memory usage of BOINC apps. However, this can reduce the CPU time available to BOINC, and on some systems BOINC would do no work at all. In general, the more computing BOINC does, the greater its potential impact on user-visible performance. One goal of our design is to provide user preferences (GlobalPrefs) that control this tradeoff (see below).
We want to maximize the CPU efficiency of BOINC apps, i.e. to ensure that they don't thrash. On a multiprocessor, it may sometimes be more efficient (in terms of throughput) to not use all available CPUs.
Some applications can trade off memory usage for speed (e.g. by using bigger hash tables), but beyond some point increased memory usage causes thrashing and the advantage is negated. Such applications should be made aware of the current memory situation, so that they can adapt their usage accordingly.
BOINC has the following preferences:
ram_max_used_busy_frac, ram_max_used_idle_frac
These limit the total WSS of running apps when the computer is in use and idle, respectively.
vm_max_used_frac
This limits the total virtual sizes of app processes (both running and suspended). It's expressed as a fraction of the swap space size. The goal is to prevent BOINC jobs from exceeding swap space size.
Notes:
- This assumes that the entire virtual space may be swapped. This is not the case: for example, the part of the space containing the program is backed by the program file.
- On Windows and Linux, swap space is a dedicated part of disk, and has a fixed size. On Mac it's dynamically allocated and can potentially use all disk space not being used for files; BOINC currently assumes this. However, there are conflicting statements online saying that there's an additional limit in the range of 50 or 100 GB.
On startup, the client measures:
- The amount of RAM
- The amount of swap space (see above).
It measures the following periodically (every 10 seconds):
- For each running BOINC app: the working set size (for compound apps, this includes all processes).
To accommodate spikes in memory usage, BOINC computes the 'smoothed working set size' (SWSS) as
SWSS = .5*SWSS + .5*WSS
where WSS is the new value.
- For each BOINC app: the virtual space size.
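The smoothing formula above is an exponential moving average with weight 0.5. A minimal sketch of how successive samples damp a spike is shown here; the function name `update_swss` and the choice to seed SWSS with the first sample are assumptions made for illustration, not taken from the BOINC client.

```python
def update_swss(swss: float, wss: float) -> float:
    """Return the new smoothed working set size after one WSS sample.

    Implements SWSS = .5*SWSS + .5*WSS from the text.
    """
    return 0.5 * swss + 0.5 * wss


# Hypothetical WSS samples in bytes, one per measurement period.
samples = [100e6, 500e6, 100e6, 100e6]

swss = samples[0]  # assumption: seed the average with the first sample
history = []
for wss in samples[1:]:
    swss = update_swss(swss, wss)
    history.append(swss)
```

In this run the 500 MB spike raises the smoothed value only to 300 MB, and the value then decays by half of the remaining gap on each later sample.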
What should the client use as WSS for a job J that hasn't run yet?
We don't want to start J if it's going to exceed RAM limits.
We could use the workunit rsc_memory_bound,
but most projects don't set that accurately.
We could use the max WSS of all jobs that used the same app version as J. But jobs for a given app version may vary widely over time. So instead we take the max WSS of currently running jobs that use the same app version as J.
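The estimate described above can be sketched as follows. The `Job` record and its field names are hypothetical, and returning `None` when no job with the same app version is running is an assumption: the text does not say what the client uses in that case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    name: str
    app_version: str   # identifier of the app version (hypothetical field)
    running: bool
    wss: float         # measured working set size in bytes


def estimate_wss(job: Job, jobs: List[Job]) -> Optional[float]:
    """Estimate WSS for a job that hasn't run yet.

    Uses the max WSS of currently running jobs with the same app version.
    Returns None if there are no such jobs (assumption, see lead-in).
    """
    peers = [
        j.wss
        for j in jobs
        if j.running and j.app_version == job.app_version and j is not job
    ]
    return max(peers) if peers else None
```

Jobs that are suspended or that use a different app version are ignored, so the estimate tracks what the app version is doing right now and not its historical peak.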
The scheduler is divided into two parts:
- Make a list of tasks to run, ordered by 'importance' (deadline-critical ones first, then high-debt).
- Enforcement: go through the run list, starting tasks in order, and preempting other tasks as needed. Don't preempt a task that hasn't checkpointed in favor of a non-deadline-critical task.
This will be modified as follows:
- In building the run list, compute the available RAM, based on preferences. In building the list, keep track of RAM used so far. Skip any task that would cause this to exceed available RAM.
- Enforcement: compute the available RAM, based on preferences. In running tasks, keep track of RAM used so far. Skip any task that would cause the limit to be exceeded. Preempt tasks that haven't checkpointed if they would cause the limit to be exceeded.
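The RAM-aware run-list step can be sketched as below. This is a simplified illustration, not the client's code: the `Task` record, the numeric `importance` score, and the function names are hypothetical, and preemption and checkpoint handling are left out. The sample data mirrors the 2-CPU scenario discussed in this document, assuming the large job fits within the limit when it runs alone.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    importance: float  # hypothetical score: higher means run first
    est_wss: float     # estimated working set size in bytes


def available_ram(ram_size: float, user_active: bool,
                  busy_frac: float, idle_frac: float) -> float:
    """RAM that BOINC may use, from the busy/idle preference fractions."""
    return ram_size * (busy_frac if user_active else idle_frac)


def build_run_list(tasks: List[Task], ncpus: int, ram_limit: float) -> List[Task]:
    """Pick tasks in importance order, skipping any that would exceed RAM."""
    run_list: List[Task] = []
    ram_used = 0.0
    for task in sorted(tasks, key=lambda t: t.importance, reverse=True):
        if len(run_list) >= ncpus:
            break
        if ram_used + task.est_wss > ram_limit:
            continue  # skip: this task would push usage past the limit
        run_list.append(task)
        ram_used += task.est_wss
    return run_list


tasks = [
    Task("X", importance=3, est_wss=100e6),  # small job, close deadline
    Task("Y", importance=2, est_wss=1e9),    # 1 GB job
    Task("A", importance=1, est_wss=100e6),  # other small jobs
    Task("B", importance=1, est_wss=100e6),
]
chosen = [t.name for t in build_run_list(tasks, ncpus=2, ram_limit=1e9)]
```

With X present, Y is skipped and a less important small job runs next to X; once X is removed from the list, Y is selected and runs alone.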
In addition, we will add a new 'memory usage check' that runs every 30 seconds or so. This will compute the working sets of all running tasks. If the total is too large, it will trigger CPU scheduler enforcement (see above). If an individual task's working set is too large for it to ever run, it is aborted (see below).
Note: the above policies may cause some tasks to not get run for long periods. For example, suppose that
- A 2-CPU machine has 1 GB RAM,
- There's a small-RAM job X with a close deadline
- There's a 1 GB job Y
- There are several small-RAM jobs.
In this case, Y won't run until X has finished, even if it is more deserving (in terms of debt) than the other small jobs. However, Y won't starve indefinitely. Eventually it will run into deadline trouble, and will run ahead of everything else.
A task is aborted if, at any point, its working set size is larger than
(RAM size)*max(ram_max_used_busy_frac, ram_max_used_idle_frac)
since this means it can't be scheduled.
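A small sketch of the abort rule follows. The function names are hypothetical; the parameters correspond to the RAM size and the two RAM preferences.

```python
def abort_threshold(ram_size: float, busy_frac: float, idle_frac: float) -> float:
    """Largest working set a task may have and still be schedulable."""
    return ram_size * max(busy_frac, idle_frac)


def should_abort(wss: float, ram_size: float,
                 busy_frac: float, idle_frac: float) -> bool:
    """True if the task's working set exceeds the limit in both states."""
    return wss > abort_threshold(ram_size, busy_frac, idle_frac)
```

For example, on a machine with 4 GB of RAM and fractions 0.5 (busy) and 0.75 (idle), the threshold is 3 GB: a task with a 3.5 GB working set is aborted, while a 2.5 GB task is kept and can run when the computer is idle.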
The BOINC_STATUS structure contains:
double working_set_size; // app's current WS (non-smoothed)
double max_working_set_size; // app will be aborted if WS exceeds this
So the app might size arrays to fit in the difference between these.
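As an illustration of how an app might use these two values, the sketch below sizes a table from the remaining headroom. It is written in Python for brevity, although a real app would read the fields from BOINC_STATUS in its own language. The function name, the `bytes_per_entry` parameter, and the `safety` factor are all hypothetical.

```python
def table_entries_that_fit(working_set_size: float,
                           max_working_set_size: float,
                           bytes_per_entry: int,
                           safety: float = 0.5) -> int:
    """Number of table entries that fit in a fraction of the headroom.

    headroom = max_working_set_size - working_set_size; only `safety`
    of it is used, to leave room for other growth (assumption).
    """
    headroom = max(0.0, max_working_set_size - working_set_size)
    return int(headroom * safety // bytes_per_entry)


# Hypothetical values: 200 MB in use, 1 GB limit, 16-byte entries.
entries = table_entries_that_fit(200e6, 1e9, bytes_per_entry=16)
```

If the current working set already exceeds the limit, the headroom is clamped to zero and no entries are allocated.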
Each workunit includes:
- rsc_memory_bound: an estimate of the app's largest working set size. Note: most projects supply inaccurate (usually too small) values.
A result is sent to a client only if
rsc_memory_bound < (RAM size)*max(ram_max_used_busy_frac, ram_max_used_idle_frac)
In other words, a job is sent only if the client can run it at least some of the time.
Possible ideas:
- Measure non-BOINC RAM usage (WSS). A possible policy: if non-BOINC RAM usage is X, BOINC can use (total RAM - X).
- Make the round-robin simulator aware of memory issues.
- Measure page-fault rates for each process, and suspend BOINC apps as needed to limit this. Problem: this info doesn't seem to be available on Windows; the reported page fault rate includes faults that don't read from disk.