# Virtual Memory: Just an Illusion

Hung-Wei Tseng

### Let's dig into this code

```
int main(int argc, char *argv[])
    int i,j;
    double **a;
    double sum=0, average;
    int dim=32768;
    if(argc < 2)
        fprintf(stderr, "Usage: %s dimension\n", argv[0]);
        exit(1);
    dim = atoi(argv[1]);
    a = (double **)malloc(sizeof(double *)*dim);
    for(i = 0 ; i < dim; i++)
        a[i] = (double *)malloc(sizeof(double)*dim);
    for(i = 0 ; i < dim; i++)
        for(j = 0 ; j < dim; j++)
            a[i][j] = rand();
    for(i = 0 ; i < dim; i++)
        for(j = 0 ; j < dim; j++)
            sum+=a[i][j];
    average = sum/(dim*dim);
    fprintf(stderr, "average: %lf\n", average);
    for(i = 0 ; i < dim; i++)
        free(a[i]);
    free(a);
    return 0;
```

### Let's dig into this code

```
#define GNU SOURCE
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <sched.h>
#include <sys/syscall.h>
#include <time.h>
double a;
int main(int argc, char *argv[])
    int i, number_of_total_processes=4;
    number of total processes = atoi(argv[1]);
    // Create processes
    for(i = 0; i< number_of_total_processes-1 && fork(); i++);</pre>
    // Generate rand seed
    srand((int)time(NULL)+(int)getpid());
    a = rand();
    fprintf(stderr, "\nProcess %d. Value of a is %lf and address of a is %p\n",getpid(), a, &a);
    sleep(10);
    fprintf(stderr, "\nProcess %d. Value of a is %lf and address of a is %p\n",getpid(), a, &a);
    return 0;
```

### **Outline**

- Virtual memory
- Architectural support for virtual memory

# Virtual Memory

# Virtual memory



### Virtual memory

- An abstraction of memory space available for programs/ software/programmer
- Programs execute using virtual memory address
- The operating system and hardware work together to handle the mapping between virtual memory addresses and real/ physical memory addresses
- Virtual memory organizes memory locations into "pages"

Processor Core

Registers

The virtual memory abstraction

load 0x0009

Page table

MaiPage#hory

(DRAM)





### Why Virtual memory?

- Allowing multiple applications to share physical main memory
  - Memory protection/isolation among programs/processes is automatically achieved
- Allowing applications to work even the installed physical memory or available physical memory is smaller than the working set of the application
  - Programmer does not need to worry about the physical memory capacity of different machines — make compiled program compatible
  - Multiple programs can work concurrently even through their total memory demand is larger than the installed physical memory

### Mechanism: Demand paging

- Treating physical main memory as a "cache" of virtual memory
- The block size is the "page size"
- The page table is the "tag array"
- It's a "fully-associate" cache a virtual page can go anywhere in the physical main memory
- The storage serves as the lower level memory hierarchy for physical main memory

### **Takeaways: Virtual Memory**

Virtual memory is essential to support the success of software industry

### **Address translation**

- Processor receives virtual addresses from the running code, main memory uses physical memory addresses
- Virtual address space is organized into "pages"
- The system references the page table to translate addresses
  - Each process has its own page table
  - The page table content is maintained by OS



### Conventional page table

0xffffffffffffff

#### **Virtual Address Space**

- must be consecutive in the physical memory
- need a big segment! difficult to find a spot
- simply too big to fit in memory if address space is large!

$$\frac{2^{64} B}{2^{12} B}$$
 page table entries/leaf nodes -

### Do we really need a large table?



# "Paged" page table

0xfffffffffffffff

Code Data Heap Virtual Address Space Stack

Break up entries into pages!

Each of these occupies exactly a page

$$-\frac{2^{12} B}{2^{3} P} = 2^{9}$$
 PTEs per node

0x0

**Question:** 

These nodes are spread out, how to locate them in the memory?

Otherwise, you always need to find more than one consecutive pages — difficult!

These are nodes are not presented if they are not referenced at all — save space

Allocate page table entry nodes "on demand"

### **B-tree**



https://en.wikipedia.org/wiki/B-tree#/media/File:B-tree.svg

# **Hierarchical Page Table**

0x0 0xffffffffffffff Code Data Heap **Virtual Address Space** Stack  $\lceil log_{2^9} \frac{2^{64} B}{2^{12} B} \rceil = \lceil log_{2^9} 2^{52} \rceil = 6 \text{ levels}$ These are nodes are not presented as they are not referenced at all.  $\frac{2}{2^{12}}$  page table entries/leaf nodes (worst case)

### Address translation in x86-64



### **Takeaways: Virtual Memory**

- Virtual memory is essential to support the success of software industry
- To reduce the page table size, we introduced hierarchical page table data structure

# Avoiding the address translation overhead

### **TLB: Translation Look-aside Buffer**



- TLB a small SRAM stores frequently used page table entries
- Good A lot faster than having everything going to the DRAM
- Bad Still on the critical path

### TLB + Virtual cache

- L1 \$ accepts virtual address you don't need to translate
- Good you can access both TLB and L1-\$ at the same time and physical address is only needed if L1-\$ missed d/sd
- Bad it doesn't work in practice
  - Many applications have the same virtual address but should be pointing different physical addresses
  - An application can have "aliasing virtual addresses" pointing to the same physical address



# Virtually indexed, physically tagged cache

- Can we find physical address directly in the virtual address
  - Not everything but the page offset isn't changing!
- Can we indexing the cache using the "partial physical address"?
  - Yes Just make set index + block set to be exactly the page offset



## Virtually indexed, physically tagged cache



# Virtually indexed, physically tagged cache

If page size is 4KB —

$$lg(B) + lg(S) = lg(4096) = 12$$

$$C = ABS$$

$$C = A \times 2^{12}$$

$$if A = 1$$

$$C = 4KB$$



### **Takeaways: Virtual Memory**

- Virtual memory is essential to support the success of software industry
- To reduce the page table size, we introduced hierarchical page table data structure
- Virtually-indexed, physically tagged cache provides the efficiency for accessing cache and TLB together — but limited cache design

# Translation Caching: Skip, Don't Walk (the Page Table)

Thomas W. Barr, Alan L. Cox, Scott Rixner

### Why should we care about this paper?

- TLB miss is expensive
  - You have to walk through multiple nodes in the hierarchical page table
  - Each node is a memory access 100 ns
- Modern processors use memory management units (MMUs)
  - MMUs have caches, but not optimized for the timing critical TLB miss
  - Page table caches
  - Translational caches

### Page table caches



### Page table caches

- PTC caches the addresses of "page table nodes"
- PTC uses the physical address of page table nodes as the index
  - Unified page table cache (UPTC)
  - Split page table cache (SPTC)
    - Each page level get a private cache location

### **Translation cache**



### **Translation caches**

- Indexed by the prefix of the requesting virtual address
  - Split translational cache (STC)
  - Unified translational cache (UTC)
  - Translational-path Cache (TPC)
- Pros:
  - Allowing each level lookup to perform independently, in parallel
- Cons:
  - Less space efficient

### **Takeaways: Virtual Memory**

- Virtual memory is essential to support the success of software industry
- To reduce the page table size, we introduced hierarchical page table data structure
- Virtually-indexed, physically tagged cache provides the efficiency for accessing cache and TLB together — but limited cache design
- Page table caches & translation caching can help reducing the TLB miss penalty

#### Announcement

- Assignment #3 due this Thursday
- Programming Assignment #2 due 11/7
- Midterm next Tuesday
  - 80 minutes, in-person only
  - Closed book, closed note, no laptop, no mobile phones (including the calculator app)
  - You may use a calculator
  - Will release sample midterm questions on Thursday

# Computer Science & Engineering

203



