## **Virtual Memory: Systems**

Introduction to Computer Systems

# **Today**

- Virtual memory questions and answers
- Simple memory system example
- Case study: Core i7/Linux memory system
- Memory mapping

# Virtual memory reminder/review

#### Programmer's view of virtual memory

- Each process has its own private linear address space
- Cannot be corrupted by other processes

#### System view of virtual memory

- Uses memory efficiently by caching virtual memory pages
  - Efficient only because of locality
- Simplifies memory management and programming
- Simplifies protection by providing a convenient interpositioning point to check permissions

## **Recall: Address Translation With a Page Table**



### **Recall: Address Translation: Page Hit**



- 1) Processor sends virtual address to MMU
- 2-3) MMU fetches PTE from page table in memory
- 4) MMU sends physical address to cache/memory
- 5) Cache/memory sends data word to processor

#### **Question #1**

Are the PTEs cached like other memory accesses?

Yes (and no: see next question)

### Page tables in memory, like other data



VA: virtual address, PA: physical address, PTE: page table entry, PTEA: PTE address

#### Question #2

Isn't it slow to have to go to memory twice every time?

Yes, it would be... so, real MMUs don't

## **Speeding up Translation with a TLB**

- Page table entries (PTEs) are cached in L1 like any other memory word
  - PTEs may be evicted by other data references
  - PTE hit still requires a small L1 delay
- Solution: Translation Lookaside Buffer (TLB)
  - Small, dedicated, super-fast hardware cache of PTEs in MMU
  - Contains complete page table entries for small number of pages

#### **TLB Hit**



A TLB hit eliminates a memory access

#### **TLB Miss**



A TLB miss incurs an additional memory access (the PTE)

Fortunately, TLB misses are rare. Why?

#### **Question #3**

Aren't the TLB contents wrong after a context switch?

- Yes, they would be, so something must be done.
  - Option 1: flush TLB on context switch
  - Option 2: associate a process ID with each TLB entry

#### **Question #4**

Isn't the page table huge? How can it be stored in RAM?

Yes, it would be... so, real page tables aren't simple arrays

## **Multi-Level Page Tables**

#### Suppose:

4KB (2<sup>12</sup>) page size, 64-bit address space, 8-byte PTE

#### Problem:

- Would need a 32,000 TB page table!
  - $2^{64} * 2^{-12} * 2^3 = 2^{55}$  bytes

#### Common solution:

- Multi-level page tables
- Example: 2-level page table
  - Level 1 table: each PTE points to a page table (always memory resident)
  - Level 2 table: each PTE points to a page (paged in and out like any other data)
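
The size problem is plain arithmetic: a flat table needs one PTE per virtual page. A minimal sketch of that calculation follows; the function name is illustrative.

```c
#include <stdint.h>

/* Bytes needed by a single-level (flat) page table:
   one PTE per virtual page, i.e. 2^(va_bits - page_bits) entries.
   Assumes va_bits - page_bits < 64 and that the product fits in 64 bits. */
uint64_t flat_page_table_bytes(unsigned va_bits, unsigned page_bits,
                               uint64_t pte_bytes)
{
    uint64_t num_ptes = (uint64_t)1 << (va_bits - page_bits);
    return num_ptes * pte_bytes;
}
```

For the parameters above, `flat_page_table_bytes(64, 12, 8)` gives 2^55 bytes. A multi-level table avoids this cost because lower-level tables are allocated only for the parts of the address space that are actually in use.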



# A Two-Level Page Table Hierarchy



## Translating with a k-level Page Table
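
In a k-level table, the VPN is divided into k fields, and field i selects an entry in the level-i table. The sketch below shows only that split, assuming every level uses the same number of index bits; the function name and parameters are illustrative.

```c
#include <stdint.h>

/* Split a virtual address into k page-table indices plus a page offset.
   Assumes every level uses the same number of index bits.
   idx[0] is the level-1 (topmost) index, idx[k-1] the last-level index.
   Returns the virtual page offset (VPO). */
uint64_t split_va(uint64_t va, unsigned k, unsigned bits_per_level,
                  unsigned offset_bits, uint64_t idx[])
{
    uint64_t level_mask = ((uint64_t)1 << bits_per_level) - 1;
    uint64_t vpn = va >> offset_bits;

    for (unsigned i = 0; i < k; i++) {
        unsigned shift = (k - 1 - i) * bits_per_level;
        idx[i] = (vpn >> shift) & level_mask;
    }
    return va & (((uint64_t)1 << offset_bits) - 1);
}
```

For example, x86-64 with 4 KB pages corresponds to `split_va(va, 4, 9, 12, idx)`: four 9-bit indices followed by a 12-bit offset.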



### **Review of Symbols**

#### Basic Parameters

- N = 2<sup>n</sup>: Number of addresses in virtual address space
- M = 2<sup>m</sup>: Number of addresses in physical address space
- P = 2<sup>p</sup>: Page size (bytes)

#### Components of the virtual address (VA)

- TLBI: TLB index
- TLBT: TLB tag
- VPO: Virtual page offset
- VPN: Virtual page number

#### Components of the physical address (PA)

- PPO: Physical page offset (same as VPO)
- PPN: Physical page number
- CO: Byte offset within cache line
- CI: Cache index
- CT: Cache tag

### **Simple Memory System Example**

#### Addressing

- 14-bit virtual addresses
- 12-bit physical addresses
- Page size = 64 bytes



## 1. Simple Memory System TLB

- 16 entries
- 4-way associative



| Set | Tag | PPN | Valid | Tag | PPN | Valid | Tag | PPN | Valid | Tag | PPN | Valid |
|-----|-----|-----|-------|-----|-----|-------|-----|-----|-------|-----|-----|-------|
| 0   | 03  | _   | 0     | 09  | 0D  | 1     | 00  | _   | 0     | 07  | 02  | 1     |
| 1   | 03  | 2D  | 1     | 02  | _   | 0     | 04  | _   | 0     | 0A  | _   | 0     |
| 2   | 02  | _   | 0     | 08  | _   | 0     | 06  | _   | 0     | 03  | _   | 0     |
| 3   | 07  | _   | 0     | 03  | 0D  | 1     | 0A  | 34  | 1     | 02  | _   | 0     |

## 2. Simple Memory System Page Table

Only the first 16 entries (out of 256) are shown

| VPN | PPN | Valid |
|-----|-----|-------|
| 00  | 28  | 1     |
| 01  | _   | 0     |
| 02  | 33  | 1     |
| 03  | 02  | 1     |
| 04  | _   | 0     |
| 05  | 16  | 1     |
| 06  | _   | 0     |
| 07  | _   | 0     |

| VPN | PPN | Valid |
|-----|-----|-------|
| 08  | 13  | 1     |
| 09  | 17  | 1     |
| 0A  | 09  | 1     |
| 0B  | _   | 0     |
| 0C  | _   | 0     |
| 0D  | 2D  | 1     |
| 0E  | 11  | 1     |
| 0F  | 0D  | 1     |

### 3. Simple Memory System Cache

- 16 lines, 4-byte block size
- Physically addressed
- Direct mapped



| Idx | Tag | Valid | B0 | B1 | B2 | B3 |
|-----|-----|-------|----|----|----|----|
| 0   | 19  | 1     | 99 | 11 | 23 | 11 |
| 1   | 15  | 0     | _  | _  | _  | _  |
| 2   | 1B  | 1     | 00 | 02 | 04 | 08 |
| 3   | 36  | 0     | _  | _  | _  | _  |
| 4   | 32  | 1     | 43 | 6D | 8F | 09 |
| 5   | 0D  | 1     | 36 | 72 | F0 | 1D |
| 6   | 31  | 0     | _  | _  | _  | _  |
| 7   | 16  | 1     | 11 | C2 | DF | 03 |

| Idx | Tag | Valid | B0 | B1 | B2 | B3 |
|-----|-----|-------|----|----|----|----|
| 8   | 24  | 1     | 3A | 00 | 51 | 89 |
| 9   | 2D  | 0     | _  | _  | _  | _  |
| A   | 2D  | 1     | 93 | 15 | DA | 3B |
| B   | 0B  | 0     | _  | _  | _  | _  |
| C   | 12  | 0     | _  | _  | _  | _  |
| D   | 16  | 1     | 04 | 96 | 34 | 15 |
| E   | 13  | 1     | 83 | 77 | 1B | D3 |
| F   | 14  | 0     | _  | _  | _  | _  |
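
The TLB, page table, and cache of this example system are small enough to simulate directly. The sketch below encodes the table contents and walks a virtual address through TLB lookup, page table lookup, and cache lookup. The struct and function names are illustrative, and a VPN outside the 16 page table entries shown is treated as a fault because its entry is not available here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field widths of the example system: 14-bit VA, 12-bit PA, 64-byte pages,
   16-entry 4-way TLB (4 sets), 16-line direct-mapped cache, 4-byte blocks. */
enum { VPO_BITS = 6, TLBI_BITS = 2, CO_BITS = 2, CI_BITS = 4 };

struct tlb_way { uint8_t tag, ppn; bool valid; };
struct pte     { uint8_t ppn; bool valid; };
struct line    { uint8_t tag; bool valid; uint8_t b[4]; };

/* Contents taken from the tables; invalid entries carry no PPN or data. */
static const struct tlb_way tlb[4][4] = {
    { {0x03, 0, 0},    {0x09, 0x0D, 1}, {0x00, 0, 0},    {0x07, 0x02, 1} },
    { {0x03, 0x2D, 1}, {0x02, 0, 0},    {0x04, 0, 0},    {0x0A, 0, 0} },
    { {0x02, 0, 0},    {0x08, 0, 0},    {0x06, 0, 0},    {0x03, 0, 0} },
    { {0x07, 0, 0},    {0x03, 0x0D, 1}, {0x0A, 0x34, 1}, {0x02, 0, 0} },
};
static const struct pte page_table[16] = {
    {0x28, 1}, {0, 0},    {0x33, 1}, {0x02, 1},
    {0, 0},    {0x16, 1}, {0, 0},    {0, 0},
    {0x13, 1}, {0x17, 1}, {0x09, 1}, {0, 0},
    {0, 0},    {0x2D, 1}, {0x11, 1}, {0x0D, 1},
};
static const struct line cache[16] = {   /* unlisted lines are invalid */
    [0x0] = {0x19, 1, {0x99, 0x11, 0x23, 0x11}},
    [0x2] = {0x1B, 1, {0x00, 0x02, 0x04, 0x08}},
    [0x4] = {0x32, 1, {0x43, 0x6D, 0x8F, 0x09}},
    [0x5] = {0x0D, 1, {0x36, 0x72, 0xF0, 0x1D}},
    [0x7] = {0x16, 1, {0x11, 0xC2, 0xDF, 0x03}},
    [0x8] = {0x24, 1, {0x3A, 0x00, 0x51, 0x89}},
    [0xA] = {0x2D, 1, {0x93, 0x15, 0xDA, 0x3B}},
    [0xD] = {0x16, 1, {0x04, 0x96, 0x34, 0x15}},
    [0xE] = {0x13, 1, {0x83, 0x77, 0x1B, 0xD3}},
};

/* Translate va to a physical address; *tlb_hit reports whether the TLB hit.
   Returns false on a page fault, or when the VPN lies beyond the 16 page
   table entries shown. */
bool translate(uint16_t va, uint16_t *pa, bool *tlb_hit)
{
    unsigned vpo  = va & ((1u << VPO_BITS) - 1);
    unsigned vpn  = va >> VPO_BITS;
    unsigned tlbi = vpn & ((1u << TLBI_BITS) - 1);
    unsigned tlbt = vpn >> TLBI_BITS;

    *tlb_hit = false;
    for (int w = 0; w < 4; w++) {
        if (tlb[tlbi][w].valid && tlb[tlbi][w].tag == tlbt) {
            *tlb_hit = true;
            *pa = (uint16_t)((tlb[tlbi][w].ppn << VPO_BITS) | vpo);
            return true;
        }
    }
    if (vpn >= 16 || !page_table[vpn].valid)
        return false;
    *pa = (uint16_t)((page_table[vpn].ppn << VPO_BITS) | vpo);
    return true;
}

/* Read one byte from the physically addressed cache; false on a miss. */
bool cache_read(uint16_t pa, uint8_t *byte)
{
    unsigned co = pa & ((1u << CO_BITS) - 1);
    unsigned ci = (pa >> CO_BITS) & ((1u << CI_BITS) - 1);
    unsigned ct = pa >> (CO_BITS + CI_BITS);

    if (!cache[ci].valid || cache[ci].tag != ct)
        return false;
    *byte = cache[ci].b[co];
    return true;
}
```

Applied to the three example addresses: 0x03D4 hits in the TLB (VPN 0x0F, set 3, tag 0x03), maps to PA 0x354, and hits in the cache with byte 0x36. 0x0B8F (VPN 0x2E) misses in the TLB and is not among the page table entries shown. 0x0020 misses in the TLB, is found in the page table (PPN 0x28), maps to PA 0xA20, and misses in the cache.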

## **Address Translation Example #1**

Virtual Address: 0x03D4



#### **Physical Address**



## **Address Translation Example #2**

Virtual Address: 0x0B8F



#### **Physical Address**



### **Address Translation Example #3**

Virtual Address: 0x0020



#### **Physical Address**



### **Intel Core i7 Memory System**





#### **End-to-end Core i7 Address Translation**



## **Core i7 Level 1-3 Page Table Entries**



When P=0, the remaining bits are available for the OS (e.g., to record the page table's location on disk).

Each entry references a 4 KB child page table. Significant fields:

**P:** Child page table present in physical memory (1) or not (0).

**R/W:** Read-only or read-write access permission for all reachable pages.

**U/S:** user or supervisor (kernel) mode access permission for all reachable pages.

**WT:** Write-through or write-back cache policy for the child page table.

**A:** Reference bit (set by MMU on reads and writes, cleared by software).

**PS:** Page size, either 4 KB or 2 MB (defined for Level 3 PTEs only).

**Page table physical base address:** 40 most significant bits of physical page table address (forces page tables to be 4 KB aligned)

**XD:** Disable or enable instruction fetches from all pages reachable from this PTE.

## **Core i7 Level 4 Page Table Entries**



When P=0, the remaining bits are available for the OS (e.g., to record the page's location on disk).

Each entry references a 4 KB child page. Significant fields:

**P:** Child page is present in memory (1) or not (0)

**R/W:** Read-only or read-write access permission for child page

**U/S:** User or supervisor mode access permission for child page

**WT:** Write-through or write-back cache policy for this page

**A:** Reference bit (set by MMU on reads and writes, cleared by software)

**D:** Dirty bit (set by MMU on writes, cleared by software)

**Page physical base address:** 40 most significant bits of physical page address (forces pages to be 4 KB aligned)

**XD:** Disable or enable instruction fetches from this page.
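
The fields of a Level 4 entry can be tested with simple bit masks. The sketch below assumes the standard x86-64 bit positions for a 4 KB page entry (P in bit 0, R/W in bit 1, U/S in bit 2, WT in bit 3, A in bit 5, D in bit 6, base address in bits 12 to 51, XD in bit 63). The macro and function names are illustrative, and the checks look at this one entry only, ignoring the permissions of the higher-level entries on the path to it.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_P    (1ULL << 0)    /* present */
#define PTE_RW   (1ULL << 1)    /* writable */
#define PTE_US   (1ULL << 2)    /* user-mode accessible */
#define PTE_WT   (1ULL << 3)    /* write-through */
#define PTE_A    (1ULL << 5)    /* reference (accessed) bit */
#define PTE_D    (1ULL << 6)    /* dirty bit */
#define PTE_XD   (1ULL << 63)   /* execute disable */
#define PTE_ADDR 0x000FFFFFFFFFF000ULL   /* bits 12..51: 40-bit base */

/* Physical base address of the page (4 KB aligned by construction). */
uint64_t pte_page_base(uint64_t pte)
{
    return pte & PTE_ADDR;
}

/* Does this entry, taken alone, let user-mode code write to the page? */
bool pte_user_can_write(uint64_t pte)
{
    uint64_t need = PTE_P | PTE_RW | PTE_US;
    return (pte & need) == need;
}

/* Does this entry, taken alone, allow instruction fetches from the page? */
bool pte_can_execute(uint64_t pte)
{
    return (pte & PTE_P) && !(pte & PTE_XD);
}
```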

### **Core i7 Page Table Translation**



## **Cute Trick for Speeding Up L1 Access**



#### Observation

- The bits that determine CI are identical in the virtual and physical address
- The cache can therefore be indexed while address translation is taking place
- Generally we hit in the TLB, so the PPN bits (CT bits) are available next
- "Virtually indexed, physically tagged"
- Cache carefully sized to make this possible
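
"Carefully sized" can be stated as a check: the cache index and block offset bits together must fit inside the page offset. The sketch below is one way to test that condition; the function names are illustrative, and all sizes are assumed to be powers of two.

```c
#include <stdbool.h>
#include <stdint.h>

/* log2 of n, where n is assumed to be a power of two. */
static unsigned log2_u64(uint64_t n)
{
    unsigned bits = 0;
    while (n > 1) {
        n >>= 1;
        bits++;
    }
    return bits;
}

/* True if the cache index (CI) and block offset (CO) bits all fall inside
   the page offset, so the set can be selected before translation finishes. */
bool index_fits_in_page_offset(uint64_t cache_bytes, unsigned ways,
                               unsigned line_bytes, uint64_t page_bytes)
{
    uint64_t sets = cache_bytes / ((uint64_t)ways * line_bytes);
    unsigned co_bits = log2_u64(line_bytes);
    unsigned ci_bits = log2_u64(sets);
    return co_bits + ci_bits <= log2_u64(page_bytes);
}
```

Assuming a 32 KB, 8-way L1 cache with 64-byte lines and 4 KB pages (parameters not stated in this section), there are 64 sets, so CI and CO take 6 bits each and exactly fill the 12-bit page offset. Doubling the cache size at the same associativity would need a 7-bit index and break the trick.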

### **Virtual Address Space of a Linux Process**



# Linux Organizes VM as Collection of "Areas"



# **Linux Page Fault Handling**



- **Segmentation fault:** accessing a non-existent page
- **Normal page fault:** legal access to a page that is not currently in memory
- **Protection exception:** e.g., violating permission by writing to a read-only page (Linux reports this as a segmentation fault)
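
The three outcomes can be sketched as a walk over the list of areas. The `struct area` below is a simplified, hypothetical stand-in for the kernel's area descriptor, and the permission constants are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

enum { AREA_READ = 1, AREA_WRITE = 2 };   /* hypothetical permission bits */

/* Simplified area descriptor: the range [start, end) plus permissions. */
struct area {
    uintptr_t start, end;
    int prot;
    struct area *next;
};

enum fault_kind { SEGFAULT, PROTECTION_EXCEPTION, NORMAL_PAGE_FAULT };

/* Classify a faulting access according to the three cases. */
enum fault_kind classify_fault(const struct area *areas, uintptr_t addr,
                               int is_write)
{
    for (const struct area *a = areas; a != NULL; a = a->next) {
        if (addr < a->start || addr >= a->end)
            continue;                       /* not in this area */
        if (is_write && !(a->prot & AREA_WRITE))
            return PROTECTION_EXCEPTION;    /* e.g., write to read-only page */
        if (!is_write && !(a->prot & AREA_READ))
            return PROTECTION_EXCEPTION;
        return NORMAL_PAGE_FAULT;           /* legal access: page it in */
    }
    return SEGFAULT;                        /* address lies in no area */
}
```

A linear list keeps the sketch short; a real kernel would use a faster lookup structure for processes with many areas.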

### **Memory Mapping**

- VM areas initialized by associating them with disk objects.
  - Process is known as memory mapping.
- Area can be backed by (i.e., get its initial values from):
  - Regular file on disk (e.g., an executable object file)
    - Initial page bytes come from a section of a file
  - Anonymous file (e.g., nothing)
    - First fault will allocate a physical page full of 0's (demand-zero page)
    - Once the page is written to (dirtied), it is like any other page
- Dirty pages are copied back and forth between memory and a special swap file.

# Revisiting fork()

Shouldn't fork() be really slow, since the child needs a copy of the parent's address space?

Yes, it would be... so, fork() doesn't really work that way

### **Sharing Revisited: Shared Objects**



Process 1 maps the shared object.

### **Sharing Revisited: Shared Objects**



- Process 2 maps the shared object.
- Notice how the virtual addresses can be different.

# Sharing Revisited: Private Copy-on-write (COW) Objects



- Two processes mapping a private copy-on-write (COW) object.
- Area flagged as private copy-on-write
- PTEs in private areas are flagged as read-only

# Sharing Revisited: Private Copy-on-write (COW) Objects



- Instruction writing to private page triggers protection fault.
- Handler creates new R/W page.
- Instruction restarts upon handler return.
- Copying deferred as long as possible!

#### The fork Function Revisited

- VM and memory mapping explain how fork provides private address space for each process.
  - Perfect approach for common case of fork() followed by exec()
- To create the virtual address space for the new process:
  - Create exact copies of current mm\_struct, vm\_area\_struct, and page tables.
  - Flag each page in both processes as read-only
  - Flag each vm\_area\_struct in both processes as private COW
- On return, each process has exact copy of virtual memory
- Subsequent writes create new pages using COW mechanism
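
The copy-on-write mechanism itself is invisible to user code; what a program can observe is its effect, namely that a write in the child never changes the parent's data. A minimal sketch of that observation, with an illustrative function name:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int value = 1;

/* Fork a child that overwrites the variable, then report what the parent
   still sees. Returns the parent's value, or -1 if fork or waitpid fails. */
int parent_value_after_child_write(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        value = 2;      /* the write lands in the child's own copy of the page */
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) < 0)
        return -1;
    return value;       /* the parent's copy is untouched */
}
```

The parent still reads 1 after the child has written 2, even though both processes started out sharing the same physical page.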

#### The execve Function Revisited



- To load and run a new program a.out in the current process using execve:
  - Free vm\_area\_struct's and page tables for old areas
  - Create vm\_area\_struct's and page tables for new areas
    - Programs and initialized data backed by object files.
    - .bss and stack backed by anonymous files.
  - Set PC to entry point in .text
    - Linux will fault in code and data pages as needed.

### **User-Level Memory Mapping**

- `void *mmap(void *start, size_t len, int prot, int flags, int fd, off_t offset)`: map len bytes starting at offset offset of the file specified by file descriptor fd, preferably at address start
  - start: may be 0 for "pick an address"
  - prot: PROT\_READ, PROT\_WRITE, ...
  - flags: MAP\_ANON, MAP\_PRIVATE, MAP\_SHARED, ...
- Return a pointer to start of mapped area (may not be start)


## Example: Using mmap to Copy Files

Copying a file to stdout without transferring data to user space.

```
/* mmapcopy.c */
#include "csapp.h"

/* Map the file open on fd and write its size bytes to stdout */
void mmapcopy(int fd, int size)
{
    /* Ptr to memory mapped area */
    char *bufp;

    bufp = Mmap(NULL, size,
                PROT_READ,
                MAP_PRIVATE,
                fd, 0);
    Write(1, bufp, size);
    return;
}
```

```
/* mmapcopy driver (mmapcopy.c) */
int main(int argc, char **argv)
{
    struct stat stat;
    int fd;

    /* Check for required cmd line arg */
    if (argc != 2) {
        printf("usage: %s <filename>\n",
               argv[0]);
        exit(0);
    }

    /* Copy input file to stdout */
    fd = Open(argv[1], O_RDONLY, 0);
    Fstat(fd, &stat);
    mmapcopy(fd, stat.st_size);
    exit(0);
}
```