# **Virtual Memory: Details**

COMP402127: Introduction to Computer Systems

Hao Li Xi'an Jiaotong University

# **Today**

- Review concepts from last lecture
- Simple memory system example
- Case study: Core i7/Linux memory system
- Memory mapping

### **Review: Virtual Addressing**



- Virtual address space is an abstraction, not real memory
- Physical memory refers to the actual computer memory (DRAM)

### **Review: Per-process Virtual Address Space**

- Each process has its own *virtual address space*
- All processes share the same physical memory



### **Review: Page Table**



 A page table contains page table entries (PTEs) that map virtual pages to physical pages.

#### **Conceptual Question**

The MMU must know the *physical* address of the page table in order to read page table entries from memory. Why does it need a physical address?

If the MMU knew only a virtual address for the page table, then, in order to find the page table in memory, it would first need to look up the physical address of the page table, in the page table itself, ...

### Review: Translating with a k-level Page Table

Having multiple levels greatly reduces total page table size



#### **Conceptual Questions**

#### Why are one-level page tables impractical?

For typical system sizes, the table would require more physical memory (e.g., 512 GB) than most computers have.
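The 512 GB figure can be reproduced with a short calculation. This is only a sketch: the 48-bit virtual address space, 4 KB page size, and 8-byte PTE size are assumptions chosen to match the figure, not parameters stated on this slide, and the function name is hypothetical.

```python
def one_level_table_bytes(va_bits, page_bytes, pte_bytes):
    """Size of a flat page table that holds one PTE per virtual page."""
    num_pages = (1 << va_bits) // page_bytes
    return num_pages * pte_bytes

# Assumed parameters (not given on the slide): 48-bit VAs, 4 KB pages, 8-byte PTEs.
size = one_level_table_bytes(va_bits=48, page_bytes=4096, pte_bytes=8)
print(size // 2**30, "GB")  # 2**36 PTEs * 8 bytes each = 2**39 bytes
```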

#### How does a multi-level page table fix this problem?

It allocates only the parts of the page table tree that are needed for the virtual addresses the program actually uses.

#### Why is memory access slower with a multi-level page table than with a one-level page table?

A k-level page table requires k memory loads in order to determine the physical address. There is no spatial locality to these loads (see next slide).
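Both answers above (sparse allocation and k loads per translation) can be illustrated with a toy model. This is a minimal sketch, not how an MMU is implemented: the class `PageTableTree`, its nested-dict representation, and the 4-level, 9-bits-per-level configuration are all assumptions made for illustration.

```python
class PageTableTree:
    """Toy k-level page table: each table is a dict indexed by one VPN slice."""

    def __init__(self, levels, bits_per_level):
        self.levels = levels
        self.bits = bits_per_level
        self.root = {}
        self.loads = 0  # simulated memory loads performed by walk()

    def _indices(self, vpn):
        # Split the VPN into one index per level, most significant slice first.
        mask = (1 << self.bits) - 1
        return [(vpn >> (self.bits * (self.levels - 1 - i))) & mask
                for i in range(self.levels)]

    def map(self, vpn, ppn):
        *upper, last = self._indices(vpn)
        node = self.root
        for i in upper:
            node = node.setdefault(i, {})  # child tables are allocated lazily
        node[last] = ppn

    def walk(self, vpn):
        node = self.root
        for i in self._indices(vpn):
            self.loads += 1                # one memory load per level
            if i not in node:
                return None                # unmapped: a real system would fault
            node = node[i]
        return node                        # the PPN stored in the last level

    def tables_allocated(self):
        def count(node, depth):
            if depth == self.levels - 1:
                return 1                   # last-level table holding PPNs
            return 1 + sum(count(c, depth + 1) for c in node.values())
        return count(self.root, 0)

pt = PageTableTree(levels=4, bits_per_level=9)
pt.map(0x1, 0x42)
pt.map((5 << 27) | 3, 0x99)   # far away in the address space
ppn = pt.walk(0x1)
print(hex(ppn), pt.loads, pt.tables_allocated())
```

Two mappings in distant parts of the address space allocate only the tables on their two paths, while each translation still costs one load per level.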

### The problem (with k-level page tables)



### **Review: Translation Lookaside Buffer (TLB)**

- A small cache dedicated to storing mappings from virtual addresses to physical addresses (page table entries)
- MMU consults the TLB for each address as its first action. If there is a TLB hit, it does not need to fetch anything from the page table (avoiding k lookups)



#### **Review: Accessing the TLB**

- MMU uses the VPN portion of the virtual address to access the TLB:



### **Conceptual Question**

#### How does virtual memory interact with the CPU cache(s)?

The cache's function is to speed up access to whatever data is most frequently used. The MMU sits "in between" the CPU and the cache; the cache works only with physical addresses. This means data from multiple processes may coexist in the cache (or compete for cache space).





1. MMU uses VA to find PTE & get PA

2. PA is used to look in cache for data

### **Simple Memory System Example**

#### Addressing

- 14-bit virtual addresses
- 12-bit physical addresses
- Page size = 64 bytes

Why is the VPO 6 bits? Why is the PPO 6 bits?

Why is the VPN 8 bits? Why is the PPN 6 bits?
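The field widths follow directly from the three parameters on this slide. A small sketch of the arithmetic (the variable names are chosen here for illustration):

```python
VA_BITS, PA_BITS, PAGE_SIZE = 14, 12, 64   # values from the slide

# A 64-byte page needs log2(64) = 6 bits to name a byte within the page,
# and the offset is the same in the virtual and the physical address.
offset_bits = PAGE_SIZE.bit_length() - 1
vpo_bits = ppo_bits = offset_bits

# Whatever is left of each address names the page.
vpn_bits = VA_BITS - offset_bits
ppn_bits = PA_BITS - offset_bits

print(vpo_bits, ppo_bits, vpn_bits, ppn_bits)
```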





**Physical Page Number** 

**Physical Page Offset** 

### **Simple Memory System TLB**

- 16 entries
- 4-way associative



VPN = 0b1101 = 0x0D

#### **Translation Lookaside Buffer (TLB)**

| Set | Tag | PPN | Valid | Tag | PPN | Valid | Tag | PPN | Valid | Tag | PPN | Valid |
|-----|-----|-----|-------|-----|-----|-------|-----|-----|-------|-----|-----|-------|
| 0   | 03  | -   | 0     | 09  | 0D  | 1     | 00  | -   | 0     | 07  | 02  | 1     |
| 1   | 03  | 2D  | 1     | 02  | -   | 0     | 04  | -   | 0     | 0A  | -   | 0     |
| 2   | 02  | -   | 0     | 08  | -   | 0     | 06  | -   | 0     | 03  | -   | 0     |
| 3   | 07  | -   | 0     | 03  | 0D  | 1     | 0A  | 34  | 1     | 02  | -   | 0     |
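With 16 entries and 4 ways there are 4 sets, so the low 2 bits of the VPN form the TLB index (TLBI) and the remaining 6 bits form the TLB tag (TLBT). The sketch below encodes only the valid entries of the table above and looks up VPN 0x0D; the dict layout and the function name `tlb_lookup` are illustrative choices.

```python
TLB_SETS = 4        # 16 entries / 4 ways
TLBI_BITS = 2       # log2(4 sets)

# Valid entries only, copied from the table: TLB[set][tag] = PPN.
TLB = {
    0: {0x09: 0x0D, 0x07: 0x02},
    1: {0x03: 0x2D},
    2: {},
    3: {0x03: 0x0D, 0x0A: 0x34},
}

def tlb_lookup(vpn):
    tlbi = vpn & (TLB_SETS - 1)    # low 2 bits of the VPN select the set
    tlbt = vpn >> TLBI_BITS        # remaining 6 bits are compared to the tags
    return tlbi, tlbt, TLB[tlbi].get(tlbt)   # PPN, or None on a TLB miss

tlbi, tlbt, ppn = tlb_lookup(0x0D)
print(tlbi, hex(tlbt), None if ppn is None else hex(ppn))
```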

# **Simple Memory System Page Table**

Only showing the first 16 entries (out of 256)

| VPN | PPN | Valid |
|-----|-----|-------|
| 00  | 28  | 1     |
| 01  | _   | 0     |
| 02  | 33  | 1     |
| 03  | 02  | 1     |
| 04  | _   | 0     |
| 05  | 16  | 1     |
| 06  | _   | 0     |
| 07  | _   | 0     |

| VPN | PPN | Valid |
|-----|-----|-------|
| 08  | 13  | 1     |
| 09  | 17  | 1     |
| 0A  | 09  | 1     |
| 0B  | -   | 0     |
| 0C  | -   | 0     |
| 0D  | 2D  | 1     |
| 0E  | 11  | 1     |
| 0F  | 0D  | 1     |

 $0x0D \rightarrow 0x2D$ 



### **Simple Memory System Cache**

- 16 lines, 4-byte cache line size
- Physically addressed
- Direct mapped





| Idx | Tag | Valid | B0 | B1 | B2 | B3 |
|-----|-----|-------|----|----|----|----|
| 0   | 19  | 1     | 99 | 11 | 23 | 11 |
| 1   | 15  | 0     | -  | -  | -  | -  |
| 2   | 1B  | 1     | 00 | 02 | 04 | 08 |
| 3   | 36  | 0     | -  | -  | -  | -  |
| 4   | 32  | 1     | 43 | 6D | 8F | 09 |
| 5   | 0D  | 1     | 36 | 72 | F0 | 1D |
| 6   | 31  | 0     | -  | -  | -  | -  |
| 7   | 16  | 1     | 11 | C2 | DF | 03 |

| Idx | Tag | Valid | B0 | B1 | B2 | B3 |
|-----|-----|-------|----|----|----|----|
| 8   | 24  | 1     | 3A | 00 | 51 | 89 |
| 9   | 2D  | 0     | -  | -  | -  | -  |
| A   | 2D  | 1     | 93 | 15 | DA | 3B |
| B   | 0B  | 0     | -  | -  | -  | -  |
| C   | 12  | 0     | -  | -  | -  | -  |
| D   | 16  | 1     | 04 | 96 | 34 | 15 |
| E   | 13  | 1     | 83 | 77 | 1B | D3 |
| F   | 14  | 0     | -  | -  | -  | -  |

# **Address Translation Example**

Virtual Address: 0x03D4



VPN <u>0x0F</u> TLBI <u>0x3</u> TLBT <u>0x03</u> TLB Hit? <u>Y</u> Page Fault? <u>N</u> PPN: <u>0x0D</u>

**TLB**

| Set | Tag | PPN | Valid | Tag | PPN | Valid | Tag | PPN | Valid | Tag | PPN | Valid |
|-----|-----|-----|-------|-----|-----|-------|-----|-----|-------|-----|-----|-------|
| 0   | 03  | -   | 0     | 09  | 0D  | 1     | 00  | -   | 0     | 07  | 02  | 1     |
| 1   | 03  | 2D  | 1     | 02  | -   | 0     | 04  | -   | 0     | 0A  | -   | 0     |
| 2   | 02  | -   | 0     | 08  | -   | 0     | 06  | -   | 0     | 03  | -   | 0     |
| 3   | 07  | -   | 0     | 03  | 0D  | 1     | 0A  | 34  | 1     | 02  | -   | 0     |

#### **Physical Address**



# **Address Translation Example**

#### **Physical Address**



#### Cache

CO <u>0</u> CI <u>0x5</u> CT <u>0x0D</u> Hit? <u>Y</u> Byte: <u>0x36</u>

| Idx | Tag | Valid | B0 | B1 | B2 | B3 |
|-----|-----|-------|----|----|----|----|
| 0   | 19  | 1     | 99 | 11 | 23 | 11 |
| 1   | 15  | 0     | -  | -  | -  | -  |
| 2   | 1B  | 1     | 00 | 02 | 04 | 08 |
| 3   | 36  | 0     | -  | -  | -  | -  |
| 4   | 32  | 1     | 43 | 6D | 8F | 09 |
| 5   | 0D  | 1     | 36 | 72 | F0 | 1D |
| 6   | 31  | 0     | -  | -  | -  | -  |
| 7   | 16  | 1     | 11 | C2 | DF | 03 |

| Idx | Tag | Valid | B0 | B1 | B2 | B3 |
|-----|-----|-------|----|----|----|----|
| 8   | 24  | 1     | 3A | 00 | 51 | 89 |
| 9   | 2D  | 0     | -  | -  | -  | -  |
| A   | 2D  | 1     | 93 | 15 | DA | 3B |
| B   | 0B  | 0     | -  | -  | -  | -  |
| C   | 12  | 0     | -  | -  | -  | -  |
| D   | 16  | 1     | 04 | 96 | 34 | 15 |
| E   | 13  | 1     | 83 | 77 | 1B | D3 |
| F   | 14  | 0     | -  | -  | -  | -  |

#### Address Translation Example: TLB/Cache Miss

Virtual Address: 0x0020



#### **Physical Address**



| VPN | PPN | Valid |
|-----|-----|-------|
| 00  | 28  | 1     |
| 01  | _   | 0     |
| 02  | 33  | 1     |
| 03  | 02  | 1     |
| 04  | _   | 0     |
| 05  | 16  | 1     |
| 06  | _   | 0     |
| 07  | _   | 0     |

#### Address Translation Example: TLB/Cache Miss

#### Cache

| Idx | Tag | Valid | B0 | B1 | B2 | B3 |
|-----|-----|-------|----|----|----|----|
| 0   | 19  | 1     | 99 | 11 | 23 | 11 |
| 1   | 15  | 0     | -  | -  | -  | -  |
| 2   | 1B  | 1     | 00 | 02 | 04 | 08 |
| 3   | 36  | 0     | -  | -  | -  | -  |
| 4   | 32  | 1     | 43 | 6D | 8F | 09 |
| 5   | 0D  | 1     | 36 | 72 | F0 | 1D |
| 6   | 31  | 0     | -  | -  | -  | -  |
| 7   | 16  | 1     | 11 | C2 | DF | 03 |

| Idx | Tag | Valid | B0 | B1 | B2 | B3 |
|-----|-----|-------|----|----|----|----|
| 8   | 24  | 1     | 3A | 00 | 51 | 89 |
| 9   | 2D  | 0     | -  | -  | -  | -  |
| A   | 2D  | 1     | 93 | 15 | DA | 3B |
| B   | 0B  | 0     | -  | -  | -  | -  |
| C   | 12  | 0     | -  | -  | -  | -  |
| D   | 16  | 1     | 04 | 96 | 34 | 15 |
| E   | 13  | 1     | 83 | 77 | 1B | D3 |
| F   | 14  | 0     | -  | -  | -  | -  |

#### **Physical Address**
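Both worked examples (0x03D4 and 0x0020) can be replayed against the TLB, page table, and cache contents shown in the tables. The sketch below is one way to check the answers by hand; the dict encodings and the function names `translate` and `cache_read` are illustrative, only valid entries are recorded, and only the 16 page table entries shown on the slides are known.

```python
# Valid TLB entries: TLB[set][tag] = PPN (4 sets, 4 ways).
TLB = {0: {0x09: 0x0D, 0x07: 0x02}, 1: {0x03: 0x2D},
       2: {}, 3: {0x03: 0x0D, 0x0A: 0x34}}

# Valid entries among the 16 page table entries shown: PAGE_TABLE[VPN] = PPN.
PAGE_TABLE = {0x00: 0x28, 0x02: 0x33, 0x03: 0x02, 0x05: 0x16,
              0x08: 0x13, 0x09: 0x17, 0x0A: 0x09, 0x0D: 0x2D,
              0x0E: 0x11, 0x0F: 0x0D}

# Valid cache lines: CACHE[index] = (tag, [B0, B1, B2, B3]).
CACHE = {
    0x0: (0x19, [0x99, 0x11, 0x23, 0x11]),
    0x2: (0x1B, [0x00, 0x02, 0x04, 0x08]),
    0x4: (0x32, [0x43, 0x6D, 0x8F, 0x09]),
    0x5: (0x0D, [0x36, 0x72, 0xF0, 0x1D]),
    0x7: (0x16, [0x11, 0xC2, 0xDF, 0x03]),
    0x8: (0x24, [0x3A, 0x00, 0x51, 0x89]),
    0xA: (0x2D, [0x93, 0x15, 0xDA, 0x3B]),
    0xD: (0x16, [0x04, 0x96, 0x34, 0x15]),
    0xE: (0x13, [0x83, 0x77, 0x1B, 0xD3]),
}

def translate(va):
    """Return (physical address, TLB hit?) for a 14-bit virtual address."""
    vpn, vpo = va >> 6, va & 0x3F          # 8-bit VPN, 6-bit VPO
    tlbi, tlbt = vpn & 0x3, vpn >> 2       # 2-bit set index, 6-bit tag
    ppn = TLB[tlbi].get(tlbt)
    tlb_hit = ppn is not None
    if not tlb_hit:                        # TLB miss: consult the page table
        if vpn not in PAGE_TABLE:
            raise LookupError(f"page fault, or VPN {vpn:#04x} not shown")
        ppn = PAGE_TABLE[vpn]
    return (ppn << 6) | vpo, tlb_hit       # PPO is identical to VPO

def cache_read(pa):
    """Return the cached byte at a 12-bit physical address, or None on a miss."""
    co, ci, ct = pa & 0x3, (pa >> 2) & 0xF, pa >> 6
    line = CACHE.get(ci)
    if line is not None and line[0] == ct:
        return line[1][co]
    return None

for va in (0x03D4, 0x0020):
    pa, tlb_hit = translate(va)
    byte = cache_read(pa)
    print(f"VA {va:#06x}: PA {pa:#05x}, TLB hit={tlb_hit}, "
          f"cache byte={'miss' if byte is None else hex(byte)}")
```

Because the page offset (6 bits) is exactly as wide as CI plus CO, the cache tag equals the PPN in this toy system.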



### **Intel Core i7 Memory System**



Main memory

#### **End-to-end Core i7 Address Translation**



### **Core i7 Level 1-3 Page Table Entries**



#### Each entry references a 4K child page table. Significant fields:

**P:** Child page table present in physical memory (1) or not (0).

**R/W:** Read-only or read-write access permission for all reachable pages.

**U/S:** User or supervisor (kernel) mode access permission for all reachable pages.

**WT:** Write-through or write-back cache policy for the child page table.

**A:** Reference bit (set by MMU on reads and writes, cleared by software).

**PS:** Page size either 4 KB or 4 MB (defined for Level 1 PTEs only).

**Page table physical base address:** 40 most significant bits of physical page table address (forces page tables to be 4 KB aligned).

**XD:** Disable or enable instruction fetches from all pages reachable from this PTE.

### **Core i7 Level 4 Page Table Entries**



#### Each entry references a 4K child page. Significant fields:

**P:** Child page is present in memory (1) or not (0)

**R/W:** Read-only or read-write access permission for child page

**U/S:** User or supervisor mode access

**WT:** Write-through or write-back cache policy for this page

**A:** Reference bit (set by MMU on reads and writes, cleared by software)

**D:** Dirty bit (set by MMU on writes, cleared by software)

**G:** Global page (don't evict from TLB on task switch)

**Page physical base address:** 40 most significant bits of physical page address (forces pages to be 4 KB aligned)

**XD:** Disable or enable instruction fetches from this page.
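The fields listed above can be pulled out of a 64-bit entry with shifts and masks. The entry-format figure is missing here, so the bit positions in this sketch are taken from the standard x86-64 paging layout (P in bit 0, R/W in bit 1, U/S in bit 2, WT in bit 3, A in bit 5, D in bit 6, G in bit 8, base address in bits 12 to 51, XD in bit 63) and should be read as an assumption about what the slide showed. The names `decode_pte` and `example` are hypothetical.

```python
# Assumed bit positions (standard x86-64 layout, not shown in this text).
FLAG_BITS = {"P": 0, "R/W": 1, "U/S": 2, "WT": 3, "A": 5, "D": 6, "G": 8, "XD": 63}
ADDR_MASK = ((1 << 40) - 1) << 12   # 40-bit base address held in bits 12..51

def decode_pte(pte):
    """Split a 64-bit level 4 PTE into its flag bits and page base address."""
    fields = {name: (pte >> bit) & 1 for name, bit in FLAG_BITS.items()}
    # The low 12 bits of the base are implied zeros, hence 4 KB alignment.
    fields["base"] = pte & ADDR_MASK
    return fields

# Made-up entry: present, writable, user-accessible, accessed, dirty, no-execute.
example = (1 << 63) | (0x12345 << 12) | 0b0110_0111
fields = decode_pte(example)
print({k: (hex(v) if k == "base" else v) for k, v in fields.items()})
```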

#### **Core i7 Page Table Translation**
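The translation figure is missing here. Assuming the usual Core i7 layout of a 48-bit virtual address (four 9-bit VPN fields, one per page table level, followed by a 12-bit VPO), the split can be sketched as follows; the function name `split_va` and the example address are made up for illustration.

```python
def split_va(va):
    """Split a 48-bit virtual address into [VPN1, VPN2, VPN3, VPN4] and VPO."""
    vpo = va & 0xFFF                      # 12-bit offset within a 4 KB page
    vpn = (va >> 12) & ((1 << 36) - 1)    # 36-bit virtual page number
    # VPN1 is the most significant 9-bit slice and indexes the level 1 table.
    vpns = [(vpn >> (9 * (3 - i))) & 0x1FF for i in range(4)]
    return vpns, vpo

# Made-up address built so that each field is easy to recognize.
va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x5
vpns, vpo = split_va(va)
print(vpns, hex(vpo))
```

Each 9-bit field selects one of 512 entries, which is why a table of 8-byte entries fills exactly one 4 KB page.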



### **Trick for Speeding Up L1 Access**



#### The story so far

- MMU accessed before L1 cache
- Doesn't that make L1 cache hits slower?
- Yes! So real systems don't do this...

### **Trick for Speeding Up L1 Access**



#### Observation

- The bits that determine CI are identical in the virtual and physical address, because they lie within the page offset
- The cache can be indexed while address translation is taking place
- The TLB usually hits, so the PPN bits (the CT bits) are available quickly
- "Virtually indexed, physically tagged"
- Cache carefully sized to make this possible
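"Carefully sized" means the cache offset and index bits together must fit inside the page offset, so that they are untouched by translation. The check below is a sketch; the L1 parameters used (32 KB, 8-way, 64-byte lines, 4 KB pages) are assumed typical Core i7 values rather than numbers given on this slide, and the function name is hypothetical.

```python
def index_fits_in_page_offset(cache_bytes, ways, line_bytes, page_bytes):
    """True if the CO and CI bits all lie within the VPO bits."""
    sets = cache_bytes // (ways * line_bytes)
    co_bits = line_bytes.bit_length() - 1     # log2 for powers of two
    ci_bits = sets.bit_length() - 1
    vpo_bits = page_bytes.bit_length() - 1
    return co_bits + ci_bits <= vpo_bits

# Assumed L1 d-cache: 32 KB, 8-way, 64 B lines -> 64 sets, so 6 + 6 = 12 bits.
ok = index_fits_in_page_offset(32 * 1024, 8, 64, 4096)
# Doubling the capacity at the same associativity would need a 13th bit.
too_big = index_fits_in_page_offset(64 * 1024, 8, 64, 4096)
print(ok, too_big)
```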

### **Trick for Speeding Up L1 Access**



- Virtual memory with no impact on memory performance!
  - MMU moved off the critical path (the TLB lookup overlaps with L1 cache indexing and is faster than the L1 cache)

### **Memory-Mapped Files**

- Paging = every page of a program's physical memory is backed by some page of disk\*
- Normally, those pages belong to swap space
- But what if some pages were backed by ... files?

\* This is how it used to work 20 years ago. Nowadays, not always true.

# **Copy-on-write sharing**

- fork creates a new process by copying the entire address space of the parent process
  - That sounds slow
  - It is slow



#### Clever trick:

- Just duplicate the page tables
- Mark everything read only (PTE permission bits for all pages set to read-only)
- Copy only on write faults
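The three steps of the trick can be modeled in a few lines. This is a toy simulation rather than kernel code: the `Process` class, the `frames` and `refcount` tables, and the helper names are all hypothetical, and the model ignores pages that were read-only before the fork.

```python
frames = {}      # frame number -> bytearray (simulated physical memory)
refcount = {}    # frame number -> number of page tables that map it

class Process:
    def __init__(self, page_table):
        self.page_table = page_table   # vpn -> [frame number, writable?]

def alloc_frame(data):
    frame = len(frames)
    frames[frame] = bytearray(data)
    refcount[frame] = 1
    return frame

def fork(parent):
    child_table = {}
    for vpn, entry in parent.page_table.items():
        entry[1] = False                        # parent: mark read-only
        child_table[vpn] = [entry[0], False]    # child: same frame, read-only
        refcount[entry[0]] += 1
    return Process(child_table)                 # only the page table was copied

def write(proc, vpn, offset, value):
    entry = proc.page_table[vpn]
    if not entry[1]:                            # write fault on a read-only PTE
        old = entry[0]
        if refcount[old] > 1:                   # still shared: copy the page now
            refcount[old] -= 1
            entry[0] = alloc_frame(frames[old])
        entry[1] = True                         # private page is writable again
    frames[entry[0]][offset] = value

parent = Process({0: [alloc_frame(b"abcd"), True]})
child = fork(parent)
shared_after_fork = parent.page_table[0][0] == child.page_table[0][0]
write(child, 0, 0, ord("X"))                    # triggers the copy
print(bytes(frames[parent.page_table[0][0]]), bytes(frames[child.page_table[0][0]]))
```

Pages that neither process writes to are never copied, which is where the savings come from.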

### **User-Level Memory Mapping**

- Map len bytes starting at offset offset of the file specified by file descriptor fd, preferably at address start
  - start: may be 0 for "pick an address"
  - prot: PROT\_READ, PROT\_WRITE, PROT\_EXEC, ...
  - flags: MAP\_ANON, MAP\_PRIVATE, MAP\_SHARED, ...
- Return a pointer to start of mapped area (may not be start)
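The parameters described above belong to the POSIX call `void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset)`. Python's `mmap` module wraps the same system call on Unix, so it can serve as a quick sketch; note that it does not let the caller suggest a start address, and that the temporary file and its contents here are made up for the demonstration.

```python
import mmap
import os
import tempfile

# Create a small file to map (demo contents only).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello, virtual memory\n")
    path = f.name

fd = os.open(path, os.O_RDONLY)
# length=0 maps the whole file; the kernel chooses the address (start = 0 in C).
mm = mmap.mmap(fd, 0, flags=mmap.MAP_PRIVATE, prot=mmap.PROT_READ, offset=0)
first_word = mm[:5]        # touching the mapping pages the file contents in
mapped_len = len(mm)
mm.close()
os.close(fd)
os.remove(path)
print(first_word, mapped_len)
```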

#### **User-Level Memory Mapping**



### **Uses of mmap**

#### Reading big files

- Uses the paging mechanism to bring files into memory

#### Shared data structures

- When called with the MAP\_SHARED flag
  - Multiple processes have access to same region of memory (Risky!)

#### File-based data structures

- E.g., database
- When the region is unmapped, the file is updated via write-back
- Can implement load from file / update / write back to file
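The load / update / write-back pattern can be sketched with a shared mapping. As before, this uses Python's Unix `mmap` wrapper as a stand-in for the C call, and the temporary file and its `count=0` contents are hypothetical.

```python
import mmap
import os
import tempfile

# A tiny file standing in for a file-based data structure.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"count=0\n")
    path = f.name

with open(path, "r+b") as f:
    # MAP_SHARED: stores to the mapping are carried back to the file.
    mm = mmap.mmap(f.fileno(), 0, flags=mmap.MAP_SHARED,
                   prot=mmap.PROT_READ | mmap.PROT_WRITE)
    mm[6:7] = b"7"     # update the data in memory, no write() call involved
    mm.flush()         # ask for dirty pages to be written back now
    mm.close()         # unmap the region

with open(path, "rb") as f:
    contents = f.read()
os.remove(path)
print(contents)
```

With MAP\_PRIVATE instead, the store would go to a private copy-on-write page and the file would keep its original contents.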

### **Summary**

#### Programmer's view of virtual memory

- Each process has its own private linear address space
- Cannot be corrupted by other processes

#### System view of virtual memory

- Uses memory efficiently by caching virtual memory pages
  - Efficient only because of locality
- Simplifies memory management and programming
- Simplifies protection by providing a convenient interpositioning point to check permissions

#### Implemented via combination of hardware & software

- MMU, TLB, exception handling mechanisms part of hardware
- Page fault handlers, TLB management performed in software