# Paging

We want to be able to dynamically allocate memory, but also ensure that our physical memory doesn't become fragmented.

The basic idea is that we break physical memory into equal-size chunks, which we give out as needed.

We have two memory spaces to keep in mind: the Virtual Address Space and physical memory.

We break the Virtual Address Space (VAS) into **Virtual Pages**.

For example, a 64-byte address space:

<br>
<img src="images/07-paging-VAS.png" width="500">
<br>

We break physical memory into **Physical Frames**.

The OS maps virtual pages to physical frames.

`VP_0` has to be in some physical frame in memory.

Virtual pages and physical frames must be of the same size.

<br>
<img src="images/08-paging-phys-mem.png" width="500">
<br>

The OS stores the mapping from virtual pages to physical frames as an array referred to as the **Page Table**.

This is a per-process data structure.

The 0th index of the page table tells us where `VP_0` is stored in physical memory.

The 1st index of the page table tells us where `VP_1` is stored in physical memory.

The Page Table is stored in OS memory. There is a specific register which holds the memory address of the beginning of the Page Table array.

This register is called the **Page Table Base Register (PTBR)**.

# The Anatomy of a Virtual Address

Each VA must contain:

1) which virtual page this address is referencing: the Virtual Page Number (VPN)
2) how far into that virtual page we want to access: the OFFSET

<br>
<img src="images/09-paging-VA.png" width="300">
<br>

With a 64 byte address space, we need 6 bits to distinctly refer to every byte.

With 4 total pages, we need 2 bits to serve as the VPN, leaving 4 bits as the OFFSET.

For the above example, the VPN is 1, and we want the 5th byte in Virtual Page 1.

Another Example:

```
VA: 111010
VA: 11 | 1010
   VPN | OFFSET
VPN: 3
OFFSET: 10
```

We want the 10th byte in `VP_3`.
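The split above can be sketched with a shift and a mask. This is a minimal illustration for the 64-byte address space with four 16-byte pages; the names `split_va`, `OFFSET_BITS`, and `OFFSET_MASK` are made up for this example.

```python
OFFSET_BITS = 4                        # 16-byte pages need 4 OFFSET bits
OFFSET_MASK = (1 << OFFSET_BITS) - 1   # 0b1111


def split_va(va: int) -> tuple[int, int]:
    """Return (VPN, OFFSET) for a 6-bit virtual address."""
    vpn = va >> OFFSET_BITS     # the top 2 bits
    offset = va & OFFSET_MASK   # the low 4 bits
    return vpn, offset


print(split_va(0b111010))  # (3, 10)
```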

# Address Translation

When a process requests access to some Virtual Address, the OS:

1) breaks the VA into its parts: the VPN and OFFSET
2) looks up in the Page Table which Physical Frame this Virtual Page is stored on
3) calculates the physical address by concatenating the Physical Frame Number (PFN) with the OFFSET

<br>
<img src="images/10-VA-to-PA.png" width="400">
<br>

<br>
<img src="images/11-phys-mem.png" width="500">
<br>
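The three translation steps can be sketched in a few lines. The page table contents below are hypothetical (they are not read from the figures), and the sketch assumes the 16-byte pages from the 64-byte example; `translate` and `PAGE_TABLE` are illustrative names.

```python
OFFSET_BITS = 4
OFFSET_MASK = (1 << OFFSET_BITS) - 1

# Hypothetical page table: the index is the VPN, the value is the PFN.
PAGE_TABLE = [3, 7, 5, 2]


def translate(va: int) -> int:
    # 1) break the VA into its parts
    vpn = va >> OFFSET_BITS
    offset = va & OFFSET_MASK
    # 2) look up which physical frame holds this virtual page
    pfn = PAGE_TABLE[vpn]
    # 3) concatenate the PFN with the OFFSET
    return (pfn << OFFSET_BITS) | offset


pa = translate(0b010101)   # VPN 1, OFFSET 5
print(f"{pa:07b}")         # 1110101
```

Note that the OFFSET passes through unchanged; only the page number is replaced by a frame number.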

# The Page Table

The Page Table is an array of Page Table Entries (PTE).

Every PTE contains the Physical Frame Number that this page maps to and extra bits of information such as:

- is this frame valid (has it been allocated to this process yet)
- is it protected (kernel mode needed to access it)
- is it in RAM or has it been swapped to disk
- ...

<br>
<img src="images/12-PTE.png" width="500">
<br>
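One way to picture a PTE is as a single integer with the PFN in the high bits and the metadata in the low bits. The bit positions below are an assumption chosen for illustration; the layout in the figure and on real hardware may differ.

```python
# Hypothetical flag positions (not taken from any real architecture).
VALID = 1 << 0       # has the page been allocated to this process?
PROTECTED = 1 << 1   # is kernel mode needed to access it?
PRESENT = 1 << 2     # is it in RAM (as opposed to swapped to disk)?
PFN_SHIFT = 3        # the PFN sits above the three flag bits


def make_pte(pfn, valid=False, protected=False, present=False):
    """Pack a PFN and its flag bits into one integer."""
    pte = pfn << PFN_SHIFT
    if valid:
        pte |= VALID
    if protected:
        pte |= PROTECTED
    if present:
        pte |= PRESENT
    return pte


def pte_pfn(pte):
    """Extract the PFN from a packed PTE."""
    return pte >> PFN_SHIFT


pte = make_pte(7, valid=True, present=True)
print(pte_pfn(pte), bool(pte & VALID), bool(pte & PROTECTED))  # 7 True False
```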

To perform an address translation:

<br>
<img src="images/13-paging-mem-access.png" width="500">
<br>

Problem with the above:

It is TOO SLOW. Every memory access now requires two memory accesses: one to fetch the PTE from the page table and one to access the data itself.
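The cost can be made concrete with a toy model that counts reads. Everything here is an assumption for illustration: memory is a flat list of cells, each PTE is one cell holding only a PFN, and the page table sits at address 200. The PTE address is computed as PTBR plus the VPN times the PTE size.

```python
OFFSET_BITS = 4
OFFSET_MASK = (1 << OFFSET_BITS) - 1
PTE_SIZE = 1   # one cell per PTE in this toy model


class CountingMemory:
    """Flat memory that counts how many reads it serves."""

    def __init__(self, size):
        self.cells = [0] * size
        self.reads = 0

    def read(self, addr):
        self.reads += 1
        return self.cells[addr]


def load(mem, ptbr, va):
    vpn = va >> OFFSET_BITS
    offset = va & OFFSET_MASK
    pfn = mem.read(ptbr + vpn * PTE_SIZE)   # access 1: fetch the PTE
    pa = (pfn << OFFSET_BITS) | offset
    return mem.read(pa)                     # access 2: fetch the data


mem = CountingMemory(256)
mem.cells[200:204] = [3, 7, 5, 2]   # hypothetical page table
mem.cells[0b1110101] = 99           # data at frame 7, OFFSET 5
print(load(mem, 200, 0b010101), mem.reads)  # 99 2
```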

## Virtual Addresses in Practice

We're familiar with the idea that there are 32-bit and 64-bit computers.

32-bit computers have registers on the CPU that are 32 bits wide.

Since memory addresses are stored in registers, this gives an upper bound for how large address spaces can be.

On a 32-bit computer, virtual addresses are 32 bits wide.

### Standard Pages

A good size for a page is 4 KB: 4096 bytes.

4 KB hits the sweet spot between being able to allocate memory in large enough chunks, but not so large that a lot of space is wasted by having many pages with only a few bytes used in them.

We need 12 bits in order to access into pages of size 4096 because

$2^{12} = 4096$

$2^{12} = 2^2 * 2^{10} = 4 * 1024 = \text{4 KB}$

For reference:

$2^{10} = 1024 = \text{1 KB}$

$2^{20} = \text{1 MB}$

$2^{30} = \text{1 GB}$

If we have a 32 bit address and use 12 of them for the OFFSET, then that leaves 20 bits for the VPN.

```
|               VPN             |  OFFSET |
31                              11        0
```

With 20 bits for the VPN, we can have 

$2^{20} = \text{1048576 pages}$
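The same shift-and-mask idea applies to a 32-bit address with 4 KB pages. This is a small sketch; `split_va32` and the sample address are made up for illustration.

```python
OFFSET_BITS = 12
PAGE_SIZE = 1 << OFFSET_BITS          # 4096 bytes
VPN_BITS = 32 - OFFSET_BITS           # 20 bits


def split_va32(va: int) -> tuple[int, int]:
    """Return (VPN, OFFSET) for a 32-bit virtual address with 4 KB pages."""
    return va >> OFFSET_BITS, va & (PAGE_SIZE - 1)


vpn, offset = split_va32(0x12345678)
print(hex(vpn), hex(offset))   # 0x12345 0x678
print(1 << VPN_BITS)           # 1048576
```

Because 12 bits is exactly three hex digits, the OFFSET is simply the last three hex digits of the address.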


### Size of the Page Table

The page table has an entry for every possible virtual page.

So every page table (we have one for every process) has 1048576 entries.

Every PTE is 32 bits, or 4 bytes.

The entire page table is:

$1048576 * \text{4 bytes} = \text{4194304 bytes}$

$= \text{4 MB}$

The OS stores one page table for every process.

If we have 1000 running processes, then we need 4000 MB (~4GB) of memory just to store the page tables.

Our page tables are too large!

In practice, most processes only use a fraction of their total addressable space. So why store PTEs for every possible translation when only a few (relatively speaking) are needed?

## 64-bit address spaces

A 32-bit address space allows a process to use up to 4 GB of memory.

If a 32-bit address space is the size of a tennis court, a 64-bit space is the size of Europe.

This is far more space than we need.

Many modern 64-bit computers use only 48 of the bits for virtual addresses.

A 48-bit address space is:

$2^{48} = 2^8 * 2^{40} = \text{256 TB}$

With 12 bits for the OFFSET, 36 bits remain for the VPN. Assuming 4-byte PTEs as before, the size of our page table for this address space would be:

$2^{36} \text{ PTEs} * \text{4 bytes}$

$=2^{36} * 2^2 = 2^{38} = 2^8 * 2^{30} =  \text{256 GB}$
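The page table sizes worked out for the 32-bit and 48-bit cases can be checked with a short calculation. The function name and defaults are illustrative; it assumes 4 KB pages and 4-byte PTEs, as in the text.

```python
MB = 1 << 20
GB = 1 << 30


def linear_page_table_bytes(va_bits, offset_bits=12, pte_bytes=4):
    """Size of a linear page table: one PTE per possible virtual page."""
    num_ptes = 1 << (va_bits - offset_bits)
    return num_ptes * pte_bytes


print(linear_page_table_bytes(32) // MB)   # 4
print(linear_page_table_bytes(48) // GB)   # 256
```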



# The Translation Lookaside Buffer (TLB)

The TLB is a cache directly on the CPU.

It stores commonly accessed address translation information so that we don't have to go to the page table (in RAM) for every translation.

TLB entries contain a VPN and its corresponding PFN along with a couple of bits of meta information (valid, protection, etc.) and a process identifier.

The process identifier is not the PID. It is an Address Space IDentifier (ASID).

The TLB is small, on the order of 128 or 256 entries.

<br>
<img src="images/14-TLB.png" width="500">
<br>
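A TLB lookup can be sketched as a small map keyed by (ASID, VPN). This is a simplified model under several assumptions: a Python dict stands in for the hardware cache, there is no capacity limit or eviction, the metadata bits are omitted, and the page table is a plain dict from VPN to PFN. All names are illustrative.

```python
OFFSET_BITS = 12
OFFSET_MASK = (1 << OFFSET_BITS) - 1


class TLB:
    def __init__(self):
        self.entries = {}   # (asid, vpn) -> pfn

    def lookup(self, asid, vpn):
        return self.entries.get((asid, vpn))

    def insert(self, asid, vpn, pfn):
        self.entries[(asid, vpn)] = pfn


def translate(tlb, page_table, asid, va):
    """Return (physical address, was it a TLB hit?)."""
    vpn = va >> OFFSET_BITS
    offset = va & OFFSET_MASK
    pfn = tlb.lookup(asid, vpn)
    hit = pfn is not None
    if not hit:
        pfn = page_table[vpn]        # slow path: walk the page table
        tlb.insert(asid, vpn, pfn)   # cache the translation
    return (pfn << OFFSET_BITS) | offset, hit


tlb = TLB()
page_table = {5: 42}   # hypothetical mapping: VPN 5 -> PFN 42
print(translate(tlb, page_table, asid=1, va=0x5010))  # (172048, False)
print(translate(tlb, page_table, asid=1, va=0x5014))  # (172052, True)
```

Keying on the ASID as well as the VPN means one process cannot accidentally use another process's cached translation for the same VPN.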

## TLB Speedup

With a TLB, most memory accesses can be resolved by referring to it instead of the Page Table.

Most processes are frequently accessing the same pages.

After any of these pages is accessed the first time, all subsequent accesses can be handled through the TLB.

<br>
<img src="images/15-array.png" width="350">
<br>

In practice a page is 4096 bytes. If we have an array of ints, every value requires 4 bytes.

We can fit 1024 entries of this array in a single page.

When we access the first one, this page is added to the TLB and this access is slow because we had to go to the Page Table.

For the next 1023 accesses, we can get the translation info from the TLB and these accesses are fast.
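The hit count for this array walk can be simulated directly. The sketch assumes the array starts at a page-aligned address (the base address `0x10000` is made up) and that the TLB never evicts anything during the loop.

```python
PAGE_SIZE = 4096
INT_SIZE = 4


def count_tlb_hits(base_va, num_ints):
    """Walk an int array and count TLB hits and misses, page by page."""
    cached_pages = set()   # VPNs currently in the (unbounded) TLB
    hits = misses = 0
    for i in range(num_ints):
        vpn = (base_va + i * INT_SIZE) // PAGE_SIZE
        if vpn in cached_pages:
            hits += 1
        else:
            misses += 1
            cached_pages.add(vpn)
    return hits, misses


print(count_tlb_hits(0x10000, 1024))  # (1023, 1)
```

If the array did not start on a page boundary, it would straddle two pages and the walk would see two misses instead of one.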

When the TLB gets full, we will have to evict entries. We will discuss eviction policies when we get to page swapping.

# Smaller Page Tables

We know that our page tables are too large. The VAST majority of a linear page table contains invalid entries: pages that have not been allocated yet. Why store that information? Why not store only what we are actually using?

<br>
<img src="images/16-VAS-PM.png" width="500">
<br>

<br>
<img src="images/18-linear-PT.png" width="250">
<br>

The middle of the address space and the middle of the page table are unused.

Let's only store the parts that are used.

We can break the page table into equal-sized chunks and only store those needed.

<br>
<img src="images/19-multilevel-PT.png" width="500">
<br>
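The chunking idea can be sketched as follows. This is only an illustration of "store the chunks that are in use": the chunk size of 16 entries is arbitrary, and a Python dict stands in for whatever structure tracks which chunks exist. The class and method names are made up.

```python
CHUNK_ENTRIES = 16   # hypothetical number of PTEs per chunk


class ChunkedPageTable:
    def __init__(self):
        self.chunks = {}   # chunk index -> list of PFNs (None = invalid)

    def map(self, vpn, pfn):
        chunk_idx, slot = divmod(vpn, CHUNK_ENTRIES)
        # Allocate a chunk only the first time something in it is mapped.
        chunk = self.chunks.setdefault(chunk_idx, [None] * CHUNK_ENTRIES)
        chunk[slot] = pfn

    def lookup(self, vpn):
        chunk_idx, slot = divmod(vpn, CHUNK_ENTRIES)
        chunk = self.chunks.get(chunk_idx)
        return None if chunk is None else chunk[slot]

    def stored_entries(self):
        return len(self.chunks) * CHUNK_ENTRIES


pt = ChunkedPageTable()
pt.map(0, 10)          # bottom of the address space
pt.map(1, 11)
pt.map(1048575, 12)    # top of a 20-bit VPN space
print(pt.stored_entries())   # 32
print(pt.lookup(500))        # None
```

Three mappings at opposite ends of the address space cost 32 stored entries here, compared with 1048576 for the linear table.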