(cont:mm:pate-tables)=
# Page Tables

### Single-level Page Table

One of the simplest ways to structure a page table for mapping 20-bit
page numbers is as a simple array with $2^{20}$ (1048576) entries. With
this configuration, each virtual page has an entry, and the value in
that entry is the corresponding physical page number. This single-level
table is located in physical memory, and the MMU is given a pointer to
this table, which is stored in an MMU register. (On Intel-compatible
CPUs, the page table pointer is Control Register 3, or CR3.) This is
shown in [\[fig:vm:fig11\]](#fig:vm:fig11){reference-type="autoref"
reference="fig:vm:fig11"}, where we see the first two entries of the
mapping table. In addition to the translated page number, each entry
contains a *P* bit to indicate whether or not the entry is "present,"
i.e., valid. Unlike in C or Java, we can't use a special null pointer,
because 0 is a perfectly valid page number[^3].

![Single-level 32-bit page
table](../images/pb-figures/mm/virt-mem-pic11.png){#fig:vm:fig11 width="85%"}

In [\[lst:map:pcode\]](#lst:map:pcode){reference-type="autoref"
reference="lst:map:pcode"} we see pseudo-code for the translation
algorithm implemented in an MMU using a single-level table; VA and PA
stand for virtual and physical addresses, and VPN and PPN are the
virtual and physical page numbers.

``` {#lst:map:pcode float="" xleftmargin="12pt" framexleftmargin="12pt" caption="Address translation pseudo-code for single-level page table." label="lst:map:pcode"}
PA = translate(VA):
    VPN, offset = split[20 bits, 12 bits](VA)
    PTE = physical_read(CR3 + VPN*sizeof(PTE), sizeof(PTE))
    if not PTE.present:
        fault
    return (PTE.PPN << 12) + offset
```

Note that this means that every memory operation performed by the CPU
now requires two physical memory operations: one to translate the
virtual address, and a second one to perform the actual operation. If
this seems inefficient, it is, and it will get worse. However, we'll
shortly discuss the *translation lookaside buffer* or TLB, which caches
these translations to eliminate most of the overhead.
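
The translation pseudo-code can also be written as a small runnable simulation. The Python sketch below is an illustration only, not a description of real MMU hardware: it models physical memory as a dictionary of 32-bit words, and it assumes 4-byte entries with the present bit in bit 0 and the physical page number in the upper 20 bits. The names `phys_mem`, `PageFault`, and `translate`, along with the example addresses, are invented for this example.

```python
PAGE_SHIFT = 12                       # 4 KB pages: low 12 bits are the offset
OFFSET_MASK = (1 << PAGE_SHIFT) - 1
PTE_SIZE = 4                          # bytes per page table entry
PTE_PRESENT = 0x1                     # assumption: present bit is bit 0


class PageFault(Exception):
    """Raised when the entry for a virtual page is not present."""


def translate(va, cr3, phys_mem):
    """Translate a 32-bit virtual address using a single-level table.

    phys_mem maps the physical address of a 32-bit word to its value;
    words that are missing read as 0, i.e. "not present".
    """
    vpn = va >> PAGE_SHIFT
    offset = va & OFFSET_MASK
    pte = phys_mem.get(cr3 + vpn * PTE_SIZE, 0)   # the extra memory read
    if not pte & PTE_PRESENT:
        raise PageFault(hex(va))
    ppn = pte >> PAGE_SHIFT
    return (ppn << PAGE_SHIFT) | offset


# Hypothetical setup: table at physical address 0x00100000, with virtual
# page 0x00005 mapped to physical page 0x00042.
CR3 = 0x00100000
phys_mem = {CR3 + 0x00005 * PTE_SIZE: (0x00042 << PAGE_SHIFT) | PTE_PRESENT}

print(hex(translate(0x00005ABC, CR3, phys_mem)))   # 0x42abc
```

The single `phys_mem.get` call is the additional physical memory read that every translation incurs in this scheme.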

The single-level page table handles the problem of encoding the
virtual-to-physical page map, but causes another: it uses 4 MB of memory
per map. Years ago (e.g. in the mid-80s when the first Intel CPUs using
this paging structure were introduced) this was entirely out of the
question, as a single computer might have a total of 4 MB of memory or
less. Even today, it remains problematic. As an example, when these
notes were first written (2013), the most heavily-used machine in the
CCIS lab (login.ccs.neu.edu) had 4 GB of memory, and when I checked it
had 640 running processes. With 4 MB page tables and one table per
process, this would require 2.5 GB of memory just for page tables, or
most of the machine's memory. Worse yet, each table would require a
contiguous 4 MB region of memory, running into the same problem of
external fragmentation that paged address translation was supposed to
solve.
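
The sizes quoted above follow from simple arithmetic, assuming 4-byte page table entries (the entry size used by 32-bit Intel page tables):

$$
\begin{aligned}
2^{20}\ \text{entries} \times 4\ \text{bytes per entry} &= 2^{22}\ \text{bytes} = 4\ \text{MB per table} \\
640\ \text{processes} \times 4\ \text{MB per table} &= 2560\ \text{MB} = 2.5\ \text{GB}
\end{aligned}
$$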

### 2-Level Page Table Operation

In [\[fig:vm:pic13\]](#fig:vm:pic13){reference-type="autoref"
reference="fig:vm:pic13"} we see a page table constructed of 3 pages:
physical pages 00000 (the root directory), 00001, and 00003. Two data
pages are mapped: 00002 and 00004. Any entries not shown are assumed to
be null, i.e., the present bit is set to 0. As an example we use this
page table to translate a read from virtual address 0x0040102C.

![2-level Page Table Example](../images/pb-figures/mm/virt-mem-pic13.png){#fig:vm:pic13
width="90%"}

The steps involved in translating this address are:

1\) Split the address into page number and offset

![image](../images/pb-figures/mm/virt-mem-pic14.png){width="0.35\\columnwidth"}

2\) Split the page number into top and bottom 10 bits, giving `0x001`
and `0x001`. (In the figure, the top row is hex, the middle two rows are
binary, and the bottom row is hex again.)

![image](../images/pb-figures/mm/virt-mem-pic15.png){width="0.9\\columnwidth"}

3\) Read entry `[001]` from the top-level page directory (physical page
`00000`); note that `sizeof(entry)` is 4 bytes:

``` {xleftmargin="12pt" framexleftmargin="12pt"}
address = start [00000000] + index [001] * sizeof(entry)
read 4 bytes from physical address 00000004 (page 00000, offset 004)
result = [p=1, pgnum = 00001]
```

4\) Read entry `[001]` from the page table in physical page `00001`:

``` {xleftmargin="12pt" framexleftmargin="12pt"}
address = 00001000 + 001*4 = 00001004
read 4 bytes from physical address 00001004
result = [p=1, pgnum = 00002]
```

This means that the translated physical page number is `00002`. The
offset in the original virtual address is `02C`, so combining the two we
get the final physical address, `0000202C`.
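
The steps above can be reproduced with a short simulation. The following Python sketch is an illustration under stated assumptions: physical memory is modeled as a dictionary of 32-bit words, only the two entries used in this walk-through are filled in (the other entries in the figure are omitted), and the present bit is assumed to be bit 0 with the page number in bits 12-31. The helper names `split`, `read_entry`, and `translate` are invented for this example.

```python
PAGE_SHIFT = 12
PTE_SIZE = 4        # bytes per entry
PTE_PRESENT = 0x1   # assumption: present bit is bit 0, page number in bits 12-31


def split(va):
    """Return (directory index, table index, offset) for a 32-bit address."""
    return (va >> 22) & 0x3FF, (va >> 12) & 0x3FF, va & 0xFFF


def read_entry(phys_mem, page, index):
    """Read entry `index` from the table stored in physical page `page`."""
    pte = phys_mem.get((page << PAGE_SHIFT) + index * PTE_SIZE, 0)
    if not pte & PTE_PRESENT:
        raise LookupError("page fault: entry not present")
    return pte >> PAGE_SHIFT


def translate(va, root_page, phys_mem):
    top, bottom, offset = split(va)                         # steps 1 and 2
    table_page = read_entry(phys_mem, root_page, top)       # step 3
    data_page = read_entry(phys_mem, table_page, bottom)    # step 4
    return (data_page << PAGE_SHIFT) | offset


# Only the two entries touched by the walk-through are present here.
phys_mem = {
    0x00000004: (0x00001 << PAGE_SHIFT) | PTE_PRESENT,  # directory[001] -> page 00001
    0x00001004: (0x00002 << PAGE_SHIFT) | PTE_PRESENT,  # table[001]     -> page 00002
}

print(split(0x0040102C))                             # (1, 1, 44)
print(f"{translate(0x0040102C, 0, phys_mem):08X}")   # 0000202C
```

Each call to `read_entry` corresponds to one physical memory read, so a two-level walk costs two reads before the actual access takes place.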

#### Review questions

![Reference page table for review
questions](../images/pb-figures/mm/virt-mem-pic16.png){#fig:vm:review1 width="95%"}


::: gsidebarN
A famous computer science quote attributed to David Wheeler is: "All
problems in computer science can be solved by another level of
indirection," to which some add "except the performance problems caused
by indirection." A corollary to this is that most performance problems
can be solved by adding caching. How are these quotes applicable to
paged address translation?
:::

### Page Table Entries

The components of a 32-bit Intel page table entry are shown in
[\[fig:vm:pic17\]](#fig:vm:pic17){reference-type="autoref"
reference="fig:vm:pic17"}; for more information you may wish to refer to
<http://wiki.osdev.org/Paging>.

![32-bit Intel page table entry
(PTE).](../images/pb-figures/mm/virt-mem-pic17.png){#fig:vm:pic17 width="\\textwidth"}
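
As a rough illustration of how such an entry is put together, the Python sketch below packs and unpacks a 32-bit entry. The bit positions are taken from the 32-bit Intel layout (P in bit 0, W in bit 1, U in bit 2, page number in bits 12-31); only these fields are modeled, and the helper names `make_pte` and `decode_pte` are invented for this example.

```python
# Bit positions follow the 32-bit Intel layout; only P, W, and U are modeled.
PTE_P = 1 << 0    # present
PTE_W = 1 << 1    # writable
PTE_U = 1 << 2    # user-accessible
PPN_SHIFT = 12    # physical page number occupies bits 12-31


def make_pte(ppn, present=True, writable=False, user=False):
    """Pack a physical page number and permission bits into a 32-bit PTE."""
    assert 0 <= ppn < (1 << 20), "page number must fit in 20 bits"
    flags = 0
    if present:
        flags |= PTE_P
    if writable:
        flags |= PTE_W
    if user:
        flags |= PTE_U
    return (ppn << PPN_SHIFT) | flags


def decode_pte(pte):
    """Unpack a PTE into a dictionary of the fields modeled here."""
    return {
        "ppn": pte >> PPN_SHIFT,
        "present": bool(pte & PTE_P),
        "writable": bool(pte & PTE_W),
        "user": bool(pte & PTE_U),
    }


pte = make_pte(0x00002, writable=True, user=True)
print(f"{pte:08X}")       # 00002007
print(decode_pte(pte))
```

Because pages are 4 KB aligned, the low 12 bits of a page's physical address are always zero, which is what leaves room for the flag bits in the same 32-bit word.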

### Page Permissions - P, W, and U bits

Page tables allow different permissions to be applied to memory at a
per-page level of granularity.

**P = 0/1** - Present. If the P bit is zero, the entry is ignored
entirely by the MMU, thus preventing any form of access to the
corresponding virtual page.

**W = 0/1** - Write permission. If the W bit is zero, then read accesses
to this page will be allowed, but any attempt to write will cause a
fault. By setting the W bit to zero, pages that should not be modified
(e.g., program instructions) can be protected. Since
correctly-functioning programs in most languages do not change the code
generated by the compiler, any attempt to write to such a page must be a
bug, and stopping the program earlier rather than later may reduce the
amount of damage caused.

**U = 0/1** - User permission. If the U bit is zero, then accesses to
this page will fail unless the CPU is running in supervisor mode.
Typically the OS kernel will "live" in a portion of the same address
space as the current process, but will hide its code and data structures
from access by user processes by setting U=0 on the OS-only mappings.
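
The three checks can be summarized in a few lines of code. This Python sketch follows the simplified rules described above and is not a model of everything a real CPU does (actual processors have additional controls that are not shown here). The bit positions are those of the 32-bit Intel layout, and `access_allowed` and the example entries are hypothetical names and values chosen for illustration.

```python
PTE_P, PTE_W, PTE_U = 0x1, 0x2, 0x4   # assumed bit positions (32-bit Intel layout)


def access_allowed(pte, is_write, is_user):
    """Return True if the access passes the P, W, and U checks."""
    if not pte & PTE_P:
        return False          # not present: no access of any kind
    if is_write and not pte & PTE_W:
        return False          # read-only page
    if is_user and not pte & PTE_U:
        return False          # supervisor-only page
    return True


# Hypothetical entries: a read-only user page (e.g. program code) and a
# writable page visible only in supervisor mode (e.g. kernel data).
code_page = 0x00001000 | PTE_P | PTE_U
kernel_page = 0x00003000 | PTE_P | PTE_W

print(access_allowed(code_page, is_write=False, is_user=True))    # True
print(access_allowed(code_page, is_write=True, is_user=True))     # False
print(access_allowed(kernel_page, is_write=False, is_user=True))  # False
print(access_allowed(kernel_page, is_write=True, is_user=False))  # True
```

In hardware, a failed check raises a page fault that the OS handles; here the function simply returns `False`.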