%% This file is a portion of the source for Revised Edition 1.1 of
%% Operating Systems and Middleware: Supporting Controlled
%% Interaction, Copyright 2011 by Max Hailperin. This work is
%% licensed under the Creative Commons Attribution-ShareAlike 3.0
%% Unported License. To view a copy of this license, visit
%% http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to
%% Creative Commons, 171 Second Street, Suite 300, San Francisco,
%% California, 94105, USA.
\chapter{Virtual Memory}\label{vm-chapter}
\section{Introduction}
\label{vm-intro-section}
In Chapters \ref{synchronization-chapter} and \ref{transactions-chapter}, you have seen that synchronization
(including transactions) can control the interactions between
concurrent threads. For example, synchronization can ensure that only
one thread at a time updates the memory locations holding a shared
data structure. Now you will learn about another
form of control, which can provide each thread with its own private storage, rather than regulating the threads' access to shared
storage.
In this chapter, I will present a mechanism, \vocab{virtual memory},
that can be used to provide threads with private storage, thereby
controlling their interaction. However, virtual memory turns out to
be a very general-purpose abstraction, useful for many goals other
than just giving threads some privacy. Therefore, after using this
introductory section to present the basic concept of virtual memory,
I will devote Section~\ref{vm-uses-section} to surveying the applications of virtual
memory. Only afterward will I turn to the details of mechanisms and
policies; you'll find the related discussions in Sections \ref{vm-reps} and \ref{vm-policies-section}. The
chapter concludes with the standard features: security issues in
Section~\ref{vm-security-section}, then exercises, programming and
exploration projects, and notes.
The essence of virtual memory is to decouple the addresses that
running programs use to identify objects from the addresses that the
memory uses to identify storage locations. The former are known as
\foldvocabes{virtual}{address} and the latter as \foldvocabes{physical}{address}. As background for understanding this distinction, consider first
a highly simplified diagram of a computer system, without virtual
memory, as shown in Figure~\ref{PM-diagram}.
\begin{figure}
\centerline{\includegraphics{hail_f0601}}
\iffalse
\centerline{\begin{graph}(252,72)
\graphnodesize{0}
\graphnodecolour{1}
\grapharrowlength{5}
\rectnode{P}[72,72](36,36)
\rectnode{M}[72,72](216,36)
\squarenode{PA}(72,54)
\squarenode{PD}(72,18)
\squarenode{MA}(180,54)
\squarenode{MD}(180,18)
\nodetext{P}{Processor}
\nodetext{M}{Memory}
\diredge{PA}{MA}
\diredge{MD}{PD}
\diredge{PD}{MD}
\bowtext{PA}{MA}{.06}{Address}
\bowtext{PD}{MD}{.06}{Data}
\end{graph}}
\fi
\caption{In a system without virtual memory, the processor sends addresses directly
to the memory.}
\label{PM-diagram}
\end{figure}
In this system, the processor sends an address to the memory whenever
it wants to store a value into memory or load a value from memory.
The data being loaded or stored is also transferred in the appropriate
direction. Each load operation retrieves the most recent value stored
with the specified address. Even though the processor and
memory are using a common set of addresses to communicate, the role
played by addresses is somewhat different from the perspective of the
processor than from the perspective of the memory, as I will now
explain.
From the perspective of the processor (and the program the processor
is executing), addresses are a way of differentiating stored objects
from one another. If the processor stores more than one value, and
then wishes to retrieve one of those values, it needs to specify which
one should be retrieved. Hence, it uses addresses essentially as
names. Just as an executive might tell a clerk to ``file this under
`widget suppliers'\,'' and then later ask the clerk to ``get me that
document we filed under `widget suppliers','' the processor tells the
memory to store a value with a particular address and then later
loads from that address. Addresses used by executing programs to refer to objects are known as
\foldvocabes{virtual}{address}.
Of course, virtual addresses are not arbitrary names; each virtual address is a
number. The processor may make use of this to give a group of related
objects related names, so that it can easily compute the name of any
object in the group. The simplest example of this kind of grouping of
related objects is an array. All the array elements are stored at
consecutive virtual addresses. That allows the virtual address of any individual
element to be computed from the base virtual address of the array and the
element's position within the array.
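This address arithmetic can be made concrete with a small sketch; the base address and element size below are invented for illustration:

```python
def element_address(base, index, element_size):
    """Compute the virtual address of array[index], given that the array's
    elements are stored at consecutive virtual addresses starting at base."""
    return base + index * element_size

# A hypothetical array of 8-byte elements whose base virtual address is 0x1000:
addr = element_address(0x1000, 3, 8)
print(hex(addr))  # 0x1018
```

The processor performs exactly this computation (base plus scaled index) when a program indexes into an array, which is why consecutive placement makes element names so easy to derive.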
From the memory's perspective, addresses are not identifying names for
objects, but rather are spatial locations of storage cells. The
memory uses addresses to determine which cells to steer the data into
or out of. Addresses used by the memory
to specify storage locations are known as
\foldvocabes{physical}{address}. Figure~\ref{scan-6-1} shows the
processor's and memory's views of addresses in a system like
that shown in Figure~\ref{PM-diagram}, where the physical addresses come directly from the virtual addresses, and so are numerically equal.
\begin{figure}
\centerline{\includegraphics{hail_f0602}}
%\centerline{\epsfbox{scan-6-1.eps}}
\caption{In a system without virtual memory, virtual addresses (the
processor's names for objects) equal physical addresses (the
memory's storage locations).}
\label{scan-6-1}
\end{figure}
The difference between the processor's and memory's perspectives becomes apparent when you consider that
the processor may be dividing its time between multiple programs.
As I will explain briefly in Section~\ref{vm-private-storage-section}
and in detail in Chapter~\ref{processes-chapter}, the operating system
represents running programs as \emph{processes}.
Sometimes the processes will each need a private object,
yet the natural name to use will be the same in more than one process.
Figure~\ref{scan-6-2} shows how this necessitates using different
addresses in the processor and the memory. That is, virtual addresses
can no longer be equal to physical addresses.
\begin{figure}
\centerline{\includegraphics{hail_f0603}}
%\centerline{\epsfbox{scan-6-2.eps}}
\caption{When two processes each use the same virtual
addresses as names for their own objects, the virtual addresses
cannot equal the physical addresses, because each process's objects
need to be stored separately.}
\label{scan-6-2}
\end{figure}
To make this work, general-purpose computers are
structured as shown in Figure~\ref{PMM-diagram}.
\begin{figure}
\centerline{\includegraphics{hail_f0604}}
\iffalse
\centerline{\begin{graph}(312,83)
\graphnodesize{0}
\graphnodecolour{1}
\grapharrowlength{5}
\rectnode{P}[72,72](36,36)
\rectnode{M}[72,72](276,36)
\rectnode{MMU}[36,36](156,54)
\squarenode{PA}(72,54)
\squarenode{PD}(72,18)
\squarenode{MA}(240,54)
\squarenode{MD}(240,18)
\squarenode{MMUV}(138,54)
\squarenode{MMUP}(174,54)
\nodetext{P}{Processor}
\nodetext{M}{Memory}
\nodetext{MMU}{MMU}
\diredge{PA}{MMUV}
\diredge{MMUP}{MA}
\diredge{MD}{PD}
\diredge{PD}{MD}
\bowtext{PA}{MMUV}{.22}{\begin{tabular}{c}Virtual\\address\end{tabular}}
\bowtext{MMUP}{MA}{.22}{\begin{tabular}{c}Physical\\address\end{tabular}}
\bowtext{PD}{MD}{.04}{data}
\end{graph}}
\fi
\caption{The memory management unit (MMU) translates the processor's virtual
addresses into the memory's physical addresses.}
\label{PMM-diagram}
\end{figure}
Program execution within the processor works entirely in terms of
virtual addresses. However, when a load or store operation is
executed, the processor sends the virtual address to an intermediary,
the \vocab{memory management unit} (\vocab{MMU}). The MMU translates
the virtual address into a corresponding physical address, which it
sends to the memory.
In Figure~\ref{scan-6-2}, each process uses the virtual address 0 as a
name for its own triangle. This is a simplified model of how more complicated
objects are referenced by real processes.
Consider next a more realistic example of why each process might use the same
virtual addresses for its own objects. Suppose several copies of the
same spreadsheet program are running. Each copy will naturally want
to refer to ``the spreadsheet,'' but it should be a different
spreadsheet object in each process. Even if each process uses a numerical name
(that is, a virtual address), it would be natural for all running instances of
the spreadsheet program to use the same address; after all, they are
running the same code. Yet from the memory's perspective, the
different processes' objects need to be stored separately---hence, at
different physical addresses.
The same need for private names arises, if not quite so strongly, even
if the concurrent processes are running different programs. Although in
principle each application program could use different names (that is,
virtual addresses) from all other programs, this requires a rather unwieldy
amount of coordination.
Even for shared objects, addresses as names behave somewhat
differently from addresses as locations. Suppose two processes are
communicating via a shared bounded buffer; one is the producer, while
the other is the consumer. From the perspective of one process, the
buffer is the ``output channel,'' whereas for the other process, it is
the ``input channel.'' Each process may have its own name for the
object; yet, the memory still needs to store the object in one location.
This holds true as well if the names used by the processes are numerical
virtual addresses.
Thus, once again, virtual addresses and physical
addresses should not be forced to be equal; it should be possible for
two processes to use the same virtual address to refer to different
physical addresses or to use different virtual addresses to refer to the
same physical address.
You have seen that the MMU maps virtual addresses to physical
addresses. However, I have not yet discussed the nature of this mapping. So
far as anything up to this point goes, the mapping could be as
simple as computing each physical address as twice the virtual
address. However, that would not yield the very general mechanism
known as virtual memory. Instead, virtual memory must have the following
additional properties:
\begin{itemize}
\item
The function that maps virtual addresses to physical addresses is
represented by a table, rather than by a computational rule (such as
doubling). That way, the mapping can be much more general.
\item
However, to keep its size manageable, the table does not independently
list a physical address for each virtual address. Instead, the
virtual addresses are grouped together into blocks known as
\vocabs{page}, and the table shows for each page of virtual addresses
the corresponding \vocab{page frame} of physical addresses. I'll explain
this in greater detail in Section~\ref{vm-reps}.
In that same section, I also briefly
consider an alternative, \vocab{segmentation}.
\item
The contents of the table are controlled by the operating system.
This includes both incremental adjustments to the table (for purposes
you will see in Section~\ref{vm-uses-section}) and wholesale changes of the table
when switching between threads. The latter allows each thread to have
its own private virtual address space, in which case, the threads belong to
different processes, as explained in Section~\ref{vm-private-storage-section}.
\item
The table need not contain a physical address translation for every
page of virtual addresses; in effect, some entries can be left blank.
These undefined virtual addresses are illegal for the processor to
use. If the processor generates an illegal address, the MMU
interrupts the processor, transferring control to the operating
system. This interrupt is known as a \vocab{page fault}. This
mechanism serves not only to limit the usable addresses but also to
allow address translations to be inserted into the table only when
needed. By creating address translations in this demand-driven fashion, many applications of virtual memory arrange to move data only when necessary,
thereby improving performance.
\item
As a refinement of the notion of illegal addresses, some entries in
the table may be marked as legal for use, but only in specific ways. Most
commonly, it may be legal to read from some particular page of virtual
addresses but not to write into that page. The main purpose this
serves is to allow trouble-free sharing of memory between processes.
\end{itemize}
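The table-driven translation and page-fault behavior just listed can be sketched in a few lines of Python; the page size and the table's contents here are invented for illustration, and a real MMU does this work in hardware:

```python
PAGE_SIZE = 4096  # a common page size; actual sizes vary by architecture

# The operating system's table maps page numbers to page frame numbers.
# Pages absent from the table are illegal; accessing one causes a page fault.
page_table = {0: 7, 1: 3, 4: 0}

class PageFault(Exception):
    """Stands in for the interrupt the MMU delivers to the operating system."""

def translate(virtual_address):
    """Translate a virtual address to a physical address, as an MMU would:
    look up the page in the table and reattach the within-page offset."""
    page = virtual_address // PAGE_SIZE
    offset = virtual_address % PAGE_SIZE
    if page not in page_table:
        raise PageFault(f"no mapping for page {page}")
    return page_table[page] * PAGE_SIZE + offset

print(hex(translate(0x1005)))  # page 1 maps to frame 3, so 0x3005
```

Note that only the page number is translated; the offset within the page passes through unchanged, which is what lets one table entry cover a whole block of addresses.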
In summary, then, virtual memory consists of an
operating system--defined table of mappings from virtual addresses to
physical addresses (at the granularity of pages), with the opportunity
for intervention by the operating system on accesses that the table
shows to be illegal. You should be able to see that this is a very
flexible mechanism. The operating system can switch between multiple
views of the physical memory. Parts of physical memory may be
completely invisible in some views, because no virtual addresses map
to those physical addresses. Other parts may be visible in more than
one view, but appearing at different virtual addresses. Moreover, the
mappings between virtual and physical addresses need not be
established in advance. By marking pages as illegal to access, and
then making them available when an interrupt indicates that they are
first accessed, the operating system can provide mappings on a
demand-driven basis. In Section~\ref{vm-uses-section}, you will see several
uses to which this general mechanism can be put.
\section{Uses for Virtual Memory}\label{vm-uses-section}
This section contains a catalog of uses for virtual memory, one per subsection.
The applications of virtual memory enumerated are all
in everyday use in most general-purpose operating systems. A
comprehensive list would be much longer and would include some
applications that have thus far been limited to research systems or
other esoteric settings.
\subsection{Private Storage}\label{vm-private-storage-section}
The introductory section of this chapter has already explained that each computation
running on a computer may want to have its own private storage,
independent of the other computations that happen to be running on the
same computer. This goal of private storage can be further elaborated
into two subgoals:
\begin{itemize}
\item
Each computation should be able to use whatever virtual addresses it
finds most convenient for its objects, without needing to avoid using
the same address as some other computation.
\item
Each computation's objects should be protected from accidental (or
malicious) access by other computations.
\end{itemize}
Both subgoals---independent allocation and protection---can be
achieved by giving the computations their own virtual memory
mappings. This forms the core of the process concept.
A \vocab{process} is a group of one or more threads with an associated
protection context. I will introduce processes more fully in
Chapter~\ref{processes-chapter}. In
particular, you will learn that the phrase ``protection context'' is
intentionally broad, including such protection features as file access permissions,
which you will study in Chapters \ref{processes-chapter} and \ref{persistence-chapter}. For now, I will focus on
one particularly important part of a process's context: the mapping
of virtual addresses to physical addresses. In other words, for the
purposes of this chapter, a process is a group of threads that share a virtual
address space.
As I will describe in Chapter~\ref{processes-chapter}, the computer hardware and
operating system software collaborate to achieve protection by
preventing any software outside the operating system from updating the
MMU's address mapping. Thus, each process is restricted to accessing
only those physical memory locations that the operating system
has allocated as page frames for that process's pages. Assuming that
the operating system allocates different processes disjoint portions
of physical memory, the processes will have no ability to interfere with one
another. The physical memory areas for the processes need only
be disjoint at each moment in time; the processes can take
turns using the same physical memory.
This protection model, in which processes are given separate virtual
address spaces, is the mainstream approach today; for the purposes of
the present chapter, I will take it for granted. In Chapter~\ref{processes-chapter}, I will also explore alternatives that allow all processes to
share a single address space and yet remain protected from one
another.
\subsection{Controlled Sharing}
\label{controlled-sharing-subsection}
Although the norm is for processes to use disjoint storage, sometimes
the operating system will map a limited portion of memory into more
than one process's address space. This limited sharing may be a way
for the processes to communicate, or it may simply be a way to reduce
memory consumption and the time needed to initialize memory. Regardless of the motivation, the shared physical
memory can occupy a different range of virtual addresses in each
process. (If this flexibility is exercised, the shared memory should
not be used to store pointer-based structures, such as linked lists,
because pointers are represented as virtual addresses.)
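The pointer problem mentioned parenthetically above, and the usual workaround of storing offsets instead, can be sketched as follows; the base addresses are invented for illustration:

```python
# Suppose one shared physical region is mapped at different virtual
# addresses in two processes:
BASE_IN_A = 0x10000000
BASE_IN_B = 0x20000000

# If process A stored an absolute pointer to an object 0x40 bytes into
# the region, that virtual address would be meaningless to process B:
pointer_stored_by_A = BASE_IN_A + 0x40

# The workaround is to store the object's offset from the region's start;
# each process adds its own base address to recover a usable pointer.
offset = pointer_stored_by_A - BASE_IN_A
address_in_B = BASE_IN_B + offset
print(hex(address_in_B))  # 0x20000040
```

This is why pointer-based structures such as linked lists are unsafe in shared memory unless either the region is mapped at the same virtual address everywhere or the "pointers" are really offsets.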
The simplest example of memory-conserving sharing occurs when multiple
processes are running the same program. Normally, each process divides
its virtual address space into two regions:
\begin{itemize}
\item
A read-only region holds the machine language instructions of the
program, as well as any read-only data the program contains, such as
the character strings printed for error messages. This region is
conventionally called the \vocab{text} of the program.
\item
A read/write region holds the rest of the process's data. (Many
systems actually use two read/write regions, one for the stack and one
for other data.)
\end{itemize}
All processes running the same program can share the same text. The
operating system maps the text into each process's virtual memory address
space, with the protection bits in the MMU set to enforce read-only
access. That way, the shared text does not accidentally become a
communication channel.
Modern programs make use of large libraries of supporting
code. For example, there is a great deal of code related to graphical
user interfaces that can be shared among quite different programs,
such as a web browser and a spreadsheet. Therefore, operating systems
allow processes to share these libraries with read-only protection,
just as for main programs. Microsoft refers to shared libraries as
\vocabyies{dynamic-link librar} (\vocabs{DLL}).
Figure~\ref{scan-6-3} illustrates how processes can share in read-only
form both program text and the text of DLLs. In this figure,
processes A and B are running program 1, which uses DLLs 1 and 2.
Processes C and D are running program 2, which uses DLLs 1 and 3.
Each process is shown as encompassing the appropriate program text,
DLLs, and writable data area. In other words, each process encompasses
those areas mapped into its virtual address space.
\begin{figure}
\centerline{\includegraphics{hail_f0605}}
%\centerline{\epsfbox{scan-6-3.eps}}
\caption{The address space of a process includes the text of the
program the process is running, the text of any DLLs used by that
program, and a writable storage area for data.}
\label{scan-6-3}
\end{figure}
From the operating system's perspective, the simplest way to support
interprocess communication is to map some physical memory into two
processes' virtual address spaces with full read/write permissions.
Then the processes can communicate freely; each
writes into the shared memory and reads what the other one writes.
Figure~\ref{scan-6-4} illustrates this sharing of a writable
area of memory for communication.
\begin{figure}
\centerline{\includegraphics{hail_f0606}}
%\centerline{\epsfbox{scan-6-4.eps}}
\caption{Two processes can communicate by sharing a writable storage area.}
\label{scan-6-4}
\end{figure}
Simple as this may be for the operating system, it is anything but
simple for the application programmers. They need to include mutexes,
readers-writers locks, or some similar synchronization structure in
the shared memory, and they need to take scrupulous care to use those
locks. Otherwise, the communicating processes will exhibit races,
which are difficult to debug.
Therefore, some operating systems (such as Mac OS~X) use virtual
memory to support a more structured form of communication, known as
\vocab{message passing}, in which
one process writes a message into a block of memory and then asks the
operating system to send the message to the other process. The
receiving process seems to get a copy of the sent message. For small
messages, the operating system may literally copy the message from one
process's memory to the other's. For efficiency, though, large
messages are not actually copied. Instead, the operating system
updates the receiver's virtual memory map to point at the same
physical memory as the sender's message; thus, sender and receiver
both have access to the message, without it being copied. To maintain
the ease of debugging that comes from message passing, the operating
system marks the page as read-only for both the sender and the
receiver. Thus, they cannot engage in any nasty races. Because the
sender composes the message before invoking the operating system, the
read-only protection is not yet in place during message composition
and so does not stand in the way.
As a final refinement to message passing by read-only sharing, systems
such as Mac OS~X offer \vocab{copy on write} (\vocab{COW}). If either
process tries to write into the shared page, the MMU will use an
interrupt to transfer control to the operating system. The operating
system can then make a copy of the page, so that the sender and
receiver now have their own individual copies, which can be
writable. The operating system resumes the process that was trying
to write, allowing it to now succeed. This provides the complete
illusion that the page was copied at the time the message was sent, as
shown in Figure~\ref{scan-6-5}.
\begin{figure}
\centerline{\includegraphics{hail_f0607}}
%\centerline{\epsfbox{scan-6-5.eps}}
\caption{To use copy on write (COW) message passing, process~A
writes a message into part of its private memory (Step~1) and then
asks the operating system to map the memory containing the message into process~B's
address space as well (Step~2). Neither process has permission to write into
the shared area. If either violates this restriction, the operating
system copies the affected page, gives each process write
permission for its own copy, and allows the write operation to
proceed (Step~3). The net effect is the same as if the message were copied
when it was sent, but the copying is avoided if neither process writes into
the shared area.}
\label{scan-6-5}
\end{figure}
The advantage is that if the processes do not write into most message pages, most
of the copying is avoided.
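The copy-on-write logic the operating system applies in its page-fault handler can be sketched with a toy simulation; the "page" here is just a short list, and the aliasing of Python lists stands in for two page-table entries pointing at the same page frame:

```python
# Both processes' views of the message page initially alias one shared,
# read-only original (one physical page frame).
shared = [0] * 4  # a tiny stand-in for a page of memory
views = {"A": shared, "B": shared}

def write(process, index, value):
    """On the first write by a process, copy the page so the writer gets a
    private, writable copy -- the step the OS performs when the MMU reports
    an attempted write to the read-only shared page."""
    page = views[process]
    if page is shared:          # still aliasing the read-only original
        page = list(shared)     # copy the page into a fresh frame
        views[process] = page   # remap this process to its private copy
    page[index] = value

write("A", 0, 42)
print(views["A"][0], views["B"][0])  # 42 0
```

Process A's write lands only in its own copy, while B continues to see the original contents, which is exactly the illusion that the page was copied when the message was sent.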
\subsection{Flexible Memory Allocation}
The operating system needs to divide the computer's memory among
the various processes, as well as retain some for its own use.
At first glance, this memory allocation problem doesn't seem too
difficult. If one process needs 8 megabytes (MB) and another needs
10, the operating system could allocate the first 8~MB of
the memory (with the lowest physical addresses) to the first process
and the next 10~MB to the second. However, this kind of
contiguous allocation runs into two difficulties.
The first problem with contiguous allocation is that the amount of
memory that each process requires may grow and shrink as the program
runs. If the first process is immediately followed in memory by the
second process, what happens if the first process needs more space?
The second problem with contiguous allocation is that processes exit,
and new processes (with different sizes) are started. Suppose you have
512~MB of memory available and three processes running, of sizes
128~MB, 256~MB, and 128~MB. Now suppose the first and third processes
terminate, freeing up their 128-MB chunks of memory. Suppose a 256-MB
process now starts running. There is enough memory available, but not
all in one contiguous chunk, as illustrated in Figure~\ref{scan-6-6}.
\begin{figure}
\centerline{\includegraphics{hail_f0608}}
%\centerline{\epsfbox{scan-6-6.eps}}
\caption{Contiguous allocation leads to external fragmentation. In
this example, there is no contiguous 256-MB space available for
process~D, even though the termination of processes A and C has
freed up a total of 256~MB.}
\label{scan-6-6}
\end{figure}
This situation is known as \foldvocab{external}{fragmentation}. I
will discuss external fragmentation more carefully in
Chapter~\ref{persistence-chapter}, because contiguous allocation is
important for disk space. (I will also define the contrasting term,
internal fragmentation, in that same chapter.)
Because all modern general-purpose systems have virtual memory, these
contiguous allocation difficulties are a non-issue for main memory.
The operating system can allocate any available physical page frames
to a process, independent of where they are located in memory.
For example, the conundrum of
Figure~\ref{scan-6-6} could be solved as shown in
Figure~\ref{discontiguous}.
\begin{figure}
\centerline{\includegraphics{hail_f0609}}
%\centerline{\epsfbox{discontiguous.eps}}
\caption{With virtual memory, the physical memory allocated to a
process need not be contiguous, so process~D can be accommodated even
without sufficient memory in any one place.}
\label{discontiguous}
\end{figure}
In a more realistic setting, it would be
surprising for the pattern of physical memory allocation to display
even this degree of contiguity. However, the
virtual addresses can be contiguous even if the physical addresses
are scattered all over the memory.
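The scenario of Figures 6.8 and 6.9 can be checked with a small simulation, treating memory as a list with one entry per megabyte:

```python
def max_contiguous_free(memory_map):
    """Length of the largest run of free (None) megabytes."""
    best = run = 0
    for owner in memory_map:
        run = run + 1 if owner is None else 0
        best = max(best, run)
    return best

# 512 MB allocated contiguously to A (128 MB), B (256 MB), and C (128 MB):
memory = ["A"] * 128 + ["B"] * 256 + ["C"] * 128
# Processes A and C terminate, freeing their chunks:
memory = [None if owner in ("A", "C") else owner for owner in memory]

total_free = memory.count(None)
print(total_free, max_contiguous_free(memory))  # 256 128
```

With contiguous allocation, the 256-MB process D cannot run even though 256 MB is free in total, because the largest contiguous hole is only 128 MB; with virtual memory, any 256 MB worth of free page frames would do.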
\subsection{Sparse Address Spaces}
Just as virtual memory provides the operating system with flexibility
in allocating physical memory space, it provides each application
program (or process) with flexibility in allocating virtual address
space. A process can use whatever addresses make sense for its data
structures, even if there are large gaps between them. This provides
flexibility for the compiler and runtime environment, which assign
addresses to the data structures.
Suppose, for example, that a process has three data structures (S1, S2, and
S3) that it needs to store. Each needs to be allocated in a
contiguous range of addresses, and each needs to be able to grow at
its upper end. The picture might look like this, with addresses in megabytes:
\[\begin{tikzpicture}[scale=.03]
\draw (0,0) rectangle (40,20);
\draw (40,0) rectangle (120,20);
\draw (120,0) rectangle (160,20);
\draw (160,0) rectangle (240,20);
\draw (240,0) rectangle (280,20);
\draw (280,0) rectangle (360,20);
\draw (20,10) node{S1};
\draw (80,10) node{free};
\draw (140,10) node{S2};
\draw (200,10) node{free};
\draw (260,10) node{S3};
\draw (320,10) node{free};
\draw (0,-7) node{0};
\draw (40,-7) node{2};
\draw (120,-7) node{6};
\draw (160,-7) node{8};
\draw (240,-7) node{12};
\draw (280,-7) node{14};
\draw (360,-7) node{18};
\end{tikzpicture}\]
\iffalse
\[\begin{graph}(370,32)(-3,-12)
\graphlinecolour{0}
\fillednodesfalse
\rectnode{a}[40,20](20,10)
\rectnode{b}[80,20](80,10)
\rectnode{c}[40,20](140,10)
\rectnode{d}[80,20](200,10)
\rectnode{e}[40,20](260,10)
\rectnode{f}[80,20](320,10)
\autonodetext{a}{S1}
\autonodetext{b}{free}
\autonodetext{c}{S2}
\autonodetext{d}{free}
\autonodetext{e}{S3}
\autonodetext{f}{free}
\freetext(0,-7){0}
\freetext(40,-7){2}
\freetext(120,-7){6}
\freetext(160,-7){8}
\freetext(240,-7){12}
\freetext(280,-7){14}
\freetext(360,-7){18}
\end{graph}\]
\fi
In this example, only one third of the 18-MB address range is actually
occupied. If you wanted to allow each structure to grow more, you would
have to position them further apart and wind up with an even lower
percentage of occupancy. Many real processes span an address range of
several gigabytes without using anywhere near that much storage.
(Typically, this is done to allow one region to grow up from the
bottom of the address space and another to grow down from the top.)
In order to allow processes to use this kind of
\foldvocab{sparse}{address space} without wastefully occupying a
corresponding amount of physical memory, the operating system simply
doesn't provide physical address mappings for virtual addresses in the
gaps.
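To see how little bookkeeping a sparse address space requires, consider the
following sketch in Python (an illustration only: real page tables are
hardware-defined structures, and the 4-KiB page size here is an assumption
layered on the figure's megabyte-scale layout). A dictionary holds entries
only for pages within S1, S2, and S3; the gaps cost nothing:

```python
PAGE_SIZE = 4096          # assumed 4-KiB pages
MB = 1024 * 1024

page_table = {}           # virtual page number -> page frame number
next_frame = 0
# Only S1 (0-2 MB), S2 (6-8 MB), and S3 (12-14 MB) get entries.
for start, end in [(0, 2 * MB), (6 * MB, 8 * MB), (12 * MB, 14 * MB)]:
    for page in range(start // PAGE_SIZE, end // PAGE_SIZE):
        page_table[page] = next_frame   # frame choice is arbitrary here
        next_frame += 1

total_pages = 18 * MB // PAGE_SIZE      # the whole 18-MB virtual range
print(len(page_table), total_pages)     # 1536 4608: one third occupied
```

An access to an address in one of the gaps finds no entry at all, which is
exactly how the operating system avoids tying up physical memory for the
unused two thirds of the range.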
\subsection{Persistence}
Any general-purpose operating system must provide some way for users
to retain important data even if the system is shut down and
restarted. Most commonly, the data is kept in files, although other
kinds of persistent objects can be used. The persistent objects are
normally stored on disk. For example, as I write this book, I am
storing it in files on disk. That way, I don't have to retype the
whole book every time the computer is rebooted. I will consider
persistence in more detail in Chapter~\ref{persistence-chapter}; for
now, the only question is how it relates to virtual memory.
When a process needs to access a file (or other persistent object),
it can ask the operating system to map the file into its address
space. The operating system doesn't actually have to read the whole
file into memory. Instead, it can do the reading on a demand-driven
basis. Whenever the process accesses a particular page of the file
for the first time, the MMU signals a page fault. The operating
system can respond by reading that page of the file into memory,
updating the mapping information, and resuming the process. (For
efficiency reasons, the operating system might choose to fetch
additional pages at the same time, on the assumption they are likely
to be needed soon. I discuss this possibility in Section~\ref{fetch-policy}.)
If the process writes into any page that
is part of a mapped file, the operating system must remember to write
the page back to disk, in order to achieve persistence. For efficiency, the
operating system should not write back pages that have not been
modified since they were last written back or since they were read in.
This implies the operating system needs to know which pages have been
modified and hence are not up to date on disk. (These are called
\vocab{dirty} pages.)
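Many operating systems expose this file-mapping facility directly to
applications; for example, Python's \texttt{mmap} module wraps the
underlying system call. The sketch below (using a throwaway temporary file
as a stand-in for a persistent object) reads and writes the mapped file
through ordinary memory operations and then flushes the dirty page back to
disk:

```python
import mmap, os, tempfile

# Create a small temporary file to stand in for a persistent object.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello, virtual memory")
os.close(fd)

# Map the file into the address space: reads and writes become
# ordinary memory operations, paged in by the OS on demand.
with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), 0)  # length 0 maps the whole file
    first = m[0:5]                # read via memory access: b'hello'
    m[0:5] = b"HELLO"             # write makes the page dirty
    m.flush()                     # force the dirty page back to the file
    m.close()

with open(path, "rb") as f:
    result = f.read()             # b'HELLO, virtual memory'
os.unlink(path)
print(first, result)
```

The explicit \texttt{flush} is there only to make the write-back visible
immediately; the operating system would eventually write the dirty page out
on its own.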
One way to keep track of dirty pages, using only techniques I have
already discussed, is by initially marking all pages read-only. That
way, the MMU will generate an interrupt on the first attempt to write
into a clean page. The operating system can then make the page
writable, add it to a list of dirty pages, and allow the operation to
continue. When the operating system makes the page clean again, by
writing it to disk, it can again mark the page read-only.
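This protocol can be modeled in a few lines of Python (a simulation of the
operating system's bookkeeping, with the MMU's write-protection trap
replaced by an explicit check; all names here are made up for illustration):

```python
class Page:
    def __init__(self):
        self.writable = False   # start read-only, i.e., "clean"
        self.dirty = False

def write(page, dirty_list):
    """Simulate a store instruction hitting a page."""
    if not page.writable:
        # The MMU would trap here; the OS records the page as dirty
        # and makes it writable before retrying the store.
        page.dirty = True
        dirty_list.append(page)
        page.writable = True
    # ... the store itself then proceeds ...

def clean(page, dirty_list):
    """Simulate writing the page back to disk."""
    dirty_list.remove(page)
    page.dirty = False
    page.writable = False       # future writes will trap again

pages = [Page() for _ in range(4)]
dirty = []
write(pages[2], dirty)
write(pages[2], dirty)          # second write: no trap, no duplicate entry
print(len(dirty))               # 1
clean(pages[2], dirty)
print(pages[2].writable, pages[2].dirty)  # False False
```

Note that only the first write to a clean page incurs the trap; subsequent
writes run at full speed until the page is cleaned again.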
Because keeping track of dirty pages is such a common requirement and
would be rather inefficient using the approach just described, MMUs
generally provide a more direct approach. In this approach, the MMU keeps a
\vocab{dirty bit} for each page. Any write into the page causes the
hardware to set the dirty bit without needing operating system
intervention. The operating system can later read the dirty bits and
reset them. (The Intel Itanium architecture contains a compromise:
the operating system sets the dirty bits, but with some hardware
support. This provides the flexibility of the software approach
without incurring so large a performance cost.)
\subsection{Demand-Driven Program Loading}
One particularly important case in which a file gets mapped into memory is
running a program. Each executable program is ordinarily stored as a
file on disk. Conceptually, running a program consists of reading the
program into memory from disk and then jumping to the first
instruction.
However, many programs are huge and contain parts that may not always
be used. For example, error handling routines will get used only if
the corresponding errors occur. In addition, programs often support more
features and optional modes than any one user will ever need. Thus,
reading in the whole program is quite inefficient.
Even in the rare case that the whole program gets used, an interactive
user might prefer several short pauses for disk access to one long
one. In particular, reading in the whole program initially means that
the program will be slow to start, which is frustrating. By reading
in the program incrementally, the operating system can start it
quickly at the expense of brief pauses during operation. If each of
those pauses is only a few tens of milliseconds in duration and
occurs at the time of a user interaction, each will be below the
threshold of human perception.
In summary, operating system designers have two reasons to use virtual memory
techniques to read in each program on a demand-driven basis: in order to avoid
reading unused portions and in order to quickly start the program's execution. As with more general
persistent storage, each page fault causes the operating system to
read in more of the program.
One result of demand-driven program loading is that application programmers can make their
programs start up more quickly by grouping the code needed at startup together
on a few pages. Of
course, laying out the program text is really not a job for the
human application programmer, but for the compiler and linker.
Nonetheless, the programmer may be able to provide some guidance to these tools.
\subsection{Efficient Zero Filling}
For security reasons, as well as for ease of debugging, the operating
system should never let a process read from any memory location that
contains a value left behind by some other process that previously
used the memory. Thus, any memory not occupied by a persistent
object should be cleared out by the operating system before a new process accesses it.
Even this seemingly mundane job---filling a region of memory with
zeros---benefits from virtual memory. The operating system can fill
an arbitrarily large amount of virtual address space with zeros using
only a single zeroed-out page frame of physical memory. All it needs
to do is map all the virtual pages to the same physical page frame
and mark them as read-only.
In itself, this technique of sharing a page frame of zeros
doesn't address the situation where a process writes into one of its
zeroed pages. However, that situation can be handled using a variant of the COW technique
mentioned in Section~\ref{controlled-sharing-subsection}. When the MMU
interrupts the processor due to a write into the read-only page of
zeros, the operating system can update the mapping for that one page
to refer to a separate read/write page frame of zeros and then resume the
process.
If it followed the COW principle literally, the operating system would
copy the read-only page frame of zeros to produce the separate,
writable page frame of zeros. However, the operating system can run faster
by directly writing zeros into the new page frame without needing to
copy them out of the read-only page frame. In fact, there is no
need to do the zero filling only on demand. Instead, the operating
system can keep some spare page frames of zeros around, replenishing
the stock during idle time. That way, when a page fault occurs from
writing into a read-only page of zeros, the operating system can simply
adjust the address map to refer to one of the spare prezeroed page
frames and then make it writable.
When the operating system proactively fills spare page frames with zeros
during idle time, it should bypass the processor's normal cache memory
and write directly into main memory. Otherwise, zero filling can
seriously hurt performance by displacing valuable data from the cache.
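As a concrete (purely illustrative) sketch of this optimization, the
following Python model maps many virtual pages to one read-only zero frame
and, on a simulated write fault, hands out a spare prezeroed frame rather
than copying; the frame numbers and sizes are made up:

```python
PAGE_SIZE = 4096
ZERO_FRAME = 0                            # one shared read-only zero frame
frames = {ZERO_FRAME: bytes(PAGE_SIZE)}   # frame number -> contents
spare_zeroed = [1, 2]                     # prezeroed spares, kept in reserve
frames[1] = bytes(PAGE_SIZE)
frames[2] = bytes(PAGE_SIZE)

# Map many virtual pages onto the single zero frame, read-only.
page_table = {page: (ZERO_FRAME, False) for page in range(1000)}

def write_byte(page, offset, value):
    frame, writable = page_table[page]
    if not writable:
        # Fault: hand out a spare prezeroed frame instead of copying
        # the zeros out of the shared frame.
        frame = spare_zeroed.pop()
        page_table[page] = (frame, True)
    data = bytearray(frames[frame])
    data[offset] = value
    frames[frame] = bytes(data)

write_byte(7, 0, 0xFF)
print(page_table[7][0] != ZERO_FRAME)   # True: page 7 got its own frame
print(frames[ZERO_FRAME][0])            # 0: shared zero frame untouched
```

A thousand pages of zeros occupy one page frame until written; only pages
actually written consume additional frames.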
\subsection{Substituting Disk Storage for RAM}
\label{disk-for-RAM}
In explaining the application of virtual memory to persistence, I
showed that the operating system can read accessed pages into memory
from disk and can write dirty pages back out to disk. The reason for
doing so is that disk storage has different properties from main
semiconductor memory (RAM). In the case of persistence, the relevant
difference is that disk storage is nonvolatile; that is, it retains its
contents without power. However, disk differs from RAM in other
regards as well. In particular, it is a couple of orders of magnitude
cheaper per gigabyte. This motivates another use of virtual memory,
where the goal is to simulate having lots of RAM using less-expensive
disk space. Of course, disk is also five orders of magnitude slower
than RAM, so this approach is not without its pitfalls.
Many processes have long periods when they are not actively running.
For example, on a desktop system, a user may have several applications
in different windows---a word processor, a web browser, a mail reader,
a spreadsheet---but focus attention on only one of them for minutes or
hours at a time, leaving the others idle. Similarly, within a
process, there may be parts that remain inactive. A spreadsheet user
might look at the online help once, and then not again during several
days of spreadsheet use.
This phenomenon of inactivity provides an opportunity to capitalize on
inexpensive disk storage while still retaining most of the
performance of fast semiconductor memory. The computer system needs
to have enough RAM to hold the \vocab{working set}---the active
portions of all active processes. Otherwise, the performance will be
intolerably slow, because of disk accesses made on a routine basis. However,
the computer need not have enough RAM for the entire storage needs of
all the processes: the inactive portions can be shuffled off to disk,
to be paged back in when and if they again become active. This will
incur some delays for disk access when the mix of activity changes,
such as when a user sets the word processor aside and uses a spreadsheet
for the first time in days. However, once the new working set of
active pages is back in RAM, the computer will again be as responsive
as ever.
Much of the history of virtual memory focuses on this one application,
dating back to the invention of virtual memory in the early 1960s.
(At that time, the two memories were magnetic cores and magnetic drum,
rather than semiconductor RAM and magnetic disk.) Even though this
kind of paging to disk has become only one of many roles played by
virtual memory, I will still pay it considerable attention. In particular,
some of the most interesting policy questions arise only for this
application of virtual memory. When the operating system needs to
free up space in overcrowded RAM, it needs to guess which pages are
unlikely to be accessed soon. I will come back to this topic
(so-called replacement policies) after first considering other
questions of mechanism and policy that apply across the full spectrum
of virtual memory applications.
\section{Mechanisms for Virtual Memory}
\label{vm-reps}
Address mapping needs to be flexible, yet efficient. As I mentioned
in Section~\ref{vm-intro-section}, this means that the mapping function is stored in an explicit
table, but at the granularity of pages rather than individual bytes or
words. Many systems today use fixed-size pages, perhaps with a few
exceptions for the operating system itself or hardware access, though
research suggests that more general mixing of page sizes can be
beneficial. (As explained in the notes, Linux has moved in this direction.)
Typical page sizes have grown over the decades, for reasons you can
explore in Exercises \ref{page-size-execercise-1} and \ref{page-size-execercise-2}; today, the most common
is 4 kibibytes (KiB).\footnote{\label{kibibyte}Sizes related to main memory are usually
based on powers of 2, not powers of 10 used
elsewhere (for example, sizes of secondary storage devices). To
avoid confusion, units based on powers of 2 have been standardized
with their own prefixes such as ``kibi'' and ``mebi'' instead of
``kilo'' and ``mega''; see the Wikipedia article on
{\href{https://en.wikipedia.org/wiki/Kibibyte}{Kibibytes}}
for more details.}
Each page of virtual memory and each page frame of physical
memory is this size, and each starts at an address that is a multiple
of the page size. For example, with 4-KiB pages, the first page (or
page frame) has address 0, the next has address 4096, then 8192, and
so forth.
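The arithmetic for locating an address within this structure is simple: the
page number is the address divided by the page size, and the offset within
the page is the remainder. A quick Python illustration (assuming the 4-KiB
page size from the text):

```python
PAGE_SIZE = 4096  # 4 KiB

def split(addr):
    """Split an address into its page number and offset within the page."""
    return addr // PAGE_SIZE, addr % PAGE_SIZE

print(split(0))      # (0, 0) -- first byte of page 0
print(split(4096))   # (1, 0) -- first byte of page 1
print(split(8200))   # (2, 8) -- 8 bytes into page 2
```

Because page sizes are powers of 2, hardware performs this split by simply
treating the high-order address bits as the page number and the low-order
bits as the offset.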
Each page of virtual memory address space maps to an underlying page
frame of physical memory or to none. For example,
Figure~\ref{example-mapping} shows one possible mapping, on a system
with unrealistically few pages and page frames.
\begin{figure}
\centerline{\includegraphics{hail_f0610}}
\caption{In this example mapping of eight pages to four page frames,
page~0 has been allocated page frame~1, page~1 has been allocated
page frame~0, and page~6 has been allocated page frame~3. The Xs
indicate that no page frame is assigned to hold pages 2--5 or
page 7. Page frame~2 is unused.}
\label{example-mapping}
\end{figure}
The numbers next to the
boxes are page numbers and page frame numbers. The starting
addresses are these numbers multiplied by the page size.
At the top of this figure, you can see that page~0 is stored in page
frame~1. If the page size is 4~KiB, this means that virtual address 0
translates to physical address 4096, virtual address 100 translates to
physical address 4196, and so forth. The virtual address of the last
4-byte word in
page~0, 4092, translates to the physical address of the last word in page frame~1,
8188. Up until this point, all physical addresses were found by
adding 4096 to the virtual address. However, the very next virtual
address, 4096, translates to physical address 0, because it starts a
new page, which is mapped differently. Note also that page frame~2 is
currently not holding any page, and that pages 2--5 and page~7 have no
translation available. In
Exercise~\ref{page-6-translation-exercise}, you can gain experience working with this
translation of virtual addresses into physical addresses by
translating the addresses for page~6.
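The translation just described can be mirrored in a few lines of Python (a
model of what the MMU does, not of how it is implemented); the dictionary
below encodes the same mapping as Figure~\ref{example-mapping}:

```python
PAGE_SIZE = 4096

# Pages 0, 1, and 6 have page frames; the rest have no translation.
page_to_frame = {0: 1, 1: 0, 6: 3}

def translate(vaddr):
    page, offset = divmod(vaddr, PAGE_SIZE)
    if page not in page_to_frame:
        raise Exception("page fault at address %d" % vaddr)
    return page_to_frame[page] * PAGE_SIZE + offset

print(translate(0))      # 4096: page 0 maps to frame 1
print(translate(100))    # 4196
print(translate(4092))   # 8188: last word of page 0
print(translate(4096))   # 0:    page 1 maps to frame 0
```

An access to any page without a translation, such as page~2, raises the
model's stand-in for a page fault.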
Of course, a realistic computer system will have many more page frames
of physical memory and pages of virtual address space. Often there
are tens or hundreds of thousands of page frames and at least
hundreds of thousands of pages. As a result, operating system
designers need to think carefully about the data structure used to
store the table that maps virtual page numbers to physical page frame
numbers. Sections \ref{linear-page-tables-section} through \ref{hashed-page-tables-section} will be devoted to presenting three
alternative structures that are in current use for page tables:
linear, multilevel, and hashed. (Other alternatives that have fallen
out of favor, or have not yet been deployed, are briefly mentioned in
the end-of-chapter notes.)
Whatever data structure the operating system uses for its page table,
it will need to communicate the mapping information to the hardware's
MMU, which actually performs the mapping. The nature of this
software/hardware interface constrains the page table design and also
provides important context for comparing the performance of
alternative page table structures. Therefore, in
Section~\ref{vm-hw-sw-interface}, I will explain the two
forms the software/hardware interface can take.
Finally, Section~\ref{vm-segmentation-section} provides a brief look at segmentation, which was
historically important both as an alternative to paging and as an
adjunct to it.
\subsection{Software/Hardware Interface}\label{vm-hw-sw-interface}
You have seen that the operating system stores some form of page table
data structure in memory, showing which physical memory page frame (if
any) holds each virtual memory page. Although I will present
several possible page table structures shortly, the most important
design issue applies equally to all of them: the page table should
almost never be used.
Performance considerations explain why such an important data
structure should be nearly useless (in the literal sense). Every
single memory access performed by the processor generates a virtual
address that needs translation to a physical address. Naively, this
would mean that every single memory access from the processor requires
a lookup operation in the page table. Performing that lookup
operation would require at least one more memory access, even if the
page table were represented very efficiently. Thus, the number of
memory accesses would at least double: for each real access, there
would be one page table access. Because memory performance is often the
bottleneck in modern computer systems, this means that virtual memory
might well make programs run half as fast---unless the page table
lookup can be mostly avoided. Luckily, it can.
The virtual addresses accessed by realistic software are not random;
instead, they exhibit both \foldvocab{temporal}{locality} and \foldvocab{spatial}{locality}.
That is, addresses that are accessed once are likely to be accessed
again before long, and nearby addresses are also likely to be accessed
soon. Because a nearby address is likely to be on the same page, both
kinds of locality wind up creating temporal locality when considered
at the level of whole pages. If a page is accessed, chances are good
that the same page will be accessed again soon, whether for the same
address or another.
The MMU takes advantage of this locality by keeping a
quickly accessible copy of a modest number of recently used
virtual-to-physical translations. That is, it stores a limited number
of pairs, each with one page number and the corresponding page frame
number. This collection of pairs is called the \vocab{translation
lookaside buffer} (\vocab{TLB}). Most memory accesses will refer to
page numbers present in the TLB, and so the MMU will be able to
produce the corresponding page frame number without needing to access
the page table. This happy circumstance is known as a \foldvocab{TLB}{hit}; the
less fortunate case, where the TLB does not contain the needed
translation, is a \foldvocab{TLB}{miss}.
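To make hits and misses concrete, here is a toy Python model of a small,
fully associative TLB with least-recently-used replacement (real TLBs use
hardware lookup structures and a variety of replacement schemes; the
capacity and access pattern below are made up):

```python
from collections import OrderedDict

class TLB:
    """Tiny fully associative TLB with least-recently-used eviction."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()        # page number -> frame number
        self.hits = self.misses = 0

    def lookup(self, page, page_table):
        if page in self.entries:
            self.hits += 1
            self.entries.move_to_end(page)  # mark as recently used
            return self.entries[page]
        self.misses += 1
        frame = page_table[page]            # slow path: consult page table
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[page] = frame
        return frame

page_table = {p: p + 100 for p in range(64)}
tlb = TLB()
# Locality: repeated accesses to a few pages mostly hit.
for page in [0, 1, 0, 1, 2, 0, 1, 2, 3, 0]:
    tlb.lookup(page, page_table)
print(tlb.hits, tlb.misses)   # 6 4
```

Even in this tiny trace, locality turns most accesses into hits; realistic
workloads achieve hit rates far above this, which is what keeps the page
table out of the common case.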
The TLB is one of the most performance-critical components of a modern
microprocessor. In order for the system to have a fast clock cycle
time and perform well on small benchmarks, the TLB must be very
quickly accessible. In order for the system's performance not to fall
off sharply on larger workloads, the TLB must be reasonably large
(perhaps hundreds of entries), so that it can still prevent most page
table accesses. Unfortunately, these two goals are in conflict with
one another: chip designers know how to make lookup tables large or
fast, but not both. Coping as well as possible with this dilemma
requires cooperation from the designers of hardware, operating system,
and application software:
\begin{itemize}
\item
The hardware designers ameliorate the problem by including
{\em two} TLBs, one for instruction fetches and one for data loads
and stores. That way, these two categories of memory access don't
need to compete for the same TLB.
\item
The hardware designers may further ameliorate the problem by including
a hierarchy of TLBs, analogous to the cache hierarchy. A small, fast
level-one (L1) TLB makes most accesses fast, while a larger,
slower level-two (L2) TLB ensures that the page table won't need to be
accessed every time the L1 TLB misses. As an example, the AMD Opteron
microprocessor contains 40-entry L1 instruction and data TLBs,
and it also contains 512-entry L2 instruction and data TLBs.
\item
The hardware designers also give the operating system designers some
tools for reducing the demand for TLB entries. For example, if
different TLB entries can provide mappings for pages of varying sizes,
the operating system will be able to map large, contiguously allocated
structures with fewer TLB entries, while still retaining flexible
allocation for the rest of virtual memory.
\item
The operating system designers need to use tools such as variable page
size to reduce TLB entry consumption. At a minimum, even if all
application processes use small pages (4~KiB), the operating system
itself can use larger pages. Similarly, a video frame buffer of many
consecutive megabytes needn't be carved up into 4-KiB chunks. As a
secondary benefit, using larger pages can reduce the size of page tables.
In many cases, smaller page tables are also quicker to access.
\item
More fundamentally, all operating system design decisions need to be
made with an eye to how they will affect TLB pressure, because this is
such a critical performance factor. One obvious example is the normal
page size. Another, less obvious, example is the size of the
scheduler's time slices: switching processes frequently will increase
TLB pressure and thereby hurt performance, even if the TLB doesn't need
to be flushed at every process switch. (I will take up that latter
issue shortly.)
\item
The application programmers also have a role to play. Programs that