%% This file is a portion of the source for Revised Edition 1.1 of
%% Operating Systems and Middleware: Supporting Controlled
%% Interaction, Copyright 2011 by Max Hailperin. This work is
%% licensed under the Creative Commons Attribution-ShareAlike 3.0
%% Unported License. To view a copy of this license, visit
%% http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to
%% Creative Commons, 171 Second Street, Suite 300, San Francisco,
%% California, 94105, USA.
\chapter{Synchronization and Deadlocks}
\label{synchronization-chapter}
\section{Introduction}
In Chapters \ref{threads-chapter} and \ref{scheduling-chapter}, you have seen how an operating system
can support concurrent threads of execution. Now the time has come
to consider how the system supports controlled interaction
between those threads. Because threads running at the same time on
the same computer can inherently interact by reading and writing a
common set of memory locations, the hard part is providing control.
In particular, this chapter will examine control over the
relative timing of execution steps that take place in differing
threads.
Recall that the scheduler is granted considerable authority to
temporarily preempt the execution of a thread and dispatch another
thread. The scheduler may do so in response to unpredictable external
events, such as how long an I/O request takes to complete. Therefore,
the computational steps taken by two (or more) threads will be
interleaved in a quite unpredictable manner, unless the programmer has
taken explicit measures to control the order of events. Those control
measures are known as \vocab{synchronization}.
The usual way for synchronization to control event ordering is by causing
one thread to wait for another.
In Section~\ref{races-section}, I will provide a more detailed case for why
synchronization is needed by describing the problems that can occur
when interacting threads are not properly synchronized. The
uncontrolled interactions are called races. By examining some
typical races, I will illustrate the need for one particular form of
synchronization, mutual exclusion. Mutual exclusion ensures that only
one
thread at a time can operate on a shared data structure or other
resource. Section~\ref{mutexes-and-monitors-section} presents two
closely related ways mutual exclusion can be obtained. They are known as mutexes and monitors.
After covering mutual exclusion, I will turn to other, more general
synchronization challenges and to mechanisms that can address those
challenges. To take one example, you may want to ensure that some
memory locations are read after they have been filled with useful
values, rather than before. I devote Section~\ref{other-synchronization-problems-section} to enumerating
several of the most common synchronization patterns other than mutual
exclusion. Afterward, I devote
Sections \ref{condition-variables-section} and \ref{semaphores-section} to two popular
mechanisms used to handle these situations. One, condition variables,
is an important extension to monitors; the combination of monitors
with condition variables allows many situations to be cleanly handled.
The other, semaphores, is an old favorite because it provides a
single, simple mechanism that in principle suffices for all
synchronization problems. However, semaphores can be hard to
understand and use correctly.
Synchronization solves the problem of races, but it can create a new
problem of its own: deadlock. Recall that synchronization typically
involves making threads wait; for example, in mutual exclusion, a
thread may need to wait its turn in order to enforce the rule of one
at a time. Deadlock results when a cycle of waiting threads forms;
for example, thread A waits for thread B, which happens to be waiting
for thread A, as shown in Figure~\ref{scan-4-2}.
\begin{figure}
\centerline{\includegraphics{hail_f0401}}
%\centerline{\epsfbox{scan-4-2.eps}}
\caption{Deadlock results when threads wait for one another in a
complete cycle. In this simple example, thread A is waiting for
thread B, which is waiting for thread A.}
\label{scan-4-2}
\end{figure}
Because this pathology results from waiting, I
will address it and three of the most practical cures in Section~\ref{deadlock-section}, after
completing the study of waiting-based means of synchronization.
Waiting also interacts with scheduling (the topic of
Chapter~\ref{scheduling-chapter}) in some interesting ways. In particular, unless special
precautions are taken, synchronization mechanisms can subvert priority
scheduling, allowing a low-priority thread to run while a
high-priority thread waits. Therefore, in
Section~\ref{synchronization-and-scheduling-section}, I will briefly consider the
interactions between synchronization and scheduling, as well as what can be
done to tame them.
Although Sections \ref{deadlock-section} and \ref{synchronization-and-scheduling-section}
address the problems of deadlock and unwanted scheduling interactions, the root
cause of these problems is also worth considering.
The underlying problem is that
one thread can block the progress of another thread, which is undesirable
even in the absence of such dramatic symptoms as deadlock. After all, a blocked thread can't take
advantage of available processing power to produce useful results.
Alternative, nonblocking synchronization techniques have become increasingly important
as the number of processor cores in a typical computer system has grown.
Section~\ref{nonblocking-synchronization-section} briefly addresses this topic,
showing how data structures can safely support concurrent threads without ever blocking progress.
Finally, I conclude the chapter in
Section~\ref{synchronization-and-security-section} by looking at security issues related
to synchronization. In particular, I show how subtle synchronization
bugs, which may nearly never cause a malfunction unless provoked, can
be exploited by an attacker in order to circumvent the system's normal
security policies.
After this concluding section, I provide exercises, programming and exploration projects, and notes.
Despite the wide range of synchronization-related topics I cover in
this chapter, there are two I leave for later chapters.
Atomic transactions are a particularly sophisticated and important
synchronization pattern, commonly encountered in middleware;
therefore, I devote Chapter~\ref{transactions-chapter} entirely to them. Also,
explicitly passing a message between threads (for example, via a network)
provides synchronization as well as communication, because the message
cannot be received until after it has been transmitted. Despite this
synchronization role, I chose to address various forms of message
passing in Chapters \ref{networking-chapter} and
\ref{distmid-chapter}, the chapters related to communication.
\section{Races and the Need for Mutual Exclusion}\label{races-section}
When two or more threads operate on a shared data structure, some very
strange malfunctions can occur if the timing of the threads turns out
precisely so that they
interfere with one another. For example, consider the following
code that might appear in a \verb|sellTicket| procedure (for an event
without assigned seats):
\begin{verbatim}
if(seatsRemaining > 0){
  dispenseTicket();
  seatsRemaining = seatsRemaining - 1;
} else
  displaySorrySoldOut();
\end{verbatim}
On the surface, this code looks like it should never sell more tickets
than seats are available. However, what happens if multiple threads
(perhaps controlling different points of sale) are executing the same
code? Most of the time, all will be well. Even if two people try to
buy tickets at what humans perceive as the same moment, on the time scale of
the computer, probably one will happen first and the other second, as
shown in Figure~\ref{ticket-ok-figure}.
\begin{figure}
\centerline{\begin{tabular}{|l|l|}
\hline\multicolumn{1}{|c|}{\bf Thread A} & \multicolumn{1}{c|}{\bf Thread B}\\\hline
\tt if(seatsRemaining > 0) & \\
\tt dispenseTicket(); & \\
\tt seatsRemaining=seatsRemaining-1; & \\
&\tt if(seatsRemaining > 0)...else \\
&\tt displaySorrySoldOut(); \\\hline
\end{tabular}}
\caption{Even if two humans think they are trying to buy the last
ticket at the same time, chances are good that one's thread (thread A
in this example) will run before the other's. Thread B will then
correctly discover that no seats remain.}\label{ticket-ok-figure}
\end{figure}
In that case, all is well. However, once in a blue moon, the timing may be
exactly wrong, and the following scenario results, as shown in
Figure~\ref{ticket-race-figure}.
\begin{figure}
\centerline{\begin{tabular}{|l|l|}
\hline\multicolumn{1}{|c|}{\bf Thread A} & \multicolumn{1}{c|}{\bf Thread B}\\\hline
\tt if(seatsRemaining > 0) & \\
&\tt if(seatsRemaining > 0) \\
\tt dispenseTicket(); & \\
&\tt dispenseTicket(); \\
\tt seatsRemaining=seatsRemaining-1; & \\
&\tt seatsRemaining=seatsRemaining-1; \\\hline
\end{tabular}}
\caption{If threads A and B are interleaved, both can act as though
there were a ticket left to sell, even though only one really exists
for the two of them.}\label{ticket-race-figure}
\end{figure}
\begin{enumerate}
\item
Thread A checks \verb|seatsRemaining > 0|. Because
\verb|seatsRemaining| is 1, the test succeeds. Thread A will take the
first branch of the \verb|if|.
\item
Thread B checks \verb|seatsRemaining > 0|. Because
\verb|seatsRemaining| is 1, the test succeeds. Thread B will take the
first branch of the \verb|if|.
\item
Thread A dispenses a ticket and decreases \verb|seatsRemaining| to 0.
\item
Thread B dispenses a ticket and decreases \verb|seatsRemaining| to $-1$.
\item
One customer winds up sitting on the lap of another.
\end{enumerate}
Of course, there are plenty of other equally unlikely scenarios that
result in misbehavior. In Exercise~\ref{race-exercise}, you can come
up with a scenario
where, starting with \verb|seatsRemaining| being 2, two threads each
dispense a ticket, but \verb|seatsRemaining| is left as 1 rather than
0.
These scenarios are examples of \vocabs{race}. In a race,
two threads use the same data structure, without any mechanism to
ensure only one thread uses the data structure at a time. If either thread precedes the
other, all is well. However, if the two are interleaved, the program
malfunctions. Generally, the malfunction can be expressed as some
invariant property being violated. In the ticket-sales example, the
invariant is that the value of \verb|seatsRemaining|
should be nonnegative and, when added to the number of tickets
dispensed, should equal the total number of seats.
(This invariant assumes that \verb|seatsRemaining| was initialized to
the total number of seats.)
When an invariant involves more than one variable, a race can result
even if one of the threads only reads the variables, without modifying
them. For example, suppose there are two variables, one recording how
many tickets have been sold and the other recording the amount of
cash in the money drawer. There should be an invariant relation between
these: the number of tickets sold times the price per ticket, plus the
amount of starting cash, should equal the cash on hand. Suppose one
thread is in the midst of selling a ticket. It has updated one of the
variables, but not yet the other. If at exactly that moment another
thread chooses to run an audit function, which inspects the values of
the two variables, it will find them in an inconsistent state.
That inconsistency may not sound so terrible, but what if a similar
inconsistency occurred in a
medical setting, and one variable recorded the drug to administer,
while the other recorded the dose? Can you see how dangerous an
inconsistency could be? Something very much like that happened in a
radiation therapy machine, the \index{Therac-25}Therac-25, with occasionally lethal
consequences. (Worse, some patients suffered terrible but not
immediately lethal injuries and lingered for some time in
excruciating, intractable pain.)
From the ticket-sales example, you can see that having two
threads carrying out operations on the same data structure is
harmless, as long as there never are two operations under way at the
same time. In other words, the interleaving of the threads' execution
needs to be at the granularity of complete operations, such as selling
a ticket or auditing the cash drawer. When interleaving the operations,
it's OK if one thread performs
several complete operations in a row; the threads don't need to alternate back and forth.
However, each sale or audit should be completed without interruption.
Any interleaving of complete operations is safe because each operation
is designed to both rely on the invariant and preserve
it. Provided that you initially construct the data structure in a state
where the invariant holds, any sequence whatsoever of
invariant-preserving operations will leave the invariant intact.
What is needed, then, is a synchronization mechanism that allows one
thread to obtain private access to the data structure before it begins work, thereby
excluding all other threads from operating on that structure. The
conventional metaphor is to say that the thread \vocabs{lock} the data structure.
When
the thread that locked the structure is done, it unlocks, allowing
another thread to take its turn. Because any thread in the midst of
one of the operations temporarily excludes all the others,
this arrangement is called \foldvocab{mutual}{exclusion}. Mutual exclusion establishes the
granularity at which threads may be interleaved by the scheduler.
\section{Mutexes and Monitors}\label{mutexes-and-monitors-section}
As you saw in Section~\ref{races-section}, threads that share data structures
need to have a mechanism for obtaining exclusive access to those
structures. A programmer can arrange for this exclusive access by
creating a special lock object associated with each shared data
structure. The lock can only be locked by one thread at a time.
A thread that has locked the lock is said to \vocabindex{hold}{hold (a lock)} the lock, even though that vocabulary has no obvious connection to the metaphor of real-world locks.
If
the threads operate on (or even examine) the data structure
only when holding the corresponding lock, this discipline will prevent
races.
To
support this form of race prevention, operating systems and middleware generally
provide mutual exclusion locks. Because the name \foldvocab{mutual exclusion}{lock}
is rather ungainly, something shorter is generally used. Some
programmers simply talk of \vocabs{lock}, but that can lead to confusion
because other synchronization mechanisms are also called locks. (For
example, I introduce readers/writers locks in Section~\ref{rw-section}.) Therefore, the
name \vocab{mutex} has become popular as a shortened form of \foldvocab{mutual
exclusion}{lock}. In particular, the POSIX standard refers to
mutexes. Therefore, I will use that name in this book as well.
Section~\ref{mutex-api-section} presents the POSIX application
programming interface (API) for
mutexes. Section~\ref{monitors-section} presents an alternative, more
structured interface to mutexes, known as monitors. Finally,
Section~\ref{mutex-mechanisms-section} shows what lies behind both
of those interfaces by explaining the mechanisms typically used to
implement mutexes.
\subsection{The Mutex Application Programming Interface}\label{mutex-api-section}
A mutex can be in either of two states: locked (that is, held by some
thread), or unlocked (that is, not held by any thread). Any
implementation of mutexes must have some way to create a mutex and
initialize its state. Conventionally, mutexes are initialized to the
unlocked state. As a minimum, there must be two other operations: one
to lock a mutex, and one to unlock it.
The lock and unlock operations are much less symmetrical than they
sound. The unlock operation can be applied only when the mutex is
locked; this operation does its job
and returns, without making the calling thread wait. The lock
operation, on the other hand, can be invoked even when the lock is
already locked. For this reason, the calling thread may need to wait, as shown in
Figure~\ref{scan-4-3}.
\begin{figure}
\centerline{\includegraphics{hail_f0404}}
%\centerline{\epsfbox{scan-4-3.eps}}
\caption{Locking an unlocked mutex and unlocking a locked one change
the mutex's state. However, a thread can also try to lock an
already-locked mutex. In this case, the thread waits and acquires
the mutex lock when another thread unlocks it.}
\label{scan-4-3}
\end{figure}
When a thread invokes the lock operation on a mutex, and that mutex is
already in the locked state, the thread is made to wait until another
thread has unlocked the mutex. At that point, the thread that wanted
to lock the mutex can resume execution, find the mutex unlocked, lock
it, and proceed.
If more than one thread is trying to lock the same
mutex, only one of them will switch the mutex from unlocked to locked;
that thread will be allowed to proceed. The others will wait until
the mutex is again unlocked. This behavior of the lock operation
provides mutual exclusion. For a thread to proceed past the point
where it invokes the lock operation, it must be the single thread that
succeeds in switching the mutex from unlocked to locked. Until the
thread unlocks the mutex, one can say it \vocabindex{holds}{hold (a lock)} the mutex (that is, has
exclusive rights) and can safely operate on the associated data
structure in a race-free fashion.
This freedom from races exists regardless of which one of the waiting
threads is chosen as the one to lock the mutex. However, the question of
which thread goes first may matter for other reasons; I return to it
in Section~\ref{convoy-section}.
Besides the basic operations to initialize a mutex, lock it, and
unlock it, there may be other, less essential, operations as well.
For example, there may be one to test whether a mutex is immediately
lockable without waiting, and if so, to lock it. For systems that
rely on manual reclamation of memory, there may also be an operation
to destroy a mutex when it will no longer be used.
Individual operating systems and middleware systems provide mutex APIs
that fit the general pattern I described, with varying details.
In order to see one concrete example of an API, I will present the
mutex operations included in the POSIX standard. Because this is a
standard, many different operating systems provide this API, as well
as perhaps other system-specific APIs.
In the POSIX API, you can declare \verb|my_mutex| to be a mutex and
initialize it with the default attributes as follows:
\index{pthread_mutex_t@\verb"|pthread_mutex_t"|}%
\index{pthread_mutex_init@\verb"|pthread_mutex_init"|}%
\begin{verbatim}
pthread_mutex_t my_mutex;
pthread_mutex_init(&my_mutex, 0);
\end{verbatim}
A thread that wants to lock the mutex, operate on the associated data
structure, and then unlock the mutex would do the following (perhaps
with some error-checking added):
\index{pthread_mutex_lock@\verb"|pthread_mutex_lock"|}%
\index{pthread_mutex_unlock@\verb"|pthread_mutex_unlock"|}%
\begin{verbatim}
pthread_mutex_lock(&my_mutex);
// operate on the protected data structure
pthread_mutex_unlock(&my_mutex);
\end{verbatim}
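The error checking just mentioned can take various forms; one minimal
sketch, which simply prints a message and exits if either operation
fails, tests the status code that each of these procedures returns
(this assumes \verb|stdio.h|, \verb|stdlib.h|, and \verb|string.h|
have been included):
\begin{verbatim}
int code = pthread_mutex_lock(&my_mutex);
if(code != 0){
  fprintf(stderr, "lock failed: %s\n", strerror(code));
  exit(1);
}
// operate on the protected data structure
code = pthread_mutex_unlock(&my_mutex);
if(code != 0){
  fprintf(stderr, "unlock failed: %s\n", strerror(code));
  exit(1);
}
\end{verbatim}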
As an example, Figure~\ref{tickets-pthreads-code} shows the key
procedures from the ticket sales example, written in C using the POSIX API.
\begin{figure}
\begin{verbatim}
void sellTicket(){
  pthread_mutex_lock(&my_mutex);
  if(seatsRemaining > 0){
    dispenseTicket();
    seatsRemaining = seatsRemaining - 1;
    cashOnHand = cashOnHand + PRICE;
  } else
    displaySorrySoldOut();
  pthread_mutex_unlock(&my_mutex);
}

void audit(){
  pthread_mutex_lock(&my_mutex);
  int revenue = (TOTAL_SEATS - seatsRemaining) * PRICE;
  if(cashOnHand != revenue + STARTING_CASH){
    printf("Cash fails to match.\n");
    exit(1);
  }
  pthread_mutex_unlock(&my_mutex);
}
\end{verbatim}
\caption{Each of these procedures begins by locking {\tt my\_mutex} and
ends by unlocking it. Therefore, they will never race, even if
called from concurrent threads. Additional code not shown here
(perhaps in the main procedure) would first initialize {\tt my\_mutex}.}
\label{tickets-pthreads-code}
\end{figure}
When all threads are done using the mutex (leaving it in the unlocked
state), the programmer is expected to destroy it, so that any
underlying memory can be reclaimed. This is done by
executing the following procedure call:
\index{pthread_mutex_destroy@\verb"|pthread_mutex_destroy"|}%
\begin{verbatim}
pthread_mutex_destroy(&my_mutex);
\end{verbatim}
POSIX also provides a couple of variants on \verb|pthread_mutex_lock|
that are useful under particular circumstances. One,
\index{pthread_mutex_trylock@\verb"|pthread_mutex_trylock"|}\verb|pthread_mutex_trylock|, differs in that it will never wait to
acquire a mutex. Instead, it returns an error code if unable to
immediately acquire the lock. The other,
\index{pthread_mutex_timedlock@\verb"|pthread_mutex_timedlock"|}\verb|pthread_mutex_timedlock|, allows the programmer to specify a
maximum amount of time to wait. If the mutex cannot be acquired
within that time, \verb|pthread_mutex_timedlock| returns an error
code.
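To give a feel for how the first of these variants might be used, here
is a hedged sketch in which a thread does some other, unrelated work
whenever the mutex is busy; the procedure \verb|doOtherWork| is
hypothetical, the error code \verb|EBUSY| comes from \verb|errno.h|,
and a fuller version would also check for other error codes:
\begin{verbatim}
while(pthread_mutex_trylock(&my_mutex) == EBUSY){
  doOtherWork();  // hypothetical: make progress on something else
}
// the mutex is now held
// operate on the protected data structure
pthread_mutex_unlock(&my_mutex);
\end{verbatim}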
Beyond their wide availability, another reason why POSIX mutexes are
worth studying is that the programmer is allowed to choose among several
variants, which provide different answers to two questions about exceptional
circumstances.
Other mutex APIs might include one specific answer to these questions,
rather than exposing the full range of possibilities. The questions
at issue are as follows:
\begin{itemize}
\item
What happens if a thread tries to unlock a mutex that is unlocked, or that was locked by a
different thread?
\item
What happens if a thread tries to lock a mutex that it already holds?
(Note that if the thread were to wait for itself to unlock the mutex,
this situation would constitute the simplest possible case of a deadlock. The cycle of
waiting threads would consist of a single thread, waiting for itself.)
\end{itemize}
The POSIX standard allows the programmer to select from four different
types of mutexes, each of which answers these two questions in a
different way:
\begin{description}
\item[\texttt{PTHREAD\_MUTEX\_DEFAULT}]
If a thread tries to
lock a mutex it already holds or unlock one it doesn't hold, all bets are off as to what will happen.
The programmer has a responsibility never to make either of these attempts.
Different POSIX-compliant systems may behave differently.
\item[\texttt{PTHREAD\_MUTEX\_ERRORCHECK}]
If a thread tries to lock a mutex that it already holds, or unlock a
mutex that it doesn't hold, the operation returns an error code.
\item[\texttt{PTHREAD\_MUTEX\_NORMAL}]
If a thread tries to lock a mutex that it already holds, it goes into
a deadlock situation, waiting for itself to unlock the mutex, just
as it would wait for any other thread. If a thread tries to unlock
a mutex that it doesn't hold, all bets are off; each POSIX-compliant
system is free to respond however it likes.
\item[\texttt{PTHREAD\_MUTEX\_RECURSIVE}]
If a thread tries to unlock a mutex that it doesn't hold, the
operation returns an error code. If a thread tries to lock a mutex
that it already holds, the system simply increments a count of how
many times the thread has locked the mutex and allows the thread to
proceed. When the thread invokes the unlock operation, the counter is
decremented, and only when it reaches 0 is the mutex really unlocked.
\end{description}
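To see how a program selects one of these four types, the following
hedged sketch uses the standard attribute-object interface to request
error-checking behavior; any of the other three constants could be
substituted for \verb|PTHREAD_MUTEX_ERRORCHECK|:
\begin{verbatim}
pthread_mutexattr_t attr;
pthread_mutexattr_init(&attr);
pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);

pthread_mutex_t checked_mutex;
pthread_mutex_init(&checked_mutex, &attr);
pthread_mutexattr_destroy(&attr);  // the mutex itself remains usable
\end{verbatim}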
If you want to provoke a debate among experts on concurrent
programming, ask their opinion of \vocab{recursive locking}, that is,
of the mutex behavior specified by the POSIX option \texttt{PTHREAD\_MUTEX\_RECURSIVE}.
On the one hand, recursive locking gets rid of one especially silly
class of deadlocks, in which a thread waits for a mutex it already holds.
On the other hand, a programmer with recursive locking available may
not follow as disciplined a development approach. In particular, the programmer
may not keep track of exactly which locks are held at each point
in the program's execution.
\subsection{Monitors: A More Structured Interface to Mutexes}\label{monitors-section}
Object-oriented programming involves packaging together data
structures with the procedures that operate on them. In this context,
mutexes can be used in a very rigidly structured way:
\begin{itemize}
\item
All state variables within an object should be kept private, accessible only to
code associated with that object.
\item
Every object (that might be shared between threads) should contain a
mutex as an additional field, beyond those fields containing the object's
state.
\item
Every method of an object (except private ones used internally) should
start by locking that object's mutex and end by unlocking the mutex
immediately before returning.
\end{itemize}
If these three rules are followed, then it will be impossible for two
threads to race on the state of an object, because all access to the
object's state will be protected by the object's mutex.
Programmers can follow these rules manually, or the programming
language can provide automatic support for the rules. Automation
ensures that the rules are consistently followed. It also means the
source program will not be cluttered with mutex clich{\'e}s, and hence
will be more readable.
An object that automatically follows the mutex rules is called a
\vocab{monitor}. Monitors are found in some programming languages,
such as Concurrent Pascal, that
have been used in research settings without becoming commercially
popular. In these languages, using monitors can be as simple as using
the keyword \verb|monitor| at the
beginning of a declaration for a class of objects. All public methods
will then automatically lock and unlock an automatically supplied
mutex. (Monitor languages also support another synchronization
feature, condition variables, which I discuss in Section~\ref{condition-variables-section}.)
Although true monitors have not become popular, the Java programming
language provides a close approximation. To achieve monitor-style
synchronization, the Java programmer needs to exercise some self-discipline, but less than with raw mutexes. More importantly, the
resulting Java program is essentially as uncluttered as a true monitor
program would be; all that is added is one keyword,
\verb|synchronized|, at the declaration of each nonprivate method.
Each Java object automatically has a mutex associated with it, of the
recursively lockable kind. The programmer can choose to lock any
object's mutex for the duration of any block of code by using a
\index{synchronized statement@\verb"|synchronized"| statement}\verb|synchronized| statement:
\begin{verbatim}
synchronized(someObject){
  // the code to do while holding someObject's mutex
}
\end{verbatim}
Note that in this case, the code need not be operating on the state of
\verb|someObject|; nor does this code need to be in a method
associated with that object. In other words, the \verb|synchronized|
statement is essentially as flexible as using raw mutexes, with the
one key advantage that locking and unlocking are automatically paired.
This advantage is important, because it eliminates one big class of
programming errors. Programmers often forget to unlock mutexes under
exceptional circumstances. For example, a procedure may lock a mutex
at the beginning and unlock it at the end. However, in between may
come an \verb|if| statement that can terminate the procedure with
the mutex still locked.
Although the \verb|synchronized|
statement is flexible, typical Java programs don't use it much. Instead, programmers add the keyword
\index{synchronized method@\verb"|synchronized"| method}\verb|synchronized| to the declaration of public methods. For
example, a \verb|TicketVendor| class might follow the outline in
Figure~\ref{TicketVendor}.
\begin{figure}
\begin{verbatim}
public class TicketVendor {
  private int seatsRemaining, cashOnHand;
  private static final int PRICE = 1000;

  public synchronized void sellTicket(){
    if(seatsRemaining > 0){
      dispenseTicket();
      seatsRemaining = seatsRemaining - 1;
      cashOnHand = cashOnHand + PRICE;
    } else
      displaySorrySoldOut();
  }

  public synchronized void audit(){
    // check seatsRemaining, cashOnHand
  }

  private void dispenseTicket(){
    // ...
  }

  private void displaySorrySoldOut(){
    // ...
  }

  public TicketVendor(){
    // ...
  }
}
\end{verbatim}
\caption{Outline of a monitor-style class in Java}
\label{TicketVendor}
\end{figure}
Marking a method \verb|synchronized| is
equivalent to wrapping the entire body of that method in
a \verb|synchronized| statement:
\begin{verbatim}
synchronized(this){
  // the body
}
\end{verbatim}
In other words, a synchronized method on an object will be executed
while holding that object's mutex. For example, the \verb|sellTicket|
method is synchronized, so if two different threads invoke it, one
will be served while the other waits its turn, because the
\verb|sellTicket| method is implicitly locking a mutex upon entry and
unlocking it upon return, just as was done explicitly in the POSIX
version of Figure~\ref{tickets-pthreads-code}. Similarly, a thread
executing the \verb|audit| method will need to wait until no ticket
sale is in progress, because this method is also marked synchronized,
and so acquires the same mutex.
In order to program in a monitor style in Java, you need to be
disciplined in your use of the \verb|private| and \verb|public|
keywords (including making all state \verb|private|), and you need to
mark all the public methods as \verb|synchronized|.
\subsection{Underlying Mechanisms for Mutexes}\label{mutex-mechanisms-section}
In this subsection, I will show how mutexes typically operate behind
the scenes. I start with a version that functions correctly, but is
inefficient, and then show how to build a more efficient version on
top of it, and then a yet more efficient version on top of that. Keep
in mind that I will not throw away my first two versions: they play
a critical role in the final version. For simplicity, all three
versions will be of the \texttt{PTHREAD\_MUTEX\_NORMAL} kind;
a deadlock results if a thread tries to lock a mutex it already
holds. In Exercise~\ref{recursive-mutex-exercise},
you can figure out the changes needed for \texttt{PTHREAD\_MUTEX\_RECURSIVE}.
The three versions of mutex are called the basic spinlock,
cache-conscious spinlock, and queuing mutex, in increasing order of
sophistication. The meaning of these names will become apparent as I
explain the functioning of each kind of mutex. I will start with the
basic spinlock.
All modern processor architectures have at least one instruction that
can be used to both change the contents of a memory location and
obtain information about the previous contents of the location.
Crucially, these instructions are executed \vocabindex{atomically}{atomic}, that is, as
an indivisible unit that cannot be broken up by the arrival of an
interrupt nor interleaved with the execution of an instruction on
another processor. The details of these instructions vary; for
concreteness, I will use the \vocab{exchange} operation, which atomically
swaps the contents of a register with the contents of a memory
location.
Suppose I represent a basic spinlock as a memory location that contains 1 if
the mutex is unlocked and 0 if the mutex is locked. The unlock
operation can be trivial: to unlock a mutex, just store 1 into it.
The lock operation is a bit trickier and uses the atomic exchange
operation; I can express it in pseudocode, as shown in
Figure~\ref{basic-spinlock-lock}.
\begin{figure}
\begin{verbatim}
to lock mutex:
  let temp = 0
  repeat
    atomically exchange temp and mutex
  until temp = 1
\end{verbatim}
\caption{The basic spinlock version of a mutex is a memory location
storing 1 for unlocked and 0 for locked. Locking the mutex consists of
repeatedly exchanging a register containing 0 with the memory
location until the location is changed from 1 to 0.}
\label{basic-spinlock-lock}
\end{figure}
The key idea here is to keep looping until the thread succeeds in changing the
mutex from 1 to 0. So long as some other thread holds the
lock, the thread keeps swapping one 0 with another 0, which does no harm.
This process is illustrated in Figure~\ref{scan-4-4}.
\begin{figure}
\centerline{\includegraphics{hail_f0408}}
%\centerline{\epsfbox{scan-4-4.eps}}
\caption{Unlocking a basic spinlock consists of storing a 1 into it.
Locking it consists of storing a 0 into it using an atomic exchange
instruction. The exchange instruction allows the locking thread to verify that the
value in memory really was changed from 1 to 0. If not, the thread
repeats the attempt.}
\label{scan-4-4}
\end{figure}
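As a concrete, though hedged, rendering of this pseudocode, the
following sketch uses the exchange operation from the C11
\verb|<stdatomic.h>| library; it keeps the convention of 1 for
unlocked and 0 for locked:
\begin{verbatim}
#include <stdatomic.h>

static atomic_int mutex = 1;  // 1 = unlocked, 0 = locked

void spin_lock(void){
  int temp;
  do {
    temp = atomic_exchange(&mutex, 0);  // atomically swap 0 into mutex
  } while(temp != 1);                   // loop until the old value was 1
}

void spin_unlock(void){
  atomic_store(&mutex, 1);              // unlock: just store a 1
}
\end{verbatim}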
To understand the motivation behind the cache-conscious spinlock, you need to
know a little about \foldindex{cache}{coherence}cache coherence
protocols in \index{multiprocessor system}multiprocessor
systems. Copies of a given block of memory can reside in several
different processors' caches, as long as the processors only read from the
memory locations. As soon as one processor wants to write into the
cache block, however, some communication between the caches is
necessary so that other processors don't read out-of-date
values. Most typically, the cache where the writing occurs invalidates all the other caches' copies so
that it has exclusive ownership. If one of the other processors now
wants to write, the block needs to be flushed out of the first cache
and loaded exclusively into the second. If the two processors keep
alternately writing into the same block, there will be continual
traffic on the memory interconnect as the cache block is transferred
back and forth between the two caches.
This is exactly what will happen with the basic spinlock version of mutex
locking if two threads (on two processors) are both waiting for the
same lock. The atomic exchange instructions on the two processors will
both be writing into the cache block containing the spinlock.
Contention for a mutex may not
happen often. When it does, however, the performance will be
sufficiently terrible to motivate an improvement. Cache-conscious spinlocks
will use the same simple approach as basic spinlocks when there is no
contention, but will get rid of the cache coherence traffic while waiting
for a contended mutex.
In order to allow multiple processors to wait for a lock without
generating traffic outside their individual caches, they must be
waiting while using only reads of the mutex. When they see the mutex become
unlocked, they then need to try grabbing it with an atomic
exchange. This approach leads to the pseudocode shown in Figure~\ref{cache-conscious-lock}.
\begin{figure}
\begin{verbatim}
to lock mutex:
  let temp = 0
  repeat
    atomically exchange temp and mutex
    if temp = 0 then
      while mutex = 0
        do nothing
  until temp = 1
\end{verbatim}
\caption{Cache-conscious spinlocks are represented the same way as
basic spinlocks, using a single memory location. However, the lock
operation now uses ordinary read instructions in place of most of
the atomic exchanges while waiting for the mutex to be unlocked.}
\label{cache-conscious-lock}
\end{figure}
Notice that in the common case where the mutex can be acquired
immediately, this version acts just like the original. Only if the
attempt to acquire the mutex fails is anything done differently. Even
then, the mutex will eventually be acquired the same way as before.
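Rendering this refinement in the same hedged C11 style as before, only
the lock operation changes; the inner loop uses ordinary atomic loads,
which each processor can satisfy from its own cache:
\begin{verbatim}
void spin_lock(void){
  int temp;
  do {
    temp = atomic_exchange(&mutex, 0);  // one attempt to grab the lock
    if(temp == 0){                      // it was already locked
      while(atomic_load(&mutex) == 0)
        ;                               // wait by reading, not writing
    }
  } while(temp != 1);
}
\end{verbatim}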
The two versions of mutexes that I have presented thus far share one
key property, which explains why both are called spinlocks.
They both engage in \foldindex{busy}{waiting}busy waiting if the mutex
is not immediately available. Recall from my discussion of scheduling
that busy waiting means waiting by continually executing instructions
that check for the awaited event. A mutex that uses busy waiting is
called a \vocab{spinlock}. Even fancier versions of spinlocks exist,
as described in the end-of-chapter notes.
The alternative to busy waiting is to notify the operating system that
the thread needs to wait. The operating system can then change the
thread's state to waiting and move it to a
wait queue, where it is not eligible for time on the processor.
Instead, the scheduler will use the processor to run other threads.
When the mutex is unlocked, the waiting thread can be made runnable
again. Because this form of mutex makes use of a wait queue, it is
called a queuing mutex.
Spinlocks are inefficient for the same reason that any busy waiting is inefficient.
The thread does not make any more headway, no matter how many times it
spins around its loop. Therefore, using the processor for a different
thread would benefit that other thread without harming the waiting
one.
However, there is one flaw in this argument. There is some overhead
cost for notifying the operating system of the desire to wait,
changing the thread's state, and doing a context switch, with the
attendant loss of cache locality. Thus, in a situation where the
spinlock needs to spin only briefly before finding the mutex unlocked,
the thread might actually waste less time busy waiting than it would waste
getting out of other threads' ways. The relative efficiency of
spinlocks and queuing mutexes
depends on how long the thread needs to wait before the
mutex becomes available.
For this reason, spinlocks are appropriate to use for mutexes that are
held only very briefly, and hence should be quickly acquirable.
As an example, the Linux kernel uses spinlocks to protect many of its
internal data structures during the brief operations on them. For
example, I mentioned that the scheduler keeps the runnable threads in
a run queue. Whenever the scheduler wants to
insert a thread into this data structure, or otherwise operate on
it, it locks a spinlock, does the brief operation, and then
unlocks the spinlock.
Queuing mutexes are still needed for those cases where a
thread might hold a mutex a long time---long enough that other
contenders shouldn't busy wait. These mutexes will be more complex.
Rather than being stored in a single memory location (as with
spinlocks), each mutex will have three components:
\begin{itemize}
\item
A memory location used to record the mutex's state, 1 for unlocked or
0 for locked.
\item
A list of threads waiting to acquire the mutex. This list is what
allows the scheduler to place the threads in a waiting state, instead
of busy waiting. Using the terminology of
Chapter~\ref{scheduling-chapter}, this list is a wait queue.
\item
A cache-conscious spinlock, used to protect against races in
operations on the mutex itself.
\end{itemize}
In my pseudocode, I will refer to these three components as
\verb|mutex.state|, \verb|mutex.waiters|, and \verb|mutex.spinlock|,
respectively.
Under these assumptions, the locking and unlocking operations can be
performed as shown in the pseudocode of Figures \ref{lock-pseudo-code} and
\ref{unlock-pseudo-code}. Figures \ref{scan-4-5} and \ref{scan-4-6}
illustrate the functioning of these operations.
\begin{figure}
\begin{verbatim}
to lock mutex:
  lock mutex.spinlock (in cache-conscious fashion)
  if mutex.state = 1 then
    let mutex.state = 0
    unlock mutex.spinlock
  else
    add current thread to mutex.waiters
    remove current thread from runnable threads
    unlock mutex.spinlock
    yield to a runnable thread
\end{verbatim}
\caption{An attempt to lock a queuing mutex that is already in the
locked state causes the thread to join the wait queue, {\tt mutex.waiters}.}
\label{lock-pseudo-code}
\end{figure}
\begin{figure}
\begin{verbatim}
to unlock mutex:
  lock mutex.spinlock (in cache-conscious fashion)
  if mutex.waiters is empty then
    let mutex.state = 1
  else
    move one thread from mutex.waiters to runnable
  unlock mutex.spinlock
\end{verbatim}
\caption{If there is any waiting thread, the unlock operation on a queuing mutex causes a thread to become
runnable. Note that in this case, the mutex is left in the locked
state; effectively, the locked mutex is being passed directly from
one thread to another.}
\label{unlock-pseudo-code}
\end{figure}
\begin{figure}
\centerline{\includegraphics{hail_f0412}}
%\centerline{\epsfbox{scan-4-5.eps}}
\caption{Locking a queuing mutex that is unlocked simply changes the
mutex's state. Locking an already-locked queuing mutex, on the
other hand, puts the thread into the waiters list.}
\label{scan-4-5}
\end{figure}
\begin{figure}
\centerline{\includegraphics{hail_f0413}}
%\centerline{\epsfbox{scan-4-6.eps}}
\caption{Unlocking a queuing mutex with no waiting threads simply
changes the mutex's state. Unlocking a queuing mutex with waiting
threads, on the other hand, leaves the state set to locked but
causes one of the waiting threads to start running again, having
acquired the lock.}
\label{scan-4-6}
\end{figure}
One important feature to note in this mutex design concerns what
happens when a thread performs the unlock operation on a mutex that has one or more threads in the
waiters list. As you can see in Figure~\ref{unlock-pseudo-code}, the
mutex's state variable is not changed from the locked state (0) to the
unlocked state (1). Instead, the mutex is left locked, and one of the
waiting threads is woken up. In other words, the locked mutex is
passed directly from one thread to another, without ever really being
unlocked. In Section~\ref{convoy-section}, I will explain how this
design is partially responsible for the so-called convoy
phenomenon, which I describe there. In that same section, I will also present an
alternative design for mutexes that puts the mutex into the unlocked
state.
\section{Other Synchronization Patterns}\label{other-synchronization-problems-section}
Recall that synchronization refers to any form of control over the
relative timing of two or more threads. As such, synchronization
includes more than just mutual exclusion; a programmer may want to impose some
restriction on relative timing other than the rule of one thread at a
time. In this section, I present three other patterns of
synchronization that crop up over and over again in many applications:
bounded buffers, readers/writers locks, and barriers.
Sections \ref{bounded-buffers-section} through
\ref{barriers-section} will just describe the desired synchronization;
Sections \ref{condition-variables-section} and \ref{semaphores-section} show techniques that can be used to achieve the
synchronization.
\subsection{Bounded Buffers}\label{bounded-buffers-section}
Often, two threads are linked together in a processing
\vocab{pipeline}. That is,
the first thread produces a sequence of values that are consumed by
the second thread. For example, the first thread may be extracting
all the textual words from a document (by skipping over the formatting
codes) and passing those words to a second thread that speaks the
words aloud.
One simple way to organize the processing would be by strict
alternation between the producing and consuming threads. In the preceding
example, the first thread would extract a word, and then wait while
the second thread converted it into sound. The second thread would
then wait while the first thread extracted the next word. However,
this approach doesn't yield any concurrency: only one thread is
runnable at a time. This lack of concurrency may result in suboptimal
performance if the computer system has two processors, or
if one of the threads spends a lot of time waiting for an I/O device.
Instead, consider running the \index{producer}producer and the \index{consumer}consumer
concurrently. Every time the producer has a new value ready, the producer will
store the value into an intermediate storage area, called a \vocab{buffer}.
Every time the consumer is ready for the next value, it will retrieve
the value from the buffer. Under normal circumstances, each can operate at
its own pace. However, if the consumer goes to the buffer to retrieve
a value and finds the buffer empty, the consumer will need to wait for the
producer to catch up. Also, if you want to limit the size of the
buffer (that is, to use a \foldvocab{bounded}{buffer}), you need to make the producer wait
if it gets too far ahead of the consumer and fills the buffer. Putting
these two synchronization restrictions in place ensures that over the
long haul, the rate of the two threads will match up, although
over the short term, either may run faster than the other.
You should be familiar with the bounded buffer pattern from businesses
in the real world. For example, the cooks at a fast-food
restaurant fry burgers concurrently with the cashiers selling them.
In between the two is a bounded buffer of already-cooked burgers.
The exact number of burgers in the buffer will grow or shrink somewhat
as one group of workers is temporarily a little faster than the
other. Only under extreme circumstances does one group of workers
have to wait for the other. Figure~\ref{scan-4-7} illustrates a situation where no one needs to wait.
\begin{figure}
\centerline{\includegraphics{hail_f0414}}
%\centerline{\epsfbox{scan-4-7.eps}}
\caption{A cook fries burgers and places them in a bounded buffer,
queued up for later sale. A cashier takes burgers from the
buffer to sell. If there are none available, the cashier waits.
Similarly, if the buffer area is full, the cook takes a break from
frying burgers.}
\label{scan-4-7}
\end{figure}
One easy place to see bounded buffers at work in computer systems is
the \vocab{pipe} feature built into UNIX-family operating systems,
including Linux and Mac OS~X. (Microsoft Windows also now has an
analogous feature.) Pipes allow the output produced by one process to
serve as input for another. For example, on a Mac OS~X
system, you could open a terminal window with a shell in it and give the following command:
\begin{verbatim}
ls | say
\end{verbatim}
This runs two programs concurrently. The first, \verb|ls|,
lists
the files in your current directory.
The second one, \verb|say|, converts its textual input into speech and
plays it over the computer's speakers. In the shell
command, the vertical bar character (\verb:|:) indicates the pipe from
the first program to the second. The net result is a spoken
listing of your files.
A more mundane version of this
example works not only on Mac OS~X, but also on other UNIX-family
systems such as Linux:
\begin{verbatim}
ls | tr a-z A-Z
\end{verbatim}
Again, this runs two programs concurrently. This time the second one, \verb|tr|, copies
characters from its input to its output, with some changes
(transliterations) along the way; in this case, replacing lowercase
letters \verb|a-z| with the corresponding uppercase letters \verb|A-Z|. The net
result is an uppercase listing of your files. The file listing may
get ahead of the transliteration, as long as it doesn't overflow a
buffer the operating system provides for the pipe. Once there is a
backlog of listed files in the buffer, the transliteration can run as
fast as it wants until it exhausts that backlog.
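A program can also set up a pipe for itself, rather than relying on
the shell. The following hedged sketch uses the POSIX \verb|pipe|,
\verb|fork|, \verb|dup2|, and \verb|execlp| calls to connect
\verb|ls| to \verb|tr| in roughly the way the shell command above
does; error handling is kept minimal:
\begin{verbatim}
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void){
  int fd[2];                   // fd[0] is the read end, fd[1] the write end
  if(pipe(fd) < 0){ perror("pipe"); exit(1); }

  if(fork() == 0){             // first child runs ls, writing into the pipe
    dup2(fd[1], STDOUT_FILENO);
    close(fd[0]); close(fd[1]);
    execlp("ls", "ls", (char *) 0);
    perror("execlp ls"); exit(1);
  }
  if(fork() == 0){             // second child runs tr, reading from the pipe
    dup2(fd[0], STDIN_FILENO);
    close(fd[0]); close(fd[1]);
    execlp("tr", "tr", "a-z", "A-Z", (char *) 0);
    perror("execlp tr"); exit(1);
  }

  close(fd[0]); close(fd[1]);  // parent keeps neither end open
  wait(0); wait(0);            // wait for both children to finish
  return 0;
}
\end{verbatim}
The bounded buffer here is the one the operating system associates
with the pipe: \verb|ls| blocks if the buffer fills, and \verb|tr|
blocks if it finds the buffer empty.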
\subsection{Readers/Writers Locks}\label{rw-section}
My next example of a synchronization pattern is actually quite
similar to mutual exclusion. Recall that in the ticket-sales
example, the audit function needed to acquire the mutex, even though auditing
is a read-only operation, in order to make sure that the audit read a consistent
combination of state variables. That design achieved correctness, but
at the cost of needlessly limiting concurrency: it prevented two
audits from being underway at the same time, even though two (or more)
read-only operations cannot possibly interfere with each other.
My goal now is to rectify that problem.
A \foldvocab{readers/writers}{lock} is much like a mutex, except that when a thread
locks the lock, it specifies whether it is planning to do any writing
to the protected data structure
or only reading from it. Just as with a mutex, the lock operation may not
immediately complete; instead, it waits until such time as the lock
can be acquired. The difference is that any number of readers can
hold the lock at the same time, as shown in Figure~\ref{scan-4-8}; they will not wait for each other. A
reader will wait, however, if a writer holds the lock. A writer will
wait if the lock is held by any other thread, whether by another
writer or by one or more readers.
\begin{figure}
\centerline{\includegraphics{hail_f0415}}
%\centerline{\epsfbox{scan-4-8.eps}}
\caption{A readers/writers lock can be held either by
any number of readers or by one writer. When the lock is held by
readers, all the reader threads can read the protected data
structure concurrently.}
\label{scan-4-8}
\end{figure}
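Although the mechanisms for building readers/writers locks come later,
it may help to see the shape of a typical API now. For example, POSIX
provides a readers/writers lock type; a hedged sketch of how the
ticket-sales example could use it (assuming the lock has already been
initialized with \verb|pthread_rwlock_init|) looks like this:
\begin{verbatim}
pthread_rwlock_t ticket_lock;

void audit(){                          // reader: many audits can run at once
  pthread_rwlock_rdlock(&ticket_lock);
  // inspect seatsRemaining and cashOnHand
  pthread_rwlock_unlock(&ticket_lock);
}

void sellTicket(){                     // writer: excludes readers and writers
  pthread_rwlock_wrlock(&ticket_lock);
  // update seatsRemaining and cashOnHand
  pthread_rwlock_unlock(&ticket_lock);
}
\end{verbatim}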
Readers/writers locks are particularly valuable in situations where
some of the read-only operations are time consuming, as when reading a
file stored on disk. This is especially true if many readers are
expected. The choice between a mutex and a readers/writers lock is a