|
c334ce77
»
|
Erik |
2008-07-26 |
Yay for ispell |
1 |
= OSCON 2008, Tutorial 3: Ubiquitous Multi-threading in a Multi-core World |
|
d8e7fba2
»
|
Erik |
2008-07-22 |
Adding Ubiquitous Multithre... |
2 |

== Shift from serial to parallel
=== Process
- Find things that can be done almost independently
- Analyze communication (dependences)
- Organize dependences for parallelism
- <b>Do this early!</b>

=== Generic programming
- Make assumptions
- e.g. Quicksort -> walk bidirectionally and swap items.

=== Generic iteration
- Dependences hinder parallel execution
- STL for_each is a good example: it has to check the iterator before each call to the function

=== Dealing with Dependences
- Remove dependences
- Rearrange dependences to shorten critical path
- Domain experts are better than programmers since they know where to break rules.

=== Parallel iteration
- Know number of iterations ahead of time to control dependence
- Linked lists and other variable-length structures suck for what we're talking about
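
A minimal sketch of why a known iteration count helps, assuming a std::vector as the container. The helper name parallel_for_each and its parameters are made up for this example. Because the size is known up front, the index range can be cut into independent chunks before any thread starts.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: apply f to every element, giving each thread one
// contiguous chunk of the index range [0, n). Only possible because n is
// known before the loop starts.
template <typename T, typename F>
void parallel_for_each(std::vector<T>& data, unsigned num_threads, F f) {
    const std::size_t n = data.size();
    if (num_threads == 0) num_threads = 1;
    const std::size_t chunk = (n + num_threads - 1) / num_threads;  // ceil(n / threads)
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        workers.emplace_back([&data, f, begin, end] {
            for (std::size_t i = begin; i < end; ++i) f(data[i]);  // chunks never overlap
        });
    }
    for (auto& w : workers) w.join();  // wait for all chunks to finish
}
```

With a linked list the same split would require walking the list first just to find the chunk boundaries.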


== Correctness
- Make sure you have the sequential version right first
- <i>"Embarrassing parallelism is good"</i> (big arrays, no dependences)

=== First, define what is Correct
- Matching a serial program bit-for-bit might be unrealistic

==== Examples
- Floating point round-off in fluid solvers (an iterative process that solves for a parameter; floating point inaccuracies introduce error)
- MPEG compression - trading compression for parallelism
- Search returns one of several acceptable answers
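
The floating point case can be shown without any threads at all. The sketch below (function names are made up for illustration) sums the same values in two different orders, which is what a parallel reduction effectively does when it regroups the additions.

```cpp
#include <vector>

// Sum left to right, as a serial loop would.
float sum_forward(const std::vector<float>& xs) {
    float s = 0.0f;
    for (float x : xs) s += x;
    return s;
}

// Sum right to left: the same math, but rounding happens in a different order.
float sum_backward(const std::vector<float>& xs) {
    float s = 0.0f;
    for (auto it = xs.rbegin(); it != xs.rend(); ++it) s += *it;
    return s;
}
```

For the input {1e8f, -1e8f, 1.0f} on IEEE 754 single precision, the forward sum is 1 while the backward sum is 0, because 1 is lost when it is added to -1e8 first. Neither answer is "wrong"; the definition of correct has to allow for this.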

=== Race conditions
- Shared data, winners and losers

=== Synchronization
==== Low-level
- Mutexes, condition variables (wait on condition, no lock), tricky events
- Atomic operations: guaranteed to happen without interruption
- Emphasis on a pair of threads
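
A small sketch of an atomic operation fixing a race, using std::atomic from C++11 (the function name and parameters are made up for this example). With a plain int, two threads could read the same old value and one increment would be lost; fetch_add makes each increment indivisible.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Every thread increments one shared counter. std::atomic guarantees that no
// increment is interrupted halfway, so no update is lost.
int count_with_atomics(int num_threads, int increments_per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    return counter.load();
}
```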

==== Higher-level
- Parallel loops
- Pipelines
- Barriers - serialization after parallel, waiting for parallel to finish
- Work queues - dynamic scheduling
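
One possible shape for a work queue with dynamic scheduling, sketched under the assumption that all tasks are known before the workers start. The name run_work_queue is hypothetical. Each idle thread claims the next unclaimed task, so fast threads naturally pick up more work, and the final join loop plays the role of a barrier.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Run every task exactly once. Threads claim task indices from a shared
// atomic counter instead of being assigned a fixed share in advance.
void run_work_queue(const std::vector<std::function<void()>>& tasks,
                    unsigned num_threads) {
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&next, &tasks] {
            for (;;) {
                const std::size_t i = next.fetch_add(1);  // claim one task
                if (i >= tasks.size()) return;            // queue is drained
                tasks[i]();
            }
        });
    }
    for (auto& w : workers) w.join();  // barrier: wait for the parallel part
}
```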

==== Mutex
- A lock on a (critical) section of code.
- We have 2 things (or more) we want to change at the same time

==== Semaphore
- Let up to N threads in at the same time

==== Reader-writer lock
- Multiple readers or one writer at a time
- Useful when there's lots of reading, little writing
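
A sketch of a read-mostly structure using std::shared_mutex, which requires C++17. The Cache class and its method names are hypothetical. Lookups take a shared lock and can run side by side; a store takes an exclusive lock.

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical read-mostly cache.
class Cache {
public:
    // Many readers may hold the shared lock at the same time.
    bool lookup(const std::string& key, int& value_out) const {
        std::shared_lock<std::shared_mutex> lock(m_);
        auto it = data_.find(key);
        if (it == data_.end()) return false;
        value_out = it->second;
        return true;
    }

    // A writer waits until it is the only thread holding the lock.
    void store(const std::string& key, int value) {
        std::unique_lock<std::shared_mutex> lock(m_);
        data_[key] = value;
    }

private:
    mutable std::shared_mutex m_;
    std::map<std::string, int> data_;
};
```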

==== Condition variables
- Allow threads to wait for state protected by mutex to change, without holding the mutex and without timing holes (uses signaling)

=== Problems with locks
==== Composition
- Locking lower level operations does not guarantee higher level is race free

==== Deadlock
- Everyone's waiting for a lock that no one can give

==== Convoying
- Similar to deadlocking, owner of lock is preempted, other threads wait behind it
- Lock owner crashes, other threads wait forever
- Minimize convoying with atomics and by keeping lock hold times short

==== Priority Inversion
- Can occur with prioritized preemptive scheduling
- Low-priority thread is preempted when holding lock
- Medium-priority thread runs in preference to low-priority thread
- High-priority thread waits on the lock until it times out, and the system restarts
- Mars Pathfinder example: http://research.microsoft.com/~mbj/Mars_Pathfinder/Mars_Pathfinder.html

=== Composition problem
- Multiple threads might append the same thing to a list, for example
- Move your locks to the outermost invariant

=== Notes on Mutexes
- Avoid exposing mutexes to other packages
- Look into invariant-based programming
- <b>Remember exception handling</b>

=== Exception-safe mutexing using RAII
- RAII = Resource Acquisition Is Initialization
- Constructor acquires resource
- Destructor releases resource
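
A hand-rolled guard to show the idea; ScopedLock and update_or_throw are names invented for this sketch, and in real code std::lock_guard does the same job. The point is that the destructor runs during stack unwinding, so the mutex is released even when an exception escapes.

```cpp
#include <mutex>
#include <stdexcept>

// Illustration only: the standard library's std::lock_guard is equivalent.
class ScopedLock {
public:
    explicit ScopedLock(std::mutex& m) : m_(m) { m_.lock(); }  // constructor acquires
    ~ScopedLock() { m_.unlock(); }                              // destructor releases
    ScopedLock(const ScopedLock&) = delete;
    ScopedLock& operator=(const ScopedLock&) = delete;

private:
    std::mutex& m_;
};

// Hypothetical function that may throw while holding the lock.
void update_or_throw(std::mutex& m, int& value, int new_value) {
    ScopedLock lock(m);
    if (new_value < 0) throw std::invalid_argument("negative value");
    value = new_value;
}  // lock released here on both the normal and the exceptional path
```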

=== Lockless problems
- Livelock - threads keep retrying but nobody makes progress
- ABA problem - You read a var as A, it is then changed to B, then back to A, and you screw up because nothing looks different (linked list example).
- Memory reclamation - compare-and-swaps are required, so one has to succeed and the rest have to fail. You might have trouble freeing memory in case another thread is still using it. See Hazard Pointers: http://www.research.ibm.com/people/m/michael/ieeetpds-2004.pdf
- Memory Consistency model
- Lock-free data structures are difficult to understand
- Often publishable, even for simple structures
- Verification is tricky: consider using Spin to verify (http://www.spinroot.com)

=== Tools for correctness
- KISS: Keep it simple, stupid
- Use automatic race detectors: Detect races like memory checkers detect leaks
- Helgrind (part of Valgrind)
- Intel Thread Checker: more general race detection based on inter-thread communication

== Scalability tidbits
- Creating and destroying a thread can take on the order of <b>25,000 clock cycles!</b>