## **Contents**

| Chapter | Title | Page |
|---|---|---|
| | Foreword | |
| | Preface | |
| | Acknowledgments | xxii |
| Chapter 1 | Fundamentals of Computer Design | |
| | 1.1 Introduction | 2 |
| | 1.2 Classes of Computers | 2 |
| | 1.3 Defining Computer Architecture | 8 |
| | 1.4 Trends in Technology | 14 |
| | 1.5 Trends in Power in Integrated Circuits | 17 |
| | 1.6 Trends in Cost | 19 |
| | 1.7 Dependability | 25 |
| | 1.8 Measuring, Reporting, and Summarizing Performance | 28 |
| | 1.9 Quantitative Principles of Computer Design | 37 |
| | 1.10 Putting It All Together: Performance and Price-Performance | 44 |
| | 1.11 Fallacies and Pitfalls | 48 |
| | 1.12 Concluding Remarks | 52 |
| | 1.13 Historical Perspectives and References | 54 |
| | Case Studies with Exercises by Diana Franklin | |
| Chapter 2 | Instruction-Level Parallelism and Its Exploitation | |
| | 2.1 Instruction-Level Parallelism: Concepts and Challenges | 66 |
| | 2.2 Basic Compiler Techniques for Exposing ILP | 72 |
| | 2.3 Reducing Branch Costs with Prediction | 80 |
| | 2.4 Overcoming Data Hazards with Dynamic Scheduling | 89 |
| | 2.5 Dynamic Scheduling: Examples and the Algorithm | 97 |
| | 2.6 Hardware-Based Speculation | 104 |
| | 2.7 Exploiting ILP Using Multiple Issue and Static Scheduling | 114 |
| | 2.8 … and Speculation | 118 |
| | 2.9 Advanced Techniques for Instruction Delivery and Speculation | 121 |
| | 2.10 Putting It All Together: The Intel Pentium 4 | 131 |
| | 2.11 Fallacies and Pitfalls | 138 |
| | 2.12 Concluding Remarks | 140 |
| | 2.13 Historical Perspective and References | 141 |
| | Case Studies with Exercises by Robert P. Colwell | 142 |
| Chapter 3 | Limits on Instruction-Level Parallelism | |
| | 3.1 Introduction | 154 |
| | 3.2 Studies of the Limitations of ILP | 154 |
| | 3.3 Limitations on ILP for Realizable Processors | 165 |
| | 3.4 Crosscutting Issues: Hardware versus Software Speculation | 170 |
| | 3.5 Multithreading: Using ILP Support to Exploit Thread-Level Parallelism | 172 |
| | 3.6 Putting It All Together: Performance and Efficiency in Advanced Multiple-Issue Processors | 179 |
| | 3.7 Fallacies and Pitfalls | 183 |
| | 3.8 Concluding Remarks | 184 |
| | 3.9 Historical Perspective and References | 185 |
| | Case Study with Exercises by Wen-mei W. Hwu and John W. Sias | |
| Chapter 4 | Multiprocessors and Thread-Level Parallelism | |
| | 4.1 Introduction | 196 |
| | 4.2 Symmetric Shared-Memory Architectures | 205 |
| | 4.3 Performance of Symmetric Shared-Memory Multiprocessors | 218 |
| | 4.4 Distributed Shared Memory and Directory-Based Coherence | 230 |
| | 4.5 Synchronization: The Basics | 237 |
| | 4.6 Models of Memory Consistency: An Introduction | 243 |
| | 4.7 Crosscutting Issues | 246 |
| | 4.8 Putting It All Together: The Sun T1 Multiprocessor | 249 |
| | 4.9 Fallacies and Pitfalls | 257 |
| | 4.10 Concluding Remarks | 262 |
| | 4.11 Historical Perspective and References | 264 |
| | Case Studies with Exercises by David A. Wood | |
| Chapter 5 | Memory Hierarchy Design | |
| | 5.1 Introduction | 288 |
| | 5.2 Eleven Advanced Optimizations of Cache Performance | 293 |
| | 5.3 Memory Technology and Optimizations | 310 |

| Chapter | Title | Page |
|---|---|---|
| | 5.4 Protection: Virtual Memory and Virtual Machines | 315 |
| | 5.5 Crosscutting Issues: The Design of Memory Hierarchies | 324 |
| | 5.6 Putting It All Together: AMD Opteron Memory Hierarchy | 326 |
| | 5.7 Fallacies and Pitfalls | 335 |
| | 5.8 Concluding Remarks | 341 |
| | 5.9 Historical Perspective and References | 342 |
| | Case Studies with Exercises by Norman P. Jouppi | 342 |
| Chapter 6 | Storage Systems | |
| | 6.1 Introduction | 358 |
| | 6.2 Advanced Topics in Disk Storage | 358 |
| | 6.3 Definition and Examples of Real Faults and Failures | 366 |
| | 6.4 I/O Performance, Reliability Measures, and Benchmarks | 371 |
| | 6.5 A Little Queuing Theory | 379 |
| | 6.6 Crosscutting Issues | 390 |
| | 6.7 Designing and Evaluating an I/O System: The Internet Archive Cluster | 392 |
| | 6.8 Putting It All Together: NetApp FAS6000 Filer | 397 |
| | 6.9 Fallacies and Pitfalls | 399 |
| | 6.10 Concluding Remarks | 403 |
| | 6.11 Historical Perspective and References | 404 |
| | Case Studies with Exercises by Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau | |
| Appendix A | Pipelining: Basic and Intermediate Concepts | |
| | A.1 Introduction | A-2 |
| | A.2 The Major Hurdle of Pipelining: Pipeline Hazards | A-11 |
| | A.3 How Is Pipelining Implemented? | A-26 |
| | A.4 What Makes Pipelining Hard to Implement? | A-37 |
| | A.5 Extending the MIPS Pipeline to Handle Multicycle Operations | A-47 |
| | A.6 Putting It All Together: The MIPS R4000 Pipeline | A-56 |
| | A.7 Crosscutting Issues | A-65 |
| | A.8 Fallacies and Pitfalls | A-75 |
| | A.9 Concluding Remarks | A-76 |
| | A.10 Historical Perspective and References | A-77 |
| Appendix B | | |
| | B.1 Introduction | B-2 |
| | B.2 Classifying Instruction Set Architectures | B-3 |
| | B.3 Memory Addressing | B-7 |
| | B.4 Type and Size of Operands | B-13 |
| | B.5 Operations in the Instruction Set | B-14 |

| Chapter | Title | Page |
|---|---|---|
| | B.6 Instructions for Control Flow | B-16 |
| | B.7 Encoding an Instruction Set | B-21 |
| | B.8 Crosscutting Issues: The Role of Compilers | B-24 |
| | B.9 Putting It All Together: The MIPS Architecture | B-32 |
| | B.10 Fallacies and Pitfalls | B-39 |
| | B.11 Concluding Remarks | B-45 |
| | B.12 Historical Perspective and References | B-47 |
| Appendix C | Review of Memory Hierarchy | |
| | C.1 Introduction | C-2 |
| | C.2 Cache Performance | C-15 |
| | C.3 Six Basic Cache Optimizations | C-22 |
| | C.4 Virtual Memory | C-38 |
| | C.5 Protection and Examples of Virtual Memory | C-47 |
| | C.6 Fallacies and Pitfalls | C-56 |
| | C.7 Concluding Remarks | C-57 |
| | C.8 Historical Perspective and References | C-58 |
| | Companion CD Appendices | |
| Appendix D | Embedded Systems | |
| | Updated by Thomas M. Conte | |
| Appendix E | Interconnection Networks | |
| | Revised by Timothy M. Pinkston and José Duato | |
| Appendix F | Vector Processors | |
| | Revised by Krste Asanovic | |
| Appendix G | Hardware and Software for VLIW and EPIC | |
| Appendix H | Large-Scale Multiprocessors and Scientific Applications | |
| Appendix I | Computer Arithmetic | |
| | by David Goldberg | |
| Appendix J | Survey of Instruction Set Architectures | |
| Appendix K | Historical Perspectives and References | |
| | Online Appendix (textbooks.elsevier.com/0123704901) | |
| Appendix L | Solutions to Case Study Exercises | |
| | References | R-1 |
| | Index | I-1 |