# **Soft Errors: The Hardware-Software Interface**

Kyoungwoo Lee
Yonsei University
Seoul, South Korea
kyoungwoo.lee@yonsei.ac.kr

Aviral Shrivastava
Arizona State University
Phoenix, USA
aviral.shrivastava@asu.edu

Reiley Jeyapaul
Arizona State University
Phoenix, USA
reiley@asu.edu

#### **ABSTRACT**

A recent report from the ITRS identifies soft errors as one of the most important reliability challenges for the coming decades. Soft errors are transient errors caused by several effects, e.g., voltage fluctuations, wire crosstalk, and cosmic particle strikes, and they manifest as a temporary switch of the logic value of a transistor. While it is not possible to prove or disprove that a particular failure was caused by a soft error, several financially damaging incidents, e.g., the Sun server crashes in 2000 and the HP server crashes in 2005, have been attributed to soft errors. Industry has moved from ignoring soft errors to investing design effort in protection against them. For instance, in NVIDIA's recently announced Fermi GPUs, the L1 cache, L2 cache, and register files are ECC protected. Although the soft error rate is about once per year today, it is expected to reach an alarming once per day within a decade or two. Researchers are busy finding cost-effective solutions to protect computing devices from soft errors.

This tutorial will attempt to cover the entire gamut of soft error protection techniques, with a particular focus on soft error mitigation at the hardware/software interface. Much of the time will be spent on microarchitectural, compiler, and hybrid compiler-microarchitectural techniques for soft error mitigation. The tutorial will be particularly useful for budding researchers who are fascinated by soft errors and want to explore them as a research direction. For such researchers, it will be a one-stop shop for learning about and analyzing seminal research in soft error mitigation at several design layers. For developers who have been working on soft errors at one level, it will give a picture of what can be done at other levels, so that they can provide complementary cross-layer protection. Finally, researchers and developers working on other aspects of system design can learn how soft errors are going to affect them.
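As a concrete illustration of the two ideas in the abstract, a soft error as a temporary flip of a stored logic value and ECC as protection against it, the following is a minimal sketch in Python. It uses the textbook Hamming(7,4) single-error-correcting code purely as an example; it is not claimed to be the ECC scheme of any product mentioned above, and the function names (`hamming74_encode`, `inject_bit_flip`, `hamming74_decode`) are hypothetical names chosen for this sketch.

```python
from typing import Tuple


def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits as a 7-bit Hamming codeword.

    Codeword position k (1..7) is stored in bit k-1 of the result.
    Positions 1, 2, and 4 hold parity; positions 3, 5, 6, 7 hold data.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # parity over positions 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # parity over positions 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # parity over positions 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    return sum(b << i for i, b in enumerate(bits))


def inject_bit_flip(word: int, bit: int) -> int:
    """Model a soft error: invert one stored bit (bit index 0..6)."""
    return word ^ (1 << bit)


def hamming74_decode(word: int) -> Tuple[int, int]:
    """Return (corrected data nibble, syndrome).

    The syndrome is 0 for a clean codeword; otherwise it is the
    1-based position of the single flipped bit, which is corrected.
    """
    bits = [(word >> i) & 1 for i in range(7)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos - 1]:
            syndrome ^= pos  # XOR of the positions of all set bits
    if syndrome:
        bits[syndrome - 1] ^= 1  # undo the single-bit upset
    nibble = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return nibble, syndrome


if __name__ == "__main__":
    data = 0b1011
    stored = hamming74_encode(data)
    upset = inject_bit_flip(stored, 4)  # flip the bit at position 5
    recovered, syndrome = hamming74_decode(upset)
    print(f"data={data:04b} stored={stored:07b} upset={upset:07b}")
    print(f"recovered={recovered:04b} flipped position={syndrome}")
```

The sketch corrects any single flipped bit in a 7-bit word; two simultaneous flips in the same word would be miscorrected, which is one reason practical designs weigh stronger codes against their area and energy cost.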

## **Categories and Subject Descriptors**

B.7.3 [Integrated Circuits]: Reliability and testing

D.3.4 [Programming Languages]: Processors

## **Keywords**

Soft error mitigation, compiler, microarchitectural techniques.

#### 1. SPEAKERS AT THE TUTORIAL

Prof. Aviral Shrivastava is an Associate Professor in the School of Computing, Informatics, and Decision Systems Engineering at Arizona State University, USA. He received his Master's and Ph.D. in Computer Science from the University of California, Irvine, and his Bachelor's in Computer Science from the Indian Institute of Technology, Delhi. Prof. Shrivastava received the prestigious 2010 NSF CAREER award for his research and education on soft errors.

Prof. Kyoungwoo Lee is an Assistant Professor in the Department of Computer Science and Engineering at Yonsei University, Seoul, South Korea. His research is in the area of embedded systems, with a specific focus on cross-layer design and optimization for error-aware and energy-efficient embedded systems.

Dr. Reiley Jeyapaul is a Post-Doctoral Researcher at the Compiler Microarchitecture Lab (CML), Arizona State University, USA. His research focuses on developing methods to ensure reliability in modern and future computing systems. His recent paper on Cache Vulnerability Equations (CVE) was the second-highest-rated paper at LCTES 2010.

#### 2. TOPICS COVERED

- Soft Errors, Trends, and Challenges
- Low-level and Microarchitectural Techniques
- Compiler Based Techniques for Soft Error Mitigation
- System and Program Level Techniques
- Conclusion and Future Directions

This research was partially supported by funding from the National Science Foundation under grant CCF-1055094 (CAREER) and by the MKE (Ministry of Knowledge Economy), Korea, under the Global Collaborative R&D program supervised by the KIAT (M002300089).

Copyright is held by the author/owner(s). *CODES+ISSS'12*, October 7–12, 2012, Tampere, Finland. ACM 978-1-4503-1426-8/12/09.