# METS: A Multiple Event Transient Simulator

Adam Watkins\*† and Spyros Tragoudas\*
\* Southern Illinois University Carbondale
Carbondale, IL 62901
† Los Alamos National Laboratories
Los Alamos, NM 87545
acwatkins88@lanl.gov, spyros@siu.edu

Abstract—Most existing soft error simulators either do not consider multiple event transients or use simple electrical masking models to approximate the pulse shape. In this paper, the METS tool is proposed, which employs binary decision diagrams (BDDs) and partitioning for faster simulation. Additionally, it uses an accurate electrical masking model to determine the output pulse shape, which allows accurate calculation of the soft error rate in the presence of multiple event transients (METs). The tool is tested on various ISCAS 85 benchmarks and is shown to achieve a speedup of up to 90X compared to Monte Carlo simulation.

## I. INTRODUCTION

As process technology continues to scale down, the likelihood of a radiation-induced error increases. This trend creates a need for accurate and efficient tools to calculate the soft error rate of a given circuit. Since it is expensive and time-consuming to design a circuit and only test it after fabrication, efficient tools that accurately determine the error rate of a given netlist before fabrication can reduce design time. However, it is difficult to characterize combinational circuits due to logical, electrical, and temporal masking.

It has been shown in [3], [4] that concurrent estimation of all masking factors is required in order to ensure that the soft error rate is calculated accurately. Due to limitations on simulation time and available memory, efficient and accurate consideration of all masking factors is an ongoing problem. Moreover, the reduction in the transistor size has increased the probability of a single particle inducing a transient pulse in multiple transistors, referred to as a multiple event transient (MET).

Since the MET phenomenon is relatively new in combinational circuits, most existing efforts in soft error simulation focus only on the estimation of a single injected error [7], [3]. These tools do not consider METs, and the electrical masking models they use cannot efficiently or accurately determine the pulse shape when multiple pulses arrive at a gate simultaneously. This is an important aspect of MET analysis, since two or more concurrently injected pulses may lead to many more pulses converging at a gate.

More recently, there has been work that considers the MET effect [4], [2]. In [4], the authors propose a tool that uses algebraic decision diagrams (ADDs). The tool in [2] uses probabilistic arguments to determine the logical effect. However, the data structure in [4] and the theoretical framework in [2] both require the transient pulse shape to be approximated by a square shape. As suggested in [7], [8], the non-linear regions of the pulse shape have a drastic effect on the resulting pulse. This reveals a major drawback of both methods and necessitates enhanced approximation models.

In this paper, the Multiple Event Transient Simulator (METS) tool is proposed, which accurately calculates the soft error rate (SER) in the presence of METs. Compared to existing tools, METS is designed specifically for use with an accurate electrical masking model, which allows better determination of the SER in the presence of METs. For example, the ADDs in [4] have terminal nodes representing a pulse with a calculated width or magnitude, and in that simulator pulses with similar width or magnitude are represented by modifying the structure. However, when an accurate electrical masking model is used, the width and magnitude are computed at a much finer granularity, which requires many terminal nodes and may result in memory problems.

The authors in [2] propose a simulation tool that discretizes the simulation time into time steps used for analysis. As proposed, the approach uses integer values to propagate a square-shaped pulse, which reduces the number of required time steps and provides fast simulation. However, accurate electrical masking models require finer time steps to preserve accuracy. If accurate models were used in [2], the simulation time would be immense due to the intractable number of time steps.

To provide fast and accurate simulation, METS uses BDDs to determine the logical masking effect, together with the electrical masking model in [8], which can determine the pulse shape with SPICE-like accuracy for pulses converging at a gate. To remedy the inherent problem of BDD blowup, partitioning is used, which trades some SER calculation accuracy for reduced memory and simulation time. The rest of this paper is organized as follows: Section II gives the preliminaries for the calculation of the soft error rate, Section III gives the simulation flow of METS, Section IV gives the results, and Section V concludes the paper.

## II. CALCULATION OF THE SOFT ERROR RATE

When a high-energy particle strikes a transistor, the magnitude and polarity of the resulting pulse depend on the configuration of the transistors. In CMOS logic, a transient pulse is generated when a particle strikes a transistor in the blocking (off) state. This causes the transistor to conduct current temporarily, which generates the voltage pulse. In effect, for a rising pulse to be generated, the gate output must be "0"; conversely, for a falling pulse, the output must be "1".

This research has been supported in part by grants NSF IIP 1432026, NSF 1535658, and NSF IIP 1361847 from the NSF I/UCRC for Embedded Systems at SIUC. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Given the Boolean functions of a gate's inputs, the gate's output function is obtained by applying the corresponding gate operation (e.g., the AND operation for an AND or NAND gate) and inverting the result if the gate is inverting. The resulting function represents the input patterns that set the output to "1". If the pulse to be generated is falling, this function is used unchanged; if the pulse is rising, it is inverted so that a "1" value represents the patterns under which the pulse is generated. Equations 1 and 2 give the functions F(G) for the generation of a rising and a falling pulse, respectively, on an N-input OR gate, where $F(I_i)$ represents the function of the i-th input.

$$F(G) = \neg\left(F(I_1) \lor F(I_2) \lor \cdots \lor F(I_N)\right) \tag{1}$$

$$F(G) = F(I_1) \lor F(I_2) \lor \cdots \lor F(I_N) \tag{2}$$
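As an illustration of Equations 1 and 2, the following Python sketch replaces BDDs with integer truth tables: with n primary inputs, bit j of an integer holds the function's value on input pattern j, so AND, OR, and NOT become bitwise operations. This representation and the helper names (`var_table`, `full_table`, `probability`, `or_gate_generation`) are assumptions made for illustration only; the table size grows as $2^n$, which is why a compact structure such as a BDD is needed in practice. Probabilities assume uniform, independent primary inputs.

```python
def var_table(i: int, n: int) -> int:
    """Truth table of primary input x_i: bit j is set when pattern j has x_i = 1."""
    return sum(1 << j for j in range(1 << n) if (j >> i) & 1)


def full_table(n: int) -> int:
    """The constant-1 function over all 2**n input patterns."""
    return (1 << (1 << n)) - 1


def probability(f: int, n: int) -> float:
    """Fraction of the 2**n equally likely input patterns on which f is 1."""
    return bin(f).count("1") / (1 << n)


def or_gate_generation(inputs: list[int], n: int) -> tuple[int, int]:
    """Return the (rising, falling) generation functions of an N-input OR gate."""
    output_is_one = 0
    for f in inputs:
        output_is_one |= f
    rising = full_table(n) & ~output_is_one  # Equation 1: output must be "0"
    falling = output_is_one                   # Equation 2: output must be "1"
    return rising, falling


# A 2-input OR gate whose inputs are the primary inputs x0 and x1.
rising, falling = or_gate_generation([var_table(0, 2), var_table(1, 2)], 2)
print(probability(rising, 2), probability(falling, 2))  # 0.25 0.75
```

A rising pulse needs both inputs at "0" (one pattern in four), while a falling pulse needs any input at "1".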

For the generation of METs, these functions must be combined so that the joint probability of the pulses occurring is considered. In the case of pulse generation, it is assumed that a single radiation particle strikes two or more gates, causing transient pulses. For a pulse of a specific polarity to be generated, the output of the struck gate must have a specific value. Based on this, the function $F_{gen}$ for pulse generation when S gates are struck with sufficient energy is given in Equation 3, where $F(G_{i,p})$ is the function representing all input patterns that allow gate $G_i$ to generate a pulse of polarity p. Specifically, if the generated pulse is rising, $F(G_{i,p})$ represents the patterns that force the output of $G_i$ to "0"; if the pulse is falling, it represents the patterns that force the output of $G_i$ to "1".

$$F_{gen} = F(G_{1,p}) \land F(G_{2,p}) \land \cdots \land F(G_{S,p}) \tag{3}$$
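The sketch below applies Equation 3 to two hypothetical NAND gates that share the primary input x1, using integer truth tables as a stand-in for BDDs (bit j of an integer is the function's value on input pattern j) and assuming uniform, independent inputs. Because of the shared input, the joint generation probability differs from the product of the individual probabilities, which is why the generation functions are combined with AND rather than multiplied as independent probabilities. The helper names are illustrative, and each struck gate carries its own polarity in the code; with equal polarities this reduces to Equation 3.

```python
def var_table(i, n):
    """Truth table of primary input x_i over the 2**n input patterns."""
    return sum(1 << j for j in range(1 << n) if (j >> i) & 1)


def probability(f, n):
    """Fraction of equally likely input patterns on which f is 1."""
    return bin(f).count("1") / (1 << n)


def generation_function(gate_output, polarity, full):
    # A rising pulse needs the struck gate's output at "0", a falling one at "1".
    return full & ~gate_output if polarity == "rising" else gate_output


def met_generation(struck, full):
    """Equation 3: every struck gate must be able to generate its pulse at once."""
    f_gen = full
    for gate_output, polarity in struck:
        f_gen &= generation_function(gate_output, polarity, full)
    return f_gen


n = 3
full = (1 << (1 << n)) - 1
x0, x1, x2 = (var_table(i, n) for i in range(n))
g1 = full & ~(x0 & x1)  # hypothetical NAND(x0, x1)
g2 = full & ~(x1 & x2)  # hypothetical NAND(x1, x2), sharing input x1
f_gen = met_generation([(g1, "rising"), (g2, "rising")], full)
p1 = probability(generation_function(g1, "rising", full), n)
p2 = probability(generation_function(g2, "rising", full), n)
print(probability(f_gen, n), p1 * p2)  # 0.125 0.0625
```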

To determine the functions during pulse propagation, the off-inputs are combined using the logical AND operation. Assume that the gate has N inputs and that $F(OI_i)$ is the Boolean function of the off-input $OI_i$, representing the input patterns under which that off-input holds a non-controlling value. The function F(M), representing the case where the pulse propagates, is given below:

$$F(M) = F(OI_1) \land F(OI_2) \land \cdots \land F(OI_{N-1}) \tag{4}$$

In Equation 4, only N-1 inputs are considered since the input which contains the transient pulse is not included in the calculation.
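A sketch of Equation 4, using integer truth tables as a stand-in for BDDs (bit j of an integer is the function's value on input pattern j): each off-input's logic function is kept as is when the gate's non-controlling value is "1" (AND, NAND) and complemented when it is "0" (OR, NOR). The `NON_CONTROLLING` table, the helper names, and the example off-input functions are illustrative assumptions. The second off-input is correlated with the first, so the result is not simply a product of per-line probabilities.

```python
def var_table(i, n):
    """Truth table of primary input x_i over the 2**n input patterns."""
    return sum(1 << j for j in range(1 << n) if (j >> i) & 1)


def probability(f, n):
    """Fraction of equally likely input patterns on which f is 1."""
    return bin(f).count("1") / (1 << n)


# Non-controlling value of each gate type.
NON_CONTROLLING = {"AND": 1, "NAND": 1, "OR": 0, "NOR": 0}


def sensitization(gate_type, off_input_logic, full):
    """Equation 4: AND over the N-1 off-inputs of 'off-input is non-controlling'."""
    f_m = full
    for g in off_input_logic:  # g is 1 on the patterns where the off-input is "1"
        f_m &= g if NON_CONTROLLING[gate_type] == 1 else full & ~g
    return f_m


# 3-input gate; the pulse is on one input, the off-inputs are x1 and (x1 OR x2).
n = 3
full = (1 << (1 << n)) - 1
x0, x1, x2 = (var_table(i, n) for i in range(n))
off_inputs = [x1, x1 | x2]
and_case = sensitization("AND", off_inputs, full)
or_case = sensitization("OR", off_inputs, full)
print(probability(and_case, n), probability(or_case, n))  # 0.5 0.25
```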

Because the simulator considers multiple generated pulses, the likelihood of two or more pulses arriving at a gate increases. This creates multiple propagation cases that must be considered, since any of the pulses may be logically masked. For example, if two pulses arrive at the inputs of a 2-input NAND gate, there are three cases: only the pulse on the first input arrives (the other is logically masked), only the pulse on the second input arrives (the other is logically masked), or both pulses arrive concurrently. Consideration of METs can therefore substantially increase simulation time, since the number of cases grows with the number of concurrent pulses. Let N denote the number of simultaneous pulses and $P_{num}$ the number of propagation cases. $P_{num}$ is given by the equation below:

$$P_{num} = \sum_{k=1}^{N} \frac{N!}{k!\,(N-k)!} = 2^N - 1 \tag{5}$$
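Equation 5 counts the non-empty subsets of the N converging pulses, which by the binomial theorem equals $2^N - 1$, so the number of cases grows exponentially with the number of concurrent pulses. A short Python check that enumerates the cases directly (the function name `propagation_cases` is hypothetical):

```python
from itertools import combinations
from math import comb


def propagation_cases(pulses):
    """Every non-empty subset of converging pulses that may survive logical masking."""
    return [case for k in range(1, len(pulses) + 1) for case in combinations(pulses, k)]


# The 2-input NAND example: first pulse only, second pulse only, or both.
print(propagation_cases(["p1", "p2"]))  # [('p1',), ('p2',), ('p1', 'p2')]

for n in range(1, 6):
    p_num = sum(comb(n, k) for k in range(1, n + 1))  # Equation 5
    print(n, p_num, len(propagation_cases(list(range(n)))), 2**n - 1)
```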

When a MET is generated, it is represented as an event $E_k$, where k is the event number. For each event, the generated transients are propagated through the gates to the primary outputs. Once the pulses arrive at the primary outputs, the temporal masking probability is considered. Based on the equation in [6], let $P_{L,i}$ be the probability of the pulse being latched at primary output i, W the pulse width, $t_{setup}$ and $t_{hold}$ the setup and hold times, respectively, and $T_{clk}$ the clock period. The probability of a pulse being latched by an output flip-flop is then:

$$P_{L,i} = \frac{W - (t_{setup} + t_{hold})}{T_{clk}} \tag{6}$$
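Equation 6 can be evaluated with the flip-flop parameters used in Section IV (1 GHz clock, so $T_{clk}$ = 1000 ps; $t_{setup}$ = 22 ps; $t_{hold}$ = -7 ps). The pulse widths below are hypothetical, and clamping the result to [0, 1] is an assumption of this sketch for pulses narrower than the setup-plus-hold window or wider than a clock period; Equation 6 itself does not state it.

```python
def latch_probability(width_ps, t_setup_ps, t_hold_ps, t_clk_ps):
    """Equation 6; the clamp to [0, 1] is an assumption of this sketch."""
    p = (width_ps - (t_setup_ps + t_hold_ps)) / t_clk_ps
    return min(max(p, 0.0), 1.0)


# Section IV parameters: 1 GHz clock, t_setup = 22 ps, t_hold = -7 ps.
print(latch_probability(100.0, 22.0, -7.0, 1000.0))  # 0.085
print(latch_probability(10.0, 22.0, -7.0, 1000.0))   # 0.0 (narrower than the 15 ps window)
```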

Assuming that the pulses from event $E_k$ propagate to M primary outputs, with each pulse of event $E_k$ having a latching probability $P_{L,i}$ calculated as in Equation 6, the probability of error for event $E_k$ is:

$$P(E_k) = \sum_{i=1}^{M} P_{L,i} \tag{7}$$

To evaluate the error probability over all events, the mean error susceptibility (MES) defined in [4] is used. The MES represents the average probability of error over all events. Assuming that there are $n_E$ events, $n_d$ input probability distributions, and K injected events, the MES is found using the following equation:

$$MES = \sum_{k=1}^{K} \frac{P(E_k)}{n_E n_d} \tag{8}$$

Based on the MES, the soft error rate (SER) can be calculated as below, where $R_{eff}$ represents the effective concentration of particles in the given area, $P_{eff}$ is the probability of a particle hitting the sensitive region of a transistor, and A is the area of the sensitive volumes.

$$SER = MES \cdot R_{eff} \cdot P_{eff} \cdot A \tag{9}$$
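Equations 7 through 9 chain together as plain arithmetic, sketched below. The latch probabilities and the SER factors $R_{eff}$, $P_{eff}$, and A are placeholder values, not data from this paper, and Equation 8 is implemented literally as written.

```python
def event_error_probability(latch_probs):
    """Equation 7: sum of P_{L,i} over the M primary outputs reached by event E_k."""
    return sum(latch_probs)


def mean_error_susceptibility(event_probs, n_events, n_dists):
    """Equation 8 as written: the sum of P(E_k) divided by n_E * n_d."""
    return sum(event_probs) / (n_events * n_dists)


def soft_error_rate(mes, r_eff, p_eff, area):
    """Equation 9."""
    return mes * r_eff * p_eff * area


# Placeholder latch probabilities for three events under one input distribution;
# the second event is fully masked and reaches no primary output.
events = [[0.085, 0.02], [], [0.05]]
p_events = [event_error_probability(e) for e in events]
mes = mean_error_susceptibility(p_events, n_events=len(events), n_dists=1)
ser = soft_error_rate(mes, r_eff=2.0, p_eff=0.5, area=3.0)  # placeholder factors
print(round(mes, 4), round(ser, 4))  # 0.0517 0.155
```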

## III. DESCRIPTION OF THE METS SIMULATOR

The METS simulator operates in topological order, visiting the primary inputs first. Each primary input is represented as a BDD variable. Once all inputs are visited, the simulator processes the gates within the circuit. At each gate, the logic BDD representing the gate output function is calculated; specifically, the paths to the "1" terminal of this BDD represent the input patterns that make the output a "1".
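To illustrate the logic BDD construction, the following is a minimal reduced ordered BDD written from scratch in Python; it is not the BDD package used by METS, which the paper does not name. It builds the output function of every gate of the ISCAS-85 c17 benchmark in topological order, using `graphlib.TopologicalSorter` from the Python standard library (3.9+), and then solves each output BDD for its signal probability with the recursion $P(f) = (1 - p_x)\,P(f_{x=0}) + p_x\,P(f_{x=1})$, assuming independent inputs. The variable order (primary inputs 1, 2, 3, 6, 7) is an arbitrary choice.

```python
from graphlib import TopologicalSorter


class BDD:
    """Minimal reduced ordered BDD. Node 0 is the "0" terminal, node 1 the "1" terminal."""

    OPS = {"and": lambda a, b: a & b, "xor": lambda a, b: a ^ b}

    def __init__(self, num_vars):
        # Terminals use an index past the last variable so apply() never splits on them.
        self.nodes = [(num_vars, 0, 0), (num_vars, 1, 1)]
        self.unique = {}  # (var, low, high) -> node id, so equal subgraphs are shared
        self.memo = {}    # (op, u, v) -> result of apply

    def mk(self, var, low, high):
        if low == high:  # both branches agree: the test on var is redundant
            return low
        key = (var, low, high)
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def var(self, i):
        return self.mk(i, 0, 1)

    def apply(self, op, u, v):
        """Combine two BDDs with a Boolean operator by Shannon expansion."""
        if u <= 1 and v <= 1:
            return self.OPS[op](u, v)
        key = (op, u, v)
        if key not in self.memo:
            vu, lu, hu = self.nodes[u]
            vv, lv, hv = self.nodes[v]
            top = min(vu, vv)
            u0, u1 = (lu, hu) if vu == top else (u, u)
            v0, v1 = (lv, hv) if vv == top else (v, v)
            self.memo[key] = self.mk(top, self.apply(op, u0, v0), self.apply(op, u1, v1))
        return self.memo[key]

    def negate(self, u):
        return self.apply("xor", u, 1)

    def probability(self, u, p):
        """P(f = 1) for independent inputs with P(x_i = 1) = p[i]."""
        cache = {0: 0.0, 1: 1.0}

        def rec(node):
            if node not in cache:
                var, low, high = self.nodes[node]
                cache[node] = (1 - p[var]) * rec(low) + p[var] * rec(high)
            return cache[node]

        return rec(u)

    def evaluate(self, u, assignment):
        """Follow one path from u to a terminal for a 0/1 assignment list."""
        while u > 1:
            var, low, high = self.nodes[u]
            u = high if assignment[var] else low
        return u


# ISCAS-85 c17: six 2-input NAND gates, gate -> (fan-in, fan-in).
C17 = {"10": ("1", "3"), "11": ("3", "6"), "16": ("2", "11"),
       "19": ("11", "7"), "22": ("10", "16"), "23": ("16", "19")}
PRIMARY_INPUTS = ["1", "2", "3", "6", "7"]

bdd = BDD(len(PRIMARY_INPUTS))
logic = {name: bdd.var(i) for i, name in enumerate(PRIMARY_INPUTS)}
for node in TopologicalSorter(C17).static_order():  # fan-ins come first
    if node in C17:
        a, b = C17[node]
        logic[node] = bdd.negate(bdd.apply("and", logic[a], logic[b]))  # NAND

uniform = [0.5] * len(PRIMARY_INPUTS)
print(bdd.probability(logic["22"], uniform), bdd.probability(logic["23"], uniform))  # 0.5625 0.5625
```

Sharing nodes through the unique table is what keeps these functions compact; a pulse generation BDD for a rising pulse would simply be `bdd.negate(logic[g])`.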

After the logic BDD is created, pulses are generated at the gate. A rising pulse can only be generated when the output is low, so the gate logic BDD is inverted such that a path to the "1" terminal represents a case in which the pulse exists. Similarly, a falling pulse requires a high output; since the gate logic BDD already denotes a high output, it can be used directly as the pulse generation BDD. If the pulse generation BDD is not the constant "false" function, the electrical masking model in [8] is used to calculate the pulse shape. To determine the generation functions in the presence of a MET, the correlation between the inputs of the struck gates is captured by applying the logical AND operation to their generation functions. The reasoning behind this operation is that all pulses must be logically sensitized concurrently, and their generation functions may share dependencies that must be considered.

After all pulses are generated at a gate, METS determines the sensitization function for each pulse arriving at the gate by applying the logical AND operation to the off-input BDDs. Specifically, if the non-controlling value of the gate is "0", as in an OR gate, the off-input BDDs are inverted so that the "1" terminal represents the case in which the pulse propagates. If the non-controlling value is "1", the BDDs are left unchanged. The result of this operation is a single BDD in which the paths that lead to the "1" terminal represent all input patterns that allow the pulse to propagate, while the paths that lead to the "0" terminal represent the patterns that mask the pulse.

When multiple pulses arrive at a gate simultaneously, the case in which only a single pulse arrives is calculated using the routine described above. The remaining cases are those in which multiple pulses arrive at the gate inputs simultaneously. In these cases, the logic function is determined by applying the AND operation to the pulse sensitization BDDs of the input pulses, and the result is then ANDed with the non-controlling off-input logic functions. The basis for this is that every pulse must be sensitized to the gate, and all off-inputs must be non-controlling, for the pulses to propagate.
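The convergence cases counted by Equation 5 can be turned into logic functions along the lines just described. The sketch below uses integer truth tables as a stand-in for BDDs (bit j of an integer is the function's value on input pattern j) and encodes one reading of the procedure, which is an assumption: for each non-empty subset of converging pulses, the pulses' sensitization functions are ANDed together and then ANDed with the non-controlling condition on every other input of the gate. The example is the 2-input NAND gate from Section II, with hypothetical logic and sensitization functions; the helper names are illustrative.

```python
from itertools import combinations


def var_table(i, n):
    """Truth table of primary input x_i over the 2**n input patterns."""
    return sum(1 << j for j in range(1 << n) if (j >> i) & 1)


def probability(f, n):
    """Fraction of equally likely input patterns on which f is 1."""
    return bin(f).count("1") / (1 << n)


def convergence_cases(lines, nc_value, full):
    """Return {pulse subset: case function} for the pulses converging at one gate.

    lines maps each gate input to (logic function, pulse sensitization or None).
    Assumed reading: every pulse in the subset must arrive (its sensitization
    holds) and every other input must be at the gate's non-controlling value.
    """
    pulsed = [name for name, (_, sens) in lines.items() if sens is not None]
    cases = {}
    for k in range(1, len(pulsed) + 1):
        for subset in combinations(pulsed, k):
            f = full
            for name in subset:
                f &= lines[name][1]
            for name, (logic, _) in lines.items():
                if name not in subset:
                    f &= logic if nc_value == 1 else full & ~logic
            cases[subset] = f
    return cases


# 2-input NAND (non-controlling value 1) with pulses on both inputs; the
# logic and sensitization functions below are hypothetical.
n = 3
full = (1 << (1 << n)) - 1
x0, x1, x2 = (var_table(i, n) for i in range(n))
lines = {"a": (x1, x0), "b": (x2, x1)}
cases = convergence_cases(lines, nc_value=1, full=full)
for subset, f in cases.items():
    print(subset, probability(f, n))
```

With two converging pulses this yields the three cases of the NAND example; the functions for these cases can overlap, so their probabilities are not required to sum to one.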

To avoid the BDD blowup problem, METS uses partitioning to reduce simulation time and memory overhead. In this paper, the circuits were partitioned using the Fiduccia-Mattheyses (FM) algorithm [1], which partitions a circuit into two equal-size parts in linear time. To obtain a k-part partition, the FM algorithm is applied recursively k-1 times. In the METS tool, the partitions are balanced according to the total number of fan-in pins, because it has been experimentally observed that the computational complexity can be managed by controlling the number of fan-in pins per partition. Furthermore, it has been observed that accuracy is higher when partitions are connected by few lines. For these reasons, the objectives of [1], balanced parts with few lines cut between them, are desirable.
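The recursive bisection can be sketched as follows. This is not the Fiduccia-Mattheyses algorithm: `bisect_by_fanin` is a hypothetical stand-in that only splits a topologically ordered gate list where the fan-in pin totals of the two halves are closest, without minimizing the number of lines cut. It does show the two properties described above: balancing by fan-in pins, and reaching k parts with k-1 bisections. On c17 (six 2-input NAND gates), the 12 total pins and 6 pins per half are consistent with the partition sizes listed for c17 in Table II.

```python
def bisect_by_fanin(gates, fanin):
    """Split an ordered gate list where the two halves' fan-in pin totals are closest.

    Stand-in for one FM pass: it balances fan-in pins but does NOT try to
    minimize the number of lines cut between the halves.
    """
    total = sum(fanin[g] for g in gates)
    best_cut, best_gap, prefix = 1, float("inf"), 0
    for cut in range(1, len(gates)):
        prefix += fanin[gates[cut - 1]]
        gap = abs(2 * prefix - total)
        if gap < best_gap:
            best_cut, best_gap = cut, gap
    return gates[:best_cut], gates[best_cut:]


def partition(gates, fanin, k):
    """Bisect the heaviest splittable part until k parts exist (k - 1 bisections)."""
    parts, bisections = [list(gates)], 0
    while len(parts) < k:
        splittable = [p for p in parts if len(p) > 1]
        if not splittable:
            raise ValueError("k exceeds the number of gates")
        heaviest = max(splittable, key=lambda p: sum(fanin[g] for g in p))
        parts.remove(heaviest)
        parts.extend(bisect_by_fanin(heaviest, fanin))
        bisections += 1
    return parts, bisections


# c17 gates in topological order; every gate is a 2-input NAND (12 fan-in pins).
c17_gates = ["10", "11", "16", "19", "22", "23"]
fanin = {g: 2 for g in c17_gates}
parts, bisections = partition(c17_gates, fanin, 2)
print(parts, [sum(fanin[g] for g in p) for p in parts], bisections)
```

A real FM pass would additionally move gates between the halves to reduce the number of cut lines, which the text above links to higher accuracy.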

When the circuit is partitioned, the simulator extracts each partition and simulates it individually. For each node on the partition's output boundary, the logic and pulse sensitization BDDs are evaluated. The resulting pulse sensitization probability is stored with the pulse, which is propagated between the partitions. Additionally, the logic probabilities from the logic BDDs are stored on the inputs of the next partition. Since a partition may be separated from the primary inputs that drive it, virtual inputs are created that store the logic and pulse sensitization probabilities. The stored probabilities are then multiplied by the probability evaluated from each function within the partition to determine the overall probability. This is shown in Fig. 1.
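The combination of stored and locally evaluated probabilities can be sketched as follows, assuming (as a simplification) that the virtual inputs are independent and that the probability stored with the pulse is independent of the downstream function, so the two multiply. This independence is presumably where partitioning trades away accuracy, since correlations across the partition boundary are dropped. The virtual-input names, their probabilities, and the downstream function are hypothetical, and brute-force enumeration stands in for BDD evaluation.

```python
from itertools import product


def function_probability(f, input_probs):
    """P(f = 1) by enumeration, assuming independent inputs with P(x = 1) = input_probs[x]."""
    names = list(input_probs)
    total = 0.0
    for values in product((0, 1), repeat=len(names)):
        assignment = dict(zip(names, values))
        weight = 1.0
        for name, value in assignment.items():
            weight *= input_probs[name] if value else 1.0 - input_probs[name]
        if f(assignment):
            total += weight
    return total


# Hypothetical partition: virtual inputs v1 and v2 carry logic probabilities
# stored when the previous partition's boundary BDDs were evaluated.
virtual_inputs = {"v1": 0.75, "v2": 0.5}
incoming_sensitization = 0.4  # stored with the pulse at the partition boundary
local = function_probability(lambda x: x["v1"] and x["v2"], virtual_inputs)
overall = incoming_sensitization * local
print(local, round(overall, 3))  # 0.375 0.15
```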



Fig. 1. Example of pulse propagation between partitions.

Lastly, when the simulator processes a circuit or partition output, the BDD functions for each pulse are solved to determine their probabilities. If the gate output is a primary output, the error probability for the gate is determined for each event $E_k$ using Equation 7. Once all other nodes have been visited, the MES is calculated using Equation 8. An overview of the whole simulation flow is given in Algorithm 1.

#### Algorithm 1 METS

```
Set Process Parameters
Parse Netlist
Organize Circuit Topologically
Partition Circuit into k Parts
for Each Partition do
    Extract Partition
    Set Virtual Inputs
    for Each Gate do
        Generate Pulse
        Generate Sensitization Function
        Propagate Pulse
        Calculate Convergence
        if Primary Output then
            Calculate P_{L,i}
        else if Edge of Partition then
            Solve BDD Probability
        end if
    end for
end for
```

## IV. RESULTS

The METS simulator was tested on the ISCAS 85 combinational benchmark circuits. METS was implemented in C++ and run on an eight-core i7 with 16 GB of RAM. The transistor lookup tables were characterized in HSPICE using the 32 nm Predictive Technology Model (PTM) library [9], with $V_{dd}$ set to 1.05 V. The unit gate capacitance was set to a constant 2 fF to model the loading effects of the gates and interconnects. To calculate the MES, each circuit was operated at 1 GHz, and a pulse with a collected charge of 15 fC was injected using the current equation in [10], with $\tau$ set to $32 \times 10^{-15}$. Each output was assumed to be connected to a flip-flop implemented in the same process library, with a setup time ($t_{setup}$) of 22 ps and a hold time ($t_{hold}$) of -7 ps in accordance with [5]. For all simulations, the probability of a MET was set to 10%. All METs were injected into a neighboring gate based on the netlist.

First, the accuracy of METS without partitioning is compared with Monte Carlo simulation using the approximation method in [8] on c17 and c880, to confirm that the probabilistic functions exactly compute the logical masking effect. Larger circuits were not tested because they lead to long simulation times and to memory blowup for METS without partitioning. As Table I shows, METS matches the Monte Carlo MES (and therefore the SER) exactly, with a speedup of about 4.4X on c17 and 4.8X on c880.

TABLE I. COMPARISON OF METS VS. MONTE CARLO

| Circuit | Simulator   | MES            | Run Time |
|---------|-------------|----------------|----------|
| c17     | Monte Carlo | $3.92x10^{-4}$ | 0.429    |
| c880    | Monte Carlo | $4.15x10^{-4}$ | 9110.35  |
| c17     | METS        | $3.92x10^{-4}$ | 0.0974   |
| c880    | METS        | $4.15x10^{-4}$ | 1909.37  |

The METS simulator provides an additional speedup through partitioning. Table II provides the MES and the simulation time for several ISCAS 85 circuits as the number of partitions varies. The results show that partitioning can reduce simulation time by upward of 20X compared to not using partitioning, at a cost in accuracy. They also show that increasing the number of partitions follows the law of diminishing returns. For example, on c880, two partitions reduce the simulation time by about 14X, while four partitions reduce it by about 18X; although this is still a large decrease in simulation time, the MES error relative to the unpartitioned result increases by about 4X. Additionally, on c17 partitioning actually increases the simulation time, because the time to simulate the circuit is less than the overhead incurred by the partitioning algorithm. Furthermore, on c880 the simulator with four partitions is about 85X faster than Monte Carlo simulation (9110.35 vs. 107.72), and extrapolating the c880 data suggests a speedup of up to 90X. Since the electrical masking model in [8] has a speedup of 15X over HSPICE, this indicates an overall speedup of up to 1350X (90X × 15X).

As the results show, the partition size that gives a good trade-off between MES error and simulation time can be determined. In the current implementation, each partition was balanced based on the number of fan-in pins of its gates, on the assumption that gates with more fan-in pins have higher computational complexity due to more convergence cases and larger BDDs. To provide some direction on the ideal partition size, the average number of fan-in pins per partition is listed in Table II. For circuits c880 and c1355, the best trade-off between error and simulation time occurred at a partition size of roughly three hundred fan-in pins. On c1355, for example, the MES error relative to the single-partition (ideal) result is $0.42 \times 10^{-4}$ for two partitions and $0.47 \times 10^{-4}$ for four partitions. While the difference in error is very small, the simulation time is roughly halved when four partitions are used instead of two.

TABLE II. METS PERFORMANCE VS NUMBER OF PARTITIONS

| Circuit | Parts | MES            | Run Time | Partition Size (fan-in pins) |
|---------|-------|----------------|----------|----------------|
| c17     | 1     | $3.92x10^{-4}$ | 0.0974   | 12             |
| c17     | 2     | $3.83x10^{-4}$ | 0.1377   | 6              |
| c880    | 1     | $4.15x10^{-4}$ | 1909.37  | 729            |
| c880    | 2     | $3.13x10^{-4}$ | 139.54   | 350            |
| c880    | 4     | $8.37x10^{-4}$ | 107.72   | 180            |
| c1355   | 1     | $2.38x10^{-4}$ | 3685.00  | 1064           |
| c1355   | 2     | $1.96x10^{-4}$ | 459.00   | 536            |
| c1355   | 4     | $2.85x10^{-4}$ | 242.60   | 270            |
| c1355   | 8     | $3.67x10^{-4}$ | 167.961  | 135            |
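The speedup and error figures discussed in this section can be recomputed directly from the run times and MES values in Tables I and II. A short Python check (run times in the units of the tables):

```python
# Run times transcribed from Tables I and II (same units as the tables).
MONTE_CARLO = {"c17": 0.429, "c880": 9110.35}
METS = {
    "c17": {1: 0.0974, 2: 0.1377},
    "c880": {1: 1909.37, 2: 139.54, 4: 107.72},
    "c1355": {1: 3685.00, 2: 459.00, 4: 242.60, 8: 167.961},
}
# c1355 MES values from Table II, keyed by number of partitions.
MES_C1355 = {1: 2.38e-4, 2: 1.96e-4, 4: 2.85e-4, 8: 3.67e-4}

# Speedup of each partitioned run over the unpartitioned (1-part) run.
partition_speedup = {
    c: {k: runs[1] / t for k, t in runs.items() if k > 1} for c, runs in METS.items()
}
# Speedup of unpartitioned METS, and of the fastest METS run, over Monte Carlo.
mc_speedup_1part = {c: MONTE_CARLO[c] / METS[c][1] for c in MONTE_CARLO}
mc_speedup_best = {c: MONTE_CARLO[c] / min(METS[c].values()) for c in MONTE_CARLO}
# Absolute MES deviation from the unpartitioned result.
mes_error = {k: abs(m - MES_C1355[1]) for k, m in MES_C1355.items() if k > 1}

for c, s in partition_speedup.items():
    print(c, {k: round(v, 1) for k, v in s.items()})
print({c: round(v, 1) for c, v in mc_speedup_1part.items()})
print({c: round(v, 1) for c, v in mc_speedup_best.items()})
print({k: f"{e:.2e}" for k, e in mes_error.items()})
```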

## V. CONCLUSION

In this paper, the METS simulator was proposed. First, probabilistic equations were provided that allow accurate consideration of the propagation effects of METs. In addition to the accurate approximation of the logical masking effect, METS uses an electrical masking model with SPICE-like accuracy. The results showed that the tool provides a good trade-off between simulation time and accuracy due to the use of partitioning and an accurate masking model. Additionally, partitioning was shown to reduce the simulation time by more than 20X compared to not using partitioning, and by up to 90X compared to Monte Carlo simulation.

## REFERENCES

- [1] C. M. Fiduccia and R. M. Mattheyses. A linear-time heuristic for improving network partitions. In *Proceedings of the 19th Design Automation Conference*, DAC '82, pages 175–181, Piscataway, NJ, USA, 1982. IEEE Press.
- [2] S. Gangadhar and S. Tragoudas. An analytical method for estimating SET propagation. In *29th VLSI Test Symposium*, pages 197–202, May 2011.
- [3] N. Miskov-Zivanov and D. Marculescu. MARS-C: modeling and reduction of soft errors in combinational circuits. In *2006 43rd ACM/IEEE Design Automation Conference*, pages 767–772, July 2006.
- [4] N. Miskov-Zivanov and D. Marculescu. Multiple transient faults in combinational and sequential circuits: A systematic approach. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 29(10):1614–1627, Oct 2010.
- [5] C. Nunes, P. F. Butzen, A. I. Reis, and R. P. Ribas. A methodology to evaluate the aging impact on flip-flops performance. In *2013 26th Symposium on Integrated Circuits and Systems Design (SBCCI)*, pages 1–6, Sept 2013.
- [6] M. Omana, G. Papasso, D. Rossi, and C. Metra. A model for transient fault propagation in combinatorial logic. In *9th IEEE On-Line Testing Symposium (IOLTS 2003)*, pages 111–115, July 2003.
- [7] F. Wang and Y. Xie. Soft error rate analysis for combinational logic using an accurate electrical masking model. *IEEE Transactions on Dependable and Secure Computing*, 8(1):137–146, Jan 2011.
- [8] A. Watkins and S. Tragoudas. An enhanced analytical electrical masking model for multiple event transients. In *2016 International Great Lakes Symposium on VLSI (GLSVLSI)*, pages 369–372, May 2016.
- [9] W. Zhao and Y. Cao. Predictive technology model for nano-CMOS design exploration. *J. Emerg. Technol. Comput. Syst.*, 3(1), April 2007.
- [10] J. F. Ziegler. Terrestrial cosmic rays. *IBM Journal of Research and Development*, 40(1):19–39, Jan 1996.