# ECE 538: VLSI System Testing

Assignment 4

Alexander Zapata April 19, 2019

### **Duke Community Standard**

By submitting this LaTeX document, I affirm that

1. I have adhered to the Duke Community Standard in completing this assignment.

### Problem 1 Path Delay and Small Delay Defect Testing using Synopsys TetraMax:

#### a. Path Delay Faults

(i)

| Number of Critical Paths | 50     | 100    | 150    | 200    | 250    | 300    |
|--------------------------|--------|--------|--------|--------|--------|--------|
| Total Faults             | 50     | 100    | 150    | 200    | 250    | 300    |
| Detected                 | 44     | 82     | 123    | 128    | 129    | 128    |
| Test Coverage            | 88.00% | 82.00% | 82.00% | 64.00% | 51.60% | 42.67% |
| Patterns                 | 9      | 18     | 19     | 19     | 19     | 18     |
| CPU Time                 | 0.02   | 0.02   | 0.03   | 0.02   | 0.03   | 0.05   |

Table 1: Results for path-delay faults, 0.15ns clock period

(ii)

| Number of Critical Paths | 50     | 100    | 150    | 200    | 250    | 300    |
|--------------------------|--------|--------|--------|--------|--------|--------|
| Total Faults             | 50     | 100    | 150    | 200    | 250    | 300    |
| Detected                 | 44     | 82     | 123    | 128    | 129    | 128    |
| Test Coverage            | 88.00% | 82.00% | 82.00% | 64.00% | 51.60% | 42.67% |
| Patterns                 | 9      | 18     | 19     | 18     | 19     | 18     |
| CPU Time                 | 0.01   | 0.03   | 0.03   | 0.03   | 0.03   | 0.04   |

Table 2: Results for path-delay faults, 0.10ns clock period

The test coverage for the 0.15ns and 0.10ns path-delay fault simulations was exactly the same. Identical total faults, detected faults, and test coverage mean that, from one clock period to the next, no additional delay faults were detected on the critical paths tested (i.e., the paths not detected in the 0.15ns simulation had enough slack to also go undetected in the 0.10ns simulation). The only difference in pattern count was in the 200-critical-path simulation, which needed one more pattern with the 0.15ns clock than with the 0.10ns clock. This suggests that with the faster clock, fewer patterns were needed to sensitize delays long enough to detect. The CPU times were roughly the same for each simulation.
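The Test Coverage row in Tables 1 and 2 is reproduced by dividing Detected by Total Faults, which makes the trend easy to see: the detected count levels off near 128 once 200 or more paths are targeted, so coverage falls as more paths are added. A minimal sketch using the numbers from Table 1 (the variable names are illustrative):

```python
# Total and detected path-delay faults from Table 1 (identical in Table 2).
total = [50, 100, 150, 200, 250, 300]
detected = [44, 82, 123, 128, 129, 128]

# Test coverage as a percentage, rounded to two decimals like the tables.
coverage = [round(100 * d / t, 2) for d, t in zip(detected, total)]
print(coverage)
```

This prints the same percentages as the Test Coverage row.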

#### b. Small Delay Defects

(i)

| Slack               | 10%             | 15%              | 20%             | 25%              | 30%              |
|---------------------|-----------------|------------------|-----------------|------------------|------------------|
| Total Faults        | 4094            | 4094             | 4094            | 4094             | 4094             |
| Detected            | 3994            | 3994             | 3994            | 3994             | 3994             |
| Delay Effectiveness | 0.11 ns (55.17%) | 0.165 ns (30.75%) | 0.22 ns (50.08%) | 0.275 ns (49.82%) | 0.33 ns (53.68%) |
| SDQL                | 6289088.50      | 6126893.50       | 5438607.50      | 4897204.50       | 4477742.50       |
| CPU Time            | 0.07            | 0.07             | 0.07            | 0.08             | 0.08             |

Table 3: Results for small delay defects, 1.1ns clock period

(ii)

| Slack               | 10%             | 15%             | 20%             | 25%             | 30%             |
|---------------------|-----------------|-----------------|-----------------|-----------------|-----------------|
| Total Faults        | 4094            | 4094            | 4094            | 4094            | 4094            |
| Detected            | 3994            | 3994            | 3994            | 3994            | 3994            |
| Delay Effectiveness | 0.12 ns (6.64%) | 0.18 ns (49.23%) | 0.24 ns (25.95%) | 0.30 ns (44.24%) | 0.36 ns (49.65%) |
| SDQL                | 5756102.00      | 5206344.50      | 5301291.50      | 4501716.00      | 4122875.25      |
| CPU Time            | 0.08            | 0.06            | 0.07            | 0.07            | 0.08            |

Table 4: Results for small delay defects, 1.2ns clock period

The delay effectiveness for almost all simulations using the 1.2ns clock period (the sole exception being 15% slack) was lower than for the 1.1ns clock period simulations at the same slack percentage, in some cases by a wide margin (e.g., 6.64% versus 55.17% at 10% slack). This means that fewer of the small delay defects could be detected in the circuit with the longer clock period. This is potentially because, with a longer clock period (but the same percentage slack), the small delay defects lie on longer paths that must be sensitized, so fewer tests are able to do so. The SDQL values for the 1.1ns simulations are higher than those of their 1.2ns counterparts at every slack percentage. This is most likely because of the delay-defect distribution (i.e., there are more small-delay defects with little slack when the clock period is shorter). The CPU times stayed roughly the same.
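The nanosecond value reported next to each delay-effectiveness percentage equals the slack percentage multiplied by the clock period, so the same percentage corresponds to a larger absolute slack threshold at 1.2ns than at 1.1ns. A small sketch of that arithmetic (the function name `slack_thresholds` is hypothetical):

```python
def slack_thresholds(period_ns, slack_percentages=(10, 15, 20, 25, 30)):
    """Absolute slack threshold (ns) for each slack percentage of a clock period."""
    return [round(p / 100 * period_ns, 3) for p in slack_percentages]

for period_ns in (1.1, 1.2):
    print(period_ns, slack_thresholds(period_ns))
```

For 1.1ns this gives 0.11, 0.165, 0.22, 0.275 and 0.33ns (Table 3), and for 1.2ns it gives 0.12, 0.18, 0.24, 0.30 and 0.36ns (Table 4).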

(iii)

| Slack               | 10%             | 15%             | 20%              | 25%             | 30%              |
|---------------------|-----------------|-----------------|------------------|-----------------|------------------|
| Total Faults        | 4094            | 4094            | 4094             | 4094            | 4094             |
| Detected            | 3994            | 3994            | 3994             | 3994            | 3994             |
| Delay Effectiveness | 0.10 ns (44.24%) | 0.15 ns (45.63%) | 0.20 ns (50.85%) | 0.25 ns (51.03%) | 0.30 ns (58.95%) |
| SDQL                | 6445672.00      | 5683769.50      | 5567790.00       | 5023122.00      | 4668992.50       |
| CPU Time            | 0.07            | 0.08            | 0.08             | 0.08            | 0.08             |

Table 5: Results for small delay defects, 1.0ns clock period

The delay effectiveness is generally higher (with the exception of the 10% slack simulation) for the 1.0ns clock period simulations than for the 1.1ns clock period simulations. This most likely means that the detected defects lay on shorter paths (within the slack percentage) because the allowed slack was smaller, so more of the shorter paths counted towards delay effectiveness in the 1.0ns simulations. Which of the 1.1ns and 1.0ns simulations had the higher SDQL varied with slack percentage: the 1.0ns SDQL was higher at every slack level except 15%. Delay effectiveness and SDQL were generally lower for the 1.2ns simulations than for the 1.0ns simulations (presumably for reasons similar to those discussed in part ii).
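The comparisons above can be checked directly against Tables 3, 4 and 5. The sketch below (with a hypothetical helper `highest_period`) reports, for each slack level, which clock period produced the largest value of a given metric:

```python
SLACK = [10, 15, 20, 25, 30]

# Delay effectiveness (%) and SDQL per clock period (ns), from Tables 3-5.
delay_effectiveness = {
    1.0: [44.24, 45.63, 50.85, 51.03, 58.95],
    1.1: [55.17, 30.75, 50.08, 49.82, 53.68],
    1.2: [6.64, 49.23, 25.95, 44.24, 49.65],
}
sdql = {
    1.0: [6445672.00, 5683769.50, 5567790.00, 5023122.00, 4668992.50],
    1.1: [6289088.50, 6126893.50, 5438607.50, 4897204.50, 4477742.50],
    1.2: [5756102.00, 5206344.50, 5301291.50, 4501716.00, 4122875.25],
}

def highest_period(table):
    """Map each slack level to the clock period with the largest metric value."""
    return {s: max(table, key=lambda p: table[p][k]) for k, s in enumerate(SLACK)}

print(highest_period(delay_effectiveness))
print(highest_period(sdql))
```

Delay effectiveness peaks at 1.0ns for 20% to 30% slack, at 1.1ns for 10% and at 1.2ns for 15%; SDQL is highest at 1.0ns except at 15% slack, where 1.1ns is highest.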

### Problem 2 Response compaction using LFSRs:

#### (c) Simulation Code and Results:

```python
#compact(flops, is_tapped, input_polynomial) simulates a Type-2 (internal-XOR) LFSR
#used for response compaction, given the input parameters:
#    flops - the initial values of the flip-flops, x^0 first.
#    is_tapped - 1 if the input of flop j is XOR'ed with the feedback from the last
#        flop, indexed x^0 first; the x^0 flop always receives the feedback, so
#        is_tapped[0] is not checked.
#    input_polynomial - the input bits, highest-order coefficient first, that are
#        XOR'ed into the x^0 flop in order.
def compact(flops, is_tapped, input_polynomial):
    #Iterating through the input_polynomial bits.
    for i in range(len(input_polynomial)):
        #Setting temporary flops so that every flop updates from the old values.
        temp_flops = flops.copy()
        #Update each flop depending on its tap value and the value of the previous flop.
        for j in range(1, len(flops)):
            if is_tapped[j] == 1:
                temp_flops[j] = flops[-1] ^ flops[j - 1]
            else:
                temp_flops[j] = flops[j - 1]
        #The x^0 flop receives the input bit XOR'ed with the feedback.
        temp_flops[0] = input_polynomial[i] ^ flops[-1]
        #Set current values to updated values.
        flops = temp_flops
        #Printing result.
        print("Timestep Result " + str(i + 1) + ": " + str(flops))
    #Printing final flop values.
    for i in range(len(flops)):
        print("x^{}: {}".format(i, flops[i]))

#Calling the function with the values for the homework.
compact([0, 0, 0, 0], [1, 0, 0, 1], [1, 0, 1, 0, 1, 1, 1])
```

Listing 1: Python code used to simulate LFSR compaction

```
Alexanders-MacBook-Pro-10:ECE 538 Zapata$ python3 compaction.py
Timestep Result 1: [1, 0, 0, 0]
Timestep Result 2: [0, 1, 0, 0]
Timestep Result 3: [1, 0, 1, 0]
Timestep Result 4: [0, 1, 0, 1]
Timestep Result 5: [0, 0, 1, 1]
Timestep Result 6: [0, 0, 0, 0]
Timestep Result 7: [1, 0, 0, 0]
x^0: 1
x^1: 0
x^2: 0
x^3: 0
```

As can be seen in Figure 1, the LFSR compaction left a 1 in the $x^0$ register and a 0 in every other register (i.e., the signature is 1). This is the same result that was found by hand on the attached sheet.
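The hand result can also be cross-checked by polynomial division over GF(2): starting from all zeros, a Type-2 LFSR ends up holding the remainder of the input polynomial divided by its characteristic polynomial. The sketch below assumes the taps `[1, 0, 0, 1]` correspond to $f(x) = x^4 + x^3 + 1$ and that the first input bit is the highest-order coefficient, so the input stream is $x^6 + x^4 + x^2 + x + 1$; `gf2_mod` and `bits_to_poly` are hypothetical helpers, with bit $k$ of each integer holding the coefficient of $x^k$.

```python
def gf2_mod(dividend: int, divisor: int) -> int:
    """Remainder of dividend / divisor over GF(2); bit k is the coefficient of x^k."""
    divisor_degree = divisor.bit_length() - 1
    # Cancel the leading term until the remainder's degree drops below the divisor's.
    while dividend.bit_length() > divisor_degree:
        shift = dividend.bit_length() - 1 - divisor_degree
        dividend ^= divisor << shift
    return dividend

def bits_to_poly(bits):
    """Pack a bit list into an integer, first bit = highest-order coefficient."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

input_bits = [1, 0, 1, 0, 1, 1, 1]  # x^6 + x^4 + x^2 + x + 1
char_poly = 0b11001                 # x^4 + x^3 + 1 (assumed from the taps)
remainder = gf2_mod(bits_to_poly(input_bits), char_poly)
print("remainder = {:04b}".format(remainder))  # leftmost digit is x^3
```

The remainder is 1 (printed as `0001`), which matches the final register contents of 1 in $x^0$ and 0 elsewhere.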

### Problem 3 SOC Test Infrastructure Design:

To solve this problem, I used a Best Fit Decreasing (BFD) heuristic (as discussed in class). The results and code are below.

|                             | Wrapper SC 1                                   | Wrapper SC 2                                   | Wrapper SC 3                 | Wrapper SC 4                                  |
|-----------------------------|------------------------------------------------|------------------------------------------------|------------------------------|-----------------------------------------------|
| Wrapper<br>Internal SCs     | one 12-bit chain<br>one 6-bit chain            | one 12-bit chain<br>one 6-bit chain            | two 8-bit chains             | one 8-bit chain<br>two 6-bit chains           |
| Wrapper<br>Input Cells      | 3                                              | 2                                              | 4                            | 0                                             |
| Wrapper<br>Output Cells     | 2                                              | 3                                              | 3                            | 3                                             |
| Scan-in, Scan-out<br>Length | 23                                             | 23                                             | 23                           | 23                                            |

Table 6: Wrapper design generated for embedded core C, TAM width of 4

```python
import math

#Assigns each item in objects_to_add (scan chains or single wrapper cells) to the
#wrapper chain that leaves the smallest non-negative gap below the current longest
#chain after the addition; if no chain fits, the item goes to the shortest chain.
def minimize_function(wrapper_chains, objects_to_add, activity_monitor, name):
    for i in range(len(objects_to_add)):
        min_chain = wrapper_chains.index(min(wrapper_chains))
        max_chain = wrapper_chains.index(max(wrapper_chains))
        minimum_chain_val = math.inf
        minimum_chain_num = -1
        for j in range(len(wrapper_chains)):
            value_to_minimize = wrapper_chains[max_chain] - (objects_to_add[i] + wrapper_chains[j])
            if (value_to_minimize >= 0) and (value_to_minimize < minimum_chain_val):
                minimum_chain_val = value_to_minimize
                minimum_chain_num = j
        #No chain fits under the longest one, so extend the shortest chain.
        if minimum_chain_num == -1:
            minimum_chain_num = min_chain
        wrapper_chains[minimum_chain_num] += objects_to_add[i]
        activity_monitor[minimum_chain_num].append("{}: {}".format(name, objects_to_add[i]))

def design_wrapper(tam_width, primary_input_num, primary_output_num, internal_scan):
    wrapper_chains = [0] * tam_width
    activity_monitor = [["Wrapper-Chain {}".format(i + 1)] for i in range(tam_width)]
    #Step 1: internal scan chains, longest first.
    internal_scan.sort(reverse = True)
    minimize_function(wrapper_chains, internal_scan, activity_monitor, "sc")
    #Step 2: wrapper input cells (one bit each).
    primary_inputs = [1] * primary_input_num
    minimize_function(wrapper_chains, primary_inputs, activity_monitor, "pi")
    #Step 3: wrapper output cells (one bit each).
    primary_outputs = [1] * primary_output_num
    minimize_function(wrapper_chains, primary_outputs, activity_monitor, "po")
    activity_monitor.append("Final wrapper chains: {}".format(wrapper_chains))
    print(*activity_monitor, sep = "\n")
    return wrapper_chains

#Core C: TAM width 4, 9 inputs, 11 outputs, nine internal scan chains.
design_wrapper(4, 9, 11, [12, 12, 8, 8, 8, 6, 6, 6, 6])
```

Listing 2: Python code used to generate wrapper design.

### Problem 4 SOC Testing:

(a)

The code used for Problem 3 (Listing 2) was used to design wrappers of the required widths (w = 2, 3, 4, 5, 6); the results are tabulated below.

|                             | Wrapper SC 1                                   | Wrapper SC 2                                   |
|-----------------------------|------------------------------------------------|------------------------------------------------|
| Wrapper<br>Internal SCs     | one 12-bit chain<br>one 8-bit chain            | one 12-bit chain<br>one 8-bit chain            |
| Wrapper<br>Input Cells      | 8                                              | 8                                              |
| Wrapper<br>Output Cells     | 4                                              | 4                                              |
| Scan-in, Scan-out<br>Length | 32                                             | 32                                             |

Table 7: Wrapper design for width of 2

|                             | Wrapper SC 1       | Wrapper SC 2       | Wrapper SC 3      |
|-----------------------------|--------------------|--------------------|-------------------|
| Wrapper<br>Internal SCs     | one 12-bit chain   | one 12-bit chain   | two 8-bit chains  |
| Wrapper<br>Input Cells      | 7                  | 7                  | 2                 |
| Wrapper<br>Output Cells     | 3                  | 2                  | 3                 |
| Scan-in, Scan-out<br>Length | 22                 | 21                 | 21                |

Table 8: Wrapper design for width of 3

|                             | Wrapper SC 1       | Wrapper SC 2       | Wrapper SC 3      | Wrapper SC 4      |
|-----------------------------|--------------------|--------------------|-------------------|-------------------|
| Wrapper<br>Internal SCs     | one 12-bit chain   | one 12-bit chain   | one 8-bit chain   | one 8-bit chain   |
| Wrapper<br>Input Cells      | 2                  | 2                  | 6                 | 6                 |
| Wrapper<br>Output Cells     | 2                  | 2                  | 2                 | 2                 |
| Scan-in, Scan-out<br>Length | 16                 | 16                 | 16                | 16                |

Table 9: Wrapper design for width of 4

|                             | Wrapper SC 1       | Wrapper SC 2       | Wrapper SC 3      | Wrapper SC 4      | Wrapper SC 5 |
|-----------------------------|--------------------|--------------------|-------------------|-------------------|--------------|
| Wrapper<br>Internal SCs     | one 12-bit chain   | one 12-bit chain   | one 8-bit chain   | one 8-bit chain   | None         |
| Wrapper<br>Input Cells      | 0                  | 0                  | 4                 | 4                 | 8            |
| Wrapper<br>Output Cells     | 1                  | 1                  | 1                 | 1                 | 4            |
| Scan-in, Scan-out<br>Length | 13                 | 13                 | 13                | 13                | 12           |

Table 10: Wrapper design for width of 5

|                             | Wrapper SC 1       | Wrapper SC 2       | Wrapper SC 3      | Wrapper SC 4      | Wrapper SC 5 | Wrapper SC 6 |
|-----------------------------|--------------------|--------------------|-------------------|-------------------|--------------|--------------|
| Wrapper<br>Internal SCs     | one 12-bit chain   | one 12-bit chain   | one 8-bit chain   | one 8-bit chain   | None         | None         |
| Wrapper<br>Input Cells      | 0                  | 0                  | 4                 | 4                 | 8            | 0            |
| Wrapper<br>Output Cells     | 0                  | 0                  | 0                 | 0                 | 4            | 4            |
| Scan-in, Scan-out<br>Length | 12                 | 12                 | 12                | 12                | 12           | 4            |

Table 11: Wrapper design for width of 6
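As a sanity check on the wrapper tables, a simple lower bound on the longest wrapper chain can be computed: it can be no shorter than the longest internal scan chain, and no shorter than the total cell count (internal chains plus input and output cells, summed the way Listing 2 sums them) divided by the width and rounded up. This sketch is not part of the original solution; `chain_length_lower_bound` is a hypothetical name, and the Problem 4 core parameters (internal chains of 12, 12, 8 and 8 bits, 16 inputs, 8 outputs) are an assumption inferred by adding up the cells in Tables 7 through 11.

```python
import math

def chain_length_lower_bound(internal_chains, num_inputs, num_outputs, width):
    """Lower bound on the longest wrapper chain when internal scan chains,
    input cells and output cells all add to the same per-chain total
    (the quantity balanced by Listing 2)."""
    total_cells = sum(internal_chains) + num_inputs + num_outputs
    return max(max(internal_chains), math.ceil(total_cells / width))

# Problem 3, core C (Table 6), TAM width 4.
print(chain_length_lower_bound([12, 12, 8, 8, 8, 6, 6, 6, 6], 9, 11, 4))

# Problem 4 core, parameters inferred from Tables 7-11 (assumption).
for w in range(2, 7):
    print(w, chain_length_lower_bound([12, 12, 8, 8], 16, 8, w))
```

Under these assumptions the bound is 23 for core C and 32, 22, 16, 13 and 12 for widths 2 through 6, equal to the longest chain in each table, so the heuristic meets this bound in every case shown.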

(b)

Knowing the acceptable wrapper widths, and the fact that all 8 embedded cores in the SOC are identical, greatly simplified this problem. The following code was used to maximize the minimum wrapper width for this TAM design (minimizing the scan-in/scan-out length and therefore the test time).

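```python
#Assigns TAM lines to num_cores identical cores: every core starts at the largest
#wrapper width that fits within total_width, then the leftover lines are handed
#out one at a time to the narrowest core.
def tam_designer(total_width, wrapper_widths, num_cores):
    wrapper_widths.sort(reverse = True)
    tam_routes = [0] * num_cores
    for i in range(len(wrapper_widths)):
        #Reassign while nothing is assigned yet or the budget is exceeded.
        if (sum(tam_routes) == 0) or (sum(tam_routes) > total_width):
            for j in range(num_cores):
                tam_routes[j] = wrapper_widths[i]
    #Distribute any remaining TAM lines.
    while sum(tam_routes) < total_width:
        min_index = tam_routes.index(min(tam_routes))
        tam_routes[min_index] += 1
    return tam_routes

print(tam_designer(36, [2, 3, 4, 5, 6], 8))
```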

Listing 3: Python code used to minimize TAM test time for the given SOC design

In the resulting TAM design, each embedded core wrapper is given its own TAM lines; however, the TAM width is not the same for every core because of the 36-bit TAM width constraint. No wrappers share TAM lines in this case, as preempting tests would result in a longer maximum test time for the given widths. Four of the wrappers will have a TAM width of 5 and the other four will have a TAM width of 4 (the program prints [5, 5, 5, 5, 4, 4, 4, 4]). This ensures the maximum scan-in/scan-out length is 16 clock cycles, corresponding to the width-4 BFD wrapper design in part (a) and assuming one clock cycle per single-bit shift-in/shift-out.
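The 16-cycle maximum can be checked against the longest wrapper chain per width in Tables 7 through 11. Because 8 × 5 = 40 exceeds 36, at least one core must receive a width of 4 or less, so no allocation of the tabulated widths brings the maximum below 16. This is a hedged sketch, assuming all cores are tested in parallel on dedicated lines and that test time scales with the longest chain; `MAX_CHAIN_LENGTH`, `longest_shift` and `best_uniform_width` are illustrative names, not part of the original code.

```python
# Longest wrapper chain for each TAM width, copied from Tables 7-11.
MAX_CHAIN_LENGTH = {2: 32, 3: 22, 4: 16, 5: 13, 6: 12}

def longest_shift(allocation):
    """Longest scan-in/scan-out length over all cores for a TAM width allocation."""
    return max(MAX_CHAIN_LENGTH[w] for w in allocation)

def best_uniform_width(total_width, num_cores):
    """Largest tabulated width that every core can receive within the TAM budget."""
    return max(w for w in MAX_CHAIN_LENGTH if w * num_cores <= total_width)

allocation = [5, 5, 5, 5, 4, 4, 4, 4]  # result printed by Listing 3
print(sum(allocation), longest_shift(allocation))
uniform = best_uniform_width(36, 8)
print(uniform, MAX_CHAIN_LENGTH[uniform])
```

Both lines report a maximum of 16 cycles (`36 16` and `4 16`): the four extra lines shorten four of the cores to 13 cycles, but the width-4 cores still set the overall maximum.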