# **INTRANEX**

INTRANEX is a **programmable interconnect network** that accepts an N-bit input W and produces an N-bit output Z. The interconnect can be programmed to realize any mapping from W to Z.

University of Cincinnati - EECE 6080, Fall 2013

Max Thrun 973 919 6593 max.thrun@gmail.com (Coordinator)

Xiaohu Qi 513 652 2075 qixiaohuihaha@gmail.com

## Contents

| 1 |     | Progress Report 1                                                                                      | 8  |
|---|-----|--------------------------------------------------------------------------------------------------------|----|
|   | 1.1 | Pinout Diagram                                                                                         | 9  |
|   | 1.2 | Chip Functionality                                                                                     | 11 |
|   |     | 1.2.1 Configuring the Programmable Interconnect Network                                                | 11 |
|   |     | 1.2.2 Loading and reading a value                                                                      | 11 |
|   |     | 1.2.3 Test Mode                                                                                        | 12 |
|   | 1.3 | Design Decisions                                                                                       | 12 |
|   | 1.4 | Block Diagrams                                                                                         | 13 |
|   |     | 1.4.1 Top Level                                                                                        | 13 |
|   |     | 1.4.2 Top Level With Test Mode                                                                         | 13 |
|   |     | 1.4.3 Top Level Bit Sliced                                                                             | 14 |
|   |     | 1.4.4 Parallel Load Shift Register                                                                     | 15 |
|   |     | 1.4.5 Programmable Interconnect Network                                                                | 16 |
|   | 1.5 | VHDL Models                                                                                            | 17 |
|   |     | 1.5.1 Top Level                                                                                        | 17 |
|   |     | 1.5.2 PIN                                                                                              | 18 |
|   |     | 1.5.3 PIN Slice                                                                                        | 19 |
|   |     | 1.5.4 Shifter                                                                                          | 20 |
|   |     | 1.5.5 Shifter Slice                                                                                    | 21 |
|   |     | 1.5.6 Gates                                                                                            | 22 |
|   | 1.6 | VHDL Test Benches                                                                                      | 23 |
|   |     | 1.6.1 Top Level Functional                                                                             | 23 |
|   |     | 1.6.2 Top Level Test Mode                                                                              | 25 |
|   |     | 1.6.3 PIN Slice                                                                                        | 26 |
|   |     | 1.6.4 Shifter Slice                                                                                    | 27 |
|   | 1.7 | VHDL Test Bench Results                                                                                | 28 |
|   |     | 1.7.1 Top Level Functional                                                                             | 28 |
|   |     | 1.7.2 Top Level Test Mode                                                                              | 28 |
|   | 1.8 | Work Division                                                                                          | 28 |

| 2        |      | Progress Report 2                                   | 29         |
|----------|------|-----------------------------------------------------|------------|
|          | 2.1  | Slice Layouts                                       | 30         |
|          |      | 2.1.1 PIN Slice Layout                              | 30         |
|          |      | 2.1.2 Shift Slice Layout                            | 31         |
|          | 2.2  | Slice IRSIM Results                                 | 32         |
|          |      | 2.2.1 PIN Slice IRSIM Results                       | 32         |
|          |      | 2.2.2 Shift Slice IRSIM Results                     | 34         |
|          | 2.3  | Slice Spice Results                                 | 36         |
|          |      | 2.3.1 PIN Slice Spice Results                       | 36         |
|          |      | 2.3.2 Shift Slice Spice Results                     | 38         |
|          | 2.4  | Gate Spice Results                                  | 40         |
|          |      | 2.4.1 DFFPOSX1 Spice Results                        | 40         |
|          |      | 2.4.2 AOI21X1 Spice Results                         | 41         |
|          |      | 2.4.3 MUX2X1 Spice Results                          | 42         |
|          |      | 2.4.4 INVX1 Spice Results                           | 43         |
|          |      | 2.4.5 Leaf Component Delay Summary                  | 43         |
|          | 2.5  | VHDL Models With Timing                             | 44         |
|          | 2.6  | VHDL Testbench Results With Timing                  | 45         |
|          |      | 2.6.1 VHDL Slice Testbench With Delays Waveform     | 45         |
|          |      | 2.6.2 VHDL Top Level Testbench With Delays Waveform | 45         |
|          | 2.7  | Final Simulation Comparison                         | 46         |
|          | 2.8  | Floor Plan                                          | 47         |
|          | 2.9  | Major Design Decisions                              | 48         |
|          | 2.10 | Work Division                                       | 48         |

# List of Figures

| 1.1  | Pinout Diagram                                                                                                                                                                                  | 9  |
|------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
| 1.2  | PIN Configuration                                                                                                                                                                               | 11 |
| 1.3  | Loading a value                                                                                                                                                                                 | 11 |
| 1.4  | Loading a value and reading the result                                                                                                                                                          | 11 |
| 1.5  | Enabling test mode and loading all DFFs                                                                                                                                                         | 12 |
| 1.6  | 3 INTRANEX Chain                                                                                                                                                                                | 12 |
| 1.7  | Top Level Block Diagram (3-Bit Configuration)                                                                                                                                                   | 13 |
| 1.8  | Top Level Block Diagram Showing Test Mode Logic (3-Bit Configuration)                                                                                                                           | 13 |
| 1.9  | Top Level Bit Sliced Block Diagram (3-Bit Configuration)                                                                                                                                        | 14 |
| 1.10 | Parallel Load Bit-Sliced Shifter Register (3-Bit Configuration)                                                                                                                                 | 15 |
| 1.11 | Parallel Load Shifter Register Bit-Slice                                                                                                                                                        | 15 |
| 1.12 | Bit-Sliced Programmable Interconnect Network (3-Bit Configuration)                                                                                                                              | 16 |
| 1.13 | Programmable Interconnect Network Bit-Slice                                                                                                                                                     | 16 |
| 1.14 | Top Level Generated RTL Diagram                                                                                                                                                                 | 17 |
| 1.15 | Pin Generated RTL Diagram                                                                                                                                                                       | 18 |
| 1.16 | Pin Slice Generated RTL Diagram                                                                                                                                                                 | 19 |
| 1.17 | Shifter Generated RTL Diagram                                                                                                                                                                   | 20 |
| 1.18 | Shifter Slice Generated RTL Diagram                                                                                                                                                             | 21 |
| 1.19 | Top Level Functional Test Bench Waveform                                                                                                                                                        | 28 |
| 1.20 | Top Level Test Mode Test Bench Waveform                                                                                                                                                         | 28 |
| 2.1  | PIN Slice Layout                                                                                                                                                                                | 30 |
| 2.2  | PIN Slice Layout Internal                                                                                                                                                                       | 30 |
| 2.3  | PIN Slice Layout                                                                                                                                                                                | 31 |
| 2.4  | PIN Slice Layout Internal                                                                                                                                                                       | 31 |
| 2.5  | PIN Slice IRSIM Functional Results                                                                                                                                                              | 32 |
| 2.6  | PIN Slice Critical Path                                                                                                                                                                         | 33 |
| 2.7  | PIN Slice IRSIM Critical Path Delay                                                                                                                                                             | 33 |
| 2.8  | Shift Slice IRSIM Functional Results                                                                                                                                                            | 34 |
| 2.9  | Shift Slice Critical Path                                                                                                                                                                       | 35 |

| 2.10 | Shift Slice IRSIM Critical Path Delay                                                                                                                                         | 35 |
|------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
| 2.11 | PIN Slice Spice Functional Results                                                                                                                                            | 36 |
| 2.12 | PIN Slice Spice Critical Path Delay                                                                                                                                           | 37 |
| 2.13 | Shift Slice Spice Functional Results                                                                                                                                          | 38 |
| 2.14 | Shift Slice Spice Critical Path Delay                                                                                                                                         | 39 |
| 2.15 | DFFPOSX1 Spice Results                                                                                                                                                        | 40 |
| 2.16 | AOI21X1 Spice Results                                                                                                                                                         | 41 |
| 2.17 | MUX2X1 Spice Results                                                                                                                                                          | 42 |
| 2.18 | INVX1 Spice Results                                                                                                                                                           | 43 |
| 2.19 | VHDL PIN Slice With Delays Waveform                                                                                                                                           | 45 |
| 2.20 | VHDL Shift Slice With Delays Waveform                                                                                                                                         | 45 |
| 2.21 | VHDL Top Level Functional Test Bench With Delays Waveform                                                                                                                     | 45 |
| 2.22 | VHDL Top Level Test Mode Test Bench With Delays Waveform                                                                                                                      | 45 |
| 2.23 | Floor Plan Diagram                                                                                                                                                            | 47 |

Progress Report 1 Page 5 of 48

## List of Tables

| 1.1  | Pin Descriptions                       | 10 |
|------|----------------------------------------|----|
| 1.2  | Task Assignment                        | 28 |
| 2.1  | PIN Slice IRSIM Critical Path Delays   | 33 |
| 2.2  | Shift Slice IRSIM Critical Path Delays | 35 |
| 2.3  | PIN Slice Spice Critical Path Delays   | 37 |
| 2.4  | Shift Slice Spice Critical Path Delays | 39 |
| 2.5  | DFFPOSX1 Delays                        | 40 |
| 2.6  | AOI21X1 Delays                         | 41 |
| 2.7  | MUX2X1 Delays                          | 42 |
| 2.8  | INVX1 Delays                           | 43 |
| 2.9  | Worst Case Delay Summary               | 43 |
| 2.10 | Critical Path Delay Comparison         | 46 |
| 2.11 | Task Assignment                        | 48 |

# Listings

| 1.1  | Top Level VHDL Module                       | 17 |
|------|---------------------------------------------|----|
| 1.2  | PIN VHDL Module                             | 18 |
| 1.3  | PIN Slice VHDL Module                       | 19 |
| 1.4  | Parallel Load Shifter VHDL Module           | 20 |
| 1.5  | Parallel Load Shifter Slice VHDL Module     | 21 |
| 1.6  | AOI21X1 VHDL Module                         | 22 |
| 1.7  | DFFPOSX1 VHDL Module                        | 22 |
| 1.8  | INVX1 VHDL Module                           | 22 |
| 1.9  | MUX2X1 VHDL Module                          | 22 |
| 1.10 | Top Level VHDL Test Bench                   | 23 |
| 1.11 | Python Vector Generator                     | 24 |
| 1.12 | Top Level Test Mode VHDL Test Bench         | 25 |
| 1.13 | PIN Slice VHDL Test Bench                   | 26 |
| 1.14 | Parallel Load Shifter Slice VHDL Test Bench | 27 |
| 2.1  | Python PIN Slice IRSIM CMD File Generator   | 32 |
| 2.2  | Python Shift Slice IRSIM CMD File Generator | 34 |
| 2.3  | Python PIN Slice Spice File Generator       | 36 |
| 2.4  | Python Shift Slice Spice File Generator     | 38 |
| 2.5  | Python DFFPOSX1 Spice File Generator        | 40 |
| 2.6  | Python AOI21X1 Spice File Generator         | 41 |
| 2.7  | Python MUX2X1 Spice File Generator          | 42 |
| 2.8  | Python INVX1 Spice File Generator           | 43 |
| 2.9  | AOI21X1 VHDL Module With Delay              | 44 |
| 2.10 | DFFPOSX1 VHDL Module With Delay             | 44 |
| 2.11 | INVX1 VHDL Module With Delay                | 44 |
| 2.12 | MUX2X1 VHDL Module With Delay               | 44 |

## Chapter 1

# Progress Report 1

## 1.1 Pinout Diagram

The pinout diagram for INTRANEX is shown below in Figure 1.1. Pins that are currently unused will be assigned to various internal logic signals once the floorplan is finalized. Note the symmetry of the core functionality. This was done so that multiple INTRANEX chips can be directly chained together with minimal routing effort during PCB layout.



Figure 1.1: Pinout Diagram


The table below lists each pin with its name, type, and a brief description of its functionality. Type is one of I (Input), O (Output), or P (Power).

| Pin # | Name  | Type | Description                    |
|-------|-------|------|--------------------------------|
| 1     | -     | -    | -                              |
| 2     | -     | -    | -                              |
| 3     | -     | -    | -                              |
| 4     | -     | -    | -                              |
| 5     | PSI   | I    | PIN serial input               |
| 6     | PCLKI | I    | PIN clock input                |
| 7     | TESTI | I    | Test Mode enable input         |
| 8     | SI    | I    | Serial input                   |
| 9     | SCLKI | I    | Serial clock input             |
| 10    | LDI   | I    | Parallel load input            |
| 11    | TII   | I    | Test inverter input            |
| 12    | TIO   | O    | Test inverter output           |
| 13    | TFD   | I    | Test flip-flop D input         |
| 14    | TFC   | I    | Test flip-flop clock input     |
| 15    | TFQ   | O    | Test flip-flop Q output        |
| 16    | GND   | P    | -                              |
| 17    | -     | -    | -                              |
| 18    | -     | -    | -                              |
| 19    | -     | -    | -                              |
| 20    | -     | -    | -                              |
| 21    | LDO   | O    | Parallel load output           |
| 22    | SCLKO | O    | Serial clock output            |
| 23    | SO    | O    | Serial output                  |
| 24    | TESTO | O    | Test Mode enable output        |
| 25    | PCLKO | O    | PIN clock output               |
| 26    | PSO   | O    | PIN serial output              |
| 27    | -     | -    | -                              |
| 28    | -     | -    | -                              |
| 29    | -     | -    | -                              |
| 30    | TSI   | I    | Test shift slice serial input  |
| 31    | TSO   | O    | Test shift slice serial output |
| 32    | TSCI  | I    | Test shift slice clock input   |
| 33    | TSCO  | O    | Test shift slice clock output  |
| 34    | TSLI  | I    | Test shift slice load input    |
| 35    | TSLO  | O    | Test shift slice load output   |
| 36    | VDD   | P    | -                              |
| 37    | TPI   | I    | Test pin slice serial input    |
| 38    | TPO   | O    | Test pin slice serial output   |
| 39    | TPCI  | I    | Test pin slice clock input     |
| 40    | TPCO  | O    | Test pin slice clock output    |

Table 1.1: Pin Descriptions


## 1.2 Chip Functionality

The major function of this chip is to take an N-bit input and move any bit position to any other position. This allows for commonly desired functionality such as bit reversal or nibble swapping. To accomplish this we use an N-by-N bit interconnect network known as the PIN (Programmable Interconnect Network). The PIN is configured to perform the desired bit mappings by clocking in the mappings using the PSI (PIN Serial Input) and PCLKI (PIN Clock Input) pins. The value to be manipulated, called the Input Value, Shift Value or Shifter Value, is then clocked in serially using the SI (Serial Input) and SCLKI (Serial Clock Input) pins. To obtain the result, the LDI (Parallel Load Input) pin is pulled high and the SCLKI pin is pulsed to latch the result into the shift register. Once the result is latched, the LDI pin is de-asserted and the result can be clocked out of the SO (Serial Output) pin. Note that the input value is clocked in MSB first and the output value is clocked out MSB first as well.
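The mapping performed by the PIN can be sketched in a few lines of Python. This is only a behavioral illustration, not the chip's implementation: the names `apply_pin` and `mapping_to_cfg`, the matrix convention `cfg[i][j]` (input bit `W[j]` connected to output bit `Z[i]`), and the choice of index 0 as the LSB are all assumptions made for the example.

```python
def apply_pin(cfg, w):
    """Return Z for input bits w under the connection matrix cfg.

    cfg[i][j] == 1 means input bit W[j] is connected to output bit Z[i].
    Each output bit is the OR of every input bit connected to it.
    """
    n = len(w)
    return [int(any(cfg[i][j] and w[j] for j in range(n))) for i in range(n)]


def mapping_to_cfg(mapping, n):
    """Build cfg from a dict {source bit index: destination bit index}."""
    cfg = [[0] * n for _ in range(n)]
    for src, dst in mapping.items():
        cfg[dst][src] = 1
    return cfg


n = 8
reverse = mapping_to_cfg({j: n - 1 - j for j in range(n)}, n)   # bit reversal
swap = mapping_to_cfg({j: (j + 4) % n for j in range(n)}, n)    # nibble swap

w = [1, 1, 0, 1, 0, 0, 0, 0]        # W[0]..W[7]
print(apply_pin(reverse, w))        # [0, 0, 0, 0, 1, 0, 1, 1]
print(apply_pin(swap, w))           # [0, 0, 0, 0, 1, 1, 0, 1]
```

Because every output is an OR over its row, the same model also covers configurations that fan one input out to several outputs or merge several inputs into one output.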

## 1.2.1 Configuring the Programmable Interconnect Network

A timing diagram illustrating the PIN configuration process for a 3-bit INTRANEX is shown below. A 3-bit input value requires a 3x3 grid, resulting in a PIN configuration vector of 9 bits. The mapping for each of these bits is also labeled and will be explained further in later sections.



Figure 1.2: PIN Configuration
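As a rough aid for computing a configuration vector, the sketch below serializes an N-by-N connection matrix into the order in which its bits would be clocked into PSI. The ordering is inferred from the generate statements in Listing 1.2 (PSI feeds row n-1, each row hands off to the row above it, and PSO leaves row 0) and assumes exactly one flip-flop per slice; the bit labels in Figure 1.2 remain the authoritative reference. The function names are hypothetical.

```python
def config_vector(cfg):
    """Serialize cfg[row][col] into the order the bits are clocked into PSI.

    Assumption: the first bit clocked in travels the furthest and ends up
    in row 0, column n-1; the last bit stays in row n-1, column 0.
    """
    n = len(cfg)
    return [cfg[row][col] for row in range(n) for col in reversed(range(n))]


def shift_in(bits, n):
    """Simulate the assumed DFF chain and return the resulting grid."""
    chain = [0] * (n * n)               # chain[0] is the slice fed by PSI
    for b in bits:                      # one PCLKI pulse per bit
        chain = [b] + chain[:-1]
    # Chain position p sits at row n-1 - p // n, column p % n.
    return [[chain[(n - 1 - row) * n + col] for col in range(n)]
            for row in range(n)]


cfg = [[0, 1, 0],
       [0, 0, 1],
       [1, 0, 0]]
bits = config_vector(cfg)
print(bits)                             # [0, 1, 0, 1, 0, 0, 0, 0, 1]
print(shift_in(bits, 3) == cfg)         # True
```

The round trip through `shift_in` is a quick sanity check that the serialization matches the assumed chain order; if the real chain differs, only the index arithmetic in these two functions needs to change.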

## 1.2.2 Loading and reading a value

Loading an input value is achieved by clocking the value in on the SI pin using the SCLKI pin. The LDI pin must be held low during this operation. The diagram below illustrates this process and shows the bit definitions of the value being clocked in.



Figure 1.3: Loading a value

After the value has been loaded, the result is clocked out in a similar fashion. To latch the result, first hold the LDI pin high and pulse the SCLKI pin. The MSB of the result is now available on the SO pin. The LDI pin should then be held low while clocking out the remaining result bits.



Figure 1.4: Loading a value and reading the result
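The load and read sequence described above can be walked through with a small behavioral model. This is a sketch under stated assumptions: the class `Shifter` and the helpers `load_value` and `read_result` are hypothetical names, and the model assumes that the last stage of the register (the one driving SO) holds the MSB, which is what MSB-first shifting implies.

```python
class Shifter:
    """Behavioral model of an n-bit parallel-load shift register."""

    def __init__(self, n):
        self.reg = [0] * n              # reg[0] is the stage fed by SI

    @property
    def so(self):
        return self.reg[-1]             # SO is the output of the last stage

    def clock(self, si=0, ld=0, z=None):
        """One SCLKI pulse: load Z when LDI is high, otherwise shift."""
        if ld:
            self.reg = list(z)
        else:
            self.reg = [si] + self.reg[:-1]


def load_value(sh, bits_msb_first):
    """Clock a value in on SI with LDI held low."""
    for b in bits_msb_first:
        sh.clock(si=b, ld=0)


def read_result(sh, z):
    """Latch Z with one pulse while LDI is high, then shift it out."""
    sh.clock(ld=1, z=z)                 # the MSB is now on SO
    out = [sh.so]
    for _ in range(len(z) - 1):
        sh.clock(si=0, ld=0)
        out.append(sh.so)
    return out                          # MSB first


sh = Shifter(3)
load_value(sh, [1, 1, 0])               # value 110, MSB first
print(sh.reg)                           # [0, 1, 1], i.e. W[0]..W[2]
print(read_result(sh, [1, 0, 0]))       # [0, 0, 1]
```

Reading an n-bit result takes one load pulse plus n-1 shift pulses, since the MSB is already visible on SO immediately after the load.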


#### 1.2.3 Test Mode

Test Mode is enabled by pulling the TESTI pin high. When this occurs, the output of the internal input value shift register is rerouted to the input of the PIN, bypassing its normal PSI input. The SCLKO signal is likewise routed to the PIN, bypassing its normal PCLKI signal. Finally, the PSO signal is routed to the SO pin. This allows values that are clocked in via the SI pin to propagate through the shifter, then through the PIN, and then out the SO pin. Because the values come out the SO pin, multiple INTRANEX chips can be directly chained and tested in circuit using only the SI and SCLKI pins of the first chip in the chain. Note that the LDI pin must be held low during this entire operation to ensure proper shifting through the input value shift register.



Figure 1.5: Enabling test mode and loading all DFFs

## 1.3 Design Decisions

When evaluating design concepts and possible solutions we prioritized a few key goals. The first is a fully bit-sliced solution where each slice can directly connect to the next with minimal wiring overhead and zero additional logic. This will allow us to use Magic's array functionality to quickly build up our chip and to scale easily to any desired size. As shown in later sections, we were able to achieve a fully bit-sliced design with zero logic overhead.

To fully minimize wiring overhead it would be necessary to design two different slice layouts, one of which is mirrored and flipped. This would allow each slice row in the PIN to share a power rail with the rows above and below it and would also minimize the length of the row-to-row wiring. This design, however, greatly increases the complexity of the VHDL design, as wiring the rows together becomes trickier. Additionally, we would have to maintain two different versions of the PIN slices. We decided instead to go with a design where all slices are exactly identical and the interconnect between them is linear. This allows for easier calculation of PIN configuration values, as every row has the same index order. The only real disadvantage of this design is that it requires long interconnects between slices. We are assuming for now that even with the added capacitance of these long interconnects we will still be able to achieve maximum clock speeds greater than 50 MHz. By progress report 2 we will have layout simulation results to confirm this.

As stated earlier, an important goal for us was to be able to directly chain multiple INTRANEX chips together. Our current design achieves this, and an example chain of three INTRANEX chips is shown below. Note that the pin layout in this diagram matches that of the actual layout we plan on implementing.



Figure 1.6: 3 INTRANEX Chain


## 1.4 Block Diagrams

#### 1.4.1 Top Level

A top level block diagram for a 3-bit INTRANEX is shown below. The top module is the PIN and the bottom module is the parallel load shift register. Test mode logic has been excluded to more clearly illustrate the core functionality.



Figure 1.7: Top Level Block Diagram (3-Bit Configuration)

## 1.4.2 Top Level With Test Mode

The same top level diagram is shown with the addition of the test mode logic. The test mode logic simply consists of three 2:1 multiplexers that redirect the output of the shift register to the input of the PIN and the output of the PIN to what is normally the output of the shift register. In other words, it wires in the PIN between the shifter and the shifter's normal output pin.



Figure 1.8: Top Level Block Diagram Showing Test Mode Logic (3-Bit Configuration)
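The routing performed by the three multiplexers can be summarized as a small truth-table style sketch. The select polarity (TEST high picks the second input) is taken from the description of Test Mode; the function names `mux2` and `route` and the signal names in the returned dictionary are illustrative only.

```python
def mux2(a, b, s):
    """Behavioral 2:1 multiplexer: returns a when s is 0, b when s is 1."""
    return b if s else a


def route(pclk, sclk, psi, shift_out, pin_pso, test):
    """Return the three signals selected by the test mode multiplexers."""
    return {
        "pin_clk": mux2(pclk, sclk, test),       # clock driving the PIN
        "pin_psi": mux2(psi, shift_out, test),   # serial input of the PIN
        "so": mux2(shift_out, pin_pso, test),    # signal driving the SO pin
    }


# Use signal names as values to make the selection visible.
names = ("pclk", "sclk", "psi", "shift_out", "pin_pso")
print(route(*names, test=0))
print(route(*names, test=1))
```

With `test=0` each block keeps its own clock and serial input; with `test=1` the PIN is spliced in after the shifter and shares its clock.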


## 1.4.3 Top Level Bit Sliced

The diagram below shows a bit-sliced version of the top level diagram shown in Figure 1.7. It shows how the slices connect directly to one another with zero interfacing logic, as well as the long row-to-row connections discussed earlier.



Figure 1.9: Top Level Bit Sliced Block Diagram (3-Bit Configuration)


## 1.4.4 Parallel Load Shift Register

#### Bit-slicing Scheme

Looking at just the shift register, we can see that it is a parallel-load, parallel-output shift register that is easily extended by simply adding slices.



Figure 1.10: Parallel Load Bit-Sliced Shifter Register (3-Bit Configuration)

#### **Bit-Slice**

Looking at the internals of a single shift slice, we can see that it is just a 2:1 multiplexer and a D flip-flop. The multiplexer determines whether the slice loads the value from the previous slice (SI) or the parallel input (Z). When LDI is 0 it uses the value of the previous slice, and when LDI is 1 it uses the parallel load value.



Figure 1.11: Parallel Load Shifter Register Bit-Slice
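At the slice level, the behavior described above reduces to a multiplexer feeding a flip-flop. The following sketch models one clock edge for a chain of such slices; the function names are hypothetical, and the list `q` simply holds the current flip-flop outputs with `q[0]` being the slice fed by SI.

```python
def shift_slice_d(si, z, ld):
    """D input of one slice: previous slice when LDI=0, parallel input when LDI=1."""
    return z if ld else si


def step(q, si, z, ld):
    """Apply one rising clock edge to a chain of slices and return the new Q values."""
    prev_outputs = [si] + q[:-1]        # each slice sees its predecessor's old Q
    return [shift_slice_d(p, zi, ld) for p, zi in zip(prev_outputs, z)]


q = [0, 0, 0]
q = step(q, si=1, z=[0, 0, 0], ld=0)    # shift a 1 in
print(q)                                # [1, 0, 0]
q = step(q, si=0, z=[1, 1, 0], ld=1)    # parallel load overrides shifting
print(q)                                # [1, 1, 0]
```

Computing every D input from the old Q values before updating anything mirrors the fact that all flip-flops sample on the same clock edge.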


#### 1.4.5 Programmable Interconnect Network

#### Bit-slicing Scheme

The diagram below shows just the PIN in bit-slice form. One of the design decisions made while determining the slice interconnects was to also pass the PCLK from slice to slice. The alternative was to simply connect each slice's PCLKI to the main PCLKI pin at a higher level. We wanted to avoid as much manual layout as possible, so we determined that it would be easier and cleaner to route the clock in such a way that it is connected automatically when we lay out the slice array.



Figure 1.12: Bit-Sliced Programmable Interconnect Network (3-Bit Configuration)

#### **Bit-Slice**

The PIN bit-slices, one of which is shown below, are what drive the whole functionality of our chip. WI is the input value's bit for the current column. If that bit is set and this slice is configured as 'connected', we want to output a logic high on the Z bus. We cannot, however, simply AND these two values together and attach the result to the bus, as this would allow multiple slices to drive or sink the bus simultaneously. To avoid this we use an OR gate to determine whether the slice behind us is outputting a 1. If so, we just pass it along. If we want to output a 1 ourselves, the OR gate accommodates that as well.



Figure 1.13: Programmable Interconnect Network Bit-Slice
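The AND/OR logic described above can be written out as a short sketch. It models only the combinational Z path of a slice, with `q` standing for the slice's stored configuration bit; the function names are hypothetical, and the first slice in each row is assumed to have its ZI input tied to 0, as the `z_connect` generate block in Listing 1.2 suggests.

```python
def pin_slice_zo(zi, wi, q):
    """ZO of one slice: pass along an upstream 1, or drive a 1 if connected."""
    return zi | (wi & q)


def pin_row(w, q_row):
    """Ripple the Z signal through one row of slices and return that row's Z bit."""
    z = 0                               # ZI of the first slice is tied to 0
    for wi, q in zip(w, q_row):
        z = pin_slice_zo(z, wi, q)
    return z


w = [1, 0, 1]
print(pin_row(w, [0, 1, 0]))            # 0: only the column with W = 0 is connected
print(pin_row(w, [0, 0, 1]))            # 1: the connected column carries a 1
```

Because each slice only ORs into the signal it receives, no two slices ever drive the same wire, which is the property the paragraph above is after.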


## 1.5 VHDL Models

## 1.5.1 Top Level

```
library ieee;
use ieee.std_logic_1164.all;

entity top is
    generic (
        n : integer := 3
    );
    port (
        psi  : in  std_logic;
        pso  : out std_logic;
        pclk : in  std_logic;
        si   : in  std_logic;
        so   : out std_logic;
        sclk : in  std_logic;
        ld   : in  std_logic;
        test : in  std_logic
    );
end top;

architecture rtl of top is

    -- output of pin
    signal z : std_logic_vector((n-1) downto 0) := (others => '0');
    -- parallel output of shifter
    signal w : std_logic_vector((n-1) downto 0) := (others => '0');

    signal pin_clk   : std_logic;
    signal pin_psi   : std_logic;
    signal pin_pso   : std_logic;
    signal shift_out : std_logic;

begin

    -- test mode mux connects shifter and pin together
    test_mux_1 : entity work.mux2x1 port map (pclk,      sclk,      test, pin_clk);
    test_mux_2 : entity work.mux2x1 port map (psi,       shift_out, test, pin_psi);
    test_mux_3 : entity work.mux2x1 port map (shift_out, pin_pso,   test, so);

    pin : entity work.pin
        generic map (
            n => n
        )
        port map (
            clk => pin_clk,
            psi => pin_psi,
            pso => pin_pso,
            z   => z,
            w   => w
        );

    pso <= pin_pso;

    shifter : entity work.shift
        generic map (
            n => n
        )
        port map (
            clk => sclk,
            si  => si,
            so  => shift_out,
            ld  => ld,
            z   => z,  -- parallel input from the PIN (port name assumed)
            w   => w   -- parallel output to the PIN (port name assumed)
        );

end rtl;
```

Listing 1.1: Top Level VHDL Module



Figure 1.14: Top Level Generated RTL Diagram


#### 1.5.2 PIN

```
library ieee;
use ieee.std_logic_1164.all;

entity pin is
    generic(
        n : integer := 3
    );
    port(
        clk : in std_logic;
        psi : in std_logic;
        pso : out std_logic;
        z : out std_logic_vector((n-1) downto 0);
        w : in std_logic_vector((n-1) downto 0)
    );
end pin;

architecture rtl of pin is

    component pin_slice is
        port(
            zi : in std_logic;
            qi : in std_logic;
            wi : in std_logic;
            ci : in std_logic;
            zo : out std_logic;
            qo : out std_logic;
            wo : out std_logic;
            co : out std_logic
        );
    end component;

    -- carry_array(row, col)
    type carry_array is array (0 to n, 0 to n) of std_logic;
    signal zc : carry_array;
    signal cc : carry_array;
    signal wc : carry_array;
    signal qc : carry_array;

begin

    -- setup first and last inputs for each row
    z_connect : for i in 0 to n-1 generate
        zc(i, 0) <= '0';
        z(i) <= zc(i, n);
    end generate;

    -- setup first inputs for each column
    w_connect : for i in 0 to n-1 generate
        wc(0, i) <= w(i);
    end generate;

    -- setup row transfer
    -- (last output of row to first input of next row)
    r_connect : for i in 0 to n-2 generate
        qc(i, 0) <= qc(i+1, n);
        cc(i, 0) <= cc(i+1, n);
    end generate;

    -- connect external inputs
    qc(n-1, 0) <= psi;
    cc(n-1, 0) <= clk;
    pso <= qc(0, n);

    -- generate the grid of slices
    pin_z_gen: for zz in 0 to n-1 generate
        pin_w_gen: for ww in 0 to n-1 generate
            pin_i: pin_slice port map(
                zi => zc(zz, ww),
                qi => qc(zz, ww),
                wi => wc(zz, ww),
                ci => cc(zz, ww),
                zo => zc(zz, ww+1),
                qo => qc(zz, ww+1),
                wo => wc(zz+1, ww),
                co => cc(zz, ww+1)
            );
        end generate;
    end generate;

end rtl;
```

Listing 1.2: PIN VHDL Module



Figure 1.15: Pin Generated RTL Diagram


### 1.5.3 PIN Slice

Listing 1.3: PIN Slice VHDL Module



Figure 1.16: Pin Slice Generated RTL Diagram


### 1.5.4 Shifter

```
library ieee;
use ieee.std_logic_1164.all;

entity shift is
    generic(
        n : integer := 3
    );
    port(
        clk : in std_logic;
        ld : in std_logic;
        si : in std_logic;
        so : out std_logic;
        z : in std_logic_vector((n-1) downto 0);
        w : out std_logic_vector((n-1) downto 0)
    );
end shift;

architecture rtl of shift is

    component shift_slice
        port(
            clki : in std_logic;
            clko : out std_logic;
            ldi : in std_logic;
            ldo : out std_logic;
            si : in std_logic;
            so : out std_logic;
            z : in std_logic
        );
    end component;

    -- vector to hold values between slices
    signal c_so : std_logic_vector(n downto 0) := (others => '0');
    signal c_clk : std_logic_vector(n downto 0) := (others => '0');
    signal c_ld : std_logic_vector(n downto 0) := (others => '0');

begin

    -- input of slice 0 comes from module input
    c_so(0) <= si;
    c_ld(0) <= ld;
    c_clk(0) <= clk;

    -- final shift output comes from output of last slice
    so <= c_so(n);

    -- generate N slices
    shift_gen: for i in 0 to n-1 generate
        shift_i: shift_slice port map(
            clki => c_clk(i),
            clko => c_clk(i+1),
            ldi => c_ld(i),
            ldo => c_ld(i+1),
            si => c_so(i),
            so => c_so(i+1),
            z => z(i)
        );
    end generate;

    -- connect the output of each slice to parallel output vector
    connect : for i in 0 to n-1 generate
        w(i) <= c_so(i+1);
    end generate;

end rtl;
```

Listing 1.4: Parallel Load Shifter VHDL Module
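To make the structure of the shifter easier to follow, the behavior it appears to implement can be sketched in Python. This is a minimal behavioral model and not part of the design: the class name `ParallelLoadShifter` is hypothetical, and the per-slice behavior (load `z(i)` when `ld` is high, otherwise take the previous slice's output) is an assumption inferred from the port names and the test bench, since the slice listing is not reproduced here.

```python
class ParallelLoadShifter:
    """Behavioral sketch of an n-bit parallel-load shift register.

    bits[i] models the flip-flop inside slice i. Slice 0 is fed by si and
    the serial output so is taken from the last slice.
    """

    def __init__(self, n):
        self.bits = [0] * n

    @property
    def so(self):
        # serial output: output of the last slice
        return self.bits[-1]

    @property
    def w(self):
        # parallel output: one bit per slice
        return list(self.bits)

    def clock(self, si=0, ld=0, z=None):
        """Apply one rising clock edge and return the new serial output."""
        if ld:
            # assumed behavior: every slice captures its own z(i)
            self.bits = list(z)
        else:
            # shift by one slice, si enters slice 0
            self.bits = [si] + self.bits[:-1]
        return self.so


if __name__ == "__main__":
    shifter = ParallelLoadShifter(3)
    for bit in (1, 0, 0):
        shifter.clock(si=bit)
    print(shifter.w)  # the first bit clocked in ends up in the last slice
```

In this model the first bit shifted in travels furthest, which is why the test bench needs n clocks to fill the register before the parallel output `w` is valid.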



Figure 1.17: Shifter Generated RTL Diagram


### 1.5.5 Shifter Slice

Listing 1.5: Parallel Load Shifter Slice VHDL Module



Figure 1.18: Shifter Slice Generated RTL Diagram


### 1.5.6 Gates


Listing 1.6: AOI21X1 VHDL Module

Listing 1.7: DFFPOSX1 VHDL Module

```
library ieee;
use ieee.std_logic_1164.all;

entity invx1 is
    generic(delay : time := 0 ps);
    port(
        a : in std_logic;
        y : out std_logic
    );
end invx1;

architecture rtl of invx1 is
begin
    y <= not a after delay;
end rtl;
```

Listing 1.8: INVX1 VHDL Module

Listing 1.9: MUX2X1 VHDL Module


## 1.6 VHDL Test Benches

### 1.6.1 Top Level Functional

```
library ieee;
use ieee.std_logic_1164.all;
use std.textio.all;
use work.txt_util.all;

entity top_tb is
    generic(
        stim_file : string := "vectors_3_bit.sim"
    );
end top_tb;

architecture tb_rtl of top_tb is

    constant n : integer := 3;

    signal psi : std_logic := '0';
    signal pso : std_logic;
    signal pclk : std_logic := '0';
    signal si : std_logic := '0';
    signal so : std_logic;
    signal sclk : std_logic := '0';
    signal ld : std_logic := '0';
    signal test : std_logic := '0';

    -- NOTE: these three declarations are not legible in the source;
    -- the widths are assumed from how the vectors are used below
    signal pin_vector : std_logic_vector((n*n)-1 downto 0);
    signal shift_vector : std_logic_vector(n-1 downto 0);
    signal result_vector : std_logic_vector(n-1 downto 0);

    component top
        generic(
            n : integer := n
        );
        port(
            psi : in std_logic;
            pso : out std_logic;
            pclk : in std_logic;
            si : in std_logic;
            so : out std_logic;
            sclk : in std_logic;
            ld : in std_logic;
            test : in std_logic
        );
    end component;

    file stimulus : TEXT open read_mode is stim_file;

begin

    uut : top
    generic map(
        n => n
    )
    port map(
        psi => psi,
        pso => pso,
        pclk => pclk,
        si => si,
        so => so,
        sclk => sclk,
        ld => ld,
        test => test
    );

    process

        procedure clock_shifter is begin
            sclk <= '1';
            wait for 20 ns;
            sclk <= '0';
            wait for 20 ns;
        end procedure clock_shifter;

        procedure clock_pin is begin
            pclk <= '1';
            wait for 20 ns;
            pclk <= '0';
            wait for 20 ns;
        end procedure clock_pin;

        variable l : line;
        variable pin_str : string(1 to n*n);
        variable shf_str : string(1 to n);

    begin

        while not endfile(stimulus) loop

            -- load stimulus for this test
            readline(stimulus, l); read(l, pin_str);
            pin_vector <= to_std_logic_vector(pin_str);

            readline(stimulus, l); read(l, shf_str);
            shift_vector <= to_std_logic_vector(shf_str);

            readline(stimulus, l); read(l, shf_str);
            result_vector <= to_std_logic_vector(shf_str);

            wait for 100 ns;

            -- clock in the pin
            for i in 0 to (n*n)-1 loop
                psi <= pin_vector(i);
                wait for 20 ns;
                clock_pin;
            end loop;

            -- clock in the value
            for i in 0 to n-1 loop
                si <= shift_vector(i);
                wait for 20 ns;
                clock_shifter;
            end loop;

            -- pull latch high so the first result
            -- loop will trigger the latch
            ld <= '1';
            wait for 20 ns;

            -- clock out result and check it
            for i in 0 to n-1 loop
                clock_shifter;
                assert so = result_vector(i) report "Test Failed!";
                ld <= '0';
                wait for 20 ns;
            end loop;

        end loop;

        report "Test Complete" severity note;
        wait;

    end process;

end tb_rtl;
```

Listing 1.10: Top Level VHDL Test Bench

We decided to write a small Python script to generate the expected output vector for all possible PIN configurations and input values. Our test bench then runs through all of these vectors and checks if the output vector from our VHDL design matches the known output.

Listing 1.11: Python Vector Generator
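The generator script itself is not reproduced in this copy of the report. As a rough illustration of the idea only, a minimal sketch is given below. It is not the authors' script. It assumes that each output bit is the OR of the enabled slices in its row, `Z[row] = OR(Q[row][col] AND W[col])`, and it assumes a bit ordering chosen so that the worked example in Section 1.7.1 (configuration `000001000`, input `001`, output `010`) holds. The function names and the three-lines-per-test file layout (taken from the order of the `readline` calls in Listing 1.10) are illustrative.

```python
import itertools


def expected_output(pin_bits: str, w_bits: str) -> str:
    """Return the expected Z string for one PIN configuration and input W.

    Assumed ordering: character j of pin_bits configures the slice at
    row n-1-(j // n), column j % n; character j of w_bits and of the
    result is bit j of W and Z respectively.
    """
    n = len(w_bits)
    if len(pin_bits) != n * n:
        raise ValueError("pin_bits must hold n*n characters")
    z = []
    for row in range(n):
        start = (n - 1 - row) * n
        row_cfg = pin_bits[start:start + n]
        # the row output is high if any enabled slice sees a high W bit
        hit = any(q == "1" and w == "1" for q, w in zip(row_cfg, w_bits))
        z.append("1" if hit else "0")
    return "".join(z)


def generate_vectors(n: int):
    """Yield (pin_bits, w_bits, expected_z) for every configuration and input."""
    for pin in itertools.product("01", repeat=n * n):
        for w in itertools.product("01", repeat=n):
            pin_bits, w_bits = "".join(pin), "".join(w)
            yield pin_bits, w_bits, expected_output(pin_bits, w_bits)


def write_vectors(path: str, n: int) -> int:
    """Write three lines per test and return the number of tests written."""
    count = 0
    with open(path, "w") as f:
        for pin_bits, w_bits, z_bits in generate_vectors(n):
            f.write(f"{pin_bits}\n{w_bits}\n{z_bits}\n")
            count += 1
    return count
```

For n = 3 this enumerates 2^9 configurations times 2^3 inputs, that is 4096 tests, which is small enough to simulate exhaustively.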


### 1.6.2 Top Level Test Mode

```
library ieee;
use ieee.std_logic_1164.all;

entity top_test_tb is
end top_test_tb;

architecture tb_rtl of top_test_tb is

    constant n : integer := 3;

    signal psi : std_logic := '0';
    signal pso : std_logic;
    signal pclk : std_logic := '0';
    signal si : std_logic := '0';
    signal so : std_logic;
    signal sclk : std_logic := '0';
    signal ld : std_logic := '0';
    signal test : std_logic := '0';

    component top
        generic(
            n : integer := n
        );
        port(
            psi : in std_logic;
            pso : out std_logic;
            pclk : in std_logic;
            si : in std_logic;
            so : out std_logic;
            sclk : in std_logic;
            ld : in std_logic;
            test : in std_logic
        );
    end component;

begin

    uut : top
    generic map(
        n => n
    )
    port map(
        psi => psi,
        pso => pso,
        pclk => pclk,
        si => si,
        so => so,
        sclk => sclk,
        ld => ld,
        test => test
    );

    process

        procedure clock is begin
            sclk <= '1';
            wait for 20 ns;
            sclk <= '0';
            wait for 20 ns;
        end procedure clock;

    begin

        wait for 20 ns;

        -- pull test line high to enable test mode
        test <= '1';

        -- clock in a '1'
        si <= '1';
        wait for 20 ns;
        clock;

        -- clock in a '0'
        si <= '0';
        wait for 20 ns;
        clock;

        -- push the pulse through till just before the last FF
        -- (n*n)+n = number of flip flops
        -- 2 = we already did two clocks
        -- 1 = we want to stop before final output
        for i in 1 to (n*n)+n-2-1 loop
            clock;
        end loop;

        -- check to make sure the bit in front of the pulse is 0
        assert so = '0' report "Bit leading pulse not 0";
        clock;

        -- check to make sure the pulse is 1
        assert so = '1' report "Pulse is not 1";
        clock;

        -- check to make sure the bit behind pulse is 0
        assert so = '0' report "Bit trailing pulse not 0";

        report "Test Complete" severity note;
        wait;

    end process;

end tb_rtl;
```

Listing 1.12: Top Level Test Mode VHDL Test Bench


### 1.6.3 PIN Slice

```
library ieee;
use ieee.std_logic_1164.all;

entity pin_slice_tb is
end pin_slice_tb;

architecture tb_rtl of pin_slice_tb is

    signal zi : std_logic := '0';
    signal qi : std_logic := '0';
    signal wi : std_logic := '0';
    signal ci : std_logic := '0';
    signal zo : std_logic;
    signal qo : std_logic;
    signal wo : std_logic;
    signal co : std_logic;

    component pin_slice
        port(
            zi : in std_logic;
            qi : in std_logic;
            wi : in std_logic;
            ci : in std_logic;
            zo : out std_logic;
            qo : out std_logic;
            wo : out std_logic;
            co : out std_logic
        );
    end component;

begin

    uut : pin_slice
    port map(
        zi => zi,
        qi => qi,
        wi => wi,
        ci => ci,
        zo => zo,
        qo => qo,
        wo => wo,
        co => co
    );

    process

        type pattern_type is record
            -- inputs
            zi, qi, wi : std_logic;
            -- output
            zo : std_logic;
        end record;

        -- (the pattern table, a constant array "patterns" of pattern_type,
        -- is not legible in the source)

    begin

        -- check each pattern
        for i in patterns'range loop

            -- set the inputs
            zi <= patterns(i).zi;
            qi <= patterns(i).qi;
            wi <= patterns(i).wi;
            wait for 10 ns;

            -- pulse the clock and check clock passthrough
            ci <= '1';
            wait for 10 ns;
            assert co = '1' report "CO does not equal 1" severity error;
            ci <= '0';
            wait for 10 ns;
            assert co = '0' report "CO does not equal 0" severity error;

            -- check the outputs
            assert qo = patterns(i).qi report "QI not equal QO" severity error;
            assert zo = patterns(i).zo report "ZO does not match pattern" severity error;

        end loop;

        report "Test Complete" severity note;
        wait;

    end process;

end tb_rtl;
```

Listing 1.13: PIN Slice VHDL Test Bench


### 1.6.4 Shifter Slice

```
library ieee;
use ieee.std_logic_1164.all;

entity shift_slice_tb is
end shift_slice_tb;

architecture tb_rtl of shift_slice_tb is

    signal clki : std_logic := '0';
    signal clko : std_logic;
    signal ldi : std_logic := '0';
    signal ldo : std_logic;
    signal si : std_logic := '0';
    signal so : std_logic;
    signal z : std_logic := '0';

    component shift_slice is
        port(
            clki : in std_logic;
            clko : out std_logic;
            ldi : in std_logic;
            ldo : out std_logic;
            si : in std_logic;
            so : out std_logic;
            z : in std_logic
        );
    end component;

begin

    uut : shift_slice
    port map(
        clki => clki,
        clko => clko,
        ldi => ldi,
        ldo => ldo,
        si => si,
        so => so,
        z => z
    );

    process

        -- NOTE: record fields inferred from their use below
        type pattern_type is record
            -- inputs
            ldi, z, si : std_logic;
            -- output
            so : std_logic;
        end record;

        -- (the pattern table, a constant array "patterns" of pattern_type,
        -- is not legible in the source)

    begin

        -- check each pattern
        for i in patterns'range loop

            -- set the inputs
            ldi <= patterns(i).ldi;
            z <= patterns(i).z;
            si <= patterns(i).si;
            wait for 10 ns;

            -- pulse the clock and check clock and load passthrough
            clki <= '1';
            wait for 10 ns;
            assert clko = '1' report "SCLKO does not equal 1" severity error;
            assert ldo = patterns(i).ldi report "LDO does not equal LDI" severity error;
            clki <= '0';
            wait for 10 ns;
            assert clko = '0' report "SCLKO does not equal 0" severity error;
            assert ldo = patterns(i).ldi report "LDO does not equal LDI" severity error;

            -- check the output
            assert so = patterns(i).so report "SO is incorrect" severity error;

        end loop;

        report "Test Complete" severity note;
        wait;

    end process;

end tb_rtl;
```

Listing 1.14: Parallel Load Shifter Slice VHDL Test Bench


## 1.7 VHDL Test Bench Results

### 1.7.1 Top Level Functional

Our top level functional testbench is completely automated and exhaustively tests all possible inputs; an example waveform is shown below. We first clock in a PIN configuration vector of 000001000, which enables slice Z1W2, and then shift in a value of 001. Given these input vectors we expect the output vector to be 010, and the waveform below shows that we achieve this result.



Figure 1.19: Top Level Functional Test Bench Waveform

### 1.7.2 Top Level Test Mode

Our top level test mode testbench is also completely automated. For this test we simply send a pulse through all the flip flops and count the number of clock cycles it takes for the pulse to come out the other end. For a 3-bit configuration there are 3\*3+3 = 12 flip flops, so we expect the pulse to appear at the output after 12 clock pulses. The waveform below shows that we achieve the expected output.



Figure 1.20: Top Level Test Mode Test Bench Waveform
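The clock count can be checked with a few lines of Python. This is only an illustrative sketch, assuming that test mode connects all n\*n + n flip flops into one serial chain; the function name `clocks_until_pulse` is hypothetical.

```python
def clocks_until_pulse(n: int) -> int:
    """Count clock edges until a single injected '1' reaches the chain output.

    The chain models the n*n PIN flip flops plus the n shifter flip flops
    connected in series (assumed test-mode topology).
    """
    chain = [0] * (n * n + n)
    clocks = 0
    while True:
        # drive a '1' on the first clock only, then zeros
        si = 1 if clocks == 0 else 0
        chain = [si] + chain[:-1]
        clocks += 1
        if chain[-1] == 1:
            return clocks


if __name__ == "__main__":
    print(clocks_until_pulse(3))
```

For n = 3 the function returns 12, matching the count quoted above. The test bench reaches the same point by clocking the pulse in, clocking a zero in behind it, and then applying (n\*n)+n-2-1 further clocks before checking the bit ahead of the pulse.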

## 1.8 Work Division

| Task                         | Person        |
|------------------------------|---------------|
| Pinout Diagram               | Both          |
| Explanation of Functionality | Both          |
| Design Decisions             | Both          |
| Top Level Block Diagrams     | Both          |
| Shifter Block Diagrams       | Qi            |
| PIN Block Diagrams           | Thrun         |
| VHDL Shifter+TB              | Qi            |
| VHDL PIN+TB                  | Qi            |
| VHDL Top+TB                  | Thrun         |
| VHDL Top Test Mode TB        | Thrun         |

Table 1.2: Task Assignment


## Chapter 2

# Progress Report 2

## 2.1 Slice Layouts

### 2.1.1 PIN Slice Layout

The PIN slice layout consists of 3 cells from the provided library. They were arranged so as to provide maximum material density and uniformity among the power rails. The connections in and out of the slice are arranged such that slices can be directly patterned together with little to no additional connections at a higher level.



Figure 2.1: PIN Slice Layout



Figure 2.2: PIN Slice Layout Internal


### 2.1.2 Shift Slice Layout

Like the PIN slice, the Shift slice layout consists of 3 cells from the provided library. Again, we chose to keep a linear layout to maintain a uniform power rail between slices. Also, like the PIN slice, the connections in and out of this slice are laid out in such a manner that allows for direct patterning of slices with no additional work required.



Figure 2.3: Shift Slice Layout



Figure 2.4: Shift Slice Layout Internal


## 2.2 Slice IRSIM Results

### 2.2.1 PIN Slice IRSIM Results

In order to functionally test the PIN slice, a Python script was developed that translates our VHDL testbench patterns into an IRSIM command file. The output CMD file is not included here because of its length and because it can be inferred from the Python script shown below.

```
with open("pin_slice.cmd", "w") as f:

    f.write("stepsize 10\n")
    f.write("logfile pin_slice.log\n")
    # (further setup commands are not legible in the source)
    f.write("vector CLK CI\n")
    f.write("ana CI ZI QI WI ZO QO\n")

    # zi qi wi zo
    # NOTE: rows reconstructed as the slice truth table,
    # zo = zi or (qi and wi), which is an assumption
    patterns = (
        ('l', 'l', 'l', 'l'),
        ('l', 'l', 'h', 'l'),
        ('l', 'h', 'l', 'l'),
        ('l', 'h', 'h', 'h'),
        ('h', 'l', 'l', 'h'),
        ('h', 'l', 'h', 'h'),
        ('h', 'h', 'l', 'h'),
        ('h', 'h', 'h', 'h')
    )

    # drive each input low ("l") or high ("h"), then run one clock cycle
    for zi, qi, wi, zo in patterns:
        f.write("%s ZI\n" % zi)
        f.write("%s QI\n" % qi)
        f.write("%s WI\n" % wi)
        f.write("c CLK\n")
```

Listing 2.1: Python PIN Slice IRSIM CMD File Generator

The resulting IRSIM waveform, shown below, illustrates that we achieve correct functional behavior for our slice layout.



Figure 2.5: PIN Slice IRSIM Functional Results


The critical path through the PIN slice is highlighted in red in the diagram below. Knowing this path we constructed a simple CMD file to toggle the qi pin high and low on two consecutive clock cycles. This gives us a rising and falling edge through the critical path that we were able to measure using the PATH command in IRSIM.



Figure 2.6: PIN Slice Critical Path

The textual output from IRSIM is shown below:

```
QO=0 WI=1 QI=0 ZI=0 ZO=0 CI=1
time = 20.000ns
QO=1 WI=1 QI=1 ZI=0 ZO=1 CI=1
time = 40.000ns
critical path for last transition of ZO:
 CI -> 1 @ 30.000ns , node was an input
 DFFPOSX1_1/a_66_6# -> 0 @ 30.141ns   (0.141ns)
 QO -> 1 @ 30.230ns   (0.089ns)
 INVX1_0/A -> 0 @ 30.285ns   (0.055ns)
 ZO -> 1 @ 30.286ns   (0.001ns)
QO=0 WI=1 QI=0 ZI=0 ZO=0 CI=1
time = 60.000ns
critical path for last transition of ZO:
 CI -> 1 @ 50.000ns , node was an input
 DFFPOSX1_1/a_2_6# -> 0 @ 50.039ns   (0.039ns)
 DFFPOSX1_1/a_66_6# -> 1 @ 50.198ns   (0.159ns)
 QO -> 0 @ 50.290ns   (0.092ns)
 INVX1_0/A -> 1 @ 50.347ns   (0.057ns)
 ZO -> 0 @ 50.348ns   (0.001ns)
```

Figure 2.7: PIN Slice IRSIM Critical Path Delay

Looking at the output we can see the two delays of the critical path. The first of the two delays is the output value going from a 0 to a 1 and the second is the output going from a 1 to a 0. The table below tabulates the two delays and indicates that the falling edge delay was the worse of the two.

| State Change       | Delay (ns) | Note  |
|--------------------|------------|-------|
| 0 (rising, 0 → 1)  | 0.286      |       |
| 1 (falling, 1 → 0) | 0.348      | WORST |

Table 2.1: PIN Slice IRSIM Critical Path Delays
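Each total in Table 2.1 is the time of the final ZO transition minus the time of the CI input edge, which is also the sum of the per-stage delays IRSIM prints in parentheses. As a small illustration (not part of the original flow), the sketch below extracts those numbers from the path report text. The helper name `path_delay_ns` and the regular expression are assumptions about the report layout shown in Figure 2.7.

```python
import re

# Matches IRSIM path lines such as "QO -> 1 @ 30.230ns   (0.089ns)".
EVENT = re.compile(r"^\s*(\S+)\s*->\s*([01])\s*@\s*([0-9.]+)ns")


def path_delay_ns(path_text):
    """Return (total_delay, [(node, stage_delay), ...]) in ns for one path."""
    events = []
    for line in path_text.splitlines():
        match = EVENT.match(line)
        if match:
            events.append((match.group(1), float(match.group(3))))
    # stage delay = time of this event minus time of the previous event
    stages = [
        (node, round(t - events[k][1], 3))
        for k, (node, t) in enumerate(events[1:])
    ]
    total = round(events[-1][1] - events[0][1], 3)
    return total, stages


RISING = """\
 CI -> 1 @ 30.000ns , node was an input
 DFFPOSX1_1/a_66_6# -> 0 @ 30.141ns   (0.141ns)
 QO -> 1 @ 30.230ns   (0.089ns)
 INVX1_0/A -> 0 @ 30.285ns   (0.055ns)
 ZO -> 1 @ 30.286ns   (0.001ns)
"""

FALLING = """\
 CI -> 1 @ 50.000ns , node was an input
 DFFPOSX1_1/a_2_6# -> 0 @ 50.039ns   (0.039ns)
 DFFPOSX1_1/a_66_6# -> 1 @ 50.198ns   (0.159ns)
 QO -> 0 @ 50.290ns   (0.092ns)
 INVX1_0/A -> 1 @ 50.347ns   (0.057ns)
 ZO -> 0 @ 50.348ns   (0.001ns)
"""

if __name__ == "__main__":
    for label, text in (("rising", RISING), ("falling", FALLING)):
        total, stages = path_delay_ns(text)
        print(label, total, stages)
```

The stage breakdown also shows where the time goes: in both directions the flip flop accounts for most of the delay, and the falling path is slower mainly because it passes through one extra internal flip-flop node.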


### 2.2.2 Shift Slice IRSIM Results

As with the PIN slice, we used the same Python script for the Shift slice to generate a CMD file that matches our VHDL testbench patterns. The script is shown below:

```
with open("shift_slice.cmd", "w") as f:

    f.write("stepsize 10\n")
    f.write("logfile shift_slice.log\n")
    # (the remainder of the script is not legible in the source)
```

Listing 2.2: Python Shift Slice IRSIM CMD File Generator

Looking at the results we can see that our Shift Slice performs as expected and matches our VHDL simulations.



Figure 2.8: Shift Slice IRSIM Functional Results


The critical path through the shift slice is highlighted in red in the diagram below. Again, knowing this path we constructed a simple CMD file to drive a pulse through the path.



Figure 2.9: Shift Slice Critical Path

The textual output from IRSIM, shown below, provides us with two critical path delays: one for the rising edge of the output and one for the falling edge.

```
SO=0 Z=0 SI=0 LDI=0 SCLKI=1
time = 20.000ns
SO=1 Z=0 SI=1 LDI=0 SCLKI=1
time = 40.000ns
critical path for last transition of SO:
  SCLKI -> 1 @ 30.000ns , node was an input
  DFFPOSX1_0/a_66_6# -> 0 @ 30.141ns   (0.141ns)
  SO -> 1 @ 30.181ns   (0.040ns)
SO=0 Z=0 SI=0 LDI=0 SCLKI=1
time = 60.000ns
critical path for last transition of SO:
  SCLKI -> 1 @ 50.000ns , node was an input
  DFFPOSX1_0/a_2_6# -> 0 @ 50.040ns   (0.040ns)
  DFFPOSX1_0/a_66_6# -> 1 @ 50.200ns   (0.160ns)
  SO -> 0 @ 50.242ns   (0.042ns)
```

Figure 2.10: Shift Slice IRSIM Critical Path Delay

Looking at the output we can see that again the falling edge has a greater propagation delay through the path. The table below summarizes the results for this slice.

| State Change       | Delay (ns) | Note  |
|--------------------|------------|-------|
| 0 (rising, 0 → 1)  | 0.181      |       |
| 1 (falling, 1 → 0) | 0.242      | WORST |

Table 2.2: Shift Slice IRSIM Critical Path Delays


## 2.3 Slice Spice Results

### 2.3.1 PIN Slice Spice Results

We wanted to be able to functionally test our slices in HSpice in addition to analyzing the propagation delay. To do this another Python script was written that took our test patterns and wrote out the required Piecewise Linear (PWL) statements to generate them.

Listing 2.3: Python PIN Slice Spice File Generator
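The listing itself is not reproduced in this copy. To illustrate the PWL idea only, the sketch below turns a sequence of logic levels into a piecewise linear source that holds each level for one step and switches just before the next step begins. It is not the authors' script: the helper names `build_pwl` and `pwl_source`, the 20 ns step and the example node name are assumptions, with the point format modeled on the one used in Listing 2.6.

```python
def build_pwl(levels, step_ns=20, hold_gap_ns=0.00001):
    """Return the point list of a PWL source for a sequence of voltage levels.

    Each level is held from i*step_ns until just before (i+1)*step_ns, so
    the source switches almost instantaneously at each step boundary.
    """
    points = []
    for i, level in enumerate(levels):
        start = i * step_ns
        end = (i + 1) * step_ns - hold_gap_ns
        points.append(f"{start}n {level}V {end:.5f}n {level}V")
    return " ".join(points)


def pwl_source(name, node, levels, step_ns=20):
    """Return one Spice voltage source line driving `node` with the pattern."""
    return f"V{name} {node} gnd PWL({build_pwl(levels, step_ns)})"


if __name__ == "__main__":
    # hypothetical example: toggle an input low, high, low
    print(pwl_source("qi", "QI", ["0", "5", "0"]))
```

Writing two points per step, rather than one, keeps the waveform flat during each step instead of ramping linearly between consecutive pattern values.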

In the figure below we can see the result of our functional Spice test. It matches our expected functionality, once again indicating that the slice is operating correctly.



Figure 2.11: PIN Slice Spice Functional Results


In order to measure the critical path delay the patterns in the above Python program were modified to simply toggle the input line QI. The resulting output was then measured against the input clock CI to obtain the delays for both rising and falling edges.



Figure 2.12: PIN Slice Spice Critical Path Delay

The delay times for each of these state changes are tabulated below.

| State Change | Delay (ns) | Note  |
|--------------|------------|-------|
| 0            | 0.638      |       |
| 1            | 0.792      | WORST |

Table 2.3: PIN Slice Spice Critical Path Delays


### 2.3.2 Shift Slice Spice Results

Similarly to the PIN slice spice tests we again wanted to be able to test the logical functionality of our slice in HSpice. The same Python script was utilized with a few tweaks to the variable names and pattern definitions in order to match this slice.

```
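# ldi z si
# NOTE: only the last column is legible in the source; the first two
# columns are reconstructed as a binary count (assumption)
patterns = (
    ('0', '0', '0'),
    ('0', '0', '5'),
    ('0', '5', '0'),
    ('0', '5', '5'),
    ('5', '0', '0'),
    ('5', '0', '5'),
    ('5', '5', '0'),
    ('5', '5', '5')
)

with open("shift_slice_all.sp", "w") as f:

    f.write("* Shift Slice Test All\n")

    f.write(".include ../../models/model_t36s.sp\n")
    f.write(".include ../magic/shift_slice.spice\n")

    f.write("VDD vdd gnd 5V\n")
    f.write("Vsclki SCLKI gnd PULSE(0V 5V 10n 0 0 10n 20n)\n")

    o_ldi = ""
    o_z = ""
    o_si = ""

    # NOTE: this loop is not legible in the source; it is reconstructed
    # after the PWL pattern used in Listing 2.6 (assumption)
    i = 0
    for ldi, z, si in patterns:
        o_ldi += "%dn %sV %fn %sV " % (i*20, ldi, (i+1)*20-0.00001, ldi)
        o_z += "%dn %sV %fn %sV " % (i*20, z, (i+1)*20-0.00001, z)
        o_si += "%dn %sV %fn %sV " % (i*20, si, (i+1)*20-0.00001, si)
        i += 1

    f.write("Vldi ldi gnd PWL(%s)\n" % o_ldi)
    f.write("Vz z gnd PWL(%s)\n" % o_z)
    f.write("Vsi si gnd PWL(%s)\n" % o_si)

    f.write(".option post\n")
    f.write(".tran 0.01n %dn\n" % (i*20))
    f.write(".end\n")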
```

Listing 2.4: Python Shift Slice Spice File Generator

From the output waveform below we can see that our slice performed as we expected at a logical level.



Figure 2.13: Shift Slice Spice Functional Results


To obtain the delays the input pattern was modified to send a single pulse through the critical path. The output was then referenced to the input clock in order to measure the propagation delay of each edge.



Figure 2.14: Shift Slice Spice Critical Path Delay

The delays of the rising and falling edges are tabulated below.

| State Change | Delay (ns) | Note  |
|--------------|------------|-------|
| 0            | 0.316      |       |
| 1            | 0.455      | WORST |

Table 2.4: Shift Slice Spice Critical Path Delays


## 2.4 Gate Spice Results

In order to find the worst case delay of each gate, we generate an exhaustive list of input transitions and keep every transition in which a single input changes and the output changes as a result. Among these we are looking for the input state change that causes the worst case delay. We measure each delay using the .measure directive and then look for the largest delay time.
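The selection rule can be illustrated on its own, separate from the Spice file generation. The sketch below is an assumption-labeled example rather than the authors' code: it models AOI21X1 with the standard AND-OR-INVERT function Y = not((A and B) or C), and the function names are hypothetical.

```python
import itertools


def aoi21(a, b, c):
    """AND-OR-INVERT: Y = not((A and B) or C), the assumed AOI21X1 function."""
    return int(not ((a and b) or c))


def output_changing_transitions(gate, n_inputs):
    """List ordered (before, after) input states worth measuring.

    A transition is kept only if exactly one input changes and the gate
    output differs between the two states.
    """
    states = list(itertools.product((0, 1), repeat=n_inputs))
    kept = []
    for before, after in itertools.permutations(states, 2):
        if gate(*before) == gate(*after):
            continue  # output does not change, nothing to measure
        if sum(x != y for x, y in zip(before, after)) != 1:
            continue  # more than one input changed
        kept.append((before, after))
    return kept


if __name__ == "__main__":
    for before, after in output_changing_transitions(aoi21, 3):
        print(before, "->", after)
```

For the AOI21 function this yields ten transitions, which is consistent with the ten rows of Table 2.6.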

### 2.4.1 DFFPOSX1 Spice Results

Listing 2.5: Python DFFPOSX1 Spice File Generator



Figure 2.15: DFFPOSX1 Spice Results

| State Change | Delay (ns) | Note  |
|--------------|------------|-------|
| 0            | 0.3011     |       |
| 1            | 0.4257     | WORST |

Table 2.5: DFFPOSX1 Delays


### 2.4.2 AOI21X1 Spice Results

```
import itertools

# C A B Y
# rows follow the AOI21 truth table, Y = not((A and B) or C)
patterns = (
    ('0', '0', '0', '5'),
    ('0', '0', '5', '5'),
    ('0', '5', '0', '5'),
    ('0', '5', '5', '0'),
    ('5', '0', '0', '0'),
    ('5', '0', '5', '0'),
    ('5', '5', '0', '0'),
    ('5', '5', '5', '0')
)

with open("aoi21x1_test.sp", "w") as f:

    f.write("* AOI21X1 Test\n")

    f.write(".include ../../models/model_t36s.sp\n")
    f.write(".include ../magic/AOI21X1.spice\n")

    for n in ("a", "b", "c", "y"):
        f.write(".ic v(%s) = 0\n" % n)

    f.write("V1 vdd gnd 5V\n")

    o_a = ""
    o_b = ""
    o_c = ""

    i = 0
    for state_1, state_2 in itertools.permutations(patterns, 2):

        # if this state change doesn't change the output, skip it
        if state_1[-1] == state_2[-1]: continue

        # if more than one input changed skip it
        if sum(1 for x, y in zip(state_1[:-1], state_2[:-1]) if x != y) > 1: continue

        # otherwise execute the state change
        print(state_1, state_2)
        for c, a, b, y in (state_1, state_2):
            o_a += "%dn %sV %fn %sV " % (i*20, a, (i+1)*20-0.00001, a)
            o_b += "%dn %sV %fn %sV " % (i*20, b, (i+1)*20-0.00001, b)
            o_c += "%dn %sV %fn %sV " % (i*20, c, (i+1)*20-0.00001, c)
            i += 1

    print(i)

    f.write("Va a gnd PWL(%s)\n" % o_a)
    f.write("Vb b gnd PWL(%s)\n" % o_b)
    f.write("Vc c gnd PWL(%s)\n" % o_c)

    f.write(".option post\n")
    f.write(".tran 0.01n %dn\n" % (i*20))

    # measure each crossing
    for n in range(0, i // 2):
        f.write(".meas tran delay_%d when v(y)=2.5 td=%sn cross=1\n" % (n, (n*40)+10))
```

Listing 2.6: Python AOI21X1 Spice File Generator



Figure 2.16: AOI21X1 Spice Results

| State Change | Delay    |       |
|--------------|----------|-------|
| 0            | 0.1075n  |       |
| 1            | 0.1577n  | WORST |
| 2            | 0.1207n  |       |
| 3            | 0.1434n  |       |
| 4            | 0.1076n  |       |
| 5            | 0.1457n  |       |
| 6            | 0.1150n  |       |
| 7            | 0.0491n  |       |
| 8            | 0.0871n  |       |
| 9            | 0.0580n  |       |

Table 2.6: AOI21X1 Delays
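
The ten state changes in Table 2.6 are consistent with the filter used by the generator: only ordered pairs of input states that differ in exactly one input and flip the output are simulated. The sketch below recounts them, assuming the usual AOI21 function Y = NOT((A AND B) OR C). The helper name `single_input_transitions` is hypothetical and not part of the original script.

```python
import itertools

def aoi21(c, a, b):
    """AND-OR-INVERT: Y = NOT((A AND B) OR C), using 0/1 logic levels."""
    return int(not ((a and b) or c))

def single_input_transitions(func, n_inputs):
    """Ordered state pairs where exactly one input changes and the output flips."""
    states = list(itertools.product((0, 1), repeat=n_inputs))
    result = []
    for s1, s2 in itertools.permutations(states, 2):
        changed = sum(1 for x, y in zip(s1, s2) if x != y)
        if changed == 1 and func(*s1) != func(*s2):
            result.append((s1, s2))
    return result

transitions = single_input_transitions(aoi21, 3)
print(len(transitions))  # 10
```

Five unordered state pairs qualify, and each is exercised in both directions, which gives the ten measurements.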

### 2.4.3 MUX2X1 Spice Results

```
import itertools

# S A B Y
patterns = (
    ('0', '0', '0', '5'),
    ('0', '0', '5', '0'),
    ('0', '5', '0', '5'),
    ('0', '5', '5', '0'),
    ('5', '0', '0', '5'),
    ('5', '0', '5', '5'),
    ('5', '5', '0', '0'),
    ('5', '5', '5', '0')
)

with open("mux2x1_test.sp", "w") as f:
    f.write("* MUX2X1 Test\n")

    f.write(".include ../../models/model_t36s.sp\n")
    f.write(".include ../magic/MUX2X1.spice\n")

    for n in ("s", "a", "b", "y"):
        f.write(".ic v(%s) = 0\n" % n)

    f.write("V1 vdd gnd 5V\n")

    o_a = ""
    o_b = ""
    o_s = ""
    i = 0
    for state_1, state_2 in itertools.permutations(patterns, 2):
        # if this state change doesn't change the output, skip it
        if state_1[-1] == state_2[-1]: continue
        # if more than one input changed skip it
        if sum(1 for x, y in zip(state_1[:-1], state_2[:-1]) if x != y) > 1: continue
        # otherwise execute the state change (each state is held for 20ns)
        print(state_1, state_2)
        for s, a, b, y in (state_1, state_2):
            o_s += "%dn %sV %fn %sV " % (i*20, s, (i+1)*20-0.001, s)
            o_a += "%dn %sV %fn %sV " % (i*20, a, (i+1)*20-0.001, a)
            o_b += "%dn %sV %fn %sV " % (i*20, b, (i+1)*20-0.001, b)
            i += 1
    print(i)

    f.write("Vs s gnd PWL(%s)\n" % o_s)
    f.write("Va a gnd PWL(%s)\n" % o_a)
    f.write("Vb b gnd PWL(%s)\n" % o_b)

    f.write(".option post\n")
    f.write(".tran 0.01n %dn\n" % (i*20))

    # measure each crossing (one per pair of 20ns states)
    for n in range(0, i // 2):
        f.write(".meas tran delay_%d when v(y)=2.5 td=%sn cross=1\n" % (n, (n*40)+10))
```

Listing 2.7: Python MUX2X1 Spice File Generator



Figure 2.17: MUX2X1 Spice Results

| State Change | Delay   |       |
|--------------|---------|-------|
| 0            | 0.1536n |       |
| 1            | 0.1342n |       |
| 2            | 0.2569n |       |
| 3            | 0.1542n |       |
| 4            | 0.0832n |       |
| 5            | 0.1340n |       |
| 6            | 0.1390n |       |
| 7            | 0.2571n | WORST |
| 8            | 0.1391n |       |
| 9            | 0.0788n |       |
| 10           | 0.1464n |       |
| 11           | 0.1467n |       |

Table 2.7: MUX2X1 Delays

### 2.4.4 INVX1 Spice Results

```
patterns = ("0", "5", "0")

with open("invx1_test.sp", "w") as f:
    f.write("* INVX1 Test\n")

    f.write(".include ../../models/model_t36s.sp\n")
    f.write(".include ../../magic/INVX1.spice\n")

    for n in ("a", "y"):
        f.write(".ic v(%s) = 0\n" % n)

    f.write("v1 vdd gnd 5V\n")

    # hold each input level for 20ns, switching just before the next step
    o_a = ""
    for i, d in enumerate(patterns):
        o_a += "%dn %sV %fn %sV " % (i*20, d, (i+1)*20-0.001, d)
    f.write("Va a gnd PWL(%s)\n" % o_a)

    f.write(".tran 0.01n %dn\n" % (len(patterns)*20))

    # measure each crossing (input edges occur at 20n and 40n)
    f.write(".meas tran delay_0 when v(y)=2.5 td=10n cross=1\n")
    f.write(".meas tran delay_1 when v(y)=2.5 td=30n cross=1\n")

    f.write(".end\n")
```

Listing 2.8: Python INVX1 Spice File Generator
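
To make the stimulus easier to picture, the sketch below builds the same kind of PWL point list in isolation and prints the resulting source line. The function name `pwl_points` and its default arguments are illustrative assumptions. The format string and the 20ns step follow the generator.

```python
def pwl_points(levels, step_ns=20, eps_ns=0.001):
    """Build a SPICE PWL point list that holds each level for one step."""
    points = ""
    for i, level in enumerate(levels):
        # hold `level` from the start of the step until just before the next one
        points += "%dn %sV %fn %sV " % (
            i * step_ns, level, (i + 1) * step_ns - eps_ns, level)
    return points

source = "Va a gnd PWL(%s)" % pwl_points(("0", "5", "0"))
print(source)
# Va a gnd PWL(0n 0V 19.999000n 0V 20n 5V 39.999000n 5V 40n 0V 59.999000n 0V )
```

Each level is held until `eps_ns` before the next step, so every input edge is a ramp lasting `eps_ns` (1ps with these defaults) rather than an ideal step.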



Figure 2.18: INVX1 Spice Results

| State Change | Delay    |       |
|--------------|----------|-------|
| 0            | 0.0550n  | WORST |
| 1            | 0.0407n  |       |

Table 2.8: INVX1 Delays

### 2.4.5 Leaf Component Delay Summary

The worst-case delay of each gate is summarized below.

| Component | Worst Delay |
|-----------|-------------|
| AOI21X1   | 0.1577n     |
| DFFPOSX1  | 0.4257n     |
| INVX1     | 0.0550n     |
| MUX2X1    | 0.2571n     |

Table 2.9: Worst Case Delay Summary

## 2.5 VHDL Models With Timing

Using the worst-case leaf delays found above, we updated our VHDL gate models to take each gate's delay into account. All other modules and testbenches required no changes and were left as is.

Listing 2.9: AOI21X1 VHDL Module With Delay

Listing 2.10: DFFPOSX1 VHDL Module With Delay

Listing 2.11: INVX1 VHDL Module With Delay

```
library ieee;
use ieee.std_logic_1164.all;

entity mux2x1 is
    generic(delay : time := 0.2571 ns);
    port(
        a : in  std_logic;
        b : in  std_logic;
        s : in  std_logic;
        y : out std_logic
    );
end mux2x1;

architecture rtl of mux2x1 is begin
    process(a, b, s) begin
        if (s = '1') then
            y <= b after delay;
        else
            y <= a after delay;
        end if;
    end process;
end rtl;
```

Listing 2.12: MUX2X1 VHDL Module With Delay

## 2.6 VHDL Testbench Results With Timing

### 2.6.1 VHDL Slice Testbench With Delays Waveform

Looking at the outputs of the slice testbenches, we can see that they perform as expected and match not only the original testbenches but also the IRSIM and HSpice functional test waveforms. Since we know the delay of each gate and which gates are in each cell, there is no need to show a zoomed-in waveform illustrating the delay in VHDL. The simulation introduces no other delay, so the delay of a path is simply the sum of the gate delays along it.
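
As a rough illustration of adding up the delays manually, the sketch below sums the worst-case leaf delays from Table 2.9 along one path per slice. The gate sequences in `ASSUMED_PATHS` are an assumption made for this example and are not stated in this report; they were chosen because the PIN slice uses a flip-flop, an AOI and an inverter, and the shift slice is a multiplexed register.

```python
# Worst-case leaf delays in nanoseconds, taken from Table 2.9.
WORST_DELAY_NS = {
    "AOI21X1": 0.1577,
    "DFFPOSX1": 0.4257,
    "INVX1": 0.0550,
    "MUX2X1": 0.2571,
}

# ASSUMPTION: gates on each slice's critical path (not given in the report).
ASSUMED_PATHS = {
    "PIN": ("DFFPOSX1", "AOI21X1", "INVX1"),
    "Shift": ("DFFPOSX1", "MUX2X1"),
}

def path_delay_ns(gates):
    """Sum the worst-case delay of every gate along a path."""
    return sum(WORST_DELAY_NS[g] for g in gates)

for name, gates in ASSUMED_PATHS.items():
    print("%s: %.4fn" % (name, path_delay_ns(gates)))
# PIN: 0.6384n
# Shift: 0.6828n
```

Under these assumed paths the sums land close to the VHDL row of Table 2.10 (0.638n and 0.682n), which suggests, but does not prove, that these are the paths being measured.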



Figure 2.19: VHDL PIN Slice With Delays Waveform



Figure 2.20: VHDL Shift Slice With Delays Waveform

### 2.6.2 VHDL Top Level Testbench With Delays Waveform

Looking at the results of the top level testbenches for both normal and test mode, we can see that they are again identical to the waveforms captured without delays. Additionally, since our testbench is exhaustive and self-checking, we can be certain that the delays did not introduce any corner cases that manual spot checking might miss.



Figure 2.21: VHDL Top Level Functional Test Bench With Delays Waveform



Figure 2.22: VHDL Top Level Test Mode Test Bench With Delays Waveform

## 2.7 Final Simulation Comparison

The final critical path delay summary shows some discrepancy between the different simulations. We are not certain why this exists, but our best guess is that it is an artifact of the different simulation techniques used by each simulator and of which parameters each one takes into account. Despite these discrepancies, all the worst case delays are under 1ns. The slowest, 0.792n for the PIN slice in SPICE, corresponds to roughly 1.26GHz (1 / 0.792n), which gives us a theoretical **max clock speed of 1.3GHz**. Since our design is essentially all shift registers, we should also be able to achieve a throughput equal to the max clock rate. Once we simulate the full layout, which will introduce some long traces between slice rows, we will be able to determine more accurate maximum clock and throughput rates.

| Simulation | PIN    | Shift  |
|------------|--------|--------|
| IRSIM      | 0.348n | 0.242n |
| SPICE      | 0.792n | 0.455n |
| VHDL       | 0.638n | 0.682n |

Table 2.10: Critical Path Delay Comparison
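
The clock estimate can be reproduced from the table with a few lines of Python. This is a minimal sketch that assumes the clock period is limited only by the slowest critical path; it ignores setup time, clock skew and wiring delay, so it is an upper bound rather than a prediction. The names used are illustrative.

```python
# Critical path delays in nanoseconds, copied from Table 2.10.
CRITICAL_PATH_NS = {
    "IRSIM": {"PIN": 0.348, "Shift": 0.242},
    "SPICE": {"PIN": 0.792, "Shift": 0.455},
    "VHDL": {"PIN": 0.638, "Shift": 0.682},
}

def max_clock_ghz(delay_ns):
    """Upper bound on clock frequency: one period must cover the path delay."""
    return 1.0 / delay_ns  # the reciprocal of nanoseconds is gigahertz

worst_ns = max(d for sim in CRITICAL_PATH_NS.values() for d in sim.values())
print("slowest path: %.3fn -> %.2f GHz" % (worst_ns, max_clock_ghz(worst_ns)))
# slowest path: 0.792n -> 1.26 GHz
```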

## 2.8 Floor Plan

The floor plan that we currently plan to pursue is shown below. From initial placement testing we believe we will be able to achieve a **15x15** grid of slices. The majority of the core functionality is contained in a symmetrically sliced square. The only additional components that fall outside of this model are the 3 MUXes that are required for test mode. The floor plan indicates the planned location of all test slices and the major components of our design, namely the Shifter and the PIN.



Figure 2.23: Floor Plan Diagram

## 2.9 Major Design Decisions

The major design decisions revolved mainly around the layout of each slice. We knew we wanted to come up with a design that would allow us to tile with minimal effort. Achieving this goal was not necessarily difficult, but it required an iterative design process to break out each signal in such a way that the slices would connect directly together.

Another decision that was made early on was simply 'which cells should we use?'. While some parts of our slice, such as the D Flip Flop, were obvious choices, others were more flexible. In the schematic representation of our PIN slice we show a 2 input AND gate feeding into a 2 input OR gate. As it turns out, the provided library has a cell which performs that function but with an inverted output, which is easily mitigated by adding an inverter. After comparing two layouts, one using an AND and an OR cell and one using the AOI cell plus an INV cell, it turned out that using the AOI and INV saved us some horizontal space, which allowed us to fit an extra column of slices, bumping our PIN size to 15x15.
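
The equivalence between the two options can be checked exhaustively. The sketch below is a plain truth-table comparison with illustrative function names; it models logic only and says nothing about layout area or delay.

```python
import itertools

def and_or(a, b, c):
    """Schematic form: a 2 input AND feeding a 2 input OR."""
    return (a and b) or c

def aoi21(a, b, c):
    """AOI cell: the same function with an inverted output."""
    return not ((a and b) or c)

def inv(x):
    """Inverter that restores the polarity of the AOI output."""
    return not x

mismatches = [
    bits for bits in itertools.product((False, True), repeat=3)
    if and_or(*bits) != inv(aoi21(*bits))
]
print(mismatches)  # [] means the two forms agree on all 8 input combinations
```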

Design decisions revolving around the floor plan derive not only from our initial pin layout, which strives to provide chip-to-chip slicing, but also organically as we continue to place components in the frame and see how they fit together. As such, we have not completely finalized the pinout and many pins are still left unassigned. As stated in the first progress report, once we move further into finalizing placement of our Shifter and PIN in the frame, we will start tapping off various interesting signals and routing them to nearby unassigned pins.

## 2.10 Work Division

| Task                  | Person |
|-----------------------|--------|
| PIN Slice Layout      | Thrun  |
| Shift Slice Layout    | Qi     |
| PIN Slice IRSIM       | Qi     |
| Shift Slice IRSIM     | Thrun  |
| PIN Slice Spice       | Thrun  |
| Shift Slice Spice     | Qi     |
| Gate Spice            | Both   |
| VHDL Models           | Both   |
| VHDL Slice Tests      | Thrun  |
| VHDL Top Tests        | Qi     |
| Simulation Comparison | Both   |
| Floor Plan            | Both   |
| Design Decisions      | Both   |

Table 2.11: Task Assignment
