

# Tiny Tapeout 03 Datasheet

Project Repository https://github.com/TinyTapeout/tinytapeout-03

## April 12, 2023

## **Contents**

| Render of whole chip           | 4  |
|--------------------------------|----|
| Projects                       | 5  |
| 0 : Test Inverter Project      | 5  |
| 1:ChipTune                     | 6  |
| 2 : 7 Segment Life             | 7  |
| 3 : Another Piece of Pi        | 9  |
| 4: Wormy                       | 11 |
| 5 : Knight Rider Sensor Lights | 15 |
| 6 : Single digit latch         | 17 |
| 7: 4x4 Memory                  | 18 |

|    | : KS-Signal                                            | 19         |
|----|--------------------------------------------------------|------------|
|    | :Hovalaag CPU ..............................           | 21         |
|    | 0 : SKINNY SBOX                                        | 22         |
|    | 1 : Stateful Lock                                      | 23         |
|    | 2 : Ascon's 5-bit S-box                                | 24         |
|    | 3 : 8bit configurable galois Ifsr                      | 25         |
|    | 4 : Sbox SKINNY 8 Bit                                  | 26         |
|    | 5 : BinaryDoorLock                                     | 27         |
|    | 6 : bad apple                                          | 28         |
|    | 7 : TinyFPGA attempt for TinyTapeout3                  | 29         |
|    |                                                        | 30         |
|    | 9:12-bit PDP8                                          | 31         |
|    | 0 : CTF - Catch the fish                               | 33         |
|    | 1 : Dot operation calculator                           | 34         |
|    | 2 : RiscV Scan Chain based CPU – block 1 – clocking    | 35         |
|    | 3: RiscV Scan Chain based CPU – block 2 – instructions | 36         |
|    | 4 : RiscV Scan Chain based CPU – block 3 – registers   |            |
|    | 5 : RiscV Scan Chain based CPU – block 4 – ALU         |            |
|    |                                                        | 39         |
|    |                                                        | 41         |
|    |                                                        | 44         |
|    |                                                        | 45         |
|    |                                                        | 47         |
|    | — ·                                                    | 48         |
|    | 2 : 4-bit ALU                                          | 50         |
|    | 3 : Pulse-Density Modulators                           |            |
|    | 4 : CRC Decelerator                                    |            |
|    |                                                        | 59         |
|    |                                                        | 60         |
|    | ,                                                      |            |
| Te |                                                        | 61         |
|    | Scan chain                                             | 61         |
|    | Clocking                                               | 62         |
|    |                                                        | 63         |
|    | Vait states                                            | 63         |
|    | Pinout                                                 | 63         |
|    | nstructions to build GDS                               | 64         |
|    | Changing macro block size                              | 65         |
|    | <b>.</b>                                               | <i>-</i> - |
| ve |                                                        | 66         |
|    | ere P                                                  | 66         |
|    | imulations                                             | 66         |

| Top level tests setup                                      | 67 |
|------------------------------------------------------------|----|
| Formal Verification                                        | 68 |
| Timing constraints                                         | 68 |
| Physical tests                                             |    |
| Toplevel STA and Spice analysis for TinyTapeout-02 and 03. | 70 |
| Introduction                                               | 70 |
| GitHub action                                              | 70 |
| Prerequisites                                              | 70 |
| Assumptions                                                |    |
| Issues                                                     |    |
| Invoking STA analysis                                      | 71 |
| How it works                                               |    |
| Invoking Spice simulation:                                 |    |
| Sponsored by                                               | 74 |
| Team                                                       | 74 |

# Render of whole chip



Figure 1: Full GDS

# **Projects**

## 0 : Test Inverter Project

Author: Matt Venn

• Description: Inverts every line. This project is also used to fill any empty design spaces.

GitHub repository

Wokwi project

Extra docs

Clock: 0 Hz

External hardware:

## How it works

Uses 8 inverters to invert every line.

## How to test

Setting the input switch to on should turn the corresponding LED off.

| # | Input | Output    |
|---|-------|-----------|
| 0 | a     | segment a |
| 1 | b     | segment b |
| 2 | С     | segment c |
| 3 | d     | segment d |
| 4 | e     | segment e |
| 5 | f     | segment f |
| 6 | g     | segment g |
| 7 | dot   | dot       |

## 1 : ChipTune

Author: Wallie Everest

• Description: Vintage 8-bit sound generator

GitHub repository

HDL project

Extra docs

Clock: 22000 HzExternal hardware:

## How it works

Chiptune implements an 8-bit Programmable Sound Generator. Input is from a serial UART interface. Output is PWM audio.

### How to test

Explain how to test your project

| # | Input               | Output              |
|---|---------------------|---------------------|
| 0 | io_in[0] (CLOCK)    | io_out[0] (PWM)     |
| 1 | io_in[1] (IN_1)     | io_out[1] (OUT_1)   |
| 2 | io_in[2] (IN_2)     | io_out[2] (OUT_2)   |
| 3 | io_in[3] (MODE)     | io_out[3] (OUT_3)   |
| 4 | io_in[4] (BAUD_CLK) | io_out[4] (REF_CLK) |
| 5 | io_in[5] (TCK)      | io_out[5] (RTCK)    |
| 6 | io_in[6] (TMS)      | io_out[6] (LED)     |
| 7 | io_in[7] (TDI)      | io_out[7] (TDO)     |

## 2: 7 Segment Life

Author: icegoat9

Description: Simple 7-segment cellular automaton

GitHub repository

Wokwi project

Extra docs

■ Clock: 0 Hz

• External hardware: None. Could add debounced momentary pushbuttons in parallel with dip switches 1,2,3 to make loading in new patterns and stepping through a run easier.

#### How it works

See the Wokwi gate layout and simulation. At a high level...

- Seven flip-flops hold the cellular automaton's internal state, which is also displayed in the seven-segment display.
- Combinatorial logic generates the next state for each segment based on its neighbors, according to the ruleset...
  - Living segments with exactly one living neighbor (another segment that touches it end to end) survive, and all others die.
  - Dead segments with exactly two living neighbors come to life.
- When either the system clock or a user toggling the clock input go high, this new state is latched into the automaton's state.
- There's minor additional support logic to let the user manually shift in an initial condition and handle clock dividing.

#### How to test

For full details and a few 'exercises for the reader', see the github README doc link. But at a high level, assuming the IC is mounted on the standard tinytapeout PCB which provides dip switches, clock, and a seven-segment display for output...

- Set all dip switches off and the clock slide switch to the 'manual' clock side.
- Power on the system. An arbitrary state may appear on the 7-segment display.
- Set dip switch 4 on ('run mode').
- Toggle dip switch 1 on and off to advance the automaton to the next state, you should see the 7-segment display update.

If you want to watch it run automatically (which may quickly settle on an empty state or a static pattern, depending on start state)...

- Set the PCB clock divider to the maximum clock division (255). With a system clock of 6.25kHz, the clock input should now be ~24.5Hz.
- Set dip switches 5 and 7 on to add a reasonable additional clock divider (see docs for more details on a higher or lower divider).
- Set dip switch 4 on.
- Switch the clock slide switch to the 'system clock' side. The display should advance at roughly 1.5Hz if I've done math correctly.

If you want to load an initial state...

- Set dip switch 4 off ('load mode').
- Toggle dip switches 2 and/or 3 on and off seven times total, to shift in 0 and 1 values to the automaton internal state.
- Set dip switch 4 on and run manually or automatically as above.

| # | Input     | Output     |
|---|-----------|------------|
| 0 | clock     | 7segmentA  |
| 1 | load0     | 7segmentB  |
| 2 | load1     | 7segmentC  |
| 3 | runmode   | 7segmentD  |
| 4 | clockdiv8 | 7segmentE  |
| 5 | clockdiv4 | 7segmentF  |
| 6 | clockdiv2 | 7segmentG  |
| 7 | unused    | 7segmentDP |
|   |           |            |

## 3: Another Piece of Pi

- Author: Meinhard Kissich, EAS Group, Graz University of Technology
- Description: This design takes up the idea of James Ross [1], who submitted a circuit to Tiny Tapeout 02 that stores and outputs the first 1024 decimal digits of the number Pi (including the decimal point) to a 7-segment display. In contrast to his approach, a densely packed decimal encoding is used to store the data. With this approach, 1400 digits can be stored and output within the design area of 150um x 170um. However, at 1400 decimals and utilization of 38.99%, the limitation seems to be routing. Like James, I'm also interested to hear about better strategies to fit more information into the design with synthesizable Verilog code. [1] https://github.com/jar/tt02\_freespeech
- GitHub repository
- HDL project
- Extra docs
- Clock: 0 Hz
- External hardware: 7-segment display

#### How it works

The circuit stores each triplet of decimals in a 10-bit vector encoded as densely packed decimals. An index vector selects the current digits to be output to the 7-segment display. It consists of an upper part index[11:2] that selects the triplet and a lower part index[1:0] that specifies the digit within the triplet. First, the upper part decides on the triplet, which is then decoded into three decimals. Afterwards, the lower part selects one of the three decimals to be decoded into 7-segment display logic and applied to the outputs. The index is incremented at each primary clock edge. However, when the lower part equals three, i.e., index[1:0] ==1'b10, two is added, as the triplet consists of three (not four) digits.

- index == 'b00000000000|01: triplet[0], digit 1 within triplet
- index == 'b0000000000|10: triplet[0], digit 2 within triplet
- index == 'b0000000001|00: triplet[1], digit 0 within triplet
- index == 'b0000000001|01: triplet[1], digit 1 within triplet
- index == 'b0000000001|10: triplet[1], digit 2 within triplet

There is one exception to the rule above: the decimal point. Another multiplexer

at the input of the 7-segment decoder can either forward a digit from the decoded tripled or a constant – the decimal point. Once the lower part of the index counter, i.e., index[1:0] reaches 2'b10 for the first time, the multiplexer selects the decimal point and pauses incrementing the index for one clock cycle.

- index == 'b00000000000|01: triplet[0], digit 1 within triplet
- index == 'b0000000000|10: triplet[0], digit 2 within triplet
- index == 'b0000000001|00: triplet[1], digit 0 within triplet

#### How to test

For simulation, please use the provided testbench and Makefile. It is important to run the genmux.py Python script first, as it generates the test vectors required by the Verilog testbench. For testing the physical chip, release the reset and compare the digits of Pi against a reference.

| # | Input | Output      |
|---|-------|-------------|
| 0 | clk   | segment a   |
| 1 | reset | segment b   |
| 2 | none  | segment c   |
| 3 | none  | segment d   |
| 4 | none  | segment e   |
| 5 | none  | segment f   |
| 6 | none  | segment g   |
| 7 | none  | decimal LED |

## 4: Wormy

Author: ngbit

• Description: MC Wormy Pants squirms like a worm and grows just as fast.

GitHub repository

HDL project

Extra docs

Clock: 300 Hz

External hardware:

#### How it works

Wormy is a very simple, addictive last person video game. This last person, open-world game takes you down the path of an earthworm. Wormy's world is made up of a 4x4 grid represented by 3x16-bit arrays: Direction[0], Direction[1], and Occupied. The Direction[x] maps keep track of which way a segment of worm moves, if it is on, and Occupied keeps track of if the grid location is occupied.

Example Occupied Grid:

In addition to direction and occupied there are also pointers to the head and the tail. The same grid would look something like the following:

Example Occupied Grid with head(H) and tail(T) highlighted:

|    |   |   |   |   | - |
|----|---|---|---|---|---|
|    | T | X | X | X |   |
|    |   |   |   | X |   |
|    |   |   |   | Н |   |
| ١, |   |   |   |   |   |

BOOM! Wormy shouldn't run into itself. If its head hits any part of its body, that causes a collision. There is a collision if the location of the Wormy's head is occupied by another segment of the Wormy. To determine this, we keep track of the worm location, and specifically the current and future locations of the worm head (H) and tail (T). If the future location of H will occupy a location that will already be occupied, this causes a collision.

This is made a bit trickier with growth (see below), because if the future state of H is set to occupy the current state of T - there is only a collision if growth is set to occur

in the next cycle, so be careful if you plan to have Wormy chase its tail! SPLAT!

NOM NOM! Wormy eats the tasty earth around it and grows. Every so often Wormy, after having eaten all of its lunch, grows a whole segment. During a growth cycle, the state of T simply persists (remains occupied and heading in the same direction as it is).

Last Person Input - see user\_input.v To move Wormy along, the last player needs to push buttons to help Wormy find more tasty earth: up, right, down or left.

Buttons are nasty little bugs in general. When pushed the button generates an analog signal that might not look exactly like a single rising edge.

It might look something like this:

|                |    |    |     |   | xxxxxxxxxxxxx |
|----------------|----|----|-----|---|---------------|
|                |    |    |     | 3 | XX            |
|                |    |    |     | 3 | Σ             |
|                |    |    |     | 3 | Σ             |
|                | X  |    |     | 3 | ζ             |
|                | X  |    |     | X |               |
|                | X  |    |     | X |               |
|                | X  |    |     | X |               |
|                | X  | XΣ | ХХ  | X |               |
|                | X  | XΣ | XXX | X |               |
|                | X  | X  | XX  | X |               |
|                | X  | X  | X   | X |               |
|                | X  | X  | X   | X |               |
| X              | X  | X  | X   | X |               |
| x              | XX | X  | X   | X |               |
| x              | XX | X  | X   | X |               |
| x              | X  | X  | X   | X |               |
| x              | X  | X  | XX  | X |               |
| X              | 2  | XX | X   | X |               |
| xxxxxxxxxxxxxx |    |    | 3   | X |               |

In order to help protect our logic from this scary looking signal that might introduce metastability (<- WHAT?), we can filter it with a couple flippy-floppies and keep the metastability at bay. ARGH.

Once we have a clear pushed or not-pushed, we can suggest that Wormy move in a specific direction. If the last player tries button mashing, Wormy won't listen. Once a second Wormy checks what the last button press was and tries really hard to go that way (see BOOM!).

Earthworms don't have eyes - see multiplexer.v The game's display is made up of a 4x4

grid of LEDs controlled by a multiplexer. Why multiplexing? With a multiplexed LED setup, we can control more display units (LEDs), with a limited number of outputs (8 on this TinyTapeout project).

To get multiplexing working the network of outputs is mapped to each display unit. This allows us to manipulate assigned outputs to control the state of each display unit, one at a time. We then cycle through each display unit quickly enough to display a persistent image to the last player.

Wires (A1-4, B1-4) map to each location on the game arena (4x4 grid):

| A1      | 1       |         |         |
|---------|---------|---------|---------|
| A2      | <br>    | <br>    |         |
| A3      |         | Ι       | <br>    |
| A4      |         | Ī       | Ι       |
| <br> B1 | <br> B2 | <br> B3 | <br> B4 |

When B's voltage is OFF the LED's state changes to ON if A is also ON. - ON LED state: A(ON) - > | - B(OFF) - OFF LED state: A(ON) - > | - B(ON)

Example: The 3 filled squares below each represent a Wormy segment in the ON state as controlled by the multiplexer. Notice how each is lighter than the last. This is because the multiplexer cycles through each LED to update the state, creating one persistent image even though the LEDS are not on over the entire period of time.

| B1 | B2 | B3   | B4 |
|----|----|------|----|
| A4 | Ī  | Ι    | Ι  |
| A3 | Ι. | <br> | Ι  |
| A2 | o  |      | Ī  |
| A1 | 0  |      | 1  |

Another Example: If you enter a dark cave and point a flashlight straight ahead at one point on the wall you have a very small visual field that is contained within the beam of light. However, you can expand your visual field in the cave by waving the flashlight back and forth across the wall. Despite the fact that the beam is moving over individual points on the wall, the entire wall can be seen at once. This is similar to the concept used in the Wormy display, since the multiplexer changes the state of

the worm occupied locations to ON one at a time, but in a cycle. The result is a solid image, made up of LEDs cycling through ON states to produce a persistent image of Wormy (that beautiful Lumbricina).

## How to test

After reset, you should see a single pixel moving along the display and it should grow every now and then.

| # | Input   | Output                           |
|---|---------|----------------------------------|
| 0 | clock   | A0 - Multiplexer channel A to be |
|   |         | tied to a an array of 16         |
|   |         | multiplexed LEDs                 |
| 1 | reset   | A1                               |
| 2 | button0 | A2                               |
| 3 | button1 | A3                               |
| 4 | button2 | B0 - Multiplexer channel B to be |
|   |         | tied to a an array of 16         |
|   |         | multiplexed LEDs                 |
| 5 | button3 | B1                               |
| 6 | none    | B2                               |
| 7 | none    | B3                               |

## 5: Knight Rider Sensor Lights



Figure 2: picture

- Author: Kolos Koblasz
- Description: The logic assertes output bits one by one, like KITT's sensor lights in Knight Rider.
- GitHub repository
- HDL project
- Extra docs
- Clock: 6000 Hz
- External hardware: Conect LEDs with ~1K-10K Ohm serial resistors to output pins and connect push button switches to Input[2] and Input[3] which drive the inputs with logic zeros when idle and with logic 1 when pressed. Rising edge on these inputs selects the next settings.

#### How it works

Uses several counters, shiftregisters to create a moving light. Input[2] and Input[3] can control speed and brightness respectively. Brightness control is achieved by PWM of the output bits at 50Hz. Simulated with 6KHz clock signal.

### How to test

After reset it starts moving the switched on LED. Input[0] is clk and Input[1] is reset (1=reset on, 0=reset off). By creating rising edges on Input[2] and Input[3] the two config spaces can be discovered. Conect LEDs with  $\sim 1 \text{K}-10 \text{K}$  Ohm serial resistors to output pins and connect push button switches to Input[2] and Input[3] which drive the inputs with logic zeros when idle and with logic 1 when pressed. Rising edge on these inputs selects the next settings.

| # | Input              | Output |
|---|--------------------|--------|
| 0 | clock              | LED 0  |
| 1 | reset              | LED 1  |
| 2 | speed control      | LED 2  |
| 3 | brightness control | LED 3  |
| 4 | none               | LED 4  |
| 5 | none               | LED 5  |
| 6 | none               | LED 6  |
| 7 | none               | LED 7  |

## 6 : Single digit latch

• Author: Dylan Garrett

■ Description: Store a single digit 0-9 and display it on a 7-segment display

GitHub repository

Wokwi project

Extra docs

■ Clock: 0 Hz

External hardware:

## How it works

Explain how your project works

## How to test

Explain how to test your project

| # | Input | Output    |
|---|-------|-----------|
| 0 | clock | segment a |
| 1 | reset | segment b |
| 2 | none  | segment c |
| 3 | none  | segment d |
| 4 | none  | segment e |
| 5 | none  | segment f |
| 6 | none  | segment g |
| 7 | none  | dot       |

## 7: 4x4 Memory

Author: Yannick Reiß

• Description: Store 4x4 bits of memory.

GitHub repository

Wokwi project

Extra docs

Clock: 1000 HzExternal hardware:

### How it works

On write enable, the four data-inputs are saved to the d-flipflops selected by the address inputs. The output is always read from the current selected flipflops.

#### How to test

Connect a clock, buttons for reset and write\_enable and switches for address and data like shown below. The output can be read by using 4 LEDs or any other kind of binary output device.

| # | Input        | Output |
|---|--------------|--------|
| 0 | clock        | data1  |
| 1 | reset        | data2  |
| 2 | write_enable | data3  |
| 3 | addr1        | data4  |
| 4 | addr2        | none   |
| 5 | data1        | none   |
| 6 | data2        | none   |
| 7 | data3        | none   |

## 8: KS-Signal

Author: Yannick Reiß

• Description: Set KS-Signal based on track information.

GitHub repository

Wokwi project

Extra docs

■ Clock: 10000 Hz

 External hardware: Input: Clock, 9 Buttons, Output: 3 yellow LEDs, 3 white LEDs, 1 green LED, 1 red LED, 1 orange LED

### How it works

Simply switches lamps on and off on toggle.

### How to test

Connect the clock, and the buttons as inputs. Connect the LEDs as outputs in the ordern shown below.

| # | Input                     | Output                            |
|---|---------------------------|-----------------------------------|
| 0 | track1_free               | Fahrt! (Green)                    |
| 1 | track2_free               | Halt Erwarten! (Orange)           |
| 2 | pre_signal                | Halt! (Red)                       |
| 3 | none                      | Vorsicht! (All yellow)            |
| 4 | none                      | pre signal indicator (white)      |
| 5 | allow shunting            | lower shunting indicator (white)  |
| 6 | Vorsicht!                 | shorter breaking distance (white) |
| 7 | shorter breaking distance | none                              |



Figure 3: picture

## 9: Hovalaag CPU

Author: Mike Bell

Description: Implementation of the CPU from HOVALAAG

GitHub repository

HDL project

Extra docs

Clock: 12500 HzExternal hardware:

#### How it works

HOVALAAG (Hand-Optimizing VLIW Assembly Language as a Game) is a free Zachlike game.

This is an implementation of the VLIW processor from the game. Thank you to @nothings for the fun game, making the assembler public domain, and for permission to create this hardware implementation.

The processor uses 32-bit instructions and has 12-bit I/O. The instruction and data are therefore passed in and out over 10 clocks per processor clock.

More details in the github repo.

#### How to test

The assembler can be downloaded to generate programs.

The subcycle counter can be reset independently of the rest of the processor, to ensure you can get to a known state without clearing all registers.

| # | Input                                                | Output   |
|---|------------------------------------------------------|----------|
| 0 | Clock                                                | Output 0 |
| 1 | Reset disable (resets enabled when low)              | Output 1 |
| 2 | Input 0 or Reset (when high)                         | Output 2 |
| 3 | Input 1 or Reset subcycle count (when high)          | Output 3 |
| 4 | Input 2 or enable ROSC (when high and reset enabled) | Output 4 |
| 5 | Input 3                                              | Output 5 |
| 6 | Input 4                                              | Output 6 |
| 7 | Input 5                                              | Output 7 |

## 10: SKINNY SBOX

Author: Niklas Fassbender

Description: Implementation of a 4-Bit Sbox for SKINNY

GitHub repository

Wokwi project

Extra docs

Clock: 0 Hz

External hardware:

## How it works

Explain how your project works

## How to test

Explain how to test your project

| # | Input     | Output    |
|---|-----------|-----------|
| 0 | bit_0_lsb | bit_0_lsb |
| 1 | bit_1     | bit_1     |
| 2 | bit_2     | bit_2     |
| 3 | bit_3_msb | bit_3_msb |
| 4 | none      | none      |
| 5 | none      | none      |
| 6 | none      | none      |
| 7 | none      | none      |

## 11: Stateful Lock

Author: Tim Henkes

 Description: A little combination lock which requires three codes in the correct order to unlock

GitHub repository

Wokwi project

Extra docsClock: 0 Hz

External hardware:

### How it works

A little two-bit state machine decides which code is required to enter the next state. The fourth state equals to the lock being open. A wrong code in any state resets to the first state.

### How to test

To test the project, refer to its Wokwi, which is public.

| # | Input        | Output      |
|---|--------------|-------------|
| 0 | unused       | lock status |
| 1 | reset        | unused      |
| 2 | enter        | unused      |
| 3 | code digit 0 | unused      |
| 4 | code digit 1 | unused      |
| 5 | code digit 2 | unused      |
| 6 | code digit 3 | unused      |
| 7 | code digit 4 | unused      |

## 12: Ascon's 5-bit S-box

Author: Fabio Campos

Description: Ascon's 5-bit S-box

GitHub repository

Wokwi project

Extra docs

Clock: 0 Hz

External hardware:

## How it works

See Figure 4a in https://eprint.iacr.org/2021/1574.pdf

## How to test

See Section 3.2 and Table 4 in https://eprint.iacr.org/2021/1574.pdf

| # | Input      | Output     |
|---|------------|------------|
| 0 | x0         | x0         |
| 1 | x1         | x1         |
| 2 | x2         | x2         |
| 3 | <b>x</b> 3 | <b>x</b> 3 |
| 4 | x4         | x4         |
| 5 | none       | none       |
| 6 | none       | none       |
| 7 | none       | none       |

## 13: 8bit configurable galois Ifsr

Author: Alexander Schönborn

• Description: A 8bit configurable galois Ifsr.

GitHub repository

HDL project

Extra docs

Clock: 1000 HzExternal hardware:

### How it works

Uses 2 cycles each to set the shift and xor mask registers. It has 3 operation modes: normal shift,

### How to test

After reset, set the shift register and xor state, after that normal shift mode

| # | Input                                                                      | Output                                       |
|---|----------------------------------------------------------------------------|----------------------------------------------|
| 0 | clock                                                                      | output[0] Isb normal Ifsr output             |
| 1 | reset                                                                      | output[1] other 7 bits to see the full state |
| 2 | mode[0] 00 normal shift mode, 01 set register mode,                        | output[2]                                    |
| 3 | mode[1] 10 set mode registers, 11 unused                                   | output[3]                                    |
| 4 | <pre>data_in[0] is used for both filling register and xor mask state</pre> | output[4]                                    |
| 5 | data_in[1] needs 2 cycles to fill all 8 bits                               | output[5]                                    |
| 6 | data_in[2] first cyle is lower 4 bits,<br>2nd upper 4 bits                 | output[6]                                    |
| 7 | data_in[3] see above                                                       | output[7]                                    |

## 14: Sbox SKINNY 8 Bit

Author: Thorsten Knoll

• Description: A circuit for the substitution of 8 bits. Made for the SKINNY cipher algorithm.

GitHub repository

Wokwi project

Extra docs

Clock: 0 Hz

External hardware:

## How it works

Explain how your project works

## How to test

Explain how to test your project

| # | Input | Output |
|---|-------|--------|
| 0 | in_0  | out_0  |
| 1 | in_1  | out_1  |
| 2 | in_2  | out_2  |
| 3 | in_3  | out_3  |
| 4 | in_4  | out_4  |
| 5 | in_5  | out_5  |
| 6 | in_6  | out_6  |
| 7 | in_7  | out_7  |
|   |       |        |

## 15: BinaryDoorLock

Author: Marcus Michaely

• Description: Input is 8-Bit and only one combination opens the door

GitHub repository

Wokwi project

Extra docs

Clock: 0 Hz

External hardware:

## How it works

Explain how your project works

## How to test

Explain how to test your project

| # | Input | Output |
|---|-------|--------|
| 0 | Bit_0 | Output |
| 1 | Bit_1 | none   |
| 2 | Bit_2 | none   |
| 3 | Bit_3 | none   |
| 4 | Bit_4 | none   |
| 5 | Bit_5 | none   |
| 6 | Bit_6 | none   |
| 7 | Bit_7 | none   |

## 16: bad apple

Author: shadow1229

• Description: Plays bad apple over a Piezo Speaker connected across io\_out[1:0]. Based on https://github.com/meriac/tt02-play-tune

GitHub repository

HDL project

Extra docs

■ Clock: 12000 Hz

• External hardware: Piezo speaker connected across io\_out[1:0]

### How it works

Converts an RTTL ringtone into verilog and plays it back using differential PWM modulation.

#### How to test

Provide 12kHz clock on io\_in[0], briefly hit reset io\_in[1] (L->H->L) and io\_out[1:0] will play a differential sound wave over piezo speaker (Bad Apple)

| # | Input | Output          |
|---|-------|-----------------|
| 0 | clock | piezo_speaker_p |
| 1 | reset | piezo_speaker_n |
| 2 | none  | none            |
| 3 | none  | none            |
| 4 | none  | none            |
| 5 | none  | none            |
| 6 | none  | none            |
| 7 | none  | none            |
|   |       |                 |

## 17: TinyFPGA attempt for TinyTapeout3

Author: Emilian Miron

• Description: FPGA attempt

GitHub repository

HDL projectExtra docs

Clock: 1000 HzExternal hardware:

## How it works

TODO

## How to test

After reset, the counter should increase by one every second.

| # | Input | Output            |
|---|-------|-------------------|
| 0 | clock | segment a         |
| 1 | reset | segment b         |
| 2 | none  | segment c         |
| 3 | none  | segment d         |
| 4 | none  | segment e         |
| 5 | none  | segment f         |
| 6 | none  | segment g         |
| 7 | none  | slow clock output |

## 18: 4bit Adder

Author: Carin Schreiner

 Description: This tiny tape out project takes two four bit numberbs and adds them

GitHub repository

Wokwi project

Extra docs

Clock: 0 Hz

External hardware:

### How it works

The halfadder uses simple logic gates.

#### How to test

Input bits 0-3 are used for the first number and bits 4-7 for the second number. The output bits 0-3 are the resulting number an bit 4 the carry. All numbers should be in little endian format.

| # | Input | Output |
|---|-------|--------|
| 0 | a0    | r0     |
| 1 | a1    | r1     |
| 2 | a2    | r2     |
| 3 | a3    | r3     |
| 4 | b0    | carry  |
| 5 | b1    | none   |
| 6 | b2    | none   |
| 7 | b3    | none   |
|   |       |        |

## 19: 12-bit PDP8

Author: Paul CampnellDescription: PDP8 core

GitHub repository

HDL projectExtra docs

Clock: 1000 HzExternal hardware:

#### How it works

This is a 12-bit basic PDP8 cpu - it doesn't have the extended arithmetic unit (so no multiply or divide). Included is an assembler (mostly for test). Bus interface is a 5-clock to get 12 bits of address and 12 bits of data though 8-bit interfaces. Address is 2 beats of 6 bits each, data is 3 beats of 4 bits each, I/O cycles have an extra beat

```
output bits
7 6 5 4 3 2 1 0
1 O A A A A A A
                    address hi
1 1 A A A A A A
                    address lo
0 1 1 I I 4 2 1
                    IO cycle intro
either
0 0 0 0 - - - -
                    read data high nibble
0 0 1 0 - - - -
                    read data med nibble
0 1 0 0 - - - -
                    read data low nibble
or
0 0 0 1 D D D D
                    write data high nibble
0 0 1 1 D D D D
                    write data med nibble
0 1 0 1 D D D D
                    write data low nibble
```

Input bits are ignored except during read beats, interrupts are sampled during the first address beat

#### How to test

code in test-bench, assembler in asm dir

| # | Input     | Output                        |
|---|-----------|-------------------------------|
| 0 | clock     | address_0_data_mux_0          |
| 1 | reset     | address_1_data_mux_1          |
| 2 | none      | address_2_data_mux_2          |
| 3 | none      | address_3_data_mux_3          |
| 4 | data_in_0 | address_4_rw_mux              |
| 5 | data_in_1 | address_5_phase_lo_select_mux |
| 6 | data_in_2 | phase_hi_select               |
| 7 | data_in_3 | address_data_select           |

## 20: CTF - Catch the fish

Author: Carin Schreiner

• Description: Catch the fish is a whac-a-mole game.

GitHub repository

Wokwi project

Extra docs

Clock: 0 Hz

External hardware:

#### How it works

The project consists of \* a fibonacci Ifsr of degree 6. \* a simple timer: the timer consists of two flip flops. It slows down the game and further amplifies the pseudorandomness \* a button press detection for each button \* a simple reward detection \* Output state Handlers \* Output Selection: The last bit of the Isfr is used as the decision whether to give an output or not. The second and third to last bits are used to determine on which pin the output should be given. If both bits are zero, no output is given.

### How to test

To play the game, press the start button and make sure the clock is set to a frequency of one or two. The higher the frequency the more difficult the game gets. To score, you need to press the button while the respectivly numbered feedback output is one. You then get a reward feedback

| # | Input    | Output          |
|---|----------|-----------------|
| 0 | clock    | feedback 1      |
| 1 | button 1 | feedback 2      |
| 2 | button 2 | feedback 3      |
| 3 | button 3 | reward feedback |
| 4 | none     | none            |
| 5 | none     | none            |
| 6 | none     | none            |
| 7 | none     | none            |

## 21: Dot operation calculator

Author: Yannick Reiß

• Description: Can calculate the result for 3 bit multiplication and division.

GitHub repository

HDL project

Extra docsClock: 0 Hz

• External hardware: 7 switches, 6 leds

#### How it works

It takes input 1 to determine operator and two 3 bit inputs as operands. The result is put to output 0-5. In multiplication mode it just outputs the number binary encoded. In division mode the output pins 0-2 are the quotient and 3-5 are the reminder.

#### How to test

Connect input 1-7 with switches and output 0-6 with leds.

| # | Input         | Output           |
|---|---------------|------------------|
| 0 | clk_dummy     | product/quotient |
| 1 | opcode        | product/quotient |
| 2 | $operand1_1$  | product/quotient |
| 3 | operand1_2    | product/reminder |
| 4 | operand1_3    | product/reminder |
| 5 | operand $2_1$ | product/reminder |
| 6 | operand2_2    | none             |
| 7 | operand2_3    | none             |

## 22: RiscV Scan Chain based CPU – block 1 – clocking

Author: Emilian Miron

■ Description: RiscV Scan Chain based CPU - block 1 - clocking

GitHub repository

HDL project

Extra docs

Clock: 20000 HzExternal hardware:

## How it works

TODO

### How to test

After reset, the counter should increase by one every second.

| # | Input | Output            |
|---|-------|-------------------|
| 0 | clock | segment a         |
| 1 | reset | segment b         |
| 2 | none  | segment c         |
| 3 | none  | segment d         |
| 4 | none  | segment e         |
| 5 | none  | segment f         |
| 6 | none  | segment g         |
| 7 | none  | slow clock output |
|   |       |                   |

## 23: RiscV Scan Chain based CPU - block 2 - instructions

Author: Emilian Miron

■ Description: RiscV Scan Chain based CPU - block 2 - instructions

GitHub repository

HDL project

Extra docs

Clock: 20000 HzExternal hardware:

## How it works

TODO

### How to test

After reset, the counter should increase by one every second.

| # | Input | Output            |
|---|-------|-------------------|
| 0 | clock | segment a         |
| 1 | reset | segment b         |
| 2 | none  | segment c         |
| 3 | none  | segment d         |
| 4 | none  | segment e         |
| 5 | none  | segment f         |
| 6 | none  | segment g         |
| 7 | none  | slow clock output |
|   |       |                   |

# 24 : RiscV Scan Chain based CPU - block 3 - registers

Author: Emilian Miron

■ Description: RiscV Scan Chain based CPU - block 3 - registers

GitHub repository

HDL project

Extra docs

Clock: 20000 HzExternal hardware:

#### How it works

TODO

#### How to test

After reset, the counter should increase by one every second.

| # | Input | Output            |
|---|-------|-------------------|
| 0 | clock | segment a         |
| 1 | reset | segment b         |
| 2 | none  | segment c         |
| 3 | none  | segment d         |
| 4 | none  | segment e         |
| 5 | none  | segment f         |
| 6 | none  | segment g         |
| 7 | none  | slow clock output |

# 25: RiscV Scan Chain based CPU - block 4 - ALU

Author: Emilian Miron

Description: RiscV Scan Chain based CPU – block 4 – ALU

GitHub repository

HDL project

Extra docs

Clock: 20000 HzExternal hardware:

#### How it works

**TODO** 

#### How to test

After reset, the counter should increase by one every second.

| # | Input | Output            |
|---|-------|-------------------|
| 0 | clock | segment a         |
| 1 | reset | segment b         |
| 2 | none  | segment c         |
| 3 | none  | segment d         |
| 4 | none  | segment e         |
| 5 | none  | segment f         |
| 6 | none  | segment g         |
| 7 | none  | slow clock output |

## 26: 31b-PrimeDetector



Figure 4: picture

- Author: Aidan McNay
- Description: Detects whether a 31-bit number is prime or not
- GitHub repository
- HDL project
- Extra docs
- Clock: 50000 (though could probably be faster) Hz
- External hardware: External Clock, 5 buttons for inputs, 2 LEDs for outputs

#### How it works

The 31-bit Prime Detector takes in a 31-bit number (shifted in serially). Once the number is obtained, the FSM control logic takes over. It attempts to divide the value by all numbers less than it; if it finds one that divides evenly, the logic stops and declares the number not prime. If it doesn't divide evenly by any of these, the number is declared prime.

Due to space constraints, the design uses an iterative divider and FSM logic to minimize space usage. Further information can be taken from the README.md on the GitHub page

#### How to test

This design requires an external clock. Before testing, the design should be reset with the appropriate pin (active high), which resets the stored value to 0. To shift in a value, use the SPI-like interface; when the CS line is enabled (active low), on rising edges of SCLK, the data present at SDI is shifted in. Data is shifted into the LSB, and progressively shifted to more significant bits as new data is received (with the data at MSB being shifted out and disregarded).

Once you have the desired number stored, start the calculations by enabling the ready pin (active high). Note that the stored value cannot change while calculations are ongoing.

Once the calculations are finished, the done pin will be driven high. The result will be shown on the is prime pin; a value of 1 indicates that the value inputted is prime.

| # | Input | Output   |
|---|-------|----------|
| 0 | clock | done     |
| 1 | reset | is_prime |
| 2 | SDI   | waiting  |
| 3 | SCLK  | GND      |
| 4 | CS    | GND      |
| 5 | ready | GND      |
| 6 | NC    | GND      |
| 7 | NC    | GND      |

# 27: XOR Stream Cipher



Figure 5: picture

- Author: Luke Vassallo
- Description: An two channel XOR stream cipher with fully programmable 32-bit galois LFSRs.
- GitHub repository
- HDL project
- Extra docs
- Clock: 12500 Hz
- External hardware: General puporse FPGA with PMOD connector for detailed testing./

#### How it works

An XOR (exclusive or) cipher is a type of encryption that uses a bitwise exclusive or operation to combine a plaintext message with a secret key. This process generates a ciphertext that can only be decrypted by someone with the same secret key. XOR ciphers are commonly used in computer security and are popular due to their simplicity and efficiency. They can be implemented in hardware using XOR cipher chips, which typically use a Galois LFSR (linear feedback shift register) to generate a key stream that is XORed with the plaintext to produce the ciphertext. XOR ciphers are considered relatively secure as long as the secret key is kept secret and is not easily guessable.

The system uses a Galois Linear Feedback Shift Register (LFSR) to produce a key stream that is combined with the incoming bitstream through an XORing process to

create the cipher stream that appears at the output. A concurrent channel is additionally provided such that a transmission and receiption bitstream can be encrypted and decrypted simultaneously. When the system is reset, a default confgiruation is applied that will encrypt the bitsream on tx\_p and decrypt the bistream on rx\_e when te\_en and rx\_en are respectively driven high.

To configure the chip (optional), a 130-bit configuration vector is serial shifted through the pin cfg\_i while the current configuration is simultaneously outputted on pin cfg\_o. This configuration method functions as a lengthy shift register with its input connected to cfg\_i and its output connected to cfg\_o. It only operates synchronously to clk when cfg\_en is asserted. The configuration vector consists of bits 95->64, 31->0 for the tx/rx LFSR state, 127->96, 63->32 bits for the tx/rx LFSR taps. Power-on-reset state has the taps configured for PRBS-31 and LFSR state holding 0x55. Bit 128 is used to internally route the output bitstreams through an XOR that returns the original bitstreams provided at the input albeit placed on debug output pins (disabled by default). Bit 129 selects between the internal or externally provided plaintext generator (default).

#### How to test

Reset the module and optionally configure the LFSR taps, state and other options using the serial shift configuration register. When the enable pin (tx\_en/rx\_en) is asserted the corresponding channel is activate and the plain/cipher text bitstream will be encrypted/decrypted. Detailed simulation, FPGA and ASIC design examples are available on GitHub (https://github.com/LukeVassallo/tt03-xor-cipher). The GitHub repository is mirrored on my private GitLab instance (https://gitlab.lukevassallo.com/luke/tt03-xor-cipher) that incorporates automated builds and pre-built bitstreams.

| # | Input                                                            | Output                                                                      |
|---|------------------------------------------------------------------|-----------------------------------------------------------------------------|
| 0 | clk (12.5KHz system clock)                                       | tx_e ( Encrypted bitstream for transmission)                                |
| 1 | rst (Active high synchronous reset)                              | <pre>dbg_tx_p (Decrypted transmit bitstream, pin disabled by default)</pre> |
| 2 | <pre>tx_p (Plaintext bitstream for transmission)</pre>           | dbg_rx_e (Encrypted receive bitstream, pin disabled by default)             |
| 3 | cfg_i (Configuration input to the 130-bit serial shift register) | rx_p (Decrypted bitstream for receiption)                                   |
| 4 | cfg_en (Active high configuration enable)                        | cfg_o (Configuration output from the 130-bit shift register)                |

| # | Input                                     | Output                                    |
|---|-------------------------------------------|-------------------------------------------|
| 5 | tx_en (Transmit channel enable)           | heartbeat[7] (bit from heartbeat counter) |
| 6 | rx_e (Encrypted bitstream for receiption) | heartbeat[8] (bit from heartbeat counter) |
| 7 | rx_en (Receive channel enable)            | heartbeat[9] (bit from heartbeat counter) |

# 28: LED Panel Driver

Author: Tom KeddieDescription: TBDGitHub repository

HDL projectExtra docsClock: 0 Hz

• External hardware: icebreaker led panel pmod single

## How it works

Explain how your project works

#### How to test

Explain how to test your project

| # | Input     | Output |
|---|-----------|--------|
| 0 | clock     | red    |
| 1 | reset     | blue   |
| 2 | none      | aclk   |
| 3 | none      | blank  |
| 4 | none      | green  |
| 5 | rowmax[0] | arst   |
| 6 | rowmax[1] | sclk   |
| 7 | rowmax[2] | latch  |

## 29: 6-bit FIFO

Author: Mike Bell

Description: Implementation of a FIFO

GitHub repository

HDL project

Extra docs

Clock: 50000 HzExternal hardware:

#### How it works

The design implements a 52 entry 6-bit FIFO. The oldest 4 entries are accessible by setting the peek address. The first 48 entries in the FIFO are implemented as a chain of latches. This allows for high data density in the limited area. The last 4 entries in the FIFO are implemented by a ring of flip-flops. This allows random access to the last four entries.

Because of the way the chain of latches works, when entries are popped from the FIFO it takes time for data in the latch part of the FIFO to move down the chain. If the FIFO is fairly empty this shouldn't be noticeable, but when the FIFO is quite full this can cause the FIFO to refuse writes even though it is not full.

A ready output indicates whether the latch chain is ready for more data. If the FIFO is completely full and one entry is popped then it takes 48 cycles after the pop for the chain to be ready again. Therefore, if ready stays low when the FIFO is clocked 49 or more times without any data being popped then the FIFO is full.

There are minimal delays on the read side - if any data is in the FIFO then it can always be read, this works because empty latch entries can pass the data through them without needing to be clocked. The only exception to this is that due to input buffering newly written entries take 2 cycles to appear on the output.

#### How to test

New entries, taken from inputs 2-7, are written to the FIFO on a rising clock edge when input 1 is high. Data writes are ignored when the Ready output is low.

Because of limited inputs, the write enable is used to determine the mode of inputs 2-5. When not writing, these control reset (active low), pop (active high) and the peek address.

The peek address controls whether the oldest to 4th oldest entry in the FIFO is presented on the data outputs.

Reading the oldest entry when the FIFO is empty always reads 0. However, peeking at previous entries when the FIFO is empty or has fewer occupied entries reads stale data - the values should not be relied upon.

The oldest entry is popped from the FIFO by setting pop. It is valid to set the peek address to a non-zero value when popping, but it is always the oldest entry that is popped, which is not the entry being read if peek address is non-zero. The peek address is considered prior to the pop.

When writing, it is always the last entry in the FIFO that is read (peek address is considered to be zero). The new value written when the FIFO is empty is not presented on the outputs until the following cycle.

| # | Input               | Output  |
|---|---------------------|---------|
| 0 | Clock               | Ready   |
| 1 | Mode (Write enable) | Empty_n |
| 2 | Reset_n / Data 0    | Data 0  |
| 3 | Pop / Data 1        | Data 1  |
| 4 | Peek A0 / Data 2    | Data 2  |
| 5 | Peek A1 / Data 3    | Data 3  |
| 6 | Unused / Data 4     | Data 4  |
| 7 | Unused / Data 5     | Data 5  |
|   |                     |         |

# 30 : ezm\_cpu

Author: guianmonezm#4787Description: basic 8bit CPU

GitHub repository

• HDL project

Extra docs

■ Clock: 6000 Hz

• External hardware: FPGA or ram and demultiplexer

#### How it works

you connect an external ram and a decoder to it or an FPGA with a rom programed.

#### How to test

connect an external ram and a decoder to it.

| # | Input | Output  |
|---|-------|---------|
| 0 | clock | pc[0]/c |
| 1 | reset | pc[1]/c |
| 2 | in1   | pc[2]/c |
| 3 | in2   | pc[3]/c |
| 4 | in3   | pc[4]/c |
| 5 | in4   | pc[5]/c |
| 6 | in5   | pc[6]/c |
| 7 | in6   | pc[7]/c |

# 31: tiny logic analyzer

Author: yubex

• Description: The design samples one data input and shows the current state and edge events using the 7 segment display.

GitHub repository

HDL project

Extra docs

■ Clock: 12500 Hz

• External hardware: no external HW required

#### How it works

The samples of the data input pin are shifted into a shift register first. After that the 2 most significant bits of the shift register are used to detect the current signal state (high, low) and the edge events (rising edge, falling edge). There are 4 different states/events displayed on the 7 segment display.

| segments on |
|-------------|
| a           |
| d           |
| e and f     |
| b and c     |
|             |

The edge events duration is extended by counters, so the events can actually be seen by the human eye.

#### How to test

You can test by using the dip switch connected to io\_in[2] as data input. I plan to connect a wire to the pin on the PCB by PMOD connector or custom soldering to be able to have a "measurement probe". ;-)

| # | Input      | Output    |
|---|------------|-----------|
| 0 | clock      | segment a |
| 1 | reset      | segment b |
| 2 | data input | segment c |
| 3 | none       | segment d |

| # | Input | Output    |
|---|-------|-----------|
| 4 | none  | segment e |
| 5 | none  | segment f |
| 6 | none  | segment g |
| 7 | none  | dot       |

## 32: 4-bit ALU

Author: ReJ aka Renaldas Zioma

 Description: Digital design for a 4-bit ALU supporting 8 different operations and built-in 4-bit accumulator register

GitHub repository

Wokwi project

Extra docs

Clock: 0 Hz

• External hardware: push-button, debouncer, DIP-switch, 5 LEDs

#### How it works

Each clock cycles ALU performs one of the 8 possible operations and stores result in the 4-bit accumulator register.

accumulator [4 bit] = accumulator [4 bit] (operation) operand [4 bit]

Supported operations: Ida imm :: imm -> accumulator neg imm :: 0x0F - imm -> accumulator shr :: accumulator / 2 -> accumulator sub imm :: accumulator - imm -> accumulator and imm :: accumulator & imm -> accumulator xor imm :: accumulator  $\hat{}$  imm -> accumulator or imm :: accumulator  $\hat{}$  imm -> accumulator add imm :: accumulator + imm -> accumulator

Matrix mapping of operation opcode to internal control signals muxA muxB muxC AtoX negX setC outC invC 000 lda - - 1 0 0 - 0 - 001 neg - - 1 0 1 - 0 - 010 shr - - 1 1 0 - 0 - 011 sub 1 1 0 0 1 1 1 1 100 and 0 0 0 0 - 0 - 101 xor 0 1 0 0 0 - 0 - 110 or 1 0 0 0 0 - 0 - 111 add 1 1 0 0 0 0 1 0

#### How to test

The following diagram shows a simple test setup that can be used to test ALU



```
+ DIP +---> IMMO +-----+
+ switch +---> IMM1
+ +---> IMM2
+ +---> IMM3 CARRY--> LED
```

To reset ALU set all input pins to 0 which corresponds to 1da 0 operation loading Accumulator register with 0.

| # | Input           | Output                             |
|---|-----------------|------------------------------------|
| 0 | clock           | accumulator value 0th bit          |
| 1 | opcode 0th bit  | accumulator value 1st bit          |
| 2 | opcode 1st bit  | accumulator value 2nd bit          |
| 3 | opcode 2nd bit  | accumulator value 3rd bit          |
| 4 | operand 0th bit | {'unused (TODO': 'negative flag)'} |
| 5 | operand 1st bit | {'unused (TODO': 'overflow flag)'} |
| 6 | operand 2nd bit | {'unused (TODO': 'zero flag)'}     |
| 7 | operand 3rd bit | carry flag                         |

# 33: Pulse-Density Modulators



Figure 6: picture

- Author: Marco Merlin
- Description: An implementation of a DDPM, PWM and Sigma-Delta Pulse-Density Modulators with python libraries myHDL and PuEDA.
- GitHub repository
- HDL project
- Extra docs
- Clock: 12500 Hz
- External hardware: clock source

#### How it works

This project implements three different architectures of Pulse Density Modulators (PDM), to compare performances and implementation complexity of the different schemes.

PDM modulators are of particular interest because exploiting oversampling, they allow to implement Digital-to-Analog-Conversion (DAC) schemes with an equivalent resolu-

tion of multiple bits, based on a single-bit digital output, by means of a straightforward analogue low-pass filter.

The PDM modulators implemented in this project are the following: 1) Pulse Width Moulation (PWM) 2) Dyadic Digital Pulse Modulation (DDPM) 3) Sigma-Delta (SD)

PWM is arguably the simplest and possibly most widespread PDM technology that is praised for its low complexity.

DDPM [1][2][3] is a type of digital modulation technique in which the pulse width are quantized in a dyadic manner, meaning that they are quantized in powers of two, which allows for efficient implementation using binary arithmetic. As a consequence, DDPM modulators are relatively inexpensive to deploy, with a complexity comparable to that of widespread Pulse-Width Moduldation (PWM) modulators.

SD modulators are perhaps the best-performing PDM technology, that is particularly well suited for higher resolution data conversion, and typically find use in audio DACs and fractional PLLs. In spite their good performaces, SD modulators are Infinite-Impulse-Response (IIR) closed-loop systems which behavior depends on the input signal, resulting in a number of concerns (most notably stability), that typically require careful modeling of the systems and condition under which they will be required to operate.

This design is separated into two section: 1) 6 bits resolution instance of PWM, DDPM and SD fed by a static DC value from the input pins 2) 8 bits resolution instance of PWM, DDPM and SD fed sine look-up-table (LUT) that allows to evaluate the spectral content of the modulated signals.

#### How to test

Common: The circuit needs to be fed with a clock on pin io\_in[0]. Reset signal needs to be released by raising to 1 pin io\_in[1].

#### Static DC:

The input inval of the first set of DC modulators is fed through pins io\_in[7:2]. The low-passed dc component of the outputs on pins io\_out[0] (PWM), io\_out[1], and io\_out[2] is proportional to the decimal value of the input inval.

Sinusoidal output: The low-passed outputs on pins io\_out[4] (PWM), io\_out[5], and io\_out[6] is a sinusoidal wave. When clocking the chip with a clock frequency of  $12.5 \, \text{kHz}$ , the frequency of the sine is of  $0.76 \, \text{z}$  (fsin = fclock /  $2^14$ ), so that it should be visible at naked eye. The different designs should achieve an ENOB of 8 bits in the band  $0-40 \, \text{Hz}$ , with different level of out-of-band emission between each other.

| # | Input    | Output                |
|---|----------|-----------------------|
| 0 | clock    | pwm / segment a       |
| 1 | resetn   | ddpm / segment b      |
| 2 | inval[0] | sd / segment c        |
| 3 | inval[1] | 0'b1 / segment d      |
| 4 | inval[2] | pwm_sine / segment e  |
| 5 | inval[3] | ddpm_sine / segment f |
| 6 | inval[4] | sd_sine / segment g   |
| 7 | inval[5] | 0'b1 / dot            |

## 34 : CRC Decelerator



Figure 7: picture

Author: Grant Hernandez (@grant-h)

Description: A reconfigurable CRC engine

GitHub repository

HDL project

Extra docs

Clock: 1000 Hz

External hardware:

#### How it works

"The world's slowest CRC!"

The Cyclic Redundancy Check Decelerator is a reconfigurable CRC block that can be programmed to calculate any CRC up to 32-bits with arbitrary length, streamed input data. Since TinyTapeout 3 (TT03) I/O speeds are low, its unlikely that this CRC engine will be faster than the CPU/microcontroller streaming in data, hence "decelerator".

This TT03 project is the follow up to my earlier, full-custom VLSI version of a CRC-32 datapath, built in Cadence, and fabricated using the MOSIS service while attending my university. Read more about my original the CRC-32 design and check out the die shots.

# [3D-View]

# I/O Interface

The CRC IP has 8 input pins. Two are for the clk (io\_in[1]) and reset (rst / io\_in[1]). Then there is a two-bit command input, cmd (io\_in[3:2]), and the remaining 4-bits are are for data input, data\_in (io\_in[7:4]). The design uses

only the positive edge of the clock and has a synchronous, active high, reset line. The data input is limited to passing a single nibble at a time. How this data is used depends on the current command.

Here is a table showing the I/O pins:

| # | Input      | Output      |
|---|------------|-------------|
| 0 | clk        | data_out[0] |
| 1 | rst        | data_out[1] |
| 2 | cmd[0]     | data_out[2] |
| 3 | cmd[1]     | data_out[3] |
| 4 | data_in[0] | data_out[4] |
| 5 | data_in[1] | data_out[5] |
| 6 | data_in[2] | data_out[6] |
| 7 | data_in[3] | data_out[7] |

## There are 4 supported commands:

| #     | Name        | Description                                                                                                                                             |
|-------|-------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
| 2'b00 | CMD_RESET   | Restarts the CRC calculations using the parameters from the last SETUP bitstream                                                                        |
| 2'b01 | CMD_SETUP   | Streams in a CRC-bitwidth dependent bitstream to configure the CRC parameters                                                                           |
| 2'b10 | CMD_MESSAGE | Stream in a message to CRC 4-bits at a time. First cycle is lower 4-bits. Second cycle is upper 4-bits. Wait 8 cycles for byte to be processed. Repeat. |
| 2'b11 | CMD_FINAL   | Continually stream out the final CRC value 8-bits at a time in a loop until deasserted                                                                  |

The overall flow is, a SETUP bitstream containing the CRC bitwidth, reflect in/out parameters, CRC poly, initial value, and XOR out is streamed in. Then the MESSAGE is streamed in 4-bits at a time until the message is complete. Finally, the FINAL is asserted and the final CRC value is streamed out on the output pins. To restart another CRC fresh, send RESET or resume adding additional data to the existing CRC by using MESSAGE.

## **CRC Setup Bitstream**

The most complex portion of the using this CRC IP is streaming in the configuration bitstream. This bitstream can be from 20-bits in a CRC-4 or less case and up to 104 bits in a CRC-32 case. The bitstream format is roughly: [config\_lo - 4 bits] [config\_hi - 4 bits] [poly - 4N bits] [init - 4N bits] [xor - 4N-bits]. Each nibble is packed with the MSB on data\_in[3]. config\_lo is 4-bits and defines the first 4-bits of the CRC bitwidth bitwidth[3:0]. config\_hi is also 4-bits and it contains the top-2 bits of the bitwidth and the reflect\_out and reflect\_in parameters of the CRC: [bitwidth[5]] [bitwidth[4]] [reflect\_out] [reflect\_in].

This initial configuration is always 8-bits, but the remaining bitstream is variable length dependent on the bitwidth. Following the initial parameters is the poly, init value, and XOR out values. These are equal length and are streamed one nibble at a time from least-significant nibble to most (least significant being bits[3:0]).

This process is best demonstrated using a timing diagram. For the example we'll be using the CRC-16/USB parameters which are width=16 poly=0x8005 init=0xffff refin=true refout=true xorout=0xffff check=0xb4c8:

## CRC-16/USB Setup Bitstream

The fully packed bitstream in hex is f35008ffffffff. The first nibble corresponds to bitwidth [3:0] = 0xf. This represents the value 15, which is one less than the bitwidth. This is because the CRC treats a bitwidth of zero as a CRC-1. You must pass in a bitwidth one less than the desired bitwidth to account for this. The second nibble unpacks as 4'b0011 in binary which makes the top 2-bits of the bitwidth be zero and sets reflect in and out to be true. The remaining nibbles are the configuration parameters least-significant nibble to most. Note that setup\_fsm is internal to the design.

# **CRC Message Streaming**

Once the CRC's parameters have been initialized, you can switch to CMD\_MESSAGE to stream in data to be CRC'd one nibble at a time. CRCs typically use the check message of 123456789 which is 0x31 0x32 0x33 0x34 0x35 0x36 0x37 0x38 0x39 in individual bytes. Here is a timing diagram showing this message being streamed in and the CRC's result register:

# CRC-16/USB 123456789 Check Message

The first break skips the 8 clock cycles for 0x31. The second break skips the middle 6 bytes and the final break skips the last shifting clock cycles. Following CMD\_MESSAGE you would signal CMD\_FINAL on cmd to have the CRC value streamed out on the data

output pins. Note that crc\_state is internal to the design.

The CRC decelerator is a reconfigurable CRC block that can be programmed to calculate different CRC values up to 64-bits with arbitrary length streamed input data. Since clock speeds are low, its unlikely that this CRC engine will be faster than the CPU streaming in data, hence "decelerator".

To begin, a SETUP bitstream containing the bitwidth, reflect in/out, CRC poly, init, and XOR out is sent. Then the MESSAGE is streamed in 4-bits at a time until the message is complete. Finally, the FINAL is signaled, leading the final CRC value to be streamed out. To calculate another CRC fresh, send RESET.

#### How to test

See documentation.

| # | Input      | Output      |
|---|------------|-------------|
| 0 | clk        | data_out[0] |
| 1 | rst        | data_out[1] |
| 2 | cmd[0]     | data_out[2] |
| 3 | cmd[1]     | data_out[3] |
| 4 | data_in[0] | data_out[4] |
| 5 | data_in[1] | data_out[5] |
| 6 | data_in[2] | data_out[6] |
| 7 | data_in[3] | data_out[7] |
|   |            |             |

# 35: 7 segment seconds

Author: Matt Venn

• Description: Count up to 10, one second at a time.

GitHub repository

HDL projectExtra docs

Clock: 1000 HzExternal hardware:

#### How it works

Uses a set of registers to divide the clock, and then some combinational logic to convert from binary to decimal for the display.

Puts a slower square wave output on output 7.

#### How to test

After reset, the counter should increase by one every second.

| # | Input | Output            |
|---|-------|-------------------|
| 0 | clock | segment a         |
| 1 | reset | segment b         |
| 2 | none  | segment c         |
| 3 | none  | segment d         |
| 4 | none  | segment e         |
| 5 | none  | segment f         |
| 6 | none  | segment g         |
| 7 | none  | slow clock output |

# 36: Binary to DEC and HEX

Author: Norberto Hernandez-Como

 Description: Converts a 4 digit binary number to decimal or to hexadecimal using a 7-segment display

GitHub repository

Wokwi project

Extra docs

■ Clock: 0 Hz

External hardware:

#### How it works

Converts a 4 digit binary number to decimal or to hexadecimal using a 7-segment display

#### How to test

Select binary to decimal with input 6 (named Dec, the dot is displayed) or binary to hexadecimal with input 7 (named Hex). Input a 4 digit binary number using inputs B3, B2, B1, B0 and see in the 7-segment display the converted number in decimal or hexadecimal. All forbiden combinations are not displayed.

| # | Input | Output    |
|---|-------|-----------|
| 0 | B0    | segment a |
| 1 | B1    | segment b |
| 2 | B2    | segment c |
| 3 | B3    | segment d |
| 4 | none  | segment e |
| 5 | none  | segment f |
| 6 | Dec   | segment g |
| 7 | Hex   | dot       |

# **Technical** info

## Scan chain

All 250 designs are joined together in a long chain similiar to JTAG. We provide the inputs and outputs of that chain (see pinout below) externally, to the Caravel logic analyser, and to an internal scan chain driver.

The default is to use an external driver, this is in case anything goes wrong with the Caravel logic analyser or the internal driver.

The scan chain is identical for each little project, and you can read it here.



Figure 8: block diagram

## Updating inputs and outputs of a specified design

A good way to see how this works is to read the FSM in the scan controller. You can also run one of the simple tests and check the waveforms. See how in the scan chain

verification doc.

- Signal names are from the perspective of the scan chain driver.
- The desired project shall be called DUT (design under test)

Assuming you want to update DUT at position 2 (0 indexed) with inputs = 0x02 and then fetch the output. This design connects an inverter between each input and output.

- Set scan\_select low so that the data is clocked into the scan flops (rather than from the design)
- For the next 8 clocks, set scan\_data\_out to 0, 0, 0, 0, 0, 0, 1, 0
- Toggle scan\_clk\_out 16 times to deliver the data to the DUT
- Toggle scan\_latch\_en to deliver the data from the scan chain to the DUT
- Set scan\_select high to set the scan flop's input to be from the DUT
- Toggle the scan\_clk\_out to capture the DUT's data into the scan chain
- Toggle the scan\_clk\_out another 8 x number of remaining designs to receive the data at scan data in



Figure 9: update cycle

## Notes on understanding the trace

- There are large wait times between the latch and scan signals to ensure no hold violations across the whole chain. For the internal scan controller, these can be configured (see section on wait states below).
- The input looks wrong (0x03) because the input is incremented by the test bench as soon as the scan controller captures the data. The input is actually 0x02.
- The output in the trace looks wrong (0xFE) because it's updated after a full refresh, the output is 0xFD.

# Clocking

Assuming:

- 100MHz input clock
- 8 ins & 8 outs
- 2 clock cycles to push one bit through the scan chain (scan clock is half input clock rate)
- 250 designs
- scan controller can do a read/write cycle in one refresh

So the max refresh rate is 100MHz / (8 \* 2 \* 250) = 25000Hz.

## Clock divider

A rising edge on the set\_clk\_div input will capture what is set on the input pins and use this as a divider for an internal slow clock that can be provided to the first input bit.

The slow clock is only enabled if the set\_clk\_div is set, and the resulting clock is connected to input0 and also output on the slow\_clk pin.

The slow clock is synced with the scan rate. A divider of 0 mean it toggles the input0 every scan. Divider of 1 toggles it every 2 cycles. So the resultant slow clock frequency is scan\_rate / (2 \* (N+1)).

See the test\_clock\_div test in the scan chain verification.

## Wait states

This dictates how many wait cycle we insert in various state of the load process. We have a sane default, but also allow override externally.

To override, set the wait amount on the inputs, set the driver\_sel inputs both high, and then reset the chip.

See the test\_wait\_state test in the scan chain verification.

# **Pinout**

| PIN   | NAME          | DESCRIPTION                                   |
|-------|---------------|-----------------------------------------------|
| 20:12 | active_select | 9 bit input to set which design is active     |
| 28:21 | inputs        | 8 inputs                                      |
| 36:29 | outputs       | 8 outputs                                     |
| 37    | ready         | goes high for one cycle everytime the scancha |
| 10    | slow_clk      | slow clock from internal clock divider        |
| 11    | set clk div   | enable clock divider                          |

| 9:8 | driver_sel        | which scan chain driver: 00 = external, 01 |
|-----|-------------------|--------------------------------------------|
| 21  | ext scan clk out  | for external driver, clk input             |
| 22  | ext_scan_data_out | data input                                 |
| 23  | ext_scan_select   | scan select                                |
| 24  | ext_scan_latch_en | latch                                      |
| 29  | ext_scan_clk_in   | clk output from end of chain               |
| 30  | ext scan data in  | data output from end of chain              |

## Instructions to build GDS

To run the tool locally or have a fork's GitHub action work, you need the GH\_USERNAME and GH\_TOKEN set in your environment.

GH\_USERNAME should be set to your GitHub username.

To generate your  $GH\_TOKEN$  go to https://github.com/settings/tokens/new . Set the checkboxes for repo and workflow.

To run locally, make a file like this:

```
export GH_USERNAME=<username>
export GH_TOKEN=<token>
```

And then source it before running the tool.

# Fetch all the projects

This goes through all the projects in project\_urls.py, and fetches the latest artifact zip from GitHub. It takes the verilog, the GL verilog, and the GDS and copies them to the correct place.

```
./configure.py --clone-all --fetch-gds
```

# **Configure Caravel**

Caravel needs the list of macros, how power is connected, instantiation of all the projects etc. This command builds these configs and also makes the README.md index.

```
./configure.py --update-caravel
```

#### **Build the GDS**

To build the GDS and run the simulations, you will need to install the Sky130 PDK and OpenLane tool. It takes about 5 minutes and needs about 3GB of disk space.

```
export PDK_ROOT=<some dir>/pdk
export OPENLANE_ROOT=<some dir>/openlane
cd <the root of this repo>
make setup
Then to create the GDS:
make user_project_wrapper
```

# Changing macro block size

After working out what size you want:

- adjust configure.py in CaravelConfig.create\_macro\_config().
- adjust the PDN spacing to match in openlane/user\_project\_wrapper/config.tcl:

```
- set ::env(FP_PDN_HPITCH)
- set ::env(FP_PDN_HOFFSET)
```

# Verification

We are not trying to verify every single design. That is up to the person who makes it. What we want is to ensure that every design is accessible, even if some designs are broken.

We can split the verification effort into functional testing (simulation), static tests (formal verification), timing tests (STA) and physical tests (LVS & DRC).

See the sections below for details on each type of verification.

# Setup

You will need the GitHub tokens setup as described in INFO.

The default of 250 projects takes a very long time to simulate, so I advise overriding the configuration:

```
# fetch the test projects
./configure.py --test --clone-all
# rebuild config with only 20 projects
./configure.py --test --update-caravel --limit 20
```

You will also need iVerilog & cocotb. The easist way to install these are to download and install the oss-cad-suite.

# **Simulations**

- Simulation of some test projects at RTL and GL level.
- Simulation of the whole chip with scan controller, external controller, logic analyser.
- Check wait state setting.
- Check clock divider setting.

#### Scan controller

This test only instantiates user\_project\_wrapper (which contains all the small projects). It doesn't simulate the rest of the ASIC.

```
cd verilog/dv/scan_controller
make test_scan_controller
```

The Gate Level simulation requires scan\_controller and user\_project\_wrapper to be re-hardened to get the correct gate level netlists:

- Edit openlane/scan\_controller/config.tcl and change NUM\_DESIGNS=250 to NUM\_DESIGNS=20.
- Then from the top level directory:
   make scan\_controller make user\_project\_wrapper
- Then run the GL test
   cd verilog/dv/scan\_controller make test\_scan\_controller\_gl

## single

Just check one inverter module. Mainly for easy understanding of the traces.

make test single

#### custom wait state

Just check one inverter module. Set a custom wait state value.

make test\_wait\_state

#### clock divider

Test one inverter module with an automatically generated clock on input 0. Sets the clock rate to 1/2 of the scan refresh rate.

make test clock div

# Top level tests setup

For all the top level tests, you will also need a RISCV compiler to build the firmware.

You will also need to install the 'management core' for the Caravel ASIC submission wrapper. This is done automatically by following the PDK install instructions.

# Top level test: internal control

Uses the scan controller, instantiated inside the whole chip.

cd verilog/dv/scan\_controller\_int
make coco test

## Top level test: external control

Uses external signals to control the scan chain. Simulates the whole chip.

```
cd verilog/dv/scan_controller_ext
make coco_test
```

## Top level test: logic analyser control

Uses the RISCV co-processor to drive the scanchain with firmware. Simulates the whole chip.

```
cd verilog/dv/scan_controller_la
make coco_test
```

## Formal Verification

- Formal verification that each small project's scan chain is correct.
- Formal verification that the correct signals are passed through for the 3 different scan chain control modes.

#### Scan chain

Each GL netlist for each small project is proven to be equivalent to the reference scan chain implementation. The verification is done on the GL netlist, so an RTL version of the cells used needed to be created. See here for more info.

#### Scan controller MUX

In case the internal scan controller doesn't work, we also have ability to control the chain from external pins or the Caravel Logic Analyser. We implement a simple MUX to achieve this and formally prove it is correct.

# **Timing constraints**

Due to limitations in OpenLane - a top level timing analyis is not possible. This would allow us to detect setup and hold violations in the scan chain.

Instead, we design the chain and the timing constraints for each project and the scan controller with this in mind.

• Each small project has a negedge flop flop at the end of the shift register to reclock the data. This gives more hold margin.

- Each small project has SDC timing constraints
- Scan controller uses a shift register clocked with the end of the chain to ensure correct data is captured.
- Scan controller has its own SDC timing constraints
- Scan controller can be configured to wait for a programmable time at latching data into the design and capturing it from the design.
- External pins (by default) control the scan chain.

# Physical tests

- LVS
- DRC
- CVC

#### **LVS**

Each project is built with OpenLane, which will check LVS for each small project. Then when we combine all the projects together we run a top level LVS & DRC for routing, power supply and macro placement.

The extracted netlist from the GDS is what is used in the formal scan chain proof.

#### **DRC**

DRC is checked by OpenLane for each small project, and then again at the top level when we combine all the projects.

#### **CVC**

Mitch Bailey' CVC checker is a device level static verification system for quickly and easily detecting common circuit errors in CDL (Circuit Definition Language) netlists. We ran the test on the final design and found no errors.

- See the paper here.
- Github repo for the tool: https://github.com/d-m-bailey/cvc

# Toplevel STA and Spice analysis for TinyTapeout-02 and 03.

Contributed by J. Birch 22 March 2023

#### Introduction

TinyTapeout is built using the OpenLane flow and consists of three major components:

- 1) up to 250 user designs with a common interface, each of which has been composed using the OpenLane flow,
- 2) a scanchain block that is instanced for each user design to provide input data and to sink output data,
- 3) a scan chain controller.

The OpenLane flow allows for timing analysis of each of these individual blocks but does not provide for analysis of the assembly of them, which leaves potential holes in timing that could cause failures.

This work allows a single design to be assembled out of the sub-blocks and this can then be submitted to STA for analysis.

In addition, an example critical path (for the clock that is passed along the scanchain) is created using a Python script so that it can be run through SPICE.

# GitHub action

The STA github action automatically runs the STA when the repository is updated.

# **Prerequisites**

To run STA, the following files need to be available:

- 1) a gate level verilog netlist of the top level design (verilog/rtl/user\_project\_wrapper.v)
- 2) a gate level verilog netlist for each sub-block (verilog/rtl/)
- 3) a SPEF format parasitics file for the top level wiring (spef/)
- 4) a SPEF format parasitics file for each sub-block (spef/)
- 5) the relevant SkyWater libraries for STA analysis installed with the PDK
- 6) OpenSTA 2.4.0 (or later) needs to be on the PATH. rundocker.sh shows how to use STA included in the OpenLane docker
- 7) python3.9 or later
- 8) verilog parser (pip3 install verilog-parser)

9) a constraints file (sta\_top/top.sdc)

# **Assumptions**

- all of the Verilog files are in a single directory
- all of the SPEF files are in a single directory

## **Issues**

The Verilog parser has two bugs: it does not recognise 'inout' and it does not cope with escaped names (starting with a "). Locally this has been fixed but to use the off-the-shelf parser we preprocess the file to change inout to input and remove the escapes and substitute the [nnn] with *nnn* in the names

# **Invoking STA analysis**

```
sta_top/toplevel_sta.py <path to verilog> <path to spef> <sdc>
eg:
sta_top/toplevel_sta.py ./verilog/gl/user_project_wrapper.v
./spef/user_project_wrapper.spef ./sta_top/top.sdc > sta.log
Alternatively you can enter interactive mode after analysis:
sta_top/toplevel_sta.py ./verilog/gl/user_project_wrapper.v
./spef/user_project_wrapper.spef ./sta_top/top.sdc -i
```

## How it works

- preprocesses the main Verilog
- parses the main Verilog to find which modules are used
- creates a new merged Verilog containing all the module netlists and the main netlist
- creates a tcl script in the spef directory to load the relevant spefs for each instanced module in the main verilog
- creates all\_sta.tcl in the verilog/gl directory that does the main STA steps (loading design, loading constraints, running analyses) this mimics what OpenLane does
- creates a script sta.sh in the verilog/gl directory which sets up the environment and invokes STA to run all\_sta.tcl
- runs sta.sh

# **Invoking Spice simulation:**

## **Important notes**

- This uses at least ngspice-34, tested on ngspice-39, which has features to increase speed and reduce memory image when running with the SkyWater spice models. In addition the following needs to be set in ~/spice.rc and/or ~/.spiceinit:
  - set ngbehavior=hs
- Note the spelling of ngbehavior (no u). If this is not set then the simulation will run out of memory
- If the critical path changes then the spice\_path.py script will need to be adapted to follow suit

#### Instructions

- go to where the included spice files will be found cd \$PDK\_ROOT/sky130A/libs.tech/ngspice
- run the script to get the critical path spice circuit 250 stages of the clock
   /sta\_top/spice\_path.py path.spice
- invoke spice (takes about 2 minutes)ngspice path.spice
- run the sim (takes about 2 minutes)tran 1n 70n
- show the results plot i0 i250

This chart shows the input clock and how it changes after 250 blocks. It was simulated for 200n to capture a clean pair of clock pulses.



Figure 10: clock\_spice.png

# Sponsored by



## **Team**

Tiny Tapeout would not be possible without a lot of people helping. We would especially like to thank:

- Uri Shaked for wokwi development and lots more
- Sylvain Munaut for help with scan chain improvements
- Mike Thompson for verification expertise
- Jix for formal verification support
- Proppy for help with GitHub actions
- Maximo Balestrini for all the amazing renders and the interactive GDS viewer
- James Rosenthal for coming up with digital design examples
- All the people who took part in TinyTapeout 01 and volunteered time to improve docs and test the flow
- The team at YosysHQ and all the other open source EDA tool makers
- Efabless for running the shuttles and providing OpenLane and sponsorship
- Tim Ansell and Google for supporting the open source silicon movement
- Zero to ASIC course community for all your support
- Jeremy Birch for help with STA
- Aisler for sponsoring PCB development