

# **Hardware Synthesis Laboratory**

## 1<sup>st</sup> semester

## **Academic Year 2024**

```
module fulladder (cout,s,a,b,cin);
output cout;
output s;
input a,b,cin;

//assign {cout,s}=a+b+cin;
reg cout, s;
always @(a or b or cin)
begin
{cout,s}=a+b+cin;
end
endmodule
```



Krerk Piromsopa, Ph. D.

This document is a part of the 2110363 Hardware Synthesis Lab I,

Department of Computer Engineering, Chulalongkorn University.

© All rights reserved.

#### **Preface**

Since the 1970s, Hardware Description Languages (HDLs) have been developed to manage the increasing complexity of digital electronic circuits. With the growing popularity of Field-Programmable Gate Arrays (FPGAs), mastering hardware synthesis and HDL has become essential in the field of computer engineering.

For over 20 years, the Department of Computer Engineering at Chulalongkorn University has been a pioneer in Thailand, offering courses on Verilog HDL and Hardware Synthesis Lab. This course has evolved from using an initial in-house FPGA board to standardized developer boards. Over the years, countless students have both cherished and struggled with this challenging class. The remarkable success of the course is evident from the substantial number of graduates who have secured positions at leading semiconductor companies such as Intel, AMD, and IBM, among others.

As an ardent hardware design enthusiast, I took the initiative to create a comprehensive lab book to facilitate the study of HDL and Hardware Synthesis Lab. I firmly believe that a deeper understanding of hardware directly enhances one's abilities as a programmer. By providing a user-friendly and accessible resource, I aim to empower fellow students and enthusiasts to grasp the intricacies of HDL and hardware synthesis with greater ease. Through this endeavor, I hope to foster a community of skilled and knowledgeable individuals in the realm of hardware design and programming.

For those aspiring to delve deeper into the world of digital design, this course offers a platform to showcase your skills. Each academic year, we typically see 2-3 exceptional individuals who demonstrate a natural aptitude for HDL. Many of these talented students eventually pursue Ph.D. programs in hardware-related fields and find success in various industries.

However, we also want to emphasize that this class is open to all, including those who may not yet be comfortable with hardware design. The minimum requirement is a solid understanding of and proficiency in using digital tools. We believe that with dedication and hard work, anyone can grasp the concepts and excel in this exciting field of study. Whether you are a seasoned hardware enthusiast or just starting your journey, this course offers a valuable opportunity to broaden your knowledge and skills in digital design.

Our primary goal is to equip students with practical knowledge and valuable skills that they can carry forward into their future endeavors. We aim to create an engaging and enriching learning experience so that students can genuinely benefit from this class.

Thank you for your encouragement, and we hope all students find joy in their learning journey! If anyone faces challenges or needs support, our team is here to provide guidance and assistance throughout the course. Let's make this learning adventure both rewarding and enjoyable!

Krerk Piromsopa, Ph.D.

**Associate Professor** 

Department of Computer Engineering,

Chulalongkorn University

August 4, 2024

## **Table of Contents**

- VerilogHDL
  - Laboratory 1: Introduction to VerilogHDL and Digital Simulation
    - Objectives
    - Background
    - Exercises
  - Laboratory 2: Time-Division Multiplexing and Clock Divider
    - Objectives
    - Background (TDM, Clock Division)
    - Language Templates
    - Exercises
  - Laboratory 3: Counter and Switch (Debounce)
    - Objectives
    - Background (Switch and Bounce)
    - Exercises
  - Laboratory 4: Memory
    - Objectives
    - Background (ROM, RAM, Block RAM)
    - Exercises
  - Laboratory 5: Simple CPU and Memory Mapped I/O
    - Objectives
    - Background (Memory Mapped I/O and Port-Mapped I/O)
    - Exercises
  - Laboratory 6: VGA and UART
    - Objectives
    - Background (VGA, UART)
    - Exercises

# VerilogHDL

## Laboratory 1: Introduction to VerilogHDL and Digital Simulation

## **Objectives**

- 1. Familiarize students with the simulation tool (Vivado)
- 2. Demonstrate the basics of Verilog simulation and waveform output
- 3. Be able to explain the structural model and the behavioral model
- 4. Be able to explain the differences between blocking and non-blocking assignments

## **Background**

In this lab, you will learn the fundamentals of VerilogHDL and digital simulation using the Vivado Design Suite. First, please download the free (WebPACK) edition of the Vivado Design Suite from the Xilinx website<sup>1</sup>. The full download is about 20 GB. Alternatively, you may download it from the department server<sup>2</sup>. If you have trouble finding a machine on which to install the software, please contact the instructors: a (virtual) machine can be provided so that you can work with the tool remotely. However, you may still need to install the Vivado Lab Edition on your own computer to download the design to the FPGA board. For more information about the installation and the Vivado IDE, please watch the Xilinx tutorial videos<sup>3</sup>.

<sup>1</sup> https://www.xilinx.com/products/design-tools/vivado.html

<sup>2</sup> https://mis.cp.eng.chula.ac.th/krerk/teaching/2018s2-HWSynLab/ (Username: student, password: HWSynLab)

<sup>3</sup> https://www.xilinx.com/products/design-tools/vivado.html#video

Please watch the demonstration video on how to use the simulator.

#### **Exercises**

1. Complete the following 1-bit full adder and write a test bench that validates the design by simulating all possible inputs. Use the Vivado tool to simulate and validate the design.

```
module fullAdder(cout, s, a, b, cin);
output cout;
output s;
input a;
input b;
input cin;
reg cout, s;
always @(              )   // complete the sensitivity list
begin
    // complete the body
end
endmodule

`timescale 1ns/1ns
module tester;
    reg a, b, cin;
    wire cout, s;
    fullAdder a1(cout, s, a, b, cin);
    initial
    begin
        //$dumpfile("time.dump");
        //$dumpvars(2, a1);
        $monitor("time %t: {%b %b} <- {%d %d %d}", $time, cout, s, a, b, cin);
        a = 0;
        b = 0;
        cin = 0;
        //....
        $finish;
    end
endmodule
```

2. What would happen if we replaced the always block of the full adder in question 1 with the continuous assignment in the following module? Would it give the same result? Please run the test bench and provide your analysis.

```
module fullAdder(cout, s, a, b, cin);
output cout;
output s;
input a;
input b;
input cin;
assign {cout,s} = a + b + cin;
endmodule
```

3. Please modify the following latch into a positive-edge-triggered flip-flop with an asynchronous reset. Please also modify the test bench to validate your design.

```
`timescale 1ns/1ns

module DFlipFlop(q,clock,nreset,d);

output q;
input clock,nreset,d;

reg q;

always @(clock)
begin
    if (nreset==1)
        q=d;
    else
        q=0;
end
endmodule
```

```
module testDFlipFlop();
reg clock, nreset, d;
DFlipFlop D1(q, clock, nreset, d);
always
    #10 clock = ~clock;
initial
begin
    //$dumpfile("testDFlipFlop.dump");
    //$dumpvars(1, D1);
    #0 d = 0;
    clock = 0;
    nreset = 0;
    #50 nreset = 1;
    #1000 $finish;
end
always
    #8 d = ~d;
endmodule
```

4. What are the differences between the two provided designs (shiftA and shiftB)? Please write a test bench to demonstrate your analysis.

- 5. Please answer the following questions and submit (in PDF format) to MyCourseVille on Friday before 23:59 (midnight).
  - 1. Please draw a schematic representing the logical blocks of both shiftA and shiftB in exercise 4.
  - 2. What is the difference between blocking and non-blocking assignments?
  - 3. Is it possible to use parameters in the design in exercise 4 to create a shift register (shiftRegister) with any number of bits? If yes, please explain how.

## Laboratory 2: Time-Division Multiplexing and Clock Divider

## **Objectives**

- 1. Familiarize students with the synthesis tool (Vivado)
- 2. Synthesize designs for the FPGA
- 3. Be able to explain time-division multiplexing
- 4. Be able to design a clock divider
- 5. Be able to use a language template

## **Background**

#### **TDM**

To drive a seven-segment display (whether it is common anode or common cathode), each digit requires 9 wires (a to g, dot, and a common ground or common VCC). With several digits, the number of wires multiplies accordingly. For example, 4 digits of seven-segment displays may require up to 33 wires (a to g and dot for each digit, i.e., 4 x 8 = 32, plus one shared common wire). It is not practical to have so many wires. To share physical wires and reduce their number, time-division multiplexing (TDM) is used.

With time-division multiplexing, the 8 wires (a to g and dot) are shared among all the digits, and only one digit is active at a time. If the digits are turned on and off in turn at an appropriate rate (e.g., 15 frames per second or more), the observer sees all digits as if they were lit at the same time. Each digit gets its own time slot on the shared wires, hence the term time-division multiplexing. This way, four digits of seven-segment displays can be connected with only 12 wires (8 for a to g and dot, plus 4 for activating the digits).



For more details about time-division multiplexing, please watch the demonstration video.
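The scanning idea above can be sketched in VerilogHDL as follows. This is a minimal illustration, not the lab's reference design: the module name `tdm4`, its ports, and the `tick` input (assumed to be a one-cycle enable pulse from a slower clock divider) are hypothetical. The digit enables are modeled as active high, so invert them if your board's digit-select lines turn out to be active low.

```verilog
// Hypothetical 4-digit TDM scanner (a sketch, not the lab's reference design).
// digit_on is one-hot and active high; invert it for active-low digit selects.
// Segment patterns pass through unchanged, whatever their polarity.
module tdm4 (
    input  wire       clk,
    input  wire       tick,                    // one-cycle pulse: move to the next digit
    input  wire [7:0] seg0, seg1, seg2, seg3,  // pre-encoded patterns (a..g, dot) per digit
    output wire [3:0] digit_on,                // which digit is currently lit
    output wire [7:0] seg                      // the 8 shared segment wires
);
    reg [1:0] sel = 2'd0;  // index of the digit currently lit

    // Advance to the next digit on every tick; wraps from 3 back to 0.
    always @(posedge clk)
        if (tick) sel <= sel + 2'd1;

    // One-hot digit enable: 0001, 0010, 0100, 1000 for sel = 0..3.
    assign digit_on = 4'b0001 << sel;

    // Route the selected digit's pattern onto the shared segment wires.
    assign seg = (sel == 2'd0) ? seg0 :
                 (sel == 2'd1) ? seg1 :
                 (sel == 2'd2) ? seg2 : seg3;
endmodule
```

Each `tick` moves to the next digit, so the whole display is refreshed once every four ticks; the tick rate should be chosen so that this refresh is fast enough for the eye.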

#### **Clock Division**

There are several ways to derive slower clocks from a high-frequency clock. A simple solution is to cascade D flip-flops (or T flip-flops): feed ~Q0 back to D0 so that the first flip-flop toggles at half the input frequency, then use Q0 as the clock for D1 (wired the same way), and so on; each stage halves the frequency again. Nonetheless, this is just one implementation of clock division. You may also use a counter to set and clear a bit, which then serves as the divided clock.
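The counter approach mentioned above can be sketched as follows. The module name `clk_div` and the width `N` are illustrative assumptions (the default of 4 is only a placeholder, not a recommended value); the idea is that bit N-1 of a free-running N-bit counter is a square wave at f_in / 2^N.

```verilog
// Counter-based clock divider (a sketch; names and default width are hypothetical).
// Bit N-1 of a free-running N-bit counter toggles every 2^(N-1) input cycles,
// so clk_out runs at f_in / 2^N with a 50% duty cycle.
module clk_div #(parameter N = 4) (
    input  wire clk_in,
    output wire clk_out
);
    reg [N-1:0] count = {N{1'b0}};

    always @(posedge clk_in)
        count <= count + 1'b1;   // wraps around naturally after 2^N cycles

    assign clk_out = count[N-1];
endmodule
```

In larger FPGA designs, a one-cycle enable pulse derived from such a counter is often preferred over using the divided signal as a clock, because a clock generated in the logic fabric does not use the FPGA's dedicated clock routing.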



#### **Language Templates**

The Vivado IDE comes bundled with language templates, which are ready-made HDL code snippets. You can access the language templates from *menu* > *Tools* > *Language Templates*. A language template that may be useful for this lab is the 7-segment encoder.

#### **Exercises**

- 1. Use your knowledge of clock division and time-division multiplexing to display a 4-digit hexadecimal number (0x1234) on the seven-segment display of the BASYS 3 board. Your design should be modular (so that you can save the components for reuse later). There should be at least 3 modules: a clock divider, a hex (or BCD) to 7-segment encoder, and a 7-segment TDM.
- 2. Please answer the following questions and submit (in PDF format) to MyCourseVille on Friday before 23:59 (midnight).
  - a. Is the 4-digit seven-segment display on the BASYS 3 board common anode or common cathode? Please explain.
  - b. From the wiring of the board, which logic level do you have to assign to the 7-segment pins (a to g and dot) to turn a segment on?
  - c. Given that the clock period of the BASYS 3 is about 10 ns (100 MHz), by how many bits do you have to divide the clock to get an appropriate clock for the TDM? Please provide your analysis (calculation).

#### Hint

- 1. Use the BASYS 3 XDC<sup>4</sup> file as a base constraint file.
- 2. Read the board's datasheet to determine how the components are interconnected on the board.
- 3. Use the language templates for the 7-segment encoder.

<sup>4</sup> https://mis.cp.eng.chula.ac.th/krerk/teaching/2018s2-HaWSynLab/downloads/Basys-3-Master.xdc

## Laboratory 3: Counter and Switch (Debounce)

## **Objectives**

- 1. Synthesize designs for the FPGA
- 2. Be able to design a debounced switch input
- 3. Be able to design an up/down counter

## **Background**

#### Switch and Bounce

Mechanical switches and relays have a common issue called contact bounce (aka. chatter). Switch and relay contacts are usually made of metal. When the contacts strike together, their momentum and elasticity cause them to bounce apart one or more times before making steady contact. (Imagine a ball dropped on a floor: it bounces several times before coming to a complete stop.)

*Figure: a bouncing switch signal (image omitted).*<sup>5</sup>

<sup>5</sup> Image taken from https://upload.wikimedia.org/wikipedia/commons/thumb/a/ac/Bouncy_Switch.png/400px-Bouncy_Switch.png

There are several ways to debounce a switch, including an RC (capacitor) filter, an SR latch, or a low-pass filter followed by a Schmitt trigger. However, these methods usually require extra hardware.

To debounce without extra hardware, we can use a logic ("software") method: resample the input several times and accept a new value only after several consecutive samples agree.
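A minimal sketch of debouncing by resampling, under these assumptions: `in` has already been synchronized to `clk`, and `sample_tick` is a one-cycle pulse from a divider at a rate slower than the bounce (for example, every few milliseconds). The module name `debounce` and the parameter `K` (the number of consecutive samples that must agree, at least 2) are hypothetical.

```verilog
// Debounce by resampling (a sketch; names and parameters are hypothetical).
// Assumes 'in' is already synchronized to 'clk' and 'sample_tick' is a
// one-cycle pulse at a rate slower than the contact bounce.
module debounce #(parameter K = 4) (   // K >= 2: samples that must agree
    input  wire clk,
    input  wire sample_tick,
    input  wire in,
    output reg  out
);
    reg  [K-2:0] history = {(K-1){1'b0}};  // the previous K-1 samples
    wire [K-1:0] window  = {history, in};  // the K most recent samples

    initial out = 1'b0;

    always @(posedge clk)
        if (sample_tick) begin
            history <= window[K-2:0];
            if (&window)
                out <= 1'b1;   // K consecutive highs: accept the new level
            else if (~|window)
                out <= 1'b0;   // K consecutive lows: accept the new level
            // mixed samples (still bouncing): keep the previous output
        end
endmodule
```

A larger `K` or a slower `sample_tick` rejects longer bounces at the cost of a slower response to real presses.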



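Please also note the issue of metastability: an asynchronous input such as a button can change close to a clock edge, leaving the flip-flop that samples it in an unstable state for a short time. To reduce this risk, it is generally advised to place two D flip-flops in series between the input and the rest of the digital circuit.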
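The two-flip-flop arrangement can be sketched as below; the module and port names are hypothetical. In Vivado, such synchronizer flip-flops are commonly marked with the `ASYNC_REG` attribute so that the tool places them close together; that is left out here to keep the sketch tool-neutral.

```verilog
// Two-flip-flop synchronizer (a sketch; names are hypothetical).
// stages[0] may briefly go metastable when async_in changes near a clock
// edge; stages[1] gives it one clock period to settle before the signal
// reaches the rest of the design.
module sync2 (
    input  wire clk,
    input  wire async_in,
    output wire sync_out
);
    reg [1:0] stages = 2'b00;

    always @(posedge clk)
        stages <= {stages[0], async_in};  // shift the input through two flops

    assign sync_out = stages[1];
endmodule
```

The price is a latency of two clock cycles, which is negligible for human-operated buttons and switches.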

#### **Exercises**

- 1. Create an up/down 1-digit BCD counter with a 4-bit output (DCBA), 1 overflow output (cout), 1 borrow output (bout), and 6 inputs (up, down, set9, set0, clock). Write a test bench to show that the counter functions correctly.
- 2. Create a single pulser with one input, a clock, and one output. Write a test bench to show that the single pulser works correctly.
- 3. Use 4 counters from exercise 1 to create a 4-digit BCD counter, and show its value on the 4-digit seven-segment display (use the display components from Laboratory 2). Use BTNU for set9 (set the number to 9999) and BTNC for reset (set the number to 0000). Use SW0 to count down by 1, SW1 to count up by 1, SW2 to count down by 10, SW3 to count up by 10, SW4 to count down by 100, SW5 to count up by 100, SW6 to count down by 1000, and SW7 to count up by 1000. If the number is 0000, counting down does not decrease it; if the number is 9999, counting up does not increase it. Do not worry about the bounce at the moment; we will fix it in the next exercise.
- 4. Correct the bounce in exercise 3 by implementing a debounce component for each input.
- 5. Please answer the following questions and submit (in PDF format) to MyCourseVille on Friday before 23:59 (midnight).
  - a. From the circuit diagram, are the BTNx buttons active high or active low? Please provide your analysis.
  - b. What is a bounce? How do you debounce the input programmatically? Please provide your analysis.
  - c. Please show your method for implementing a single pulser (e.g., a state diagram or VerilogHDL code).

## Laboratory 4: Memory

## **Objectives**

- 1. Be able to implement memory in HDL
- 2. Be able to instantiate internal FPGA memory

## **Background**

We will soon be implementing our first processor in the next lab! By now, you should understand how to implement an FSM in Verilog. What you are still missing is how to implement a memory model in Verilog so that you can use it in your very first processor. We will be looking at read-only memory (ROM), random-access memory (RAM), and first-in-first-out (FIFO) memory.

#### **ROM**

The first kind of memory you are going to implement is read-only memory (ROM). So far, you have used Verilog only to synthesize registers (D flip-flops). However, it is costly to implement memory purely with registers. Field-programmable gate array (FPGA) manufacturers therefore often include blocks of memory inside the FPGA for you to use.

A typical ROM description looks like the following:

```
module rom_case(
    (* synthesis, rom_block = "ROM_CELLXYZ01" *)
    output reg [3:0] z,
    input wire [2:0] a); // address: 8-deep memory

always @* begin // @(a)
    case (a)
    3'b000: z = 4'b1011;
    3'b001: z = 4'b0001;
    3'b110: z = 4'b0010;
    3'b111: z = 4'b1110;
    default: z = 4'b0000;
    endcase
end
endmodule // rom_case
```

The code above describes a ROM with 8 addresses, each holding 4 bits (addresses not listed in the case statement read as 0000).

You might notice the synthesis attribute (\* synthesis, rom\_block = "ROM\_CELLXYZ01" \*). It tells the synthesis tool to try to use the dedicated ROM inside the FPGA instead of implementing the memory as a block of registers. The attribute names may differ from one FPGA vendor to another. Note that most synthesis tools nowadays are smart enough to recognize the access pattern on their own, so you can usually remove the attribute.

Second, the ROM above is actually asynchronous: its output changes as soon as the address changes. You can make it a synchronous ROM by adding a clock, e.g.:

```
module rom_case(
    (* synthesis, rom_block = "ROM_CELLXYZ01" *) input clk,
    output reg [3:0] z,
    input wire [2:0] a); // address: 8-deep memory

always @(posedge clk)
begin
    case (a)
    3'b000: z <= 4'b1011;
    3'b001: z <= 4'b0001;
    3'b100: z <= 4'b0011;
    3'b110: z <= 4'b0010;
    3'b111: z <= 4'b1110;
    default: z <= 4'b0000;
    endcase
end
endmodule // rom_case
```
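The attribute syntax above is generic, and the exact names are vendor-specific. For Xilinx Vivado, our understanding is that the corresponding hint is the `rom_style` attribute (with values such as `"block"` or `"distributed"`); please confirm this against the synthesis guide for your Vivado version. The sketch below reuses the contents of the synchronous ROM above; the module name `rom_sync_vivado` is hypothetical, and in practice a ROM this small would rarely be worth a block RAM.

```verilog
// Synchronous ROM with a Vivado-style rom_style hint (a sketch).
// Assumption: Vivado synthesis; other tools ignore or rename this attribute.
module rom_sync_vivado (
    input  wire       clk,
    input  wire [2:0] a,   // address: 8-deep memory
    output wire [3:0] z    // registered 4-bit data
);
    (* rom_style = "block" *) reg [3:0] data;  // hint: map this ROM to block RAM

    // Registered read: z shows the word for 'a' one clock edge later.
    always @(posedge clk)
        case (a)
            3'b000:  data <= 4'b1011;
            3'b001:  data <= 4'b0001;
            3'b100:  data <= 4'b0011;
            3'b110:  data <= 4'b0010;
            3'b111:  data <= 4'b1110;
            default: data <= 4'b0000;
        endcase

    assign z = data;
endmodule
```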

Having the data inside your code is extremely inconvenient, especially if you have a large data set. Verilog allows you to read memory contents from a file with $readmemb (binary) or $readmemh (hexadecimal).

```
// Verilog-2001 style
// ROM module using two-dimensional arrays with
// memory defined in a text file with $readmemb or $readmemh
// NOTE: This style can lead to a simulation/synthesis mismatch
//       if the content of the data file changes after synthesis
module rom_2dimarray_initial_readmem (
    output wire [3:0] z,
    input wire [2:0] a);
    // declares a memory rom of 8 4-bit registers.
    // The indices are 0 to 7
    (* synthesis, rom_block = "ROM_CELLXYZ01" *)
    reg [3:0] rom[0:7];
    // NOTE: To infer combinational logic instead of a ROM, use
    // (* synthesis, logic_block *)
    initial $readmemb("rom.data", rom);
    assign z = rom[a];
endmodule
```

The rom.data file would look like this:

```
1011 // addr=0
1000 // addr=1
0000 // addr=2
1000 // addr=3
0010 // addr=4
0101 // addr=5
1111 // addr=6
1001 // addr=7
```

#### **RAM**

Another useful primitive on the FPGA is random-access memory (RAM). Again, as in the ROM case, we could implement RAM as a set of registers, but doing so is costly. Typical FPGAs have dedicated areas for RAM (Block RAM on Xilinx, memory blocks on Altera, etc.). The following code describes a 128 x 8-bit RAM.

```
module SinglePortRAM (
    inout wire [7:0] d,      // Data In and Out
    input wire [6:0] addr,   // Address
    input wire oe,           // Output Enable
    input wire clk, we);     // Clock and Write Enable

(* synthesis, ram_block *)
reg [7:0] mem [127:0];

always @(posedge clk)
    if (we)
        mem[addr] <= d;

assign d = oe ? mem[addr] : 8'bZ;
endmodule
```

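Note the inout port d in the example: the same wires carry data into the RAM when writing and out of it when reading, at different times. Only one side may drive the wires at any moment, and the side that is not driving must output high impedance (Z). The line "assign d = oe ? mem[addr] : 8'bZ;" implements the RAM's side of this connection: when oe is 1, the RAM drives mem[addr] onto d; otherwise, it releases d. Correspondingly, the external circuit must drive Z onto d while reading, and connect its data register to d only while writing.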
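To show the other side of the shared bus, here is a minimal sketch of an external circuit connected to `SinglePortRAM.d`. The module `ram_port_driver` and its ports are hypothetical: it drives the bus only while writing and otherwise outputs Z, so that the RAM can drive the bus when `oe` is 1.

```verilog
// Hypothetical external side of the shared inout bus 'd' (a sketch).
// It drives the bus only while writing; otherwise it outputs Z so that
// the RAM's tri-state driver (oe = 1) can place mem[addr] on the bus.
module ram_port_driver (
    input  wire       writing,  // 1: drive wdata onto the bus, 0: release it
    input  wire [7:0] wdata,    // data to be written into the RAM
    inout  wire [7:0] d,        // connect to SinglePortRAM.d
    output wire [7:0] rdata     // whatever is currently on the bus
);
    assign d     = writing ? wdata : 8'bZ;  // tri-state output buffer
    assign rdata = d;                       // reading the bus never drives it
endmodule
```

In a top-level design, one simple (assumed) wiring is `writing = we` and `oe = ~we`, so that exactly one side drives `d` at any time: this driver during a write, and the RAM otherwise.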

#### Block RAM

As you can see, we can write HDL code that generates registers, and we could potentially implement a memory with them. However, an FPGA has a limited number of these configurable logic blocks (CLBs), and using them as memory is overkill, since this logic can do much more than just store data. Therefore, most FPGA vendors provide specialized memory units on their FPGAs. Each vendor uses a different name; Xilinx calls them Block RAMs (bRAMs). Typically, each bRAM has a size of 10-20 Kbit depending on the FPGA, and an FPGA may have from about ten to thousands of these bRAMs.

There are several ways to instantiate bRAMs on Xilinx FPGAs, but in general the Xilinx synthesis step can recognize that you intend to create a bRAM when your code follows this pattern:

```
parameter RAM_WIDTH = <ram_width>;
parameter RAM_ADDR_BITS = <ram_addr_bits>;

reg [RAM_WIDTH-1:0] <ram_name> [(2**RAM_ADDR_BITS)-1:0];
reg [RAM_WIDTH-1:0] <output_data>;

<reg_or_wire> [RAM_ADDR_BITS-1:0] <address>;
<reg_or_wire> [RAM_WIDTH-1:0] <input_data>;

always @(posedge <clock>)
   if (<ram_enable>) begin
      if (<write_enable>) begin
         <ram_name>[<address>] <= <input_data>;
         <output_data> <= <input_data>;
      end
      else
         <output_data> <= <ram_name>[<address>];
   end
```
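As a concrete illustration, the template above could be filled in as follows. The module name `single_port_bram` and the 256 x 8 default size are illustrative choices. The read is registered, and on a write the output shows the newly written data ("write-first" behavior). Whether Vivado actually maps a memory this small to a bRAM rather than to distributed RAM depends on the tool's own heuristics.

```verilog
// One possible instance of the block-RAM template (a sketch; names are hypothetical).
// Defaults give a 256 x 8 single-port RAM with synchronous, write-first output.
module single_port_bram #(
    parameter RAM_WIDTH     = 8,
    parameter RAM_ADDR_BITS = 8
) (
    input  wire                     clk,
    input  wire                     en,    // RAM enable
    input  wire                     we,    // write enable
    input  wire [RAM_ADDR_BITS-1:0] addr,
    input  wire [RAM_WIDTH-1:0]     din,
    output reg  [RAM_WIDTH-1:0]     dout
);
    reg [RAM_WIDTH-1:0] ram [(2**RAM_ADDR_BITS)-1:0];

    always @(posedge clk)
        if (en) begin
            if (we) begin
                ram[addr] <= din;
                dout      <= din;        // write-first: output the new data
            end
            else
                dout <= ram[addr];       // registered (synchronous) read
        end
endmodule
```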

Note that bRAMs are not the only construct that the synthesizer can recognize. There are many other types of constructs, some of them even more complicated, so most FPGA vendors provide so-called language templates. For Xilinx, you can find them in the menu under Tools > Language Templates. For memory, look under Verilog > Synthesis Constructs > Coding Examples > RAM > BlockRAM. We recommend that you explore these constructs.

Most FPGAs also have what is called distributed RAM, which uses the look-up tables (LUTs) of the logic blocks as small memories. Distributed RAM is usually faster than bRAM, but its capacity is much smaller.
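A minimal sketch of a memory written in the distributed-RAM style: synchronous write and asynchronous (combinational) read. Because Xilinx bRAMs read synchronously, an asynchronous read like this generally cannot be placed in a bRAM and is implemented with LUTs instead; Vivado also offers a `ram_style` attribute (for example `"distributed"`) to request this explicitly. The module name `dist_ram_32x8` and its 32 x 8 size are hypothetical.

```verilog
// Distributed-RAM style memory (a sketch; name and size are hypothetical).
// Synchronous write, asynchronous read: dout follows addr without a clock.
module dist_ram_32x8 (
    input  wire       clk,
    input  wire       we,    // write enable
    input  wire [4:0] addr,
    input  wire [7:0] din,
    output wire [7:0] dout
);
    reg [7:0] mem [0:31];

    always @(posedge clk)
        if (we) mem[addr] <= din;   // writes happen on the clock edge

    assign dout = mem[addr];        // reads are combinational
endmodule
```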

There is also another method for instantiating bRAMs: through Block Design. We recommend that you take a look at this document for more information about how to use IP Integrator and Block Design: https://www.xilinx.com/support/documentation/university/Vivado-Teaching/Digital-Design/2014x/docs-pdf/Vivado_tutorial.pdf

#### **Exercises**

1. Design and build a circuit that works as a stack (LIFO). The user presses one push button to PUSH (BTNU) and another to POP (BTNC). Use 8 switches on the board as the value. When the user presses the PUSH button, the value from the switches is stored on the stack. When the user presses the POP button, the value from the top of the stack is displayed on the two hex displays on the left. The other hex displays show the number of elements currently in the stack. The stack can hold up to 256 elements. If the stack is full, pressing the PUSH button should have no effect.

We recommend using a push button (BTND) as a reset.

2. Read an input from 5 binary switches. (You may use any switches; if you have no preference, use SW[4..0].) This gives a number ranging from 0 to 31. Use distributed memory or bRAM as a ROM that converts the binary number into two BCD digits for display on the seven-segment display, using the 5-bit binary number as the address. The output can be either 2x4 bits of BCD for feeding BCDtoSevenSegment (as used in the previous lab) or 2x8 bits for driving the seven-segment display directly.

Note: You may want to initialize the memory from a data file. Please see the language templates for more information.

- 3. Use Block Design to create a simple 4-bit calculator. Assign 4 switches as the 4-bit input A and another 4 switches as the 4-bit input B. Assign 4 push buttons to 4 different operations.
  - a. When BTNU is pushed, you will display the result of A+B in base 10 using the three 7-segment displays.
  - b. When BTNL is pushed, you will display the result of A-B in base 10 using the three 7-segment displays.
  - c. When BTND is pushed, you will display the result of A\*B in base 10 using the three 7-segment displays.
  - d. When BTNR is pushed, you will display the result of A/B in base 10 using the three 7-segment displays.

- 4. Please answer the following questions and submit (in PDF format) to MyCourseVille on Friday before 23:59 (midnight).
  - a. Explain your ROM for mapping a 5-bit binary number to 2-digit BCD (or to 2x8-bit seven-segment patterns, depending on your design in Exercise 2).