VLSI Physical Design for ASICs

Installation of RISC-V Toolchain

https://github.com/kunalg123/riscv_workshop_collaterals/blob/master/run.sh

Execute the commands in run.sh.
Check that the gcc version on your system is 12. If it is not, the build can fail with an error; in that case, install gcc 12 with the following commands:

sudo apt upgrade
sudo apt install build-essential
sudo apt -y install gcc-12 g++-12
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 12
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-12 12
sudo update-alternatives --config gcc
sudo update-alternatives --config g++
gcc --version; g++ --version

The above commands update gcc to version 12. To check that the toolchain was installed successfully, run the command below; it should print the version of the RISC-V compiler.

riscv64-unknown-elf-gcc --version

Day 1 - Introduction to RISC-V ISA and GNU Compiler Chain

Instruction Set Architecture

An instruction set architecture (ISA), or computer architecture, is an abstract model of a computer that defines how software controls the CPU. It acts as the interface between the hardware and programs written in languages like C, C++, and Java. The set of available instructions depends on the type of hardware.

From Apps to Hardware

Application software ---> System software ---> Hardware

System software converts application software into binary language.
It has three major parts:
- Operating system
- Compiler
- Assembler
The operating system handles the small functions written in C, C++, Java, or any other language and passes them to the compiler, which translates them into instructions for the target hardware (the .exe file). The .exe file is fed into the assembler, which generates the machine-language (binary) code that the hardware executes.

Type of Instructions

Pseudo Instructions
Base Integer Instructions(RV64I)
Multiply Extension(RV64M)
Single and Double precision floating point Extension(RV64F and RV64D)

Application Binary Interface

The ABI provides the keywords (register names) through which programmers can access the registers of RISC-V, together with the system calls associated with those registers.

Labwork for RISC-V Toolchain

Write a program to calculate the sum of numbers from 1 to n

#include <stdio.h>
int main(){
  int i,sum=0,n=10;
  /* Accumulate 1 + 2 + ... + n */
  for(i=1;i<=n;i++){
    sum=sum+i;
  }
  printf("Sum of numbers from 1 to %d is %d\n",n,sum);
  return 0;
}

To compile and run the above program natively, type the following commands

gcc sum.c
./a.out

To display the code present in the .c file

cat sum.c

To compile the C code using the RISC-V compiler
1. O1 optimization

riscv64-unknown-elf-gcc -O1 -mabi=lp64 -march=rv64i -o sum.o sum.c

2. Ofast optimization

riscv64-unknown-elf-gcc -Ofast -mabi=lp64 -march=rv64i -o sum.o sum.c

If the compilation command fails with an error (for example, because the toolchain is not on PATH), open ~/.bashrc, add the export lines below, and then re-run the compilation command

vim ~/.bashrc
export PATH=~/riscv_toolchain/riscv64-unknown-elf-gcc-8.3.0-2019.08.0-x86_64-linux-ubuntu14/bin:$PATH
export PATH=~/riscv_toolchain/riscv64-unknown-elf-gcc-8.3.0-2019.08.0-x86_64-linux-ubuntu14/riscv64-unknown-elf/bin:$PATH

To view the assembly-level code of the C program compiled with the RISC-V compiler

riscv64-unknown-elf-objdump -d sum.o
riscv64-unknown-elf-objdump -d sum.o | less

Output using O1 Optimization
Output using Ofast optimization

Spike simulation and debugging

spike pk sum.o
spike -d pk sum.o

The first command runs the program on the Spike simulator. The second command, with -d, starts Spike in debug mode.

Press ENTER to show the first line and ENTER again to show successive lines
Press q to quit the debug process

Integer Number Representation

- Unsigned numbers: non-negative integers that have no + or - sign associated with them.
Range: [0, (2^n)-1]
- Signed numbers: a set of both positive and negative numbers.
Range: [-2^(n-1), 2^(n-1)-1]
To represent negative numbers in binary, the 2's complement method is used.

Lab

Write a C program that shows the maximum and minimum values of "n"-bit unsigned numbers, considering n=64 here

#include <stdio.h>
int main(){
	int n=64;
	/* pow() returns a double, which cannot hold 2^64-1 exactly, so use integer arithmetic */
	unsigned long long int max = (n >= 64) ? ~0ULL : ((1ULL << n) - 1);
	unsigned long long int min = 0; /* the smallest unsigned value is always 0 */
	printf("Minimum value is %llu\n",min);
	printf("Maximum value is %llu\n",max);
	return 0;
}

Write a C program that shows the maximum and minimum values of "n" bit signed numbers

#include <stdio.h>

int main(){
	int n=64;
	/* pow() returns a double, which cannot hold 2^63-1 exactly, so use integer arithmetic (valid for 1 <= n <= 64) */
	long long int max = (long long int) ((1ULL << (n-1)) - 1);
	long long int min = -max - 1;
	printf("Minimum value is %lld\n",min);
	printf("Max value is %lld\n",max);
	return 0;
}

Day 2 - Introduction to ABI and Basic Verification Flow

Application Binary Interface

- An Application Binary Interface (ABI) is the interface between two binary program modules that allows them to work together. It defines the interface between two software components or systems that are written in different programming languages, compiled by different compilers, or running on different hardware architectures.
- ABI defines how your code is stored inside the library file so that any program using your library can locate the desired function and execute it.

Double Words- Memory allocation

Architectures can also be divided into two types based on how the bytes of a multi-byte value are laid out in memory. There are two byte orders:
1. Little Endian: Here, the least significant byte is at the lowest memory address, and the most significant byte is at the highest memory address.
2. Big Endian: Here, the most significant byte is at the lowest memory address, and the least significant byte is at the highest memory address.

Load, Add, and Store Instructions

Load instruction: ld x8,16(x23)
- Here ld represents the loading of a double word
- x8 is the destination register
- x23 is the source register, which holds the base address
- 16 is the offset, which is added to the base address
- The base address and the offset are added to generate the memory address
- The content at that address is read and loaded into the destination register, i.e. x8 here
Add instruction: add x8,x24,x8
- Here add represents a normal arithmetic addition
- x8 is the destination register
- x24 is source register 1
- x8 is source register 2

32 Registers and their general ABI Names

Through the ABI names, we reserve some of these registers for certain purposes

Labwork

Using ABI function calls (rewriting the C program using assembly language)
C program - .c file

#include <stdio.h>

extern int load(int x, int y);

int main()
{
  int result = 0;
  int count = 9;
  result = load(0x0, count+1);
  printf("Sum of numbers from 1 to 9 is %d\n", result);
}

Assembly file - .s file

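.section .text
.global load
.type load, @function

load:
	add a4, a0, zero	# a4 = running sum, initialized from a0 (0)
	add a2, a0, a1		# a2 = loop limit, a0 + a1 (count + 1)
	add a3, a0, zero	# a3 = loop counter i, initialized from a0 (0)

loop:
	add a4, a3, a4		# sum = sum + i
	addi a3, a3, 1		# i = i + 1
	blt a3, a2, loop	# repeat while i < limit
	add a0, a4, zero	# return the sum in a0
	ret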

Compile the above using

riscv64-unknown-elf-gcc -Ofast -mabi=lp64 -march=rv64i -o custom1to9.o custom1to9.c load.S

To get the assembly-level code

riscv64-unknown-elf-objdump -d custom1to9.o |less

RISC-V CPU (PicoRV32)

PicoRV32 is a size-optimized RISC-V CPU core that implements the RISC-V RV32IMC instruction set.

RTL Design using SKY130 Technology

Day 1 - Iverilog Design and Testbench

Introduction

- RTL design is checked for adherence to the spec by simulating the design
- Simulator (Iverilog here) is a tool used for checking the design (a set of Verilog codes here)
- Working of the simulator: the simulator looks for changes in the input signals and evaluates the output. The output is re-evaluated only when an input value changes

Testbench

Testbench is an environment used to verify the correctness or soundness of a design or model.
TestBench does not have any primary inputs or outputs

Iverilog Based Simulation flow

vcd file: A Value Change Dump file stores all the information about value changes in the simulator
GTKwave: A waveform viewer used to inspect the vcd file dumped by the simulation, and thereby verify the Verilog design against its testbench.
```
 sudo apt install gtkwave
```

Labs using Iverilog and GTKwave

mkdir VLSI
cd VLSI
git clone https://github.com/kunalg123/sky130RTLDesignAndSynthesisWorkshop.git

The library files are stored in my_lib
All the Verilog models of the standard cells are present in verilog_model
verilog_files has all the design source files and their testbench files
For every design file, for example good_mux.v, there is a tb_good_mux.v file. There is a one-to-one mapping between a Verilog design file and its testbench file
Load both the design source file and the testbench file into the Verilog simulator (iverilog here): iverilog good_mux.v tb_good_mux.v
An a.out file is created.
On executing this file with ./a.out, a VCD file is dumped by the simulator
Load the VCD file into GTKwave using the command gtkwave tb_good_mux.vcd

- To look at the two files side by side: `gvim tb_good_mux.v -o good_mux.v`

Logic Synthesis

RTL Design: Behavioural representation of the required specification
.lib: Collection of logical modules
Synthesizer: Tool used for converting RTL to netlist
Netlist: Representation of design in the form of standard cells present in .lib

**Need of different flavors of gates**
- Combinational logic (propagation delay) determines the maximum speed of operation of the digital logic circuit
- T_clock > T_cq + T_pd + T_setup
- To achieve the maximum clock frequency, the delays should be as small as possible. This would mean that only faster cells are needed
- But to ensure that there are no hold-time violations, some paths need slower gates, creating a contradictory requirement
- Therefore, fast cells are used for better performance, while slow cells are used to avoid hold-time violations.

**Fast Cells v/s Slow Cells**

Fast Cells
- Fast cells use wider transistors to enable higher current carrying capacity. This allows for quicker charging and discharging of capacitive loads, resulting in faster signal transitions.
- Wider transistors generally consume more power compared to narrower ones due to the increased current flow and larger gate capacitance.
- While faster cells offer improved performance, they might have larger silicon area requirements due to the wider transistors. Additionally, they might be more susceptible to issues like noise and higher power consumption.

Slow Cells
- Slow cells use narrower transistors to reduce power consumption and minimize power dissipation.
- Narrower transistors consume less power due to their lower current carrying capacity and reduced gate capacitance.
- While slower cells consume less power, they might operate at lower clock frequencies and have longer signal propagation delays. This can impact their ability to process data quickly.
The choice between faster and slower cells depends on the specific requirements of the digital logic circuit's application. Designers often need to strike a balance between performance, power consumption, and area constraints.

Yosys

Yosys is a framework for Verilog RTL Synthesis.

Installation of Yosys

git clone https://github.com/YosysHQ/yosys.git
cd yosys
sudo apt install make
sudo apt-get update
sudo apt-get install build-essential clang bison flex  libreadline-dev gawk tcl-dev libffi-dev git  graphviz xdot pkg-config python3 libboost-system-dev libboost-python-dev libboost-filesystem-dev zlib1g-dev
make config-gcc
make
sudo make install

To invoke Yosys

cd VLSI/sky130RTLDesignAndSynthesisWorkshop/verilog_files
yosys

To read the library: read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
To read the design file: read_verilog good_mux.v
To synthesize the module: synth -top good_mux
To map the logic in the Verilog file to the library cells: abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
- The number of input signals, output signals, and internal signals is reported by the above command
To get the graphical version of the realized logic: show
- The mux is completely realised in the form of sky130 library cells.
To write the netlist

write_verilog good_mux_netlist.v
!gvim good_mux_netlist.v

To get a simplified version

write_verilog -noattr good_mux_netlist.v
!gvim good_mux_netlist.v

Day 2 - Timing libs, Hierarchical and flat synthesis, and efficient flop coding styles

Timing Dot libs

.lib files
- To view the contents of .lib file gvim ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
- .lib files are used in digital circuit design to provide detailed information about the timing, power, and other characteristics of standard cells.
- In the first line, i.e. library("sky130_fd_sc_hd__tt_025C_1v80")
  - Libraries can be slow, fast, or typical. Here tt stands for typical, i.e. the standard or average performance characteristics of the cells under normal operating conditions.
  - 025C refers to the temperature (25 °C) at which the library's characteristics are specified.
  - 1v80 is the supply voltage, 1.80 V. The library describes the cells' behavior and performance at this operating voltage.
  - sc stands for standard cells, signifying that the library contains standard cell information and characteristics for use in circuit design.

Hierarchy v/s Flat Synthesis

For synthesizing the module we used the command synth -top good_mux. To see what type of synthesis takes place, the multiple_modules.v file is used.

There are two sub-modules
1. AND Gate
2. OR Gate
The module multiple_modules instantiates sub-modules 1 and 2.
As per the module, the expected gate-level logic is an AND gate and an OR gate.
To see what synthesis produces instead, run

yosys
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog multiple_modules.v
synth -top multiple_modules
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show multiple_modules

The synthesized design above is in hierarchical form.
To get the netlist

write_verilog -noattr multiple_modules_hier.v
!gvim multiple_modules_hier.v

A NAND Implementation is seen, here.

Stacked PMOS Circuits
- A CMOS NOR gate has its PMOS transistors connected in series (stacked) in the pull-up network, whereas a NAND gate stacks its NMOS transistors instead.
- PMOS transistors use holes as charge carriers, and their carrier mobility is lower than that of the electrons in NMOS transistors.
- Because of this lower mobility, PMOS transistors must be made wider to achieve comparable drive strength, and stacking them slows the pull-up path further, so a stacked PMOS gate needs more area.
- For this reason, an implementation based on NAND gates is generally preferred over one that stacks PMOS transistors.

To get a flat netlist, flatten the design before writing it out

flatten
write_verilog -noattr multiple_modules_flat.v
!gvim multiple_modules_flat.v

In the flat netlist, the AND and OR gates are instantiated directly.

Hierarchical Synthesis

In hierarchical synthesis, the design is organized into a hierarchy of modules, with each module representing a functional block or sub-component. Each module is synthesized independently, and then these synthesized modules are connected together to form the complete design.

Advantages
- Encourages modular design, making it easier to manage and maintain complex designs.
- Supports the reuse of modules, as synthesized blocks can be used in multiple designs.
- Enables concurrent development and optimization of different modules.
- Can help manage complexity and reduce the size of intermediate files.
Disadvantages
- Introduces the challenge of correctly integrating modules and ensuring proper connectivity.
- Some high-level optimizations might be more challenging due to module-level synthesis.

Flat Synthesis

In flat synthesis, the entire design is treated as a single, monolithic unit. This means that the entire design hierarchy, including all sub-modules, is flattened into a single-level representation. All optimizations, logic synthesis, and technology mapping are performed on this single-level design.

Advantages
- Simplifies the synthesis process, as the entire design is treated as a single unit.
- Can lead to high-level optimizations across the entire design.
Disadvantages
- Can result in large intermediate files and complex optimization problems.
- Limited ability to reuse common logic structures across different parts of the design.
- Can lead to inefficient use of resources if the design is very large and complex.

In practice, a combination of both flat and hierarchical synthesis is often used. Hierarchical synthesis is employed for managing the complexity of large designs, and then certain modules might be synthesized flat to achieve specific optimizations.

Flop Coding Styles

A flip-flop is a bistable multivibrator circuit element that can store one bit of data. It has two stable states and can be used to represent binary information.

Glitches

Glitches are unwanted and unpredictable transitions in digital circuits that can occur due to variations in signal propagation delays.

Reasons for Glitches

Different gates have different propagation delays, and these delays can lead to temporary imbalances in signal timing. If inputs to different gates change at slightly different times, it can result in momentary glitches in the output.
Signals may take different path lengths to reach different gates. Longer paths can introduce larger propagation delays, potentially causing timing mismatches and glitches.
Race conditions occur when two or more signals arrive at a gate at nearly the same time, and the output of the gate depends on which signal arrives first. This can lead to unpredictable temporary output values before the circuit settles into a stable state.

Requirement of flops

Flip-flops are used in sequential circuits to store data and create a controlled timing mechanism. They can help eliminate glitches that may occur in combinational circuits

Asynchronous Reset D flip-flop

The asynchronous reset feature allows the user to reset the flip-flop's state to a specific value, irrespective of the clock signal
When the reset input is not active i.e. 0, the flip-flop operates as a standard D flip-flop, capturing the value at the D input on the rising edge of the clock.
When the reset input is active i.e. 1, the flip-flop's output is forced to 0 regardless of the clock or D input.

!gvim dff_asyncres.v

Simulation

cd VLSI/sky130RTLDesignAndSynthesisWorkshop/verilog_files
iverilog dff_asyncres.v tb_dff_asyncres.v
./a.out
gtkwave tb_dff_asyncres.vcd

Synthesis

cd VLSI/sky130RTLDesignAndSynthesisWorkshop/verilog_files
yosys
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog dff_asyncres.v
synth -top dff_asyncres
dfflibmap -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show dff_asyncres

Asynchronous Set D flip-flop

When the set is high, the output of the flip-flop is forced to 1, irrespective of the clock signal.
When the set is low, the flip-flop operates as a standard D flip-flop, capturing the value at the D input on the rising edge of the clock.

!gvim dff_async_set.v

Simulation

cd VLSI/sky130RTLDesignAndSynthesisWorkshop/verilog_files
iverilog dff_async_set.v tb_dff_async_set.v
./a.out
gtkwave tb_dff_async_set.vcd

Synthesis

cd VLSI/sky130RTLDesignAndSynthesisWorkshop/verilog_files
yosys
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog dff_async_set.v
synth -top dff_async_set
dfflibmap -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show dff_async_set

Synchronous Reset D flip-flop

A synchronous reset D flip-flop is a type of flip-flop that includes a reset input that is synchronized with the clock signal. This means that the reset input will only take effect on a specific clock edge, typically the rising or falling edge of the clock.
During normal operation, when the reset input is not asserted, the flip-flop operates like a standard D flip-flop.
When the reset input is asserted (active), the flip-flop's output is forced to 0 on the next active clock edge.

!gvim dff_syncres.v

Simulation

cd VLSI/sky130RTLDesignAndSynthesisWorkshop/verilog_files
iverilog dff_syncres.v tb_dff_syncres.v
./a.out
gtkwave tb_dff_syncres.vcd

Synthesis

cd VLSI/sky130RTLDesignAndSynthesisWorkshop/verilog_files
yosys
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog dff_syncres.v
synth -top dff_syncres
dfflibmap -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show dff_syncres

Optimizations

mult_2

gvim mult_2.v

read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog mult_2.v
synth -top mul2

abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show

write_verilog -noattr mul2_netlist.v
!gvim mul2_netlist.v

mult_8

gvim mult_8.v

read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog mult_8.v
synth -top mult8

abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show

Day 3 - Combinational and Sequential Optimizations

Combinational Optimization

Combinational logic optimization simplifies the combinational logic of a design so that it needs fewer gates.
It aims at the most efficient implementation of a logic function in terms of area and power. Combinational logic has no memory elements and therefore no inherent notion of time.
Two methods of combinational optimization are
1. Constant propagation is a method of optimization that identifies signals whose values are known constants at synthesis time and replaces them with those constants. Propagating the constants through the logic removes redundant gates and simplifies the expressions.
2. Boolean logic optimization is a process of simplifying and improving logical expressions in Boolean algebra. It aims to simplify Boolean expressions or logic circuits by reducing the number of terms, literals, and gates required to implement a given logical function.

opt_check

!gvim opt_check.v

Synthesis

read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib  
read_verilog opt_check.v
synth -top opt_check
opt_clean -purge
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show

opt_check2

!gvim opt_check2.v

Synthesis

read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib  
read_verilog opt_check2.v
synth -top opt_check2
opt_clean -purge
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show

opt_check3

!gvim opt_check3.v

Synthesis

read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib  
read_verilog opt_check3.v
synth -top opt_check3
opt_clean -purge
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show

opt_check4

!gvim opt_check4.v

Synthesis

read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib  
read_verilog opt_check4.v
synth -top opt_check4
opt_clean -purge
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show

multiple_module_opt

!gvim multiple_module_opt.v

Synthesis

read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib  
read_verilog multiple_module_opt.v
synth -top multiple_module_opt
flatten
opt_clean -purge
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show

Sequential Logic Optimization

Sequential logic optimization is the process of improving digital circuits that incorporate memory elements and time-dependent behavior, with the aim of improving performance, efficiency, and other key characteristics.
Sequential logic optimization directly impacts the performance and reliability of digital circuits and systems.
Methods of sequential optimization are
1. Sequential constant propagation identifies flip-flops whose outputs can be determined to hold a constant value and replaces them with that constant. The constant is then propagated through the following stages of the logic circuit, optimizing the design for better performance and resource utilization.
2. State optimization is an optimization technique used in digital design to reduce the number of states in finite state machines (FSMs) while preserving the original functionality.
3. Sequential logic cloning replicates portions of sequential logic to alleviate bottlenecks and improve circuit throughput.
4. Retiming adjusts the placement of flip-flops within a circuit to optimize timing, balance critical paths, and enhance overall performance.

dff_const1

!gvim dff_const1.v

Simulation

iverilog dff_const1.v tb_dff_const1.v
./a.out
gtkwave tb_dff_const1.vcd

Synthesis

read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib  
read_verilog dff_const1.v
synth -top dff_const1
dfflibmap -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib 
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show

dff_const2

!gvim dff_const2.v

Simulation

iverilog dff_const2.v tb_dff_const2.v
./a.out
gtkwave tb_dff_const2.vcd

Synthesis

read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib  
read_verilog dff_const2.v
synth -top dff_const2
dfflibmap -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib 
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show

dff_const3

!gvim dff_const3.v

Simulation

iverilog dff_const3.v tb_dff_const3.v
./a.out
gtkwave tb_dff_const3.vcd

Synthesis

read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib  
read_verilog dff_const3.v
synth -top dff_const3
dfflibmap -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib 
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show

dff_const4

!gvim dff_const4.v

Simulation

iverilog dff_const4.v tb_dff_const4.v
./a.out
gtkwave tb_dff_const4.vcd

Synthesis

read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib  
read_verilog dff_const4.v
synth -top dff_const4
dfflibmap -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib 
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show

dff_const5

!gvim dff_const5.v

Simulation

iverilog dff_const5.v tb_dff_const5.v
./a.out
gtkwave tb_dff_const5.vcd

Synthesis

read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib  
read_verilog dff_const5.v
synth -top dff_const5
dfflibmap -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib 
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show

Sequential optimisations for unused outputs

counter_opt

!gvim counter_opt.v

Synthesis

read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib  
read_verilog counter_opt.v
synth -top counter_opt
dfflibmap -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib 
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show

counter_opt2

!gvim counter_opt2.v

Synthesis

read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib  
read_verilog counter_opt2.v
synth -top counter_opt2
dfflibmap -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib 
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show

Day 4 - GLS, blocking vs non-blocking, and Simulation-Synthesis mismatch

GLS Concepts and Flow using Iverilog

Gate-level Simulation

Gate-level simulation is a method used in electronics design to test and verify digital circuits at the level of individual logic gates and flip-flops.
It involves simulating the circuit using the actual logic gates and flip-flops that make up the design, as opposed to higher-level abstractions like RTL (Register Transfer Level) descriptions.
Gate-level simulation is commonly used in designs where precise timing and functionality are critical.
It operates at a lower abstraction level than higher-level simulations and is essential for debugging and ensuring circuit correctness.

GLS using iVerilog

Write RTL code.
Synthesize to generate gate-level netlist.
Create a testbench in Verilog.
Compile both netlist and testbench.
Run the simulation with compiled files. Debug and iterate as needed.
Perform timing analysis if necessary.
Generate test vectors for manufacturing tests.

Synthesis-Simulation Mismatch

Synthesis-simulation mismatch is when there are differences between how a digital circuit behaves in simulation at the RTL level and how it behaves after gate-level synthesis.
This discrepancy can occur due to various reasons, such as timing issues, optimization conflicts, and differences in modeling between the simulation and synthesis tools.
To address it, ensure consistent tool versions, check synthesis settings, debug with simulation tools, and follow best practices in RTL coding and design.
Resolving these mismatches is crucial for reliable hardware implementation.

Blocking and Non-blocking statements

Blocking Statements
- Blocking statements are executed sequentially in the order they appear in the code and have an immediate effect on signal assignments.
- They are called "blocking" because they block the execution of subsequent statements until they are completed. Blocking statements are typically used within procedural blocks, such as always or initial blocks, to describe sequential behavior.
- Blocking assignments are typically used to describe combinational logic, where each assignment can depend on the result of the previous one, so the statements must be written in the order in which they should be evaluated.

always @(posedge clk) begin
    // Blocking assignments
    a = b; 
    c = a + 1; 
end

Non-blocking statements
- Non-blocking statements allow concurrent execution within a procedural block or always block, making them suitable for describing synchronous digital circuits.
- Non-blocking assignments are typically used to model sequential logic, like flip-flops and registers, where parallel execution is required.

reg [2:0] state;
always @(posedge clk or posedge reset) begin
    if (reset) begin
        state <= 3'b000;
    end else begin
        // Non-blocking assignment: state takes its new value at the end of the time step
        state <= state + 1;
    end
end

Caveats With Blocking Statements

Blocking statements in Verilog specify the order of operations within procedural blocks, which leads to the following caveats:
- Race Conditions: When multiple blocking assignments are used within the same always block or initial block, there is a potential for race conditions. Race conditions occur when the final value of a signal depends on the execution order of assignments. To avoid race conditions, use non-blocking assignments or ensure that assignments do not depend on the order of execution.
- Combinational Loops: Using blocking assignments to create combinational feedback loops (combinational loops with no flip-flops) can lead to undefined behavior and simulation issues.
- Debugging Challenges: Debugging code with many blocking assignments can be challenging, especially when trying to track down timing-related issues.
- Limited for Testbenches: In testbench code, excessive use of blocking statements can lead to simulation race conditions that don't reflect real-world hardware behavior.
- Order Dependency: The order of blocking statements can affect simulation results, leading to race conditions or unintended behavior.
- Lack of Parallelism: Blocking statements do not accurately represent the parallel nature of hardware. In hardware, multiple signals can update concurrently, but blocking statements model sequential behavior. As a result, using blocking statements for modeling complex concurrent logic can lead to incorrect simulations.

Labwork

ternary_operator_mux

gvim ternary_operator_mux.v

Simulation

iverilog ternary_operator_mux.v tb_ternary_operator_mux.v
./a.out
gtkwave tb_ternary_operator_mux.vcd

Synthesis

read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog ternary_operator_mux.v
synth -top ternary_operator_mux
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
write_verilog -noattr ternary_operator_mux_netlist.v
show

Gate-level simulation (GLS)

iverilog ../my_lib/verilog_model/primitives.v ../my_lib/verilog_model/sky130_fd_sc_hd.v ternary_operator_mux_netlist.v tb_ternary_operator_mux.v
./a.out
gtkwave tb_ternary_operator_mux.vcd

bad_mux.v

!gvim bad_mux.v

Simulation

iverilog bad_mux.v tb_bad_mux.v
./a.out
gtkwave tb_bad_mux.vcd

Synthesis

read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog bad_mux.v
synth -top bad_mux
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
write_verilog -noattr bad_mux_netlist.v
show

Gate-level simulation (GLS)

iverilog ../my_lib/verilog_model/primitives.v ../my_lib/verilog_model/sky130_fd_sc_hd.v bad_mux_netlist.v tb_bad_mux.v
./a.out
gtkwave tb_bad_mux.vcd

blocking_caveat.v

!gvim blocking_caveat.v

Simulation

iverilog blocking_caveat.v tb_blocking_caveat.v
./a.out
gtkwave tb_blocking_caveat.vcd

Synthesis

read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog blocking_caveat.v
synth -top blocking_caveat
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
write_verilog -noattr blocking_caveat_netlist.v
show

Gate-level simulation (GLS)

iverilog ../my_lib/verilog_model/primitives.v ../my_lib/verilog_model/sky130_fd_sc_hd.v blocking_caveat_netlist.v tb_blocking_caveat.v
./a.out
gtkwave tb_blocking_caveat.vcd


ani171/pes_asic_class
