1. Inception of open-source EDA, OpenLANE and Sky130 PDK
2. Good Floorplan vs Bad Floorplan and introduction to library cells
- Chip Floor Planning Considerations
- Library Binding and Placement
- Cell Design and Characterization Flows
- General Timing Characterization Parameters
3. Design Library Cell Using Magic Layout and ngspice characterization
- Labs for CMOS inverter ngspice simulations
- Inception of Layout CMOS Fabrication Process
- Sky130 Tech File Labs
4. Pre-layout timing analysis and importance of good clock tree
- Timing Modeling using delay tables
- Timing analysis with ideal clocks using OpenSTA
- Clock tree synthesis TritonCTS and signal integrity
- Timing analysis with real clocks using OpenSTA
5. Final Steps From RTL2GDS using TritonRoute and OpenSTA
QFN-48 (Quad Flat No-Leads) Package: it has 48 pins in a 7 mm x 7 mm package, and the chip placed inside the package is connected to the pins.
Pads: signals are sent into and out of the chip through the pads.
Chip: the place where all the digital logic is fabricated as a combination of various gates. It contains:
1. Foundry IPs (Intellectual Property): blocks that require a certain amount of intelligence (design expertise) to build.
2. Macros: pure digital logic.
RISC-V Architecture: when an application or C program is to be run on the hardware (the layout), it is first compiled into assembly language, here RISC-V assembly, and then converted into machine language (zeros and ones), which is run on the hardware. The RISC-V specification is implemented in RTL code, e.g. the picorv32 CPU core.
The instructions produced by the compiler belong to the Instruction Set Architecture (ISA), i.e. the architecture of the computer; in this case it is the RISC-V architecture. The HDL acts as the interface between the RISC-V architecture and the hardware.
Synthesis: converts the RTL design into a circuit built from the components of the standard cell library (SCL). Standard cells have a regular layout, and each cell has different views/models. For example, the Liberty view includes the power and delay models.
Floor and Power Planning: plan the silicon area and a robust power distribution to power the circuits.
Chip Floor Planning: partition the chip die between the different system building blocks and place the I/O pads.
Macro Floor Planning: the dimensions, pin locations, rows and routing tracks are defined.
Power Planning: the power network is constructed. The chip is powered by multiple Vdd and ground pins, which are connected to the components through vertical and horizontal metal straps. The straps use the upper metal layers, which are wider and have lower resistance.
Placement: place the cells on the floorplan rows, aligned with the sites. Cells that are connected to each other should be placed close together to reduce interconnect delay. Placement is done in 2 steps:
Global Placement: finds optimal positions for all cells; these positions are not legal, so cells may overlap.
Detailed Placement: the positions obtained from global placement are minimally altered to make them legal.
Clock Distribution Network: delivers the clock to all sequential circuits (e.g. flip-flops) with minimum skew, which can be ensured by an H-tree or X-tree.
Routing: implements the interconnect using metal layers. This requires finding valid horizontal and vertical patterns to implement the nets. The PDK defines the thickness, pitch and width of each metal layer, as well as the vias required to connect different metal layers.
A divide-and-conquer approach is used: Global Routing generates the routing guides, and Detailed Routing uses the guides to implement the actual wiring.
After routing, the layout is verified through:
Design Rule Checking (DRC): makes sure that the final layout adheres to all design rules.
Layout Vs Schematic (LVS): ensures that the final layout matches the gate-level netlist.
Timing Verification (STA): ensures that all timing constraints are met.
It is an open-source tool.
Strive: a family of open-everything SoCs (open EDA, open PDK, open RTL).
OpenLANE: an automated flow that produces a clean GDSII with no human intervention, i.e. with no DRC or LVS violations. It can be used to harden macros and chips.
Synthesis Exploration: gives an analysis of the area and delay of the design.
Design Exploration: runs regression testing and compares different designs.
Design For Test (DFT):
+ Scan Insertion
+ Automatic Test Pattern Generation (ATPG)
+ Test Pattern Compaction
+ Fault Coverage
+ Fault Simulation
Physical Implementation: done through OpenROAD.
Logic Equivalence Check: done through Yosys.
Dealing with Antenna Rule Violations: long metal wires can act as antennas and collect charge that can damage the transistor gates connected to them, so their length should be limited. This is handled by the router using:
- Bridging: attaches a higher metal layer as an intermediary.
- Antenna diodes: diodes provided by the SCL are added to leak away the charge.
Magic is used for DRC; Magic and Netgen are used for LVS.
Directory structure in
Invoke OpenLANE
interactive: instead of running the complete flow at once, the flow is run step by step.
Import all the packages that are required to run the flow:
After design preparation, run synthesis:
Yosys: performs RTL synthesis. abc: performs technology mapping, and the gate-level netlist is created. OpenSTA: performs static timing analysis after synthesis.
This executes both the Yosys and abc passes.
Characterize Synthesis Results: number of D flip-flops = 1613; number of cells = 14876; flop ratio of the design = 1613/14876 = 0.1084, or 10.84 percent.
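The flop ratio above is plain arithmetic on the synthesis statistics. A minimal sketch reproducing it; the function name flop_ratio is hypothetical and the counts are the ones quoted above.

```python
def flop_ratio(num_flops: int, num_cells: int) -> float:
    """Fraction of all standard cells in the netlist that are flip-flops."""
    if num_cells <= 0:
        raise ValueError("num_cells must be positive")
    return num_flops / num_cells


ratio = flop_ratio(1613, 14876)  # counts from the synthesis report above
print(f"flop ratio = {ratio:.4f} ({ratio * 100:.2f} percent)")
```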
Cell LEF: holds the abstract view of a cell, i.e. its metal layer information, PnR boundary and pin positions.
Technology LEF: holds the DRC rules, vias and metal layer information used in placement and routing.
1. Utilization Factor:
For example, consider the following logic:
If each cell occupies 1 square unit, then with all the cells clubbed together the minimum area of the netlist is 4 square units. The utilization factor is the area occupied by the netlist divided by the total area of the core, so if the netlist occupies 100 percent of the core area, the utilization factor is 100 percent (1).
If the utilization factor is 1, the core is completely occupied and no extra cells can be accommodated in the core area. Ideally, the utilization factor should be 0.5 or 0.6.
2. Aspect Ratio: the ratio of the height to the width of the core.
If the aspect ratio is 1, the core is square-shaped; otherwise it is rectangular.
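Both floorplan metrics are simple ratios. The sketch below computes them for the 4-unit example above and, as a hypothetical helper (core_size is not an OpenLANE function), derives core dimensions from a target utilization and aspect ratio, assuming a rectangular core.

```python
import math


def utilization_factor(netlist_area: float, core_area: float) -> float:
    """Area occupied by the netlist divided by the total core area."""
    return netlist_area / core_area


def aspect_ratio(core_height: float, core_width: float) -> float:
    """Core height divided by core width (1.0 means a square core)."""
    return core_height / core_width


def core_size(netlist_area: float, target_util: float, target_ar: float):
    """Hypothetical helper: (width, height) of a rectangular core that reaches
    the target utilization factor and aspect ratio for a given netlist area."""
    core_area = netlist_area / target_util
    width = math.sqrt(core_area / target_ar)
    return width, target_ar * width


# The example above: 4 unit-area cells filling a 2 x 2 core.
print(utilization_factor(4, 2 * 2), aspect_ratio(2, 2))  # 1.0 1.0
# Same netlist in a (hypothetical) core 2 units high and 4 units wide.
print(utilization_factor(4, 2 * 4), aspect_ratio(2, 4))  # 0.5 0.5
w, h = core_size(4, 0.5, 1.0)
print(round(w, 3), round(h, 3))  # 2.828 2.828
```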
3. Define Locations of Pre-Placed Cells: logic is divided into blocks that are implemented once and can then be instantiated multiple times. The locations of these pre-placed cells are defined before automated placement and routing, and the tools do not move them.
4. Decoupling Capacitors: capacitors placed close to the pre-placed cells, in parallel with the supply, so that drops in the supply voltage do not cause logic or timing failures. When the block switches, the decoupling capacitor supplies the current it needs, and for the rest of the time the capacitor recharges itself to Vdd.
5. Power Planning: to avoid a drop in supply voltage across the circuit, the power lines are laid across the circuit as a mesh of vertical and horizontal lines.
6. Pin Placement
- Labs
run_floorplan: the command used to run the floorplan
Width of the die = 660685/1000 = 660.685 microns; height of the die = 671405/1000 = 671.405 microns (the DEF database units are 1000 per micron); area = width * height = 443587.212 square microns
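The same calculation can be read straight from the floorplan DEF. A minimal sketch, assuming a rectangular DIEAREA given by two corner points and a UNITS DISTANCE MICRONS line; the DEF text below is a hypothetical excerpt built from the numbers above, not copied from the actual run.

```python
import re


def die_size_from_def(def_text: str):
    """Return (width_um, height_um, area_um2) from a DEF's UNITS and DIEAREA lines.
    Assumes DIEAREA is a rectangle given by its lower-left and upper-right corners."""
    units = int(re.search(r"UNITS\s+DISTANCE\s+MICRONS\s+(\d+)", def_text).group(1))
    corners = re.search(
        r"DIEAREA\s*\(\s*(-?\d+)\s+(-?\d+)\s*\)\s*\(\s*(-?\d+)\s+(-?\d+)\s*\)", def_text
    )
    x1, y1, x2, y2 = (int(g) for g in corners.groups())
    width = (x2 - x1) / units   # database units -> microns
    height = (y2 - y1) / units
    return width, height, width * height


# Hypothetical DEF excerpt using the die coordinates quoted above.
sample_def = """
UNITS DISTANCE MICRONS 1000 ;
DIEAREA ( 0 0 ) ( 660685 671405 ) ;
"""
w, h, area = die_size_from_def(sample_def)
print(w, h, round(area, 3))  # 660.685 671.405 443587.212
```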
Review the FloorPlan in Magic
Changed the Placement of Pins to 2 :
Placement and Routing
Step 1: Bind the netlist with physical cells. Every gate is given a physical shape (square or rectangular), and information about its dimensions, delay and required operating conditions is present in a library. Each gate is available in the library in different sizes.
Step 2: Place the gates on the floorplan.
- Gates are placed close to the input and output pins they connect to.
- Buffers are added when gates are far away from the I/O pins. A long wire has a large capacitance, which degrades the slew (transition time) of the signal; the buffers restore the slew so that signal integrity is maintained.
- When cells are abutted, the interconnect delay between them is minimal.
Labs for Placement:
Placement: global placement reduces the HPWL (Half-Perimeter Wire Length) and the overflow; as the overflow value decreases, the design converges.
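HPWL estimates a net's wire length as half the perimeter of the bounding box around its pins. A minimal sketch of the metric that global placement reduces; the net names and pin coordinates are hypothetical.

```python
def net_hpwl(pins):
    """Half-perimeter of the bounding box around a net's pin locations."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets):
    """Sum of HPWL over all nets (the quantity global placement reduces)."""
    return sum(net_hpwl(pins) for pins in nets.values())


# Hypothetical pin coordinates (in placement units) for two nets.
nets = {
    "n1": [(0, 0), (3, 4), (1, 1)],
    "n2": [(2, 2), (2, 5)],
}
print(total_hpwl(nets))  # 10
```

Moving a cell only changes the HPWL of the nets it belongs to, which is why placers can evaluate candidate moves cheaply with this metric.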
Standard Cells: they are defined in the libraries, which contain different versions of the same gate with different sizes and threshold voltages. The larger the threshold voltage, the longer the gate takes to switch. Gates of the same size can have different threshold voltages.
Cell Design Flow is divided into 3 parts:
- Inputs: the inputs needed to design the cell. The foundry provides the PDK, which includes the DRC and LVS rules, SPICE models and libraries; the user-defined specifications complete the inputs.
Some DRC rules: poly width = 2 lambda; poly extension over active area = 3 lambda; poly-to-active spacing = 1 or 2 lambda.
User-Defined Specs:
Cell Height: decided by the separation between the power and ground rails.
Cell Width: depends on the drive strength. A cell with drive strength 1 can only drive short wires, whereas a cell with drive strength 10 can drive much longer wires.
Supply Voltage: the top-level designer decides the supply voltage.
Metal layers, pin locations and drawn gate length are also specified.
- Design the cell: designing the cell requires 3 steps:
1. Circuit Design: based on SPICE simulations. From the required switching threshold, the PMOS and NMOS transistors are designed, i.e. their W/L values are decided.
2. Layout Design: implement the logic as PMOS and NMOS networks and derive their network graphs. Euler's path: a path that traverses every edge of the graph exactly once. Based on the Euler's path, a stick diagram is drawn, and the stick diagram is then converted into a layout that adheres to the DRC rules given by the foundry.
GDSII is the typical layout file, and the LEF defines the width and height of the cell.
The extracted SPICE netlist (.cir) contains the resistances and capacitances of every element of the layout; these parasitics are used to characterize the cell in terms of timing.
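Finding a common Euler's path in the pull-up and pull-down graphs gives the gate (poly) ordering for the stick diagram. A minimal sketch using Hierholzer's algorithm; it assumes each transistor is an edge between two diffusion nodes labelled by its gate input, and the NAND2 node names are hypothetical.

```python
from collections import defaultdict


def euler_path(edges):
    """Return the gate labels in the order an Euler path visits them.

    edges: list of (node_a, node_b, gate_label), one entry per transistor,
    where the nodes are the diffusion nodes the transistor connects.
    Returns None if the graph has no Euler path (or is disconnected).
    """
    if not edges:
        return []
    adj = defaultdict(list)  # node -> [(neighbour, edge_index), ...]
    for idx, (a, b, _) in enumerate(edges):
        adj[a].append((b, idx))
        adj[b].append((a, idx))
    odd = [n for n in adj if len(adj[n]) % 2 == 1]
    if len(odd) not in (0, 2):
        return None  # an Euler path needs 0 or 2 odd-degree nodes
    start = odd[0] if odd else next(iter(adj))
    used = [False] * len(edges)
    stack = [(start, None)]  # (node, edge used to reach it)
    order = []
    # Hierholzer's algorithm: walk unused edges, back up when stuck.
    while stack:
        node, via = stack[-1]
        while adj[node] and used[adj[node][-1][1]]:
            adj[node].pop()  # discard edges already walked from the other end
        if adj[node]:
            nxt, idx = adj[node].pop()
            used[idx] = True
            stack.append((nxt, idx))
        else:
            stack.pop()
            if via is not None:
                order.append(via)
    if len(order) != len(edges):
        return None
    return [edges[i][2] for i in reversed(order)]


# NAND2: pull-down is A and B in series between Y and GND,
# pull-up is A and B in parallel between VDD and Y.
pull_down = [("Y", "n1", "A"), ("n1", "GND", "B")]
pull_up = [("VDD", "Y", "A"), ("VDD", "Y", "B")]
print(euler_path(pull_down))  # ['A', 'B']
print(euler_path(pull_up))
```

For the parallel pull-up the sketch may return the order B, A; because the reverse of an Euler path is also an Euler path, the same A, B gate order can be used for both networks.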
3. Characterization Flow: the steps for characterization are:
1. Read the SPICE model files
2. Read the extracted SPICE netlist
3. Recognize the behavior of the buffer
4. Read the sub-circuits of the logic
5. Attach the necessary power sources
6. Timing, noise and power characterization
7. Apply the stimulus
8. Attach the necessary output capacitances
9. Provide the necessary simulation commands
10. Feed all these configurations into the characterization software called GUNA, which generates the timing, noise and power models.
- Outputs: the outputs of the cell used by EDA tools are CDL (Circuit Description Language), GDSII, LEF (Library Exchange Format), the extracted SPICE netlist, and the timing, noise and power libs.
Timing Threshold Definitions
1. slew_low_rise_thr: lower threshold on a rising (low-to-high) waveform, near the low end of the supply. Typical values are 20-30 percent.
2. slew_high_rise_thr: upper threshold on a rising (low-to-high) waveform, near the high end of the supply. Typical values are 70-80 percent.
3. slew_low_fall_thr: lower threshold on a falling waveform.
4. slew_high_fall_thr: upper threshold on a falling waveform.
These values are used to calculate the slew of the waveform.
in_rise_thr, in_fall_thr: thresholds on the input waveform; the 50 percent points are typically taken.
out_rise_thr, out_fall_thr: thresholds on the output waveform; the 50 percent points are typically taken.
Propagation Delay: the time at which the output crosses its threshold minus the time at which the input crosses its threshold, typically measured at the 50 percent points. A negative delay can be observed if the threshold values are shifted poorly. A negative delay can also appear if the distance between the 2 cells is large, because the long wire produces a large slew at the input.
Transition Time:
For a rising waveform, transition time = time(slew_high_rise_thr) - time(slew_low_rise_thr). For a falling waveform, transition time = time(slew_low_fall_thr) - time(slew_high_fall_thr).
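The delay and transition definitions above can be applied to sampled simulation waveforms by interpolating threshold crossings. A minimal sketch, assuming the waveforms are lists of voltages sampled at common time points and using 20/80 percent slew thresholds and 50 percent delay thresholds; the piecewise-linear waveforms are hypothetical, not ngspice output.

```python
def crossing_time(t, v, level, rising):
    """First time the sampled waveform (t, v) crosses `level` in the given
    direction, using linear interpolation between neighbouring samples."""
    for i in range(1, len(t)):
        v0, v1 = v[i - 1], v[i]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            return t[i - 1] + (level - v0) * (t[i] - t[i - 1]) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def transition_time(t, v, vdd, rising, low_thr=0.2, high_thr=0.8):
    """Slew between the low and high thresholds (fractions of vdd)."""
    t_low = crossing_time(t, v, low_thr * vdd, rising)
    t_high = crossing_time(t, v, high_thr * vdd, rising)
    return t_high - t_low if rising else t_low - t_high


def propagation_delay(t, v_in, v_out, vdd, in_rising, out_rising,
                      in_thr=0.5, out_thr=0.5):
    """time(out_thr) - time(in_thr); can be negative with poor thresholds."""
    t_in = crossing_time(t, v_in, in_thr * vdd, in_rising)
    t_out = crossing_time(t, v_out, out_thr * vdd, out_rising)
    return t_out - t_in


# Hypothetical piecewise-linear inverter waveforms (time in ns, Vdd = 1.8 V).
t = [0.0, 1.0, 2.0, 3.0, 4.0]
v_in = [0.0, 0.9, 1.8, 1.8, 1.8]   # input rises
v_out = [1.8, 1.8, 0.9, 0.0, 0.0]  # output falls
print(propagation_delay(t, v_in, v_out, 1.8, in_rising=True, out_rising=False))  # 1.0
print(round(transition_time(t, v_out, 1.8, rising=False), 3))  # 1.2
```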
VTC SPICE Simulations: 1. Create a SPICE deck: the connectivity information of the netlist, plus the input and output information. 2. Define the W/L values of the PMOS and NMOS, the load capacitance, the input gate voltage and the supply voltages (Vdd, Vss). 3. Identify the nodes and name them.
Simulation Commands:
Model Files: when the W/L ratio of the PMOS is made larger, the transfer characteristic shifts to the right.
Switching Threshold (Vm): the point on the transfer characteristic at which Vin = Vout, where the device switches.
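Vm can be read off a simulated VTC as the point where the curve crosses the line Vout = Vin. A minimal sketch, assuming the sweep is available as two lists; the tanh-shaped curve (toy_vtc) is a hypothetical stand-in for ngspice output, and shifting its centre only mimics the rightward shift a stronger PMOS produces.

```python
import math


def switching_threshold(vin, vout):
    """Vm: the input voltage at which Vout == Vin, found by linearly
    interpolating d = Vout - Vin where it changes sign from + to -."""
    d = [o - i for i, o in zip(vin, vout)]
    for k in range(1, len(d)):
        if d[k - 1] > 0 >= d[k]:
            frac = d[k - 1] / (d[k - 1] - d[k])
            return vin[k - 1] + frac * (vin[k] - vin[k - 1])
    raise ValueError("the VTC never crosses the line Vout = Vin")


def toy_vtc(vin, vdd=1.8, center=0.9, gain=10.0):
    """Hypothetical smooth inverter VTC standing in for an ngspice sweep."""
    return [vdd / 2 * (1 - math.tanh(gain * (v - center))) for v in vin]


vin = [k * 1.8 / 180 for k in range(181)]  # 0 V to 1.8 V in 10 mV steps
vm_balanced = switching_threshold(vin, toy_vtc(vin))
vm_strong_pmos = switching_threshold(vin, toy_vtc(vin, center=1.0))
print(round(vm_balanced, 3), vm_strong_pmos > vm_balanced)  # 0.9 True
```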
16-Mask CMOS Process
1. Selecting a Substrate:
p-type substrate: high resistivity (5-50 ohm-cm), doping level of 10^15 per cm^3, and (100) orientation.
2. Create Active Regions for Transistors
Grow a 40 nm layer of SiO2, then deposit an 80 nm layer of Si3N4 and a 1 um layer of photoresist.
Using **Mask-1**, the areas covered by the mask are protected and the rest are exposed; the exposed photoresist is washed away.
Remove the mask.
The areas not covered by photoresist are etched, so the silicon nitride there is etched off.
Remove the photoresist
Place the wafer in the oxidation furnace. The silicon nitride protects the areas beneath it from oxidation, so a thick field oxide grows only in the exposed areas. This process is called **LOCOS** (Local Oxidation of Silicon), and Bird's beak shapes are formed at the oxide edges.
This field oxide isolates the two transistors.
Now the silicon nitride is stripped using hot phosphoric acid.
3. N-Well and P-Well Formation:
A layer of photoresist is deposited.
A mask is placed above it and the setup is exposed to UV light; the photoresist exposed to the UV light is washed away. Next, the mask is removed and the P-well is formed by implanting boron into the substrate with ion implantation at 200 keV.
Similarly, the N-well is formed by ion implantation of phosphorus. A higher energy, around 400 keV, is required for phosphorus because its atoms are heavier.
Next, the structure is placed in a drive-in furnace at a high temperature of about 1100 degrees Celsius for 4-6 hours to drive in the atoms and form well-defined wells. This process is called the twin-tub process.
4. Formation of Gate
The threshold voltage depends on the body effect coefficient, which in turn depends on the doping concentration and the oxide capacitance. These are controlled in the fabrication steps to obtain a particular threshold voltage.
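For reference, a standard long-channel form of this dependence (not taken from the source) is shown below, where N_A is the substrate or well doping, C_ox is the gate-oxide capacitance per unit area, t_ox is the oxide thickness, Φ_F is the Fermi potential and V_SB is the source-to-body voltage. Changing N_A through the threshold-adjust implants, or t_ox through the regrown gate oxide, changes γ and therefore V_t.

$$
V_t = V_{t0} + \gamma\left(\sqrt{\lvert -2\Phi_F + V_{SB}\rvert} - \sqrt{\lvert -2\Phi_F\rvert}\right),
\qquad
\gamma = \frac{\sqrt{2\,q\,\varepsilon_{Si}\,N_A}}{C_{ox}},
\qquad
C_{ox} = \frac{\varepsilon_{ox}}{t_{ox}}
$$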
Mask-4: used to implant boron at a low energy of 60 keV, just at the surface, to attain the doping concentration required for the target threshold voltage. Similarly, Mask-5 is used to implant As or P into the N-well at low energy to set its doping level.
HF is used to remove the oxide layer, and a high-quality gate oxide about 10 nm thick is re-grown; its thickness depends on the target threshold voltage.
A polysilicon layer about 0.4 um thick is deposited and doped with N-type impurities to lower the resistance of the gate. Mask-6 is used to define the gates, and the remaining polysilicon is etched away.
5. Lightly Doped Drain (LDD) Formation:
The doping profile for the PMOS is P+, P-, N, and for the NMOS it is N+, N-, P. This profile is used for 2 reasons: the **Hot Electron Effect** and the **Short Channel Effect**.
Hot Electron Effect: when the device size is reduced but the power supply is not scaled down, the electric field (E = V/d) increases. As a result, electrons attain high energy and can break Si-Si bonds, generating more electrons and holes and disturbing the doping profile. The energy may even be high enough to cross the 3.2 eV barrier between the Si conduction band and the SiO2 conduction band.
Short Channel Effect: when the gate length is reduced, e.g. from 1 um to 0.5 um, the drain field penetrates into the channel area, making it difficult for the gate to control the current.
**Mask-7**- Is used to implant the N- dopant into the P-well.
**Mask-8**- Is used to implant the P- dopant into the N-Well
Side-Wall Spacers: formed by depositing a thick silicon nitride or silicon oxide layer, which is then etched by anisotropic plasma etching. These spacers prevent the N- or P- implants from being disturbed by the subsequent N+ or P+ implants.
6. Source and Drain Formation:
A screen oxide is added to prevent channeling during the implants.
Mask-9: N+ implant with As at 75 keV; the side-wall spacers keep the LDD regions intact.
Mask-10: P+ implant with B at 50 keV. The structure is then annealed in a high-temperature furnace at about 1000 degrees Celsius.
7. Steps to Form Contacts and Local Interconnects
Etch away the screen oxide used earlier using HF.
Deposit titanium (very low resistivity) on the wafer surface by sputtering: the titanium target is bombarded with argon ions, and the sputtered titanium particles are deposited on the substrate. The wafer is then heated at about 650-700 degrees Celsius in an N2 ambient for 60 seconds, forming low-resistance TiSi2 at the contacts and TiN on top, which is used for local communication.
Mask-11: applied, and the unwanted TiN is etched off using RCA cleaning.
8. Higher Level Metal Formation:
Planarize the surface by depositing a thick layer of SiO2 doped with phosphorus (phosphosilicate glass) on the wafer surface; the doping lowers the temperature at which the glass flows. CMP (Chemical Mechanical Polishing) is then done to planarize the surface. Using **Mask-12**, the SiO2 is etched to open the contact holes. The photoresist is removed, a thin 10 nm layer of TiN is deposited as an adhesion layer, followed by a blanket tungsten layer, and the structure is planarized with CMP.
Now an Al layer is deposited, patterned with Mask-13, and the remaining Al is plasma etched.
The next higher-level metal is formed by depositing SiO2, planarizing it with CMP and patterning it with Mask-14. A TiN layer and then W are deposited again, and Mask-15 is used to make the third level of interconnect.
The higher-level interconnects are thicker than the lower-level ones. Silicon nitride is used to protect the chip, and Mask-16 is used to open contact holes in this layer.
Spice Simulation
Extract Spice netlist
Editing Spice Deck:
Plot y vs time a
When the load capacitance is 0.2 fF, the waveform is as shown below.
Characterizing the Cell
- Rise Transition: time taken by the output waveform to rise from 20 to 80 percent of Vdd = 0.064 ns
- Fall Transition: time taken by the output waveform to fall from 80 to 20 percent of Vdd = 27 ps
- Cell Rise Delay (propagation delay when the output of the cell is rising) = 0.061 ns
- Cell Fall Delay (propagation delay when the output is falling) = 40 ps
Convert grid info to track info
Some rules for standard cell design
1. Input and output ports must lie on the intersections of the vertical and horizontal tracks
2. The width of the standard cell should be an odd multiple of the horizontal track pitch (x-pitch)
3. The height of the standard cell should be a multiple of the vertical track pitch (y-pitch)
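The three rules can be checked numerically against the track pitches. A minimal sketch; the pitch and offset values (0.46/0.34 um pitches, 0.23/0.17 um offsets) and the 1.38 um cell width are assumptions standing in for the track file and the custom cell, while 2.72 um is the cell height quoted later in these notes.

```python
def pitch_multiple(length, pitch, tol=1e-6):
    """Return n if length is an integer multiple n of pitch (within tol), else None."""
    n = round(length / pitch)
    return n if n > 0 and abs(length - n * pitch) < tol else None


def on_track(coord, offset, pitch, tol=1e-6):
    """True if coord lies on a track line at offset + k * pitch."""
    k = round((coord - offset) / pitch)
    return abs(coord - (offset + k * pitch)) < tol


def check_cell(width, height, x_pitch, y_pitch):
    nx = pitch_multiple(width, x_pitch)
    ny = pitch_multiple(height, y_pitch)
    return {
        "width_pitches": nx,
        "height_pitches": ny,
        "width_ok": nx is not None and nx % 2 == 1,  # odd multiple of x-pitch
        "height_ok": ny is not None,                 # multiple of y-pitch
    }


# Assumed values: track pitches 0.46 um (x) and 0.34 um (y) with offsets
# 0.23 um and 0.17 um, a 1.38 um wide cell and a 2.72 um cell height.
print(check_cell(1.38, 2.72, 0.46, 0.34))
print(on_track(0.69, 0.23, 0.46), on_track(0.50, 0.23, 0.46))  # True False
```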
grid
The grid dimensions were changed according to the track file data, and the input and output ports now lie on the intersections of the tracks.
The width of the standard cell should be an odd multiple of the x-pitch. Here the width is about 3 times the x-pitch and the height is about 9 times the y-pitch.
When the LEF file is extracted, the ports are referred to as pins in the MACRO.
The ports are now identified as input/output ports and are defined by their port class and port use:
- Output port: port class output, port use signal
- VPWR: port class input, port use power
- VGND: port class input, port use power
Saving the cell with a custom name:
A mag file is created with the same name
The LEF file is then extracted with the following command:
A LEF file is created, in which the ports are converted into pins.
To integrate the cell with picorv32a, copy the LEF file and the libraries into the src folder and make the corresponding changes to the config.tcl file.
The cell is now integrated in the synthesis:
Clock Gating: AND and OR gates are combined with the clock to stop it when it is not needed, which saves dynamic (switching) power.
Delay Tables:
A delay table is a 2-D table of timing values obtained by varying the input slew and the output load capacitance.
Every component has its own delay tables, which are the timing models for that gate.
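When the actual slew and load fall between table entries, the timing tool interpolates. A minimal sketch using bilinear interpolation, assuming rows are indexed by input slew and columns by output load (the real axis order depends on the library's table template); the table values are hypothetical.

```python
from bisect import bisect_right


def _segment(axis, x):
    """Index i of the axis segment [axis[i], axis[i+1]] used for x.
    Values outside the axis use the nearest edge segment (linear extrapolation)."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lookup_delay(slews, loads, table, slew, load):
    """Bilinear interpolation in a 2-D delay table.
    table[i][j] is the delay for input slew slews[i] and output load loads[j]."""
    i, j = _segment(slews, slew), _segment(loads, load)
    u = (slew - slews[i]) / (slews[i + 1] - slews[i])
    v = (load - loads[j]) / (loads[j + 1] - loads[j])
    return ((1 - u) * (1 - v) * table[i][j] + (1 - u) * v * table[i][j + 1]
            + u * (1 - v) * table[i + 1][j] + u * v * table[i + 1][j + 1])


# Hypothetical 2 x 3 table: input slew in ns, output load in pF, delay in ns.
slews = [0.1, 0.5]
loads = [0.01, 0.05, 0.1]
delays = [[0.10, 0.30, 0.55],
          [0.20, 0.40, 0.65]]
print(round(lookup_delay(slews, loads, delays, 0.3, 0.03), 3))  # 0.25
```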
STA -Labs
READ.me file with the strategy
The synthesis run has TNS = -711.59 ns and WNS = -23.89 ns. The negative slack can be reduced at the cost of area by introducing buffers during synthesis, increasing the fanout, and upsizing the cells that create the negative slack.
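WNS and TNS summarize the per-endpoint slacks of a timing report. A minimal sketch of one common convention (WNS is the most negative slack, TNS the sum of all negative slacks, both 0 when timing is met; tools differ in details); the slack values are hypothetical.

```python
def wns_tns(endpoint_slacks):
    """WNS: the worst (most negative) slack; TNS: the sum of all negative slacks.
    Both are 0.0 when no endpoint violates timing."""
    violations = [s for s in endpoint_slacks if s < 0]
    wns = min(violations) if violations else 0.0
    tns = sum(violations, 0.0)
    return wns, tns


# Hypothetical endpoint slacks in ns.
print(wns_tns([-1.5, 0.2, -0.5, 3.0]))  # (-1.5, -2.0)
print(wns_tns([0.4, 1.2]))  # (0.0, 0.0)
```

A large TNS with a modest WNS means many endpoints fail by a little, whereas a WNS close to TNS points to a few critical paths.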
Then run sta sta.conf to perform static timing analysis.
Introducing buffers, upsizing the cells and increasing the fanout reduced the delay.
Next, the synthesized netlist is replaced with the modified one.
Command in STA: write verilog /
The buffer sizes were increased, and more iterations were used in placement.
Clock Tree Synthesis: skew is the difference in the time at which the clock reaches different flip-flops. It should be close to zero.
So, an **H-Tree** is used: it calculates the distance from the clock source to all the flip-flops, starts from the midpoint, and builds the tree from there, so that it looks H-shaped.
Here the skew will be close to zero.
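The zero-skew property of an H-tree comes from every sink being the same wire length away from the root. A minimal sketch generating the sinks of an ideal H-tree, assuming unbuffered wires whose delay depends only on length; the block size and coordinates are hypothetical.

```python
def h_tree_sinks(x, y, half_w, half_h, levels, dist=0.0):
    """Yield (x, y, wire_length_from_root) for every sink of an H-tree.

    At each level the clock runs half_w horizontally and then half_h
    vertically to the centres of four smaller H's, whose arms are halved.
    """
    if levels == 0:
        yield (x, y, dist)
        return
    for dx in (-half_w, half_w):
        for dy in (-half_h, half_h):
            yield from h_tree_sinks(x + dx, y + dy, half_w / 2, half_h / 2,
                                    levels - 1, dist + half_w + half_h)


# Hypothetical 16 x 16 block with the clock entering at its centre.
sinks = list(h_tree_sinks(8.0, 8.0, 4.0, 4.0, levels=2))
lengths = [d for _, _, d in sinks]
print(len(sinks), max(lengths) - min(lengths))  # 16 0.0
```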
Clock Tree Buffering:
As the clock travels across the wire, the signal can lose its integrity because of the resistance and capacitance of the wire. The solution is clock buffers, which have equal rise and fall times.
Here the red-coloured buffers are clock buffers.
Clock Net Shielding and Crosstalk
Crosstalk can occur between adjacent lines, so the clock lines are shielded from adjacent signal wires and from outside interference.
Crosstalk can result in a glitch or a delta delay. Glitch: the coupling capacitance between an aggressor and a victim net can cause a bump or droop on the victim; if it is large enough, it can be interpreted as the wrong logic level, resulting in loss of signal integrity. For example, a memory can be reset if a glitch drives its reset line high. The shield wires are connected to Vdd or ground; since the shields do not switch, the possibility of a glitch is reduced.
Delta Delay: if the victim and the aggressor are both switching, the delay of the victim is affected and changes by a delta delay, which results in skew.
Parameters considered in CTS:
Running Clock-Tree-Synthesis:
In CTS, clock buffers are added, which modifies the netlist, and a new file picorv32a.synthesis_cts.v is created.
run_cts
In a real clock system, the clock reaches the flip-flops through a set of buffers and wires.
The RHS is the data required time and the LHS is the data arrival time. The data required time should be greater than or equal to the data arrival time; otherwise the slack is negative.
Slack should be either positive or zero.
Delta1 and Delta2: the clock network delays to the launch and capture flops.
Skew = Delta1 - Delta2. A circuit that violates the timing constraints has a negative slack.
Hold Timing Analysis (Single Clock)
In a mux-based flip-flop, the setup time is the delay of the first mux and the hold time is the delay of the second mux. New data must not arrive while the flip-flop is still passing the previous data to its output; this time is the hold time.
The RHS is the data required time and the LHS is the data arrival time. In hold timing analysis, slack = data arrival time - data required time, so the slack should be positive or zero, never negative.
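The setup and hold checks above can be written as small functions of Delta1, Delta2, the data path delay (clock-to-Q plus combinational logic) and the flop's setup/hold time. A minimal sketch of the single-clock case with optional uncertainty terms; all numbers are hypothetical.

```python
def setup_slack(period, data_delay, launch_clk, capture_clk, setup_time,
                setup_uncertainty=0.0):
    """Required - arrival, where arrival = launch clock delay (Delta1) + data
    path delay (clk-to-q + logic) and required = period + capture clock delay
    (Delta2) - setup time - setup uncertainty."""
    arrival = launch_clk + data_delay
    required = period + capture_clk - setup_time - setup_uncertainty
    return required - arrival


def hold_slack(data_delay, launch_clk, capture_clk, hold_time,
               hold_uncertainty=0.0):
    """Arrival - required, where required = Delta2 + hold time + uncertainty."""
    arrival = launch_clk + data_delay
    required = capture_clk + hold_time + hold_uncertainty
    return arrival - required


def clock_skew(launch_clk, capture_clk):
    """Skew = Delta1 - Delta2."""
    return launch_clk - capture_clk


# Hypothetical numbers in ns: Delta1 = 1.0, Delta2 = 1.5, period = 10.
print(round(setup_slack(10.0, 6.0, 1.0, 1.5, 0.5, 0.2), 3))  # 3.8
print(round(hold_slack(6.0, 1.0, 1.5, 0.3, 0.1), 3))  # 5.1
print(round(hold_slack(0.5, 1.0, 1.5, 0.3, 0.1), 3))  # -0.4 (hold violation)
print(clock_skew(1.0, 1.5))  # -0.5
```

A short data path helps setup but can break hold, which is why the same path is checked against both conditions.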
Timing Analysis after CTS:
Invoking OpenROAD and reading the LEF file:
Reading the DEF file:
Write and read the db: a db file is created in OpenLANE.
Read Liberty
Read SDC file
Set propagated clock
Slack MET:
Hold Slack=1.8922
Removing a buf_1 and replacing with a bigger buffer buf_2
Replacing Cts_def with placement
Inside OpenROAD
Setup Slack = 4.7821, Hold Slack = 1.922
Clock Skew for Hold=0.39 Clock Skew for Setup=0.39
To kill a process: kill -9
Routing:
Routing Algorithm finds the best possible connection between 2 end points.
Maze Routing (Lee's Algorithm): the algorithm creates a routing grid in the background. Starting from the source, it labels the adjacent grid cells (vertically and horizontally) 1, the cells adjacent to those 2, and so on until the target is reached. There are then many ways to reach the target, and routes with the fewest bends (that is, L-shaped routes) are preferred.
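A minimal sketch of Lee's wave expansion on a small grid, assuming grid cells marked 1 are blocked. The backtrace prefers to keep its current direction, a simple heuristic for fewer bends that does not guarantee the minimum-bend route; the grids are hypothetical.

```python
from collections import deque

DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def lee_route(grid, src, dst):
    """grid[r][c] == 1 marks a blocked cell. Returns the list of cells from
    src to dst along a shortest route, or None if dst is unreachable."""
    rows, cols = len(grid), len(grid[0])
    label = [[None] * cols for _ in range(rows)]
    label[src[0]][src[1]] = 0
    q = deque([tuple(src)])
    # Wave expansion: label neighbours 1, 2, 3, ... until the target is reached.
    while q and label[dst[0]][dst[1]] is None:
        r, c = q.popleft()
        for dr, dc in DIRS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0
                    and label[nr][nc] is None):
                label[nr][nc] = label[r][c] + 1
                q.append((nr, nc))
    if label[dst[0]][dst[1]] is None:
        return None
    # Backtrace: step to a neighbour labelled one less, trying the current
    # direction first so that the route bends as little as possible.
    path, (r, c), prev = [tuple(dst)], dst, None
    while (r, c) != tuple(src):
        want = label[r][c] - 1
        options = [prev] + DIRS if prev else DIRS
        for dr, dc in options:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and label[nr][nc] == want:
                prev, (r, c) = (dr, dc), (nr, nc)
                path.append((r, c))
                break
    path.reverse()
    return path


def count_bends(path):
    dirs = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(path, path[1:])]
    return sum(1 for d0, d1 in zip(dirs, dirs[1:]) if d0 != d1)


open_grid = [[0] * 5 for _ in range(5)]
route = lee_route(open_grid, (0, 0), (2, 3))
print(route, count_bends(route))  # an L-shaped route with 1 bend
```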
Design rules checked while routing: 1. Minimum width of the wire, which depends on the optical wavelength used in lithography, and minimum pitch of the wires.
A signal short can occur while routing if 2 signals travel across the same wire. To avoid it, one of the wires is moved to another metal layer, and the bottom metal layer is connected to the top one with the help of vias.
Parasitic Extraction: Resistances and Capacitances of wires are extracted.
Generate Power Distribution Network: gen_pdn
Height of the standard cell = 2.72 um = pitch of the power rails
Power goes from the pads to the ring, then to the straps, and then to the standard-cell rails.
run_routing
Routing is divided into:
Global Route (FastRoute): divides the routing region into rectangular grid cells.
Detailed Route (TritonRoute): realizes the wire segments and vias in accordance with the global route. TritonRoute:
- performs the initial detailed route
- honours the pre-processed route guides
- assumes that the route guide for each net satisfies inter-guide connectivity
- works on MILP (Mixed-Integer Linear Programming) based panel routing, with an intra-layer parallel and inter-layer sequential routing framework.
Pre-processed Route Guides:
- Should have unit width
- Should be in the preferred direction
Adjacent metal layers have orthogonal preferred directions to reduce the parallel capacitance that develops between them.
Inter-guide connectivity:
2 guides are connected if:
- they are on the same metal layer with touching edges, or
- they are on neighbouring metal layers with a nonzero vertically overlapped area.
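The two connectivity conditions can be expressed as rectangle tests. A minimal sketch, assuming each route guide is an axis-aligned rectangle tagged with an integer metal layer index (a hypothetical encoding, not TritonRoute's data model); on the same layer, overlapping guides are also treated as touching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guide:
    layer: int   # metal layer index, e.g. 1 for met1 (hypothetical encoding)
    x1: float
    y1: float
    x2: float
    y2: float


def _overlap(a, b):
    """Width and height of the intersection of two rectangles (may be negative)."""
    return (min(a.x2, b.x2) - max(a.x1, b.x1),
            min(a.y2, b.y2) - max(a.y1, b.y1))


def guides_connected(a, b):
    w, h = _overlap(a, b)
    if a.layer == b.layer:
        # touching edges: they meet along a segment, not just at a corner point
        return w >= 0 and h >= 0 and (w > 0 or h > 0)
    if abs(a.layer - b.layer) == 1:
        # neighbouring layers: need a nonzero vertically overlapped area
        return w > 0 and h > 0
    return False


g = Guide(1, 0, 0, 2, 1)
print(guides_connected(g, Guide(1, 2, 0, 4, 1)))  # True: shared edge, same layer
print(guides_connected(g, Guide(2, 2, 0, 3, 5)))  # False: adjacent layer, zero overlap area
```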
Detailed Routing:
Inputs: DEF, LEF, processed route guides
Output: detailed routing solution with optimized wirelength and via count
Constraints: route guide honouring, connectivity constraints and design rules
Handling Connectivity:
Access Point: an on-grid point on the metal layer of the route guide, used to connect to lower-layer segments, upper-layer segments, pins and IO ports.
Access Point Cluster:
It is the union of all access points derived from a lower-layer segment, an upper-layer guide, a pin or an IO port.
Algorithm for Routing Topology:
Find the minimal and the most optimal point between two APCs.
- Kunal Ghosh
- Nickson Jose
- Mili and Mansi