The Area of SRAM generated by OpenRAM using FreePDK-45nm is not consistent with CACTI 7.0 #184

Closed
Ruiggg opened this issue Mar 21, 2023 · 2 comments

Comments


Ruiggg commented Mar 21, 2023

Describe the bug
I generated an SRAM with 16 rows of 512 bits using FreePDK 45nm, and the reported area is 0.35 mm^2.
However, the same-size SRAM modeled in CACTI has an area of only 0.028 mm^2, more than 10X smaller.
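
The size of the gap can be computed from the two reported figures. The following is a minimal sketch, assuming the OpenRAM datasheet area (348716 µm2) and the CACTI data array area (0.0283078 mm2) from the logs in this issue; the constant and function names are illustrative only.

```python
# Areas as reported in the logs of this issue.
OPENRAM_AREA_UM2 = 348716      # OpenRAM datasheet, "Area (µm2)"
CACTI_AREA_MM2 = 0.0283078     # CACTI report, "Data array: Area (mm2)"

UM2_PER_MM2 = 1_000_000        # 1 mm^2 = 10^6 µm^2


def um2_to_mm2(area_um2: float) -> float:
    """Convert an area from square micrometers to square millimeters."""
    return area_um2 / UM2_PER_MM2


openram_area_mm2 = um2_to_mm2(OPENRAM_AREA_UM2)
ratio = openram_area_mm2 / CACTI_AREA_MM2

print(f"OpenRAM: {openram_area_mm2:.3f} mm^2")
print(f"CACTI:   {CACTI_AREA_MM2:.3f} mm^2")
print(f"Ratio:   {ratio:.1f}x")
```

With the unrounded figures the ratio comes out at roughly 12.3x, consistent with the "more than 10X" description.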

Version
OpenRAM is from the Docker image at https://hub.docker.com/r/vlsida/openram-ubuntu/tags
The CACTI version is 7.0

To Reproduce
To use OpenRAM:

# Data word size
word_size = 512
# Number of words in the memory
num_words = 16

# Technology to use in $OPENRAM_TECH
tech_name = "freepdk45"

# You can use the technology nominal corner only
nominal_corner_only = True
# Or you can specify particular corners
# Process corners to characterize
# process_corners = ["SS", "TT", "FF"]
# Voltage corners to characterize
# supply_voltages = [ 3.0, 3.3, 3.5 ]
# Temperature corners to characterize
# temperatures = [ 0, 25, 100 ]

# Output directory for the results
output_path = "temp"
# Output file base name
output_name = "sram_{0}_{1}_{2}".format(word_size,num_words,tech_name)

# Disable analytical models for full characterization (WARNING: slow!)
# analytical_delay = False

To use CACTI:

-size (bytes) 1024
# this value is defined in systolic.cfg
# Cache size
# -size (bytes) 131072
# -size (bytes) 8192

# this value should be calculated according to the SRAM trace and the block size
# Multiple banks connected using a bus
-UCA bank count 1

# block line size
-block size (bytes) 64

# Bus width include data bits and address bits required by the decoder
-output/input bus width 512

# power gating
-Array Power Gating - "false"
-WL Power Gating - "false"
-CL Power Gating - "false"
-Bitline floating - "false"
-Interconnect Power Gating - "false"
-Power Gating Performance Loss 0.01

# technology node in microns; 0.045 corresponds to 45nm
# technode
-technology (u) 0.045

# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # 
# parameters after this point can be ignored for simulating SRAM here
# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # 

-read-write port 1
-exclusive read port 0
-exclusive write port 0
-single ended read ports 0

-Add ECC - "false"

# here set to direct map as it's only scratchpad memory
# To model Fully Associative cache, set associativity to zero
-associativity 1

# following parameter can have one of five values -- (itrs-hp, itrs-lstp, itrs-lop, lp-dram, comm-dram)
-Data array cell type - "itrs-lop"

# following parameter can have one of three values -- (itrs-hp, itrs-lstp, itrs-lop)
-Data array peripheral type - "itrs-lop"

# following parameter can have one of five values -- (itrs-hp, itrs-lstp, itrs-lop, lp-dram, comm-dram)
-Tag array cell type - "itrs-lop"

# following parameter can have one of three values -- (itrs-hp, itrs-lstp, itrs-lop)
-Tag array peripheral type - "itrs-lop"

# 300-400 in steps of 10
-operating temperature (K) 360

# Type of memory - cache (with a tag array) or ram (scratch ram similar to a register file) 
# or main memory (no tag array and every access will happen at a page granularity Ref: CACTI 5.3 report)
-cache type "ram"

# to model special structure like branch target buffers, directory, etc. 
# change the tag size parameter
# if you want cacti to calculate the tagbits, set the tag size to "default"
-tag size (b) "default"

# fast - data and tag access happen in parallel
# sequential - data array is accessed after accessing the tag array
# normal - data array lookup and tag access happen in parallel
#          final data block is broadcasted in data array h-tree 
#          after getting the signal from the tag array
-access mode (normal, sequential, fast) - "normal"

# following values for design objective and deviate are obtained from CACTI-6 Technical report
# DESIGN OBJECTIVE for UCA (or banks in NUCA)
# Percentage deviation from the minimum value 
# Ex: A deviation value of 10:1000:1000:1000:1000 will try to find an organization
# that compromises at most 10% delay. 
# NOTE: Try reasonable values for % deviation. Inconsistent deviation
# percentage values will not produce any valid organizations. For example,
# 0:0:100:100:100 will try to identify an organization that has both
# least delay and dynamic power. Since such an organization is not possible, CACTI will
# throw an error. Refer CACTI-6 Technical report for more details
-design objective (weight delay, dynamic power, leakage power, cycle time, area) 100:20:20:10:10
-deviate (delay, dynamic power, leakage power, cycle time, area) 10:1000:1000:1000:1000

# Objective for NUCA
-NUCAdesign objective (weight delay, dynamic power, leakage power, cycle time, area) 100:20:20:10:10
-NUCAdeviate (delay, dynamic power, leakage power, cycle time, area) 10:1000:1000:1000:1000

# Set optimize tag to ED or ED^2 to obtain a cache configuration optimized for
# energy-delay or energy-delay sq. product
# Note: Optimize tag will disable weight or deviate values mentioned above
# Set it to NONE to let weight and deviate values determine the 
# appropriate cache configuration
-Optimize ED or ED^2 (ED, ED^2, NONE): "ED^2"

-Cache model (NUCA, UCA)  - "UCA"

# not used
# In order for CACTI to find the optimal NUCA bank value the following
# variable should be assigned 0.
-NUCA bank count 0

# NOTE: for nuca network frequency is set to a default value of 
# 5GHz in time.c. CACTI automatically
# calculates the maximum possible frequency and downgrades this value if necessary

# By default CACTI considers both full-swing and low-swing 
# wires to find an optimal configuration. However, it is possible to 
# restrict the search space by changing the signaling from "default" to 
# "fullswing" or "lowswing" type.
-Wire signaling (fullswing, lowswing, default) - "default"

# "global" or "semi-global"
-Wire inside mat - "semi-global"
-Wire outside mat - "semi-global"

# "conservative" or "aggressive"
-Interconnect projection - "conservative"

# not used
# Contention in network (which is a function of core count and cache level) is one of
# the critical factor used for deciding the optimal bank count value
# core count can be 4, 8, or 16
-Core count 8
-Cache level (L2/L3) - "L3"

-Print level (DETAILED, CONCISE) - "DETAILED"

# for debugging
-Print input parameters - "true"
# force CACTI to model the cache with the 
# following Ndbl, Ndwl, Nspd, Ndsam,
# and Ndcm values
-Force cache config - "false"
-Ndwl 1
-Ndbl 1
-Nspd 0
-Ndcm 1
-Ndsam1 0
-Ndsam2 0

# not used
# following three parameters are meaningful only for main memories
-page size (bits) 8192 
-burst length 8
-internal prefetch width 8

#### Default CONFIGURATION values for baseline external IO parameters to DRAM. More details can be found in the CACTI-IO technical report (), especially Chapters 2 and 3.

# Memory Type (D3=DDR3, D4=DDR4, L=LPDDR2, W=WideIO, S=Serial). Additional memory types can be defined by the user in extio_technology.cc, along with their technology and configuration parameters.

-dram_type "DDR3"
//-dram_type "DDR4"
//-dram_type "LPDDR2"
//-dram_type "WideIO"
//-dram_type "Serial"

# Memory State (R=Read, W=Write, I=Idle  or S=Sleep) 

//-io state "READ"
-io state "WRITE"
//-io state "IDLE"
//-io state "SLEEP"

#Address bus timing. To alleviate the timing on the command and address bus due to high loading (shared across all memories on the channel), the interface allows for multi-cycle timing options. 

//-addr_timing 0.5 //DDR
-addr_timing 1.0 //SDR (half of DQ rate)
//-addr_timing 2.0 //2T timing (One fourth of DQ rate)
//-addr_timing 3.0 // 3T timing (One sixth of DQ rate)

# Memory Density (Gbit per memory/DRAM die)

-mem_density 4 Gb //Valid values 2^n Gb

# IO frequency (MHz) (frequency of the external memory interface).

-bus_freq 800 MHz //As of current memory standards (2013), valid range 0 to 1.5 GHz for DDR3, 0 to 533 MHz for LPDDR2, 0 - 800 MHz for WideIO and 0 - 3 GHz for Low-swing differential. However this can change, and the user is free to define valid ranges based on new memory types or extending beyond existing standards for existing dram types.

# Duty Cycle (fraction of time in the Memory State defined above)

-duty_cycle 1.0 //Valid range 0 to 1.0

# Activity factor for Data (0->1 transitions) per cycle (for DDR, need to account for the higher activity in this parameter. E.g. max. activity factor for DDR is 1.0, for SDR is 0.5)
 
-activity_dq 1.0 //Valid range 0 to 1.0 for DDR, 0 to 0.5 for SDR

# Activity factor for Control/Address (0->1 transitions) per cycle (for DDR, need to account for the higher activity in this parameter. E.g. max. activity factor for DDR is 1.0, for SDR is 0.5)

-activity_ca 0.5 //Valid range 0 to 1.0 for DDR, 0 to 0.5 for SDR, 0 to 0.25 for 2T, and 0 to 0.17 for 3T

# Number of DQ pins 

-num_dq 72 //Number of DQ pins. Includes ECC pins.

# Number of DQS pins. DQS is a data strobe that is sent along with a small number of data-lanes so the source synchronous timing is local to these DQ bits. Typically, 1 DQS per byte (8 DQ bits) is used. The DQS is also typically differential, just like the CLK pin. 

-num_dqs 18 //2 x differential pairs. Include ECC pins as well. Valid range 0 to 18. For x4 memories, could have 36 DQS pins.

# Number of CA pins 

-num_ca 25 //Valid range 0 to 35 pins.

# Number of CLK pins. CLK is typically a differential pair. In some cases additional CLK pairs may be used to limit the loading on the CLK pin. 

-num_clk  2 //2 x differential pair. Valid values: 0/2/4.

# Number of Physical Ranks

-num_mem_dq 2 //Number of ranks (loads on DQ and DQS) per buffer/register. If multiple LRDIMMs or buffer chips exist, the analysis for capacity and power is reported per buffer/register. 

# Width of the Memory Data Bus

-mem_data_width 8 //x4 or x8 or x16 or x32 memories. For WideIO upto x128.

# RTT Termination Resistance

-rtt_value 10000

# RON Termination Resistance

-ron_value 34

# Time of flight for DQ

-tflight_value

# Parameter related to MemCAD

# Number of BoBs: 1,2,3,4,5,6,
-num_bobs 1
	
# Memory System Capacity in GB
-capacity 80	
	
# Number of Channel per BoB: 1,2. 
-num_channels_per_bob 1	

# First Metric for ordering different design points	
-first metric "Cost"
#-first metric "Bandwidth"
#-first metric "Energy"
	
# Second Metric for ordering different design points	
#-second metric "Cost"
-second metric "Bandwidth"
#-second metric "Energy"

# Third Metric for ordering different design points	
#-third metric "Cost"
#-third metric "Bandwidth"
-third metric "Energy"
	
	
# Possible DIMM option to consider
#-DIMM model "JUST_UDIMM"
#-DIMM model "JUST_RDIMM"
#-DIMM model "JUST_LRDIMM"
-DIMM model "ALL"

#if channels of each bob have the same configurations
#-mirror_in_bob "T"
-mirror_in_bob "F"

#if we want to see all channels/bobs/memory configurations explored	
#-verbose "T"
#-verbose "F"
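
Before comparing areas, it helps to confirm that the two configurations describe the same array. The following is a minimal sketch using only the values from the two configurations above; the CACTI variable names are illustrative and are not CACTI identifiers.

```python
# OpenRAM settings from the configuration above.
word_size = 512   # bits per word
num_words = 16    # number of words

# CACTI settings from the configuration above (names are illustrative).
cacti_size_bytes = 1024        # -size (bytes)
cacti_block_size_bytes = 64    # -block size (bytes)
cacti_bus_width_bits = 512     # -output/input bus width

BITS_PER_BYTE = 8

openram_bits = word_size * num_words
cacti_bits = cacti_size_bytes * BITS_PER_BYTE

# Both tools should describe the same 8192-bit array.
assert openram_bits == cacti_bits == 8192
# One CACTI block corresponds to one 512-bit OpenRAM word.
assert cacti_block_size_bytes * BITS_PER_BYTE == word_size
assert cacti_bus_width_bits == word_size

print(f"{openram_bits} bits = {openram_bits // BITS_PER_BYTE} bytes")
```

All three checks pass, so the capacity and word width match; the check says nothing about whether the two tools assume comparable cells, periphery, or design rules.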

Expected behavior
We expect the two areas to be close because both tools target a 45nm technology.
However, the OpenRAM area is more than 10X the CACTI area.

Logs
The Output of OpenRAM:

sram_512_16_freepdk45.html

Compiled at: 2023-02-03

DRC errors: skipped

LVS errors: skipped

Git commit id: 5ad1db9

Ports and Configuration

| Type | Value |
|---|---|
| WORD_SIZE | 512 |
| NUM_WORDS | 16 |
| NUM_BANKS | 1 |
| NUM_RW_PORTS | 1 |
| NUM_R_PORTS | 0 |
| NUM_W_PORTS | 0 |
| Area (µm2) | 348716 |

Operating Conditions

| Parameter | Min | Typ | Max | Units |
|---|---|---|---|---|
| Power supply (VDD) range | 1.0 | 1.0 | 1.0 | Volts |
| Operating Temperature | 25 | 25 | 25 | Celsius |
| Operating Frequency (F) | | | 464 | MHz |

Timing Data

Using analytical model: results may not be precise

| Parameter | Min | Max | Units |
|---|---|---|---|
| din0[511:0] setup rising | 0.009 | 0.009 | ns |
| din0[511:0] setup falling | 0.009 | 0.009 | ns |
| din0[511:0] hold rising | 0.001 | 0.001 | ns |
| din0[511:0] hold falling | 0.001 | 0.001 | ns |
| dout0[511:0] cell rise | 0.581 | 0.582 | ns |
| dout0[511:0] cell fall | 0.581 | 0.582 | ns |
| dout0[511:0] rise transition | 0.001 | 0.001 | ns |
| dout0[511:0] fall transition | 0.001 | 0.001 | ns |
| csb0 setup rising | 0.009 | 0.009 | ns |
| csb0 setup falling | 0.009 | 0.009 | ns |
| csb0 hold rising | 0.001 | 0.001 | ns |
| csb0 hold falling | 0.001 | 0.001 | ns |
| addr0[3:0] setup rising | 0.009 | 0.009 | ns |
| addr0[3:0] setup falling | 0.009 | 0.009 | ns |
| addr0[3:0] hold rising | 0.001 | 0.001 | ns |
| addr0[3:0] hold falling | 0.001 | 0.001 | ns |
| web0 setup rising | 0.009 | 0.009 | ns |
| web0 setup falling | 0.009 | 0.009 | ns |
| web0 hold rising | 0.001 | 0.001 | ns |
| web0 hold falling | 0.001 | 0.001 | ns |

Power Data

| Pins | Mode | Power | Units |
|---|---|---|---|
| !csb0 & clk0 & !web0 | Read Rising | 1.1399 | mW |
| !csb0 & clk0 & !web0 | Read Falling | 1.1399 | mW |
| !csb0 & !clk0 & web0 | Write Rising | 1.1399 | mW |
| !csb0 & !clk0 & web0 | Write Falling | 1.1399 | mW |
| csb0 | leakage | 0.00888 | mW |

Characterization Corners

| Transistor Type | Power Supply | Temperature | Corner Name |
|---|---|---|---|
| TT | 1.0 | 25 | _TT_1p0V_25C.lib |

Deliverables

| Type | Description | Link |
|---|---|---|
| .gds | GDSII layout views | sram_512_16_freepdk45.gds |
| .html | This datasheet | sram_512_16_freepdk45.html |
| .lef | LEF files | sram_512_16_freepdk45.lef |
| .lib | Synthesis models | sram_512_16_freepdk45_TT_1p0V_25C.lib |
| .log | OpenRAM compile log | sram_512_16_freepdk45.log |
| .py | OpenRAM configuration file | sram_512_16_freepdk45.py |
| .sp | SPICE netlists | sram_512_16_freepdk45.sp |
| .v | Verilog simulation models | sram_512_16_freepdk45.v |



The Output of CACTI:

line_sz: 64
Cache size : 1024
Block size : 64
Associativity : 1
Read only ports : 0
Write only ports : 0
Read write ports : 1
Single ended read ports : 0
Cache banks (UCA) : 1
Technology : 0.045
Temperature : 360
Tag size : 42
array type : Scratch RAM
Model as memory : 0
Model as 3D memory : 0
Access mode : 0
Data array cell type : 2
Data array peripheral type : 2
Tag array cell type : 2
Tag array peripheral type : 2
Optimization target : 2
Design objective (UCA wt) : 100 20 20 10 10
Design objective (UCA dev) : 10 1000 1000 1000 1000
Cache model : 0
Nuca bank : 0
Wire inside mat : 1
Wire outside mat : 1
Interconnect projection : 1
Wire signaling : 0
Print level : 1
ECC overhead : 0
Page size : 8192
Burst length : 8
Internal prefetch width : 8
Force cache config : 0
Subarray Driver direction : 1
iostate : WRITE
dram_ecc : NO_ECC
io_type : DDR3
dram_dimm : UDIMM
IO Area (sq.mm) = inf
IO Timing Margin (ps) = -14.1667
IO Votlage Margin (V) = 0.155
IO Dynamic Power (mW) = 1506.36
PHY Power (mW) = 232.752
PHY Wakeup Time (us) = 27.503
IO Termination and Bias Power (mW) = 2505.96

---------- CACTI (version 7.0.3DD Prerelease of Aug, 2012), Uniform Cache Access SRAM Model ----------

Cache Parameters:
Total cache size (bytes): 1024
Number of banks: 1
Associativity: direct mapped
Block size (bytes): 64
Read/write Ports: 1
Read ports: 0
Write ports: 0
Technology size (nm): 45

Access time (ns): 0.328414
Cycle time (ns):  0.421685
Total dynamic read energy per access (nJ): 0.0121486
Total dynamic write energy per access (nJ): 0.0116077
Total leakage power of a bank (mW): 0.022123
Total gate leakage power of a bank (mW): 0.131501
Cache height x width (mm): 0.116263 x 0.24348

Best Ndwl : 2
Best Ndbl : 2
Best Nspd : 1
Best Ndcm : 1
Best Ndsam L1 : 1
Best Ndsam L2 : 2

Data array, H-tree wire type: Global wires with 30% delay penalty

Time Components:

Data side (with Output driver) (ns): 0.328414
H-tree input delay (ns): 0
Decoder + wordline delay (ns): 0.219156
Bitline delay (ns): 0.0221284
Sense Amplifier delay (ns): 0.0045617
H-tree output delay (ns): 0.0825678

Power Components:

Data array: Total dynamic read energy/access (nJ): 0.0121486
Total energy in H-tree (that includes both address and data transfer) (nJ): 0
Output Htree inside bank Energy (nJ): 0
Decoder (nJ): 4.86508e-05
Wordline (nJ): 0
Bitline mux & associated drivers (nJ): 0
Sense amp mux & associated drivers (nJ): 0.000144445
Bitlines precharge and equalization circuit (nJ): 0.00109084
Bitlines (nJ): 0.000486859
Sense amplifier energy (nJ): 0.000962157
Sub-array output driver (nJ): 0.00941569
Total leakage power of a bank (mW): 0.022123
Total leakage power in H-tree (that includes both address and data network) ((mW)): 0
Total leakage power in cells (mW): 0
Total leakage power in row logic(mW): 0
Total leakage power in column logic(mW): 0
Total gate leakage power in H-tree (that includes both address and data network) ((mW)): 0

Area Components:

Data array: Area (mm2): 0.0283078
Height (mm): 0.116263
Width (mm): 0.24348
Area efficiency (Memory cell area/Total area) - 8.55583 %
MAT Height (mm): 0.116263
MAT Length (mm): 0.24348
Subarray Height (mm): 0.005256
Subarray Length (mm): 0.1206
[MARK] GHR 1

Wire Properties:

Delay Optimal
Repeater size - 61.34
Repeater spacing - 0.15395 (mm)
Delay - 0.240487 (ns/mm)
PowerD - 0.000265471 (nJ/mm)
PowerL - 0.000392114 (mW/mm)
PowerLgate - 0.00303194 (mW/mm)
Wire width - 0.045 microns
Wire spacing - 0.045 microns

5% Overhead
Repeater size - 32.34
Repeater spacing - 0.15395 (mm)
Delay - 0.251597 (ns/mm)
PowerD - 0.000185205 (nJ/mm)
PowerL - 0.000206732 (mW/mm)
PowerLgate - 0.00159851 (mW/mm)
Wire width - 0.045 microns
Wire spacing - 0.045 microns

10% Overhead
Repeater size - 33.34
Repeater spacing - 0.25395 (mm)
Delay - 0.264482 (ns/mm)
PowerD - 0.000168422 (nJ/mm)
PowerL - 0.000129201 (mW/mm)
PowerLgate - 0.000999019 (mW/mm)
Wire width - 0.045 microns
Wire spacing - 0.045 microns

20% Overhead
Repeater size - 25.34
Repeater spacing - 0.25395 (mm)
Delay - 0.284786 (ns/mm)
PowerD - 0.000152588 (nJ/mm)
PowerL - 9.8199e-05 (mW/mm)
PowerLgate - 0.000759302 (mW/mm)
Wire width - 0.045 microns
Wire spacing - 0.045 microns

30% Overhead
Repeater size - 20.34
Repeater spacing - 0.25395 (mm)
Delay - 0.30947 (ns/mm)
PowerD - 0.000143076 (nJ/mm)
PowerL - 7.88227e-05 (mW/mm)
PowerLgate - 0.000609479 (mW/mm)
Wire width - 0.045 microns
Wire spacing - 0.045 microns

Low-swing wire (1 mm) - Note: Unlike repeated wires,
delay and power values of low-swing wires do not
have a linear relationship with length.
delay - 0.304043 (ns)
powerD - 4.32079e-06 (nJ)
PowerL - 9.84723e-09 (mW)
PowerLgate - 1.38271e-07 (mW)
Wire width - 9e-08 microns
Wire spacing - 9e-08 microns

top 3 best memory configurations are:
Memory cap: 80 GB num_bobs: 1 bw: 533 (MHz) cost: $731.2 energy: 32.6101 (nJ)
{
(0) BoB cap: 80 GB num_channels: 1 bw: 533 (MHz) cost: $731.2 energy: 32.6101 (nJ)
==============
(0) cap: 80 GB bw: 533 (MHz) cost: $731.2 dpc: 3 energy: 32.6101 (nJ) DIMM: RDIMM low power: F [ 0(4GB) 0(8GB) 1(16GB) 2(32GB) 0(64GB) ]
==============

}

=============================================
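
When repeating this comparison across several configurations, the area figures can be pulled out of the CACTI report text rather than copied by hand. The following is a minimal sketch, assuming the report lines keep the exact wording shown above; `find_floats` is a hypothetical helper and not part of CACTI.

```python
import re

# Excerpt of the CACTI report quoted above.
cacti_report = """\
Cache height x width (mm): 0.116263 x 0.24348
Data array: Area (mm2): 0.0283078
"""

# Matches plain decimals and scientific notation such as 4.86508e-05.
NUMBER = r"([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)"


def find_floats(pattern: str, text: str) -> tuple[float, ...]:
    """Return the numeric groups of the first match of `pattern` in `text`."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return tuple(float(group) for group in match.groups())


height_mm, width_mm = find_floats(
    rf"Cache height x width \(mm\): {NUMBER} x {NUMBER}", cacti_report
)
(area_mm2,) = find_floats(rf"Data array: Area \(mm2\): {NUMBER}", cacti_report)

# The reported area should equal height * width up to rounding in the report.
assert abs(height_mm * width_mm - area_mm2) < 1e-6

print(f"{height_mm} mm x {width_mm} mm = {area_mm2} mm^2")
```

The cross-check confirms that the 0.028 mm^2 figure is the full height x width footprint reported by CACTI, not just the cell area.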

Collaborator

mguthaus commented Mar 21, 2023

CACTI isn't a real memory. It is an estimate based on models so it is likely quite wrong. While OpenRAM won't be the most efficient compared to a hand tuned memory array, it is an actual design.
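
A per-bit view makes the two numbers easier to compare. The following is a minimal sketch based only on the figures reported in this issue (the OpenRAM datasheet area, the CACTI data array area, and the CACTI area efficiency of 8.55583 %); the variable names are illustrative, and the result does not explain the cause of the difference.

```python
BITS = 512 * 16                      # 16 words of 512 bits

OPENRAM_AREA_UM2 = 348716.0          # OpenRAM datasheet, total area
CACTI_AREA_UM2 = 0.0283078 * 1e6     # CACTI data array area, mm^2 -> µm^2
CACTI_AREA_EFFICIENCY = 0.0855583    # CACTI: memory cell area / total area

openram_um2_per_bit = OPENRAM_AREA_UM2 / BITS
cacti_um2_per_bit = CACTI_AREA_UM2 / BITS
# Portion of the CACTI estimate attributed to the memory cells themselves.
cacti_cell_um2_per_bit = cacti_um2_per_bit * CACTI_AREA_EFFICIENCY

print(f"OpenRAM total:    {openram_um2_per_bit:.2f} um^2/bit")
print(f"CACTI total:      {cacti_um2_per_bit:.2f} um^2/bit")
print(f"CACTI cells only: {cacti_cell_um2_per_bit:.3f} um^2/bit")
```

For an array this small, both tools report a total area per bit far above the cell area that CACTI assumes, so the comparison is mostly between two estimates of overhead around the cells rather than of the bitcells themselves.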


hanm2019 commented Jun 3, 2023

@Ruiggg I failed to get the area from CACTI7 because of this error: "ERROR: no valid data array organizations found". Could you help me with CACTI7?
