Distributed macro placement (4096x16 along the bottom, 16x1024 and 1024x16 sharing the top row with a 52 um channel between them), CORE_UTILIZATION 25 -> 45, MACRO_PLACE_HALO 12 -> 20. Detail routing reaches a stable 1780 Lef58EolKeepOut M4 violations (down from 4180 with default RTLMP placement at util 25 / halo 12) but does not clear to zero -- residual violations are signals exiting the macro's M4 side-pins immediately hitting the macro's M4 power straps, same root cause that keeps cnn/bp_uno/bp_quad in the asap7 NOT-CACHED cohort. Timing closes cleanly: TNS/WNS 0, worst slack +3253 ps, fmax 572 MHz (target 200 MHz). Filed under: best DRC count any HighTide asap7 bsg-fakeram-style design has reached, but partial -- design ships as NOT-CACHED with the residual count recorded in the results page.
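As a quick sanity check on the figures above, the reported fmax follows from the clock period and the worst slack, and the DRC improvement can be expressed as a percentage. This is a minimal Python sketch, assuming the usual relation fmax = 1 / (period - worst slack) and the 5000 ps period implied by the 200 MHz target; the function names are illustrative and not part of the flow.

```python
def fmax_mhz(period_ps: float, worst_slack_ps: float) -> float:
    """Highest clock frequency the critical path supports, in MHz."""
    critical_path_ps = period_ps - worst_slack_ps
    return 1e6 / critical_path_ps  # a 1e6 ps period corresponds to 1 MHz


def reduction_pct(before: int, after: int) -> float:
    """Percentage drop in violation count."""
    return 100.0 * (before - after) / before


period_ps = 1e6 / 200  # 200 MHz target -> 5000 ps
print(round(fmax_mhz(period_ps, 3253)))      # 572
print(round(reduction_pct(4180, 1780), 1))   # 57.4
```

Both values agree with the summary: 572 MHz, and a residual count a little under half of the default-placement result.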
Update: 031f73a (2026-05-12)
New commit changes the floorplan strategy. The result is the same class of failure the original analysis described, but the EOL plateau drops from ~4150 to 1780.
Changes
DRT trajectory
DRT spent iterations 0–9 looking worse than the original (smaller die = denser initial state), then broke through and eventually fell below half the original count.
Root cause confirmation
99.94% of the remaining 1780 violations are still
The hand placement buys room: signals that can escape on M5+ now have space to do so, but signals that must exit at a macro M4 pin still hit the macro's own internal power straps regardless of layout. The original analysis's open fix (doubling pin pitch via
Other QoR
Timing closes cleanly:
Results-page coverage
Results row + gallery card landed on the
I had Claude do all this for me. I've done 0 debugging on my own.
Here is the output of DECISIONS.md:
Ternip
Ternip is a custom fixed-point ternary matrix-multiply inference processor written in SystemVerilog. It requires native SV synthesis via yosys-slang (SYNTH_HDL_FRONTEND: slang) and three FakeRAM macros replacing the behavioral ternip_pipelined_mem module.
asap7
Status: not finishing — detail routing does not converge
Last updated: 2026-05-08
Configuration
- SYNTH_HDL_FRONTEND = slang (native SystemVerilog, no sv2v)
- SYNTH_HIERARCHICAL = 0 (hierarchical mode caused CTS ODB-1200 InsertBufferBeforeLoads failure)
- CORE_UTILIZATION = 25, PLACE_DENSITY = 0.55
- MACRO_PLACE_HALO = 12 12
- TNS_END_PERCENT = 100
- Clock: clk_i, 5000 ps (200 MHz)
- CONFIG_FILENAME set via VERILOG_DEFINES; hightide.svh resolved from VERILOG_INCLUDE_DIRS
FakeRAM macros (asap7)
ternip_pipelined_mem is the sole memory primitive, parameterized by DATA_WIDTH and NUM_ENTRIES. Three instances are synthesized; each is replaced by a fakeram7_* macro via ternip_pipelined_mem_fakeram7.v.
- fakeram7_4096x16: instance vector_registers.pipelined_mem; source ternip_vector_registers.sv; FixedPointPrecision × (D × NumVectorRegisters) = 16 × 4096
- fakeram7_1024x16: instance tmatmul/exportvector; source fus/ternip_tmatmul.sv; export vector buffer, DATA_WIDTH = FixedPointPrecision, NUM_ENTRIES = D
- fakeram7_16x1024: instance tmatmul/importvector; source fus/ternip_tmatmul.sv; import vector buffer, DATA_WIDTH = D × FixedPointPrecision / TmatmulParallelism, NUM_ENTRIES = TmatmulParallelism
With D = 1024, FixedPointPrecision = 16, TmatmulParallelism = 64: importvector DATA_WIDTH = 1024 × 16 / 64 = 256... see note below.
LEF/LIB files generated by designs/src/ternip/dev/gen_fakeram.py --platform asap7. Macro geometry targets a 2:1 aspect ratio; pin pitch matches bsg_fakeram's proven asap7 format (M4, 0.144 µm pitch, 0.072 µm protrusion).
Floorplan: macro placement
Die: 514.9 × 514.9 µm at 25% utilization. RTLMP places all three macros automatically:
- fakeram7_4096x16: vector_registers.pipelined_mem
- fakeram7_1024x16: tmatmul/exportvector
- fakeram7_16x1024: tmatmul/importvector
Detail routing: convergence failure
Global routing passes cleanly (0 overflow, 1.79% resource usage). Detail routing does not converge; the router reaches the 50-iteration limit with ~4,150 eolKeepOut violations remaining.
Selected per-iteration violation counts:
The count drops sharply in iterations 0–3 (general routing cleanup), then plateaus at ~4,150 eolKeepOut violations from iteration 4 onward with no further improvement.
Root cause: fakeram7_16x1024 has 2 × 1024 data pins + 4 address pins + 3 control pins = 2055 signal pins at 0.144 µm pitch on a 148.8 µm-tall body. The macro sits in the upper-right corner of the die (x = 502 µm in a 515 µm-wide die) in R180 orientation. The resulting pin clusters at the macro edges create a local routing hot spot that the detail router cannot escape: every attempted reroute around one eolKeepOut violation displaces another.
Global routing sees no overflow because the congestion is localized to the pin-access layer directly adjacent to the macro edge; the global router operates at a coarser granularity and does not model per-pin eolKeepOut constraints.
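The pin arithmetic in the root-cause paragraph can be checked with a short Python sketch. It assumes the signal pins are split evenly across two opposite macro edges; that is an inference from the numbers (2055 pins at 0.144 µm would not fit along a single 148.8 µm edge), not something gen_fakeram.py is confirmed to do. All names are illustrative.

```python
import math

PITCH_NM = 144  # M4 pin pitch (0.144 µm), kept in integer nm to avoid float error


def signal_pin_count(data_width: int, num_entries: int, ctrl_pins: int) -> int:
    """2 x data_width data pins, plus address and control pins, as stated above."""
    addr_bits = (num_entries - 1).bit_length()  # 16 entries -> 4 address bits
    return 2 * data_width + addr_bits + ctrl_pins


def edge_span_um(pins: int, pitch_nm: int, edges: int) -> float:
    """Length of macro edge consumed when pins are spread over `edges` sides."""
    pins_per_edge = math.ceil(pins / edges)
    return pins_per_edge * pitch_nm / 1000


pins = signal_pin_count(data_width=1024, num_entries=16, ctrl_pins=3)
print(pins)                                   # 2055
print(edge_span_um(pins, PITCH_NM, edges=1))  # 295.92, taller than the body
print(edge_span_um(pins, PITCH_NM, edges=2))  # 148.032, close to the 148.8 µm body
```

Under that two-edge assumption the pins occupy essentially the full height of the macro on both sides, which matches the description of dense pin clusters at the macro edges.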
Open fix
Increase pin_track_count from 3 to 6 in gen_fakeram.py for fakeram7_16x1024 (doubling the pin pitch from 0.144 µm to 0.288 µm). This grows the macro height from 148.8 µm to ~296 µm but gives the detail router 2× more routing space between adjacent pins. Requires regenerating the LEF/LIB and rerunning from floorplan.
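A rough Python sketch of how the proposed change scales the macro. It assumes pin pitch = pin_track_count × 0.048 µm (inferred from 3 tracks giving 0.144 µm) and that the pin-limited height is ceil(pins / 2) × pitch for a two-edge pin layout. The actual formula inside gen_fakeram.py may differ, and the helper names here are hypothetical.

```python
TRACK_PITCH_NM = 48   # assumption: 0.144 µm pitch / 3 tracks
SIGNAL_PINS = 2055    # fakeram7_16x1024 signal pin count from the analysis


def pin_pitch_nm(pin_track_count: int) -> int:
    """Pin pitch in nm for a given number of routing tracks per pin."""
    return pin_track_count * TRACK_PITCH_NM


def pin_limited_height_um(pin_track_count: int, pins: int = SIGNAL_PINS,
                          edges: int = 2) -> float:
    """Macro height needed to fit the pins, assuming an even split over `edges`."""
    pins_per_edge = -(-pins // edges)  # ceiling division
    return pins_per_edge * pin_pitch_nm(pin_track_count) / 1000


for count in (3, 6):
    print(count, pin_pitch_nm(count) / 1000, pin_limited_height_um(count))
# 3 0.144 148.032
# 6 0.288 296.064
```

The estimate lands within about 1 µm of the 148.8 µm and ~296 µm figures quoted above, so the height doubling is a direct consequence of the pitch doubling rather than of any change in pin count.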