Summary
The new C++ TOG trace path (the default sim path: build_skeleton -> lower_to_emitc -> trace.so -> togsim_runtime) does not model indirect (gather/scatter) DMA access. The indirect-addressing logic that exists in the C++ engine lives only on the legacy ONNX/TileGraphParser path, which is being retired, and was never wired to the new trace runtime. So gathers are currently modeled as contiguous DMAs, giving wrong DRAM timing.
Details
The togsim.dma op / togsim_dma ABI has no indirect / offset-buffer field (the togsim_ops.py ATTR_* list has none), and build_skeleton has no indirect handling at all. So a gather lowers to a normal togsim_dma with a single base offset; the per-position scattered addresses (computed from the offset spad) are not represented.
The C++ engine does have indirect modeling, in TOGSim/src/Instruction.cc (_is_indirect_mode, load_indirect_index, _indirect_index_path) and CoreTraceLog.cc, but that code is reached only via TileGraphParser (the ONNX TOG), i.e. the legacy path dropped in "Make the C++ trace the sole main TOG path; drop legacy ONNX TOG". togsim_runtime.cc (the new trace runtime) does not reference indirect. So the connection from the trace path to the indirect model is missing.
What needs to happen (when picked up)
- Extend the togsim.dma ABI with an indirect descriptor (offset-buffer reference + an indirect flag).
- build_skeleton: emit it from the memref.dma_start indirect_offset symbol attribute (see #282, "Indirect access: make offset an explicit togsim.transfer operand (blocked by memref.dma_start; needs direct togsim.transfer -> gemmini lowering)").
- lower_to_emitc: pass the indirect descriptor into the togsim_dma(...) call.
- togsim_runtime (C++): model the scattered per-position access for the trace (port / reuse the Instruction.cc indirect logic onto the new trace runtime).
Relationship to other work
- The memref.dma_start {indirect_offset = @spad} symbol attribute is exactly the input this modeling would consume. The Spike functional path already reads it (CONFIG4) and produces correct values; only the trace timing model is missing the indirect semantics.
- The dependency edge (gather DMA must wait for the offset build) is a smaller, separable fix in build_skeleton (add the offset spad to the gather DMA's read_bufs) and is being handled separately. It is correct regardless of whether the scattered addressing is modeled.
- Validation needs a TOGSim build with --trace_so support.
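
To make the modeling gap concrete, the following Python sketch contrasts the DRAM addresses touched by a contiguous DMA with those of a gather driven by an offset buffer. It is an illustration under stated assumptions, not TOGSim code: DmaDesc, its fields, addresses, and bursts_touched are hypothetical names, the 64-byte burst size is an assumed value, and offsets are assumed to be element indices relative to the base address.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class DmaDesc:
    """Hypothetical DMA descriptor; field names are illustrative only."""
    base: int                                 # base DRAM byte address
    elem_bytes: int                           # bytes moved per position
    count: int                                # number of positions
    indirect: bool = False                    # proposed indirect flag
    offsets: Optional[Sequence[int]] = None   # proposed offset-buffer contents (element indices)


def addresses(desc: DmaDesc) -> List[int]:
    """Expand a descriptor into per-position DRAM byte addresses."""
    if desc.indirect:
        if desc.offsets is None or len(desc.offsets) != desc.count:
            raise ValueError("indirect DMA needs one offset per position")
        # Gather: each position reads from base + offset[i] * elem_bytes.
        return [desc.base + off * desc.elem_bytes for off in desc.offsets]
    # Contiguous: positions are laid out back to back from the base address.
    return [desc.base + i * desc.elem_bytes for i in range(desc.count)]


def bursts_touched(addrs: Sequence[int], burst_bytes: int = 64) -> int:
    """Count distinct burst-aligned blocks, a crude stand-in for DRAM traffic."""
    return len({a // burst_bytes for a in addrs})


# Same base, element size, and position count; only the addressing mode differs.
contiguous = DmaDesc(base=0x1000, elem_bytes=4, count=16)
gather = DmaDesc(base=0x1000, elem_bytes=4, count=16, indirect=True,
                 offsets=[i * 64 for i in range(16)])

print(bursts_touched(addresses(contiguous)))  # 1
print(bursts_touched(addresses(gather)))      # 16
```

In this toy case, a runtime that treats the gather as contiguous accounts for one burst where the scattered access touches sixteen, which is the kind of DRAM timing error this issue describes.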