Questions About Noc #190

morainer · 2023-02-08T08:26:31Z

Hello, I am confused about the usage of the NoC.

How can I connect two non-adjacent components? For example, in my arch.yaml there are three GLBs (ifmap_glb, weight_glb, psum_glb) and one PE_array. According to Timeloop's rules, the network connections of this architecture are: ifmap_glb<==>weight_glb, weight_glb<==>psum_glb, psum_glb<==>PE_array. However, the connections I want are: ifmap_glb<==>PE_array, weight_glb<==>PE_array, psum_glb<==>PE_array, PE_array<==>PE_array. The connections ifmap_glb<==>weight_glb and weight_glb<==>psum_glb should be removed.
Further, suppose my architecture has three GLBs (ifmap_glb[0..15], weight_glb[0..15], psum_glb[0..15]) and one PE_array[0..255]. When this architecture uses a MeshXY network topology, how can I connect ifmap_glb to the first row of PE_array in the MeshX dimension, and connect weight_glb and psum_glb to the first column of PE_array in the MeshY dimension?
How should the specific network topologies (XY_NoC, ReductionTree, SimpleMulticast) be used? What do network_read, network_update, network_fill, and network_drain mean?

angshuman-parashar · 2023-02-08T14:45:41Z

This is a good question. As long as you have set up the bypasses correctly, the way you have described it in your arch.yaml will model the right dataflow. The only imprecision is going to be in the wire distance from the ifmap_glb/weight_glb to the PE array because those wires will be modeled as going over the weight_glb and psum_glb. Let's call this Approach A, i.e., our current baseline.
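To make the effect of the bypasses concrete, here is a minimal sketch of the idea, not Timeloop code. It assumes a linear hierarchy in which each level keeps only some tensors; the tensor names (Inputs, Weights, Outputs), the `HIERARCHY` layout, and the `effective_links` helper are hypothetical and chosen only for illustration.

```python
# Hypothetical model of a linear storage hierarchy with per-level bypasses.
# Level names mirror the thread; the data layout is an assumption.
HIERARCHY = [  # outermost level first: (level name, tensors kept at that level)
    ("ifmap_glb", {"Inputs"}),
    ("weight_glb", {"Weights"}),
    ("psum_glb", {"Outputs"}),
    ("PE_array", {"Inputs", "Weights", "Outputs"}),
]


def effective_links(hierarchy):
    """Return {tensor: [(outer, inner), ...]} after skipping bypassed levels."""
    tensors = sorted(set().union(*(kept for _, kept in hierarchy)))
    links = {}
    for tensor in tensors:
        # A tensor only visits the levels that keep it; all others are bypassed.
        path = [name for name, kept in hierarchy if tensor in kept]
        links[tensor] = list(zip(path, path[1:]))
    return links


links = effective_links(HIERARCHY)
for tensor, hops in links.items():
    print(tensor, hops)
```

Under these assumptions each tensor moves directly between its own GLB and PE_array, which matches the dataflow asked for in the question. The sketch says nothing about wire distance, which is where the imprecision mentioned above would remain.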

One incremental improvement we can make to Timeloop (Approach B) is to allow for "partitioned" structures. In your example, the GLB can have 3 partitions. However, we do not want to tie the number of partitions to the number of tensors. For example, your architecture may have 2 GLB partitions, but the mapping may place weights and ifmaps in partition 1 and psums in partition 2. An additional restriction with this model would be that all partitions must have the same number of instances (16 in your example).

Note that it is possible to relax this restriction as well (i.e., allow a different number of instances for each partition), and this can lead to interesting architectures. See Figure 4(b) in this paper for an example: https://research.nvidia.com/publication/2021-01_hardware-abstractions-targeting-eddo-architectures-polyhedral-model. However, this needs a more sophisticated architectural model that is described using 2 levels of abstraction, a symbolic level and a physical level. We'll call this Approach C.

We had considered implementing Approach B (since it's a straightforward extension to Approach A), but since we developed Approach C for a code-generation project, we decided to wait and eventually apply the same model to Timeloop as well, which would allow us to integrate everything. That effort has lagged, so we've been stuck on Approach A, which fortunately has sufficed for our modeling needs so far.

If you are interested in contributing towards implementing either Approach B or C in Timeloop, please let us know. We welcome all the help we can get.

morainer · 2023-02-09T07:34:22Z

myarch.yaml.txt

Hello, thank you for your reply. I have some more specific questions.
Figure 1 shows my initial target architecture. I implemented it based on Approach A, as shown in Figure 2. However, I encountered the following errors.

I set the same number of instances for the adjacent GLBs (ifmap_glb[0..15], weight_glb[0..15], psum_glb[0..15]):

1. When I set MeshX=16 on ifmap_glb, MeshY=16 on weight_glb, and MeshY=16 on psum_glb, Timeloop reports an error (inner weight_spad meshX = 1, outer ifmap_spad meshX = 16, timeloop-mapper: src/mapping/arch-properties.cpp:86: void ArchProperties::DeriveFanouts(): Assertion `inner_meshX % outer_meshX == 0' failed. Aborted (core dumped))

So I have to set MeshX=16 on all three GLBs to avoid this error, but this deviates from my original target architecture.
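The assertion in the error message can be reproduced with a small stand-alone check. This is a sketch of the divisibility rule only, not Timeloop's implementation: `mesh_violations` is a hypothetical helper, the GLB names from the description are used instead of the names printed in the error, a level with no MeshX set is assumed to have meshX = 1 (as the message suggests), and PE_array is assumed to be 16 wide.

```python
def mesh_violations(levels):
    """levels: list of (name, meshX), outermost first.

    Returns the adjacent (outer, inner) pairs that would violate
    inner_meshX % outer_meshX == 0.
    """
    bad = []
    for (outer, outer_x), (inner, inner_x) in zip(levels, levels[1:]):
        if inner_x % outer_x != 0:
            bad.append((outer, inner))
    return bad


# Configuration from item 1: only ifmap_glb has MeshX=16 (others assumed 1).
failing = [("ifmap_glb", 16), ("weight_glb", 1), ("psum_glb", 1), ("PE_array", 16)]
# Workaround: MeshX=16 on all three GLBs.
passing = [("ifmap_glb", 16), ("weight_glb", 16), ("psum_glb", 16), ("PE_array", 16)]

print(mesh_violations(failing))
print(mesh_violations(passing))
```

Under these assumptions the first configuration fails only at the ifmap_glb/weight_glb boundary, because 1 is not a multiple of 16, while the workaround has no violating pair.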

2. When I add a spatial constraint to the three GLBs, Timeloop reports an error (cannot find spatial tiling level associated with storage level ifmap_spad. This is because the number of instances of the next-inner level (weight_glb) is the same as this level, which means there cannot be a spatial fanout)

Therefore, I can only comment out all of the GLBs' spatial constraints to avoid this error, which makes it impossible for me to apply spatial constraints to the GLBs.
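The second error follows from the instance counts alone. The sketch below assumes that the fanout of a level is the instance count of the next-inner level divided by its own instance count, which is how the error message describes it; `fanouts` is a hypothetical helper and Timeloop's actual derivation may differ.

```python
def fanouts(levels):
    """levels: list of (name, instances), outermost first.

    Returns {outer_name: fanout to the next-inner level}.
    """
    result = {}
    for (outer, n_outer), (_inner, n_inner) in zip(levels, levels[1:]):
        result[outer] = n_inner // n_outer
    return result


levels = [("ifmap_glb", 16), ("weight_glb", 16), ("psum_glb", 16), ("PE_array", 256)]
fan = fanouts(levels)
# Levels whose fanout is 1 have no spatial tiling level to constrain.
no_spatial = [name for name, f in fan.items() if f == 1]

print(fan)
print(no_spatial)
```

With the instance counts from the description, only the psum_glb to PE_array boundary has a fanout greater than 1, so under this reading a spatial constraint can only be attached there.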

Based on the above questions, can you give me some specific suggestions?

How should I realize the architecture shown in Figure 1?

Because the Timeloop source code is so large, I am sorry that I am not yet able to contribute on my own. I would like to know whether Timeloop can model my target architecture without changes to the source code.

In addition, I found some other network topologies (XY_NoC, ReductionTree, SimpleMulticast) in the source code. How should these topologies be used? Is there documentation that explains the attributes of each network topology? For example, what do network_read, network_update, network_fill, and network_drain mean?

morainer changed the title ~~Questions About~~ Questions About Noc Feb 8, 2023
