Reducing memory size leads to decrease in load cycles #52
Comments
Hi Lukas,

One possible reason in your case is that you are using EDP instead of latency as the optimization target. To analyze in more depth, you can try to rerun ZigZag with the temporal mapping of the second case fixed, to see if the EDP is then higher than the current result.

Best regards,
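For reference, a minimal sketch of switching the optimization target, assuming the `zigzag.api.get_hardware_performance_zigzag` entry point and its `opt` parameter as found in recent ZigZag versions; the file paths are placeholders, and the exact signature and return values may differ in your checkout:

```python
from zigzag.api import get_hardware_performance_zigzag  # assumed entry point

# Placeholder inputs; substitute your own workload/hardware/mapping files.
workload = "inputs/workload/resnet18.onnx"
accelerator = "inputs/hardware/eyeriss_like.yaml"
mapping = "inputs/mapping/eyeriss_like.yaml"

for opt_target in ("EDP", "latency"):
    # The API is assumed to return (energy, latency, cost-model evaluations).
    energy, latency, cmes = get_hardware_performance_zigzag(
        workload, accelerator, mapping, opt=opt_target
    )
    print(f"opt={opt_target}: energy={energy:.3e}, latency={latency}")
```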
Hey Jiacong, thank you very much for the reply! I ran it again with "latency" as the optimisation metric, but the results are exactly the same.

For the mapping I get the following with one configuration [mapping printout not captured in this export]:

And the following with the other [mapping printout not captured]:

This makes sense to me: smaller local buffers -> more data in the DRAM and more memory accesses, but it is not reflected in the number of cycles. Also looking at the JSON representation, I get the following for one configuration [output not captured]:

And the following for the other [output not captured]:

I notice the 4waydatamoving is higher in the configuration with smaller buffers (again, this makes sense to me), but that is not reflected in the cycles, only in the power. I am not sure what you mean by "fixing the temporal mapping"?

Best Regards,
Hi Lukas,

Thanks for the more detailed information. What I mean is that you can adjust the temporal loop orders for both cases to understand the results, though I believe your current information is already sufficient for this purpose.

It makes sense that the increased memory access count is not reflected in the total latency, as that latency can be hidden by the computation latency (through double-buffering). The computation latency itself does not change between the two cases, since the PE array size is the same. In short, the latency difference between the two cases comes from the difference in on/off-loading latency.

On/off-loading latency is calculated from the amount of data required by the lower-level memories. In the second case (with bigger memory sizes), all loops are unrolled within SRAM, so the entire workload/layer has to be loaded from DRAM to SRAM, which dominates the on/off-loading latency. In the first case (with smaller memory sizes), only part of the workload/layer is loaded from DRAM to SRAM during the data on/off-loading process. The way data on/off-loading is handled is hardware-dependent, so you may need to reconsider how the cost of this component is calculated if this does not match your hardware. You can check how the on/off-loading latency is calculated in the code here.

Here is an intuitive explanation for your case. In your first setting (with smaller memory sizes), the amount of weight data that must be loaded from DRAM to SRAM_64KB is C_total × K_total / C(dram) / K(dram) = 400. This means the number of cycles required for weight on-loading to SRAM is 400 × precision / dram_portwidth = 200. In the second setting (with bigger memory sizes), no for-loop is left at DRAM, so the amount of weight data required by the SRAM becomes 10,000, causing a latency cost of 10,000 × precision / dram_portwidth = 5,000. The calculation in the code is more complicated, as it also considers factors such as whether a memory port is shared, multiple levels of memory, and the ceiling nature of the latency calculation.

Best regards,
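To make the arithmetic above concrete, here is a minimal sketch of the on-loading cycle count, assuming hypothetical values C_total = K_total = 100, C(dram) = K(dram) = 5, 8-bit operand precision, and a 16-bit DRAM port, which reproduce the 400/200 and 10,000/5,000 figures quoted above:

```python
# Hypothetical numbers chosen to reproduce the figures in the comment
# above; the real values depend on the layer and hardware description.
C_total, K_total = 100, 100   # total channel / kernel loop sizes
C_dram, K_dram = 5, 5         # loop factors left at DRAM (first setting)
precision = 8                 # operand precision in bits (assumed)
dram_portwidth = 16           # DRAM port width in bits per cycle (assumed)

# First setting (small memories): only a tile of the weights is on-loaded.
weights_case1 = C_total * K_total // (C_dram * K_dram)       # 400 elements
cycles_case1 = weights_case1 * precision // dram_portwidth   # 200 cycles

# Second setting (big memories): no loop is left at DRAM, so the full
# weight tensor must be on-loaded to SRAM before computation starts.
weights_case2 = C_total * K_total                            # 10,000 elements
cycles_case2 = weights_case2 * precision // dram_portwidth   # 5,000 cycles

print(cycles_case1, cycles_case2)  # -> 200 5000
```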
Hello and thank you for creating this great open source tool!
I am currently trying to model my accelerator in ZigZag for DSE, but I encountered an issue when running ZigZag with different memory sizes. To check if my implementation is the problem, I recreated the issue with the Eyeriss implementation from the repository.
This is my test code for Eyeriss [snippet not captured in this export]:
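Since the snippet itself was not captured, the following is only a hypothetical reconstruction of its shape: `Eyeriss` and `Eyeriss_config` are the reporter's own wrappers, and only the parameter names are taken from the calls quoted below; everything else is assumed.

```python
from dataclasses import dataclass

@dataclass
class Eyeriss_config:
    DIM: int        # PE array dimension
    mem0_kb: int    # innermost (register-file level) memory size, in KB
    mem1_kb: int    # on-chip SRAM size, in KB
    mem2_kb: int    # outermost (DRAM-level) memory size, in KB

class Eyeriss:
    """Hypothetical wrapper: builds a ZigZag accelerator model with the
    given memory sizes, runs the cost model on a test layer, and reports
    latency_total2 and data_loading_cycle from the resulting evaluation."""

    def __init__(self, cfg: Eyeriss_config):
        self.cfg = cfg
        # ... construct the memory hierarchy, run ZigZag, print results ...
```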
Now when I run this with the default memory sizes:

```python
eyeriss = Eyeriss(Eyeriss_config(DIM=4, mem0_kb=8, mem1_kb=64, mem2_kb=1024))
```

I get:

```
latency_total2 = 260519
data_loading_cycle = 5020
```

But when I run it with:

```python
eyeriss = Eyeriss(Eyeriss_config(DIM=4, mem0_kb=1, mem1_kb=2, mem2_kb=4))
```

I get:

```
latency_total2 = 250439
data_loading_cycle = 220
```
So decreasing the memory size leads to far fewer cycles for data loading, which seems very counterintuitive to me. I am using the latest version of ZigZag from the master branch.
Maybe I messed up somewhere in my configuration?
However, when looking at the evaluation breakdown for both configs, the energy usage goes way up (as I would expect):

```python
eyeriss = Eyeriss(Eyeriss_config(DIM=4, mem0_kb=8, mem1_kb=64, mem2_kb=1024))
```

[evaluation breakdown not captured in this export]

```python
eyeriss = Eyeriss(Eyeriss_config(DIM=4, mem0_kb=1, mem1_kb=2, mem2_kb=4))
```

[evaluation breakdown not captured]
I would love some input on this,
Best Regards,
Lukas