hierarchical & discrete low power clock gating #1166
Replies: 3 comments 17 replies
-
Hi ^^ So, I'm not an ASIC guy, and I have never practiced clock gating.
So if I understand correctly, hierarchical clock gating would detect that the local clock gates of the sub-modules are all asleep, and would then also gate the clock at a higher level?
In general, what is the difference between local clock gating and the following?
val r = Reg(UInt(8 bits))
when(cond){
  r := r + 1
}
Regards
-
For some components that we put on an ASIC we did what @Dolu1990 proposed: we used some ClockDomains with enable signals, and in our toolchain clock gate cells are then automatically inserted by DC (if there are more than ~10 FFs with the same enable). Since ClockDomains can be derived from each other, it should also be quite doable to get a tree.
The other place where I see some possible work is with sync primitives between those domains. Going to/from a CD and a version of it with an enable can use much more lightweight sync than between async CDs (i.e. just making sure that in the fast CD you don't fire multiple times on the same valid of a Stream). That is one of the major reasons why I often don't use a separate CD (especially if I am targeting an FPGA).
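As a rough illustration of the "ClockDomains with enable signals" approach, here is a minimal sketch, assuming a recent SpinalHDL version on the classpath. The component, port, and signal names are made up for the example. It derives two enabled clock domains from the current one, the second being a finer-grained version of the first, which is one way a tree could be formed.

```scala
import spinal.core._

// Hypothetical example component: names are illustrative only.
class EnabledCounters extends Component {
  val io = new Bundle {
    val blockEnable = in Bool()
    val fineEnable  = in Bool()
    val coarse      = out UInt(8 bits)
    val fine        = out UInt(8 bits)
  }

  // Derive a clock domain from the current one, adding a clock enable.
  val blockDomain = ClockDomain.current.copy(clockEnable = io.blockEnable)

  // Derive a second-level domain that is active only when both enables are high.
  val fineDomain = blockDomain.copy(clockEnable = io.blockEnable && io.fineEnable)

  // Registers declared in this area are updated only while blockEnable is high.
  val blockArea = new ClockingArea(blockDomain) {
    val counter = Reg(UInt(8 bits)) init(0)
    counter := counter + 1
  }

  // Registers declared here additionally require fineEnable.
  val fineArea = new ClockingArea(fineDomain) {
    val counter = Reg(UInt(8 bits)) init(0)
    counter := counter + 1
  }

  io.coarse := blockArea.counter
  io.fine   := fineArea.counter
}

object EnabledCountersVerilog extends App {
  // Generates Verilog in which the registers are written under their enable.
  SpinalVerilog(new EnabledCounters)
}
```

In the generated RTL the enables show up as conditions around the register updates. Whether they end up as real clock gating cells depends on the synthesis flow, as described above.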
-
I still don't understand the underlying logic: why is a BlackBox expected? Is it meant as a hook for users who want to customize the function?
-
In low-power ASICs, clock gating is essential to save power: the clock network (wires, buffers, inverters, and the transistors in DFF cells) can eat a significant part of the dynamic power of a circuit, because the clock is a high-frequency signal.
Discrete clock gating can be handled automatically by most closed-source synthesis tools when they encounter a code structure in which registers are only updated when an enable signal, say state_reg_update, is asserted.
The signal state_reg_update will then be used as an enable condition, and the tool will create a clock gating cell to gate the clock landing on those registers. It is important for power efficiency as well as area efficiency that a clock gating cell drives multiple registers. Thus, registers can be grouped together in a clock gate group if their update conditions are similar (maybe at the price of a mux at their D input to recirculate the Q value when they should not be updated in a given cycle).
However, the tool does not do hierarchical clock gating. Hierarchical clock gating means clock gating a group of clock gating cells, to create a clock gating tree spanning module hierarchies. It is up to the designer to OR those *_update signals and the lower-level clock enable conditions of the clock gating cells into a higher-level clock enable signal, which is then used to gate the clock at a higher level. Usually, the designer tries to be clever and hacks a smaller clock enable condition than the OR of the lower-level clock enable signals (and that creates more bugs :D ). Usually, the designer also adds a busy / clk_req output signal to each module, telling that the block needs the clock to process, which can drive a clock gating cell at a higher level.
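To make the two levels described above concrete, here is a minimal SpinalHDL sketch, assuming a recent SpinalHDL version. The component and its ports are hypothetical; only the state_reg_update and clk_req names come from the discussion. Registers written under an enable are what a synthesis tool can turn into discrete clock gating, and the OR of the local update conditions gives a conservative clk_req for a higher-level gate.

```scala
import spinal.core._

// Hypothetical block: names other than the update / clk_req idea are illustrative.
class GatedBlock extends Component {
  val io = new Bundle {
    val cmdValid   = in Bool()
    val cmdData    = in UInt(8 bits)
    val accumulate = in Bool()
    val state      = out UInt(8 bits)
    val sum        = out UInt(8 bits)
    val clkReq     = out Bool() // this block needs its clock when high
  }

  // Discrete gating candidate: stateReg changes only when stateRegUpdate is high.
  val stateRegUpdate = io.cmdValid
  val stateReg = RegNextWhen(io.cmdData, stateRegUpdate) init(0)

  // A second register group with its own update condition.
  val accUpdate = io.cmdValid && io.accumulate
  val acc = Reg(UInt(8 bits)) init(0)
  when(accUpdate) {
    acc := acc + io.cmdData
  }

  io.state := stateReg
  io.sum   := acc

  // Hierarchical gating input: the plain OR of all local update conditions.
  // A hand-optimized condition could be smaller, at the risk of bugs.
  io.clkReq := stateRegUpdate || accUpdate
}

object GatedBlockVerilog extends App {
  SpinalVerilog(new GatedBlock)
}
```

A parent component could then OR the clkReq outputs of its children to drive a gating cell one level up.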
Unfortunately, hierarchical clock gating is often the least of a designer's concerns, because it tends to be added to an existing block or hierarchy of blocks that was not designed with that optimization in mind. Often, a sub-optimal clock enable condition is guessed to drive a few clock gating cells, leading to poor clock power reduction. Such designs are often legacy or FPGA-centric.
I would like to help ease clock gating handling in SpinalHDL. SpinalHDL designs are quite FPGA-centric, and clock gating, even the discrete/automatic kind, is nonexistent. A simple example would be a RegNext popping up in a function used to pipe a stream. This is understandable, as power is usually not a concern in FPGA designs. Yet I believe that SpinalHDL could be turned into a great HDL for low-power ASIC design at a minimal design cost.
First, SpinalHDL would need some context to know whether the design requires clock gating or not. For FPGA or area-constrained designs, for instance, we usually don't care.
In a Component, registers could be grouped manually into a clock gating group. I considered using something derived from an area, but that would be invasive, as it is a design choice that can be decoupled from the functional logic at hand.
For discrete clock gating of registers & registers created inside a function:
It is rather ugly, maybe incorrect, but I am sure Scala could give some syntax sugar. There, a clock gating group is created for registers a and b, and for the registers created with RegNextWhen in the .pipe() function. The update conditions of all those registers are OR'ed together to create the clock enable condition. This enable condition could then be used to instantiate a clock gating cell, or simply to add a when(cg_en) {...} into the SpinalHDL graph.
For designers who like to golf some area, a clock gating condition could be specified manually:
cg.override_enable(io.cmd.valid && someInflightValid)
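One way such a group could look when written outside the SpinalHDL core is sketched below. ClockGateGroup, regNextWhen, overrideEnable and enable are all hypothetical names for this example, not an existing SpinalHDL API (overrideEnable mirrors the override_enable call above). The helper records the update condition of every register it creates and exposes their OR, or a manual override, as the group enable.

```scala
import spinal.core._
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper, NOT part of SpinalHDL: groups registers for clock gating.
class ClockGateGroup {
  private val updateConditions = ArrayBuffer[Bool]()
  private var manualEnable: Option[Bool] = None

  // Create a register updated when `cond` is high and record `cond` in the group.
  def regNextWhen[T <: Data](next: T, cond: Bool): T = {
    updateConditions += cond
    RegNextWhen(next, cond)
  }

  // Let the designer replace the automatic OR with a hand-written condition.
  def overrideEnable(cond: Bool): Unit = manualEnable = Some(cond)

  // Group clock enable: the override if present, else the OR of all conditions.
  // Call it once per group, after all registers are created.
  def enable: Bool =
    manualEnable.getOrElse(updateConditions.foldLeft(False)(_ || _))
}

// Hypothetical usage: two registers sharing one clock gating group.
class GroupedRegs extends Component {
  val io = new Bundle {
    val aValid = in Bool()
    val bValid = in Bool()
    val aIn    = in UInt(8 bits)
    val bIn    = in UInt(8 bits)
    val aOut   = out UInt(8 bits)
    val bOut   = out UInt(8 bits)
    val cgEn   = out Bool()
  }

  val cg = new ClockGateGroup
  io.aOut := cg.regNextWhen(io.aIn, io.aValid)
  io.bOut := cg.regNextWhen(io.bIn, io.bValid)

  // Could drive a clock gating cell, or be exported as a clk_req style port.
  io.cgEn := cg.enable
}

object GroupedRegsVerilog extends App {
  SpinalVerilog(new GroupedRegs)
}
```

A helper like this only sees registers created through it, so registers instantiated inside library functions stay outside the group.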
For hierarchical clock gating, at the end of the elaboration of a Component, a clk_req output port could be created (TODO: what happens with multiple clocks?). Similarly, some clock gating groups could be created to group component instances together.
To take the idea further, there could also be a clk_gnt port. If the clock is not granted to a module instance, it must be disconnected from the other module instances communicating with it: streams are cut at the boundary. However, that is a bit beyond the initial scope.
So I tried to write some initial code for this feature, but kept it outside the SpinalHDL core to avoid burdening it. However, with this implementation, registers instantiated inside functions in libraries are hard to clock gate, or at least I was unable to. Thus I would like to discuss with you, the developers of SpinalHDL, whether you would be interested in such a feature, what the user interface would look like, and how you think it could be implemented inside the core. Again, I think SpinalHDL is a great design tool and, with some features, would prove tremendously efficient in ASIC design as well. I have some limited knowledge of SpinalHDL internals, so I am ready to help develop such features :)
If you read this monster, thank you :D