## **NoC Power Estimation at the RTL Abstraction Level**

Guilherme Guindani, Cezar Reinbrecht, Thiago Raupp, Ney Calazans, Fernando Gehm Moraes PUCRS – FACIN – Av. Ipiranga 6681 – Porto Alegre – 90619-900 - Brazil guilherme.guindani@pucrs.br, moraes@pucrs.br

### **Abstract**

The increasing use of mobile electronic devices forces integrated circuit design to adopt low power techniques. Current power estimation models for NoCs rely mostly on the volume of information transmitted through the network. This work presents a more precise NoC power estimation model, based on buffer reception rates, which depend on the traffic scenario applied to the network. Results show the accuracy of the model compared to industrial power estimation tools, with reduced execution time. The proposed method allows exploring the NoC design space, and it is employed here to evaluate the benefit of using a multicast service.

#### 1. Introduction

MPSoCs are widely used in mobile computing devices. One of the most important challenges in designing such MPSoCs is fast and accurate power estimation. Several techniques exist to estimate the power consumption of microprocessors, but few address the power consumption of NoCs [1].

Hu et al. [2] propose a mathematical power consumption model for the macro blocks (interconnection wires and routers) of a NoC. This model relies on the bit energy concept [3], which represents the energy of a data bit transported through the interconnection wires and routers of the NoC.

Chang et al. [4] extend the NoC power consumption model proposed by Hu. In this model, the average energy consumed while sending a data bit from point  $t_i$  to point  $t_j$  is given by the sum of the energy spent in the routers and communication wires that link these two points along a given route.
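The route-based summation described above can be sketched as follows. This is only an illustration of the idea, not the formulation used in [2] or [4]: the function name and all energy values are hypothetical.

```python
# Sketch of a route-based bit-energy sum.
# The function name and every energy value below are hypothetical.

def bit_energy_along_route(router_energies_pj, wire_energies_pj):
    """Energy (pJ) to move one bit through the given routers and wires."""
    return sum(router_energies_pj) + sum(wire_energies_pj)

# A route that crosses 3 routers and the 2 wires between them.
routers = [1.0, 1.0, 1.0]   # pJ per bit in each router (made-up values)
wires = [0.5, 0.5]          # pJ per bit in each wire (made-up values)

energy_pj = bit_energy_along_route(routers, wires)
print(energy_pj)
```

Note that such a sum depends only on the route, not on how often packets collide along it, which is the limitation the reception-rate model targets.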

Banerjee et al. [5] developed power consumption models for each element of a NoC individually. To build these models, the authors synthesize the RTL NoC description, which contains the routers and the interconnection wires, and use SPICE simulation to compute the power consumption of each basic block. Adding the contribution of each block gives the power consumption of each module. The router power consumption is then obtained by summing the contributions of the modules that compose it.

Palma et al. [6] use a power consumption model based on the switching activity of network packet transmission. The proposed NoC power evaluation flow comprises three basic steps. The first step starts with the VHDL NoC and traffic files. The traffic files inject packets into the network at each router local port, simulating the hardware modules connected to each router. In the second step, the module to be evaluated (e.g. input buffer) is synthesized using a  $0.35~\mu m$  technology, and a netlist is generated. This netlist is converted into a SPICE description. The third step is a SPICE simulation. The resulting electrical information provides the network power parameters for a given traffic.

As presented above, the state of the art in NoC power estimation models uses the volume of information transmitted by the routers as the main estimation metric. This approach is inaccurate, since it does not consider the effect of congestion induced by packet collision.

This paper describes a model for power estimation in NoCs at the RTL abstraction level, based on average reception rates at each router buffer. Fast power estimation at this level is very important for designers, since commercial power estimation tools require a large amount of memory and prohibitive execution time for complex circuits, such as a complete NoC.

#### 2. Reference NoC architecture

The network used in this work is a synchronous NoC with the following features [8]: 2D mesh topology, wormhole packet switching, parameterizable number of virtual channels and XY routing. The basic router architecture contains: (i) input buffers; (ii) centralized control logic, with a round-robin arbiter and XY routing; (iii) credit-based flow control; (iv) internal crossbar to connect input to output ports; (v) TDM or priority-based output scheduling. Although the above features do not cover all possible NoC architectures, they represent a significant set of existing architectures [7][8].
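As a reminder of how deterministic XY routing behaves on a 2D mesh, the following minimal sketch lists the routers a packet visits. It is a generic illustration, not the router logic of [8]; addressing routers as `(x, y)` tuples and the name `xy_route` are assumptions made for this example.

```python
# Generic XY routing on a 2D mesh: travel along X first, then along Y.
# Router addresses as (x, y) tuples are an assumption of this sketch.

def xy_route(src, dst):
    """Return the list of routers visited from src to dst (both inclusive)."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                 # first correct the X coordinate
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                 # then correct the Y coordinate
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

route = xy_route((0, 0), (2, 1))
print(route)
```

Every input buffer along such a path receives the flits of the packet, which is why the reception rate of each buffer reflects both the traffic volume and the routes taken.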



#### 3. Power estimation model

As shown in [6], the power consumption of a router may be divided into three components: buffers, control logic and crossbar. Buffers account for the most significant part of the router power consumption. Results obtained in this work show that the buffers' contribution to the total router power consumption reaches 88.6% on average. Buffer power consumption is due to the switching activity of the clock signal and of the stored values. The constant transitions of the buffer clock signal define its base power consumption. The power consumed by storing new flits is added to this base value, and is proportional to the number of bits that switch between consecutive flits [6]. The buffer switching activity, and therefore its power consumption, is proportional to its reception rate. The reception rate of a buffer is the number of flits it receives in a given sampling period.

The proposed power estimation model comprises two steps: *calibration* and *application*.

The calibration step defines the parameters used in the model. This step starts by synthesizing the central router in the target technology (Synopsys Design Compiler, technology XFAB XL035). The synthesis generates a mapped description of the router, which replaces the original router description. The resulting NoC description is then simulated (1 in Figure 1), applying the traffic scenarios previously created (4 in Figure 1). At the end of each simulation, a value change dump (VCD) file of the synthesized router is generated. These files contain the switching activity of each signal present in the evaluated router (2 in Figure 1). Then, the power consumption of the complete router is annotated using a commercial power estimation tool, Synopsys PrimePower [9] (3 in Figure 1). Injection rates from 5% to 50% of the network bandwidth are simulated. At the end of the calibration phase, a table with the power consumption for each injection rate is generated for each element of the router (buffers, crossbar and control logic) (5 in Figure 1). An equation is obtained from each table by applying a linear fitting technique (6 in Figure 1). This equation gives the power consumption as a function of the injection rate. This is a generic procedure, and it can be applied to networks with features different from those of the NoC presented in the previous section.

In the second step, the application of the model (7 in Figure 1), the NoC is simulated in an RTL simulator (Mentor ModelSim) to obtain the reception rate of each buffer (8 in Figure 1). The buffer reception rate is measured with a *monitor* inserted in each buffer of every router (9 in Figure 1). This monitor counts the number of flits received in a parameterizable sample window.
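The behaviour of such a monitor can be mimicked in software as below. This is a sketch, not the RTL monitor itself: it assumes at most one flit is received per clock cycle, expresses the rate as a fraction of the window length, and uses a hypothetical arrival trace and function name.

```python
# Software sketch of a reception-rate monitor.
# Assumption: at most one flit per clock cycle, so rate = flits / window length.

def reception_rates(flit_arrival_cycles, window, total_cycles):
    """Return the reception rate (0.0 to 1.0) of each complete sample window."""
    n_windows = total_cycles // window
    counts = [0] * n_windows
    for cycle in flit_arrival_cycles:
        idx = cycle // window          # window in which this flit arrived
        if idx < n_windows:            # ignore flits past the last full window
            counts[idx] += 1
    return [count / window for count in counts]

# Hypothetical trace: one flit every 4 cycles during the first 1000 cycles,
# then an idle buffer for the next 1000 cycles.
arrivals = list(range(0, 1000, 4))
rates = reception_rates(arrivals, window=1000, total_cycles=2000)
print(rates)
```

The window length is the parameter that trades the number of stored samples against the time resolution of the estimate.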



Figure 1 - Proposed flow for RTL NoC power estimation.

For each buffer reception rate, the associated power consumption (*Pbuffer*) is annotated by applying the power consumption equations generated in the calibration phase. The power consumption of the control logic (*Pcontrol*) and of the crossbar (*Pcrossbar*) is obtained by applying the average reception rate of all buffers of a given router to the corresponding equations (10 in Figure 1).

The power consumption of a router is given by equation (1):

$$P_{\text{avg}} = \sum_{k=1}^{m} \frac{\sum_{i=1}^{n} Pbuffer_{k,i}}{n} + Pcrossbar + Pcontrol \qquad (1)$$

where m is the number of buffers in the router and n is the number of sampling periods.
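Equation (1) can be evaluated as in the sketch below. The linear coefficients are hypothetical placeholders for the calibrated equations, and averaging the control and crossbar input rate over all buffers and all sampling periods is one possible reading of the text, stated here as an assumption.

```python
# Sketch of equation (1). All coefficients are hypothetical placeholders
# for the equations produced by the calibration step.

def linear(slope, intercept):
    """Build a power equation P(rate) = slope * rate + intercept (mW)."""
    return lambda rate: slope * rate + intercept

def router_power(buffer_rates, p_buffer, p_control, p_crossbar):
    """buffer_rates[k][i] is the reception rate (%) of buffer k in period i."""
    # Sum over the m buffers of the time-averaged buffer power.
    buffers = sum(
        sum(p_buffer(rate) for rate in rates) / len(rates)
        for rates in buffer_rates
    )
    # Assumption: control and crossbar use the mean rate over all
    # buffers and all sampling periods of the router.
    all_rates = [rate for rates in buffer_rates for rate in rates]
    mean_rate = sum(all_rates) / len(all_rates)
    return buffers + p_crossbar(mean_rate) + p_control(mean_rate)

p_buffer = linear(0.02, 2.0)      # hypothetical coefficients
p_control = linear(0.01, 1.0)     # hypothetical coefficients
p_crossbar = linear(0.005, 0.0)   # hypothetical coefficients

# Two buffers (m = 2) observed over two sampling periods (n = 2).
p_avg = router_power([[10, 30], [0, 0]], p_buffer, p_control, p_crossbar)
print(round(p_avg, 2))
```

An idle buffer still contributes its base power (the intercept of the buffer equation), which matches the clock-driven base consumption described above.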

The precision of the proposed model is a function of the reception rate sampling period. Sample windows different from the one used in the calibration phase introduce an error into the power estimation. Larger sample windows can be applied to long simulations in order to reduce the number of intermediate power consumption values, but this increases the error in the estimated average router power consumption.

#### 4. Application of the power estimation model

The reference NoC is parameterized as follows: 2D 3 x 3 mesh topology; 8-bit flit size; 8-flit buffer depth; no virtual channels. A 50 MHz clock frequency is applied to the NoC, resulting in a maximum transmission rate per link equal to 400 Mbps.
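The link rate follows directly from the flit width and the clock frequency, as the short check below shows. It assumes one flit can be transferred per clock cycle; the variable names are illustrative.

```python
# Maximum link rate, assuming one flit is transferred per clock cycle.
flit_width_bits = 8
clock_hz = 50_000_000  # 50 MHz

link_rate_mbps = flit_width_bits * clock_hz / 1_000_000
print(link_rate_mbps)  # 400.0

# Offered load at the lowest and highest injection rates used in this work.
lowest_mbps = 0.05 * link_rate_mbps   # 5% of the link bandwidth
highest_mbps = 0.50 * link_rate_mbps  # 50% of the link bandwidth
print(lowest_mbps, highest_mbps)
```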

Six traffic scenarios are applied to the RTL simulation, with an injection rate varying from 5 to 50% of the available link bandwidth. At the end of all evaluations, the average power consumption table for the buffer, the control logic and the crossbar is obtained (Table 1).

Table 1 - Average power consumption as a function of the reception rate. The *base* buffer power consumption is obtained without traffic (0%).

| Reception rate | Buffer (mW) | Control logic (mW) | Crossbar (mW) |
|----------------|-------------|--------------------|---------------|
| 0%                | 2.07                   | -                | -        |
| 5%                | 2.17                   | 1.33             | 0.03     |
| 10%               | 2.26                   | 1.40             | 0.05     |
| 20%               | 2.45                   | 1.56             | 0.10     |
| 30%               | 2.65                   | 1.73             | 0.16     |
| 40%               | 2.80                   | 1.86             | 0.20     |
| 50%               | 2.91                   | 1.88             | 0.21     |

Figure 2 shows the power consumption graphs for the buffer, the control logic and the crossbar. The power consumption equations are obtained from these graphs by applying a linear fitting technique.



Figure 2 - Buffer, crossbar and control logic power consumption as a function of the buffer reception rate.
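As an illustration of how a power equation can be derived from Table 1, the sketch below fits a straight line to the buffer column. Ordinary least squares is an assumption here, since the exact fitting technique and the resulting coefficients are not given in the text.

```python
# Least-squares line through the buffer column of Table 1.
# Ordinary least squares is an assumption; the paper only says "linear".

def linear_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

reception_rate_pct = [0, 5, 10, 20, 30, 40, 50]
buffer_mw = [2.07, 2.17, 2.26, 2.45, 2.65, 2.80, 2.91]  # Table 1

slope, intercept = linear_fit(reception_rate_pct, buffer_mw)
print(f"Pbuffer(rate) ~ {slope:.4f} * rate + {intercept:.2f} mW")
```

The intercept of the fitted line plays the role of the base buffer power, and the slope gives the additional power per percentage point of reception rate. The same function can be applied to the control logic and crossbar columns.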

In the application step, the reception rate monitors are inserted into the original NoC RTL code, and the sample window is set to the same value as in the calibration step, to minimize the estimation error. In this example both sample windows have the same value, 1000 clock cycles.

The RTL NoC is then simulated with the traffic under evaluation. In this example, a random traffic was automatically generated with the ATLAS framework [10] and applied to the NoC simulation. The reception rates of each buffer in each sample window are extracted and stored in a table.

Applying the equations obtained in the calibration step, the power consumption of routers 11 (central router), 10 (middle router) and 00 (corner router) is estimated. The simulation time used in this example is 1 ms, and the estimated power consumption values are presented in the first row of Table 2. Using the same traffic and simulation time, the routers were evaluated with the *Synopsys PrimePower* power estimation tool. The power consumption of these routers estimated by the tool is presented in the second row of Table 2. The observed error is proportional to the buffer switching activity. If a given buffer has low switching activity, such as those at the corners of the mesh, a smaller error is obtained.

Table 2 - Power estimation in different routers using both the proposed model and the Synopsys PrimePower tool.

| Router     | 11 (Central) | 10 (Middle) | 00 (Corner) |
|------------|--------------|-------------|-------------|
| Model      | 12.48 mW     | 9.97 mW     | 7.83 mW     |
| PrimePower | 13.2 mW      | 10 mW       | 7.84 mW     |
| Error      | 5.4%         | 0.31%       | 0.16%       |
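The error row can be approximately reproduced from the rounded values of Table 2, as sketched below. Taking the PrimePower value as the reference is an assumption, and recomputing from rounded table entries gives figures close to, but not identical to, the printed ones.

```python
# Relative error of the model against PrimePower, using the rounded
# values printed in Table 2. Using PrimePower as the reference is an assumption.
model_mw = {"11": 12.48, "10": 9.97, "00": 7.83}
primepower_mw = {"11": 13.2, "10": 10.0, "00": 7.84}

errors_pct = {
    router: abs(model_mw[router] - primepower_mw[router])
    / primepower_mw[router] * 100
    for router in model_mw
}

for router, error in errors_pct.items():
    print(f"router {router}: {error:.2f}%")
```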

Note that Table 2 evaluates only individual routers. The time required to evaluate one router with *PrimePower* is 15 minutes on average (logic synthesis, simulation, and power estimation). Evaluating the complete NoC with this flow takes several hours. Our method makes it possible to evaluate the power consumption of the complete NoC, since the CPU time spent is in effect the RTL simulation time. For this 3x3 NoC, the CPU time on a 2.6 GHz Pentium-D is around 10 minutes.

#### 5. Case study: applying the proposed model to a unicast/multicast NoC

In this section, the proposed model is applied to a NoC with features different from those of the NoC used to develop and validate the model. The NoC evaluated in this section has the following characteristics: (i) 4 x 4 2D-mesh topology; (ii) 8-bit flit size; (iii) 16-flit buffer depth; (iv) credit-based flow control; (v) support for unicast and multicast services; (vi) support for packet and circuit switching; (vii) deterministic Hamiltonian routing algorithm.

The calibration procedure is applied to this NoC exactly as presented in Section 3. Table 3 presents the average power consumption of each NoC component as a function of the reception rate.

Table 3 - Average power consumption for the input buffer, control logic and crossbar as a function of the reception rate for the multicast NoC.

| Reception rate | Buffer (mW) | Control logic (mW) | Crossbar (mW) |
|----------------|-------------|--------------------|---------------|
| 0%             | 3.88                                | -             | -        |
| 5%             | 4.01                                | 1.48          | 0.004    |
| 10%            | 4.12                                | 1.49          | 0.007    |
| 20%            | 4.35                                | 1.51          | 0.01     |
| 30%            | 4.57                                | 1.53          | 0.02     |
| 40%            | 4.79                                | 1.55          | 0.03     |
| 50%            | 5.02                                | 1.56          | 0.04     |

In the next step, application, the estimated average power consumption of this NoC is obtained: 302.17 mW. The CPU time to simulate the traffic scenario applied to this NoC was 5 minutes. It was not possible to compute the average NoC power consumption with the *PrimePower* tool, due to the circuit complexity (the job was aborted after 9 hours).

The major benefit of the proposed method is the ability to quickly explore the design space. To illustrate this benefit, the next experiment evaluates the advantages, in terms of power and energy, of using the multicast service implemented in the NoC.

The traffic scenarios used to evaluate the multicast service vary the percentage (10 to 90%) of messages being transmitted to 6 simultaneous targets. Figure 3 plots the NoC power consumption, comparing transmission with the multicast service (mixed) and with unicast only (unicast). On average, the power consumption is quite similar for both services, since it is dominated by the switching activity in the buffers.



Figure 3 - NoC average power consumption for both unicast and mixed multicast traffic.

When multicast is used, the time spent to deliver all messages is reduced. As a result, the multicast NoC spends less time consuming power, which reduces the energy consumption. Figure 4 plots the energy consumption for the multicast traffic (mixed) and for unicast messages.



Figure 4 - NoC energy consumption for both unicast and mixed multicast traffic.
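The relation between similar average power and lower energy can be made concrete with the sketch below. The average power is the 302.17 mW estimated above; the two delivery times are hypothetical, and using the same power for both services is a simplification of "quite similar".

```python
# Energy = average power x time. The delivery times below are hypothetical;
# only the 302.17 mW average power comes from the text.

def energy_mj(avg_power_mw, duration_s):
    """Energy in millijoules (mW multiplied by seconds gives mJ)."""
    return avg_power_mw * duration_s

avg_power_mw = 302.17          # estimated NoC average power
unicast_time_s = 1.0e-3        # hypothetical time to deliver all messages
mixed_time_s = 0.8e-3          # hypothetical shorter time with multicast

unicast_energy = energy_mj(avg_power_mw, unicast_time_s)
mixed_energy = energy_mj(avg_power_mw, mixed_time_s)
print(unicast_energy, mixed_energy)
```

With comparable average power, any reduction in delivery time translates proportionally into an energy reduction.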

Multicast messages are widely used in MPSoCs for cache coherence protocols and parallel applications. These messages can reduce the energy consumption of the MPSoC, extending its battery lifetime.

#### 6. Conclusions and future work

In this work, a generic NoC power consumption estimation model was developed, based on the reception rates of the router buffers. Unlike other power estimation models, which estimate the power consumption from the application graph and the volume of information transmitted in the network, this model requires an RTL simulation of the evaluated network to obtain the reception rates at the input buffers of every router. The proposed model is more precise than the volume-based models, as the buffer reception rates carry information about network congestion and its effects. On the other hand, depending on the analyzed traffic, the NoC RTL simulation may have a high execution time.

Future work includes: (i) abstract modeling of the buffer reception rates; (ii) integration of the model into an MPSoC platform; (iii) development of a low power router architecture.

#### 7. References

- [1] Penolazzi, S.; Jantsch, A. "A High Level Power Model for the Nostrum NoC". In: *EUROMICRO*, 2006, pp. 673-676.
- [2] Hu, J.; Marculescu, R. "Energy-aware mapping for tile-based NoC architectures under performance constraints". In: ASP-DAC, 2003, pp. 233-239.
- [3] Ye, T.; Benini, L.; De Micheli, G. "Analysis of Power Consumption on Switch Fabrics in Network Routers". In: DAC, 2002, pp. 524-529.
- [4] Chang, K.; Shen, J.; Chen, T. "A Low-Power Crossroad Switch Architecture and Its Core Placement for Network-On-Chip". In: ISLPED, 2005, pp.375-380.
- [5] Banerjee, N.; Vellanki, P.; Chatha, K. "A Power and Performance Model for Network-on-Chip Architectures". In: DATE, 2004, pp.1250-1255.
- [6] Palma, J.; et al. "Mapping Embedded Systems onto NoCs: The Traffic Effect on Dynamic Energy Estimation". In: SBCCI, 2005, pp. 196-201.
- [7] Bjerregaard, T.; Mahadevan, S. "A survey of research and practices of Network-on-chip". ACM Computing Surveys, v.38(1), 2006, pp. 1-51.
- [8] Moraes, F.; et al. "Hermes: an Infrastructure for Low Area Overhead Packet-switching Networks on Chip". Integration the VLSI Journal, v.38(1), Oct. 2004, pp. 69-93.
- [9] The PrimeTime® Static Timing Analysis (STA). http://www.synopsys.com/products/analysis/primetime\_ds.html, 2007, Nov. 2007.
- [10] Atlas: An Environment for NoC Generation and Evaluation. http://www.inf.pucrs.br/~gaph/AtlasHtml/AtlasIndex\_us.html, 2007, Nov. 2007.