# A Low-Power Fat Tree-based Optical Network-on-Chip for Multiprocessor System-on-Chip

Huaxi Gu<sup>1</sup>, Jiang Xu<sup>1</sup>, Wei Zhang<sup>2</sup>
1. ECE, Hong Kong University of Science and Technology, Hong Kong, China
2. EE, Princeton University, NJ, USA
{eeguhx, jiang.xu}@ust.hk, weiz@princeton.edu

Abstract: Multiprocessor system-on-chip (MPSoC) is an attractive platform for high-performance applications. Networks-on-Chip (NoCs) can improve the on-chip communication bandwidth of MPSoCs. However, traditional interconnects consume a significant amount of power to deliver the even higher communication bandwidth that will be required in the near future. Optical NoCs are based on CMOS-compatible optical waveguides and microresonators, and promise significant bandwidth and power advantages. This paper proposes a fat tree-based optical NoC (FONoC), including its topology, floorplan, protocols, and a low-power and low-cost optical router, the optical turnaround router (OTAR). Different from other optical NoCs, FONoC does not require building a separate electronic NoC for network control. It carries both payload data and network control data on the same optical network, using circuit switching for the former and packet switching for the latter. The FONoC protocols are designed to minimize network control data and the related power consumption. An optimized turnaround routing algorithm is designed to utilize the low-power feature of OTAR, which can passively route packets without powering on any microresonator in 40% of all cases. Compared with other optical routers, OTAR has the lowest optical power loss and uses the lowest number of microresonators. An analytical model is developed to characterize the power consumption of FONoC. We compare the power consumption of FONoC with a matched electronic NoC in 45 nm, and show that FONoC can save 87% power compared with the electronic NoC on a 64-core MPSoC. We simulate the FONoC for the 64-core MPSoC and show the end-to-end delay and network throughput under different offered loads and packet sizes.

#### 1. Introduction

As the number of transistors available on a single chip increases to billions or even larger numbers, multiprocessor system-on-chip (MPSoC) is becoming an attractive choice for high-performance and low-power applications [1]. Traditional on-chip communication architectures for MPSoC face several issues, such as poor scalability, limited bandwidth, and high power consumption [2][3]. Networks-on-chip (NoCs) relieve MPSoC of these issues by using modern communication and networking theories. Many NoCs have been studied, and most of them are based on metallic interconnects and electronic routers [4][5][6][7][8][9]. As new applications continuously push the limits of MPSoC, the conventional metallic interconnects and electronic routers gradually become the bottlenecks of NoC performance due to the limited bandwidth, long delay, large area, high power consumption, and crosstalk noise [10][11].

Optical NoCs use silicon-based optical interconnects and routers, which are compatible with CMOS technologies [12]. Studies show that optical NoC is a promising candidate to achieve significantly higher bandwidth, lower power, lower interference, and lower delay compared with electronic NoCs [13]. Optical interconnects have demonstrated their strengths in multicomputer systems, on-board inter-chip interconnects, and the switching fabrics of Internet routers. Silicon-based optical waveguides can be used to build on-chip optical interconnects [14]. The progress in photonic technologies, especially the development of microresonators, makes optical on-chip routers possible [15]. Microresonators can be fabricated on silicon-on-insulator (SOI) substrates, which have been used for CMOS-based high-performance low-leakage SoCs. Microresonators as small as 3µm in diameter have been demonstrated [16].

Several optical NoCs and optical routers that use microresonators have been proposed. A. Shacham et al. proposed an optical NoC [10]. The optical NoC uses an augmented torus network to transmit payload data, while network control data are transmitted through a separate electronic network. It is built from 4x4 optical routers, injection switches, and ejection switches. The injection and ejection switches are used for local packet injection and ejection. M. Briere et al. proposed a multistage optical router called $\lambda$-router [11]. The $\lambda$-router uses a passive switching fabric and wavelength-division multiplexing (WDM) technology. An NxN $\lambda$-router needs N wavelengths and multiple basic 2x2 switching elements to achieve nonblocking switching. A. W. Poon et al. proposed a nonblocking optical router based on an optimized crossbar for 2D mesh optical NoC [17]. Each port of the router is aligned to its corresponding direction to reduce the waveguide crossings around the switching fabric. We proposed an optical router which significantly reduces the cost and optical power loss of 2D mesh/torus optical NoCs [18]. Previous optical NoC and router studies have concentrated on 2D topologies, such as mesh and torus.

In this paper, we propose a new optical NoC, FONoC (fat tree-based optical NoC), including its topology, protocols, as well as a low-power and low-cost optical router, OTAR (optical turnaround router). Different from previous optical NoCs, FONoC does not require building a separate electronic NoC. It transmits both payload data and network control data on the same optical network. FONoC is based on the fat tree, which is a hierarchical multistage network. The fat tree has been used by multi-computer systems [19]. It has also attracted the attention of electronic NoC studies [20][21][22]. While electronic fat tree-based NoCs use packet switching for both payload data and network control data, FONoC uses circuit switching for payload data and packet switching for network control data. The protocols of FONoC minimize the network control data and the related power consumed by optical-electronic conversions. An optimized turnaround routing algorithm is designed to utilize the minimized network control data and a low-power feature of OTAR, which can passively route packets without powering on any microresonator in 40% of cases. An analytical model is developed to assess the power consumption of FONoC. Based on the analytical model and SPICE simulations, we compare FONoC with a matched electronic NoC in 45nm. We simulate the FONoC for the 64-core MPSoC and show its performance under different offered loads and packet sizes.

The rest of the paper is organized as follows. Section 2 describes the optical router proposed for FONoC. Section 3 details FONoC, including the topology, floorplan, and protocols. Section 4 evaluates and analyzes the power consumption, optical power loss, and network performance of FONoC. Conclusions are drawn in Section 5.

## 2. Optical Turnaround Router for FONoC

OTAR (optical turnaround router) is the key component of FONoC. It implements the routing function. OTAR switches packets from an input port to an output port using a switching fabric, which is composed of basic switching elements. OTAR uses two types of basic switching elements which are based on microresonators. We will introduce the working principles of the microresonator and switching elements before detailing the router.



Figure 1. Switching elements

## 2.1. Microresonator and switching elements

The two switching elements used by OTAR are crossing and parallel elements, which implement the basic 1x2 switching function (Figure 1). Both switching elements consist of a microresonator and two waveguides. The parallel element does not have any waveguide crossing, and hence no crossing insertion loss. The resonance wavelength of the microresonator can be controlled by voltage. While powered off, the microresonator has an off-state resonance wavelength $\lambda_{\rm off}$, which is determined by the material and internal structure of the microresonator. When the microresonator is powered on, the resonance wavelength changes to the on-state resonance wavelength $\lambda_{on}$. If the wavelength of an optical signal differs from the current resonance wavelength, the signal is directed to the through port. Otherwise, the signal is routed to the drop port. By powering the microresonator on or off, the basic switching elements can be controlled to switch a packet to either the drop port or the through port. The switching time of the microresonator is small; a 30ps switching time has been demonstrated [15].
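The drop/through behaviour described above can be sketched as a small model. This is only an illustration: the class name, the tolerance, and the wavelength values used in the example are placeholders chosen here, not figures from the paper.

```python
from dataclasses import dataclass


@dataclass
class SwitchingElement:
    """Illustrative 1x2 switching element built around one microresonator."""

    lambda_off: float      # off-state resonance wavelength (placeholder units: nm)
    lambda_on: float       # on-state resonance wavelength (placeholder units: nm)
    powered: bool = False  # whether the microresonator is currently powered on
    tolerance: float = 1e-6  # assumed matching tolerance, not from the paper

    def resonance(self) -> float:
        """Current resonance wavelength, which depends on the power state."""
        return self.lambda_on if self.powered else self.lambda_off

    def route(self, wavelength: float) -> str:
        """Return 'drop' when the signal is on resonance, otherwise 'through'."""
        on_resonance = abs(wavelength - self.resonance()) <= self.tolerance
        return "drop" if on_resonance else "through"


# Example with arbitrary placeholder wavelengths.
element = SwitchingElement(lambda_off=1550.0, lambda_on=1549.0)
unpowered_port = element.route(1549.0)  # signal at lambda_on, resonator off
element.powered = True
powered_port = element.route(1549.0)    # same signal, resonator on
```

With the resonator off, a signal at $\lambda_{on}$ passes to the through port; powering the resonator on moves the resonance to $\lambda_{on}$ and the same signal is dropped.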

#### 2.2. Traditional switching fabrics

The switching fabric of an optical router can be implemented by the traditional fully-connected crossbar. An nxn optical router requires an nxn crossbar, which is composed of $n^2$ microresonators and 2n crossing waveguides. Figure 2a shows a 4x4 fully-connected crossbar, which has four input ports and four output ports. The fully-connected crossbar can be optimized based on the routing algorithm used by an optical router. The turnaround routing algorithm has been favored by many fat tree-based networks [23][24]. It is also called the least common ancestor routing algorithm. In turnaround routing, a packet is first routed upstream until it reaches the common ancestor node of its source and destination; then, the packet is routed downstream to the destination. Turnaround routing is a minimal path routing algorithm and is free of deadlock and livelock. In addition, it is a low-complexity adaptive algorithm that does not use any global information. These features make the turnaround routing algorithm particularly suitable for optical NoCs, which require both low latency and low cost at the same time. Some microresonators can be removed from the fully-connected crossbar based on the turnaround routing algorithm (Figure 2b). Compared with the fully-connected crossbar, the optimized crossbar saves six microresonators, but still has the same number of waveguide crossings. The optimized crossbar does not improve the optical power loss or power consumption compared with the fully-connected crossbar.
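To make the "least common ancestor" idea concrete, the following sketch computes how far a packet must climb before turning around. It assumes processors are numbered 0 to k-1 so that each level-y ancestor covers an aligned block of $2^y$ consecutive processors; this numbering and both function names are assumptions made for illustration.

```python
def common_ancestor_level(src: int, dst: int) -> int:
    """Lowest tree level at which src and dst share an ancestor router.

    Two processors sit under the same level-y ancestor when their indices
    agree on every bit above the lowest y bits, so the level is the
    position of the highest differing bit.
    """
    if src == dst:
        raise ValueError("source and destination must differ")
    return (src ^ dst).bit_length()


def routers_on_path(src: int, dst: int) -> int:
    """Routers visited: y on the way up (levels 1..y), y-1 on the way down."""
    y = common_ancestor_level(src, dst)
    return 2 * y - 1


# Neighbouring processors turn around at level 1; in a 64-processor tree,
# processors 0 and 63 only meet at level 6.
near = common_ancestor_level(0, 1)
far = common_ancestor_level(0, 63)
```

Because the path never climbs higher than this level, turnaround routing is minimal under the stated numbering assumption.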



Figure 2. 4x4 crossbar-based switching fabrics

#### 2.3. Optical turnaround router

We propose a new router, OTAR, for FONoC (Figure 3). OTAR is a 4x4 optical router using turnaround routing algorithm. It consists of an optical switching fabric, a control unit, and four control interfaces (CI). The switching fabric uses only six microresonators and four waveguides. The control unit uses electrical signals to configure the switching fabric according to the routing requirement of each packet. The control interfaces inject and eject control packets to and from optical waveguides.

OTAR has four bidirectional ports, which are called UP right, UP left, DOWN right, and DOWN left respectively. OTAR has a low-power feature: it can passively route packets which travel on the same side. Packets travelling between UP left and DOWN left, as well as between UP right and DOWN right, do not require any microresonator to be powered on. These account for 40% of all cases. The four ports are aligned to their intended directions, and the input and output of each port are also properly aligned. The microresonators in OTAR are identical, and have the same on-state and off-state resonance wavelengths, $\lambda_{on}$ and $\lambda_{off}$. OTAR uses the wavelength $\lambda_{on}$ to transmit the payload packets which carry payload data, and $\lambda_{off}$ to transmit control packets which carry network control data.
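One plausible way to arrive at the 40% figure is to enumerate the directed input/output port pairs that turnaround routing can use and count the same-side ones. The sketch below assumes that "all cases" means these pairs, each weighted equally, and that U-turns and UP-to-UP turns are the excluded ones; the helper names are hypothetical.

```python
from itertools import product

PORTS = ["UP_left", "UP_right", "DOWN_left", "DOWN_right"]


def is_legal(inp: str, out: str) -> bool:
    """Pairs usable under turnaround routing (assumed rule set)."""
    if inp == out:
        return False  # U-turn back out of the arrival port
    if inp.startswith("UP") and out.startswith("UP"):
        return False  # a packet moving downward never turns back upward
    return True


def is_passive(inp: str, out: str) -> bool:
    """Same-side pairs, e.g. UP_left <-> DOWN_left, need no powered resonator."""
    return inp != out and inp.split("_")[1] == out.split("_")[1]


legal_pairs = [(i, o) for i, o in product(PORTS, repeat=2) if is_legal(i, o)]
passive_pairs = [pair for pair in legal_pairs if is_passive(*pair)]
passive_fraction = len(passive_pairs) / len(legal_pairs)
active_pairs = len(legal_pairs) - len(passive_pairs)
```

Under these assumptions there are ten legal pairs, four of which are passive, and the remaining six active pairs match the six microresonators in the OTAR switching fabric.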



Figure 3. Optical turnaround router

The switching fabric implements a 4x4 switching function for the four bidirectional ports. It is designed to minimize waveguide crossings. The U-turn function is not implemented because the routing algorithm does not use it. Two unnecessary turns are also eliminated, since payload packets do not make turns when they flow down the fat tree in turnaround routing. OTAR is strictly non-blocking when using the turnaround routing algorithm, which can be proved by enumerating all cases. The non-blocking property helps to increase network throughput.

The control unit processes the control packets and configures the optical switching fabric. Control packets are used to set up and maintain optical paths for payload packets, and are processed in the electronic domain. The control unit is built from CMOS transistors and uses electrical signals to power each microresonator on and off according to the routing requirement of each packet. It uses an optimized routing algorithm, which we will describe in the next section. Each port of OTAR has a control interface. The control interface includes two parallel switching elements, an optical-electronic converter (OE), and an electronic-optical converter (EO). The parallel switching elements minimize the optical loss. The OE converts optical control packets into electronic signals, and the EO does the reverse conversion. The microresonators in the control interface are always in the off-state and are identical to those in the optical switching fabric. Their off-state resonance wavelength $\lambda_{\text{off}}$ is used to transmit control packets.

## 3. Fat Tree-based Optical NoC

We propose a new optical NoC, FONoC (fat tree-based optical NoC), for MPSoC including its topology, floorplan, and protocols. Different from other optical NoCs, FONoC transmits both payload packets and control packets on the same optical network. This saves the cost for building a separate electronic NoC for control packets. The hierarchical network topology of FONoC makes it possible to connect the FONoCs of multiple MPSoCs and other chips, such as off-chip memories, into an inter-chip optical network and form a more powerful multiprocessor system.

#### 3.1. Topology and floorplan

FONoC is based on a fat tree that connects OTARs and processor cores (Figure 4). It is a non-blocking network, and provides path diversity to improve performance. Processors are connected to OTARs by optical-electronic and electronic-optical interfaces (OE-EO), which convert signals between the optical and electronic domains. FONoC(l,k) connects k processors using an *l*-level fat tree. There are k processors at level 0 and k/2 OTARs at each of the other levels. To connect k processors, the number of network levels required is $l = \log_2 k + 1$. When connecting with other MPSoCs and off-chip memories, OTARs at the topmost level route the packets from FONoC to an inter-chip optical network. In this case, the number of OTARs required is $\frac{k}{2}\log_2 k$. If an inter-chip optical network is not used, OTARs at the topmost level can be omitted. In this case, only $\frac{k}{2}(\log_2 k - 1)$ OTARs are required. In Figure 4, each optical interconnect is bidirectional, and includes two optical waveguides.
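The level and router counts above are easy to evaluate for a given k. The short sketch below does so for a 64-processor network; the function names are illustrative, and k is assumed to be a power of two.

```python
def log2_exact(k: int) -> int:
    """Integer log2 of k, requiring k to be a power of two (assumed here)."""
    if k < 2 or k & (k - 1):
        raise ValueError("k must be a power of two and at least 2")
    return k.bit_length() - 1


def fonoc_levels(k: int) -> int:
    """Number of network levels, l = log2(k) + 1 (level 0 holds the processors)."""
    return log2_exact(k) + 1


def otar_count(k: int, inter_chip: bool) -> int:
    """Routers needed: (k/2)*log2(k), or (k/2)*(log2(k)-1) without the top level."""
    n = log2_exact(k)
    return (k // 2) * (n if inter_chip else n - 1)


levels_64 = fonoc_levels(64)
with_top_64 = otar_count(64, inter_chip=True)
without_top_64 = otar_count(64, inter_chip=False)
```

For k = 64 this gives 7 levels, 192 OTARs when the topmost level connects to an inter-chip network, and 160 OTARs when that level is omitted.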



Figure 4. FONoC topology for a 64-core MPSoC



Figure 5. FONoC floorplan for the 64-core MPSoC

The corresponding floorplan of FONoC for a 64-core MPSoC is shown in Figure 5. Starting from level 2, multiple OTARs are grouped into a router cluster for floorplanning purposes. The router clusters are connected by optical interconnects. FONoC can be built on the same device layer as the processors. To reduce chip area, 3D chip technology can be used to fabricate FONoC on a separate device layer and stack it onto the device layer for processor cores [25].

#### 3.2. FONoC protocols

FONoC uses connection-oriented circuit switching to transfer payload packets and packet switching for control packets. Lacking effective optical buffers, optical NoCs that use packet switching convert signals from the optical domain to the electronic domain for buffering, and convert them back to the optical domain for transmission. The conversions consume a lot of power. FONoC uses packet switching only for control packets, because network control data are critical for network performance and are usually processed and shared by the routers along their path.

Before payload packets can be transmitted in FONoC, an optical path is first reserved from a source processor to a destination processor. The path consists of a series of OTARs and interconnects, and is managed by three control packets: SETUP, ACK, and RELEASE. A SETUP packet is issued by the source and requests OTARs to reserve a path. Each OTAR finds and reserves a path based on an optimized turnaround routing algorithm, which will be described shortly. The SETUP packet has $l_{setup}$ bits and only contains the destination address. For FONoC with k processors, $l_{setup}$ is $\log_2 k$. When the SETUP packet reaches the destination, an ACK packet is sent back to the source and requests the OTARs to power on the microresonators along the path. Once it receives the ACK, the source sends the payload packets. Along with the last bit of the payload packets, the source sends a RELEASE packet to free the reserved path. No buffer is required for payload packets. Once the connection is established, the latency and bandwidth are guaranteed.
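The SETUP/ACK/RELEASE handshake can be viewed as a three-state machine per reserved path. The sketch below is one possible reading: the state names, the transition table, and the error handling are assumptions for illustration, and the paper does not specify what happens when a SETUP request is blocked.

```python
from enum import Enum, auto


class PathState(Enum):
    IDLE = auto()      # no path reserved
    RESERVED = auto()  # SETUP has reserved the path
    ACTIVE = auto()    # ACK has powered on the microresonators along the path


# Only the happy-path transitions described in the text are modelled.
TRANSITIONS = {
    (PathState.IDLE, "SETUP"): PathState.RESERVED,
    (PathState.RESERVED, "ACK"): PathState.ACTIVE,
    (PathState.ACTIVE, "RELEASE"): PathState.IDLE,
}


def step(state: PathState, packet: str) -> PathState:
    """Apply one control packet to the path state."""
    try:
        return TRANSITIONS[(state, packet)]
    except KeyError:
        raise ValueError(f"unexpected {packet} in state {state.name}") from None


def setup_length_bits(k: int) -> int:
    """SETUP length: log2(k) bits of destination address (k a power of two)."""
    return (k - 1).bit_length()


state = PathState.IDLE
for packet in ("SETUP", "ACK", "RELEASE"):
    state = step(state, packet)
```

After the full sequence the path returns to the idle state, and for a 64-processor FONoC the SETUP packet carries 6 address bits.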

We optimize the traditional turnaround routing algorithm for FONoC, and call it EETAR (energy-efficient turnaround routing). EETAR utilizes the special feature of OTAR. It is an adaptive and distributed routing algorithm. In EETAR, a packet first climbs the tree. Each router chooses an available port to move the packet upward until it arrives at a router which is the common ancestor of the source and destination. Then, the packet moves downward along a deterministic path. EETAR takes the power consumption of microresonators into account. It chooses to passively route packets whenever possible. For example, EETAR tries to route packets coming from the DOWN left port of OTAR to the UP left port, and so avoids powering on any microresonator. This not only reduces power consumption but also avoids the high insertion loss of microresonators. Moreover, EETAR makes routing decisions without using source addresses. This halves the length of SETUP packets, and hence reduces the power consumption at the control interfaces of OTAR. In the best case, EETAR can save half of the power consumed by a packet compared with traditional turnaround routing. The pseudo-code of EETAR is as follows.

We define a node in FONoC(l,k) as either a processor or a router. Node (x, y) is the x-th node at the y-th level (Figure 4). Except for nodes at level 0, each node connects to two parent nodes and two child nodes through its UP left and UP right ports, which are labeled $p_{up}^0$ and $p_{up}^1$, and its DOWN left and DOWN right ports, which are labeled $p_{down}^0$ and $p_{down}^1$.

#### /\* EETAR Algorithm \*/

INPUT destination $(x_d, 0)$, current node $(x_c, y_c)$, input port $p_{in}$

IF $U \le x_d \le U + 2^{y_c} - 1$, where $U = 2^{y_c} \cdot \lfloor x_c \ \mathrm{DIV} \ 2^{y_c - 1} \rfloor$ /\* make turns and move downward \*/

$p_{out} = p_{down}^{i}$, where $i = (x_d \ \mathrm{SHIFTRIGHT} \ (y_c - 1) \ \text{bits}) \ \mathrm{MOD} \ 2$

ELSE /\* move upward \*/

IF port $p_{up}^{p_{in}}$ is available, $p_{out} = p_{up}^{p_{in}}$

ELSE $p_{out} = p_{up}^{1-p_{in}}$

RETURN output port $p_{out}$
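The pseudo-code translates almost line for line into Python. In this sketch the return format, the `up_available` argument, and the convention that `p_in` is 0 for the left port and 1 for the right port are assumptions; like the pseudo-code, it does not handle the case where both UP ports are busy.

```python
from typing import Sequence, Tuple


def eetar(x_d: int, x_c: int, y_c: int, p_in: int,
          up_available: Sequence[bool] = (True, True)) -> Tuple[str, int]:
    """Routing decision at router (x_c, y_c) for destination processor x_d.

    Returns ("down", i) or ("up", i), where i is 0 for left and 1 for right.
    """
    if y_c < 1:
        raise ValueError("routers live at level 1 and above")
    span = 1 << y_c                       # 2^{y_c} processors below this router group
    base = span * (x_c // (1 << (y_c - 1)))  # U in the pseudo-code
    if base <= x_d <= base + span - 1:
        # Common ancestor reached: turn and follow the deterministic path down.
        i = (x_d >> (y_c - 1)) % 2
        return ("down", i)
    # Still climbing: prefer the same-side UP port so the hop stays passive.
    if up_available[p_in]:
        return ("up", p_in)
    return ("up", 1 - p_in)


# Destination 5 is outside router (0, 1)'s subtree, so the packet climbs;
# destination 1 is inside it, so the packet turns down to the right child.
climb = eetar(x_d=5, x_c=0, y_c=1, p_in=0)
turn = eetar(x_d=1, x_c=0, y_c=1, p_in=0)
```

Note that the decision uses only the destination address and local port state, which is what allows the SETUP packet to omit the source address.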

## 4. Comparison and Analysis

We analyze the power consumption, optical power loss, and network performance for FONoC. The power consumption of FONoC is compared with a matched electronic NoC. The optical power loss of OTAR is compared with three other optical routers under different conditions. We simulate and compare the network performance of the FONoC for the 64-core MPSoC under different offered loads and packet sizes.

#### 4.1. Power consumption

Power consumption is a critical aspect of NoC design. For high-performance computing, low power consumption can reduce the costs related to packaging, cooling, and system integration. FONoC consumes power in several ways. OE-EO interfaces consume power to generate, modulate, and detect optical signals. Optical routers consume power to route packets. Control units need power to make decisions for control packets. We develop an analytical model to characterize the power consumption of FONoC.

 $E_{PK}^o$  is defined as the energy consumed to transmit a payload packet. It has two portions as shown in equation (1), where  $E_{payload}^o$  is the energy consumed by a payload packet directly, and  $E_{ctrl}$  is control overhead.

$$E_{PK}^{o} = E_{payload}^{o} + E_{ctrl} \tag{1}$$

$E^o_{payload}$ can be calculated by equation (2), where m is the number of microresonators in the on-state while transferring the payload packet, $P^o_{mr}$ is the average power consumed by a microresonator when it is in the on-state, $L^o_{payload}$ is the payload packet size, R is the data rate of the EO-OE interfaces, d is the distance traveled by the payload packet, c is the speed of light in vacuum, n is the refractive index of the silicon optical waveguide, and $E^o_{oeeo}$ is the energy consumed for 1-bit OE and EO conversions.

$$E_{payload}^{o} = mP_{mr}^{o} \cdot \left(\frac{L_{payload}^{o}}{R} + \frac{d \cdot n}{c}\right) + E_{oeeo}^{o} \cdot L_{payload}^{o} \tag{2}$$

$E_{ctrl}$ can be calculated by equation (3). Additional variables are defined as follows. $L_{ctrl}^o$ is the total size of the control packets used, h is the number of hops to transfer the payload packet, and $E_{cu}^{e}$ is the average energy required by the control unit to make decisions for the payload packet.

$$E_{ctrl} = E_{oeeo}^{o} \cdot L_{ctrl}^{o} \cdot h + E_{cu}^{e} \cdot (h+1) \tag{3}$$

The power consumption of a matched electronic fat tree-based NoC is analyzed in a similar way. The electronic NoC has the same topology as FONoC and uses the turnaround routing algorithm. We designed and simulated a 4x4 input-buffered pipelined electronic router for the electronic NoC based on the 45nm Nangate open cell library and Predictive Technology Model [26]. Each port of the electronic router is 32 bits wide. The switching fabric of the electronic router is a crossbar. We assume each processor core is 1mm by 1mm. The metal wires in the electronic NoC are modeled as fine-grained lumped RLC networks. The coupling capacitances among adjacent wires are considered. Since mutual inductance has a significant effect in deep submicron process technologies, it is considered up to the third neighboring wires. The electronic router and metal wires are simulated in Cadence Spectre. Simulation results show that on average the crossbar consumes 0.06pJ/bit, the input buffer consumes 0.003pJ/bit, and the control unit consumes 1.5pJ to make decisions for each packet. We assume the data rates at the interfaces of FONoC and the electronic NoC are both 12.5Gbps, which has been demonstrated [27]. The average size of payload data is 512 bits. When interfacing with 45nm CMOS circuits, the energy consumed by OE and EO conversions is estimated to be 1 pJ/bit, which is scaled down from an 80nm design [28]. OTAR uses the same control unit as the electronic router. In the on-state, a microresonator needs a DC current and consumes less than 20µW [17].
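Equations (1) to (3) can be evaluated directly once the parameters are fixed. The sketch below plugs in the values quoted above (20 µW per microresonator as an upper bound, 512-bit payloads, 12.5 Gbps, 1 pJ/bit conversion energy, 1.5 pJ per control decision). The remaining inputs, m, d, n, $L^o_{ctrl}$, and h, are placeholder assumptions chosen only to exercise the formulas, so the result should not be read as reproducing the paper's numbers.

```python
C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s


def payload_energy(m, p_mr, l_payload, rate, d, n, e_oeeo):
    """Equation (2): energy (J) spent directly on one payload packet."""
    hold_time = l_payload / rate + d * n / C_VACUUM  # serialization + propagation
    return m * p_mr * hold_time + e_oeeo * l_payload


def control_energy(e_oeeo, l_ctrl, h, e_cu):
    """Equation (3): control overhead (J) for one payload packet."""
    return e_oeeo * l_ctrl * h + e_cu * (h + 1)


def packet_energy(m, p_mr, l_payload, rate, d, n, e_oeeo, l_ctrl, h, e_cu):
    """Equation (1): total energy (J) to transmit one payload packet."""
    return (payload_energy(m, p_mr, l_payload, rate, d, n, e_oeeo)
            + control_energy(e_oeeo, l_ctrl, h, e_cu))


example_energy = packet_energy(
    m=3,              # assumed number of on-state microresonators
    p_mr=20e-6,       # W, upper bound quoted in the text
    l_payload=512,    # bits, from the text
    rate=12.5e9,      # bit/s, from the text
    d=0.01,           # m, assumed path length
    n=3.5,            # assumed refractive index of the waveguide
    e_oeeo=1e-12,     # J/bit, from the text
    l_ctrl=18,        # bits, assumed total control packet size
    h=5,              # assumed hop count
    e_cu=1.5e-12,     # J per decision, from the text
)
```

With parameters of this order, the conversion term $E^o_{oeeo} \cdot L^o_{payload}$ dominates the payload energy, while the microresonator term contributes only a few picojoules.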

We compare the power consumed by FONoC and the electronic NoC (ENoC) when connecting different numbers of processors and using different packet sizes. The results show that FONoC consumes significantly less power than the electronic NoC. For example, for a 64-core MPSoC and 64-byte packets, FONoC consumes only 0.71nJ/packet, while the electronic NoC consumes 5.5nJ/packet. That is an 87% power saving. The results show that the power saving could increase to 93% when using 128-byte packets in a 1024-core MPSoC.
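As a quick check, the quoted saving for the 64-core case follows directly from the two per-packet figures:

$$1 - \frac{0.71\ \text{nJ}}{5.5\ \text{nJ}} \approx 1 - 0.129 = 0.871 \approx 87\%$$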

### 4.2. Optical power loss

We analyze and compare the optical power loss of OTAR with three other optical routers including the fully-connected crossbar, optimized crossbar, and the 4x4 optical router proposed in [10], which is referred to as COR for clarity. In our comparison, we considered two major sources of optical power losses, the waveguide crossing insertion loss and microresonator insertion loss. The waveguide crossing insertion loss is 0.12dB per crossing [17], and the microresonator insertion loss is 0.5dB [29]. In an optical router, packets transferring between different input and output ports may encounter different losses. We analyze the maximum loss, minimum loss, and average loss of all possible cases (Figure 6). The results show that OTAR is the best in all comparisons. OTAR has 4% less minimum loss, 23% less average loss, and 19% less maximum loss than the optimized crossbar. COR has the same maximum loss as OTAR, but has higher average and minimum losses.
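The loss of a single path through a router can be estimated by summing the two contributions named above. In the sketch below, the per-path crossing and microresonator counts are hypothetical, since the actual counts depend on the router layouts shown in the figures; the function names are also illustrative.

```python
CROSSING_LOSS_DB = 0.12        # per waveguide crossing, from the text
MICRORESONATOR_LOSS_DB = 0.5   # per microresonator insertion, from the text


def path_loss_db(crossings: int, microresonators: int) -> float:
    """Loss (dB) of one input-to-output path through a router."""
    return crossings * CROSSING_LOSS_DB + microresonators * MICRORESONATOR_LOSS_DB


def loss_stats(paths):
    """Minimum, average, and maximum loss over (crossings, microresonators) pairs."""
    losses = [path_loss_db(c, r) for c, r in paths]
    return min(losses), sum(losses) / len(losses), max(losses)


# Hypothetical paths: a passive straight-through path and a switched path
# that passes two crossings and one microresonator.
hypothetical_paths = [(0, 0), (2, 1)]
min_loss, avg_loss, max_loss = loss_stats(hypothetical_paths)
```

Because a microresonator insertion costs roughly four times as much as a crossing, paths that avoid switching through a microresonator have a clear advantage in this simple model.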



Figure 6. Comparison of optical power loss (dB)

The number of microresonators used by an optical router indicates the area cost. While the optimized crossbar uses fewer microresonators than the fully-connected crossbar, they have the same losses. OTAR uses six microresonators; the fully-connected crossbar uses 16; the optimized crossbar uses ten; and COR uses eight. OTAR uses the lowest number of microresonators, 40% fewer than the optimized crossbar.



Figure 7. End-to-end delay and network throughput of FONoC

#### 4.3. Network performance

We simulate the FONoC for the 64-core MPSoC and study the network performance in terms of end-to-end (ETE) delay and network throughput. The ETE delay is the average time between the generation of a packet and its arrival at the destination. It is the sum of the connection-oriented path-setup time and the time used to transmit optical packets. We simulated a range of packet sizes used by typical MPSoC applications. We assumed a moderate bandwidth of 12.5Gbps for each interconnect. In the simulations, processors generate packets independently, at time intervals following a negative exponential distribution. We used the uniform traffic pattern, i.e. each processor sends packets to all other processors with the same probability. FONoC is simulated in a network simulator, OPNET [30].
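The traffic model described above, exponentially distributed inter-arrival times with uniformly chosen destinations, can be sketched with the Python standard library. This is not the OPNET model used for the results; the function name, its parameters, and the example values are illustrative assumptions.

```python
import random


def generate_traffic(src, num_nodes, mean_interval, count, rng):
    """Return (time, destination) events for one source processor.

    Inter-arrival times follow a negative exponential distribution with the
    given mean; destinations are uniform over all processors except src.
    """
    events = []
    now = 0.0
    for _ in range(count):
        now += rng.expovariate(1.0 / mean_interval)
        dst = rng.randrange(num_nodes - 1)
        if dst >= src:
            dst += 1  # skip the source itself while keeping the draw uniform
        events.append((now, dst))
    return events


# Example: processor 3 in a 64-core system, 1 microsecond mean interval
# (an arbitrary value), with a seeded generator for repeatability.
events = generate_traffic(src=3, num_nodes=64, mean_interval=1e-6,
                          count=1000, rng=random.Random(0))
```

Each processor would run such a generator independently, which matches the independence assumption stated in the text.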

The ETE delay under different offered loads and packet sizes is shown in Figure 7. It shows that FONoC saturates at different loads with different packet sizes. The ETE delay is very low before the saturation load, and increases dramatically after it. For 32-byte packets, the ETE delay is 0.06 µs before the saturation load of 0.2, and goes up to 110 µs after it. Packets larger than 32 bytes have higher saturation loads. This is because fewer control packets are needed when using larger packets under the same offered load. Larger packets also have longer transmission times and cause longer inter-packet arrival gaps compared with smaller packets under the same offered load. Both effects help to reduce network contention during path setup, and lead to higher saturation loads. Figure 7 also shows the network throughput under different offered loads and packet sizes. Ideally, throughput should increase with the offered load. However, when the network becomes saturated, it cannot accept an offered load beyond its capacity. The results show that the throughput stays at a certain level after the saturation point.
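The effect of packet size on control traffic can be illustrated with a rough calculation. The sketch below assumes that the offered load is normalized to the 12.5 Gbps link rate and that every payload packet needs one SETUP request; the paper does not state the normalization, so both are assumptions.

```python
def setup_requests_per_second(offered_load: float, packet_bytes: int,
                              link_rate_bps: float = 12.5e9) -> float:
    """Path-setup requests per second generated by one source.

    Assumes offered_load is a fraction of the link rate and that each
    payload packet triggers exactly one SETUP request.
    """
    return offered_load * link_rate_bps / (packet_bytes * 8)


rate_32 = setup_requests_per_second(0.2, 32)
rate_64 = setup_requests_per_second(0.2, 64)
```

Under these assumptions, doubling the packet size at a fixed offered load halves the number of path-setup requests, which is consistent with larger packets saturating at higher loads.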

#### 5. Conclusions

This paper proposes FONoC, including its protocols, topology, floorplan, and a low-power and low-cost optical router, OTAR. FONoC carries payload data as well as network control data on the same optical network, using circuit switching for the former and packet switching for the latter. We analyze the power consumption, optical power loss, and network performance of FONoC. An analytical model is developed to assess the power consumption of FONoC. Based on the analytical model and SPICE simulations, we compare FONoC with a matched electronic NoC in 45nm. The results show that FONoC can save 87% power while achieving the same performance for a 64-core MPSoC. OTAR can passively route packets without powering on any microresonator in 40% of all cases. Compared with three other optical routers, OTAR has the lowest optical power loss and uses the fewest microresonators. We simulate the FONoC for a 64-core MPSoC and show the end-to-end delay and network throughput under different offered loads and packet sizes.

## 6. Acknowledgment

This work is partially supported by HKUST PDF and RGC of the Hong Kong Special Administrative Region, China.

#### References

- [1] L. Benini, G. De Micheli, "Networks on chip: A new paradigm for systems on chip design", *Design, Automation and Test in Europe Conference and Exhibition*, 2002.
- [2] M. Sgroi, M. Sheets, A. Mihal, K. Keutzer, S. Malik, J. Rabaey, A. Sangiovanni-Vincentelli, "Addressing the system-on-a-chip interconnect woes through communication-based design", *Design Automation Conference*, 2001.
- [3] V. Reyes, T. Bautista, G. Marrero, A. Núñez, W. Kruijtzer, "A multicast inter-task communication protocol for embedded multiprocessor systems," *Conference on Hardware-Software Codesign and System Synthesis*, 2005: 267-272.
- [4] W. Dally, B. Towles, "Route packets, not wires: On-chip interconnection networks", *Design Automation Conference*, 2001.
- [5] S. Kumar, A. Jantsch, J.P. Soininen, M. Forsell, M. Millberg, J. Öberg, K. Tiensyrjä, and A. Hemani, "A network on chip architecture and design methodology", *IEEE Computer Society Annual Symposium on VLSI*, 2002.
- [6] K. Goossens, J. Dielissen, A. Radulescu, "Æthereal network on chip: Concepts, architectures and implementations", *IEEE Design Test Comput*, Vol.22, No.5, pp414–421. 2005.
- [7] A. Kumar, L. S. Peh, P. Kundu, N. K. Jha, "Toward Ideal On-Chip Communication Using Express Virtual Channels," *IEEE Micro* 28(1): 80-90, 2008.
- [8] M. Amde, T. Felicijan, A. Efthymiou, D. Edwards, and L. Lavagno, "Asynchronous on-chip networks," in IEE Proceedings: Computers and Digital Techniques, 2005, pp. 273-283
- [9] J. Xu, W. Wolf, J. Henkel, and S. Chakradhar, "A Design Methodology for Application-Specific Networks-on-Chip", ACM Transactions on Embedded Computing Systems, July 2006.

- [10] A. Shacham, B.G. Lee, A. Biberman, K. Bergman, L.P. Carloni, "Photonic NoC for DMA Communications in Chip Multiprocessors", Hot Interconnects, 2007.
- [11] M. Briere, B. Girodias, et al, "System Level Assessment of an Optical NoC in an MPSoC Platform", *Design, Automation & Test in Europe Conference & Exhibition*, 2007.
- [12] G. Chen, H. Chen, M. Haurylau, N. A. Nelson, D. H. Albonesi, P. M. Fauchet, and E. G. Friedman, "Predictions of CMOS Compatible On-Chip Optical Interconnect," *Integration, the VLSI Journal*, Vol. 40, No. 4, pp. 434 - 446, July 2007.
- [13] A. Shacham, K. Bergman, L. P. Carloni, "The Case for Low-Power Photonic Networks on Chip," in *Design Automation Conference* 2007, pp. 132-13.
- [14] F. Xia, L. Sekaric, and Y. Vlasov, "Ultracompact optical buffers on a silicon chip," *Nature Photonics*, 65-71, 2007.
- [15] Q. Xu, B. Schmidt, S. Pradhan, M. Lipson, "Micrometre-scale silicon electro-optic modulator", Nature, Vol.435, No.7040, pp325-327, 2005.
- [16] B. E. Little, J. S. Foresi, G. Steinmeyer et al., "Ultra-compact Si-SiO<sub>2</sub> microring resonator optical channel dropping filters," *IEEE Photonics Technology Letters*, vol. 10, no. 4, pp. 549-551, 1998.
- [17] A. W. Poon, F. Xu, X. Luo, "Cascaded active silicon microresonator array cross-connect circuits for WDM networks-on-chip", in Proc. SPIE. Vol.6898, 2008.
- [18] H. Gu, J. Xu, Z. Wang, "ODOR: a Microresonator-based High-performance Low-cost Router for Optical Networks-on-Chip", in Proceedings of International Conference on Hardware-Software Codesign and System Synthesis, 2008.
- [19] C. E. Leiserson, Z. S. Abuhamdeh, D. C. Douglas, C. R. Feynman, M. N. Ganmukhi, and et al., "The network architecture of the connection machine CM-5," in Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures, pp. 272–285, 1992.
- [20] H. Hossain, M. Akbar, and M. Islam, "Extended-butterfly fat tree interconnection (EFTI) architecture for network on chip," in IEEE Pacific Rim Conference on Communications, Computers and signal Processing, 2005, pp. 613-616.
- [21] Y. L. Jeang, W. H. Huang, and W. F. Fang, "A binary tree architecture for Application Specific Network on Chip (ASNOC) design," in *IEEE Asia-Pacific Conference on Circuits and Systems*, 2004, pp. 877-880.
- [22] A. Adriahantenaina, H. Charlery, A. Greiner, L. Mortiez, and C. A. Zeferino, "SPIN: a scalable, packet switched, on-chip micronetwork," in *Design, Automation and Test in Europe Conference and Exhibition(DATE)*, 2003, pp. 70-73.
- [23] P. P. Pande, C. Grecu, M. Jones, A. Ivanov, and R. Saleh, "Performance evaluation and design trade-offs for network-on-chip interconnect architectures," *IEEE Transactions on Computers*, vol. 54, pp. 1025-1040, Aug 2005.
- [24] V. Strumpen and A. Krishnamurthy, "A collision model for randomized routing in fat-tree networks," Journal of Parallel and Distributed Computing, vol. 65, pp. 1007-1021, 2005.
- [25] J. Kim, C. Nicopoulos, D. Park, R. Das, Yuan Xie, N. Vijaykrishnan, C. Das. "A Novel Dimensionally-Decomposed Router for On-Chip Communication in 3D Architectures." Proceedings of the Annual International Symposium on Computer Architecture (ISCA), pp. 138-149, June 2007.
- [26] www.si2.org
- [27] Q. Xu, S. Manipatruni, B. Schmidt, J. Shakya, M. Lipson, "12.5 Gbit/s carrier injection-based silicon microring silicon modulators", *Optics Express*, Vol.15, No.2, pp430–436, 2007.
- [28] C. Kromer, G. Sialm, C. Berger, T. Morf, M.L. Schmatz, F. Ellinger, et al., "A 100-mW 4×10 Gb/s transceiver in 80-nm CMOS for high-density optical interconnects," IEEE Journal of Solid-State Circuit 40 (2005) (12), pp. 2667–2679.
- [29] S. Xiao, M. H. Khan, H. Shen, and M. Qi, "Multiple-channel silicon micro-resonator based filters for WDM applications," *Optics Express*, vol. 15, pp. 7489-7498, 2007.
- [30] www.opnet.com