## COMS3008A:

# **Parallel Computing**

Lecture 2: Modelling Parallel Computation and Interconnection Networks

#### Hairong Bau

School of Computer Science & Applied Mathematics University of the Witwatersrand, Johannesburg

Semester 1 2025



### Contents

- Objectives
- Modelling Parallel Computation
  - PRAM
  - LMM and MMM
- Interconnection networks
  - Introduction
  - The classification of interconnection networks
  - Evaluating the interconnect networks






# **Objectives**

- Understand the basics of three random access machine (RAM) based parallel computing models including parallel RAM (PRAM), local memory machine (LMM) and modular memory machine (MMM).
- Understand the topologies and properties of various interconnection networks used for both shared memory and distributed memory parallel computers.






### **RAM**

- A computation model abstracts relevant properties of a computation from the irrelevant ones.
- Random access machine (RAM) is a sequential computation model. It consists of
  - A processing element (PE) (or processing unit (PU))
  - A memory



Figure: RAM model of computation. The memory M contains program instructions and data; the processing unit P executes instructions on data.

### **PRAM**

- A natural extension of RAM to parallel computation consists of multiple processing elements and a global memory of unbounded size that is uniformly accessible to all PEs.
- The generalization of RAM to parallel computing can be done in 3 different ways:
  - Parallel RAM (PRAM)
  - Local memory machine (LMM)
  - Modular memory machine (MMM)



#### PRAM



Figure: (a) PRAM model for parallel computation; (b) multiple PEs try to access the same memory location simultaneously.



- What happens if multiple PEs need to access the same memory location simultaneously? The accesses could all be reads, all be writes, or a mix of reads and writes.
- One solution is to serialize such contending accesses; however, it is then uncertain which access will happen first.
- Problem with PRAM: simultaneous accesses to a memory location could lead to unpredictable data in PEs, as well as in the memory location accessed.
- A number of PRAM variants have been proposed; they differ in which simultaneous accesses they allow and in how they avoid unpredictability.



- Exclusive read exclusive write PRAM (EREW-PRAM): Does not support simultaneous access to the same memory location; any access to any memory location must be exclusive.
- Concurrent read exclusive write PRAM (CREW-PRAM): Allows simultaneous reads from the same memory location, but writing to a memory location must be exclusive.
- Concurrent read concurrent write PRAM (CRCW-PRAM): Supports simultaneous reads from the same memory location, simultaneous writes to the same memory location, and simultaneous reads and writes to the same memory location.
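The three access rules can be illustrated with a small checker that decides whether one step of memory accesses is legal under a given variant. This is only a sketch: the function name `step_is_legal`, the `(pe, op, address)` tuple format, and the reading of "exclusive write" as "no other access to that location in the same step" are assumptions made for illustration.

```python
from collections import Counter

def step_is_legal(accesses, model):
    """Check one parallel step against a PRAM variant.

    accesses: list of (pe, op, address) tuples, op is "r" (read) or "w" (write).
    model: "EREW", "CREW" or "CRCW".
    """
    if model not in ("EREW", "CREW", "CRCW"):
        raise ValueError(f"unknown model: {model}")
    reads = Counter(addr for _, op, addr in accesses if op == "r")
    writes = Counter(addr for _, op, addr in accesses if op == "w")
    for addr in set(reads) | set(writes):
        r, w = reads[addr], writes[addr]
        # EREW: at most one access of any kind per location.
        if model == "EREW" and r + w > 1:
            return False
        # CREW: many readers are fine, but a write must be the only access.
        if model == "CREW" and w >= 1 and r + w > 1:
            return False
    # CRCW places no restriction on simultaneous access.
    return True

two_readers = [(0, "r", 5), (1, "r", 5)]
two_writers = [(0, "w", 5), (1, "w", 5)]
print(step_is_legal(two_readers, "EREW"))  # False
print(step_is_legal(two_readers, "CREW"))  # True
print(step_is_legal(two_writers, "CREW"))  # False
print(step_is_legal(two_writers, "CRCW"))  # True
```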



- CRCW-PRAM: The unpredictability is handled in different ways:
  - Consistent CRCW-PRAM: PEs may simultaneously write to the same memory location, but they must all write the same value.
  - Arbitrary CRCW-PRAM: PEs may simultaneously try to write to the same memory location (not necessarily the same value), but only one of them will succeed, and it is unpredictable which one will succeed.
  - Priority CRCW-PRAM: There is a priority order imposed on PEs; when several PEs try to write to the same memory location, only the one with the highest priority succeeds.
  - Fusion CRCW-PRAM: PEs may simultaneously try to write to the same memory location, but a particular operation is first performed on the fly on all the values, and only the result of that operation is written. The operation should be associative and commutative; examples include sum, product, max, min, logical AND and logical OR.
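The four ways of resolving concurrent writes can be sketched as a single function that decides what ends up in the memory cell. The function name `resolve_writes`, the policy labels, and the rule that the smallest PE index has the highest priority are assumptions chosen for this illustration; they are not fixed by the models themselves.

```python
import random
from functools import reduce
from operator import add

def resolve_writes(requests, policy, combine=add, rng=None):
    """Resolve simultaneous writes to one memory cell.

    requests: list of (pe_index, value) pairs, all aimed at the same cell.
    Returns the value that ends up in the cell.
    """
    if not requests:
        raise ValueError("no write requests")
    values = [value for _, value in requests]
    if policy == "consistent":
        # All PEs must write the same value, otherwise the step is illegal.
        if any(v != values[0] for v in values):
            raise ValueError("consistent CRCW requires identical values")
        return values[0]
    if policy == "arbitrary":
        # Exactly one writer succeeds; which one is unpredictable.
        rng = rng or random.Random()
        return rng.choice(values)
    if policy == "priority":
        # Assumption: the smallest PE index has the highest priority.
        return min(requests, key=lambda request: request[0])[1]
    if policy == "fusion":
        # Combine all values with an associative, commutative operation.
        return reduce(combine, values)
    raise ValueError(f"unknown policy: {policy}")

requests = [(2, 7), (0, 5), (3, 9)]
print(resolve_writes(requests, "priority"))     # 5
print(resolve_writes(requests, "fusion"))       # 21
print(resolve_writes(requests, "fusion", max))  # 9
```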



- Note that the restriction on simultaneous access is progressively relaxed from EREW-PRAM to CREW-PRAM to CRCW-PRAM.
- Each relaxation gives some gain in power, but not much: the gain is only of logarithmic order.






### LMM and MMM

- LMM: Each PE has its own local memory, and accessing this memory is fast. A PE can access non-local memory via the interconnection network.
- MMM: PEs have no local memory; each PE accesses the memory modules via the interconnection network.



Figure: (a) LMM; (b) MMM.






### Interconnection networks

- Interconnection networks are an important factor in the efficiency and scalability of parallel computers and programs.
- They provide mechanisms for data communication between processing nodes, or between processors and memory modules.



Figure: (a) Fully connected network; (b) A fully connected crossbar network

# Important factors of interconnection networks

- Routing: the process of choosing a path for traffic in an interconnection network;
- Flow control: the process of managing the rate of data transmission between two nodes, to avoid a scenario in which a fast sender overwhelms a slow receiver;
- Network topology: the arrangement of the various elements, such as communication nodes and channels, of an interconnection network.



# Some basic properties of interconnection networks

- An interconnection network can be represented as a graph G(V, E), where V is the set of nodes, and E is the set of links (or edges).
- Topological properties:
  - Node degree: the number of edges connected to a node
    - An interconnection network is regular if all nodes have the same node degree
  - Diameter
    - The number of communication nodes traversed by a packet from a source to the destination (along a path) is called the hop count
    - If there are multiple paths between a pair of source and destination nodes, the path with the **smallest** hop count gives the minimum hop count, denoted by $l$
    - Average distance, $l_{avg}$: the average of $l$ taken over all possible pairs of source and destination nodes.
    - Diameter ($l_{max}$): the maximum of all the minimum hop counts taken over all pairs of source and destination nodes.
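As a rough illustration, node degree, regularity, average distance and diameter can be computed for a small network stored as an adjacency list. The sketch below uses breadth-first search, counts hops as the number of links traversed, and assumes a connected, undirected network; the function names and the dictionary representation are choices made for this example.

```python
from collections import deque
from itertools import combinations

def hop_counts(adj, source):
    """Breadth-first search: minimum hop count from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def topology_properties(adj):
    """Return (degrees, is_regular, average distance, diameter).

    Assumes the network is connected and has at least two nodes.
    """
    degrees = {u: len(neighbours) for u, neighbours in adj.items()}
    is_regular = len(set(degrees.values())) == 1
    all_dist = {u: hop_counts(adj, u) for u in adj}
    # Minimum hop count for every unordered pair of distinct nodes.
    lengths = [all_dist[u][v] for u, v in combinations(adj, 2)]
    return degrees, is_regular, sum(lengths) / len(lengths), max(lengths)

# Linear array of 4 nodes: 0 - 1 - 2 - 3
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
degrees, regular, l_avg, l_max = topology_properties(line)
print(regular)  # False, because the end nodes have degree 1
print(l_max)    # 3
```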

# Some basic properties of interconnection networks cont.

- Topological properties:
  - Path diversity: the existence of multiple paths between a pair of communication nodes
  - Scalability: i) the capability of a network to handle a growing amount of work; ii) the potential of a network to be enlarged to accommodate a growing amount of work.



# Some basic properties of interconnection networks cont.

### Performance properties:

- Bisection width: the minimum number of communication links that must be removed to partition the network into two equal parts (or almost equal parts)
- Channel bandwidth: the peak rate at which data can be communicated over a communication link (channel), e.g., if the transfer time of a word is  $t_w$ , then the bandwidth is  $1/t_w$ .
- Bisection bandwidth: the minimum volume of data communication allowed between any two halves of the network. It is the product of bisection width and channel bandwidth.
- Cost: One way of defining the cost of a network is in terms of the number of communication links required by the network.
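For very small networks, the bisection width can be found by brute force, and the bisection bandwidth then follows as the product with the channel bandwidth $1/t_w$. The sketch below is only practical for a handful of nodes because it tries every balanced split; the function names and the value of `t_w` are made up for the example.

```python
from itertools import combinations

def bisection_width(nodes, links):
    """Brute-force bisection width: try every split into (almost) equal halves."""
    nodes = list(nodes)
    half = len(nodes) // 2
    best = None
    for part in combinations(nodes, half):
        side = set(part)
        # Count the links with exactly one endpoint in this half.
        cut = sum(1 for u, v in links if (u in side) != (v in side))
        if best is None or cut < best:
            best = cut
    return best

def bisection_bandwidth(width, t_w):
    """Bisection width times channel bandwidth, where channel bandwidth is 1 / t_w."""
    return width * (1.0 / t_w)

n = 8
ring_links = [(i, (i + 1) % n) for i in range(n)]
width = bisection_width(range(n), ring_links)
print(width)                            # 2
print(bisection_bandwidth(width, 0.5))  # 4.0
```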






### The classification of interconnection networks

- Direct networks (static): Each node is directly connected to its neighbours by point-to-point communication links (network interfaces).
  - Fully connected network: If the number of nodes is n, then such a network has  $\frac{1}{2}n(n-1)$  connections.
- Indirect networks (dynamic): Nodes and memory modules are connected via switches and communication links, and the switches are used to establish paths among nodes. A cross point is a switch that can be opened or closed.
  - Fully connected crossbar switch: Nodes are on one end and memory modules on the other. The fully connected crossbar is too complex to be used for connecting large numbers of input and output ports. For example, with 1000 nodes and 1000 memory modules, one million switches are needed to build the fully connected crossbar.
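The two counts above are easy to check numerically. The following sketch simply evaluates $\frac{1}{2}n(n-1)$ for a fully connected network and the product of nodes and memory modules for a crossbar; the function names are illustrative.

```python
def fully_connected_links(n):
    """Number of point-to-point links in a fully connected network of n nodes."""
    return n * (n - 1) // 2

def crossbar_switches(num_nodes, num_memory_modules):
    """Number of cross points (switches) in a fully connected crossbar."""
    return num_nodes * num_memory_modules

print(fully_connected_links(1000))    # 499500
print(crossbar_switches(1000, 1000))  # 1000000
```

Both quantities grow quadratically, which is why neither design is used to connect large numbers of nodes.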

### The classification of interconnection networks cont.



Figure: (a) Fully connected network; (b) A fully connected crossbar network.



- Bus. Used in both LMMs and MMMs. At any one time, only one processor is allowed to use the bus for communication.
  - Advantages: simple to build; buses are ideal for broadcasting data among nodes.
  - Disadvantages: not scalable in terms of performance (but scalable in terms of cost).



Figure: Bus topology.



- Linear array. Used in LMMs. Every node (except the two nodes at the ends) is connected to two neighbours, see Figure 7. Simple. If a node index is i with 0 < i < n-1, its two neighbours have indices i-1 and i+1, where n is the total number of nodes in the linear array; the end nodes 0 and n-1 each have a single neighbour.



Figure: (a) Linear array without wraparound link; (b) linear array with wraparound link (also called ring, see next slide).



- Ring. Used in LMMs. Every node is connected to two neighbours, see Figure 8. Simple. If a node index is i, its two neighbours can be found using indices  $(i + 1) \mod n$  and  $(i - 1) \mod n$ , where n is the total number of nodes in the ring.
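The neighbour calculation for a ring, and its counterpart for a linear array without the wraparound link, can be sketched as follows. Nodes are assumed to be numbered 0 to n-1; the function names are illustrative.

```python
def ring_neighbours(i, n):
    """Neighbours of node i in a ring of n nodes (with wraparound)."""
    # Python's % always returns a non-negative result for positive n,
    # so (0 - 1) % n correctly wraps to n - 1.
    return ((i - 1) % n, (i + 1) % n)

def linear_array_neighbours(i, n):
    """Neighbours of node i in a linear array of n nodes (no wraparound)."""
    return [j for j in (i - 1, i + 1) if 0 <= j < n]

print(ring_neighbours(0, 8))          # (7, 1)
print(linear_array_neighbours(0, 8))  # [1]
print(linear_array_neighbours(3, 8))  # [2, 4]
```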



Figure: Ring topology. Each node represents a processing element with local memory.



- 2D mesh. Can be used in LMMs. Each node is connected to a switch. The number of switches is determined by the lengths of the two sides. Every switch, except those along the 4 borders, has 4 neighbours.
- 2D torus. Similar to a 2D mesh, but each pair of corresponding border switches is also connected. Every switch has 4 neighbours.
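The difference between the two topologies shows up clearly when the neighbours of a switch are listed. In this sketch switches are addressed by hypothetical `(row, column)` coordinates, and the torus version assumes each side has at least 3 switches so that the four neighbours are distinct.

```python
def mesh_neighbours(r, c, rows, cols):
    """Neighbours of switch (r, c) in a rows x cols 2D mesh."""
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    # Border switches lose the neighbours that would fall outside the grid.
    return [(i, j) for i, j in candidates if 0 <= i < rows and 0 <= j < cols]

def torus_neighbours(r, c, rows, cols):
    """Neighbours of switch (r, c) in a rows x cols 2D torus (wraparound)."""
    return [((r - 1) % rows, c), ((r + 1) % rows, c),
            (r, (c - 1) % cols), (r, (c + 1) % cols)]

print(mesh_neighbours(0, 0, 4, 4))   # [(1, 0), (0, 1)]
print(torus_neighbours(0, 0, 4, 4))  # [(3, 0), (1, 0), (0, 3), (0, 1)]
```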





Figure: (a) 2D mesh topology; (b) 2D torus topology. Each node represents a processing element with local memory.



- 3D mesh. Similar to a 2D mesh, but in three dimensions. Every switch, except the border ones, has 6 neighbours.
- 3D torus. A 3D mesh in which every pair of opposite border switches is also connected.
- Hypercube: An interconnection network that has  $n = 2^d$  nodes, where d is the number of dimensions. Each node has a distinct label consisting of d binary bits. For example, if d = 3, then n = 8. Two nodes are connected via a link if and only if their labels differ in exactly one bit position. Used in LMMs.
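Because hypercube links correspond to single-bit differences in the labels, neighbours can be found by flipping one bit at a time, and the minimum hop count between two nodes equals the number of bit positions in which their labels differ. The sketch below illustrates this; the `route` function, which corrects differing bits from the least significant upwards, is one simple illustrative choice of path and is not taken from the slides.

```python
def hypercube_neighbours(node, d):
    """Labels of the d neighbours of a node in a d-dimensional hypercube."""
    return [node ^ (1 << k) for k in range(d)]

def hop_count(a, b):
    """Minimum hop count: number of bit positions in which the labels differ."""
    return bin(a ^ b).count("1")

def route(src, dst, d):
    """One shortest path from src to dst, fixing differing bits low to high."""
    path = [src]
    current = src
    for k in range(d):
        bit = 1 << k
        if (current ^ dst) & bit:
            current ^= bit  # traverse the link along dimension k
            path.append(current)
    return path

print(hypercube_neighbours(0b000, 3))  # [1, 2, 4]
print(hop_count(0b000, 0b111))         # 3
print(route(0b010, 0b101, 3))          # [2, 3, 1, 5]
```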





Figure: (a) 3D mesh; (b) Hypercube topology. Each node represents a processing element with local memory.





Figure: Hypercubes of 1D, 2D, 3D, and 4D.



- Multistage network: used in MMMs, where input switches are connected to PEs, and output switches are connected to memory modules.



Figure: Multistage network topology. A 4-stage interconnection network capable of connecting 16 PEs to 16 memory modules. Each switch can establish a connection between a pair of input and output channels.



- Fat tree: used in constructing LMMs, where PEs with their local memories are attached to the leaves.



Figure: Fat tree topology. A fat tree interconnection network of 16 processing nodes. Each switch can establish a connection between an arbitrary pair of (leaf) nodes. Edges closer to the root are thicker. The idea is to increase the number of communication links and switching nodes closer to the root.






# Evaluating the interconnect networks

- Diameter
- Bisection width
- Cost

Table: Quantitative characteristics of various interconnect networks

|         | Network         | Diameter                      | Bisection width | Cost                       |
|---------|-----------------|-------------------------------|-----------------|----------------------------|
| Static  | Fully connected | $1$                           | $p^{2}/4$       | $p(p-1)/2$                 |
| Static  | Linear array    | $p-1$                         | $1$             | $p-1$                      |
| Static  | 2D mesh         | $2(\sqrt{p}-1)$               | $\sqrt{p}$      | $2(p-\sqrt{p})$            |
| Static  | Ring            | $\lfloor p/2 \rfloor$         | $2$             | $p$                        |
| Static  | 2D torus        | $2\lfloor \sqrt{p}/2 \rfloor$ | $2\sqrt{p}$     | $2p$                       |
| Static  | Hypercube       | $\log p$                      | $p/2$           | $(p \log p)/2$             |
| Dynamic | Crossbar        | $1$                           | $p$             | $p^2$                      |
| Dynamic | Fat tree        | $2 \log p$                    | the # of links  | the # of links or switches |

<sup>\*</sup>With p nodes in the above networks
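To get a feel for how the static networks compare, the formulas in the table can be evaluated for a concrete number of nodes. The sketch below does this for the six static networks only; it assumes p is both a perfect square (for the mesh and torus) and a power of two (for the hypercube), and takes $\log p$ as the base-2 logarithm. The function name is illustrative.

```python
import math

def static_network_metrics(p):
    """(diameter, bisection width, cost in links) for p nodes, per the table.

    Assumes p is a perfect square and a power of two, e.g. 16 or 64.
    """
    s = math.isqrt(p)        # side length of the 2D mesh / torus
    d = p.bit_length() - 1   # log2(p) when p is a power of two
    if s * s != p or (1 << d) != p:
        raise ValueError("p must be a perfect square and a power of two")
    return {
        "fully connected": (1, p * p // 4, p * (p - 1) // 2),
        "linear array": (p - 1, 1, p - 1),
        "2D mesh": (2 * (s - 1), s, 2 * (p - s)),
        "ring": (p // 2, 2, p),
        "2D torus": (2 * (s // 2), 2 * s, 2 * p),
        "hypercube": (d, p // 2, p * d // 2),
    }

for name, (diameter, width, cost) in static_network_metrics(16).items():
    print(f"{name:16s} diameter={diameter:3d} bisection={width:3d} cost={cost:4d}")
```

For p = 16 the hypercube and the 2D torus have the same diameter (4), bisection width (8) and cost (32), while the fully connected network has diameter 1 at a cost of 120 links.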



## Summary & References

- Summary
  - The extension of RAM to parallel computing
  - Interconnection networks: properties and topologies
- Bibliography:
  - Roman Trobec, Boštjan Slivnik, Patricio Bulić, Borut Robič. Introduction to Parallel Computing: From Algorithms to Programming on State-of-the-Art Platforms. Springer, 2018.
  - Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar. Introduction to Parallel Computing, second edition. Addison Wesley, 2003.

