**(8)** 

**Data Provided: None** 



## DEPARTMENT OF ELECTRONIC AND ELECTRICAL ENGINEERING

Autumn Semester 2011-2012 (2 hours)

## **EEE6031 Advanced Computer Architectures 6**

Answer THREE questions. No marks will be awarded for solutions to a fourth question. Solutions will be considered in the order that they are presented in the answer book. Trial answers will be ignored if they are clearly crossed out. The numbers given after each section of a question indicate the relative weighting of that section.

**1.** A pipelined system is as shown in Figure 1:



**Figure 1: Pipelined System** 

and it is used to execute the sequence of operations:

Out = D(B(C(B(A(B(In))))))

- **a.** Draw the reservation table for the pipeline. (8)
- **b.** Calculate the data throughput and utilisation of each processing block (assuming a greedy strategy). (2)
- **c.** Show that the Collision Vector for the operation is 01010. (2)
- **d.** Using the Collision Vector, draw the State Table and, hence, show that there is another way to load data other than the greedy strategy. (6)
- **e.** Is this alternative method better or worse than the greedy strategy? Give a reason to support your answer. (2)
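
The mechanics behind parts a, c and d can be sketched in a few lines of Python. This is only an illustrative sketch, not a model answer: Figure 1 is not reproduced here, so the code assumes that every block takes exactly one clock cycle per visit and that the blocks are visited in the order implied by the expression above (B, A, B, C, B, D). The function names are hypothetical helpers introduced for this example.

```python
from typing import Dict, List


def reservation_table(sequence: str) -> Dict[str, List[int]]:
    """Map each block to the clock cycles in which one data item occupies it.

    Assumption: one clock cycle per visit, blocks visited in string order.
    """
    table: Dict[str, List[int]] = {}
    for cycle, block in enumerate(sequence):
        table.setdefault(block, []).append(cycle)
    return table


def collision_vector(table: Dict[str, List[int]], length: int) -> str:
    """Return the collision vector; the rightmost bit is latency 1."""
    forbidden = set()
    for cycles in table.values():
        for i, first in enumerate(cycles):
            for later in cycles[i + 1:]:
                # Two uses of one block this far apart would collide.
                forbidden.add(later - first)
    return "".join("1" if lat in forbidden else "0"
                   for lat in range(length, 0, -1))


def permitted_latencies(state: str) -> List[int]:
    """Latencies (within the vector length) whose bit in the state is 0."""
    n = len(state)
    return [k for k in range(1, n + 1) if state[n - k] == "0"]


def next_state(state: str, initial: str, latency: int) -> str:
    """Shift the state right by the latency and OR in the initial vector."""
    shifted = int(state, 2) >> latency
    return format(shifted | int(initial, 2), f"0{len(initial)}b")


# Order of block use for Out = D(B(C(B(A(B(In)))))), innermost first.
table = reservation_table("BABCBD")
cv = collision_vector(table, length=5)
print(cv)                       # 01010, as quoted in part c
print(permitted_latencies(cv))  # candidate latencies from the initial state
```

Calling `next_state` repeatedly from the initial vector, once for each permitted latency, enumerates the rows of the state table asked for in part d. Any latency longer than the vector length always returns the pipeline to the initial state.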

**2. a.** i) Draw two schematics to show two possible ways of making a cross-point switch. **(4)**

ii) Of these two approaches to making a cross-point switch, identify which is the more flexible (in terms of expansion, for example), why this is, and what its disadvantages are. **(4)**

**b.** A set of 9 processors must be connected to 4 disjoint memory blocks and it is decided that a Banyan network will be used.

i) What must the parameters of the Banyan network be (s, f, and l)? **(3)**

ii) Draw a schematic for this Banyan network. **(6)**

iii) Calculate the efficiency of this network (in hardware terms) compared to a cross-point switch doing the same job. **(3)**

**3. a.** i) Draw a schematic diagram for a 4-way set associative cache. **(5)**

ii) What would the approach be, and the costs (in memory terms), of implementing a Least-Recently-Used (LRU) memory policy on a 4-way set associative cache? **(3)**

**b.** Why do we use a memory hierarchy in practical systems? **(2)**

**c.** A particular processing system with an on-chip clock frequency of 2 GHz utilises a single write-back cache level (on-chip) with the main memory implemented using Synchronous Dynamic RAM (you can assume that the time taken to read or write a cache line to the SDRAM is 32 ns). Empirical studies identify that, on average:

- the probability of a hit is 0.96 for a read and 0.98 for a write;
- the proportion of memory accesses that are reads is 0.7;
- the proportion of dirty blocks is 0.4;
- the access time to the cache (on hit) is 1 clock cycle for a read and 2 clock cycles for a write;
- all accesses to the SDRAM are hits.

i) Estimate the average access time achieved by the processor. **(8)**

ii) How does this access time compare against the best possible access time that could be achieved? **(2)**
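
One way to organise the estimate in question 3 c. is sketched below in Python. It is not a model answer, and it rests on assumptions that the paper does not state: the cache is write-allocate, a miss costs the normal hit time plus one line transfer from the SDRAM, and a miss that evicts a dirty block costs one additional line transfer for the write-back. The function and parameter names are hypothetical; the default values are the figures quoted in the question.

```python
def average_access_time_ns(
    clock_hz: float = 2e9,           # on-chip clock frequency
    line_transfer_ns: float = 32.0,  # time to read or write one line to SDRAM
    p_read: float = 0.7,             # proportion of accesses that are reads
    hit_read: float = 0.96,          # read hit probability
    hit_write: float = 0.98,         # write hit probability
    p_dirty: float = 0.4,            # proportion of dirty blocks
    read_hit_cycles: int = 1,
    write_hit_cycles: int = 2,
) -> float:
    """Average access time under the assumptions stated in the lead-in."""
    cycle_ns = 1e9 / clock_hz
    # Assumed miss penalty: fetch the new line, plus a write-back of the
    # evicted line with probability p_dirty.
    miss_penalty_ns = line_transfer_ns * (1.0 + p_dirty)
    read_ns = read_hit_cycles * cycle_ns + (1.0 - hit_read) * miss_penalty_ns
    write_ns = write_hit_cycles * cycle_ns + (1.0 - hit_write) * miss_penalty_ns
    return p_read * read_ns + (1.0 - p_read) * write_ns


estimate = average_access_time_ns()
# Best case under the same model: every access hits in the cache.
best = average_access_time_ns(hit_read=1.0, hit_write=1.0)
print(f"estimate = {estimate:.4f} ns, best case = {best:.4f} ns")
```

A different treatment of write misses (for example, no write-allocate) or of overlapping the write-back with the line fetch would change the miss-penalty line and therefore the result.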

**4. a.** Describe the organisation of a square-connected Single-Instruction Multiple-Data (SIMD) processor.

**(4)** 

**b.** A SIMD processor uses a corner-turn buffer (CTB), connected to the West side of the array, to allow data to be loaded into the array. The CTB is byte-wide with the Least-Significant-Bit of the data at the tail-end of the serial registers making up the CTB.

Assuming that the SIMD array has 256 columns of Processing Elements (PE) and supports the following instructions:

WAITCTB; wait until the CTB is full and ready to transfer data

SHIFTCTB; shift the data, serially, by one place in the CTB

BCAST x; broadcast bit address x to N,S,E, and W

*INy* x; input from direction y (N,S,E, or W) to bit address x

i) Write a short program that will shift a column of data from the CTB into the array of PEs so that the LSB of the data is in bit address 0 of a PE's memory and the MSB of the data is in bit address 7 of the PE's memory.

**(6)** 

ii) How would this short program be repeated to load a full array (256 columns of data) into the array of PEs? State any assumptions that you make.

**(6)** 

**c.** What is the disadvantage of connecting a CTB to the inter-PE connections (as in part **b.** above)? Is there a better solution?

**(4)** 

**NLS** 

END OF PAPER