|  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  |  | | | | | | | | | | | | |  |
|  | **Model Question Paper** | | | | | | | | | | | | |  |
| Programme | | | : | **B.Tech.** | Semester |  | : | **-** | | | |
| Course Title | | | : | **Parallel and Distributed Computing** | Code |  | : | **CSE4001** | | | |
| Slot |  | : |  | | | |
| Faculty | | | : | **-** | Class Nbr |  | : |  | | | |
| Time | | | : | **-** | Max. Marks |  | : |  | | | |
|  | **Answer ALL the questions** | | | | | | | | | | |
| **Q.No.** | | **Question Description** | | | | | | | |  |
| 1. | | Consider you are developing an application for VITCC where the execution of floating-point instructions on a particular processor X consumes 70% of the total runtime. Let us assume that 20% of the floating-point time is consumed i cube root calculations n.   1. Based on the research finding, the design team of the next-generation processor Y believes that it could either improve the performance of all floating-point instructions by a factor of 1.8 or alternatively speed up the cube root operation by a factor of 8. From which design alternative would the abovementioned application profit the most? Comment. (5M)   **Ans. Case 1: 1.45 Case2: (1x speed at max could be achieved)**  As an alternative, the developers of the application decide to parallelize the source code. What speedup can be achieved on a 6-CPU system, if 90% of the code can be perfectly parallelized? What fraction of the code has to be parallelized to get a speedup of 10? Justify your answer. (5M)  **Ans: Case1:4.0 Case 2: Impossible to achieve 10x speed -- > increase the processor count** | | | | | | |  | | | |
| 2. | | Using OpenMP constructs, implement a parallel program to generate the following series    Evaluate the following    Find out:    **Ans :10 Marks : Thread creation, OMP construct usage, loop level parallelism, logic, printing output** | | | | | | |
|  | | Consider you are the student coordinator of the Competitive Coding Club (CCC) at VIT Chennai. The club students want to celebrate the club anniversary. Your club has decided the cake will have one candle for each year of their total age. You will only be able to chow down the tallest of the candles. Write an OpenMP program to find out how many candles are tallest and one series that computes the sum of the squares of the N odd numbers and the other series that computes the sum of the cubes of the N odd numbers.  Example:  Candles=[1,2,3,1,4,5,2,3,1,4,1,8,8,8,6]  The maximum tallest candles are 8 units high. There are 3 of them, so return 3.  **Ans: 10 Marks : Thread creation, OMP construct usage, loop level parallelism, logic, printing output** | | | | | | |
| 3. | | I. During your internship, you are assigned a task to study the performance of the processor. Assume a pipeline having 5 phases with duration 20, 10, 50, 30 and 70 ns. Given latch delay is 10 ns. As a computer architect, calculate the following. (5M)   1. Pipeline cycle time (80ns) 2. Non-pipeline execution time (180ns) 3. Speed up ratio (180/80=2.25ns) 4. Pipeline time for 1000 tasks (80320ns) 5. Sequential time for 1000 tasks (180000ns)   II.  **Instruction Set A**  MUL R1, R2, R3  ADD R4, R5, R6  **Instruction Set B**  MUL R1, R2, R3  ADD R1, R2, R3  Discuss how instruction pipeline architecture executes the above instruction sets (A) and (B) and compare them with single CPU sequential instruction execution for the same. List out the possible hazards involved in instruction pipelining in general and discuss the hazards in detail if there are any, for the above instruction sets. (5M)  **Ans : Instruction Set A – No pipeline hazard B-> Data hazards , Justification** | | | | | | |
|  | |  | | | | | | |
|  | | I. During your internship, you are assigned a task to study the performance of a chip design company. Assume a pipeline having 4 phases with duration 15, 55, 35 and 40 ns. Given latch delay is 10 ns. As a computer architect, calculate the following. (5M)   1. Pipeline cycle time (65ns) 2. Non-pipeline execution time(145ns) 3. Speed up ratio (2.230ns) 4. Pipeline time for 10000 tasks (650195ns) 5. Sequential time for 10000 tasks (1450000ns)   II. S1 is a vector that takes 1 clock cycle for one arithmetic operation like add, multiply, etc. In S1, the execution unit has a vector depth of 4. How many clock cycles will it take to execute the below pseudo-code in system S1? Justify your answer in detail. (5M)  *Matrices: X, Y, Z*  *Matrix Size: n\*n where n=10000*  *for ‘I’ varying from 1 to ‘n’ values do the following*  *{*  *for ‘J’ varying from 1 to ‘n’ values do the following*  *X[I][J]=Y[I][J]+Z[I][J]*  *}*  **Answer: n2/16 = 100000000/16 =62,50,000**  *Identifying 100000000 (2 mark)*  *Identifying 16 (2 mark)*  *Correct answer (1 mark)* | | | | | | |
|  | | During your internship, you are assigned a task to study the performance of a chip design company. Assume a pipeline having 6 phases with duration 51,22, 25, 28, 39 and 90 ns. Given latch delay is 10 ns. As a computer architect, calculate the following. (5M)   1. Pipeline cycle time (100ns) 2. Non-pipeline execution time (255ns) 3. Speed up ratio (2.55ns) 4. Pipeline time for 100000 tasks (10000500ns) 5. Sequential time for 100000 tasks (25500000 ns)   II. Consider a system S that has a latency of any instruction to be 1 clock cycle. S has only one vector processor with a vector depth of 2. In other words, 2 sub execution units (SE1, SE2) are available in S, each with its Arithmetic and Logic Unit. System S has TLP disabled. If the below pseudo-codes (code A and Code B) are to be executed in S, give one correct sequence of execution of the below code that takes advantage of vector processing. Provide answers in the following format. If no instruction can be executed by the sub-execution unit, then give NA in that place. (5M)  Answer Format: Instruction executed by SE1 & Instruction Executed by SE2   |  |  | | --- | --- | | Code A | Code B | | For (i=0;i<4;i++)  A[i]=b[i]+c[i] | For (i=0;i<4;i++)  D[i]=e[i]\*f[i] |   **Answer: A[0]=b[0]+c[0] & A[1]=b[1]+c[1]**  **A[2]=b[2]+c[2] & A[3]=b[3]+c[3]**  **D[0]=e[0]\*f[0] & D[1]=e[1]\*f[1]**  **D[2]=e[2]\*f[2] & D[3]=e[3]\*f[3]**  *(1 Correct sequence à 1.25 mark; min 4 correct sequenceà5marks)* | | | | | | |
|  | |  | | | | | | |