**MT-1/Fall-2020/CSE332 (Sec-4&5)/Time: 1:30 Hours/Marks = 50**

Answer all questions: 5 x 10 = 50

|  |  |  |
| --- | --- | --- |
| 1. | Suppose that you currently have a computer system with the following two characteristics:  (i) Only three components determine the overall performance: CPU, memory, and disk.  (ii) For any given computation, the system spends 30% of the time on the CPU, 40% on memory, and 30% on disk.  Now suppose that you want to purchase a new system that improves performance in the following ways:  (i) The new system spends 1/4 of the memory time of your current system.  (ii) The new system spends 1/5 of the disk time of your current system.  Everything else is the same.  Using Amdahl’s Law, calculate the overall speedup of the new system over your current system. | |
|  |  | |
| 2. | In a program, 33% of the instructions are loads and stores. Each load/store instruction must access data in memory. How many memory accesses are required per instruction? Assume that instruction execution takes 1 cycle and a main-memory access takes 100 cycles. Find the average CPI for this program.  Now suppose a cache with a miss rate of 2.0% and an access time of 2 cycles is added to the system. Find the average CPI for the same program and compare the results. | |
|  | For every instruction, the CPU must access memory to fetch the machine code. In addition, each load/store instruction accesses memory again to read or write data.  So, memory accesses per instruction = (0.33 x 1 + 0.67 x 1) [instruction fetch] + 0.33 x 1 [data read/write for load/store] = 1.33  Average CPI without cache = cycles for memory access (fetch plus data read/write) + cycles for execution = 1.33 x 100 + 1 = 134  With cache memory:  Hit rate = 98%  Miss rate = 2%  Clock cycles required to read instructions and data from the cache on a hit = 1.33 x 0.98 x 2 = 2.61  Clock cycles required to read instructions and data from memory on a cache miss (2 cycles for the cache lookup plus 100 for main memory) = 1.33 x 0.02 x 102 = 2.71  Clock cycles required to execute the instruction = 1  Average CPI = 2.61 + 2.71 + 1 = 6.32  Average CPI without cache / average CPI with cache = 134/6.32 = 21.2, so the cache reduces the average CPI by a factor of about 21. | |
| 3. | Consider the following sequence of instructions, where the syntax consists of an opcode followed by the destination register/memory location, followed by one or two source registers/memory locations:  Inst-1: ADD R3, R1, R2  Inst-2: SUB R6, R2, R3  Inst-3: AND M1, R5, 3  Inst-4: ADD R1, R6, M1  Inst-5: JMP NSU1  Inst-6: NSU2: OR R2, R4, R7  Inst-7: SUB R5, R3, R4  Inst-8: ADD R0, R1, R10  Inst-9: NSU1: LOAD R6, M2  Inst-10: SUB R2, R1, R6  Inst-11: JMP NSU2  Assume a four-stage pipeline: fetch, decode, execute, and write back. Assume that every pipeline stage takes one clock cycle. If a simple scalar pipeline allows only in-order execution, show the pipeline stages, taking all types of hazards into account and applying one particular recovery technique. | |
|  |  | |
| 4. | Assume an add takes 1 cycle, a div 4 cycles, a load 6 cycles, and a sub 2 cycles. Two different compilers produce the following loops for the same code, each running 1000 times. Calculate the average CPI and the run time of the program on a 300 MHz CPU for each compiler. Also calculate the MIPS ratings. | |
|  | Compiler-A: div, add, load, sub | Compiler-B: add, add, div, sub, load, add |
|  |  | |
| 5. | We wish to consider the performance of two different machines, M1 and M2. The clock frequencies of M1 and M2 are 800 MHz and 1000 MHz respectively.  A program was run on both machines and the following measurements were made:  Execution time: 2.5 seconds on M1, 2 seconds on M2.  Number of instructions executed: 100 x 10^6 on M1, 125 x 10^6 on M2.  Finally, the frequencies with which instructions occur in the program on M1 and M2 are shown below:  Instruction (M1%, M2%):  ADD (40, 60)  MULT (10, 8)  CMP (20, 12)  SUB (30, 20)  (a) Find the clock cycles per instruction (average CPI) for the program on both machines.  (b) How much faster will the program run on M1 and M2, respectively, if we  (i) reduce the execution time of the ADD instruction by 20%, assuming that an ADD instruction requires 5 cycles on both machines;  (ii) reduce the execution time of the MULT instruction by 20%, assuming that a MULT instruction requires 20 cycles on M1 and 25 cycles on M2?  (c) Which improvement is better for M1, and which for M2? | |
|  |  | |
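
The CPI model used in the answer to question 2 can be checked with a short script. This is a minimal sketch, not part of the original paper: the function names are hypothetical. It assumes, as the worked answer does, that every instruction costs one fetch access, that each load/store adds one data access, and that a cache miss pays the cache lookup plus the main-memory access (2 + 100 = 102 cycles).

```python
def memory_accesses_per_instruction(load_store_fraction):
    """One fetch for every instruction plus one data access per load/store."""
    return 1.0 + load_store_fraction


def cpi_without_cache(load_store_fraction, exec_cycles, memory_cycles):
    """Average CPI when every memory access goes to main memory."""
    accesses = memory_accesses_per_instruction(load_store_fraction)
    return accesses * memory_cycles + exec_cycles


def cpi_with_cache(load_store_fraction, exec_cycles, memory_cycles,
                   cache_cycles, miss_rate):
    """Average CPI with a single cache level in front of main memory."""
    accesses = memory_accesses_per_instruction(load_store_fraction)
    # A hit costs only the cache access.
    hit_cost = (1.0 - miss_rate) * cache_cycles
    # Assumption from the worked answer: a miss pays the cache lookup
    # and then the full main-memory access.
    miss_cost = miss_rate * (cache_cycles + memory_cycles)
    return accesses * (hit_cost + miss_cost) + exec_cycles


if __name__ == "__main__":
    # Values from question 2: 33% load/store, 1 execution cycle,
    # 100-cycle memory, 2-cycle cache, 2% miss rate.
    base = cpi_without_cache(0.33, 1, 100)
    cached = cpi_with_cache(0.33, 1, 100, 2, 0.02)
    print(f"CPI without cache: {base:.2f}")
    print(f"CPI with cache:    {cached:.2f}")
    print(f"Ratio:             {base / cached:.2f}")
```

Note that the miss rate is passed as a fraction (0.02 for 2%). Varying `miss_rate` shows how strongly the average CPI depends on cache misses.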