# CSE 120 Homework Assignment #1

| Name:  |  |
|--------|--|
| Email: |  |

#### **Submission Guidelines:**

- This homework is due on April 26th.
- Submit the homework as a PDF through Canvas
  - Late submissions are **not** accepted
- Please write your name and your UCSC email address
- The homework should be "readable" without too much effort
  - If your handwriting is like mine, type it or risk not being graded
- You are allowed to collaborate with other students
- State your assumptions for any missing parameter
- For once, you are allowed to work in groups, but the goal is to prepare for the midterm
- Points: 20

Two engineers solve the same problem, but each has a different solution (A and B). Compiled with the default compiler options, program A takes 6 seconds to run on processor P1 and program B takes 5 seconds. This means that B is 1.2 times faster with the default options.

a) If we enable the gcc optimizations for program B, we increase its CPI by 1.3 times and decrease its dynamic instruction count by 20% (program A remains the same). What is the new speedup? *(1 point)*

b) If we also enable the compiler optimizations for program A, we increase its CPI by 70%. How much does its instruction count need to change (increase or decrease?) so that both programs have the same performance (both with optimizations)? (1 point)
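The 1.2 figure in the problem statement is the ratio of the two run times. A minimal sketch of that calculation follows; the helper name `speedup` is hypothetical and chosen only for illustration.

```python
def speedup(time_old: float, time_new: float) -> float:
    """Return how many times faster the 'new' run is than the 'old' run."""
    return time_old / time_new


# Default-options run times from the problem statement, in seconds.
time_a = 6.0
time_b = 5.0
print(f"B is {speedup(time_a, time_b):.1f} times faster than A")
```

The same ratio applies after optimizations are enabled, once the new run times are known.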

The following table shows how many cycles each instruction type takes and how frequently it occurs on a given processor P.

| Instruction Type      | Instruction Frequency | # of Cycles |
|-----------------------|-----------------------|-------------|
| Loads and Stores      | 25%                   | 1           |
| Arithmetic Operations | 45%                   | 2           |
| Other                 | 30%                   | 3           |

a) Given the information in the table above, calculate the CPI and IPC. (1 point)

b) Now suppose some engineering effort is spent on P to improve the arithmetic operations by 25%. What is the new CPI? (1 point)

c) What is the maximum speedup if the loads and stores are optimized? (1 point)
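As a reminder of the method, the average CPI is the frequency-weighted sum of the per-type cycle counts, and IPC is its reciprocal. The sketch below uses a made-up two-type instruction mix, not the one in the table; the function name and the numbers are assumptions for illustration only.

```python
def average_cpi(mix):
    """mix maps a type name to (frequency as a fraction, cycles per instruction)."""
    total_freq = sum(freq for freq, _ in mix.values())
    if abs(total_freq - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")
    return sum(freq * cycles for freq, cycles in mix.values())


# Hypothetical mix: half the instructions take 2 cycles, half take 4.
hypothetical_mix = {"memory": (0.5, 2), "alu": (0.5, 4)}
cpi = average_cpi(hypothetical_mix)
ipc = 1.0 / cpi
print(f"CPI = {cpi:.2f}, IPC = {ipc:.3f}")
```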

#### Question 3 (2 points)

Four components affect the performance of computer A: I/O usage, computational operations, memory operations, and branches. A group of engineers is analyzing how modifications to computer A have affected its performance. Each supposed enhancement has affected the performance differently.

Before any optimization, I/O represents 20% of the execution time and computational operations represent 30%. Branch operations and memory operations take up the rest of the time, but the designers do not know the execution time breakdown between branches and memory operations.

The designers propose changes to the architecture that achieve an overall speedup of 2 times. The computational operations are 10% faster, the branches are 130% faster, the I/O is 3 times faster, and the memory operations are 4 times faster.

What are the original and the new percentages of the execution time dedicated to memory operations?
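When several components are sped up by different factors, the new execution time is the sum of each original time fraction divided by its own speedup factor. The sketch below illustrates this with made-up fractions and factors, not the ones in this question; the function names are hypothetical.

```python
def overall_speedup(parts):
    """parts is a list of (fraction of original time, speedup factor)."""
    new_time = sum(fraction / factor for fraction, factor in parts)
    return 1.0 / new_time


def new_shares(parts):
    """Return each component's share of the new (shorter) execution time."""
    new_times = [fraction / factor for fraction, factor in parts]
    total = sum(new_times)
    return [t / total for t in new_times]


# Hypothetical system: half the time is sped up 2x, the other half is unchanged.
hypothetical_parts = [(0.5, 2.0), (0.5, 1.0)]
print(f"overall speedup = {overall_speedup(hypothetical_parts):.3f}")
print("new shares =", [round(s, 3) for s in new_shares(hypothetical_parts)])
```

In this question the overall speedup is given and one fraction is unknown, so the same relation is solved in the other direction.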

The table below lists the execution times of a set of benchmarks run on three different computers: A, B, and C.

a) Use A as the reference system and compute the scores for B and C using a methodology similar to SPECint. (1 point)

| Benchmarks | Computer A<br>Execution Time (s) | Computer B<br>Execution Time (s) | Computer C<br>Execution Time (s) |
|------------|----------------------------------|----------------------------------|----------------------------------|
| perl       | 637                              | 828                              | 892                              |
| bzip2      | 417                              | 1089                             | 205                              |
| gcc        | 724                              | 1013                             | 1158                             |
| mcf        | 1045                             | 2018                             | 650                              |

b) If you wanted to present the numbers so that B appears in a better light (without lying), what could you do? *(1 point)*
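SPEC-style scoring normalizes each benchmark time against the reference machine and then summarizes the ratios with a geometric mean. The sketch below shows the mechanics on made-up times, not the values in the table; the function name and inputs are assumptions for illustration.

```python
import math


def spec_style_score(reference_times, machine_times):
    """Geometric mean of reference_time / machine_time over all benchmarks."""
    if len(reference_times) != len(machine_times):
        raise ValueError("both lists must cover the same benchmarks")
    ratios = [ref / t for ref, t in zip(reference_times, machine_times)]
    return math.prod(ratios) ** (1.0 / len(ratios))


# Hypothetical times in seconds: twice as fast on one benchmark, twice as slow on the other.
reference = [100.0, 200.0]
machine = [50.0, 400.0]
print(f"score = {spec_style_score(reference, machine):.2f}")
```

A score above 1 means the machine is, on geometric average, faster than the reference.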

The Iron Law lets us calculate the program run time as the product of instructions per program, cycles per instruction, and clock period.
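The product described above can be written as a one-line function. This is a minimal sketch with made-up inputs, not the values of this question; the function and parameter names are hypothetical.

```python
def execution_time(instruction_count: float, cpi: float, clock_hz: float) -> float:
    """Run time in seconds = instructions x cycles per instruction x clock period."""
    clock_period = 1.0 / clock_hz  # seconds per cycle
    return instruction_count * cpi * clock_period


# Hypothetical program: 2 billion instructions, CPI of 1.5, 1 GHz clock.
print(f"{execution_time(2e9, 1.5, 1e9):.2f} s")
```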

Program A runs on processor P1 with a clock rate of 1.61 GHz. The program consists of 13 billion instructions. Program A has a floating-point multiplication instruction type that is executed 40% of the time and takes 5 cycles; the remaining instructions take 3 cycles on average to finish.

a) What is the execution time of program A? (1 point)

b) With some engineering investments, the machine compiler is enhanced to execute more instructions and allows a floating-point multiplication to take place three times as fast and the rest of the instructions to finish in half the time. How long does it take program A to finish now? (1 point)

c) What is the gained speedup? (1 point)

#### Question 6 (4 points)

You need to write the following expression in 4 different ISAs using the high-level assembly used in class: $d = (b+a)*(b*a)+a$

a) For a generic accumulator-based architecture (1 point)

b) For a generic Register-Memory Architecture (1 point)

c) For a generic Load-Store Architecture (1 point)

d) For a RISC-V ISA (1 point)

a) Write the machine code for the following RISC-V instructions (using an RV32G machine).

10238: ???????? beq a5,a4,10242

1023c: ???????? sd zero,128(a5)

101dc: ???????? lui s2,0xc07

b) Write the machine code for the following RISC-V instructions. Use only compressed instructions (RV32GC) where you can. Compressed instructions use 16 bits; RV32 instructions use 32 bits.

101e4: ???? or ???????? addi a5,s2,-80

101e8: ???? or ???????? ld a5,8(a5)

10206: ???? or ???????? bnez a5,1020e
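For the 32-bit encodings, each field is shifted into its bit position and the fields are ORed together. The sketch below packs a base I-type instruction (immediate in bits 31:20, rs1 in 19:15, funct3 in 14:12, rd in 11:7, opcode in 6:0) and uses `addi x1, x2, 5` as an example that is deliberately not one of the instructions above. The helper name `encode_i_type` is hypothetical.

```python
def encode_i_type(opcode: int, funct3: int, rd: int, rs1: int, imm: int) -> int:
    """Pack a 32-bit RISC-V I-type instruction word from its fields."""
    if not -2048 <= imm <= 2047:
        raise ValueError("I-type immediate must fit in 12 signed bits")
    imm12 = imm & 0xFFF  # two's-complement 12-bit pattern
    return (imm12 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


ADDI_OPCODE = 0b0010011  # OP-IMM major opcode
ADDI_FUNCT3 = 0b000

# Example: addi x1, x2, 5
word = encode_i_type(ADDI_OPCODE, ADDI_FUNCT3, rd=1, rs1=2, imm=5)
print(f"{word:08x}")  # prints 00510093
```

Branch, store, and upper-immediate instructions use other formats (B, S, and U) with different immediate layouts, so this helper does not apply to them as written.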



a) The previous datapath diagram can execute many instructions. If the CPU were to execute "XORI R3, R5, -7" in RISC-V, what would be the values on each of the wires marked with a letter? The register file is initialized with r0=0x100, r1=0x101, r2=0x102, r3=0x103, and so on; the program counter for the instruction is 0x100. If a value is not relevant, or is unknown because the instruction encoding is not given, use X. (1 point)

| A | B | C | D | E | F | G | H | I |
|---|---|---|---|---|---|---|---|---|
|   |   |   |   |   |   |   |   |   |
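Immediate instructions such as XORI sign-extend their 12-bit immediate to the register width before the ALU operation. The sketch below shows that step on a made-up immediate and register value, not the ones in this question; the helper name `sign_extend` is hypothetical and a 32-bit datapath is assumed.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


MASK32 = 0xFFFFFFFF  # assumed 32-bit datapath

imm_field = 0xFFF                        # hypothetical 12-bit immediate field
imm_value = sign_extend(imm_field, 12)   # -1
register_value = 0x0000000F              # hypothetical source register contents
result = (register_value ^ imm_value) & MASK32
print(imm_value, f"{result:08x}")
```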

b) Suppose the previous diagram executes a subset of RISC-V instructions. Set the control signals for the following instructions: LD, CALL, and SUB. *(1 point)*

The single-cycle datapath below is taken from a textbook, with slight variations.



For each of the following instructions, write the control values for multiplexers D and E (either 0 or 1). *(2 points)* The "(taken)" or "(not taken)" comment on a branch indicates whether the branch was taken.

| Instruction                 | MUX D | MUX E |
|-----------------------------|-------|-------|
| jr x30                      |       |       |
| bne x1, x0, foo (not taken) |       |       |
| jalr x0, 0xFF00             |       |       |
| ret                         |       |       |
| jal 0xAAC0                  |       |       |
| beq x1, x0, bar (taken)     |       |       |