Amit Nijjar

A11489111

CSE141 SP 2016

4/20/16

CSE141 HW 2

**Chapter 1:**

1.7

A)

CPI = (Texec × f) / No. Instr

CPI for compiler A = 1.1

CPI for compiler B = 1.25

B)

fA/fB = (No. Instr(A) × CPI(A)) / (No. Instr(B) × CPI(B))

fA/fB = 0.73

C)

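Tnew/TA = 0.6, so the speedup over compiler A is TA/Tnew = 1.67

Tnew/TB = 0.44, so the speedup over compiler B is TB/Tnew = 2.27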
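The arithmetic for parts A to C can be checked with a short sketch. The instruction counts, execution times, 1 ns cycle time, and new-compiler figures below are assumed inputs (they are not restated in the answers above) chosen to be consistent with the results given.

```python
# Assumed problem inputs (not stated above): dynamic instruction counts,
# execution times, and a 1 ns clock cycle.
CYCLE_TIME_S = 1e-9

def cpi(instr_count, exec_time_s, cycle_time_s=CYCLE_TIME_S):
    """CPI = Texec * f / No. Instr, with f = 1 / cycle time."""
    return exec_time_s / (instr_count * cycle_time_s)

instr_a, time_a = 1.0e9, 1.1   # compiler A (assumed)
instr_b, time_b = 1.2e9, 1.5   # compiler B (assumed)

cpi_a = cpi(instr_a, time_a)
cpi_b = cpi(instr_b, time_b)

# Part B: equal execution times on two different processors.
freq_ratio = (instr_a * cpi_a) / (instr_b * cpi_b)

# Part C: assumed new compiler with 6.0E8 instructions and CPI 1.1.
time_new = 6.0e8 * 1.1 * CYCLE_TIME_S
ratio_vs_a = time_new / time_a
ratio_vs_b = time_new / time_b

print(f"CPI A = {cpi_a:.2f}, CPI B = {cpi_b:.2f}")
print(f"fA/fB = {freq_ratio:.2f}")
print(f"Tnew/TA = {ratio_vs_a:.2f}, Tnew/TB = {ratio_vs_b:.2f}")
```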

1.8 (**Except for 1.8.3**)

1.8.1) C = 2 × P_dynamic / (V^2 × clock rate), from P_dynamic = 1/2 × C × V^2 × clock rate

C (Pentium 4 Prescott) = 2 × 90 / ((1.25^2)(3.6 × 10^9)) = 3.2 × 10^-8 F

C (Core i5 Ivy Bridge) = 2 × 40 / ((0.9^2)(3.4 × 10^9)) = 2.9 × 10^-8 F

1.8.2)

Percentage = static/total × 100

Ratio = static/dynamic

Pentium 4 Prescott:

Percentage = 10/100 × 100 = 10%

Ratio = 10/90 = 1/9

Core i5 Ivy Bridge:

Percentage = 30/70 × 100 = 42.86%

Ratio = 30/40 = 3/4
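A minimal sketch of the same two calculations, using the static and dynamic power figures (in watts) that appear in the ratios above. The dictionary layout and function names are only illustrative.

```python
from fractions import Fraction

# (static power W, dynamic power W), taken from the ratios 10/90 and 30/40.
chips = {
    "Pentium 4 Prescott": (10, 90),
    "Core i5 Ivy Bridge": (30, 40),
}

def static_percentage(static_w, dynamic_w):
    """Static power as a percentage of total (static + dynamic) power."""
    return 100 * static_w / (static_w + dynamic_w)

def static_to_dynamic(static_w, dynamic_w):
    """Static-to-dynamic power ratio as an exact fraction."""
    return Fraction(static_w, dynamic_w)

for name, (static_w, dynamic_w) in chips.items():
    pct = static_percentage(static_w, dynamic_w)
    ratio = static_to_dynamic(static_w, dynamic_w)
    print(f"{name}: {pct:.2f}% static, ratio {ratio}")
```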

1.13 -> 250 s total = 70 s FP + 85 s L/S + 40 s branch + 55 s INT  
1.13.1) 250 - (70 × 0.8 + 85 + 40 + 55) = 14 s, so the total time drops by 14 s (5.6%)

1.13.2) 250 × 0.8 = 200 = 70 + 85 + 40 + (55 - x), so x = 50 s; the INT time must drop by 50 s (90.9%)

1.13.3) No. A 20% reduction means removing 250 × 0.2 = 50 s, but branch instructions take only 40 s in total.
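The three parts of 1.13 reduce to a few lines of arithmetic. This is a sketch using only the times listed above; the variable names are illustrative.

```python
# Times in seconds, from the breakdown above.
total, fp, ls, branch = 250, 70, 85, 40
integer = total - fp - ls - branch          # remaining INT time

# 1.13.1: FP time reduced by 20%.
saved_by_fp = total - (fp * 0.8 + ls + branch + integer)

# 1.13.2: total time reduced by 20% by shrinking only INT time.
target_total = total * 0.8
int_cut = total - target_total              # seconds that must come out of INT
int_cut_pct = 100 * int_cut / integer

# 1.13.3: can branch time alone absorb a 20% reduction?
branch_only_possible = branch >= total * 0.2

print(f"INT time = {integer} s")
print(f"1.13.1: total reduced by {saved_by_fp:.0f} s")
print(f"1.13.2: INT reduced by {int_cut:.0f} s ({int_cut_pct:.1f}%)")
print(f"1.13.3: possible with branches only? {branch_only_possible}")
```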

**Chapter 4:**

4.8 (**Except for 4.8.6**)

4.8.1)

Pipelined: clock cycle = latency of the longest stage = 350 ps

Non-pipelined: clock cycle = sum of all stage latencies = 1250 ps

4.8.2)

Pipelined: instruction latency = 5 stages × 350 ps = 1750 ps

Non-pipelined: instruction latency = 1250 ps

4.8.3)

I would split ID, the longest stage, which makes the new clock cycle time 300 ps.

4.8.4) 35% utilization of the data memory

4.8.5) 65% utilization of the write-register port of the “Registers” unit
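The 4.8 answers can be reproduced with a short sketch. The per-stage latencies and the instruction mix below are assumed inputs (they are not restated in the answers above), picked to be consistent with the 350 ps, 1250 ps, 300 ps, 35%, and 65% results; splitting the longest stage into two equal halves is also an assumption.

```python
# Assumed stage latencies in ps and instruction mix in percent.
stages = {"IF": 250, "ID": 350, "EX": 150, "MEM": 300, "WB": 200}
mix = {"alu": 45, "beq": 20, "lw": 20, "sw": 15}

# 4.8.1: clock cycle time.
pipelined_cycle = max(stages.values())
single_cycle = sum(stages.values())

# 4.8.2: latency of one instruction through the whole datapath.
pipelined_latency = len(stages) * pipelined_cycle
single_cycle_latency = single_cycle

# 4.8.3: split the longest stage into two equal halves.
longest = max(stages, key=stages.get)
split = {name: lat for name, lat in stages.items() if name != longest}
split[longest + "1"] = split[longest + "2"] = stages[longest] / 2
new_cycle = max(split.values())

# 4.8.4 / 4.8.5: loads and stores use data memory; ALU ops and loads write a register.
data_memory_use = mix["lw"] + mix["sw"]
write_port_use = mix["alu"] + mix["lw"]

print(pipelined_cycle, single_cycle, pipelined_latency, single_cycle_latency)
print(f"split {longest}: new cycle = {new_cycle:.0f} ps")
print(f"data memory {data_memory_use}%, write port {write_port_use}%")
```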

4.11.1)

loop: lw r1,0(r1)

and r1,r1,r2

lw r1,0(r1)

lw r1,0(r1)

beq r1,r0,loop

The pipeline diagram for the third iteration of the loop is as follows:

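In the table, --- marks a stall cycle and an empty cell means the instruction is not in the pipeline in that cycle.

| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LW R1,0(R1) | WB |  |  |  |  |  |  |  |
| LW R1,0(R1) | EX | MEM | WB |  |  |  |  |  |
| BEQ R1,R0,Loop | ID | --- | EX | MEM | WB |  |  |  |
| LW R1,0(R1) | IF | --- | ID | EX | MEM | WB |  |  |
| AND R1,R1,R2 |  |  | IF | ID | --- | EX | MEM | WB |
| LW R1,0(R1) |  |  |  | IF | --- | ID | EX | MEM |
| LW R1,0(R1) |  |  |  |  |  | IF | ID | --- |
| BEQ R1,R0,Loop |  |  |  |  |  |  | IF | --- |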
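As a cross-check on the diagram, the stage timing can be sketched with a very small in-order scheduler. It rests on assumptions that are not stated above: a 5-stage pipeline with full forwarding, stalls taken in ID, perfect branch prediction, and one instruction per stage. The `schedule` function and its data layout are illustrative, not a general pipeline simulator.

```python
# Loop body as (opcode, destination register, source registers).
LOOP = [
    ("lw",  "r1", ["r1"]),
    ("and", "r1", ["r1", "r2"]),
    ("lw",  "r1", ["r1"]),
    ("lw",  "r1", ["r1"]),
    ("beq", None, ["r1", "r0"]),
]

def schedule(iterations):
    """Return (opcode, IF, ID, EX) entry cycles for every dynamic instruction."""
    ready = {}              # register -> earliest cycle a consumer may be in EX
    prev_id = prev_ex = 0   # stage-entry cycles of the previous instruction
    timeline = []
    for _ in range(iterations):
        for op, dest, srcs in LOOP:
            fetch = max(1, prev_id)             # IF frees up when the previous instruction enters ID
            decode = max(fetch + 1, prev_ex)    # ID frees up when the previous instruction enters EX
            operands = max(ready.get(r, 0) for r in srcs)
            execute = max(decode + 1, operands) # wait in ID until operands can be forwarded
            if dest is not None:
                # Loaded data is forwarded after MEM, ALU results after EX.
                ready[dest] = execute + (2 if op == "lw" else 1)
            timeline.append((op, fetch, decode, execute))
            prev_id, prev_ex = decode, execute
    return timeline

timeline = schedule(4)
third = timeline[10:15]          # the five instructions of the third iteration
base = third[0][1] - 1           # renumber so the first fetch is cycle 1
relative = [(op, f - base, d - base, e - base) for op, f, d, e in third]
for op, f, d, e in relative:
    print(f"{op:4s} IF={f} ID={d} EX={e}")
print("cycles per iteration:", timeline[15][1] - third[0][1])
```

Under these assumptions the first LW of the iteration enters IF, ID, and EX in cycles 1, 3, and 4, and the next iteration's first fetch comes 8 cycles after this one, which lines up with the 8-cycle window in the diagram.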