

# CS2115-2122A - exam

Computer Organization (City University of Hong Kong)



Scan to open on Studocu

### CITY UNIVERSITY OF HONG KONG

Course code & title:

CS2115 Computer Organization

Session

Semester A 2021/22

Time allowed

Two hours

This paper has 9 pages (including this cover page).

- 1. This paper consists of 5 sections.
- 2. Answer <u>ALL</u> questions in <u>ALL</u> sections.

This is a closed-book examination.

No materials or aids are allowed during the whole examination. If any unauthorized materials or aids are found on a student during the examination, the student will be subject to disciplinary action.



## Section A (20 points). 2 points for each question. Only one correct answer for each question.

1. When a program is running, the processor directly fetches instructions from \_\_\_\_.

A. data bus

C. hard diskD. main memory

B. operating system

| 2. | The in the computer is responsible for decoding the instruction.  A. control unit  B. arithmetic logic unit  C. memory controller  D. inputs/outputs controller                                                                                                                         |
|----|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 3. | Suppose the timing diagrams of the input D and clock C of a D-flipflop triggered by falling edges are as follows,                                                                                                                                                                       |
|    | D                                                                                                                                                                                                                                                                                       |
|    | cthen the timing diagram of its output Q is                                                                                                                                                                                                                                             |
|    | A                                                                                                                                                                                                                                                                                       |
|    | B. —                                                                                                                                                                                                                                                                                    |
|    | c. ————————————————————————————————————                                                                                                                                                                                                                                                 |
|    | D                                                                                                                                                                                                                                                                                       |
| 4. | Which one of the following types of memory has the highest access speed?  A. Main memory  B. Level-1 cache memory  C. Level-2 cache memory  D. Level-3 cache memory                                                                                                                     |
| 5. | <ul> <li>Which of the statements about the difference between CISC and RISC is correct?</li> <li>A. RISC has higher clock frequency than CISC, so games and multimedia application on RISC processors run faster.</li> <li>B. In RISC, the operands are stored in registers.</li> </ul> |

C. CISC processors are pipelined, while RISC processors are not pipelined.D. RISC has complex and variable-length instructions, while CISC uses simple

E. Instructions of RISC can take several clock cycles, while CISC has single-cycle

instructions with fixed format.

instructions.

|   | 6.        | In the microcomputer system, the I/O devices are connected to the system bus of the motherboard through the  A. device controller  B. USB interface  C. DMA controller  D. network interface                                                                                                                                    |
|---|-----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|   | 7.        | is used to fill the speed gap between processor and main memory.  A. Register  B. Cache  C. Flash memory  D. Stack                                                                                                                                                                                                              |
|   | 8.        | An interrupt that can be temporarily ignored is  A. vectored interrupt  B. non-maskable interrupt  C. maskable interrupt  D. high priority interrupt                                                                                                                                                                            |
|   | 9.        | In DMA transfers, the required signals and addresses are issued by the  A. processor  B. device drivers  C. DMA controllers  D. main memory                                                                                                                                                                                     |
|   | 10        | <ul> <li>SIMD represents an organization that</li> <li>A. is capable of processing several programs at the same time</li> <li>B. contains a control unit, a processor unit and a memory unit</li> <li>C. includes many processing units under the supervision of a common control unit</li> <li>D. none of the above</li> </ul> |
|   | <u>Se</u> | ection B (18 points). 2 points for each question.                                                                                                                                                                                                                                                                               |
|   | 1.        | $(1001\ 0010)_2 = (\phantom{00000000000000000000000000000000000$                                                                                                                                                                                                                                                                |
|   | 2.        | $(D2)_{16} + (3A)_{16} = (\underline{})_{16}$                                                                                                                                                                                                                                                                                   |
| · | 3.        | $(EAD2)_{16} = (\underline{\hspace{1cm}})_7$                                                                                                                                                                                                                                                                                    |
|   | 4.        | The 8-bit 2's complement representation of the result of $(-6)_8*(5)_{10}$ is                                                                                                                                                                                                                                                   |
|   | 5.        | Suppose a 32-bit floating-point number has one sign bit, 8 biased exponent bits and 23 significand bits, the floating-point representation of decimal number -3.375 is                                                                                                                                                          |
|   | 6.        | With the same floating-point number format as the last question,  1 10000010 0111110000000000000000 is the floating-point representation of decimal number () <sub>10</sub> .                                                                                                                                                   |

#### Section C (30 points).

- 1. **(6 points)** Suppose two CPU A and B have exactly the same circuits, but different clock frequencies. Answer the following questions:
  - a) If the clock frequency of CPU A is 8MHz, how long is one clock cycle of CPU A?
  - b) If the CPU A can execute 0.4 million instructions per second on average, how long is the average time to execute one instruction?
  - c) What is the average number of clock cycle to execute one instruction for CPU A?
  - d) The frequency of CPU B is 12MHz. What is the average number of instructions that can be executed per second?
- 2. (4 points) Given the following truth table, please write down the logic expression of the two outputs D and E in the sum-of-products form (no simplification is needed).

|   | Inputs | Out | puts |   |
|---|--------|-----|------|---|
| Α | В      | С   | D    | E |
| 0 | 0      | 0   | 0    | 0 |
| 0 | 0      | 1   | 0    | 1 |
| 0 | 1      | 0   | 0    | 1 |
| 0 | 1      | 1   | 1    | 1 |
| 1 | 0      | 0   | 0    | 1 |
| 1 | 0      | 1   | 1    | 1 |
| 1 | 1      | 0   | 1    | 1 |
| 1 | 1      | 1   | 1    | 0 |

3. (6 points) Please draw the truth table of the following circuit, where A and B are inputs; C, D and E are outputs.



is XOR gate

- 4. **(4 points)** Assume that a program executes on a multi-core computer. 60% of its workload is executed in parallel on 20 cores, 20% of its workload is executed in parallel on 10 cores, and the remaining workload is executed one core. What is the speedup comparing with executing the entire program on one core? Write down the procedure of your calculation.
- 5. **(6 points)** The following figure shows the circuits of a 3-bit carry look ahead adder. The inputs are two 3-bit numbers, A<sub>2</sub>A<sub>1</sub>A<sub>0</sub> and B<sub>2</sub>B<sub>1</sub>B<sub>0</sub>, as well as a carry-in bit C<sub>0</sub>. If A<sub>2</sub>A<sub>1</sub>A<sub>0</sub>=101, B<sub>2</sub>B<sub>1</sub>B<sub>0</sub>=011 and C<sub>0</sub>=1, then what are the values of G<sub>0</sub>, G<sub>1</sub>, G<sub>2</sub>, P<sub>0</sub>, P<sub>1</sub>, P<sub>2</sub>, C<sub>3</sub>, C<sub>2</sub>, C<sub>1</sub>, S<sub>2</sub>, S<sub>1</sub>, S<sub>0</sub>?



6. **(4 points)** Suppose the processor has a direct-mapped cache, and the physical addresses of its memory is 28-bit long. The memory is byte-addressable (i.e., one byte is stored in each memory address). The direct-mapped cache has 8 cache lines, and the size of each cache line is 64 bytes. What are the number of bits for the tag, index and offset, respectively?

#### Section D (10 points).

1. (10 points) Write an assembly language program using the following listed instructions to sort four numbers x, y and z in decreasing order, where x, y and z are input by users. Please print each value of the sorted sequence in decreasing order (each number is print in a separate line). For example, if the inputs are

3 4 2

then the output should be

4

3

2

Instructions that may be used:

| Name    | Tions that may be u      | Functionality                                                                                                                                                     |
|---------|--------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| li      | li \$r1, x               | load immediate number x to one register (\$r1)                                                                                                                    |
| add     | add \$r1, \$r2, \$r3     | add two register numbers (\$r2 and \$r3) and store the result in one register (\$r1)                                                                              |
| sub     | sub \$r1, \$r2, \$r3     | sub two register numbers (\$r2 and \$r3) and store the result in one register (\$r1)                                                                              |
| move    | move \$r1, \$r2          | move the value of one register (\$r2) to another register (\$r1)                                                                                                  |
| beq     | beq \$r1, \$r2,<br>Label | if the first register number (\$r1) equals the second register number (\$r2), then jump to Label                                                                  |
| blez    | blez \$r1, Label         | if the register number (\$r1) is less than or equal to zero, then jump to Label                                                                                   |
| bnez    | bnez \$r1, Label         | if the register number (\$r1) is not equal to zero, then jump to Label                                                                                            |
| bgtz    | Bgtz \$r1, Label         | if the register number (\$r1) is greater than zero, then jump to Label                                                                                            |
| jal     | jal<br>ProcedureAddr     | jump to the starting address ProcedureAddr of the procedure, at the same time, store the return address into \$ra, and puts the parameter values in \$a0-\$a3     |
| j       | j Label                  | directly jump to the given label without needing to satisfy any condition                                                                                         |
| jr      | jr \$ra                  | the jump register instruction, jump back to the caller when the procedure finishes, at the same time, the return values are stored in \$v0-\$v1                   |
| syscall | syscall                  | the system call code is 1 for printing an integer and is 11 for printing a character. For both cases, the argument (i.e., data to print) should be stored in \$a0 |
| halt    | halt                     | all registers (including PC, the 8 general purpose registers and all other registers) in the processor are disabled (so the processor halts)                      |

#### Section E (22 points).

Given a 32-bit single-cycle processor with 32 registers and separated instruction and data memory. The processor structure is shown in the following picture:



The processor has three types of instructions (R-type, I-type and J-type), the format of which are shown in the following table:

Table 1. The formats for three types (R-type, I-type and J-type) of instructions.

| Туре   |        |       | Bit fi | eld            | <u> </u>  |       | Remarks                                                                         |
|--------|--------|-------|--------|----------------|-----------|-------|---------------------------------------------------------------------------------|
|        | 31-26  | 25-21 | 20-16  | 15-11 10-6 5-0 |           | 5-0   | All instructions are 32-bit                                                     |
| R-type | Opcode | rs    | rt     | rd shamt funct |           | funct | Register type. For arithmetic instructions with register number operands.       |
| l-type | Opcode | rs    | rt     | ir             | immediate |       | Immediate type. For data<br>transfer and conditional<br>branching instructions. |
| J-type | Opcode |       | ir     | nmediat        | e         |       | Jump type. For direct<br>Jump instruction.                                      |

The processor supports the instructions in the following table.

Table 2. The illustration of 5 instructions for the 32-bit single-cycle MIPS processor.

| Table 2. The mustiation of 5 mistractions for the 52-bit single-cycle with 5 p |                                  |                      |                                                     |  |  |  |  |  |
|--------------------------------------------------------------------------------|----------------------------------|----------------------|-----------------------------------------------------|--|--|--|--|--|
| Type                                                                           | Instruction Format Functionality |                      |                                                     |  |  |  |  |  |
|                                                                                | add                              | add \$rd, \$rs,      | add two register numbers (\$rs and \$rt) and store  |  |  |  |  |  |
| R-type                                                                         | auu                              | \$rt                 | the result in one register (\$rd)                   |  |  |  |  |  |
| K-type                                                                         | sub                              | sub \$rd, \$rs, \$rt | sub two register numbers (\$rs and \$rt) and store  |  |  |  |  |  |
|                                                                                | 300                              | 3ub 31u, 313, 31t    | the result in one register (\$rd)                   |  |  |  |  |  |
|                                                                                |                                  |                      | load a word from memory to register \$rt, the       |  |  |  |  |  |
|                                                                                | lw                               | lw \$rt, xxx(\$rs)   | address of memory content to be loaded is           |  |  |  |  |  |
|                                                                                |                                  |                      | computed by xxx+\$rs, where xxx is an immediate     |  |  |  |  |  |
| Ltuno                                                                          |                                  |                      | number.                                             |  |  |  |  |  |
| I-type                                                                         | sw                               |                      | store a word from register \$rt to memory, the      |  |  |  |  |  |
|                                                                                |                                  | sw \$rt, xxx(\$rs)   | memory address of data to be stored is computed     |  |  |  |  |  |
|                                                                                |                                  |                      | by xxx+\$rs, where xxx is an immediate number.      |  |  |  |  |  |
|                                                                                | beq                              | beq \$rs, \$rt, x    | branch instruction, if \$rs=\$rt, jump to address x |  |  |  |  |  |
| Ltype                                                                          | iumn                             | iumn yyy             | jump instruction, directly jump to address xxx      |  |  |  |  |  |
| J-type                                                                         | jump                             | jump xxx             | without needing to satisfy any condition            |  |  |  |  |  |

1. (12 points) Please complete the following table 2 with the corresponding control signals of the different type of instructions. The content with X indicates "don't care". Please draw the output part of this table on your answer sheet and fill the blanks (you don't need to write the output values that are already given to you in the table, but **only** need to answer those are blank).

Table 3. The control function of the processor.

|                 | Oncodo        |   |    | l-type |     |        |  |
|-----------------|---------------|---|----|--------|-----|--------|--|
| Input or output | Opcode R-type |   | lw | sw     | beq | J-type |  |
|                 | Op5           | 0 | 1  | 1      | 0   | 0      |  |
|                 | Op4           | 0 | 0  | 0      | 0   | 0      |  |
| Innut           | Op3           | 0 | 0  | 1      | 0   | 0      |  |
| Input           | Op2           | 0 | 0  | 0      | 1   | 0      |  |
|                 | Op1           | 0 | 1  | 1      | 0   | 1      |  |
|                 | Op0           | 0 | 1  | 1      | 0   | 0      |  |
|                 | Jump          | 0 | 0  | 0      |     |        |  |
|                 | Branch        |   | 0  | 0      |     | Х      |  |
|                 | ALUSrc        | 0 |    | 1      |     | Х      |  |
|                 | ALUOp1        | 1 | 0  | 0      | 0   | Х      |  |
| Output          | ALUOp0        | 0 | 0  | 0      | 1   | X      |  |
| Output          | RegDst        | 1 |    | X      | X   | Х      |  |
|                 | RegWrite      | 1 |    | 0      |     | Х      |  |
|                 | MemRead       |   | 1  | 0      | 0   | Х      |  |
|                 | MemWrite      | 0 |    | 1      | 0   | X      |  |
|                 | MemtoReg      | 0 |    | X      | X   | X      |  |

2. (5 points) Suppose we will execute the following assembly program on this processor

sub \$3,\$4,\$5 lw \$1,20(\$2) sw \$9,20(\$10) beq \$11,\$12,20 jump 10

Assume the "shamt" and "funct" fields for instruction "sub" are 00000 and 000001. Please write the machine codes of the above instructions (in hexadecimal numbers).

- 3. **(5 points)** Now, suppose we change the processor to a 5-stage pipeline structure. The 5 stages are:
  - IF: instruction fetch
  - ID: instruction decode and register file read
  - EX: execution or address calculation
  - MEM: data memory access
  - WB: write data back to register

Moreover, the processor supports *forwarding* to directly send the result of the ALU to the register file and directly send the memory data to the register file within one cycle. Now we will execute the MIPS assembly program on this processor:

Iw \$5, 10(\$6) add \$7, \$8, \$9 sub \$10, \$5, \$7 Iw \$11, 10(\$10) sw \$11,10(\$12)

Please complete the following table showing how the above instructions will proceed through pipeline (draw the table on the answer sheet and fill in your answer).

| Instruction        | cycle 1 | cycle 2 | cycle 3 | cycle 4 | cycle 5 | cycle 6 | cycle 7 | cycle 8 | cycle 9 | cycle 10 |
|--------------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|----------|
| lw \$5, 10(\$6)    |         |         |         |         |         |         |         |         |         |          |
| add \$7, \$8, \$9  |         |         |         |         |         |         |         |         |         |          |
| sub \$10, \$5, \$7 |         |         |         |         |         |         |         |         |         |          |
| lw \$11, 10(\$10)  |         |         |         |         |         |         |         |         |         |          |
| sw \$11,10(\$12)   |         |         |         |         |         |         |         |         |         |          |

**Note:** Please use an **arrow** to show the forwarding between the stage that provides the data and the stage that receives the data. The following shows an example with two instructions and forwarding occurred at the 4<sup>th</sup> clock cycle:

| - | Instruction      | cycle 1 | cycle 2 | cycle 3 | cycle 4     | cycle 5 | cycle 6 | cycle 7 |
|---|------------------|---------|---------|---------|-------------|---------|---------|---------|
|   | lw \$1, 10(\$2)  | IF      | ID      | EX      | MEM         | WB      |         |         |
|   | add \$3 \$1, \$1 |         |         | IF      | ID <b>₩</b> | EX      | MEM     | WB      |