# Disassembler Assignment

In this assignment you will be writing a disassembler. This assignment is worth 50 points.
* 5 points for submitted code with well-formatted comments.
* 5 points for the disassembled instructions. You may submit a simple text file.
* 40 points for the instructions: 5 points for each of the 8 instructions.
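A simple text file is acceptable for the disassembled instructions, so here is a minimal sketch of writing one. It assumes you already have a decoder function, such as the ones developed later in this notebook, and passes it in as `decode`. The helper name `write_disassembly` and the line format (byte address, raw hex word, assembly) are illustrative choices, not requirements of the assignment.

```python
from pathlib import Path
from typing import Callable, Iterable


def write_disassembly(words: Iterable[int], decode: Callable[[int], str], path: str) -> None:
    """Write one line per instruction: byte address, raw hex word, disassembly."""
    lines = []
    for index, word in enumerate(words):
        address = index * 4  # each RV32I instruction is 4 bytes long
        lines.append(f"{address:08x}:  {int(word):08x}  {decode(int(word))}")
    Path(path).write_text("\n".join(lines) + "\n")
```

For example, `write_disassembly(instructions_as_bytes, decode_instruction_bin, "disassembly.txt")` would write the notebook's results to a file; the file name here is hypothetical.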

# Disassembler

A disassembler is a program that reads binary-encoded machine instructions, decodes them, and presents them back to the user in human-readable assembly language. You may have used a disassembler without realizing it; for example, the GNU Debugger (GDB) includes one and combines its output with debugging symbols such as function names and labels.

If you are curious, try the one that is likely already installed on your Linux system:

```sh
objdump -d <binary_executable> | less
```

## Instructions and formats

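| Type | bits 31:25 | bits 24:20 | bits 19:15 | bits 14:12 | bits 11:7 | bits 6:0 |
| ---- | ---------- | ---------- | ---------- | ---------- | --------- | -------- |
| R    | funct7 | rs2 | rs1 | funct3 | rd | opcode |
| I    | imm[11:5] | imm[4:0] | rs1 | funct3 | rd | opcode |
| S    | imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode |
| SB   | imm[12\|10:5] | rs2 | rs1 | funct3 | imm[4:1\|11] | opcode |
| U    | imm[31:25] | imm[24:20] | imm[19:15] | imm[14:12] | rd | opcode |
| UJ   | imm[20\|10:5] | imm[4:1\|11] | imm[19:15] | imm[14:12] | rd | opcode |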
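Every format in the table keeps opcode, rd, funct3, rs1, and rs2 at the same bit positions, so a decoder can extract them all up front and decide afterwards which ones are meaningful (in S and SB formats, for instance, the rd position holds immediate bits). A minimal sketch using shifts and masks; the function name `split_fields` is illustrative. It is applied here to the first word of the example file, `0x00052383`:

```python
def split_fields(word: int) -> dict:
    """Split a 32-bit RISC-V instruction into its fixed-position fields."""
    word &= 0xFFFFFFFF
    return {
        "opcode": word & 0x7F,          # bits 6:0
        "rd": (word >> 7) & 0x1F,       # bits 11:7
        "funct3": (word >> 12) & 0x7,   # bits 14:12
        "rs1": (word >> 15) & 0x1F,     # bits 19:15
        "rs2": (word >> 20) & 0x1F,     # bits 24:20
        "funct7": (word >> 25) & 0x7F,  # bits 31:25
    }


print(split_fields(0x00052383))
# {'opcode': 3, 'rd': 7, 'funct3': 2, 'rs1': 10, 'rs2': 0, 'funct7': 0}
```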

Here is an example to get started. We first want to find the opcode, then the value of rd or the immediate.

```python
import numpy as np
```

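```python
# Read the file as 32-bit integers in native byte order (little-endian on typical
# x86 and ARM machines, which matches RISC-V)
instructions_as_bytes = np.fromfile('risc-v_instructions.bin', dtype=np.int32)

# Alternatively, use Python's built-in file reader to get the raw bytes
with open('risc-v_instructions.bin', 'rb') as rv_instrs:
    binary_instructions = rv_instrs.read()
print(bin(binary_instructions[0]))

for instruction in binary_instructions:
    print(bin(instruction))
```

```
0b10000011
0b10000011
0b100011
0b101
0b0
0b10010011
0b10
0b110000
0b0
0b11
0b10100011
0b11
0b0
0b10011
0b1000011
0b11
0b10
0b100011
0b10100000
0b1100011
0b0
0b10010011
0b10000011
0b1000011
0b0
0b10010011
0b10000010
0b11110010
0b11111111
0b11100011
0b10011100
0b10
0b11111100
```

Indexing or iterating over a `bytes` object yields one byte at a time, so the first line is `binary_instructions[0]` and the remaining 32 lines are the file's bytes in order.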


```python
print(instructions_as_bytes.shape[0], len(binary_instructions))
for instruction in instructions_as_bytes:
    print(bin(instruction))
```

```
8 32
0b1010010001110000011
0b1100000000001010010011
0b111010001100000011
0b10000000110100001100010011
0b11000111010000000100011
0b10000111000001110010011
-0b11010111110101101101
-0b11111111010110001100011101
```

The file's 32 bytes form 8 instructions of 32 bits each. The last two print as negative numbers because `np.int32` treats bit 31 as a sign bit; reading the file with `np.uint32` avoids this.


```python
# Keep the low 7 bits (bits 6:0), which hold the opcode
bin(instructions_as_bytes[0] & (2**7 - 1))
```

```
'0b11'
```

If we examine the reference sheet, we see that an opcode of `0b11` (`0000011`) means the instruction is a load: `lb`, `lh`, `lw`, `lbu`, or `lhu`. To tell which one, we need to check the `funct3` field in bits 14:12.

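```python
# Shift off the opcode and keep the next 8 bits (bits 14:7): funct3 and rd
bin((instructions_as_bytes[0] >> 7) & (2**8 - 1))
```

```
'0b1000111'
```

Padded to 8 bits this is `010 00111`: the low five bits `00111` give rd = x7, and the top three bits `010` are funct3, which identifies the load as `lw`.

There are a couple of ways forward from here. We can either convert each instruction into a binary string and slice it with Python's string operations, or continue as we have been, shifting the value and ANDing it with a mask to isolate each field.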
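Both approaches pull out exactly the same bits. A quick sketch comparing them on the funct3 field of the first instruction, `0x00052383`; the variable names are illustrative:

```python
word = 0x00052383  # first instruction in risc-v_instructions.bin (lw x7, 0(x10))

# String approach: 32 binary digits, where index 0 is bit 31 and index 31 is bit 0
bits = format(word, "032b")
funct3_str = int(bits[-15:-12], 2)  # bits 14:12

# Shift-and-mask approach: move bits 14:12 down to bits 2:0, then keep 3 bits
funct3_shift = (word >> 12) & 0b111

print(funct3_str, funct3_shift)  # 2 2, i.e. funct3 = 0b010 (LW)
```

The string version is easier to read against a reference sheet; the shift version avoids building a string for every instruction.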

For the first version, let's use Python's string operations.

### String-Based Parsing
Let's start by defining the RV32I mappings from the function fields (or, for U- and J-type, the opcode) to mnemonics.

```python
# Mnemonic lookup tables for RV32I. Keys are binary strings: funct3 alone, or
# "funct3 funct7" where funct7 is needed to tell instructions apart.
R_Type = {
    "000 0000000": "ADD",
    "000 0100000": "SUB",
    "001 0000000": "SLL",
    "010 0000000": "SLT",
    "011 0000000": "SLTU",
    "100 0000000": "XOR",
    "101 0000000": "SRL",
    "101 0100000": "SRA",
    "110 0000000": "OR",
    "111 0000000": "AND"
}

# SLLI/SRLI/SRAI also depend on funct7 (the upper 7 bits of the immediate)
I_Type = {
    "000": "ADDI",
    "010": "SLTI",
    "011": "SLTIU",
    "100": "XORI",
    "110": "ORI",
    "111": "ANDI",
    "001 0000000": "SLLI",
    "101 0000000": "SRLI",
    "101 0100000": "SRAI"
}

Load_Type = {
    "000": "LB",
    "001": "LH",
    "010": "LW",
    "100": "LBU",
    "101": "LHU"
}

S_Type = {
    "000": "SB",
    "001": "SH",
    "010": "SW"
}

B_Type = {
    "000": "BEQ",
    "001": "BNE",
    "100": "BLT",
    "101": "BGE",
    "110": "BLTU",
    "111": "BGEU"
}

# U- and J-type are identified by the opcode itself
U_Type = {
    "0110111": "LUI",
    "0010111": "AUIPC"
}

J_Type = {
    "1101111": "JAL"
}
```
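One subtlety in these tables: most I-type entries are keyed on funct3 alone, but the shift-immediates (SLLI, SRLI, SRAI) also need funct7, because SRLI and SRAI share funct3 `101` and differ only in bit 30. A minimal sketch of choosing the key; the helper `i_type_key` is hypothetical, and it assumes the RV32I encoding, where the shift amount sits in the rs2 field:

```python
def i_type_key(funct3: int, funct7: int) -> str:
    """Build an I_Type lookup key: shifts use funct3 and funct7, others funct3 only."""
    if funct3 in (0b001, 0b101):  # SLLI, SRLI, SRAI
        return f"{funct3:03b} {funct7:07b}"
    return f"{funct3:03b}"


# srai x5, x5, 3: funct7 = 0100000, shamt = 3, rs1 = 5, funct3 = 101, rd = 5, opcode = 0010011
word = (0b0100000 << 25) | (3 << 20) | (5 << 15) | (0b101 << 12) | (5 << 7) | 0b0010011
key = i_type_key((word >> 12) & 0x7, (word >> 25) & 0x7F)
print(key)  # 101 0100000
```

With the tables above, `I_Type[key]` then gives `SRAI`.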


```python
def sign_extend(bin_str, bits):
    """Interpret a two's-complement binary string of width `bits` as a signed integer."""
    if bin_str[0] == "1":  # Sign bit set: negative number
        return int(bin_str, 2) - (1 << bits)
    return int(bin_str, 2)

def decode_instruction(instr):
    """Decode a single 32-bit RV32I instruction by slicing its binary string."""

    instr = int(instr) & 0xFFFFFFFF    # Plain Python int, treated as unsigned 32-bit
    bin_instr = format(instr, '032b')  # bin_instr[0] is bit 31, bin_instr[31] is bit 0

    opcode = bin_instr[-7:]           # bits 6:0
    rd = int(bin_instr[-12:-7], 2)    # bits 11:7, destination register
    funct3 = bin_instr[-15:-12]       # bits 14:12
    rs1 = int(bin_instr[-20:-15], 2)  # bits 19:15, source register 1
    rs2 = int(bin_instr[-25:-20], 2)  # bits 24:20, source register 2 (shift amount for shifts)
    funct7 = bin_instr[:7]            # bits 31:25

    # Immediate values
    imm_i = sign_extend(bin_instr[:12], 12)                     # imm[11:0] = bits 31:20
    imm_s = sign_extend(bin_instr[:7] + bin_instr[-12:-7], 12)  # imm[11:5] = bits 31:25, imm[4:0] = bits 11:7
    # imm[12|11|10:5|4:1] = bits 31 | 7 | 30:25 | 11:8; imm[0] is always 0
    imm_b = sign_extend(bin_instr[0] + bin_instr[24] + bin_instr[1:7] + bin_instr[20:24] + "0", 13)
    imm_u = int(bin_instr[:20], 2)                              # imm[31:12], as written in LUI/AUIPC assembly
    # imm[20|19:12|11|10:1] = bits 31 | 19:12 | 20 | 30:21; imm[0] is always 0
    imm_j = sign_extend(bin_instr[0] + bin_instr[12:20] + bin_instr[11] + bin_instr[1:11] + "0", 21)

    # R-type (register-register)
    if opcode == "0110011":
        operation = R_Type.get(f"{funct3} {funct7}", "UNKNOWN_OP")
        return f"{operation} x{rd}, x{rs1}, x{rs2}"

    # I-type (immediate arithmetic)
    elif opcode == "0010011":
        if funct3 in ("001", "101"):  # Shifts: funct7 selects SLLI/SRLI/SRAI, rs2 holds the amount
            operation = I_Type.get(f"{funct3} {funct7}", "UNKNOWN_OP")
            return f"{operation} x{rd}, x{rs1}, {rs2}"
        operation = I_Type.get(funct3, "UNKNOWN_OP")
        return f"{operation} x{rd}, x{rs1}, {imm_i}"

    # I-type (loads)
    elif opcode == "0000011":
        operation = Load_Type.get(funct3, "UNKNOWN_LOAD")
        return f"{operation} x{rd}, {imm_i}(x{rs1})"

    # S-type (stores)
    elif opcode == "0100011":
        operation = S_Type.get(funct3, "UNKNOWN_STORE")
        return f"{operation} x{rs2}, {imm_s}(x{rs1})"

    # B-type (branches); the offset is in bytes, relative to the branch itself
    elif opcode == "1100011":
        operation = B_Type.get(funct3, "UNKNOWN_BRANCH")
        return f"{operation} x{rs1}, x{rs2}, {imm_b}"

    # U-type (LUI, AUIPC)
    elif opcode in U_Type:
        operation = U_Type[opcode]
        return f"{operation} x{rd}, {imm_u}"

    # J-type (JAL)
    elif opcode == "1101111":
        return f"JAL x{rd}, {imm_j}"

    return f"UNKNOWN_INSTRUCTION (raw: {bin_instr})"

# Reload as unsigned 32-bit words so bit 31 is not treated as a sign bit
instructions_as_bytes = np.fromfile('risc-v_instructions.bin', dtype=np.uint32)

# Disassemble each instruction
for instr in instructions_as_bytes:
    print(decode_instruction(instr))
```

```
LW x7, 0(x10)
ADDI x5, x0, 3
LW x6, 0(x7)
XORI x6, x6, 32
SW x6, 0(x7)
ADDI x7, x7, 4
ADDI x5, x5, -1
BNE x5, x0, -40
```
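The B-type immediate is the easiest field to get wrong: its bits are scattered across the word, and bit 0 is never stored because branch offsets are always even. A minimal standalone sketch that reassembles it with shifts and masks and then sign-extends the 13-bit result; `b_type_offset` is an illustrative name. For the last word in the file, `0xFC029CE3` (a `bne x5, x0, ...`), it yields an offset of -40 bytes, which any correct `imm_b` extraction should also produce.

```python
def b_type_offset(word: int) -> int:
    """Return the signed byte offset encoded in a B-type (branch) instruction."""
    word &= 0xFFFFFFFF
    imm = (((word >> 31) & 0x1) << 12     # imm[12]   from bit 31
           | ((word >> 7) & 0x1) << 11    # imm[11]   from bit 7
           | ((word >> 25) & 0x3F) << 5   # imm[10:5] from bits 30:25
           | ((word >> 8) & 0xF) << 1)    # imm[4:1]  from bits 11:8
    # Two's-complement sign extension from 13 bits
    if imm & (1 << 12):
        imm -= 1 << 13
    return imm


print(b_type_offset(0xFC029CE3))  # -40
```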


```python
def sign_extend_binary(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement signed integer."""
    value &= (1 << bits) - 1       # Keep only the field's bits
    if value & (1 << (bits - 1)):  # Sign bit set: negative number
        return value - (1 << bits)
    return value


def decode_instruction_bin(instr):
    """Decode a single 32-bit RV32I instruction using shifts and masks."""

    # Convert to a plain Python int first. Arithmetic on np.uint32 wraps around
    # instead of going negative, which breaks sign extension.
    instr = int(instr) & 0xFFFFFFFF

    # Extract fields using binary shifting and masking
    opcode = instr & 0x7F          # bits 6:0
    rd = (instr >> 7) & 0x1F       # bits 11:7, destination register
    funct3 = (instr >> 12) & 0x7   # bits 14:12
    rs1 = (instr >> 15) & 0x1F     # bits 19:15, source register 1
    rs2 = (instr >> 20) & 0x1F     # bits 24:20, source register 2 (shift amount for shifts)
    funct7 = (instr >> 25) & 0x7F  # bits 31:25

    # Immediate values: shift each scattered piece into its place in the immediate
    imm_i = sign_extend_binary(instr >> 20, 12)                 # imm[11:0]  = bits 31:20
    imm_s = sign_extend_binary(((instr >> 20) & 0xFE0)          # imm[11:5]  = bits 31:25
                               | ((instr >> 7) & 0x1F), 12)     # imm[4:0]   = bits 11:7
    imm_b = sign_extend_binary(((instr >> 19) & 0x1000)         # imm[12]    = bit 31
                               | ((instr << 4) & 0x800)         # imm[11]    = bit 7
                               | ((instr >> 20) & 0x7E0)        # imm[10:5]  = bits 30:25
                               | ((instr >> 7) & 0x1E), 13)     # imm[4:1]   = bits 11:8
    imm_u = (instr >> 12) & 0xFFFFF                             # imm[31:12], as written in LUI/AUIPC assembly
    imm_j = sign_extend_binary(((instr >> 11) & 0x100000)       # imm[20]    = bit 31
                               | (instr & 0xFF000)              # imm[19:12] = bits 19:12
                               | ((instr >> 9) & 0x800)         # imm[11]    = bit 20
                               | ((instr >> 20) & 0x7FE), 21)   # imm[10:1]  = bits 30:21

    # R-type (register-register)
    if opcode == 0b0110011:
        key = f"{funct3:03b} {funct7:07b}"  # Same key format as R_Type
        operation = R_Type.get(key, "UNKNOWN_OP")
        return f"{operation} x{rd}, x{rs1}, x{rs2}"

    # I-type (immediate arithmetic)
    elif opcode == 0b0010011:
        if funct3 in (0b001, 0b101):  # Shifts: funct7 selects SLLI/SRLI/SRAI, rs2 holds the amount
            operation = I_Type.get(f"{funct3:03b} {funct7:07b}", "UNKNOWN_OP")
            return f"{operation} x{rd}, x{rs1}, {rs2}"
        operation = I_Type.get(f"{funct3:03b}", "UNKNOWN_OP")
        return f"{operation} x{rd}, x{rs1}, {imm_i}"

    # I-type (loads)
    elif opcode == 0b0000011:
        operation = Load_Type.get(f"{funct3:03b}", "UNKNOWN_LOAD")
        return f"{operation} x{rd}, {imm_i}(x{rs1})"

    # S-type (stores)
    elif opcode == 0b0100011:
        operation = S_Type.get(f"{funct3:03b}", "UNKNOWN_STORE")
        return f"{operation} x{rs2}, {imm_s}(x{rs1})"

    # B-type (branches); the offset is in bytes, relative to the branch itself
    elif opcode == 0b1100011:
        operation = B_Type.get(f"{funct3:03b}", "UNKNOWN_BRANCH")
        return f"{operation} x{rs1}, x{rs2}, {imm_b}"

    # U-type (LUI, AUIPC)
    elif opcode == 0b0110111 or opcode == 0b0010111:
        operation = U_Type.get(f"{opcode:07b}", "UNKNOWN_U_TYPE")
        return f"{operation} x{rd}, {imm_u}"

    # J-type (JAL)
    elif opcode == 0b1101111:
        return f"JAL x{rd}, {imm_j}"

    return f"UNKNOWN_INSTRUCTION (raw: {instr:032b})"


# Disassemble each instruction
for instr in instructions_as_bytes:
    print(decode_instruction_bin(instr))
```

```
LW x7, 0(x10)
ADDI x5, x0, 3
LW x6, 0(x7)
XORI x6, x6, 32
SW x6, 0(x7)
ADDI x7, x7, 4
ADDI x5, x5, -1
BNE x5, x0, -40
```
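The example file contains no JAL, so its instructions never exercise the J-type immediate. One way to check a J-type decoder is a round trip: place a known offset into the scattered J-type bit layout and confirm that decoding recovers it. A minimal sketch; `encode_j_imm` and `decode_j_imm` are hypothetical helpers, written from the UJ layout in the format table, and an `imm_j` expression in either decoder should agree with `decode_j_imm`.

```python
def encode_j_imm(offset: int) -> int:
    """Place a signed, even 21-bit offset into the J-type immediate bit positions."""
    imm = offset & 0x1FFFFF                  # two's-complement, 21 bits
    return (((imm >> 20) & 0x1) << 31        # imm[20]    -> bit 31
            | ((imm >> 1) & 0x3FF) << 21     # imm[10:1]  -> bits 30:21
            | ((imm >> 11) & 0x1) << 20      # imm[11]    -> bit 20
            | ((imm >> 12) & 0xFF) << 12)    # imm[19:12] -> bits 19:12


def decode_j_imm(instr: int) -> int:
    """Reassemble and sign-extend the J-type immediate from a 32-bit instruction."""
    imm = (((instr >> 11) & 0x100000)   # imm[20]    from bit 31
           | (instr & 0xFF000)          # imm[19:12] from bits 19:12
           | ((instr >> 9) & 0x800)     # imm[11]    from bit 20
           | ((instr >> 20) & 0x7FE))   # imm[10:1]  from bits 30:21
    return imm - (1 << 21) if imm & (1 << 20) else imm


JAL_OPCODE = 0b1101111
for offset in (-1048576, -40, -2, 0, 2, 2048, 1048574):
    word = encode_j_imm(offset) | (1 << 7) | JAL_OPCODE  # jal x1, offset
    assert decode_j_imm(word) == offset, offset
print("J-type round trip OK")
```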
