# Disassembler Assignment

In this assignment you will write a disassembler. The assignment is worth 50 points:
* 5 points for submitted code with well-formatted comments.
* 5 points for the disassembled instructions. You may submit a simple text file.
* 40 points for the instructions themselves: 5 points for each of the 8 instructions.

# Disassembler

A disassembler is a program that reads binary-encoded instructions, interprets them, and presents them back to the user in human-readable assembly language. You may already have used a disassembler without realizing it: the GNU Debugger (GDB), for example, uses one alongside debugging information and labels.

You may be curious to try the one built into your Linux system:

```sh
objdump -d <binary_executable> | less
```

## Instructions and formats

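Each cell gives the width of the field in bits. Bit 31 is on the left and bit 0 (the start of the opcode) is on the right. Immediate fields are labelled with the immediate bits they hold.

|Type| funct7 | rs2 | rs1 | funct3 | rd | Opcode |
| ---| ------ | --- | --- | ------ | -- | ------ |
| R  |    7   |  5  |  5  |    3   | 5  |   7    |
| I  | imm[11:0] (12 bits, spans the funct7 and rs2 columns) | | 5 | 3 | 5 | 7 |
| S  | imm[11:5] | 5 | 5 | 3 | imm[4:0] | 7 |
| SB | imm[12\|10:5] | 5 | 5 | 3 | imm[4:1\|11] | 7 |
| U  | imm[31:12] (20 bits, spans the funct7 through funct3 columns) | | | | 5 | 7 |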
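
The shift amounts needed to pull a field out of an instruction follow directly from the widths in the table. As a small sketch (the names `R_TYPE_WIDTHS` and `field_offsets` are illustrative and not part of the assignment), the lowest bit of each R-type field can be found by accumulating widths from the opcode upward:

```python
# Field widths for the R-type format, listed from bit 0 upward.
# R_TYPE_WIDTHS and field_offsets are illustrative names, not part of the assignment.
R_TYPE_WIDTHS = [("opcode", 7), ("rd", 5), ("funct3", 3),
                 ("rs1", 5), ("rs2", 5), ("funct7", 7)]

def field_offsets(widths):
    """Return {name: (lowest_bit, width)} by accumulating widths from bit 0."""
    offsets = {}
    low = 0
    for name, width in widths:
        offsets[name] = (low, width)
        low += width
    return offsets

offsets = field_offsets(R_TYPE_WIDTHS)
for name, (low, width) in offsets.items():
    print(f"{name:7s} bits [{low + width - 1}:{low}]")
```

The lowest bit of a field is the amount to shift right by, and the width determines the mask, for example `(word >> 15) & 0x1F` for `rs1`.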

Here is an example to get started. We first want to know the opcode, then the value of `rd` or the immediate.

In [23]:
import numpy as np

In [24]:
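# Read the file as 32-bit words: one array element per instruction.
# np.int32 is signed, so words with bit 31 set print as negative numbers.
instructions_as_bytes = np.fromfile('risc-v_instructions.bin', dtype=np.int32)
# You might also use Python's file reader directly; this yields individual bytes,
# with the least significant byte of each instruction first
with open('risc-v_instructions.bin', 'rb') as rv_instrs:
    binary_instructions = rv_instrs.read()
print(bin(binary_instructions[0]))

for instruction in binary_instructions:
    print(bin(instruction))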

0b10000011
0b10000011
0b100011
0b101
0b0
0b10010011
0b10
0b110000
0b0
0b11
0b10100011
0b11
0b0
0b10011
0b1000011
0b11
0b10
0b100011
0b10100000
0b1100011
0b0
0b10010011
0b10000011
0b1000011
0b0
0b10010011
0b10000010
0b11110010
0b11111111
0b11100011
0b10011100
0b10
0b11111100


In [25]:
# 8 words of 4 bytes each: the same 32 bytes seen two ways
print(instructions_as_bytes.shape[0], len(binary_instructions))
for instruction in instructions_as_bytes:
    print(bin(instruction))

8 32
0b1010010001110000011
0b1100000000001010010011
0b111010001100000011
0b10000000110100001100010011
0b11000111010000000100011
0b10000111000001110010011
-0b11010111110101101101
-0b11111111010110001100011101
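
The last two values print as negative numbers because `np.int32` is signed: any word with bit 31 set is shown in two's complement. A minimal sketch of recovering the raw 32-bit pattern, using a plain Python integer copied from the output above. Masking with `0xFFFFFFFF` is one option; loading the file with an unsigned dtype such as `np.uint32` would be another.

```python
# Seventh value from the output above, copied as a plain Python int
word = -0b11010111110101101101

# Masking with 32 one-bits gives the unsigned view of the same bit pattern
unsigned = word & 0xFFFFFFFF

# Format as a fixed-width, zero-padded 32-character binary string
bits = format(unsigned, "032b")
print(hex(unsigned), bits)
```

When the value comes from a NumPy array, converting it with `int(...)` before masking avoids NumPy's integer overflow rules.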


In [26]:
bin(instructions_as_bytes[0] & (2**7 - 1))

'0b11'

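If we examine the reference sheet, we see that an opcode of `0b11` (`0000011`) belongs to the load instructions: `lb`, `lh`, `lw`, `lbu`, or `lhu`. To tell which one, we need the `funct3` field in bits 14:12. The next cell grabs the eight bits just above the opcode, so its result holds `rd` in the low five bits and `funct3` in the top three.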

In [27]:
bin((instructions_as_bytes[0] >> 7) & (2**8 - 1))

'0b1000111'
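
Splitting those eight bits into their two fields can be sketched as follows. The word is written out as a literal so the snippet runs on its own, and `LOAD_FUNCT3` is an illustrative lookup table with the standard `funct3` codes for loads.

```python
# First instruction from the output above, as a plain Python int
word = 0b1010010001110000011

low8 = (word >> 7) & 0xFF      # the same eight bits as the cell above
rd = low8 & 0x1F               # low five bits: destination register
funct3 = (low8 >> 5) & 0x7     # top three bits: selects the specific load

# Illustrative table of funct3 values for the load opcode
LOAD_FUNCT3 = {0b000: "lb", 0b001: "lh", 0b010: "lw", 0b100: "lbu", 0b101: "lhu"}

print(f"rd = x{rd}, funct3 = {funct3:03b}, mnemonic = {LOAD_FUNCT3[funct3]}")
```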

There are a couple of ways forward from here. We can either parse the binary as a string, using Python's string slicing, or continue as we have been, shifting the word and ANDing with a mask to isolate each field.

For the first part, let's use Python's string operations.
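
Both approaches read the same bits, so they should always agree. A short sketch comparing them on the first instruction (the dictionary names `fields_str` and `fields_int` are illustrative):

```python
# First instruction from the output above, as a plain Python int
word = 0b1010010001110000011

# Approach 1: fixed-width string, where index 0 is bit 31 and index -1 is bit 0
bits = format(word, "032b")
fields_str = {
    "opcode": int(bits[-7:], 2),
    "rd": int(bits[-12:-7], 2),
    "funct3": int(bits[-15:-12], 2),
    "rs1": int(bits[-20:-15], 2),
}

# Approach 2: shift the field down to bit 0, then mask off its width
fields_int = {
    "opcode": word & 0x7F,
    "rd": (word >> 7) & 0x1F,
    "funct3": (word >> 12) & 0x7,
    "rs1": (word >> 15) & 0x1F,
}

print(fields_str == fields_int, fields_int)
```

String slicing keeps the bit pattern visible while debugging; shifting and masking avoids the string conversion and works directly on integers.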

### String Based Parsing
Let's start by defining the RISC-V mappings from `funct3` (and `funct7`, where it is needed) to mnemonics.

In [45]:
# Register to register instructions
R_Type = {
    "000 0000000": "ADD",
    "000 0100000": "SUB",
    "001 0000000": "SLL",
    "010 0000000": "SLT",
    "011 0000000": "SLTU",
    "100 0000000": "XOR",
    "101 0000000": "SRL",
    "101 0100000": "SRA",
    "110 0000000": "OR",
    "111 0000000": "AND"
}
# Register to Immediate instructions
I_Type = {
    "000": "ADDI",
    "010": "SLTI",
    "011": "SLTIU",
    "100": "XORI",
    "110": "ORI",
    "111": "ANDI",
    "001 0000000": "SLLI",
    "101 0000000": "SRLI",
    "101 0100000": "SRAI"
}
# load instructions (I-Types)
Load_Type = {
    "000": "LB",
    "001": "LH",
    "010": "LW",
    "100": "LBU",
    "101": "LHU"
}
# Store type
S_Type = {
    "000": "SB",
    "001": "SH",
    "010": "SW"
}
B_Type = {
    "000": "BEQ",
    "001": "BNE",
    "100": "BLT",
    "101": "BGE",
    "110": "BLTU",
    "111": "BGEU"
}
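
One detail of the tables above is easy to miss: most `I_Type` entries are keyed by `funct3` alone, while the shift entries use a `"funct3 funct7"` key like `R_Type`. A lookup therefore has to try both forms. A minimal sketch, using a trimmed copy of the table and a hypothetical helper named `lookup_op_imm`:

```python
# Trimmed copy of I_Type so the sketch runs on its own
I_TYPE_SAMPLE = {
    "000": "ADDI",
    "100": "XORI",
    "001 0000000": "SLLI",
    "101 0000000": "SRLI",
    "101 0100000": "SRAI",
}

def lookup_op_imm(funct3, funct7):
    """Try the funct3-only key first, then the 'funct3 funct7' key used by shifts."""
    if funct3 in I_TYPE_SAMPLE:
        return I_TYPE_SAMPLE[funct3]
    return I_TYPE_SAMPLE.get(f"{funct3} {funct7}", "UNKNOWN_OP")

print(lookup_op_imm("000", "1111111"))  # funct7 bits belong to the immediate here
print(lookup_op_imm("101", "0100000"))  # shifts need funct7 to pick SRLI or SRAI
```

For the shift instructions the shift amount sits in the `rs2` field position, so it should be read from there rather than from the full 12-bit immediate.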



In [57]:
def sign_extend(bin_str, bits=12):
    """Interpret a two's-complement binary string as a signed Python int.

    bin_str must be exactly `bits` characters long.
    """
    if bin_str[0] == "1":  # Sign bit set: negative number
        return int(bin_str, 2) - (1 << bits)
    return int(bin_str, 2)

def decode_instruction(instr):
    """Decode a single 32-bit RISC-V instruction."""
    # np.int32 is signed, so bin() of a word with bit 31 set yields '-0b...'.
    # Mask to an unsigned 32-bit value before formatting; index 0 is bit 31.
    bin_instr = format(int(instr) & 0xFFFFFFFF, "032b")
    print(bin_instr)
    opcode = bin_instr[-7:]  # Last 7 bits (opcode)
    rd = bin_instr[-12:-7]  # Destination register (5 bits)
    funct3 = bin_instr[-15:-12]  # Function (3 bits)
    rs1 = bin_instr[-20:-15]  # Source register 1 (5 bits)
    rs2 = bin_instr[-25:-20]  # Source register 2 (5 bits)
    funct7 = bin_instr[:7]  # First 7 bits (R-type)

    imm_i = sign_extend(bin_instr[:12])  # Immediate for I-type
    # S-type: imm[11:5] is in the funct7 position, imm[4:0] in the rd position
    imm_s = sign_extend(bin_instr[:7] + bin_instr[-12:-7])
    # B-type: imm[12] | imm[11] | imm[10:5] | imm[4:1] | 0
    imm_b = sign_extend(bin_instr[0] + bin_instr[24] + bin_instr[1:7] + bin_instr[20:24] + "0", 13)
    imm_u = int(bin_instr[:20], 2) << 12  # Immediate for U-type (upper 20 bits)
    imm_j = sign_extend(bin_instr[0] + bin_instr[12:20] + bin_instr[11] + bin_instr[1:11] + "0", 21)  # Immediate for J-type


    # R-type instructions (Register-Register)
    # if opcode == "0110011":
    #     operation = R_Type.get(f'{funct3} {funct7}', "UNKNOWN_OP")
    #     return f"{operation} x{int(rd, 2)}, x{int(rs1, 2)}, x{int(rs2, 2)}"
    #
    # # I-type instructions (Immediate-based)
    # elif opcode == "0010011":  # Arithmetic immediate operations
    #     # Shifts are keyed by 'funct3 funct7' and take their shift amount from the rs2 field
    #     operation = I_Type.get(funct3) or I_Type.get(f'{funct3} {funct7}', "UNKNOWN_OP")
    #     return f"{operation} x{int(rd, 2)}, x{int(rs1, 2)}, {imm_i}"
    #
    # elif opcode == "0000011":  # Load instructions
    #     operation = Load_Type.get(funct3, "UNKNOWN_LOAD")
    #     return f"{operation} x{int(rd, 2)}, {imm_i}(x{int(rs1, 2)})"
    #
    # # S-type instructions (Store)
    # elif opcode == "0100011":
    #     operation = S_Type.get(funct3, "UNKNOWN_STORE")
    #     return f"{operation} x{int(rs2, 2)}, {imm_s}(x{int(rs1, 2)})"
    #
    # # B-type instructions (Branching)
    # elif opcode == "1100011":
    #     operation = B_Type.get(funct3, "UNKNOWN_BRANCH")
    #     return f"{operation} x{int(rs1, 2)}, x{int(rs2, 2)}, {imm_b}"
    #
    # # U-type instructions (LUI, AUIPC)
    # elif opcode == "0110111" or opcode == "0010111":
    #     mnemonic = "LUI" if opcode == "0110111" else "AUIPC"
    #     return f"{mnemonic} x{int(rd, 2)}, {imm_u}"
    #
    # # J-type instructions (Jump)
    # elif opcode == "1101111":
    #     return f"JAL x{int(rd, 2)}, {imm_j}"
    #
    # return f"UNKNOWN (raw: {bin_instr})"

# Disassemble each instruction
# for instr in instructions_as_bytes:
#     print(decode_instruction(instr))
decode_instruction(instructions_as_bytes[6])

11111111111100101000001010010011
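
The S- and B-type immediates are the trickiest fields because their bits are scattered across the word. As a cross-check for the string slices, here is a hedged sketch of the same reassembly using shifts and masks. The helper names are illustrative, and the test words are hand-encoded examples rather than instructions from the assignment file.

```python
def sign_extend_int(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)

def imm_s(word):
    """S-type: imm[11:5] from bits 31:25, imm[4:0] from bits 11:7."""
    imm = (((word >> 25) & 0x7F) << 5) | ((word >> 7) & 0x1F)
    return sign_extend_int(imm, 12)

def imm_b(word):
    """B-type: 13-bit branch offset whose bit 0 is always zero."""
    imm = ((((word >> 31) & 0x1) << 12)     # imm[12]   from bit 31
           | (((word >> 7) & 0x1) << 11)    # imm[11]   from bit 7
           | (((word >> 25) & 0x3F) << 5)   # imm[10:5] from bits 30:25
           | (((word >> 8) & 0xF) << 1))    # imm[4:1]  from bits 11:8
    return sign_extend_int(imm, 13)

# Hand-encoded examples (not taken from risc-v_instructions.bin)
print(imm_b(0x00000463))  # beq x0, x0, 8
print(imm_b(0xFE000EE3))  # beq x0, x0, -4
print(imm_s(0xFE112E23))  # sw x1, -4(x2)
```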

Looking only at the opcode of each instruction gives its class:

LOAD (raw: 0b1010010001110000011)
OP-IMM (raw: 0b1100000000001010010011)
LOAD (raw: 0b111010001100000011)
OP-IMM (raw: 0b10000000110100001100010011)
STORE (raw: 0b11000111010000000100011)
OP-IMM (raw: 0b10000111000001110010011)
OP-IMM (raw: -0b11010111110101101101)
BRANCH (raw: -0b11111111010110001100011101)


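### Shift and Mask Parsing
The second approach extracts each field with shifts and masks instead of string slices. The cell below is left commented out as a starting point: `OPCODES` and `FUNCT3_OP` still need to be defined, and while `OPCODES` has no matching entry, every instruction falls through to `UNKNOWN`, as the output shows.

In [37]: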
# def decode_instruction(instr):
#     opcode = instr & 0x7F
#     rd = (instr >> 7) & 0x1F
#     funct3 = (instr >> 12) & 0x7
#     rs1 = (instr >> 15) & 0x1F
#     rs2 = (instr >> 20) & 0x1F
#     funct7 = (instr >> 25) & 0x7F
#
#     mnemonic = OPCODES.get(opcode, "UNKNOWN")
#
#     if opcode == 0b0110011:  # Register-Register OP format
#         operation = FUNCT3_OP.get(funct3, "UNKNOWN_OP")
#         return f"{operation} x{rd}, x{rs1}, x{rs2}"
#
#     return f"{mnemonic} (raw: {bin(instr)})"


In [38]:
for instr in instructions_as_bytes:
    print(decode_instruction(instr))

UNKNOWN (raw: 0b1010010001110000011)
UNKNOWN (raw: 0b1100000000001010010011)
UNKNOWN (raw: 0b111010001100000011)
UNKNOWN (raw: 0b10000000110100001100010011)
UNKNOWN (raw: 0b11000111010000000100011)
UNKNOWN (raw: 0b10000111000001110010011)
UNKNOWN (raw: -0b11010111110101101101)
UNKNOWN (raw: -0b11111111010110001100011101)
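
One possible way to fill in the missing opcode table is sketched below. This is an assumption about how `OPCODES` could look, not the required solution: the keys are integers because `instr & 0x7F` yields an integer, and the class names follow the RISC-V base opcode map. The words are copied from the earlier output as plain Python ints so the snippet runs on its own.

```python
# Major opcode (bits 6:0) -> instruction class; an illustrative table
OPCODES = {
    0b0000011: "LOAD",
    0b0010011: "OP-IMM",
    0b0010111: "AUIPC",
    0b0100011: "STORE",
    0b0110011: "OP",
    0b0110111: "LUI",
    0b1100011: "BRANCH",
    0b1100111: "JALR",
    0b1101111: "JAL",
}

def opcode_class(instr):
    """Return the instruction class for a 32-bit word (signed or unsigned)."""
    word = int(instr) & 0xFFFFFFFF   # unsigned view, also safe for np.int32 values
    return OPCODES.get(word & 0x7F, "UNKNOWN")

# The eight words as printed earlier, including the two negative values
words = [
    0b1010010001110000011,
    0b1100000000001010010011,
    0b111010001100000011,
    0b10000000110100001100010011,
    0b11000111010000000100011,
    0b10000111000001110010011,
    -0b11010111110101101101,
    -0b11111111010110001100011101,
]
classes = [opcode_class(w) for w in words]
print(classes)
```

Once the class is known, the format from the table in "Instructions and formats" tells you which remaining fields to extract.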
