   It is totally normal to feel like you've been thrown into the deep end here.
   Moving from high-level concepts (like knowing what an ALU is) to actually
   seeing the specific 32-bit physical wiring blueprints (which is what assembly
   formats really are) is a huge jump.

   Think of these "Types" as different puzzle templates. You only have exactly
   32 bits of space to tell the CPU what to do. Depending on the task, you have
   to chop those 32 bits up differently to fit the required pieces.


1. U-Type (Upper Immediate)
   THE CONCEPT: I-Type instructions only give you 12 bits of space to store a
   constant number. But what if you need to load a 32-bit number into a
   register? You have to do it in two chunks. The U-Type handles the top chunk
   (the upper 20 bits).

   THE BLUEPRINT: It dedicates a whopping 20 bits purely to storing a number
   (`imm[31:12]`), along with a destination register (`rd`) and the `opcode`.

   ASSEMBLY EXAMPLE:
```Assembly
lui x10, 0x87654        # "Load Upper Immediate"
```
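   - WHAT IT DOES: It takes the 20-bit hex number `0x87654` (5 hex digits * 4
     bits each = 20 bits) and places it in bits 31:12 of `x10`, clearing the
     low 12 bits, so the 32-bit result is `0x87654000`.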
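
   A minimal Python sketch of the U-Type layout, packing `lui x10, 0x87654`
   into its 32-bit word. The function name `encode_u_type` is made up for
   these notes; the opcode value `0110111` for `lui` comes from the RISC-V
   base ISA.

```python
LUI_OPCODE = 0b0110111  # 7-bit opcode for lui

def encode_u_type(imm20: int, rd: int, opcode: int = LUI_OPCODE) -> int:
    """Pack imm[31:12] | rd | opcode into one 32-bit instruction word."""
    if not 0 <= imm20 < (1 << 20):
        raise ValueError("U-type immediate must fit in 20 bits")
    if not 0 <= rd < 32:
        raise ValueError("register number must be in 0..31")
    return (imm20 << 12) | (rd << 7) | opcode

word = encode_u_type(0x87654, rd=10)
print(f"{word:#010x}")  # 0x87654537
```

   Notice that the top five hex digits of the word are the immediate itself:
   the U-Type is the one format where the constant is not chopped up at all.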


2. S-Type (Store)

   THE CONCEPT: This is the exact opposite of Load. You have calculated an
   answer in your CPU registers, and now you need to safely store it away in the
   RAM (memory).

   THE BLUEPRINT: Notice in the image that S-Type DOES NOT HAVE A DESTINATION
   REGISTER (`rd`). Why? Because the destination isn't a register; it's the RAM!
   Instead, it uses two source registers (`rs1` and `rs2`) and splits the 
   12-bit immediate into two weird chunks (`imm[11:5]` and `imm[4:0]`). It
   splits the immediate so that `rs1` and `rs2` can stay in the exact same
   physical wire positions as they do in R-Type instructions, which makes the
   hardware engineers' lives easier.

   ASSEMBLY EXAMPLE:
```Assembly
sd x14, 8(x2)           # "Store Doubleword"
```
   - WHAT IT DOES: It takes the data currently sitting in register `x14` (which
     acts as `rs2`) and stores it into the memory address calculated by taking
     the base address in `x2` (`rs1`) and adding an offset of `8`.
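
   To see the "two weird chunks" concretely, here is a small Python sketch that
   packs `sd x14, 8(x2)` by hand. `encode_s_type` is a hypothetical helper
   written for these notes; the `funct3` and `opcode` defaults are the values
   the RISC-V base ISA assigns to `sd`.

```python
def encode_s_type(rs1, rs2, offset, funct3=0b011, opcode=0b0100011):
    """Pack imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode.

    The defaults (funct3=011, opcode=0100011) select sd.
    """
    if not -2048 <= offset <= 2047:
        raise ValueError("S-type offset must fit in 12 signed bits")
    imm = offset & 0xFFF        # 12-bit two's complement
    imm_hi = (imm >> 5) & 0x7F  # imm[11:5] goes to bits 31:25
    imm_lo = imm & 0x1F         # imm[4:0]  goes to bits 11:7 (where rd would be)
    return ((imm_hi << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (imm_lo << 7) | opcode)

word = encode_s_type(rs1=2, rs2=14, offset=8)  # sd x14, 8(x2)
print(f"{word:#010x}")  # 0x00e13423
```

   The low chunk of the offset lands exactly in the slot that `rd` occupies in
   other formats, which is why `rs1` and `rs2` never have to move.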



3. B-Type (Branch)

   THE CONCEPT: This is your assembly-level `if-statement`. It compares two
   registers, and if a condition is met, it "branches" (jumps) to a different
   line of code.

   THE BLUEPRINT: It looks almost identical to the S-Type. It uses two source
   registers (`rs1` and `rs2`) to compare values. The "immediate" here is the
   offset (how many bytes to jump forward or backward). You'll notice the
   immediate bits are scrambled like crazy (`imm[12]`, `imm[10:5]`,
   `imm[4:1]`, `imm[11]`). This is a famous RISC-V quirk: it keeps the sign bit
   at bit 31 and most immediate bits in the same positions as in the S-Type, so
   the decoder needs less wiring, which makes the silicon simpler and cheaper.

   ASSEMBLY EXAMPLE:
```Assembly
beq x19, x10, Label        # "Branch if Equal"
```
   - WHAT IT DOES: The CPU compares `x19` and `x10`. If they hold the exact same
     value, the program counter jumps to wherever `Label` is located in your 
     code.
                  ... huh, how funny.  so the immediate bits are merely so 
                      scrambled in this fashion all because it's cheaper to 
                      manufacture? lolol
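
   To make the scrambling less mysterious, here is a Python sketch that packs a
   `beq` by hand. It assumes `Label` sits 16 bytes ahead of the branch (the
   assembler normally works this distance out for you), and `encode_b_type` is
   a hypothetical helper name. The field positions and the `beq` opcode/funct3
   values are from the RISC-V base ISA.

```python
def encode_b_type(rs1, rs2, offset, funct3=0b000, opcode=0b1100011):
    """Pack imm[12] | imm[10:5] | rs2 | rs1 | funct3 | imm[4:1] | imm[11] | opcode.

    The defaults (funct3=000, opcode=1100011) select beq.
    """
    if offset % 2 != 0 or not -4096 <= offset <= 4094:
        raise ValueError("B-type offset must be even and within -4096..4094")
    imm = offset & 0x1FFF  # 13-bit two's complement; bit 0 is never stored
    return (
        ((imm >> 12) & 0x1) << 31     # imm[12]   -> bit 31 (sign bit)
        | ((imm >> 5) & 0x3F) << 25   # imm[10:5] -> bits 30:25
        | rs2 << 20
        | rs1 << 15
        | funct3 << 12
        | ((imm >> 1) & 0xF) << 8     # imm[4:1]  -> bits 11:8
        | ((imm >> 11) & 0x1) << 7    # imm[11]   -> bit 7
        | opcode
    )

word = encode_b_type(rs1=19, rs2=10, offset=16)  # beq x19, x10, +16
print(f"{word:#010x}")  # 0x00a98863
```

   Compare this with the S-Type: `imm[10:5]` and `imm[4:1]` sit where the
   S-Type keeps the same bits, and only `imm[11]` and `imm[12]` are moved.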


4. J-Type (Jump)
   THE CONCEPT: This is an unconditional jump. You aren't checking if two things
   are equal; you are just instantly jumping to a new location. This is how
   function/method calls work in assembly.

   THE BLUEPRINT: Because you might need to jump really far away in your code,
   it dedicates 20 scrambled bits to the jump distance. It also needs a 
   destination register (`rd`). When you jump to a function, you need to 
   remember where to come back to when the function finishes. The `rd` slot
   saves that "return address".


The J-TYPE (Jump) instruction is used for unconditional jumps, meaning the
program skips to a new location in the code without checking a condition first.
In RISC-V, this is primarily used for function calls through the `jal` (Jump and
Link) instruction.


THE BLUEPRINT
   To jump to a new location, the CPU needs two things: the destination address
   and a way to get back. The 32 bits are divided as follows:
   - OPCODE (7 bits): Identifies the instruction as a J-type jump.
   - RD (5 bits): The "Link" register. Before the CPU jumps away, it saves the
     address of the next instruction here so the program knows where to return
     when the function ends.
   - IMMEDIATE (20 bits): This is the "jump distance".

---
THE SCRAMBLED IMMEDIATE
   You might notice in your lecture summary that the bits for the immediate
   (`imm[20]`, `imm[10:1]`, `imm[11]`, `imm[19:12]`) are in a very strange, 
   non-linear order.

   - WHY? This is a hardware optimisation. The `opcode` and `rd` fields stay in
     the same place as in other instruction types (like U-type), the sign bit
     is always bit 31, and most immediate bits line up with where other formats
     keep them. The decoder can then share wiring across formats, making the
     chip smaller and cheaper to build.
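
   The scrambled order above can be sketched in Python. `encode_j_type` is a
   hypothetical helper for these notes; the bit positions and the `jal` opcode
   `1101111` are from the RISC-V base ISA.

```python
def encode_j_type(rd, offset, opcode=0b1101111):
    """Pack imm[20] | imm[10:1] | imm[11] | imm[19:12] | rd | opcode (jal)."""
    if offset % 2 != 0 or not -(1 << 20) <= offset <= (1 << 20) - 2:
        raise ValueError("J-type offset must be even and fit in 21 signed bits")
    imm = offset & 0x1FFFFF  # 21-bit two's complement; bit 0 is never stored
    return (
        ((imm >> 20) & 0x1) << 31      # imm[20]    -> bit 31 (sign bit)
        | ((imm >> 1) & 0x3FF) << 21   # imm[10:1]  -> bits 30:21
        | ((imm >> 11) & 0x1) << 20    # imm[11]    -> bit 20
        | ((imm >> 12) & 0xFF) << 12   # imm[19:12] -> bits 19:12
        | rd << 7
        | opcode
    )

print(f"{encode_j_type(rd=1, offset=2000):#010x}")  # 0x7d0000ef
print(f"{encode_j_type(rd=1, offset=32):#010x}")    # 0x020000ef
```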


ASSEMBLY EXAMPLE: `jal`
```Assembly
jal x1, 2000        # Jump to the instruction at PC + 2000
```
   
HOW IT WORKS STEP-BY-STEP:
   1. CALCULATE RETURN ADDRESS: The CPU takes the current PROGRAM COUNTER
      (PC) and adds 4, giving the address of the very next instruction.
   2. LINK: It saves that return address into register `x1`.
   3. JUMP: It takes the scrambled 20-bit immediate, "unscrambles" it, appends
      an implicit 0 as the lowest bit, sign-extends the result, and adds it to
      the current PC.
   4. EXECUTE: The CPU starts executing the code at that new address.
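
   The four steps can be sketched as a tiny Python model. This is a simplified
   illustration, not how real hardware is written: `execute_jal` and the
   32-entry `regs` list are assumptions made for these notes, and the starting
   PC of `0x1000` is arbitrary.

```python
def decode_j_imm(word):
    """Unscramble the J-type immediate and sign-extend it to a Python int."""
    imm = (
        ((word >> 31) & 0x1) << 20     # imm[20]
        | ((word >> 21) & 0x3FF) << 1  # imm[10:1]; bit 0 stays 0
        | ((word >> 20) & 0x1) << 11   # imm[11]
        | ((word >> 12) & 0xFF) << 12  # imm[19:12]
    )
    if imm & (1 << 20):                # sign bit set -> negative offset
        imm -= 1 << 21
    return imm

def execute_jal(word, pc, regs):
    """Model jal: link the return address, then return the new PC."""
    rd = (word >> 7) & 0x1F
    if rd != 0:                        # x0 is hard-wired to zero
        regs[rd] = pc + 4              # steps 1 and 2: return address into rd
    return pc + decode_j_imm(word)     # step 3: PC-relative jump

regs = [0] * 32
new_pc = execute_jal(0x7D0000EF, pc=0x1000, regs=regs)  # jal x1, 2000
print(hex(new_pc), hex(regs[1]))  # 0x17d0 0x1004
```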

SUMMARY OF J-TYPE FEATURES:
   - UNCONDITIONAL: It always jumps; it doesn't compare registers like a B-type
     (Branch) instruction.
   - RANGE: Because it encodes 20 bits of offset (about $\pm$ 1 MiB), it can
     jump much further than a B-type instruction (12 encoded bits, about
     $\pm$ 4 KiB).
   - SAVES PROGRESS: It is designed for subroutines/functions because it "links"
     (saves) the return path.
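
   A small sketch of where the B-type and J-type ranges come from. Both formats
   leave out bit 0 of the offset (it is always 0), so N encoded bits describe an
   (N + 1)-bit signed byte offset. `offset_range` is a hypothetical helper name.

```python
def offset_range(encoded_bits):
    """Return (lowest, highest) byte offset for a branch/jump immediate."""
    total_bits = encoded_bits + 1            # add the implicit zero bit
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 2    # largest even value that fits
    return lowest, highest

print(offset_range(12))  # B-type: (-4096, 4094)
print(offset_range(20))  # J-type: (-1048576, 1048574)
```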

  _--_                                     _--_
/#()# #\         0             0         /# #()#\
|()##  \#\_       \           /       _/#/  ##()|
|#()##-=###\_      \         /      _/###=-##()#|
 \#()#-=##  #\_     \       /     _/#  ##=-#()#/
  |#()#--==### \_    \     /    _/ ###==--#()#|
  |#()##--=#    #\_   \!!!/   _/#    #=--##()#|
   \#()##---===####\   O|O   /####===---##()#/
    |#()#____==#####\ / Y \ /#####==____#()#|
     \###______######|\/#\/|######______###/
        ()#O#/      ##\_#_/##      \#O#()
       ()#O#(__-===###/ _ \###===-__)#O#()
      ()#O#(   #  ###_(_|_)_###  #   )#O#()
      ()#O(---#__###/ (_|_) \###__#---)O#()
      ()#O#( / / ##/  (_|_)  \## \ \ )#O#()
      ()##O#\_/  #/   (_|_)   \#  \_/#O##()
       \)##OO#\ -)    (_|_)    (- /#OO##(/
        )//##OOO*|    / | \    |*OOO##\\(
        |/_####_/    ( /X\ )    \_####_\|
       /X/ \__/       \___/       \__/ \X\
      (#/                               \#)

4. J-TYPE (Jump)
   THE CONCEPT: This is an unconditional jump. You aren't checking if two things
      are equal; you are just instantly jumping to a new location. This is how
      function/method calls work in assembly.
   THE BLUEPRINT: Because you might need to jump really far away in your code,
      it dedicates 20 scrambled bits to the jump distance. It also needs a 
      destination register (`rd`). When you jump to a function, you need to 
      remember where to come back to when the function finishes. The `rd` slot
      saves that "return address".
   ASSEMBLY EXAMPLE:
```Assembly
jal x1, 32                  # "Jump and Link"
```
   - WHAT IT DOES: It jumps forward 32 bytes (8 instructions of 4 bytes each)
     in the code, but before it leaves, it saves the address of the next
     instruction into register `x1`.




---
THE ULTIMATE CHEAT SHEET SUMMARY
   Here is the cheat sheet of the 6 formats...
   
   - R-Type (Register): Pure math strictly inside the CPU. Takes 2 source 
     registers, does math, saves to 1 destination register.
   - I-Type (Immediate): Math with small constants, or loading data from RAM.
     Takes 1 source register and a 12-bit constant number, saves to 1
     destination register.
   - S-Type (Store): Saving data to RAM. Takes 2 source registers and a 12-bit
     offset. Has no destination register.
   - B-Type (Branch): Conditional IF statements. Compares 2 source registers,
     and jumps using a scrambled 12-bit offset. Has no destination register.
   - U-Type (Upper): Creating massive constants. Takes a giant 20-bit number and
     saves it directly to the upper 20 bits (bits 31:12) of 1 destination
     register.
   - J-Type (Jump): Function calls. Takes a giant 20-bit scrambled offset to
     jump to, and saves the return address in 1 destination register.
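
   As a quick self-check on the cheat sheet, this Python sketch looks at the low
   7 bits (the `opcode`) of an instruction word and reports which format it
   uses. The table covers only a handful of base-ISA opcodes, and
   `instruction_format` is a name invented for these notes.

```python
FORMAT_BY_OPCODE = {
    0b0110011: "R",  # register-register arithmetic (add, sub, ...)
    0b0010011: "I",  # arithmetic with an immediate (addi, ...)
    0b0000011: "I",  # loads (lw, ld, ...)
    0b1100111: "I",  # jalr
    0b0100011: "S",  # stores (sw, sd, ...)
    0b1100011: "B",  # conditional branches (beq, bne, ...)
    0b0110111: "U",  # lui
    0b0010111: "U",  # auipc
    0b1101111: "J",  # jal
}

def instruction_format(word):
    """Return the format letter for a 32-bit instruction word."""
    return FORMAT_BY_OPCODE.get(word & 0x7F, "unknown")

print(instruction_format(0x003100B3))  # add x1, x2, x3  -> R
print(instruction_format(0x00E13423))  # sd x14, 8(x2)   -> S
print(instruction_format(0x7D0000EF))  # jal x1, 2000    -> J
```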

                  ^
                 / \
            ^   _|.|_   ^
          _|I|  |I .|  |.|_
          \II||~~| |~~||  /
           ~\~|~~~~~~~|~/~
             \|II I ..|/
        /\    |II.    |    /\
       /  \  _|III .  |_  /  \
       |-~| /(|I.I I  |)\ |~-|
     _/(I | +-----------+ |. )\_
     \~-----/____-~-____\-----~/
      |I.III|  /(===)\  |  .. |
      /~~~-----_________---~~~\
     `##########!\-#####%!!!!!| |\
    _/###########!!\~~-_##%!!!\_/|
    \##############!!!!!/~~-_%!!!!\
     ~)#################!!!!!/~~--\_
  __ /#####################%%!!!!/ /
  \,~\-_____##############%%%!!!!\/
  /!!!!\ \ \~-_###########%%%!!!!\
 /#####!!!!!!!\~-_#######%%%!!!!!!\_
/#############!!!\#########%%%!!!!!!\

WHAT IS COMPUTER ARCHITECTURE?
   - architecture = instruction set architecture (ISA) ++ machine organisation
   - ISA examples: x86, ARM, MIPS, SPARC, RISC-V
   - instruction set: How abstract? How complex? Support for general / 
     special-purpose computing? Compatibility?
   - How to choose an implementation for a given instruction set?

Computer architecture has two parts: the instruction set architecture (ISA), and
the machine organisation that implements a given ISA. So how can we design an
ISA? And how do we choose an implementation for an ISA?

ISA: BETWEEN SOFTWARE AND HARDWARE

   - ISA provides an abstraction of hardware resources to software
   - The ISA is an abstraction of the hardware resources of a given machine. It
     forms the lowest level of software, on which higher levels of software
     can be built. Note that one ISA can be implemented in many ways. For
     example, both Intel and AMD support the x86 ISA.


---
DESIGN APPROACHES
   - Complex Instruction Set Computers, `CISC`
      - dense code, simple compiler
      - powerful instruction set, variable format
   - Reduced Instruction Set Computers, `RISC`
      - simple instructions, fixed format, optimising compiler
      - speed, low development cost, adapt to new technology

   There are two main ISA approaches: CISC and RISC. CISC includes more complex
   instructions, which reduces the workload of compilers (which translate
   high-level languages like C into assembly language). However, this also means
   that CISC instructions take more time to execute, as they are more complex.
   Since compilers are becoming better, RISC is becoming faster and more
   adaptive to new technologies. You will learn the details of their differences
   in the third lecture.

   Most ISAs are now RISC-based, except the x86 ISA. ...

MODULE COVERAGE
   1. INSTRUCTIONS: format, impact on performance
   2. ALU: architecture, use in multiplication and division
   3. DATAPATH: single-cycle, multi-cycle
   4. CONTROL UNIT: FSM, microsequencer, exception


INSTRUCTIONS: OVERVIEW
   - instruction = opcode + operand
                opcode == what it does
                operand == register / memory / data
   - RISC-V instructions: 4 main types: R, I, S, U
   - design-principles for RISCs
                good performance ++ easy to implement
   - use RISC-V processor to illustrate ideas in this module
                - not as simple as MIPS, but more recent
                - Part 2: covers x86 architecture, a popular CISC

   An instruction for a processor has two parts: opcode and operand. Opcode
   specifies its function, while the operand specifies the information or data
   needed to carry out that function. For example, the RISC-V processor has 4
   main instruction types, which will be detailed later. We will use RISC-V to
   explain ideas in computer architecture.

     .    _    +     .  ______   .          .
  (      /|\      .    |      \      .   +
      . |||||     _    | |   | | ||         .
 .      |||||    | |  _| | | | |_||    .
    /\  ||||| .  | | |   | |      |       .
 __||||_|||||____| |_|_____________\__________
 . |||| |||||  /\   _____      _____  .   .
   |||| ||||| ||||   .   .  .         ________
  . \|`-'|||| ||||    __________       .    .
     \__ |||| ||||      .          .     .
  __    ||||`-'|||  .       .    __________
 .    . |||| ___/  ___________             .
    . _ ||||| . _               .   _________
 _   ___|||||__  _ \\--//    .          _
      _ `---'    .)=\oo|=(.   _   .   .    .
 _  ^      .  -    . \.|

WHY RISC-V?
   
   - open standard, managed by the RISC-V Foundation
      - free from licensing fees and restrictions
   - wide variety of implementations
      - from small edge devices to large servers
   - easy customisation
      - support for vector extensions targeting efficient AI, HPC, ...   
   - growing ecosystem
      - commercial: NVIDIA, Qualcomm, Samsung...
      - research: processor optimisation, processor security...
            "Constraint-aware Bayesian optimisation for FPGA-based soft processors"   

   RISC-V is often regarded as the first open standard processor that can be
   found in a wide variety of systems, from large to small. In contrast to
   processors from Arm, RISC-V is free from licensing fees and is adopted by a
   growing number of companies. Many researchers also adopt RISC-V; one of our
   recent papers describes how an ML technique called Bayesian Optimisation can
   be used in optimising RISC-V designs.

RISC-V architecture
   
   - representative of modern RISC architectures
   - 32 registers
                x0...x31            64 bits each (for RV64I)
   - x0 for constant value 0, x1 for return address...
   - 32-bit data: a word; 64-bit data: a double word
   - register-register or load-store architecture
      - most instructions involve registers only: fast
                    `add x1, x2, x3             # x1 = x2 + x3`                   
      - special memory access instructions: possibly multicycle
                    `lw x8, Astart(x19)         # x8 = M[Astart + x19]`
      - goal: minimise memory access; why?

   RISC-V has a simple architecture, with 32 registers, each of 64 bits. Most
   instructions related to computation involve only registers and not memory,
   since registers are much faster than memory. There are special instructions
   for memory access which are relatively slow; so once data are brought into
   the CPU, they stay in the CPU as long as possible before being sent back to
   memory.

...

   - R-type: arithmetic, comparison, logical, ...
   - funct7, funct3: additional opcode field for R-type

   The four types of RISC-V instructions are R, I, S and U. The R-type covers
   register-based instructions for computation, including arithmetic and logical
   operations.



RISC-V instructions: I-type   
   - immediate (I-type): memory address or arithmetic constant captured in the
     instruction itself
   - memory access: load from memory: $x14 = M[8+x2]$
      `ld x14, Astart(x2)           # x14 = M[Astart + x2], Astart = 8`
   - arithmetic:
      `addi x15, x1, -50            # x15 = x1 - 50`
   - note that rs1 and rd are at the same bit locations as R-type


The I-type instructions cover loading a value from memory into a register, and
arithmetic operations when, for example, a constant value is involved. Notice
that rs1, funct3 and rd of the I-type are at the same bit locations as those for
the R-type. This regularity simplifies the processing of RISC-V instructions.
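
A hedged Python sketch of the I-type layout, packing the two examples from the
list above (with `Astart = 8`). `encode_i_type` is a hypothetical helper; the
opcode and funct3 values for `addi` and `ld` are from the RISC-V base ISA. Note
how a negative constant such as -50 is stored as 12-bit two's complement.

```python
def encode_i_type(rd, rs1, imm, funct3, opcode):
    """Pack imm[11:0] | rs1 | funct3 | rd | opcode."""
    if not -2048 <= imm <= 2047:
        raise ValueError("I-type immediate must fit in 12 signed bits")
    imm12 = imm & 0xFFF  # two's complement, e.g. -50 -> 0xFCE
    return (imm12 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

# addi x15, x1, -50
addi_word = encode_i_type(rd=15, rs1=1, imm=-50, funct3=0b000, opcode=0b0010011)
# ld x14, 8(x2)
ld_word = encode_i_type(rd=14, rs1=2, imm=8, funct3=0b011, opcode=0b0000011)

print(f"{addi_word:#010x}")  # 0xfce08793
print(f"{ld_word:#010x}")    # 0x00813703
```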


RISC-V INSTRUCTIONS: S-type
   - memory access: store to memory: $M[8+x2] = x14$
            `sd x14, Astart(x2)     # M[Astart + x2] = x14, Astart=8`
   - can be used for branch instructions (SB-type or B-type)
            `beq x19, x10, Label    # if x10 == x19, goto Label`
            .... (16 bytes offset)

   The S-type instructions cover storing a register value in memory. A
   variation, called SB-type or B-type, is used for conditional branching.

RISC-V INSTRUCTIONS: U-type

   - load Upper immediate: load value in upper bits of register
            `lui x10, 0x8e3f1       # x10 = 0x8e3f1000, set upper 20 bits`
   - To load the value `0x8e3f1321` to `x10`, do add immediate after:
            `lui  x10, 0x8e3f1      # x10 = 0x8e3f1000`
            `addi x10, x10, 0x321   # x10 = x10 + 0x321`
                                   `#     = 0x8e3f1321`

   - U-type instructions cover instructions involving data in the upper bits of
     a register. The `lui` instruction, for example, loads a 20-bit value
     specified by the instruction into the most significant 20 bits of a given
     register. If we need to initialise a register with a 32-bit constant, we
     use the `lui` instruction first to initialise the most significant 20
     bits of the register; the remaining 12 bits can be initialised by an
     `addi` instruction as shown above.
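
   One catch worth sketching: `addi` sign-extends its 12-bit immediate, so when
   the low 12 bits of the constant are `0x800` or more, the `lui` part has to be
   bumped up by one to compensate. The Python below is a minimal sketch of that
   split for a 32-bit constant only (it ignores that `lui` on RV64 also
   sign-extends into the upper 32 bits); `split_constant` is a hypothetical
   helper name.

```python
def split_constant(value):
    """Split a 32-bit constant into (lui immediate, addi immediate)."""
    value &= 0xFFFFFFFF
    lo = value & 0xFFF
    if lo >= 0x800:          # addi would treat this as a negative number
        lo -= 0x1000
    hi = ((value - lo) >> 12) & 0xFFFFF
    return hi, lo

for constant in (0x8E3F1321, 0x12345FFF):
    hi, lo = split_constant(constant)
    rebuilt = ((hi << 12) + lo) & 0xFFFFFFFF   # what lui followed by addi yields
    print(f"lui {hi:#x}, addi {lo} -> {rebuilt:#010x}")
# lui 0x8e3f1, addi 801 -> 0x8e3f1321
# lui 0x12346, addi -1 -> 0x12345fff
```

   The example in these notes (`0x8e3f1321`) is the easy case: `0x321` is below
   `0x800`, so the two halves can be used as they are.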

RISC-V instructions: UJ-type
   - Unconditional Jump
            `jal x1, 32     # save address of next instruction in x1`
            `               # jump to instruction at address PC + 32`
            `               # immediate [0] = 0 (see figure below)`
   - UJ-type (or J-type) is a variant of the U-type:
   - UJ-type (or J-type) instructions cover unconditional jumps to instructions
     in memory. The `jal` instruction enables returning to the instruction
     right after the `jal` instruction. The UJ-type instruction can be 
     considered a variation of the U-type since they share the same shape.



EXAMPLE: compiling `if-statements`
   - `if (i == j) f = g + h; else f = g - h;`
   -            `allocate   x19=f   x20=g`
   -            `x21=h      x22=i   x23=j`
         - `bne x22, x23, Else            # if i != j goto Else`
         - `add x19, x20, x21             # f = g + h         (if i == j)`
         - `beq  x0,  x0, Exit            # goto Exit`
   - Else: `sub x19, x20, x21             # f = g - h         (if i != j)`
   - Exit: `...`
   
   - while-loop: similar

   This example shows how to assign data to registers to initialise them, so 
   that they can then be used to implement statements from a high-level language
   -- in this case an if-statement.
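
   To trace the control flow, here is a toy Python model of the four
   instructions above. It is only a sketch: the tuple-based `program`, the
   `labels` table and the function name `run_if_else` are inventions for these
   notes, and only `add`, `sub`, `bne` and `beq` are modelled.

```python
def run_if_else(i, j, g, h):
    """Run the compiled if-statement and return f (register x19)."""
    regs = {"x0": 0, "x19": 0, "x20": g, "x21": h, "x22": i, "x23": j}
    program = [
        ("bne", "x22", "x23", "Else"),  # if i != j goto Else
        ("add", "x19", "x20", "x21"),   # f = g + h
        ("beq", "x0", "x0", "Exit"),    # always taken: goto Exit
        ("sub", "x19", "x20", "x21"),   # Else: f = g - h
    ]
    labels = {"Else": 3, "Exit": 4}     # label -> index into program
    pc = 0
    while pc < len(program):
        op, a, b, c = program[pc]
        if op == "add":
            regs[a] = regs[b] + regs[c]
        elif op == "sub":
            regs[a] = regs[b] - regs[c]
        elif op == "bne" and regs[a] != regs[b]:
            pc = labels[c]              # branch taken
            continue
        elif op == "beq" and regs[a] == regs[b]:
            pc = labels[c]              # branch taken
            continue
        pc += 1                         # fall through to the next instruction
    return regs["x19"]

print(run_if_else(i=1, j=1, g=5, h=3))  # 8 (i == j, so f = g + h)
print(run_if_else(i=1, j=2, g=5, h=3))  # 2 (i != j, so f = g - h)
```

   The `beq x0, x0, Exit` line is the trick to notice: comparing `x0` with
   itself is always true, so it acts as an unconditional "goto".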

     \
     - \
     |   \
     |---- \
     |  KKB  \
     |---------\
      \          \
        \--------O-\
          \----^-|-^-\
            \___/_\____\
            __/_____\____\___/
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


INSTRUCTIONS: IMPACT ON PERFORMANCE

PERFORMANCE
   - purchasing perspective
      - performance, cost
   - design perspective
      - performance / cost and improvements
   - require
      - method for calculation
      - basis for comparison
      - metric for evaluation
      - understanding of implications for architectural choices

   There are multiple perspectives on performance. If we want to buy a computer,
   we would compare the performance and cost of the available options to select
   the best. If we are designing a new computer, we would need to ensure that
   when it is ready, it would not only be better than existing computers, but
   would also be competitive against other computers available at that time. We
   require methods for calculating performance, a basis for comparing designs,
   metrics for evaluating performance, and an appreciation of the impact of
   architectural choices.

      *                                                            *
                              aaaaaaaaaaaaaaaa               *
                          aaaaaaaaaaaaaaaaaaaaaaaa
                       aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
                     aaaaaaaaaaaaaaaaa           aaaaaa
                   aaaaaaaaaaaaaaaa                  aaaa
                  aaaaaaaaaaaaa aa                      aa
 *               aaaaaaaa      aa                         a
                 aaaaaaa aa aaaa
           *    aaaaaaaaa    aaaa
                aaaaaaaaaaa aaaaaaa                               *
                aaaaaaa    aaaaaaaaaa
                aaaaaa a aaaaaa aaaaaa
                 aaaaaaa  aaaaaaa
                 aaaaaaaa                                 a
                  aaaaaaaaaa                            aa
                   aaaaaaaaaaaaaaaa                  aaaa
                     aaaaaaaaaaaaaaaaa           aaaaaa        *
       *               aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
                          aaaaaaaaaaaaaaaaaaaaaaaa
                       *      aaaaaaaaaaaaaaaa



---

... an ISA (Instruction Set Architecture) is very much like a formal, binding
convention or contract.

It is an abstract model that defines the interface between hardware (the
processor) and software (the OS/applications). It sets the rules for how
software communicates with the processor and what operations the processor
guarantees to perform.

# Abstract 1

In computer architecture, the Register/Register (R-Type) instruction is the
bread and butter of your CPU's mathematical operations. It is used for tasks
where all the data is already inside the CPU's fast local storage (the
registers) and does not require touching the slower main memory (RAM).


HOW THE R-TYPE BLUEPRINT WORKS
   Because you have exactly 32 BITS to work with, the R-Type format chops the
   instruction into six specific fields to tell the hardware exactly what to do:
      - `opcode` (7 bits): Tells the Control unit that this is a mathematical
        R-Type operation.
      - `rd` (5 bits): The "Destination" register where the final answer will be
        stored.
      - `funct3` && `funct7` (10 bits total): These act as sub-codes. While the
        opcode says "do math", these specific bits tell the ALU whether to ADD,
        SUBTRACT, or perform logical shifts.
      - `rs1 && rs2` (5 bits each): These identify the two "Source" registers
        that contain the numbers you want to use for the calculation.
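
   The six fields can be packed by hand in a few lines of Python. This is a
   sketch under the usual base-ISA assumptions: `encode_r_type` is a
   hypothetical helper, and the `funct3`/`funct7` values shown are the ones the
   RISC-V base ISA assigns to `add` and `sub`.

```python
def encode_r_type(rd, rs1, rs2, funct3, funct7, opcode=0b0110011):
    """Pack funct7 | rs2 | rs1 | funct3 | rd | opcode."""
    for reg in (rd, rs1, rs2):
        if not 0 <= reg < 32:
            raise ValueError("register number must be in 0..31")
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)

# add x1, x2, x3: funct7 = 0000000
add_word = encode_r_type(rd=1, rs1=2, rs2=3, funct3=0b000, funct7=0b0000000)
# sub x19, x20, x21: same opcode and funct3, only funct7 differs (0100000)
sub_word = encode_r_type(rd=19, rs1=20, rs2=21, funct3=0b000, funct7=0b0100000)

print(f"{add_word:#010x}")  # 0x003100b3
print(f"{sub_word:#010x}")  # 0x415a09b3
```

   `add` and `sub` share the same `opcode` and `funct3`; a single bit in
   `funct7` is all that tells the ALU which one to do.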


...


SUMMARY OF R-Type Features
   - NO MEMORY ACCESS: Unlike S-Type or I-Type, it never looks at the RAM; it is
     strictly "internal" to the CPU.
   - SPEED: Because it stays within the registers, these are among the fastest
     instructions the processor can execute.
   - FIXED SIZE: Like all RISC-V instructions, it is exactly 32 bits long.    


- Yes, RISC-V B-type (Branch) instructions have a relatively small immediate
  range ($\pm$ 4 KiB) primarily due to a combination of fixed 32-bit instruction 
  constraints and the assumption that most control flow changes occur within 
  a localised section of code.

  Here is a breakdown of why B-type immediates are limited:
     - FIXED INSTRUCTION LENGTH: Like other RISC architectures, all standard
       RISC-V instructions are 32-bit. To encode two register sources (`rs1`,
       `rs2`), a funct3, and an opcode, the remaining bits available for the
       immediate are limited.
     - INSTRUCTION FORMAT CONSTRAINTS: B-type (Branch) instructions need to
       store the immediate value in a specific way to align with S-type (Store)
       hardware, allowing them to use the same decoding logic. This results in a
       12-bit immediate field being used.
     - SIGN-EXTENSION and SCALING: The 12 bits are sign-extended and shifted
       left by 1 bit (since all instructions must be 2-b)  ...
                    (I don't understand what is this referring to here...
                    I will skip for now and work on other things for now)
     - PROGRAM STRUCTURE ASSUMPTION: The design assumes that the majority of
       conditional branches in programs are to nearby code (loops, `if-else`
       blocks). For jumps further away, compilers use the J-type (JAL)
       instruction, which provides a larger, 20-bit immediate ($\pm$ 1 MiB), or
       multiple instructions to reach 64-bit address spaces.

   While the immediate is small, the design optimises for quick, localised
   branching, whereas non-local, longer-range jumps are expected to be rarer and
   handled by a different instruction format.
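
   One way to read the sign-extension and scaling point is through a decoder.
   The Python sketch below (with a made-up helper name, `decode_b_imm`) pulls
   the 12 stored bits out of a branch word, puts them back in order with an
   implicit 0 in the lowest position, and sign-extends. Because bit 0 is never
   stored, the 12 stored bits describe a 13-bit signed byte offset, which is
   where the $\pm$ 4 KiB figure comes from.

```python
def decode_b_imm(word):
    """Recover the signed byte offset from a B-type instruction word."""
    imm = (
        ((word >> 31) & 0x1) << 12     # imm[12] (sign bit)
        | ((word >> 25) & 0x3F) << 5   # imm[10:5]
        | ((word >> 8) & 0xF) << 1     # imm[4:1]; bit 0 is implicitly 0
        | ((word >> 7) & 0x1) << 11    # imm[11]
    )
    if imm & (1 << 12):                # negative offset: sign-extend
        imm -= 1 << 13
    return imm

print(decode_b_imm(0x00A98863))  # beq x19, x10, +16 -> 16
print(decode_b_imm(0xFE000EE3))  # beq x0, x0, -4    -> -4
```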