# Introduction

RTF64 is another attempt by the author at a 64-bit design.

## Features

* 64-bit integer data path
* 64-bit double precision floating-point data path
* 32-entry integer register file
* 32-entry floating-point register file
* 32-entry posit arithmetic register file
* Four 8-bit compare results registers
* Two dedicated return address registers
* 32-bit fixed size instructions

## Future Features

* 4-way out-of-order (ooo) superscalar execution
* precise exception handling
* branch prediction with branch target buffer (BTB)
* Instruction L1, L2 and data L1, L2 caches
* 7 entry write buffer
* Dual memory channels

## History

RTF64 is a work in progress that began in September 2020. RTF64 originated from NVIO3, which originated from NVIO, which originated from FT64, which in turn originated from RiSC-16 by Dr. Bruce Jacob. RiSC-16 evolved from the Little Computer (LC-896) developed by Peter Chen at the University of Michigan. See the comment in FT64.v. The author has tried to be innovative with this design, borrowing ideas from many other processing cores.

## Motivation

The author wanted an FPGA-based processing core for experimental purposes. This is in part an example of sub-optimal design. For instance, there is an extra unused bit in the JSR and JMP instructions. Using this extra bit for the target address would cause the target address field to be shifted by a bit relative to byte positions. As it is, one can read most of the target address directly from the machine code. So, the sub-optimization is to add a human factor to the design. There are other examples to be found.

### Case Comparisons

Some of the more striking points of a handful of architectures are compared to what is available in RTF64.

### Case Comparison 6502

6502 vs RTF64

#### Overview

This is a bit of an apples to oranges comparison as the two designs are for different environments. The 6502 was designed for a much smaller operating environment and is extremely frugal with transistor usage. The RTF64 was designed as a 64-bit processor used for experimentation in a much larger environment.

#### Instruction Format

The 6502, as a byte-oriented design, has a compact variable-length instruction encoding. Instructions average about two bytes in length.

While variable sized instructions offer great advantage for code density, they add complexity to the processing core. RTF64 uses a fixed 32-bit instruction encoding. As such, a given single instruction requires about twice the memory of a 6502 instruction. However, the instructions in the RTF64 operate on 64-bit values; performing the same operations on the 6502 would require many more bytes. Several instructions in the RTF64 are more powerful than what can be found in the 6502.

#### Registers

The RTF64 has many more registers than the 6502. It is a general-purpose register-oriented design while the 6502 is accumulator oriented. A register file of about 32 registers has been found to be a good match to many computing environments. This is somewhat of a historical determination. The RTF64 has available many more transistors than were available to the 6502 design. The RTF64 has many special purpose registers. The 6502 does not have any.

#### Instructions

The 6502 uses relative branches to allow a code dense instruction encoding. Since there are enough bits available in the RTF64 branch instructions to encode an absolute address, absolute addressing is used. It takes a little bit less hardware to use absolute addressing rather than relative addressing. It is also much easier to “see” the target address in the machine code if it is properly aligned.

The 6502 offers only basic instructions (ADC, SBC, CMP, AND, ORA, EOR, LDA, and STA, for example). There are no complex instructions in the 6502 ISA. All instructions execute within a handful of clock cycles. The RTF64 has many more instructions than the 6502, including support for floating-point and posit arithmetic.

The 6502 is an accumulator-based architecture that allows one memory-based operand for most instructions. RTF64 is register based and the only instructions accessing memory are load and store instructions.

### Case Comparison RISCV

RISCV vs RTF64

#### Instruction Format

While variable sized instructions offer great advantage for code density, they add complexity to the processing core.

In RISCV, support for 16-bit compressed instructions consumes two opcode bits, and opcode bits are valuable. The use of these two bits and the reduction of the opcode space for other instructions is an excellent trade-off. Compressed instructions can improve code density by about 25% or more and consequently make better use of the cache. There is only the occasional instruction that cannot be encoded using two fewer encoding bits, so only a very small percentage would be gained back in code density by having two more bits available.

The JAL instruction in RISCV allows any register to be used to store the return address. In practice only one or two registers which are fixed by the ABI are used. This means that there are about four bits of opcode space wasted for unnecessary register specification. Making use of these extra four bits is extremely valuable. The RTF64 design only requires a single bit to specify the return address register. The presence of four extra bits to specify the target address makes absolute addressing appealing for this design.

To build constants the LUI instruction is used. In RISCV the LUI instruction allows any register to be used as the target and has a 20-bit constant field because of encoding constraints. In practice it is possible to get by using only one or two registers to build constants with. RTF64 has more direct support for constants larger than 32 bits. It makes use of LMI (load middle immediate) and LUI instructions in a manner like RISCV but allows only two registers to be loaded with constants in that manner. RISCV does not really provide much for building constants over 32 bits.

#### Instructions

RISCV does not include indexed addressing modes in the standard implementation. Indexed addressing is accomplished when required using additional instructions and registers to calculate the effective address. RTF64 directly supports indexed addressing with an optionally scaled index register. When indexed addressing is required, RTF64 is more code dense than RISCV. However, indexed addressing is not used that often.
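As a rough illustration of what scaled indexed addressing computes, the sketch below forms an effective address from a base register, an index register, and a scale. The function name and the choice to express the scale as a left-shift amount are assumptions made for this example; the text above does not specify how RTF64 encodes the scale.

```python
MASK64 = (1 << 64) - 1  # RTF64 has a 64-bit integer data path


def indexed_address(base: int, index: int, scale_shift: int = 0) -> int:
    """Effective address = base + (index << scale_shift), wrapped to 64 bits.

    scale_shift is a hypothetical encoding of the optional scale:
    0 means unscaled, 3 means the index counts 8-byte (octa) elements.
    """
    return (base + (index << scale_shift)) & MASK64


# Address of element 5 in an array of octas starting at 0x1000.
print(hex(indexed_address(0x1000, 5, 3)))  # 0x1028
```

Without an indexed mode, the shift and the add each need their own instruction and a temporary register before the load or store can be issued.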

RISCV accesses memory and I/O exclusively using load and store instructions. In addition to loads and stores RTF64 also stacks and unstacks the instruction pointer to / from memory with the JSR and RTS instructions to improve code density.

RISCV uses a JALR instruction to return from subroutines. RTF64 has a dedicated subroutine return instruction. This allows RTF64 to also adjust the stack pointer during a return operation, improving code density and performance.

#### Register File

RISCV does almost everything using general-purpose registers. This paradigm increases the pressure on the register file. In the RTF64 design there are more register files involved. Effectively, there are a few additional registers which reduce the pressure on the general-purpose register file. There is a trend to place some global variables in the register file for performance reasons. These variables include operating variables for garbage collection, pointers to global and thread data, and pointers for exception handling.

One reason to use more register files is that in a superscalar design it may allow more instructions to be committed at the same time. There is usually a limit on the number of write ports to the general register file. This limit affects how many instructions can be committed at once. By providing separate register files for some operations it effectively increases the number of write ports available making it possible to commit more instructions per cycle.

#### Return Address Registers

There is not a requirement for more than a couple of return address registers. The instruction set may be refined to allow only a single bit to specify the return address register.

#### Compare Results Registers

For this design, the result of a compare operation is stored in a compare result register. A couple of questions come to mind as to the use of compare results registers. Why use them instead of general-purpose registers? And how many compare results registers are enough? RISCV stores comparison results, if needed, in general-purpose registers. It has just a single instruction (SLT) dedicated to generating compare results. RISCV makes use of branches that compare and branch in a single instruction. This is effective at removing the need for most compare operations. The intermediate result of the compare is hidden in the architecture; there is no need for visible compare results registers. There is still a need for the computed result of a compare operation. Sometimes software records the comparison result for later usage. For example, the line of code x = y > 10 sets x true if y is greater than 10.

Compares are tightly coupled to branch operations. Some architectures like RISCV compare and branch in a single instruction. Other architectures use a flags register or several flags registers. Yet other architectures simply use the general-purpose registers. How many compare results registers are needed? Four was deemed sufficient to provide two additional registers in addition to supporting the use of separate registers for integer and floating-point compare results. With register renaming available in a superscalar processor, there does not need to be whole bunches of compare results registers.

One reason to use a separate group of compare results registers is that in a superscalar design it may allow more instructions to be committed at the same time. There is usually a limit on the number of write ports to the general register file. This limit affects how many instructions can be committed at once. By providing separate register files for some operations it effectively increases the number of write ports available making it possible to commit more instructions per cycle.

#### Operating Modes

This design uses six operating modes. It has the RISCV operating modes plus separate modes for interrupt and debug. The author has seen a comment to the effect that debug on a RISCV processor really acts like an additional mode. This has been made explicit in this design.

#### Memory Management

RISCV offers several memory management options including several different paging arrangements and a couple of optional base and bound registers.

### Case Comparison MMIX

#### Instruction Format

MMIX comes across as more of a pedantic processor design. MMIX instructions are structured simply for the most part, using a 32-bit format divided into four byte-sized fields. The author assumes this is primarily to enhance the readability of instructions. The constant field is often limited to eight bits.

#### Register File

MMIX has a 256-entry register file. It is not clear that this number of registers has any benefit over a 32-register design, but it makes the instruction format clear and easy to understand which may be a goal for a processor used for academic purposes.

#### Instructions

There are a lot of conditional move instructions in the MMIX ISA. RTF64 currently does not have any conditional moves.

### Case Comparison PowerPC

#### Overview

RTF64 is perhaps most like the PowerPC in its design. PowerPC is also a 32-register design. The PowerPC uses condition registers, eight versus RTF64’s four, for branch determination.

#### Instruction Format

The PowerPC like RTF64 uses a fixed 32-bit instruction format.

#### Instructions

The PowerPC supports indexed addressing like the RTF64 although index scaling is not present. The author has found indexed addressing makes up about 3% of instructions and scaled indexes a much smaller percentage.

#### Registers

The PowerPC has a dedicated link register and eight condition code registers. RTF64 is similar with a dedicated pair of link registers and four compare result registers. While RTF64 has fewer registers for condition codes it is not clear to the author that any more offer a benefit. The PowerPC also has a loop count register used for counted loops. RTF64 does not have a loop count register.

The general register array is the same size – 32-entry.

### Case Comparison x86

#### Registers

The x86 series has a register file that is accessible in subparts. Parts of a single register may be referred to by instructions. For example, EAX is a 32-bit register that is also accessible as AL for byte operations. This has no doubt complicated the x86 design. This contrasts with RTF64 and many RISC designs where the registers are always manipulated as whole units.

### Case Comparison SPARC

#### Registers

The SPARC machine uses register windowing, where a subset of registers is available from a much larger set that is “windowed”. In the SPARC the subset register window scrolls up and down automatically during subroutine calls and returns. The idea was to improve performance by not having to stack and unstack registers to memory during subroutine operations. However, with a good modern optimizing compiler the performance level of the SPARC is not much different than that of other architectures.

**Nomenclature**

The ISA refers to primitive object sizes following the convention suggested by Knuth of using Greek.

| Number of Bits | Name | Instructions |
| --- | --- | --- |
| 8 | byte | LDB, STB |
| 16 | wyde | LDW, STW |
| 32 | tetra | LDT, STT |
| 64 | octa | LDO, STO |
| 128 | hexi | LDH, STH |

The register used to address instructions is referred to as the instruction pointer or IP register. The instruction pointer is a synonym for program counter or PC register.

# Development Aspects

## Device Target

The core has been developed with FPGA usage in mind. In particular it is expected that the register file is built out of block memories.

## Implementation Language

The core is implemented in the SystemVerilog language, primarily for its ability to process array objects. Much of the core is plain vanilla Verilog code.

# Programming Model

## **Registers**

### Overview

The RTF64 ISA is a 32-register machine with separate register files for integer, floating-point, and posit arithmetic. There are 32 sets of integer registers. There are many control and status registers (CSRs) which hold an assortment of specific values relevant to processing.

### Register Sets

Because of the use of block memory in an FPGA there are multiple integer register sets available. There are several register sets dedicated to different operating modes of the processor. The remaining register sets are available for general use.

| Register Set | Associated Usage |
| --- | --- |
| 0 to 25 | general usage |
| 26 | user exceptions |
| 27 | supervisor |
| 28 | hypervisor |
| 29 | machine |
| 30 | interrupt |
| 31 | debug |

Each register set includes integer registers plus return address, compare and exception address registers for which there are also multiple sets.

### General Purpose Registers (x0 to x31)

The register usage convention probably has more to do with software than hardware. Excepting a few special cases, the registers are general purpose in nature. Registers may hold either integer or floating-point values.

x0 always has the value zero. Registers x30 and x31 are used for stack references and subject to stack bounds checking.

x1 may be used with the constant building instructions (LUI, LMI, AMIPC)

| Register | Description / Suggested Usage | Saver |
| --- | --- | --- |
| x0 | always reads as zero (hardware) |  |
| x2 | constant building / temporary (cb) |  |
| x3-x9 | temporaries (t0-t6) | caller |
| x10-x19 | register variables (s0-s9) | callee |
| x20-x27 | function arguments (a0-a7) a7/g2 | caller |
| x28 | thread pointer (tp / g1) |  |
| x29 | global data pointer (g0) | callee |
| x30 | base / frame pointer (fp) | callee |
| x31 | current stack pointer (sp) | callee |
|  |  |  |
| cr0-cr3 | compare results |  |
| ra0 | return address register |  |
| ra1 | alternate return address register |  |
| cn | code index register |  |
|  |  |  |
| eip | exceptioned instruction pointer |  |

### Compare Results Registers

The result of a compare operation is stored in a compare result register. There are four eight-bit compare results registers in the design. The compare results registers store the flag results of a compare operation. They are also used to store the Boolean results of a set operation. Typically, one compare result register is used for each of integer and floating-point compares. Compare results registers are updated by one of the compare or set instructions. Many other instructions may optionally update one of the compare results registers depending on the instruction. This option is encoded as the ‘r’ record bit in the instruction.

The set instructions may set the carry bit of a compare results register to a Boolean value (0 or 1) which may then later be tested with the BCS or BCC instructions.

The carry bit of compare results registers may be used as an operand for some integer operations. For instance, left shift and add operations can make use of the carry bit of a compare result register.

| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| N | V | O | P | ~ | ~ | Z | C |

| Bit | Flag | Meaning |
| --- | --- | --- |
| 0 | C | Carry flag, set if operation overflows |
| 1 | Z | Zero flag, set if result is zero |
| 2 | ~ | reserved |
| 3 | ~ | reserved |
| 4 | P | Parity (exclusive or of all result bits) |
| 5 | O | Odd, set if result is odd |
| 6 | V | Overflow, set if signed result overflows |
| 7 | N | Negative, set if signed result is less than zero |
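The flag layout in the table above can be read off an eight-bit value with a few shifts and masks. The following is a minimal sketch; the names `FLAG_BITS` and `decode_compare_result` are made up for this example, and only the bit positions come from the table.

```python
# Bit positions taken from the flag table above; bits 2 and 3 are reserved.
FLAG_BITS = {"C": 0, "Z": 1, "P": 4, "O": 5, "V": 6, "N": 7}


def decode_compare_result(cr: int) -> dict:
    """Return each named flag of an 8-bit compare results value as a bool."""
    if not 0 <= cr <= 0xFF:
        raise ValueError("compare results registers are eight bits wide")
    return {name: bool((cr >> bit) & 1) for name, bit in FLAG_BITS.items()}


flags = decode_compare_result(0b1000_0001)  # N and C set
print([name for name, is_set in flags.items() if is_set])  # ['C', 'N']
```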

*The author has chosen in this design to make the compare results registers look like a processor condition code register. There is a large software base that uses condition code registers and programmers are familiar with them. The author feels keeping a familiar look and feel is important.*

*Choosing to support only four compare results registers is because many architectures including superscalar ones can get by with just a single flags register. There is no need for many registers in this case. Having more than two registers may be handy. Typically, one register is used by the compiler for integer operations and a second register is used for floating-point operations. The compare results registers are effectively more powerful in this architecture because they allow result flags to be accumulated into a single register for multiple compare operations.*

*The presence of compare results registers independent of the general register file helps to reduce the pressure on the general register file. It effectively adds to the register set available.*

All the compare results registers are accessible as an aggregated group by the MOV instruction.

### Return Address Registers

There are two return address registers (ra0 and ra1) available in the design. These are designated the normal and alternate return address registers. Return address registers store the address of a JSR operation. A return instruction ([RTS](#_RTS_–_Return), [ARTS](#_ARTS_–_Alternate)) is used to return to some point after the JSR instruction.

*There are two return address registers as there is no need for more. Often in modern designs a general-purpose register is delegated to store the return address. The ABI typically specifies which register to use. By restricting the number of registers that can be selected as the return address register, it is possible to create an instruction encoding that is a little more bit efficient. The encoding of a JSR instruction in this design allows a 24-bit absolute address specification.*

*Storing the address of the JSR instruction rather than the next program address in the return address register is a little unusual. It is motivated by the observation that the return instruction can return to a point substantially past the place the instruction after the JSR would be located. For inline subroutine parameters and possibly exception handling trampolines the return instruction can return past the normal return address location. Since this capability is built into the return instruction, some hardware may be conserved by simply copying the JSR address to the return address register rather than computing the next address and using that.*

*Like the compare results registers, keeping the return address registers independent of the general-purpose registers helps reduce register pressure. It effectively makes two more registers available.*

### Instruction pointer

The instruction pointer, also sometimes called a program counter, identifies which instruction to execute. The instruction pointer increments as instructions are processed. The increment may be overridden using one of the flow control instructions. The instruction pointer addresses 32-bit instruction parcels. It remains byte addressable and increments by four. The instruction pointer register is also split into two sections. Only the lower 24 bits of the IP increment.

*There is little reason to increment higher order instruction pointer bits. That would just waste hardware. Most code fragments are small and the JSR instruction is used to set the instruction pointer from routine to routine, overriding the increment. The only time there is an issue is if code passes through the 16MB boundary. This can be handled by carefully aligning code so that a subroutine does not span the boundary. With the use of a virtual memory system and the fact that individual modules are often far less than 16MB in size the limitations of the instruction pointer are not great.*

| 63..24 | 23..0 |
| --- | --- |
| IP High[63..24] | IP Low[23..0] |
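The split increment described above can be sketched as follows. The function name is hypothetical, and the wrap-around at the 16MB boundary is an assumption drawn from the statement that only the lower 24 bits increment; the core's actual behaviour at the boundary is not spelled out here.

```python
LOW_BITS = 24
LOW_MASK = (1 << LOW_BITS) - 1  # IP Low, bits 23..0
MASK64 = (1 << 64) - 1


def next_ip(ip: int) -> int:
    """Advance by one 32-bit instruction; only the low 24 bits increment."""
    high = ip & MASK64 & ~LOW_MASK  # bits 63..24 are left unchanged
    low = (ip + 4) & LOW_MASK       # assumed: a carry out of bit 23 is dropped
    return high | low


print(hex(next_ip(0x1000)))      # 0x1004
print(hex(next_ip(0x12FFFFFC)))  # 0x12000000
```

The second call shows why a subroutine should not span the boundary: sequential execution wraps to the start of the same 16MB region instead of moving into the next one.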

### The Code Index Register

There is a register in the architecture dedicated to use in computed target addresses called the code index (Cn) register. The absolute target address of a jump instruction may be adjusted by the value in the code index register. The code index register is accessible via the MOV instruction. To perform a calculated jump first compute the target address or target address displacement in a general-purpose register then move the value to the code index register.

*Some architectures will allow any general-purpose register to be used in calculating a computed target address. That is not done in this design to avoid another set of instructions for performing jumps and subroutine calls. A computed call is a rare use of a jump instruction. The author feels it is adequate to provide only a single register for this function.*

### Register Zero

Register zero (x0) always reads as zero.

*Although forcing register zero to zero all the time uses up a register it is generally considered valuable enough to do. It removes the need to initialize a register to zero for use.*

### Stack and Frame Pointers

Although the stack and frame pointer registers may be used with any instruction the core has special hardware to detect stack bounds violations by either the stack pointer or frame pointer. The stack and frame pointer registers should be kept aligned on octa-byte boundaries. That is, they should be a multiple of eight, which has the least significant three bits as zero. There is currently no hardware in the core to enforce alignment.
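Since the core does not enforce alignment, software has to keep the pointers aligned itself. A minimal sketch of the check and of rounding a pointer down to an octa boundary, with helper names invented for this example:

```python
def is_octa_aligned(pointer: int) -> bool:
    """True when the low three bits are zero, i.e. a multiple of eight."""
    return (pointer & 0x7) == 0


def align_down_octa(pointer: int) -> int:
    """Round down to the nearest multiple of eight."""
    return pointer & ~0x7


print(is_octa_aligned(0x7FF8), is_octa_aligned(0x7FFC))  # True False
print(hex(align_down_octa(0x7FFC)))                      # 0x7ff8
```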

*The author considered having the stack pointer as an independent register but that would require replicating a number of instructions (add, sub, and, or, etc.) just for the stack pointer. The author feels it is better to keep the stack pointer general-purpose in nature so that it may leverage the usage of the existing instruction set. This design is primarily a load / store architecture. There are no special instructions (push or pop) for manipulating the stack and hence no requirement for a special purpose register.*

### Base Registers

There are sixteen base address registers in the design. These registers hold the base address of a memory segment and some basic access rights.

| Register | Description / Usage |
| --- | --- |
| sg0 to sg7 | data segment registers |
| sg8, sg9 | reserved |
| sg10 | stack segment register |
| sg11 | I/O segment register |
| sg12 to sg15 | code segment registers |

*The author chose a set of 16 base registers, in part so that the register selection is easily viewable from the address; another human factor. Several popular architectures have fewer base registers. They may also be called address space or segment registers.*

*The data base registers (sg0 to sg7) are chosen to be consecutive starting at zero so that they may be set up to occupy up to ½ of the memory space with consecutive addressing. Addresses may increment onto the next data base register.*

*The stack is often set up to be part of the data area and may share the same base address.*

*There are multiple code segment registers to allow transferring execution between several independent modules.*

There is more descriptive text of base registers in the section on memory management.

### Register Tag Map

The register tag map associates a register with a unique tag used for dependency checking in the core. Logic in the core uses 128-bit wide bit arrays with a bit reserved for each register. The register tags from the register tag map are also used with the [MOV](#_MOV[.]_–_Move) instruction to move values between registers.

| Tag | Associated Register | Description |
| --- | --- | --- |
| 0 to 31 | GP0 to GP31 | general purpose registers |
| 32 to 63 | FP0 to FP31 | floating point registers |
| 64 to 95 | PS0 to PS31 | posit arithmetic registers |
| 96 to 97 | RA0, RA1 | return address registers |
| 98 | CA | computed address register |
| 99 to 102 | reserved |  |
| 103 | EIP | exception instruction pointer |
| 104 to 111 | reserved |  |
| 112 to 115 | CR0 to CR3 | compare result registers |
| 116 to 123 | reserved |  |
| 124 | reserved |  |
| 125 | CR <all> | all compare results registers |
| 126 | reserved | not used |
| 127 | none | instruction without a target |
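The tag map above amounts to a lookup from a seven-bit tag to a register name. The sketch below is one way to express it in software; the function names are hypothetical, and `dependency_bit` only illustrates the idea of a 128-bit array with one bit per tag, not the core's actual logic.

```python
def register_for_tag(tag: int) -> str:
    """Map a register tag (0 to 127) to the name used in the tag map."""
    if not 0 <= tag <= 127:
        raise ValueError("register tags are in the range 0 to 127")
    if tag < 32:
        return f"GP{tag}"
    if tag < 64:
        return f"FP{tag - 32}"
    if tag < 96:
        return f"PS{tag - 64}"
    if tag < 98:
        return f"RA{tag - 96}"
    if 112 <= tag <= 115:
        return f"CR{tag - 112}"
    named = {98: "CA", 103: "EIP", 125: "CR<all>", 127: "none"}
    return named.get(tag, "reserved")


def dependency_bit(tag: int) -> int:
    """One-hot position of a tag within a 128-bit dependency array."""
    return 1 << tag


print(register_for_tag(33), register_for_tag(113), register_for_tag(100))
# FP1 CR1 reserved
```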

## Control and Status Registers

### Overview

There are numerous special purpose control and status registers in the design. Some registers are present to store variables for performance reasons that would otherwise be stored in main memory.

### U\_SEMA (CSR 0x00C) Semaphores

This register is available for user semaphores or flag use. Bits in this CSR may be set or cleared with one of the CSRxx instructions. This register has individual bit set / clear capability.

### S\_PTA (0x103)

This register contains the base address of the highest-level page directory for memory management, the paging table depth and the size of the pages mapped. The base address must be page aligned (16kB).

| 63..14 | 13..11 | 10..8 | 7 | 6..0 |
| --- | --- | --- | --- | --- |
| Paging Directory Base Address[63..14] | ~ | TD | S | ~ |

| TD | Paging Table Depth |  | S | Page Size |
| --- | --- | --- | --- | --- |
| 0 | 1 level lookup |  | 0 | map 16kB pages |
| 1 | 2 level lookup |  | 1 | map 4MB pages |
| 2 | 3 level lookup |  |  |  |
| 3 | 4 level lookup |  |  |  |
| 4 to 7 | reserved |  |  |  |
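A sketch of decoding S_PTA follows. It assumes the field positions are bits 63..14 for the base address, bits 10..8 for TD, and bit 7 for S, which is the author's layout as best it can be read from the table above; the function name and the returned dictionary keys are invented for the example.

```python
MASK64 = (1 << 64) - 1


def decode_s_pta(value: int) -> dict:
    """Split an S_PTA value into base address, lookup levels and page size."""
    base = value & MASK64 & ~0x3FFF  # bits 63..14, 16kB aligned
    td = (value >> 8) & 0x7          # assumed position: bits 10..8
    s = (value >> 7) & 0x1           # assumed position: bit 7
    if td > 3:
        raise ValueError("TD values 4 to 7 are reserved")
    return {
        "base": base,
        "levels": td + 1,  # TD = 0 means a 1 level lookup
        "page_size": "4MB" if s else "16kB",
    }


print(decode_s_pta(0x40000 | (1 << 8) | (1 << 7)))
# {'base': 262144, 'levels': 2, 'page_size': '4MB'}
```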

### S\_TID (CSR 0x110)

This CSR is reserved to hold the task id of the currently running task.

### S\_ASID – (CSR 0x11F)

This register contains the address space identifier (ASID) or memory map index (MMI). The ASID is used in this design to select (index into) a memory map in the paging tables.

### S\_KEYS – (CSR 0x120 to 0x122)

These registers contain a collection of keys associated with the process for the memory system. Each key is twenty bits in size. Each register contains three keys for a total of nine keys. All three registers are searched in parallel for keys matching the one associated with the memory page. Keyed memory enhances the security and reliability of the system.

| 63..60 | 59..40 | 39..20 | 19..0 |
| --- | --- | --- | --- |
| ~ | key3 | key2 | key1 |
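The key lookup can be modelled in software as below. Hardware searches all nine keys in parallel; the loop here is only a sequential equivalent. The helper names, including `pack_keys`, are hypothetical and exist only to build test values.

```python
KEY_BITS = 20
KEY_MASK = (1 << KEY_BITS) - 1


def pack_keys(key1: int, key2: int, key3: int) -> int:
    """Build one S_KEYS register value from three 20-bit keys."""
    return ((key1 & KEY_MASK)
            | ((key2 & KEY_MASK) << 20)
            | ((key3 & KEY_MASK) << 40))


def keys_in_register(value: int) -> list:
    """Extract key1, key2, key3 from bits 19..0, 39..20 and 59..40."""
    return [(value >> (KEY_BITS * i)) & KEY_MASK for i in range(3)]


def key_matches(page_key: int, key_registers: list) -> bool:
    """True if any key held in the S_KEYS registers equals the page's key."""
    return any(page_key == key
               for reg in key_registers
               for key in keys_in_register(reg))


regs = [pack_keys(1, 2, 3), pack_keys(4, 5, 6), pack_keys(7, 8, 9)]
print(key_matches(5, regs), key_matches(10, regs))  # True False
```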

### Control Register Zero (CSR 0x300)

This register contains miscellaneous control bits including a bit to enable protected mode.

| Bit | Name | Description |
| --- | --- | --- |
| 0 | Pe | Protected Mode Enable: 1 = enabled, 0 = disabled |
| 8 to 13 |  |  |
| 16 |  |  |
| 30 | DCE | data cache enable: 1=enabled, 0 = disabled |
| 32 | BPE | branch predictor enable: 1=enabled, 0=disabled |
| 34 | WBM | write buffer merging enable: 1 = enabled, 0 = disabled |
| 35 | SPLE | speculative load enable (1 = enable, 0 = disable) (0 default) |
| 36 |  |  |
| 63 | D | Debug mode status. This bit is set during an interrupt routine if the processor was in debug mode when the interrupt occurred. |

This register supports bit set / clear CSR instructions.
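The effect of bit set and bit clear operations on this register can be modelled with a small table of bit positions. The sketch uses only the named bits from the table above; the helper names are invented, and the code says nothing about how the CSR instructions themselves are encoded.

```python
# Named bits of control register zero, from the table above.
CR0_BITS = {"Pe": 0, "DCE": 30, "BPE": 32, "WBM": 34, "SPLE": 35, "D": 63}


def cr0_set(value: int, name: str) -> int:
    """Return value with the named control bit set."""
    return value | (1 << CR0_BITS[name])


def cr0_clear(value: int, name: str) -> int:
    """Return value with the named control bit cleared."""
    return value & ~(1 << CR0_BITS[name])


def cr0_test(value: int, name: str) -> bool:
    """True if the named control bit is set."""
    return bool((value >> CR0_BITS[name]) & 1)


cr0 = cr0_set(cr0_set(0, "DCE"), "BPE")  # enable data cache and branch predictor
print(hex(cr0))                          # 0x140000000
print(cr0_test(cr0_clear(cr0, "BPE"), "BPE"))  # False
```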

DCE

Disabling the data cache is useful for some codes with large data sets to prevent cache loading of values that are used infrequently. Disabling the data cache may reduce security risks for some kinds of attacks. The instruction cache may not be disabled. Enabling / disabling the data cache is also available via the cache instruction.

BPE

Disabling branch prediction will significantly affect the core's performance but may be useful for debugging. Disabling branch prediction causes all branches to be predicted as not-taken. No entries will be updated in the branch history table if the branch predictor is disabled.

WBM bit

Merging of values stored to memory may be disabled by clearing this bit. On reset, write buffer merging is disabled because it is likely desirable to set up I/O devices. Many I/O devices require updates to individual bytes by separate store instructions. (Write buffer merging is not currently implemented).

SPLE

Enabling speculative loads gives the processor better performance at an increased security risk from Meltdown-style attacks.

### M\_HARTID (0x301)

This register contains a number that is externally supplied on the hartid\_i input bus to represent the hardware thread id or the core number. No core should have the value zero as the hartid.

### M\_TICK (0x302)

This register contains a tick count of the number of clock cycles that have passed since the last reset. Note that this register should not be used for precise timing as the processor’s clock frequency may vary for performance and power reasons. The TIME CSR may be used for wall-clock timing as it has its own timing source.

### M\_CAUSE (0x306)

This register contains a code indicating the cause of an exception or interrupt. The break handler will examine this code to determine what to do. Only the low order 8 bits are implemented. The high order bits read as zero and are not updateable.

### M\_BADADDR (CSR 0x307)

This register contains the effective address for a load / store operation that caused a memory management exception or a bus error. Note that the address of the instruction causing the exception is available in the XL register.

### M\_BAD\_INSTR (CSR 0x30B)

This register contains a copy of the exceptioned instruction.

### M\_SEMA (CSR 0x30C) Semaphores

This register is available for system semaphore or flag use. The least significant bit is tied to the reservation address status input (rb\_i). It will be set if a STOC instruction was successful. The least significant bit is also cleared automatically when an interrupt (BRK) or interrupt return (RTI) instruction is executed. Any one of the remaining bits may also be cleared by an RTI instruction. This could be a busy status bit for the interrupt routine. Bits in this CSR may be set or cleared with one of the CSRxx instructions. This register has individual bit set / clear capability.

| Semaphore | Usage Convention |
| --- | --- |
| 0 | LDDR / STDC status bit |
| 1 | system garbage collection protector |
| 2 | system |
| 3 | input / output focus list |
| 4 | keyboard |
| 5 | system busy |
| 6 | memory management |
| 7-63 | currently unassigned |

### M\_TVEC (0x330 to 0x335)

These registers contain the address of the exception handling routine for a given operating level. TVEC[5] (0x335) is used directly by hardware to form an address of the debug routine. The lower eight bits of TVEC[5] are not used. The lower bits of the exception address are determined from the operating level. TVEC[1] to TVEC[5] are used by the REX instruction.

### M\_PM\_STACK (0x340)

This register contains an eight-entry operating mode and interrupt mask stack. When an exception or interrupt occurs, this register is shifted to the left by four bits and the low order bits are set according to the exception mode. When an RTI instruction is executed, this register is shifted to the right by four bits. On RTI the last stack entry is set to $B, which masks all interrupts on stack underflow. The low order four bits represent the current operating mode and interrupt mask. Only the low order 32 bits of the register are implemented.

### M\_RS\_STACK (0x343)

This register contains an eight-entry register set selection stack. When an exception or interrupt occurs, this register is shifted to the left by five bits and the exception register set is inserted into the low order five bits. When an RTx instruction is executed this register is shifted to the right by five bits. On RTx the last stack entry will be set to 31 which will select register set #31 (the debug register set) on stack underflow. Only the low order 40 bits of the register are implemented.
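
The shift behaviour of M\_PM\_STACK and M\_RS\_STACK can be sketched in Python as below. This is only an illustrative model, not the core's logic: the function names are hypothetical, and it assumes that the "last stack entry" refilled on a return is the highest-order entry of the implemented bits.

```python
MASK32 = (1 << 32) - 1   # M_PM_STACK: 8 entries x 4 bits
MASK40 = (1 << 40) - 1   # M_RS_STACK: 8 entries x 5 bits

def pm_push(stack, mode_bits):
    """Exception entry: shift left four bits, insert the new mode/mask bits."""
    return ((stack << 4) | (mode_bits & 0xF)) & MASK32

def pm_pop(stack):
    """RTI: shift right four bits; the vacated top entry becomes $B (assumed position)."""
    return (stack >> 4) | (0xB << 28)

def rs_push(stack, regset):
    """Exception entry: shift left five bits, insert the exception register set."""
    return ((stack << 5) | (regset & 0x1F)) & MASK40

def rs_pop(stack):
    """RTx: shift right five bits; the vacated top entry becomes 31 (assumed position)."""
    return (stack >> 5) | (31 << 35)
```

Under this model, popping more entries than were pushed eventually leaves $B in the low order bits of the mode stack and 31 in the low order bits of the register set stack, which matches the underflow behaviour described above.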

### M\_EIP (0x348)

This register contains the interrupt or exception instruction pointer register.

### FSTAT (CSR 0x014) Floating Point Status and Control Register

The floating-point status and control register may be read using the CSR instruction. Unlike other CSRs, the control register has its own dedicated instructions for update. See the section on floating point instructions for more information.

| Bit |  | Symbol | Description |
| --- | --- | --- | --- |
| 51:47 |  |  | reserved |
| 46:44 | **RM** | rm | rounding mode |
| 43 | **E5** | inexe | - inexact exception enable |
| 42 | **E4** | dbzxe | - divide by zero exception enable |
| 41 | **E3** | underxe | - underflow exception enable |
| 40 | **E2** | overxe | - overflow exception enable |
| 39 | **E1** | invopxe | - invalid operation exception enable |
| 38 | **NS** | ns | - non standard floating point indicator |
| **Result Status** | | | |
| 32 |  | fractie | - the last instruction (arithmetic or conversion) rounded intermediate result (or caused a disabled overflow exception) |
| 31 | **RA** | rawayz | rounded away from zero (fraction incremented) |
| 30 | **SC** | C | denormalized, negative zero, or quiet NaN |
| 29 | **SL** | neg < | the result is negative (and not zero) |
| 28 | **SG** | pos > | the result is positive (and not zero) |
| 27 | **SE** | zero = | the result is zero (negative or positive) |
| 26 | **SI** | inf ? | the result is infinite or quiet NaN |
| **Exception Occurrence** | | | |
| 21 to 25 |  |  | reserved |
| 20 | **X6** | swt | {reserved} - set this bit using software to trigger an invalid operation |
| 19 | **X5** | inerx | - inexact result exception occurred (sticky) |
| 18 | **X4** | dbzx | - divide by zero exception occurred |
| 17 | **X3** | underx | - underflow exception occurred |
| 16 | **X2** | overx | - overflow exception occurred |
| 15 | **X1** | giopx | - global invalid operation exception – set if any invalid operation exception has occurred |
| 14 | **GX** | gx | - global exception indicator – set if any enabled exception has happened |
| 13 | **SX** | sumx | - summary exception – set if any exception could occur if it was enabled  - can only be cleared by software |
| **Exception Type Resolution** | | | |
| 8 to 12 |  |  | reserved |
| 7 | **X1T** | cvt | - attempt to convert NaN or too large to integer |
| 6 | **X1T** | sqrtx | - square root of non-zero negative |
| 5 | **X1T** | NaNCmp | - comparison of NaN not using unordered comparison instructions |
| 4 | **X1T** | infzero | - multiply infinity by zero |
| 3 | **X1T** | zerozero | - division of zero by zero |
| 2 | **X1T** | infdiv | - division of infinities |
| 1 | **X1T** | subinfx | - subtraction of infinities |
| 0 | **X1T** | snanx | - signaling NaN |
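
As a reading aid for the bit layout in the table above, the following Python sketch splits an FSTAT value into its groups of fields. The function name and dictionary keys are hypothetical; only the bit positions are taken from the table.

```python
def decode_fstat(fstat):
    """Split an FSTAT value into the field groups listed in the table."""
    return {
        "rm": (fstat >> 44) & 0x7,              # bits 46:44, rounding mode
        "enables": (fstat >> 39) & 0x1F,        # bits 43:39, E5..E1
        "ns": (fstat >> 38) & 0x1,              # bit 38
        "result_status": (fstat >> 26) & 0x7F,  # bits 32:26
        "exceptions": (fstat >> 13) & 0xFF,     # bits 20:13, X6..SX
        "invalid_op_type": fstat & 0xFF,        # bits 7:0, X1T resolution
    }
```

For example, a value with bit 18 set reports the divide by zero occurrence bit as bit 5 of the `exceptions` group.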

### M\_INFO (0x3F0 to 0x3FF)

This set of registers contains general information about the core including the manufacturer name, cpu class and name, and model number.

### D\_REGSET (0x512)

This register controls which register set is used for register access for each register field in an instruction. If the bit in the regset CSR is clear then the current register set is selected; otherwise the previous register set is selected.

| 63..5 | 4 | 3 | 2 | 1 | 0 |
| --- | --- | --- | --- | --- | --- |
| reserved | EIP | Rs3 | Rs2 | Rs1 | Rd |

Updating the regset CSR returns the current value of the CSR in the newly selected Rd register set.

### D\_DBADx (CSR 0x518 to 0x51B) Debug Address Register

These registers contain addresses of instruction or data breakpoints. The registers may also be used as trace triggering address registers.

| 63..0 |
| --- |
| Address 63..0 |

### D\_DBCR (CSR 0x51C) Debug Control Register

This register contains bits controlling the circumstances under which a debug interrupt will occur.

| Bits | Description |
| --- | --- |
| 3 to 0 | Enables a specific debug address register to do address matching. If the corresponding bit in this register is set and the address (instruction or data) matches the address in the debug address register then a debug interrupt will be taken. |
| 17, 16 | This pair of bits determine what should match the debug address register zero in order for a debug interrupt to occur: 00 = match the instruction address; 01 = match a data store address; 10 = reserved; 11 = match a data load or store address. |
| 19, 18 | This pair of bits determine how many of the address bits need to match in order to be considered a match to the debug address register. These bits are ignored when matching instruction addresses, which are always half-word aligned: 00 = all bits must match (byte); 01 = all but the least significant bit should match (char); 10 = all but the two LSBs should match (tetra); 11 = all but the three LSBs should match (octa). |
| 23 to 20 | Same as 16 to 19 except for debug address register one. |
| 27 to 24 | Same as 16 to 19 except for debug address register two. |
| 31 to 28 | Same as 16 to 19 except for debug address register three. |
| 32 to 35 | Trace enable on address register |
| 36 | Enable branch compression for trace. |
| 55 to 62 | These bits are a history stack for single stepping mode. An exception will automatically disable single stepping mode and record the single step mode state on stack. Returning from an exception pops the single step mode state from the stack. |
| 63 | This bit enables SSM (single stepping mode) |
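
The address matching rules in the table can be illustrated with a small Python model. It is a sketch under assumptions rather than a description of the hardware: the function name and the `access` strings ("fetch", "load", "store") are hypothetical, and instruction matches are modelled as an exact comparison because the size bits are ignored for them.

```python
def debug_match(dbcr, dbad, addr, access, reg=0):
    """Return True if `addr` would match debug address register `reg` (0..3)."""
    if not (dbcr >> reg) & 1:               # bits 3..0: per-register enable
        return False
    ctrl = (dbcr >> (16 + 4 * reg)) & 0xF   # four control bits per register
    kind = ctrl & 0x3                       # low pair: what to match
    size = (ctrl >> 2) & 0x3                # high pair: how many LSBs to ignore
    if kind == 0b00:                        # instruction address match
        return access == "fetch" and addr == dbad
    if kind == 0b10:                        # reserved encoding
        return False
    if access == "fetch":
        return False
    if kind == 0b01 and access != "store":  # data store addresses only
        return False
    mask = ~((1 << size) - 1)               # ignore 0, 1, 2 or 3 low bits
    return (addr & mask) == (dbad & mask)
```

With the size field set to 11 (octa), any data access within the same aligned eight bytes as the debug address register matches.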

### D\_DBSR (CSR 0x51D) - Debug Status Register

This register contains bits indicating which addresses matched. These bits are set when an address match occurs and must be reset by software.

| bit |  |
| --- | --- |
| 0 | matched address register zero |
| 1 | matched address register one |
| 2 | matched address register two |
| 3 | matched address register three |
| 63 to 4 | not used, reserved |

## Operating Levels

The core has six operating modes. The highest operating mode is operating mode five, which is called the debug operating mode. Operating mode five has complete access to the machine including special registers reserved for debug. Other operating levels may have more restricted access. When an interrupt occurs, the operating mode is set to the interrupt mode. The core vectors to an address depending on the current operating mode. When not operating in user mode, addresses are not subject to translation and the virtual address and physical address are the same.

| Operating Mode | Moniker |
| --- | --- |
| 0 | user |
| 1 | supervisor |
| 2 | hypervisor |
| 3 | machine |
| 4 | interrupt |
| 5 | debug |

### Switching Operating Modes

The operating mode is automatically switched to the interrupt mode when an interrupt occurs. The BRK instruction may be used to switch operating modes. The REX instruction may also be used by an interrupt handler to switch the operating mode to a lower mode. One of the exception return instructions (RTI, RTD, RTE) will switch the operating level back to what it was prior to the interrupt or exception.

# Exceptions

## External Interrupts

There is little difference between an externally generated exception and an internally generated one. An externally caused exception will force a BRK instruction into the instruction stream. The BRK instruction contains a cause code identifying the external interrupt source.

## Polling for Interrupts

To support code that needs to run with interrupts disabled, an interrupt polling instruction (PFI) is provided in the instruction set. For instance, the system could be running a high priority task with interrupts disabled. There may, however, be sections of code where it is possible to process an interrupt. In some code environments, it is not enough to disable and enable interrupts around critical code. The code must effectively be run with interrupts disabled all the time. This makes it necessary to poll for interrupts in software. For instance, stack prologue code may cause false pointer matches for the garbage collector because stack space is allocated before the contents are defined. If the GC scan occurs on this allocated but undefined area of memory, there could be false matches.

## Effect on Machine Status

The operating mode is always switched to the debug mode on exception. It is up to the debug mode code to redirect the exception to a lower operating mode when desired. Further exceptions at the same or lower interrupt level are disabled automatically. Debug mode code must enable interrupts at some point.

## Exception Stack

The current register set, operating mode and interrupt enable bits are pushed onto an internal stack when an exception occurs. This stack is only eight entries deep as that is the maximum amount of nesting that can occur. Further nesting of exceptions can be achieved by saving the state contained in the exception registers.

## Exception Vectoring

Exceptions are handled through a vector table. The vector table has six entries, one for each operating level the core may be running at. The location of the vector table is determined by TVEC[5]. If the core is operating at mode three, for instance, and an interrupt occurs, vector table address number three is used for the interrupt handler. Note that the interrupt automatically switches the core to operating mode five. An exception handler at the machine level may redirect exceptions to a lower level handler identified in one of the vector registers. More specific exception information is supplied in the cause register.

| Operating Level | Address (If TVEC[5] contains $F…FC0000) |  |
| --- | --- | --- |
| 0 | $F…FC0000 | Handler for operating level zero |
| 1 | $F…FC0020 |  |
| 2 | $F…FC0040 |  |
| 3 | $F…FC0060 |  |
| 4 | $F…FC0080 |  |
| 5 | $F…FC00A0 |  |

## Reset

The core begins executing instructions at address $F…FC0100. All registers are in an undefined state. Register set #0 is selected.

## Precision

Exceptions in rtf64 are precise. They are processed according to program order of the instructions. If an exception occurs during the execution of an instruction, then an exception field is set in the reorder buffer. The exception is processed when the instruction commits which happens in program order. If the instruction was executed in a speculative fashion, then no exception processing will be invoked unless the instruction makes it to the commit stage.

## Exception Cause Codes

The following table outlines the cause code for a given purpose. These codes are specific to RTF64. Under the HW column an ‘x’ indicates that the exception is internally generated by the processor; the cause code is hard-wired to that use. An ‘e’ indicates an externally generated interrupt, the usage may vary depending on the system.

| Cause Code |  | HW | Description |  |
| --- | --- | --- | --- | --- |
| 1 | IBE | x | instruction bus error |  |
| 2 | EXF | x | Executable fault |  |
| 4 | TLB | x | tlb miss |  |
|  |  |  | FMTK Scheduler |  |
| 128 |  | e |  |  |
| 129 | KRST | e | Keyboard reset interrupt |  |
| 130 | MSI | e | Millisecond Interrupt |  |
| 131 | TICK | e |  |  |
| 156 | KBD | e | Keyboard interrupt |  |
| 157 | GCS | e | Garbage collect stop |  |
| 158 | GC | e | Garbage collect |  |
| 159 | TSI | e | FMTK Time Slice Interrupt |  |
| 3 |  |  | Control-C pressed |  |
| 20 |  |  | Control-T pressed |  |
| 26 |  |  | Control-Z pressed |  |
|  |  |  |  |  |
| 32 | SSM | x | single step |  |
| 33 | DBG | x | debug exception |  |
| 34 | TGT | x | call target exception |  |
| 35 | MEM | x | memory fault |  |
| 36 | IADR | x | bad instruction address |  |
| 37 | UNIMP | x | unimplemented instruction |  |
| 38 | FLT | x | floating point exception |  |
| 39 | CHK | x | bounds check exception |  |
| 40 | DBZ | x | divide by zero |  |
| 41 | OFL | x | overflow |  |
|  |  |  |  |  |
| 47 |  |  |  |  |
| 48 | ALN | x | data alignment |  |
| 49 | KEY | x | memory key fault |  |
| 50 | DWF | x | Data write fault |  |
| 51 | DRF | x | data read fault |  |
| 52 | SGB | x | segment bounds violation |  |
| 53 | PRIV | x | privilege level violation |  |
| 54 | CMT | x | commit timeout |  |
| 55 | BT | x | branch target |  |
| 56 | STK | x | stack fault |  |
| 57 | CPF | x | code page fault |  |
| 58 | DPF | x | data page fault |  |
| 60 | DBE | x | data bus error |  |
| 61 | PMA | x | physical memory attributes check fail |  |
| 62 | NMI | x | Non-maskable interrupt |  |
|  |  |  |  |  |
| 225 | FPX\_IOP | x | Floating point invalid operation |  |
| 226 | FPX\_DBZ | x | Floating point divide by zero |  |
| 227 | FPX\_OVER | x | floating point overflow |  |
| 228 | FPX\_UNDER | x | floating point underflow |  |
| 229 | FPX\_INEXACT | x | floating point inexact |  |
| 231 | FPX\_SWT | x | floating point software triggered |  |
|  |  |  |  |  |
| 240 | SYS |  | Call operating system (FMTK) |  |
| 241 |  |  | FMTK Schedule interrupt |  |
| 242 | TMR | x | system timer interrupt |  |
| 243 | GCI | x | garbage collect interrupt |  |
| 255 | PFI |  | reserved for poll-for-interrupt instruction |  |

### DBG

A debug exception occurs if there is a match between a data or instruction address and an address in one of the debug address registers.

### IADR

This exception is currently not implemented but reserved for the purpose of identifying bad instruction addresses. If the two least significant bits of the instruction address are non-zero then this exception will occur.

### UNIMP

This exception occurs if an instruction is encountered that is not supported by the processor. It may also occur if there is an attempt to use an instruction in a mode that does not support it.

### OFL

If an arithmetic operation overflows (multiply, add, or shift) and the overflow exception is enabled in the arithmetic exception enable register then an OFL exception will be triggered.

### KEY

This fault will occur if an attempt is made to access memory for which the app does not have the key.

### FLT

A floating-point exception is triggered if an exceptional condition occurs in the floating-point unit and the exception is enabled. Please see the section on floating-point for more details.

### DRF, DWF, EXF

Data read fault, data write fault, and execute fault are exceptions that are returned by the memory management unit when an attempt is made to access memory for which the corresponding access type is not allowed. For instance, if the memory page is marked as non-executable and an attempt is made to load the instruction cache from the page, then an execute fault (EXF) exception will occur.

### CPF, DPF

The code page fault and data page fault exceptions are activated by the MMU if the page is not present in memory. Access may be allowed but simply unavailable. These faults are not currently implemented.

### PRIV

Some instructions and CSR registers are legal to use only at a higher operating level. If an attempt is made to use a privileged instruction at a lower operating level, then a privilege violation exception may occur. For instance, attempting to use the RTI instruction from the user operating level is a privilege violation.

### STK

If the value loaded into one of the stack pointer registers (the stack pointer sp or frame pointer fp) is outside of the bounds defined by the stack bounds registers, then a stack fault exception will be triggered.

### DBE

A timeout signal is typically wired to the err\_i input of the core and if the data memory does not respond with an ack\_i signal fast enough an error will be triggered. This will happen most often when the core is attempting to access an unimplemented memory area for which no ack signal is generated. When the err\_i input is activated during a data fetch, an exception is flagged in a result register for the instruction. The core will process the exception when the instruction commits. If the instruction does not commit (it could be a speculated load instruction) then the exception will not be processed.

### PMA

The addressed memory did not pass the physical memory attributes testing. For example, a write operation was attempted to a ROM address space.

### IBE

A timeout signal is typically wired to the err\_i input of the core and if the instruction memory does not respond with an ack\_i signal fast enough an error will be triggered. This will happen most often when the core is attempting to access an unimplemented memory area for which no ack signal is generated. When the err\_i input is activated during an instruction fetch, a breakpoint instruction is loaded into the cache at the address of the error.

### NMI

Non-maskable interrupt.

### BT

The core will generate the BT (branch target) exception if a branch instruction points back to itself. Branch instructions in this sense include jump (JMP) and call (CALL) instructions.

## Caches

### Overview

The core has an instruction cache to improve performance. The cache is a direct mapped cache that is physically indexed. The size of the cache is configurable with a minimum size of 2kB. The size of the instruction cache is available for reference from one of the INFO CSR registers.

### Instructions

Since the instruction format affects the cache design it is mentioned here. For this design instructions are a single size. Specific formats are listed under the instruction set description section of this book.

The [CACHE](#_CACHE_–_Cache) instruction may be used to invalidate individual lines of the cache, or the entire cache.

### L1 Instruction Cache

L1 defaults to 2kB in size and is made from distributed ram to get single cycle read performance. L1 is organized as 64 lines of 32-bytes.
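
For the default geometry (64 lines of 32 bytes, direct mapped), the way an address selects a cache line can be sketched in Python as follows. The function name is hypothetical and the tag/index/offset split shown is the conventional one for a direct mapped cache; the source does not spell out the exact field boundaries.

```python
LINE_BYTES = 32   # bytes per line
NUM_LINES = 64    # 64 lines x 32 bytes = 2kB

def split_address(paddr):
    """Split a physical address into (tag, line index, byte offset)."""
    offset = paddr % LINE_BYTES
    index = (paddr // LINE_BYTES) % NUM_LINES
    tag = paddr // (LINE_BYTES * NUM_LINES)
    return tag, index, offset
```

Two addresses that differ only in their tag compete for the same line, which is the usual cost of a direct mapped organization.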

# Relative Addressing

The core does not use relative addressing for branches, jumps, or calls. It uses absolute addresses. With the presence of a virtual memory system it is not necessary to have relative addressing. With a virtual memory system all programs can be assumed to begin at a fixed address.

*The author feels that with a virtual memory system present the need for relative addressing is greatly reduced. One of the characteristics of relative addressing is that the relative displacement can often be much smaller than the addressable memory of the machine. Meaning a displacement does not consume many opcode bits. Relative addressing for branches allowed significant code compression for many micro architectures. This is important when memory space is extremely limited. Given that the core has a fixed size instruction, there is no real memory space savings to be had by using relative addressing.*

*There are known workarounds to a lack of relative addressing. If needed the target addresses in the program may be modified to accommodate a move of the program in the virtual memory space. Moving a program that has only absolute addressing available is a task that has been done frequently historically. It is not impossible to move a program that uses absolute addresses; the question that comes to mind is why?*

The core does allow any register to be used with a displacement when forming addresses for data access. So, for data access relative addressing is present.

# Memory Access Alignment

The core supports unaligned data memory access; however, it does not guarantee the atomicity of the access.

# Memory Management Unit - MMU

## Introduction

Many systems can benefit from the provision of virtual memory management. Virtual memory may be used to protect the address space of one app from another. Virtual memory can enhance the reliability and security of a system.

The simplified system MMU provides minimalistic base and bound and paging capabilities for a small to mid-size system. There are two options available for paging: a simple page map ram, and a software managed TLB. The page mapping ram is not suitable for larger systems as the paging tables would be too large. Base bound and paging are applied only to user mode apps. In other operating modes the system sees a flat address space with no restrictions on access. Base address generation is applied to virtual addresses first to generate a linear address which is then mapped using a paged mapping system. Access rights are governed by the base register since all pages based on the same base address are likely to require the same access. Support for access rights is optional if it is desired to reduce the hardware cost. To simplify hardware there are no bound registers. Bounds are determined by what memory is mapped into the base address area.

Associated with each memory page and stored in its own table is a memory key. The memory key is matched against the keyset in CSR registers. Access to the memory page is allowed only if one of the keys in the keyset matches the memory key, or if the page is marked generally accessible with the special key of zero. Memory may be shared between apps that share the same memory key.

## Base Registers

The upper address bits of a virtual or effective address are not used for addressing memory and are available to select a base register. The MMU includes 16 base registers. The base register in use is selected by the upper nybble of the virtual address. If the program address has all ones in bits 24 to 63 then base addressing is bypassed. This provides a shared program area containing the BIOS and OS code.

| Base Regno | Usage | Selected By |
| --- | --- | --- |
| 0 to 7 | data | bits 60 to 63 of effective address |
| 8, 9 | reserved | bits 60 to 63 of effective address |
| 10 | Stack | bits 60 to 63 of effective address |
| 11 | I/O | bits 60 to 63 of effective address |
| 12 to 15 | code | bits 60 to 63 of instruction pointer |

### Base Register Format

| 63..4 | 3..0 |
| --- | --- |
| Base Address (60 bits) | RWX |

The low order four bits of the base register are reserved for access rights bits. Supporting memory access rights is optional.

R: 1 = segment readable

W: 1 = segment writeable

X: 1 = segment executable

### Base Register Access

Base registers may be read and altered using the [MVSEG](#_MVSEG_–_Move) instruction. The MVSEG instruction works in an indirect fashion as described in the text.

## Linear Address Generation

The base address value contained in the upper 60 bits of a base register is shifted left 14 bits before being added to the virtual address. This gives potentially a 74-bit address space.

Note there is no limit or bound register. Access is limited by what is mapped into the segment. Pages that are inaccessible use the reserved page number of all ones.
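
A minimal Python sketch of linear address generation follows. It is a model under stated assumptions, not the hardware: the function name is hypothetical, `base_regs` is assumed to be a list of sixteen 64-bit register values, and the offset added to the shifted base is assumed to be the virtual address with its selecting upper nybble removed, which the text does not state explicitly.

```python
PAGE_SHIFT = 14
OFFSET_MASK = (1 << 60) - 1   # ASSUMPTION: the low 60 bits form the offset

def linear_address(vaddr, base_regs):
    """Form a linear address from a 64-bit virtual address."""
    if (vaddr >> 24) == (1 << 40) - 1:
        # Bits 24..63 all ones: base addressing is bypassed
        # (described in the text for program addresses).
        return vaddr
    select = (vaddr >> 60) & 0xF        # upper nybble selects the base register
    base = base_regs[select] >> 4       # upper 60 bits hold the base address
    return (base << PAGE_SHIFT) + (vaddr & OFFSET_MASK)
```

Because a 60-bit base is shifted left by 14 bits, the result can be up to 74 bits wide, which is why Python's unbounded integers are convenient for the model.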

## The Page Map

The page map directly maps virtual address pages to physical ones. The page map is a dedicated memory internal to the processing core accessible with the custom ‘mvmap’ instruction. It is similar in operation to a TLB but is much simpler. TLBs cache address translations and create TLB miss exceptions. Page walks of mapping tables are required to update the TLB on a miss. There are no exceptions associated with the page mapping table.

In addition to based addresses, memory is divided up into 16kB pages which are mapped. There are 32 memory maps available. A memory map represents an address space; a five-bit address space identifier is in use. Address spaces will need to be shared if more than 32 apps are running in the system. The desire is to keep the mapping tables small so they may fit into a small number of standard memory blocks. For instance, in the sample system with 256MB of memory, each map holds 4096 page entries, so any individual app is limited to a maximum of 64MB (one quarter of the memory available). The virtual page number is used to look up the physical page in the page mapping table. Addresses with the top eight bits set are not mapped to allow access to the system ROM. Pages that are inaccessible use the reserved physical page number of all ones.

The page mapping table is indexed by the ASID and the virtual page number to determine the physical page. The ‘mvmap’ instruction uses Rs1 to contain a mapping table index. Bits 16 to 20 of Rs1 are the ASID, bits 0 to 15 of Rs1 are used for the virtual page number. It is expected that the virtual page number is a small number, in this case 12 bits. Rs2 contains the new value of the physical page. The current value of the physical page is placed in Rd when the instruction executes.

| ASID (5 bits) | Virtual Page | Physical Page |
| --- | --- | --- |
| 0 | 0 | 10 |
| 0 | 1 | 11 |
| 0 | … |  |
| 0 | 4094 | 18 |
| 0 | 4095 | 19 |
| 1 | 0 |  |
| 1 | 1 |  |
| 1 | … |  |
| 1 | 4094 |  |
| 1 | 4095 |  |
| … 30 more address spaces |  |  |

The low order 14 bits of an address pass through both linear address generation and paging unchanged.
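
The indexing and pass-through behaviour described in this section can be sketched in Python. The names are hypothetical and the page map is modelled as a plain dictionary keyed by the same index that ‘mvmap’ takes in Rs1; handling of the reserved all-ones physical page is left out.

```python
PAGE_SHIFT = 14                      # 16kB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1

def map_index(asid, vpn):
    """Index in the Rs1 layout: ASID in bits 16..20, page number in bits 0..15."""
    return ((asid & 0x1F) << 16) | (vpn & 0xFFFF)

def translate(page_map, asid, linear):
    """Replace the virtual page number; the low 14 bits pass through unchanged."""
    vpn = linear >> PAGE_SHIFT
    ppn = page_map[map_index(asid, vpn)]
    return (ppn << PAGE_SHIFT) | (linear & PAGE_MASK)
```

Using the first rows of the example table, virtual page 1 of address space 0 maps to physical page 11, so linear address `0x4123` becomes `0x2C123`.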

### The 16kB Page

Many memory systems use a 4kB page size. A 16kB page size is used here mainly to restrict the number of page entries in the page map table. A smaller page size would result in too many pages of memory to support multiple tasks. Even given a 16kB page size there are still 4096 pages of memory available in a map.

*The author was tempted to divide the page mapping table into several different regions capable of mapping different amounts of the address space (small, medium, and large areas). This potentially could allow more memory maps to be present while at the same time not increasing the page table size. However, it would add extra complexity to the memory system which is currently simpler in nature.*

### The MVMAP Instruction

The memory mapping table is managed with a dedicated instruction - [MVMAP](#_MVMAP_–_Move). MVMAP allows high-speed access to the mapping table.

*While the memory mapping table could have been managed with CSRs or possibly be mapped into the main memory space, the author feels that having a dedicated instruction makes the software managing the tables simpler and cleaner.*

Rs1:

| 63..21 | 20..16 | 15..0 |
| --- | --- | --- |
| Unused - should be zero | ASID (5 bits) | Virtual page number 16 bits max |

## TLB – Translation Lookaside Buffer

### Overview

The page map is limited in the translations it can perform because of its size. The solution to allowing more memory to be mapped is to use main memory to store the translation tables, then cache address translations in a translation look-aside buffer or TLB. This is sometimes also called an address translation cache (ATC). The TLB offers a means of address virtualization and memory protection. A TLB works by caching address mappings between a real physical address and a virtual address used by software. The TLB deals with memory organized as pages. Typically, software manages a paging table whose entries are loaded into the TLB as translations are required.

### Size / Organization

The TLB has 1024 entries per set. The size was chosen as it is the size of one block ram for 32-bit data in the FPGA. This is quite a large TLB. Many systems use smaller TLBs. There is not really a need for such a large one, however it is available.

The TLB is organized as a four-way set associative cache.

### What is Translated

The TLB processes all user mode addresses including both instruction and data addresses. It is known as a *unified* TLB. Addresses in other modes of operation are not translated. Additionally, addresses with the top forty bits set are not translated to allow access to the BIOS and system rom.

### Page Size

Because the TLB caches address translations it can get away with a much smaller page size than the page map can for a larger memory system. 4kB is a common size for many systems. In this case the TLB uses 16kB pages to match the size of pages for keyed memory and segmentation. For a 256MB system (the size of the memory in the test system) there are 16,384 16kB pages.

### Management

The rtf64 TLB unit is a software managed TLB. When a translation miss occurs, an exception is generated to allow software to update the TLB. It is left up to software to decide how to update the TLB. There may be a set of hierarchical page tables in memory, or there could be a hash table used to store translations.

The TLB is updated using the TLBRW instruction which both reads and writes the TLB. More descriptive text is present at the [TLBRW](#_TLBRW_–_Read) instruction description.

### Flushing the TLB

The TLB maintains the address space (ASID) associated with a virtual address. This allows the TLB translations to be used without having to flush old translations from the TLB during a task switch.

#### Global Bit

In addition to the ASID the TLB entries contain a bit that indicates that the translation is a global translation and should be present in every address space.
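
The role of the ASID and the global bit in deciding a TLB hit can be sketched as follows. This is a hedged illustration only: the entry fields and function name are hypothetical, and `ways` stands for the candidate entries of the four ways selected for a given virtual page, since the set indexing scheme is not described here.

```python
from dataclasses import dataclass

@dataclass
class TlbEntry:
    valid: bool
    asid: int
    vpn: int
    ppn: int
    is_global: bool = False   # translation present in every address space

def tlb_lookup(ways, asid, vpn):
    """Return the physical page number on a hit, or None on a miss."""
    for entry in ways:
        if entry.valid and entry.vpn == vpn and (entry.is_global or entry.asid == asid):
            return entry.ppn
    return None   # a miss would raise the TLB miss exception for software to handle
```

Because each entry carries its ASID, translations belonging to another task simply fail to match rather than needing to be flushed on a task switch.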

## PAM – Page Allocation Map

### Overview

The PAM is a software structure made up of 16,384 bit-pairs stored in memory. There is a bit pair for each possible physical memory page. The PAM is used by software to manage the allocation of physical pages of memory.

*The author initially had a specialized PAM unit and supporting instructions in the core. However, there was difficulty getting the unit to work correctly and it was also limited in size. The PAM can be easily managed by software which makes use of EXT and DEP instructions to perform bit manipulations.*

### Memory Usage

Total memory used by the PAM is 4kB.

### Organization

The PAM is organized as a string of bit-pairs, one pair for each physical memory page. Bit pairs are used rather than single bits to mark allocated pages as it is convenient to also mark runs of pages. Marking runs of pages using bit-pairs makes it possible to free the pages of a previous allocation.

| Bit-Pair Value | Meaning |
| --- | --- |
| 0 | Page of memory is free, available for use. |
| 1 | reserved |
| 2 | Page is allocated, end of run of pages |
| 3 | Page is allocated |
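
The following Python sketch shows how software might use the bit-pair encoding to allocate and free runs of pages. It is an illustration under assumptions, not the author's allocator: all names are hypothetical, a first-fit search is used for simplicity, and the packing of four bit-pairs per byte (lowest pair first) is an arbitrary choice.

```python
FREE, END_OF_RUN, ALLOCATED = 0, 2, 3
NUM_PAGES = 16384

def new_pam():
    return bytearray(NUM_PAGES * 2 // 8)   # 4kB, every page free

def get_pair(pam, page):
    return (pam[page // 4] >> ((page % 4) * 2)) & 0x3

def set_pair(pam, page, value):
    shift = (page % 4) * 2
    cleared = pam[page // 4] & ~(0x3 << shift) & 0xFF
    pam[page // 4] = cleared | ((value & 0x3) << shift)

def alloc_run(pam, count):
    """First-fit search for `count` free pages; returns the first page or None."""
    if count < 1:
        raise ValueError("count must be at least 1")
    run = 0
    for page in range(NUM_PAGES):
        run = run + 1 if get_pair(pam, page) == FREE else 0
        if run == count:
            first = page - count + 1
            for p in range(first, page):
                set_pair(pam, p, ALLOCATED)
            set_pair(pam, page, END_OF_RUN)   # marks where the run stops
            return first
    return None

def free_run(pam, first):
    """Free a run given only its first page, using the end-of-run marker."""
    page = first
    while get_pair(pam, page) == ALLOCATED:
        set_pair(pam, page, FREE)
        page += 1
    set_pair(pam, page, FREE)                 # the END_OF_RUN page itself
```

The end-of-run marker is what lets `free_run` release a whole allocation without being told its length, which is the convenience the bit-pair scheme is meant to provide.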

## PMA - Physical Memory Attributes Checker

### Overview

The physical memory attributes checker is a hardware module that ensures that memory is being accessed correctly according to its physical attributes.

Physical memory attributes are stored in an eight-entry table. This table includes the address range the attributes apply to and the attributes themselves. Address ranges are resolved only to bit four of the address, meaning the granularity of the check is 16 bytes.

Most of the entries in the table are hard-coded and configured when the system is built.

Physical memory attributes checking is applied in all operating modes.

### Register Description

| Regno | Bits | Name | Description |
| --- | --- | --- | --- |
| 00 | 64 | LB0 | lower bound - address bits 4 to 67 of the physical address range |
| 08 | 64 | UB0 | upper bound - address bits 4 to 67 of the physical address range |
| 10 | 16 | AT0 | memory attributes |
| 18 | ~ | ~ | reserved |
| … | … | … | 6 more register sets |
| E0 | 64 | LB7 | lower bound - address bits 4 to 67 of the physical address range |
| E8 | 64 | UB7 | upper bound - address bits 4 to 67 of the physical address range |
| F0 | 16 | AT7 | memory attributes |
| F8 | ~ | ~ | reserved |

### Attributes

| Bitno | Name | Description |
| --- | --- | --- |
| 0 | X | may contain executable code |
| 1 | W | may be written to |
| 2 | R | may be read |
| 3 | C | may be cached |
| 4-6 | G | granularity: 0 = byte accessible, 1 = wyde accessible, 2 = tetra accessible, 3 = octa accessible, 4 to 7 = reserved |
| 7 | ~ | reserved |
| 8-15 | T | device type (rom, dram, eeprom, I/O, etc) |
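
To make the register and attribute layouts concrete, here is a small software model of the check. It is only a sketch under stated assumptions: the class and function names are hypothetical, the bounds are treated as inclusive, and the first matching entry wins. None of these details are specified above.

```python
from dataclasses import dataclass

# Attribute bit positions taken from the table above.
X, W, R, C = 1 << 0, 1 << 1, 1 << 2, 1 << 3


@dataclass
class PmaEntry:
    lower: int  # LBn: address bits 4 to 67 of the range start
    upper: int  # UBn: address bits 4 to 67 of the range end
    attr: int   # ATn: 16-bit attribute word


def decode_attr(attr):
    """Split a 16-bit attribute word into its fields."""
    return {
        "execute": bool(attr & X),
        "write": bool(attr & W),
        "read": bool(attr & R),
        "cache": bool(attr & C),
        "granularity": (attr >> 4) & 7,   # 0=byte, 1=wyde, 2=tetra, 3=octa
        "device_type": (attr >> 8) & 0xFF,
    }


def lookup(table, address):
    """Find the entry covering a physical address (16-byte granularity)."""
    key = address >> 4  # bounds hold address bits 4 and up
    for entry in table:
        if entry.lower <= key <= entry.upper:  # inclusive bounds assumed
            return entry
    return None


def access_allowed(table, address, need):
    """True if the covering entry grants every attribute bit in `need`."""
    entry = lookup(table, address)
    return entry is not None and (entry.attr & need) == need
```

For example, with a table holding a cached read/write RAM range and a read/execute ROM range, `access_allowed(table, rom_address, W)` would return `False`.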

## Key Memory

### Overview

Associated with each page of memory is a memory key. To access a page of memory, the memory key must match one of the keys in the application's keyset. The keyset is maintained in the keys CSRs. The key size of 20 bits is a minimum size recommended for security purposes.

Since each page of memory requires a 20-bit key and there are 16,384 pages in the system, a 40kB memory is required to store the keys (10 block rams).

### The SETKEY Instruction

To access the key memory, the SETKEY instruction is used. This instruction may set or get the current key for a memory page. For SETKEY the Rs1 register contains a value in the following format:

| 63..46 | 45..32 | 31..20 | 19..0 |
| --- | --- | --- | --- |
| ~ | Physical Page Number | ~ | Key Value |

More detail is present in the instruction description.
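
As an illustration of the Rs1 layout, the sketch below packs and unpacks the operand value. The helper names are hypothetical, and the unused fields (marked ~) are assumed to be written as zero.

```python
PAGE_BITS = 14   # bits 45..32: physical page number (16,384 pages)
KEY_BITS = 20    # bits 19..0: key value


def pack_setkey(page, key):
    """Build the Rs1 value for SETKEY; unused fields are left as zero."""
    if not 0 <= page < (1 << PAGE_BITS):
        raise ValueError("page number out of range")
    if not 0 <= key < (1 << KEY_BITS):
        raise ValueError("key out of range")
    return (page << 32) | key


def unpack_setkey(value):
    """Return (page, key) extracted from an Rs1-format value."""
    page = (value >> 32) & ((1 << PAGE_BITS) - 1)
    key = value & ((1 << KEY_BITS) - 1)
    return page, key


# Size of the key memory: one 20-bit key for each of the 16,384 pages.
KEY_MEMORY_BYTES = (1 << PAGE_BITS) * KEY_BITS // 8
```

`KEY_MEMORY_BYTES` evaluates to 40,960, which is the 40kB figure quoted in the overview.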

## Card Memory

### Overview

Also present in the memory system is Card memory. The card memory is a telescopic memory which reflects with increasing detail where in the memory system a pointer write has occurred. This is for the benefit of garbage collection systems. Card memory is updated when a pointer value is stored to memory. (The store pointer [STPTR](#_STPTR_–_Store) instruction is used to do this).

*Garbage collection has become more prevalent in many systems and the author feels it makes sense to provide some hardware support for this function.*

### Organization

Memory is divided into 512-byte card memory pages. Each card has a single bit recording whether a pointer store has taken place in the corresponding memory area. To cover a 256MB memory system, a 64kB card memory (16 block rams) is required at the outermost layer. The outermost 64kB card memory layer is itself divided into 128-bit regions and covered with a 512B memory, so each bit of the 512B memory represents the pointer store status for a 64kB region of main memory. The 512B memory is further divided into 64-bit regions and covered with an 8B (64-bit) memory, or one octa-byte.

The table below shows the increasing resolution of the telescopic memory. Note that the layer 1 memory is accessed 32 bits at a time rather than 64 bits like the other layers. This is due to block ram multiplexing issues: the block rams can only adapt from 1 bit wide on one port to 32 bits wide on the other (dual-ported ram is required). Layer 1 is accessible as 128 32-bit words, where each bit in a word represents a 64kB memory area. Each bit in layer zero represents a 4MB region of memory.

| Layer | Resolving Power | Region Covered by Each Bit |
| --- | --- | --- |
| 0 | 64 bits | 4MB regions |
| 1 | 4096 bits | 64kB regions |
| 2 | 524288 bits | 512B regions |
*The choice of 512 bytes comes from the desire to minimize the amount of card memory required. Card memory is implemented using dedicated block memory in an FPGA and there are only so many block memories available. The author would have preferred a better resolution, but it may consume too many memory blocks.*

### Operation

As a program progresses it writes pointer values to memory using the store pointer instruction. Storing a pointer triggers an update to all the layers of card memory corresponding to the main memory location written. A bit is set in each layer of the card memory system corresponding to the memory location of the pointer store.

The garbage collection system can very quickly determine where pointer stores have occurred and skip over memory that has not been modified.
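
The marking and the skip-ahead scan described above can be modelled in software as follows. This is a sketch only: the hardware performs the marking as a side effect of a pointer store, and the function names, the bit ordering within each layer, and the use of byte arrays are assumptions for illustration. The region sizes follow the table in the Organization section (512B, 64kB and 4MB per bit for a 256MB memory).

```python
MEMORY_BYTES = 256 << 20
CARD_SHIFT = 9    # layer 2: one bit per 512-byte card
L1_SHIFT = 16     # layer 1: one bit per 64 kB region
L0_SHIFT = 22     # layer 0: one bit per 4 MB region

layer2 = bytearray((MEMORY_BYTES >> CARD_SHIFT) // 8)  # 524288 bits = 64 kB
layer1 = bytearray((MEMORY_BYTES >> L1_SHIFT) // 8)    # 4096 bits = 512 B
layer0 = bytearray((MEMORY_BYTES >> L0_SHIFT) // 8)    # 64 bits = 8 B


def _set(bits, index):
    bits[index >> 3] |= 1 << (index & 7)


def _test(bits, index):
    return bool(bits[index >> 3] & (1 << (index & 7)))


def store_pointer(address):
    """Model the card update that accompanies a pointer store."""
    _set(layer2, address >> CARD_SHIFT)
    _set(layer1, address >> L1_SHIFT)
    _set(layer0, address >> L0_SHIFT)


def dirty_cards():
    """Yield base addresses of marked 512-byte cards, skipping clean regions."""
    for r0 in range(64):                      # 64 regions of 4 MB
        if not _test(layer0, r0):
            continue
        for r1 in range(r0 * 64, r0 * 64 + 64):          # 64 x 64 kB per 4 MB
            if not _test(layer1, r1):
                continue
            for card in range(r1 * 128, r1 * 128 + 128):  # 128 cards per 64 kB
                if _test(layer2, card):
                    yield card << CARD_SHIFT
```

With only a few pointer stores recorded, the scan in `dirty_cards` inspects 64 layer 0 bits and then descends only into the marked regions, instead of testing all 524,288 layer 2 bits.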

### Accessing Card Memory (GCCLR)

Card memory is accessible with the [GCCLR](#_GCCLR_–_Garbage) instruction. The GCCLR instruction may be used to return the status of the card memory and then clear it. The instruction can access the memory atomically, reading the current status and updating the memory as one operation.

*The author initially had the card memory memory-mapped into different sized areas for each layer of the card memory. However, accessing it as memory would be slow and it complicated the memory states of the core. Accessing the memory via an instruction is an order of magnitude faster than using the memory system.*

### Summary

Having dedicated card memory and instruction support is more expensive than managing the card memory via software alone. It should offer some improvement in terms of performance and software simplicity.

# Debugging Unit

## Overview

The RTF64 has several debug features including debug exceptions on address matches and instruction tracing. Instruction trace trigger registers are shared with the debug address registers. Which function is triggered on an address match is controlled in the debug control register.

## Instruction Tracing

Instruction tracing is enabled by setting the trace enable bit (one of bits 32 to 35) for the corresponding debug address match register. Tracing begins when an address match occurs and continues until the trace buffer is full. The trace queue is 8kB in size, allowing thousands of instructions to be traced.

## Trace Queue Entry Format

The trace queue stores both complete instruction pointer addresses and branch taken-not-taken (TNT) history. The low order two bits of the trace entry indicate the type of record stored by the entry. There are currently two record types. Record type zero is an instruction pointer address. Record type one is a history record for branches. As instructions are always tetra byte aligned the two least significant bits of the address are not stored.

Record type 0 (instruction pointer address):

| 63..2 | 1..0 |
| --- | --- |
| Instruction Pointer bits 2 to 63 | Rectype = 00 |

Record type 1 (branch history):

| 63..8 | 7..2 | 1..0 |
| --- | --- | --- |
| Branch Taken-Not-Taken History (56 bits) | count | Rectype = 01 |

Up to 56 bits of branch TNT history may be stored in a single record. The number of bits stored is recorded in bits 2 to 7 of the record. After four full branch TNT history records, the trace will record the current instruction address in whole.
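
A trace reader might decode a 64-bit queue entry along the following lines. This is a hedged sketch: the function name and the returned tuples are hypothetical, and because the ordering of the taken/not-taken bits within the history field is not stated here, the history is returned as a raw integer.

```python
MASK64 = (1 << 64) - 1


def decode_trace_entry(entry):
    """Decode one 64-bit trace queue entry into a tagged tuple."""
    rectype = entry & 3
    if rectype == 0:
        # Bits 63..2 hold instruction pointer bits 2 to 63; the two low
        # address bits are always zero and are not stored.
        return ("ip", entry & ~3 & MASK64)
    if rectype == 1:
        count = (entry >> 2) & 0x3F               # number of valid history bits
        history = (entry >> 8) & ((1 << 56) - 1)  # raw TNT history field
        return ("tnt", count, history)
    return ("unknown", entry)                      # record types 2 and 3 are undefined
```

For instance, an entry built as `(0b1011 << 8) | (4 << 2) | 1` decodes to a history record carrying four branch outcomes.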

## Trace Readback

A trace of instructions executed may be read back from the trace queue using the PEEKQ and POPQ instructions. The processor trace queue is accessible as queue numbers 14 and 15. Queue 14 contains the queue status (empty, data valid and data count). Queue 15 contains the raw history record. Software should peek queue 14 to see if data is available, then pop queue 15 to get the data record. Popping queue 15 also advances the queue status in queue number 14.

# Instruction Formats

## Length

Instructions vary in length; they may be 8 to 64 bits long.

## Constants

Constants which will not fit into the 13-bit constant field of an instruction may be encoded using the ADD instruction, which directly supports constants up to sixty-four bits in size.

| Fields (high-order first) | Opcode | Mnemonic |
| --- | --- | --- |
| Cause8 | 00h | BRK |
| r, Funct3, Rs3, Rs2, Rs1, Rd | 01h | {Reg3A} |
| r, Funct5, Fmt3, Rs2, Rs1, Rd | 02h | {Reg2A} |
| r, 35, Fn3, Rs2, Rs1, Rd | 02h | BMM |
| r, 125, Fmt3, Func5b, Rs1, Rd | 02h | {R1} |
| r, 135, Fmt3, ~, s2, d2, Rs1, Rd | 02h | MOV |
| r, 24/255, sh4, Rs2, Rs1, Rd | 02h | PTRDIF |
| r, Funct3, Rs3, Rs2, Rs1, Rd | 03h | {Reg3B} |
| r, Constant12..0, Rs1, Rd | 04h | ADD |
| r, Constant12..0, Rs1, Rd | 05h | SUBF |
| r, Constant12..0, Rs1, Rd | 06h | MUL |
| r, Constant12..0, Rs1, Rd | 08h | AND |
| r, Constant12..0, Rs1, Rd | 09h | OR |
| r, Constant12..0, Rs1, Rd | 0Ah | EOR |
| r, Fmt3, Funct4, ~, Rs2, Rs1, Rd | 0Ch | {SHIFT} |
| r, Fmt3, Funct4, Const5..0, Rs1, Rd | 0Ch | {SHIFT} |
| r, Fn3, ~2, Fmt3, Rs2, Rs1, mop3, Cd2 | 0Dh | {SET} |
| r, Constant12..0, Rs1, Rd | 0Eh | MULU |
| Fn8, r, ~, Regno12, Rs1, Rd | 0Fh | CSR |
| r, Constant12..0, Rs1, Rd | 10h | DIV |
| r, Constant12..0, Rs1, Rd | 11h | DIVU |
| r, Constant12..0, Rs1, Rd | 12h | DIVSU |
| r, Funct5, Fmt3, Rs2, Rs1, Rd | 13h | {Reg2B} |
| r, Constant12..0, Rs1, Rd | 15h | MULF |
| r, Constant12..0, Rs1, Rd | 16h | MULSU |
| Cst16, r, Constant12..0, Rs1, Rd | 17h | PERM |
| r, Constant12..0, Rs1, Rd | 18h | REM |
| r, Constant12..0, Rs1, Rd | 19h | REMU |
| r, ~6, Constant7..0, Rs1, Rd | 1Ah | BYTNDX |
| Cst8, r, Constant12..0, Rs1, Rd | 1Bh | WYDNDX |
| r, 0, Bw6, Bo6, Rs1, Rd | 1Ch | EXT |
| r, 1, Bw6, Bo6, Rs1, Rd | 1Ch | EXTU |
| r, 0, Bw6, Bo6, Rs1, Rd | 1Dh | DEP |
| r, 1, Bw6, Bo6, Rs1, Rd | 1Dh | FLIP |
| r, C, Bw6, Bo6, C4..0, Rd | 1Eh | DEPI |
| r, 0, Bw6, Bo6, Rs1, Rd | 1Fh | FFO |
| r, 1, ~2, Rs3, Rs2, Rs1, Rd | 1Fh | FFO |
| r, Constant12..0, Rs1, Rd | 20h | REMSU |
| r, Constant12..0, Rs1, Rd | 21h | DIVR |
| r, Constant12..0, Rs1, mop3, Cd2 | 24h | SAND |
| r, Constant12..0, Rs1, mop3, Cd2 | 25h | SOR |
| r, Constant12..0, Rs1, mop3, Cd2 | 26h | SEQ |
| r, Constant12..0, Rs1, mop3, Cd2 | 27h | SNE |
| r, Constant12..0, Rs1, mop3, Cd2 | 28h | SLT |
| r, Constant12..0, Rs1, mop3, Cd2 | 29h | SGE |
| r, Constant12..0, Rs1, mop3, Cd2 | 2Ah | SLE |
| r, Constant12..0, Rs1, mop3, Cd2 | 2Bh | SGT |
| r, Constant12..0, Rs1, mop3, Cd2 | 2Ch | SLTU |
| r, Constant12..0, Rs1, mop3, Cd2 | 2Dh | SGEU |
| r, Constant12..0, Rs1, mop3, Cd2 | 2Eh | SLEU |
| r, Constant12..0, Rs1, mop3, Cd2 | 2Fh | SGTU |
| r, Constant4..0, Rs1, Rd | 30h | ADD |
| r, Constant4..0, Rs1, Rd | 31h | OR |
| r, Constant22..13, Rd | 32h | ADD |
| r, Rs2, Rs1, Rd | 33h | ADD |
| r, Rs2, Rs1, Rd | 34h | OR |
| r, Constant9..3 | 36h | GCSUB |
| r, Constant12..0, Rs1, Rd | 37h | GCSUB |
| Cnst63..32, r, Constant31..14, Rd | 38-39h | ADDUI |
| Cnst63..32, r, Constant31..14, Rd | 3A-3Bh | ANDUI |
| Cnst63..32, r, Constant31..14, Rd | 3C-3Dh | ORUI |
| Cnst63..32, r, Constant31..14, Rd | 3E-3Fh | AUIIP |
| Displacement21..0, c, Lk | 40h | BLR |
| Target31..2, c, m | 41h | JMP |
| Target31..2, c, m | 42h | JSR |
| r, Constant9..3 | 43h | RTS |
| Constant13..3, RO4, Lk | 44h | RTL |
| Sema6, ~5, RO5 | 45h | RTE |
| Displacement13..0, Cs | 46h | BEQ |
| Displacement13..0, Cs | 47h | BNE |
| Displacement13..0, Cs | 48h | BLT |
| Displacement13..0, Cs | 49h | BGE |
| Displacement13..0, Cs | 4Ah | BLE |
| Displacement13..0, Cs | 4Bh | BGT |
| Displacement13..0, Cs | 4Ch | BLTU |
| Displacement13..0, Cs | 4Dh | BGEU |
| Displacement13..0, Cs | 4Eh | BLEU |
| Displacement13..0, Cs | 4Fh | BGTU |
| Displacement13..0, Cs | 50h | BVC |
| Displacement13..0, Cs | 51h | BVS |
| Displacement13..0, Cs | 52h | BOD |
| Displacement11..0, Cst3, Rs1, Const4..0 | 53h | BEQI |
| Displacement13..0, Cs | 54h | BPS |
| Displacement13..0, Cs | 55h | BRA |
| Displacement11..0, s, C2, Rs1, Const4..0 | 58h | BBC / BBS |
| Displacement17..2 | 5Eh | JSR |
| Disp6, Cs | 5Fh | BT |
| Instr8 | 60-6Fh | CI |
| r, Funct5, ~3, Rs2, Rs1, Rd | 7Ah | {OSR2} |
| r, 2, ~2, S, Rs2, Rs1, DC3, IC2 | 7Ah | CACHE |
| ~, 8, PL8, Rs1, ~2, Tm3 | 7Ah | REX |
| ~, Constant12..0, Rs1, DC3, IC2 | 7Bh | CACHE |
| **Memory** |  |  |
| r, Constant11..3, R, Rd | 80h | LDB |
| r, Constant11..3, R, Rd | 81h | LDBU |
| r, Constant11..3, R, Rd | 82h | LDW |
| r, Constant11..3, R, Rd | 83h | LDWU |
| r, Constant11..3, R, Rd | 84h | LDT |
| r, Constant11..3, R, Rd | 85h | LDTU |
| r, Constant11..3, R, Rd | 86h | LDO |
| r, Constant11..3, R, Rd | 87h | LDOR |
| r, Constant11..3, R, 04, Lk | 88h | LDO ra[n] |
| ~, Constant11..3, R, 43, Cd | 88h | LDO cr[n] |
| r, Constant11..3, R, 75 | 88h | LDO epc |
| r, Constant11..3, R, 295 | 88h | LDO crall |
| r, Constant11..3, R, Rd | 89h | LEA |
| r, ~2, Rd | 8Ah | POP |
| r, Constant11..3, R, Pd | 8Bh | PLDO |
| r, Constant11..3, R, Fd | 8Eh | FLDO |
| r, 0, C, S, Rs3, Cnst4..0, Rs1, Rd | 91h | LEA |
| r, 1, Constant11..0, Rs1, Rd | 91h | LEA |
| r, C65, S, Rs3, Cnst4..0, Rs1, Fd | 92h | FLDO |
| r, C65, S, Rs3, Cnst4..0, Rs1, Pd | 93h | PLDO |
| ~2, S2, Mask31..6, Rs1, Mask5..1 | 97h | LDM |
| r, 0, C, S, Rs3, Cnst4..0, Rs1, Rd | 98h | LDB |
| r, 1, Constant11..0, Rs1, Rd | 98h | LDB |
| r, m, C, S, Rs3, Cnst4..0, Rs1, Rd | 99h | LDBU |
| r, m, C, S, Rs3, Cnst4..0, Rs1, Rd | 9Ah | LDW |
| r, m, C, S, Rs3, Cnst4..0, Rs1, Rd | 9Bh | LDWU |
| r, m, C, S, Rs3, Cnst4..0, Rs1, Rd | 9Ch | LDT |
| r, m, C, S, Rs3, Cnst4..0, Rs1, Rd | 9Dh | LDTU |
| r, m, C, S, Rs3, Cnst4..0, Rs1, Rd | 9Eh | LDO |
| r, m, C, S, Rs3, Cnst4..0, Rs1, Rd | 9Fh | LDOR |
| ~, Rs2, C11..8, R, Const7..3 | A0h | STB |
| ~, Rs2, C11..8, R, Const7..3 | A1h | STW |
| ~, Rs2, C11..8, R, Const7..3 | A2h | STT |
| ~, Rs2, C11..8, R, Const7..3 | A3h | STO |
| Rs2, C11..8, R, Const7..3 | A4h | STOC |
| ~, Rs2, C11..8, R, Const7..3 | A5h | STPTR |
| ~, 04, Lk, C11..8, R, Const7..3 | A6h | STO |
| Const23..0 | A9h | PUSHC |
| ~3, Rs0 | AAh | PUSH |
| ~, Fs2, C11..8, R, Const7..3 | ABh | FSTO |
| ~, Ps2, C11..8, R, Const7..3 | ACh | PSTO |
| Const11..8, Rs3, Rs2, Rs1, Const4..0 | AEh | STM |
| C, Rs2, Rs1, Const7..3 | B0h | STO |
| Const63..0 | B1h | PUSHC |
| Cnst14..7, ~, C65, S, Rs3, Fs2, Rs1, Const4..0 | B2h | FSTO |
| Cnst14..7, ~, C65, S, Rs3, Ps2, Rs1, Const4..0 | B3h | PSTO |
| ~2, S2, Mask31..6, Rs1, Mask5..1 | B7h | STM |
| Cnst14..7, ~, C65, S, Rs3, Rs2, Rs1, Const4..0 | B8h | STB |
| Cnst14..7, ~, C65, S, Rs3, Rs2, Rs1, Const4..0 | B9h | STW |
| Cnst14..7, ~, C65, S, Rs3, Rs2, Rs1, Const4..0 | BAh | STT |
| Cnst14..7, ~, C65, S, Rs3, Rs2, Rs1, Const4..0 | BBh | STO |
| Cnst14..7, ~, C65, S, Rs3, Rs2, Rs1, Const4..0 | BCh | STOC |
| Cnst14..7, ~, C65, S, Rs3, Rs2, Rs1, Const4..0 | BDh | STPTR |
| Cnst14..7, ~, C65, S, Rs3, 04, Lk, Rs1, Const4..0 | BEh | STO |
| **Posit Arithmetic** |  |  |
| ~3, r, Prs3, Prs2, Prs1, Prs0 | E1h | PFDP |
| ~3, r, Funct5, Prs2, Prs1, Prd | E2h | {PST2} |
| ~3, r, 15, Funct5, Prs1, Prd | E2h | {PST1} |
| ~, r, Prs3, Prs2, Prs1, Prd | E4h | PMA |
| ~, r, Prs3, Prs2, Prs1, Prd | E5h | PMS |
| ~, r, Prs3, Prs2, Prs1, Prd | E6h | PNMA |
| ~, r, Prs3, Prs2, Prs1, Prd | E7h | PNMS |
| ~, r, Prs3, Prs2, Prs1, Prd | E8h | PMIN |
| ~, r, Prs3, Prs2, Prs1, Prd | E9h | PMAX |
|  | EAh | NOP |
| **Floating Point** |  |  |
| r, Funct5, Frs1, Frd | F1h | {FLT1} |
| Rm3, r, Funct5, Frs2, Frs1, Frd | F2h | {FLT2} |
| Rm3, r, 15, Funct5, Frs1, Frd | F2h | {FLT1} |
| Rm3, r, Frs3, Frs2, Frs1, Frd | F4h | FMA |
| Rm3, r, Frs3, Frs2, Frs1, Frd | F5h | FMS |
| Rm3, r, Frs3, Frs2, Frs1, Frd | F6h | FNMA |
| Rm3, r, Frs3, Frs2, Frs1, Frd | F7h | FNMS |

# Opcode Maps

## Root Level

|  | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **ALU** | | | | | | | | |
| 00000 | BRK | {R3A} | {R2A} | {R3B} | ADD | SUBF | MUL |  |
| 00001 | AND | OR | EOR |  | {SHIFT} | {SET} | MULU | CSR |
| 00010 | DIV | DIVU | DIVSU | {R2B} |  | MULF | MULSU | PERM |
| 00011 | REM | REMU | BYTNDX | WYDNDX | EXT | DEP | DEPI | FFO |
| 00100 | REMSU | DIVR | CHK |  | SAND | SOR | SEQ | SNE |
| 00101 | SLT | SGE | SLE | SGT | SLTU | SGEU | SLEU | SGTU |
| 00110 | ADD | OR | ADD | ADD | OR |  | GCSUB7 | GCSUB |
| 00111 | ADDUI | ADDUI | ANDUI | ANDUI | ORUI | ORUI | AUIIP | AUIIP |
| **Branch Unit** | | | | | | | | |
| 01000 | JLR | JMP | JSR | RTS | RTL | RTE | BEQ | BNE |
| 01001 | BLT | BGE | BLE | BGT | BLTU | BGEU | BLEU | BGTU |
| 01010 | BVC | BVS | BOD | BEQI | BPS | BRA | BEQZ | BNEZ |
| 01011 | BBC / BBS |  |  |  |  |  | JSR | BT |
|  | | | | | | | | |
| 01100 | CI | CI | CI | CI | CI | CI | CI | CI |
| 01101 | CI | CI | CI | CI | CI | CI | CI | CI |
| 01110 |  |  |  |  |  |  |  |  |
| 01111 |  |  | {OSR2} | CACHE |  |  |  |  |
| **Memory Unit** | | | | | | | | |
| 10000 | LDB | LDBU | LDW | LDWU | LDT | LDTU | LDO | LDOR |
| 10001 | LDO | LEA | POP | PLDO |  |  | FLDO |  |
| 10010 | LDO | LEA\* | FLDO\* | PLDO\* |  |  |  | LDM |
| 10011 | LDB\* | LDBU\* | LDW\* | LDWU\* | LDT\* | LDTU\* | LDO\* | LDOR\* |
| 10100 | STB | STW | STT | STO | STOC | STPTR | STO lk |  |
| 10101 |  | PUSHC | PUSH | FSTO | PSTO |  |  |  |
| 10110 | STO | PUSHC | FSTO\* | PSTO\* |  |  |  | STM |
| 10111 | STB\* | STW\* | STT\* | STO\* | STOC\* | STPTR\* | STO\*lk |  |
| 11000 |  |  |  |  |  |  |  |  |
| 11001 |  |  |  |  |  |  |  |  |
| 11010 |  |  |  |  |  |  |  |  |
| 11011 |  |  |  |  |  |  |  |  |
| **Floating Point / Posit Arithmetic Unit** | | | | | | | | |
| 11100 |  |  | {PST2} |  | PMA | PMS | PNMA | PNMS |
| 11101 |  |  | NOP |  |  |  |  |  |
| 11110 |  |  | {FLT2} |  | FMA | FMS | FNMA | FNMS |
| 11111 |  |  |  |  |  |  |  |  |
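
Reading the map, the row label is the upper five bits of the 8-bit opcode and the column label is the lower three bits (for example, BEQ at 46h is row 01000, column 110). The sketch below splits an opcode accordingly and reports the group heading its row falls under. The function names and the "unlabeled" tag for rows 01100 to 01111, which carry no group heading in the map, are assumptions for illustration.

```python
def split_opcode(opcode):
    """Return (row, column) of an 8-bit opcode in the root-level map."""
    return (opcode >> 3) & 0x1F, opcode & 7


def unit_for(opcode):
    """Name the group heading that the opcode's row falls under."""
    row, _ = split_opcode(opcode)
    if row <= 0b00111:
        return "ALU"
    if row <= 0b01011:
        return "Branch Unit"
    if row <= 0b01111:
        return "unlabeled"  # CI, {OSR2} and CACHE rows have no heading
    if row <= 0b11011:
        return "Memory Unit"
    return "Floating Point / Posit Arithmetic Unit"
```

For example, `unit_for(0x80)` (LDB) returns `"Memory Unit"`.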

## {R3A} Triadic Integer Register Ops

|  | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | MIN | MAX | MAJ | MUX | ADD | SUB | CMOVENZ | FLIP |

## {R3B} Triadic Integer Register Ops

|  | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | AND | OR | EOR | DEP | EXT | EXTU | BLEND | RGF |

## {R2A} Dyadic Integer Register Ops

|  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
| 00 | AND | OR | EOR | BMM | ADD | SUB | MUL |  |
| 01 | NAND | NOR | ENOR |  | {R1} | MOV | MULU | MULH |
| 10 | DIV | DIVU | DIVSU | REM | REMU | REMSU | MULSU | PERM |
| 11 | PTRDIF | | BYTNDX | WYDNDX | MULF | MULSUH | MULUH | RGF |

## {R2B} Dyadic Integer Register Ops

|  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
| 00 | CHK | DIF |  |  |  |  |  |  |
| 01 |  |  |  |  |  |  |  |  |
| 10 |  |  |  |  |  |  |  |  |
| 11 |  |  |  |  |  |  |  |  |

## {R1} Monadic Integer Register Ops

|  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
| 00 | CNTLZ | CNTLO | CNTPOP | COM | NOT | NEG |  |  |
| 01 |  |  |  | TST |  |  |  |  |
| 10 |  |  |  |  |  |  |  |  |
| 11 |  |  |  |  |  |  |  |  |

## Fmt3 for Dyadic and Monadic Ops

|  |  |
| --- | --- |
| Fmt3 | Size of Operation |
| 0 | octa |
| 1 | tetra |
| 2 | wyde |
| 3 | byte |

## {SET} Fn3 Set Integer Register Ops

|  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
|  | SEQ | SNE | SAND | SOR | SLT | SGE | SLTU | SGEU |

## Floating-Point Monadic Ops – {FLT1} Funct5

|  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
| 00 | FMOV | FRSQRTE | FTOI | ITOF |  |  | FSIGN | FMAN |
| 01 |  | FS2D | FS2Q | FD2Q | FSTAT | FSQRT | ISNAN | FINITE |
| 10 | FTX | FCX | FEX | FDX | FRM | TRUNC | FSYNC | FRES |
| 11 | FSIG | FD2S | FQ2S | FQ2D |  |  | FCLASS | UNORD |

## Floating-Point Dyadic Ops – {FLT2} Funct5

|  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
| 00 | SCALEB | {FLT1} | FMIN | FMAX | FADD | FSUB |  |  |
| 01 | FMUL | FDIV | FREM | FNXT | FAND | FOR |  |  |
| 10 | FCMP | FSEQ | FSLT | FSLE |  |  |  | FFDP |
| 11 | CPYSGN | SGNINV | SGNAND | SGNOR | SGNXOR | SGNXNOR | FCLASS | FRGF |

## Posit Arithmetic Monadic Ops – {PST1} Funct5

|  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
| 00 | PMOV | PRSQRTE | PTOI | ITOP |  |  | PSIGN | PMAN |
| 01 |  | PS2D | PS2Q | PD2Q | PSTAT | PSQRT | PISNAN | PFINITE |
| 10 | PTX | PCX | PEX | PDX |  | PTRUNC | PSYNC | PRES |
| 11 | PSIG | PD2S | PQ2S | PQ2D |  |  | FCLASS | PUNORD |

## Posit Arithmetic Dyadic Ops – {PST2} Funct5

|  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
| 00 |  | {PST1} |  |  | PADD | PSUB |  |  |
| 01 | PMUL | PDIV | PREM |  |  |  |  |  |
| 10 | PCMP |  |  |  |  |  |  |  |
| 11 |  |  |  |  |  |  | PCLASS | PRGF |

## {OSR2} Funct5

|  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
| 00 | LLAL | LLAH | CACHE | GCSUB | LPAL | LPAH |  |  |
| 01 | PUSHQ | POPQ | PEEKQ | STATQ | SETKEY | GCCLR |  |  |
| 10 | REX | PFI | WAI |  |  |  |  |  |
| 11 | SETTO | GETTO | GETZL |  | MVMAP | MVSEG | TLBRW |  |

# ALU Operations

## Features

Compare results registers may be operands to some instructions such as shift and add operations. The least significant bit of the compare results register (the C flag) is used. A compare results register value can only be shifted to the left, since shifting the single flag bit to the right would always produce zero.

## Summary

Almost all ALU operations, except for the compare and bit instructions, can update cr0 with the result status. The compare and bit instructions may update any compare results register with status.

The following example shows the power of results merging operations and the utility of the set instructions and other operations.

**Example**: convert an ascii character to upper case

|  |
| --- |
| toUpper:<br>sge $cr0,$a0,#'a'<br>sle.and $cr0,$a0,#'z'<br>asl $t1,$cr0.C,#5<br>sub $a0,$a0,$t1<br>ret |

There are no branches in the code. The code is short enough that it may be implemented as a macro.
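
The sequence can be checked against a small software model. The Python sketch below is illustrative only: the function name `to_upper` is hypothetical, and it assumes that `sge` and `sle.and` leave a single true/false bit in the C flag of cr0, as the example implies.

```python
def to_upper(ch: int) -> int:
    """Software model of the branch-free toUpper sequence (illustrative)."""
    c = 1 if ch >= ord('a') else 0       # sge $cr0,$a0,#'a'
    c &= 1 if ch <= ord('z') else 0      # sle.and $cr0,$a0,#'z' (AND merge)
    t1 = c << 5                          # asl $t1,$cr0.C,#5 gives 0 or 32
    return ch - t1                       # sub $a0,$a0,$t1


if __name__ == "__main__":
    print("".join(chr(to_upper(ord(ch))) for ch in "rtf64 Core"))
```

Subtracting 32 only when the flag is set is what removes the need for a branch: characters outside 'a' to 'z' pass through unchanged.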

**Example:** extended precision addition:

|  |
| --- |
| add. $a2,$a0,$a1 ; add low, record carry<br>add $a5,$a3,$a4 ; add high<br>add $a5,$a5,$cr0.C ; add carry from low |
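
As a cross-check, the same 128-bit addition can be modelled in Python. This is a sketch under stated assumptions: `add128` and `MASK64` are hypothetical names, and the C flag is assumed to hold the carry out of bit 63 of the low-order add.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit register width


def add128(a_lo: int, a_hi: int, b_lo: int, b_hi: int) -> tuple[int, int]:
    """Model of the three-instruction extended precision add (illustrative)."""
    total = a_lo + b_lo
    lo = total & MASK64              # add. $a2,$a0,$a1
    carry = total >> 64              # C flag of cr0: carry out of bit 63
    hi = (a_hi + b_hi) & MASK64      # add $a5,$a3,$a4
    hi = (hi + carry) & MASK64       # add $a5,$a5,$cr0.C
    return lo, hi


if __name__ == "__main__":
    lo, hi = add128(MASK64, 0, 1, 0)
    print(hex(hi), hex(lo))
```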

## ADD[.] – Addition

**Description**:

Add two operand values and place the result in the target register. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand may be either a register specified by the Rs2 field of the instruction, or an immediate value. There is also a form of this instruction which sums the values of three registers (Rs1, Rs2, and Rs3).

The status result of the addition may optionally be copied to cr0.

**Formats Supported**: R2, R3, RI

**R2 Supported Formats**: .b, .w, .t, .o

|  |  |  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | 45 | | Fmt3 | | | Rs2 | Rs1 | | Rd | 02h | ADD |
| r | 43 | Rs3 | | | | Rs2 | Rs1 | | Rd | 01h | ADD |
| r | Constant12..0 | | | | | | Rs1 | | Rd | 04h | ADD |
| r | Cst4..0 | | Rs1 | | Rd | 50h | ADD |
| r | Constant22..13 | | | | Rd | 52h | ADD |
| r | Rs2 | | | Rs1 | Rd | 53h | ADD |

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

## ADDUI[.] – Add Upper Immediate

**Description**:

Add an immediate value to the general-purpose register Rd and place the result into the destination register Rd. The immediate constant is composed of 13 bits of zeros on the right-hand side and 51 constant bits for bits 13 to 63.

**Formats Supported**: ADDUI

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

**Notes:**

## AND[.] – Bitwise ‘And’

**Description**:

Bitwise ‘And’ operand values and place the result in the target register. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand may be either a register specified by the Rs2 field of the instruction, or an immediate value. The immediate value is one-extended to the width of the machine. There is another form of this instruction which bitwise ‘ands’ together three registers (Rs1, Rs2, Rs3).

The status result of the bitwise and may optionally be copied to cr0.

**Formats Supported**: R2, R2B, R3, RI

**R2 Supported Formats**: .b, .w, .t, .o

|  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | 05 | | Fmt3 | Rs2 | | Rs1 | Rd | 02h | AND |
| r | 05 | | Fmt3 | ~3 | Cs2 | Rs1 | Rd | 13h | AND |
| r | 03 | Rs3 | | Rs2 | | Rs1 | Rd | 03h | AND |
| r | Constant12..0 | | | | | Rs1 | Rd | 08h | AND |

**Execution Units**: ALU

**Clock Cycles**: 0.5

**Exceptions**: none

## ANDUI[.] – And Upper Immediate

**Description**:

Bitwise and an immediate value to the general-purpose register Rd and place the result into the destination register Rd. The immediate constant is composed of 13 bits of ones on the right-hand side and 51 constant bits for bits 13 to 63.
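
The effect of the ones-filled low bits can be seen in a short Python model. This is only a sketch: `andui`, `MASK64` and the parameter name `const51` are hypothetical, and the 51-bit constant is assumed to be supplied already assembled.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit register width


def andui(rd: int, const51: int) -> int:
    """Model of ANDUI: bits 13..63 come from the constant, bits 0..12 are ones."""
    imm = ((const51 & ((1 << 51) - 1)) << 13) | 0x1FFF
    return rd & imm & MASK64


if __name__ == "__main__":
    # The low 13 bits of Rd always survive; the upper bits are masked.
    print(hex(andui(0xFFFF_FFFF, 0x3)))
```

Because the low 13 bits of the mask are all ones, the low 13 bits of Rd are never modified by this instruction.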

**Formats Supported**: ADDUI

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

**Notes:**

## ASL[.] – Arithmetic Shift Left

**Description**:

Left shift one operand value by a second operand value and place the result in the target register. Zeros are shifted into the least significant bits. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand may be either a register specified by the Rs2 field of the instruction, or an immediate value.

This instruction may shift either a value in integer register Rs1 or the carry flag of compare results register Cs1.

**Formats Supported**: RR, RI6

|  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | Fmt3 | 04 | 0 | Rs2 | Rs1 | | Rd | 0Ch | ASL |
| r | Fmt3 | 04 | 1 | Rs2 | ~3 | Cs1 | Rd | 0Ch | ASL |
| r | Fmt3 | 84 | Const5..0 | | Rs1 | | Rd | 0Ch | ASL |

**Execution Units**: ALU

**Clock Cycles**: 0.5

**Exceptions**: none

**Example**: convert an ascii character to upper case

|  |
| --- |
| toUpper:<br>sge $cr0,$a0,#'a'<br>sle.and $cr0,$a0,#'z'<br>asl $t1,$cr0.C,#5<br>sub $a0,$a0,$t1<br>ret |

There are no branches in the code. The code is short enough that it may be implemented as a macro.

## ASR[.] – Arithmetic Shift Right

**Description**:

Right shift one operand value by a second operand value and place the result in the target register. The sign bit is preserved: copies of it are shifted into the most significant bits. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand may be either a register specified by the Rs2 field of the instruction, or an immediate value.
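
A minimal Python model of the octa-sized shift is sketched below. The name `asr64` is hypothetical, and limiting the shift count to six bits is an assumption based on the Const5..0 field of the immediate form.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit register width


def asr64(value: int, amount: int) -> int:
    """Arithmetic shift right of a 64-bit value (illustrative model)."""
    value &= MASK64
    amount &= 63                      # assumed: six-bit shift count
    if value & (1 << 63):
        value -= 1 << 64              # reinterpret the bit pattern as signed
    return (value >> amount) & MASK64  # Python's >> replicates the sign


if __name__ == "__main__":
    print(hex(asr64(0x8000_0000_0000_0000, 4)))
```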

**Formats Supported**: RR, RI6

|  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | Fmt3 | 44 | 0 | Rs2 | Rs1 | Rd | 0Ch | ASR |
| r | Fmt3 | 124 | Const5..0 | | Rs1 | Rd | 0Ch | ASR |

**Execution Units**: ALU

**Clock Cycles**: 0.5

**Exceptions**: none

## AUIIP – Add Upper Immediate to IP

**Description**:

Add an immediate value to the instruction pointer register and place the result into the destination register Rd. The immediate constant is composed of 13 bits of zeros on the right-hand side and 51 constant bits for bits 13 to 63. This instruction may be used to form instruction pointer relative addresses.

**Formats Supported**: AUIIP

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

**Notes:**

## BFCHG[.] – Bitfield Change

**Description**:

A bitfield in the source is inverted, the result is copied to the target register. There are two forms of this instruction, one uses registers to specify the offset and width, the other uses immediate constants supplied in the instruction to specify the offset and width. The width specified should be one less than the desired width.

**Instruction Format**: BFR

A bitfield in the source specified by Rs1 is inverted, the result is copied to the target register. Rs2 specifies the bit offset. Rs3 specifies the bit width.

**Instruction Format**: BFI

A bitfield in the source specified by Rs1 is inverted, the result is copied to the target register. Bo specifies the bit offset. Bw specifies the bit width. Bo and Bw are constants supplied in the instruction.

|  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | 73 | | Rs3 | | Rs2 | Rs1 | Rd | 01h | FLIP |
| r | 1 | Bw6 | | Bo6 | | Rs1 | Rd | 1Dh | FLIP |

**Clock Cycles**: 1

**Execution Units:** Integer ALU #0 only

**Exceptions**: none

## BFCLR[.] – Bitfield Clear

**Description**:

This is an alternate mnemonic for the [DEP](#_DEP_–_Bitfield) instruction where the source register is assumed to be x0. A bitfield in the source is cleared, the result is copied to the target register. There are two forms of this instruction, one uses registers to specify the offset and width, the other uses immediate constants supplied in the instruction to specify the offset and width. The width specified should be one less than the desired width.

**Instruction Format**: R3

A bitfield in the source specified by Rs1 is cleared, the result is copied to the target register. Rs2 specifies the bit offset. Rs3 specifies the bit width.

**Instruction Format**: BFI

A bitfield in the source specified by Rs1 is cleared, the result is copied to the target register. Bo specifies the bit offset. Bw specifies the bit width. Bo and Bw are constants supplied in the instruction.

|  |  |  |  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | 33 | | Rs3 | | Rs2 | 0 | | | Rd | 03h | | DEP |
| r | 0 | Bw6 | | Bo6 | | | 0 | Rd | | 1Dh | DEP | |

**Clock Cycles**: 0.25

**Execution Units:** Integer ALU #0 only

**Exceptions**: none

**Notes**:

Normally Rs3 is a register which is the same as the target register Rd.

## BIT[.] – Bitwise ‘And’

**Description**:

This is an alternate mnemonic of the [AND](#_AND[.]_–_Bitwise) instruction where the target register is $x0. Bitwise ‘And’ two operand values and place the resulting status in a compare results register. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand may be either a register specified by the Rs2 field of the instruction, or an immediate value.

The Z flag of the compare result register is set if the result is zero. The N flag of the result register is set if the most significant bit of the result is set. The O flag of the result register is set if the least significant bit of the result is set.

**R2 Supported Formats**: .b, .w, .t, .o

Example:

BIT. x0,x10,#$20 ; check bit five of register x10

BEQ cr0,target ; branch if bit is clear

**Execution Units**: Integer ALU

**Clock Cycles**: 1

**Exceptions**: none

## BLEND[.] – Blend Colors

**Description**:

This instruction blends two colors whose values are in Rs1 and Rs2 according to an alpha value in Rs3. The resulting color is placed in register Rd. The alpha value is an eight-bit value assumed to be a binary fraction less than one. The color values in Rs1 and Rs2 are assumed to be RGB888 format colors. The result is a RGB888 format color. The high order eight bits of the result register are set to the high order eight bits of Rs1. Note that a close approximation to 1.0 – alpha is used.

**Instruction Format**: R3

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | 63 | Rs3 | Rs2 | Rs1 | Rd | 03h | BLEND |

**Operation**: Rd = (Rs1 \* alpha) + (Rs2 \* ~alpha)
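
The per-channel arithmetic can be sketched in Python as follows. Several details are assumptions rather than documented behaviour: `blend` is a hypothetical name, `~alpha` is taken to be the eight-bit complement (255 minus alpha), each product is scaled back by a right shift of eight, and the handling of the high order eight bits is left out.

```python
def blend(rs1: int, rs2: int, alpha: int) -> int:
    """Blend two RGB888 colors channel by channel (illustrative model)."""
    alpha &= 0xFF
    inv = alpha ^ 0xFF                # assumed approximation of 1.0 - alpha
    out = 0
    for shift in (16, 8, 0):          # red, green, blue channels
        c1 = (rs1 >> shift) & 0xFF
        c2 = (rs2 >> shift) & 0xFF
        mixed = (c1 * alpha + c2 * inv) >> 8   # assumed scaling
        out |= (mixed & 0xFF) << shift
    return out


if __name__ == "__main__":
    print(hex(blend(0xFF0000, 0x0000FF, 0x80)))
```

With this scaling a fully weighted channel of 255 comes out as 254, which is one way an approximate complement could show up in results.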

**Clock Cycles**: 1

## BMM[.] – Bit Matrix Multiply

BMM Rd, Rs1, Rs2

**Description**:

The BMM instruction treats the bits of register Rs1 and register Rs2 as an 8x8 matrix and performs a bit matrix multiply of the two registers and stores the result in the target register. An alternate mnemonic for this instruction is MOR.

**Instruction Format**: Integer R2

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | 35 | Fn3 | Rs2 | Rs1 | Rd | 02h | BMM |

|  |  |
| --- | --- |
| Fn3 | Function |
| 0 | MOR |
| 1 | MXOR |
| 2 | MORT (MOR transpose) |
| 3 | MXORT (MXOR transpose) |
| 4 to 7 | reserved |

**Operation**:

for i = 0 to 7

for j = 0 to 7

Rd.bit[i][j] = (Rs1[i][0] & Rs2[0][j]) | (Rs1[i][1] & Rs2[1][j]) | … | (Rs1[i][7] & Rs2[7][j])

**Clock Cycles:** 1

**Execution Units:** ALU #0 only

**Exceptions**: none

**Notes**:

The bits are numbered with bit 63 of a register representing i,j = 0,0 and bit 0 of the register representing i,j = 7,7.
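
The MOR variant (Fn3 = 0) can be modelled in Python as shown below. This is a sketch: `bit` and `bmm_mor` are hypothetical names, and row-major ordering between the two documented corner positions is an assumption.

```python
def bit(reg: int, i: int, j: int) -> int:
    """Return matrix element (i, j); bit 63 is (0, 0), bit 0 is (7, 7)."""
    return (reg >> (63 - (8 * i + j))) & 1


def bmm_mor(ra: int, rb: int) -> int:
    """8x8 bit matrix multiply using AND for products and OR for sums."""
    result = 0
    for i in range(8):
        for j in range(8):
            acc = 0
            for k in range(8):
                acc |= bit(ra, i, k) & bit(rb, k, j)
            result |= acc << (63 - (8 * i + j))
    return result


if __name__ == "__main__":
    identity = sum(1 << (63 - 9 * i) for i in range(8))
    print(hex(bmm_mor(identity, 0x0123_4567_89AB_CDEF)))
```

Multiplying by the identity matrix returns the other operand unchanged, which makes a convenient sanity check for an implementation.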

## BYTNDX[.] – Byte Index

**Description:**

This instruction searches Rs1 for a byte specified by Rs2 or an immediate value and places the index of the byte into the target register Rd. If the byte is not found -1 is placed in the target register. A common use would be to search for a null byte. The index result may vary from -1 to +7. The index of the first found byte is returned (closest to zero).

**Instruction Format:** R2

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | 265 | Fmt3 | Rs2 | Rs1 | Rd | 02h | BYTNDX |

**R2 Supported Formats**: .b, .w, .t, .o

**Clock Cycles:** 1

**Execution Units:** Integer ALU

**Operation:**

Rd = Index of (Rs2 in Rs1)
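
A Python sketch of the search is given below. The name `bytndx` is hypothetical, and index 0 is assumed to be the least significant byte of Rs1.

```python
def bytndx(rs1: int, byte: int) -> int:
    """Return the index of the first matching byte in rs1, or -1 (illustrative)."""
    byte &= 0xFF
    for index in range(8):            # assumed: index 0 is the low-order byte
        if (rs1 >> (8 * index)) & 0xFF == byte:
            return index
    return -1


if __name__ == "__main__":
    # Search eight packed characters for a null terminator.
    print(bytndx(0x4142_4344_0045_4647, 0))
```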

**Exceptions:** none

## CHK – Check Register Against Bounds

**Description**:

A register is compared to two values. If the register is outside of the bounds defined by Rs2 and Rs3 or an immediate value, then an exception will occur. Rs1 must be greater than or equal to Rs2 and Rs1 must be less than Rs3 or the immediate.
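
The bounds form a half-open interval, which the following Python sketch makes explicit. `BoundsCheck` and `chk` are hypothetical names, a Python exception stands in for the hardware bounds check exception, and the operands are treated as plain integers.

```python
class BoundsCheck(Exception):
    """Stand-in for the bounds check exception (illustrative)."""


def chk(rs1: int, lower: int, upper: int) -> int:
    """Pass rs1 through if lower <= rs1 < upper, otherwise raise."""
    if not (lower <= rs1 < upper):
        raise BoundsCheck(f"{rs1} outside [{lower}, {upper})")
    return rs1


if __name__ == "__main__":
    print(chk(5, 0, 10))
```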

**Instruction Format**: CHK, CHKI

**Supported Formats**: .b, .w, .t, .o, .h, .bv, .wv, .tv, .ov, .hv

**Clock Cycles**: 1

**Execution Units:** Integer Unit

**Exceptions**: bounds check

**Notes**:

The system exception handler will typically transfer processing back to a local exception handler.

## CMOVENZ[.] – Conditional Move

**Description:**

This instruction moves from either Rs1 or Rs3 depending on the state of the condition register and places the result in a target register. If the condition register is true then the value from Rs1 is moved to Rd otherwise the value from Rs3 is moved to Rd.

**Instruction Format:** R3

|  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | 63 | Rs3 | 03 | Cs2 | Rs1 | Rd | 01h | CMOVENZ |

**R1 Supported Formats**:

**Clock Cycles:** 1

**Execution Units:** Integer ALU

**Operation:**

Rd = Cs2 ? Rs1 : Rs3

**Exceptions:** none

## CMP – Compare

**Description**:

Compare two operand values and store the relationship in the target compare result register. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand may be either a register specified by the Rs2 field of the instruction, or an immediate value.

Flags are set in the compare result register as if a subtract operation were performed between operands. If the result is zero the Z flag is set. If the signed result is less than zero then the N flag is set. The carry flag C is set on unsigned overflow. The overflow flag V is set on signed overflow. Parity P is set if the exclusive or of all result bits is a one. The odd flag, O, is set if the result is odd. The remaining bits of the result register are unused.

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| N | V | O | P | ~ | ~ | Z | C |

|  |  |  |
| --- | --- | --- |
| Bit | Flag | Meaning |
| 0 | C | Carry flag, set if operation overflows |
| 1 | Z | Zero flag, set if result is zero |
| 2 | ~ | reserved |
| 3 | ~ | reserved |
| 4 | P | Parity (exclusive or of all result bits) |
| 5 | O | Odd, set if result is odd |
| 6 | V | Overflow, set if signed result overflows |
| 7 | N | Negative, set if signed result is less than zero |
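
The flag layout can be exercised with a small Python model of an octa-sized compare. This is a sketch under assumptions: `cmp_flags` is a hypothetical name, C is taken to mean an unsigned borrow (Rs1 less than Rs2), and N is taken from bit 63 of the difference.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit register width


def cmp_flags(a: int, b: int) -> int:
    """Return the 8-bit compare result for a - b (illustrative model)."""
    a &= MASK64
    b &= MASK64
    diff = (a - b) & MASK64
    c = 1 if a < b else 0                       # assumed: unsigned borrow
    z = 1 if diff == 0 else 0
    p = bin(diff).count("1") & 1                # exclusive or of all result bits
    o = diff & 1
    v = (((a ^ b) & (a ^ diff)) >> 63) & 1      # signed overflow of a - b
    n = diff >> 63                              # assumed: sign of the difference
    return (n << 7) | (v << 6) | (o << 5) | (p << 4) | (z << 1) | c


if __name__ == "__main__":
    print(f"{cmp_flags(3, 5):08b}")
```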

The compare instructions feature results merging, where the current value in the result register is logically combined with the new result. This allows several comparisons to be easily cascaded for use with a branch operation. The direct copy merging operation is the default and does not need to be specified in assembler code.

|  |  |  |
| --- | --- | --- |
| MOp3 | Mnemonic | Operation |
| 0 | .CPY | Cd = compare result |
| 1 | .OR | Cd = Cd \| compare result |
| 2 | .AND | Cd = Cd & compare result |
| 3 | .ORCM | Cd = Cd \| ~compare result |
| 4 | .ANDCM | Cd = Cd & ~compare result |
| 5 to 7 |  | reserved |

Example: compute a0 == a1 and a2 == a3 and branch

CMP.CPY c0,a0,a1

CMP.AND c0,a2,a3

BEQ c0,target
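
The merge step can be modelled in Python as below. This is a sketch: `merge` and the constant `Z` are hypothetical names, and the compare result is assumed to be the eight-bit flag byte described above.

```python
Z = 0x02  # Z flag is bit 1 of a compare results register


def merge(mop: int, cd: int, result: int) -> int:
    """Combine a new compare result with the current Cd value (illustrative)."""
    cd &= 0xFF
    result &= 0xFF
    if mop == 0:                      # .CPY
        return result
    if mop == 1:                      # .OR
        return cd | result
    if mop == 2:                      # .AND
        return cd & result
    if mop == 3:                      # .ORCM
        return (cd | ~result) & 0xFF
    if mop == 4:                      # .ANDCM
        return cd & ~result & 0xFF
    raise ValueError("reserved MOp3 value")


if __name__ == "__main__":
    c0 = merge(0, 0, Z)               # CMP.CPY: first pair equal, Z set
    c0 = merge(2, c0, Z)              # CMP.AND: second pair equal, Z stays set
    print(bool(c0 & Z))
```

With `.AND` merging, the Z flag survives only if every compare in the chain found its operands equal, which is why a single BEQ can test the combined condition.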

**Formats Supported**: R2, RI

|  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ~ | 75 | Fmt3 | Rs2 | | Rs1 | mop3 | Cd2 | 02h | CMP |
| ~ | 75 | Fmt3 | ~3 | Cs2 | Rs1 | mop3 | Cd2 | 13h | CMP |
| ~ | Constant12..0 | | | | Rs1 | mop3 | Cd2 | 07h | CMP |

**R2 Supported Formats**: .b, .w, .t, .o

**Execution Units**: Integer ALU

**Clock Cycles**: 1

## CNTLO[.] – Count Leading Ones

**Description**:

Count the number of leading ones (starting at the MSB) in Rs1 and place the count in the target register.

**Instruction Format**: R1

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | 125 | Fmt3 | 15 | Rs1 | Rd | 02h | CNTLO |

**R1 Supported Formats**: .b, .w, .t, .o

**Clock Cycles**: 1

**Execution Units:** Integer ALU

**Exceptions**: none

## CNTLZ[.] – Count Leading Zeros

**Description**:

Count the number of leading zeros (starting at the MSB) in Rs1 and place the count in the target register.

**Instruction Format**: R1

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | 125 | Fmt3 | 05 | Rs1 | Rd | 02h | CNTLZ |

**R1 Supported Formats**: .b, .w, .t, .o

**Clock Cycles**: 1

**Execution Units:** Integer ALU

**Exceptions**: none

## CNTPOP[.] – Count Population

**Description**:

Count the number of one bits in Rs1 and place the count in the target register.

**Instruction Format**: R1

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | 125 | Fmt3 | 25 | Rs1 | Rd | 02h | CNTPOP |

**R1 Supported Formats**: .b, .w, .t, .o

**Clock Cycles**: 1

**Execution Units:** Integer ALU

**Exceptions**: none

## COM[.] – One’s Complement

**Description:**

This instruction takes the one’s complement of a register and places the result in a target register. This is almost the same operation as exclusive or’ing with minus one, however the operation size may be set to operate on only a byte, wyde, tetra or octa value.

**Instruction Format:** Integer R1

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | 125 | Fmt3 | 35 | Rs1 | Rd | 02h | COM |

**R1 Supported Formats**: .b, .w, .t, .o

**Clock Cycles:** 1

**Execution Units:** Integer ALU

**Operation:**

Rd = ~Rs1

**Exceptions:** none

## CSR – Control and Status Access

**Description**:

The CSR instruction group provides access to control and status registers in the core. For the read-write operation the current value of the CSR is placed in the target register Rd then the CSR is updated from register Rs1. The CSR read / update operation is an atomic operation.

**Instruction Format**: CSR

|  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- |
| Fn3 | Om3 | Regno8 | Rs1 | Rd | 0Fh | CSR |

|  |  |  |
| --- | --- | --- |
| Fn3 | Mnemonic | Operation |
| 0 | CSRRD | Only read the CSR, no update takes place, Rs1 should be R0. |
| 1 | CSRRW | Both read and write the CSR |
| 2 | CSRRS | Read CSR then set CSR bits |
| 3 | CSRRC | Read CSR then clear CSR bits |
| 4 |  | reserved |
| 5 | CSRRWI | Read and Write CSR with immediate |
| 6 | CSRRSI | Read and set using immediate |
| 7 | CSRRCI | Read and clear using immediate |

CSRRS and CSRRC operations are only valid on registers that support the capability.
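
The read-then-update behaviour of the group can be sketched in Python. The model below is an assumption-laden illustration: `csr_op` is a hypothetical name, the set and clear forms are assumed to use the operand as a bit mask, and the immediate forms are assumed to differ only in where the operand comes from.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit CSR width


def csr_op(op: int, csr: int, operand: int) -> tuple[int, int]:
    """Return (value placed in Rd, new CSR value) for one CSR operation."""
    old = csr & MASK64
    if op == 0:                       # CSRRD: read only
        new = old
    elif op in (1, 5):                # CSRRW / CSRRWI: replace
        new = operand
    elif op in (2, 6):                # CSRRS / CSRRSI: set masked bits
        new = old | operand
    elif op in (3, 7):                # CSRRC / CSRRCI: clear masked bits
        new = old & ~operand
    else:
        raise ValueError("reserved operation")
    return old, new & MASK64


if __name__ == "__main__":
    print(csr_op(2, 0b1000, 0b0011))
```

Returning the old value together with the new one mirrors the atomic read and update described above: Rd always receives the value the CSR held before the operation.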

The Om3 field is reserved to specify the operating mode. Note that registers belonging to a higher operating mode cannot be accessed from a lower operating mode.

|  |  |  |  |
| --- | --- | --- | --- |
| Regno8 | Name | Access | Description |
| 01 | HARTID | R | hardware thread identifier (core number) |
| 02 | TICK | R | tick count, counts every cycle from reset |
| 30-37 | TVEC | RW | trap vector handler address |
| 48 | EPC | RW | exceptioned pc, pc value at point of exception |
| 44 | STATUSL | RWSC | status register, contains interrupt mask, operating level |
| 45 | STATUSH | RW | status register bits 64 to 127 |
| 80-BF | CODE | RW | code buffers |
| F0 | INFO | R | Manufacturer name |
| F1 | “ | R | “ |
| F2 | “ | R | cpu class |
| F3 | “ | R | “ |
| F4 | “ | R | cpu name |
| F5 | “ | R | “ |
| F6 | “ | R | model number |
| F7 | “ | R | serial number |
| F8 | “ | R | cache sizes instruction (bits 32 to 63), data (bits 0 to 31) |

**Execution Units:** Integer, the instruction may be available on only a single execution unit (not supported on all available integer units).

**Clock Cycles**: 1

**Exceptions**: privilege violation attempting to access registers outside of those allowed for the operating mode.

## DEP[.] – Bitfield Deposit

**Description**:

The target register Rd is used as the source data. A bitfield whose value is contained in Rs1 is inserted into the source data by copying low order bits from Rs1 shifted to the left. The result is placed in the target register Rd. There are two forms of this instruction, one uses registers to specify the offset and width, the other uses immediate constants supplied in the instruction to specify the offset and width. The width specified should be one less than the desired width.
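
A Python model of the deposit is sketched below. `dep` and `MASK64` are hypothetical names; `bo` and `bw` correspond to the offset and width fields, with the width encoded as one less than the desired width as stated above.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit register width


def dep(rd: int, rs1: int, bo: int, bw: int) -> int:
    """Insert the low-order bits of rs1 into rd at offset bo (illustrative)."""
    width = bw + 1                                # field encodes width - 1
    mask = (((1 << width) - 1) << bo) & MASK64
    return ((rd & ~mask) | ((rs1 << bo) & mask)) & MASK64


if __name__ == "__main__":
    # Deposit the four-bit value 5 into bits 4..7 of 0xFFFF.
    print(hex(dep(0xFFFF, 0x5, 4, 3)))
```

Passing zero as the source value clears the field, which is the behaviour BFCLR relies on.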

**Instruction Format**: R3, BFI

**Instruction Format**: BFI

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | 0 | Bw6 | Bo6 | Rs1 | Rd | 1Dh | DEP |

**Clock Cycles**: 1

**Execution Units:** Integer ALU #0 only

**Exceptions**: none

## DEPI[.] – Bitfield Deposit Immediate

**Description**:

The target register Rd is used as the source data. A bitfield whose value is a constant specified in the Rs1 field of the instruction is inserted into the source data by copying low order bits from the constant shifted to the left. The bitfield may not be wider than six bits. Use multiple instructions to achieve a wider field width, or load a register with the value first then use the registered form of the instruction. The result is placed in the target register Rd. The width specified should be one less than the desired width.

This instruction may be used to clear or set a bitfield.

**Instruction Format**: DEPI

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | C | Bw6 | Bo6 | C | Rd | 1Eh | DEPI |

**Clock Cycles**: 1

**Execution Units:** Integer ALU #0 only

**Exceptions**: none

## DIF[.] – Difference

**Description:**

This instruction computes the difference between two signed values in registers Rs1 and Rs2 and places the result in the target register Rd. The difference is calculated as the absolute value of Rs1 minus Rs2.

**Instruction Format:** R2

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | 255 | Fmt3 | Rs2 | Rs1 | Rd | 02h | DIF |

**Supported Formats**: .b, .w, .t, .o

**Clock Cycles:** 1

**Execution Units:** Integer

**Operation:**

Rd = Abs(Rs1 - Rs2)

**Exceptions**: none

## DIV[.] – Division

**Description**:

Divide two operand values and place the result in the target register. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand may be either a register specified by the Rs2 field of the instruction, or an immediate value. Both operands are treated as signed values.

**Formats Supported**: R2, RI

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | 165 | Fmt3 | Rs2 | Rs1 | Rd | 02h | DIV |
| r | Constant12..0 | | | Rs1 | Rd | 10h | DIV |

**Execution Units**: ALU

**Clock Cycles**: 67

**Exceptions**: none

## DIVR[.] – Division

**Description**:

This instruction is supplied as division is not commutative. Divide two operand values and place the result in the target register. The first operand must be an immediate value. The second operand must be a register specified by the Rs2 field of the instruction. Both operands are treated as signed values. This instruction allows a constant to be divided by a register value “reverse” to how the DIV instruction works.

**Formats Supported**: RI

**Execution Units**: ALU

**Clock Cycles**: 67

**Exceptions**: none

## DIVU[.] – Division Unsigned

**Description**:

Divide two operand values and place the result in the target register. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand may be either a register specified by the Rs2 field of the instruction, or an immediate value. Both operands are treated as unsigned values.

**Formats Supported**: R2, RI

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | 175 | Fmt3 | Rs2 | Rs1 | Rd | 02h | DIVU |
| r | Constant12..0 | | | Rs1 | Rd | 11h | DIVU |

**Execution Units**: ALU

**Clock Cycles**: 67

**Exceptions**: none

## ENOR[.] – Bitwise Exclusive Nor

**Description**:

Perform a bitwise exclusive or operation between two operands then invert the result. Operands must be in registers.

**Instruction Format**: R2

**R2 Supported Formats**: .b, .w, .t, .o

**Clock Cycles**: 1

**Execution Units:** Integer ALU

**Scalar Operation**

Rd = ~(Rs1 ^ Rs2)

## EOR[.] – Bitwise Exclusive ‘Or’

**Description**:

Bitwise exclusive ‘Or’ two operand values and place the result in the target register. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand may be either a register specified by the Rs2 field of the instruction, or an immediate value. The immediate value is zero extended to the left from bit 14 to the machine width.

The status result of the exclusive or may optionally be copied to cr0.

**Formats Supported**: R2, RI

|  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | 25 | | Fmt3 | Rs2 | | Rs1 | Rd | 02h | EOR |
| r | 25 | | Fmt3 | ~3 | Cs2 | Rs1 | Rd | 13h | EOR |
| r | 23 | Rs3 | | Rs2 | | Rs1 | Rd | 03h | EOR |
| r | Constant12..0 | | | | | Rs1 | Rd | 0Ah | EOR |

**R2 Supported Formats**: .b, .w, .t, .o

**Execution Units**: Integer ALU

**Clock Cycles**: 1

**Exceptions**: none

## EXT[.] – Extract Bitfield

**Description**:

A bitfield is extracted from the source by shifting the source to the right and ‘and’ masking. The result is sign extended to the width of the machine. This instruction may be used to sign extend a value from an arbitrary bit position. There are two forms of this instruction, one uses registers to specify the offset and width, the other uses immediate constants supplied in the instruction to specify the offset and width. The width specified should be one less than the desired width.

**Instruction Format**: BFR

A bitfield in the source specified by Rs1 is extracted, the result is copied to the target register. Rs2 specifies the bit offset. Rs3 specifies the bit width.

**Instruction Format**: BFI

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | 0 | Bw6 | Bo6 | Rs1 | Rd | 1Ch | EXT |

A bitfield in the source specified by Rs1 is extracted, the result is copied to the target register. Bo specifies the bit offset. Bw specifies the bit width. Bo and Bw are constants supplied in the instruction.
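
The shift-and-mask extraction, together with the sign extension that distinguishes EXT from the unsigned form, can be sketched in Python. `extu`, `ext` and `MASK64` are hypothetical names; `bw` is the encoded width, one less than the desired width.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit register width


def extu(rs1: int, bo: int, bw: int) -> int:
    """Extract a bitfield and zero extend it (illustrative model)."""
    width = bw + 1                                # field encodes width - 1
    return (rs1 >> bo) & ((1 << width) - 1)


def ext(rs1: int, bo: int, bw: int) -> int:
    """Extract a bitfield and sign extend it to 64 bits (illustrative model)."""
    width = bw + 1
    field = extu(rs1, bo, bw)
    if field & (1 << (width - 1)):                # top bit of the field set
        field |= MASK64 & ~((1 << width) - 1)     # fill the upper bits with ones
    return field


if __name__ == "__main__":
    print(hex(extu(0xABCD, 4, 7)), hex(ext(0xABCD, 4, 7)))
```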

**Clock Cycles**: 0.25

**Execution Units:** Integer ALU #0 Only

**Exceptions**: none

**Notes**:

## EXTU[.] – Extract Unsigned Bitfield

**Description**:

A bitfield is extracted from the source by shifting the source to the right and ‘and’ masking. The result is zero extended to the width of the machine. This instruction may be used to zero extend a value from an arbitrary bit position. There are two forms of this instruction, one uses registers to specify the offset and width, the other uses immediate constants supplied in the instruction to specify the offset and width. The width specified should be one less than the desired width.

**Instruction Format**: BFR

A bitfield in the source specified by Rs1 is extracted, the result is copied to the target register. Rs2 specifies the bit offset. Rs3 specifies the bit width.

**Instruction Format**: BFI

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | 1 | Bw6 | Bo6 | Rs1 | Rd | 1Ch | EXTU |

A bitfield in the source specified by Rs1 is extracted, the result is copied to the target register. Bo specifies the bit offset. Bw specifies the bit width. Bo and Bw are constants supplied in the instruction.

**Clock Cycles**: 0.25

**Execution Units:** Integer ALU #0 Only

**Exceptions**: none

**Notes**:

## FFO[.] – Find First One

**Description**:

A bitfield contained in Rs1 is searched from the most significant bit toward the least significant bit for a bit that is set. The index into the bitfield of the first set bit found is stored in Rd. If no bits are set, then Rd is set equal to -1. There are two forms of this instruction: one uses registers to specify the offset and width, the other uses immediate constants supplied in the instruction to specify the offset and width. The width specified should be one less than the desired width.

**Instruction Format**: BF

|  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | 0 | Bw6 | | Bo6 | | Rs1 | Rd | 1Fh | FFO |
| r | 1 | ~2 | Rs3 | | Rs2 | Rs1 | Rd | 1Fh | FFO |
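
A Python sketch of the search follows. `ffo` is a hypothetical name, and the returned index is assumed to be counted from the least significant bit of the bitfield, with `bw` holding the desired width minus one.

```python
def ffo(rs1: int, bo: int, bw: int) -> int:
    """Index of the highest set bit within a bitfield, or -1 (illustrative)."""
    width = bw + 1                                # field encodes width - 1
    field = (rs1 >> bo) & ((1 << width) - 1)
    return field.bit_length() - 1                 # bit_length() is 0 for zero


if __name__ == "__main__":
    print(ffo(0b1010_0000, 4, 3))
```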

**Clock Cycles**:

**Execution Units:** Integer

**Exceptions**: none

## FLIP[.] – Flip Bits

**Description**:

A bitfield in the destination is bitwise exclusive or’d with a source value in Rs1. The result is copied back to the destination register. There are two forms of this instruction: one uses registers to specify the offset and width, the other uses immediate constants supplied in the instruction to specify the offset and width. The width specified should be one less than the desired width.

**Instruction Format**: BFR

A bitfield in the source specified by Rs1 is exclusive or’d with the target, the result is copied to the target register. Rs2 specifies the bit offset. Rs3 specifies the bit width.

**Instruction Format**: BFI

A bitfield in the source specified by Rs1 is exclusive or’d with the target, the result is copied to the target register. Bo specifies the bit offset. Bw specifies the bit width. Bo and Bw are constants supplied in the instruction.

|  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | 73 | | Rs3 | | Rs2 | Rs1 | Rd | 01h | FLIP |
| r | 1 | Bw6 | | Bo6 | | Rs1 | Rd | 1Dh | FLIP |

**Clock Cycles**: 1

**Execution Units:** Integer ALU #0 only

**Exceptions**: none

## LMI – Load Middle Immediate

**Description**:

Load an immediate value into the destination register Rd. The immediate constant is composed of 13 bits of zeros on the right-hand side, 26 constant bits for bits 13 to 38, and bit 38 of the constant is sign extended to 64 bits. The destination register must be either x1 or x2.

**Formats Supported**: LUI

|  |  |  |  |
| --- | --- | --- | --- |
| Constant38..16 | Rd | 48h-4Fh | LMI |

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

**Notes:**

## LSR[.] – Logical Shift Right

**Description**:

Right shift one operand value by a second operand value and place the result in the target register. Zeros are shifted into the most significant bits. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand may be either a register specified by the Rs2 field of the instruction, or an immediate value.
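
As a sketch, a logical right shift on a 64-bit pattern can be modelled in Python as below. Treating the shift amount modulo 64 is an assumption made here: the RI6 form carries a six-bit constant, but the text does not say how larger register values are handled.

```python
MASK64 = (1 << 64) - 1  # models a 64-bit register

def lsr(rs1: int, amount: int) -> int:
    """Logical shift right: zeros enter at the most significant bits."""
    # Assumption: only the low six bits of the shift amount are used.
    return (rs1 & MASK64) >> (amount & 63)

# The sign bit is not replicated, unlike an arithmetic shift.
print(lsr(0x8000000000000000, 63))  # 1
```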

**Formats Supported**: RR, RI6

|  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | Fmt3 | 14 | ~ | Rs2 | Rs1 | Rd | 0Ch | LSR |
| r | Fmt3 | 94 | Const5..0 | | Rs1 | Rd | 0Ch | LSR |

**Execution Units**: ALU

**Clock Cycles**: 0.5

**Exceptions**: none

## LUI – Load Upper Immediate

**Description**:

Load an immediate value into the destination register Rd bits 39 to 63. Bits 0 to 38 of the destination register are not affected. The destination register must be either x1 or x2.

**Formats Supported**: LUI

|  |  |  |  |
| --- | --- | --- | --- |
| Constant63..41 | Rd | 50h-53h | LUI |

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

**Notes:**

## MAJ[.] – Majority Logic

**Description**:

Combine three operand values using majority logic and place the result in the target register. All three operands must be in registers.

**Formats Supported**: R3

**Operation:**

Rd = (Rs1 & Rs2) | (Rs1 & Rs3) | (Rs2 & Rs3)
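
The operation above works bit by bit: each result bit is set when at least two of the three corresponding source bits are set. A small Python model (the name `maj` is just for illustration):

```python
MASK64 = (1 << 64) - 1  # models a 64-bit register

def maj(rs1: int, rs2: int, rs3: int) -> int:
    """Bitwise majority of three operands."""
    return ((rs1 & rs2) | (rs1 & rs3) | (rs2 & rs3)) & MASK64

# Bits 3, 2 and 1 each have two votes; bit 0 has none.
print(bin(maj(0b1100, 0b1010, 0b0110)))  # 0b1110
```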

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

## MAX[.] – Maximum of Three Values

**Description**:

Find the maximum of three values and place the result in the target register. All three operands must be in registers. To find the maximum of two values use a source register twice.

**Formats Supported**: R3

**Operation:**

if (Rs1 > Rs2 and Rs1 > Rs3)

Rd = Rs1

else if (Rs2 > Rs3)

Rd = Rs2

else

Rd = Rs3
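
The selection logic above can be transcribed directly into Python to check the two-operand trick mentioned in the description. The function name `max3` is illustrative, and the comparison here uses Python integers, so whether the hardware compares as signed or unsigned is not modelled.

```python
def max3(rs1: int, rs2: int, rs3: int) -> int:
    """Maximum of three values, following the operation listing step by step."""
    if rs1 > rs2 and rs1 > rs3:
        return rs1
    elif rs2 > rs3:
        return rs2
    else:
        return rs3

# Maximum of two values: pass one source register twice.
print(max3(7, 2, 2))  # 7
print(max3(5, 5, 3))  # 5
```

Ties are handled correctly even though the comparisons are strict: when the first test fails on equal operands, one of the later branches still returns the shared maximum.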

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

## MIN[.] – Minimum of Three Values

**Description**:

Find the minimum of three values and place the result in the target register. All three operands must be in registers. To find the minimum of two values use a source register twice.

**Formats Supported**: R3

**Operation:**

if (Rs1 < Rs2 and Rs1 < Rs3)

Rd = Rs1

else if (Rs2 < Rs3)

Rd = Rs2

else

Rd = Rs3

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

## MOV[.] – Move Register to Register

**Description**:

This instruction moves from one register to another register. This instruction may be used to move between different types of registers. The source and destination registers are identified by seven-bit tags. The Rd and Rs1 fields of the instructions are extended by d2 and s2 fields to seven bits.

|  |  |  |
| --- | --- | --- |
| Tag | Associated Register |  |
| 0 to 31 | GP0 to GP31 | general purpose registers |
| 32 to 63 | FP0 to FP31 | floating point registers |
| 64 to 95 | PS0 to PS31 | posit arithmetic registers |
| 96 to 97 | RA0, RA1 | return address registers |
| 98 to 102 | reserved |  |
| 103 | EIP | exception instruction pointer |
| 104 to 111 | reserved |  |
| 112 to 115 | CR0 to CR3 | compare result registers |
| 116 to 123 | reserved |  |
| 124 | reserved |  |
| 125 | CR <all> | all compare results registers |
| 126 | reserved | not used |
| 127 | none | instruction without a target |
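
The tag table can be read as a simple range decode. The Python sketch below mirrors the table; the helper `make_tag`, which places the two extension bits above the five-bit register field, is an assumption about how d2/s2 extend Rd/Rs1 and is not stated in the text.

```python
def make_tag(ext2: int, reg5: int) -> int:
    """Assumed composition: two extension bits (d2 or s2) above the 5-bit Rd/Rs1 field."""
    return ((ext2 & 0b11) << 5) | (reg5 & 0b11111)

def decode_tag(tag: int) -> str:
    """Map a seven-bit register tag to the register name given in the table."""
    if not 0 <= tag <= 127:
        raise ValueError("tag must fit in seven bits")
    if tag <= 31:
        return f"GP{tag}"
    if tag <= 63:
        return f"FP{tag - 32}"
    if tag <= 95:
        return f"PS{tag - 64}"
    if tag <= 97:
        return f"RA{tag - 96}"
    if tag == 103:
        return "EIP"
    if 112 <= tag <= 115:
        return f"CR{tag - 112}"
    if tag == 125:
        return "CR<all>"
    if tag == 127:
        return "none"
    return "reserved"

print(decode_tag(make_tag(1, 1)))  # FP1
```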

**Formats Supported**: R2

|  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | 135 | Fmt3 | ~ | s2 | d2 | Rs1 | Rd | 02h | MOV |

**R2 Supported Formats**: .b, .w, .t, .o

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

## MUL[.] – Multiplication

**Description**:

Multiply two operand values and place the result in the target register. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand may be either a register specified by the Rs2 field of the instruction, or an immediate value. Both operands are treated as signed values.

**Formats Supported**: RR, RI

**Execution Units**: ALU

**Clock Cycles**: 0.5

**Exceptions**: none

## MULU[.] – Multiplication Unsigned

**Description**:

Multiply two operand values and place the result in the target register. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand may be either a register specified by the Rs2 field of the instruction, or an immediate value. Both operands are treated as unsigned values. Unsigned multiplication is commonly used to calculate array indexes.

**Formats Supported**: RR, RI

**Execution Units**: ALU

**Clock Cycles**: 0.5

**Exceptions**: none

## NEG[.] – Negate

**Description:**

This instruction negates the value in register Rs1 and places the result in target register Rd.

**Instruction Format**: R1

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | 125 | Fmt3 | 55 | Rs1 | Rd | 02h | NEG |

**Supported Formats**: .b, .w, .t, .o

**Clock Cycles:** 1

**Execution Units:** Integer ALU

**Operation:**

Rd = -Rs1

**Exceptions**: none

**Notes**:

## NOR[.] – Bitwise Nor

**Description**:

Perform a bitwise or operation between two operands then invert the result. Both operands must be in registers.

**Instruction Format**: R2

**Supported Formats**: .b, .w, .t, .o

**Clock Cycles**: 1

**Execution Units**: Integer ALU

**Exceptions**: none

## NOT[.] – Logical Not

**Description:**

This instruction takes the logical ‘not’ value of a register and places the result in a target register. If the source register contains a non-zero value, then a zero is loaded into the target. Otherwise, if the source register contains zero, a one is loaded into the target register.

**Instruction Format**: R1

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | 125 | Fmt3 | 45 | Rs1 | Rd | 02h | NOT |

**R1 Supported Formats**: .b, .w, .t, .o

**Clock Cycles:** 1

**Execution Units:** Integer ALU

**Operation:**

Rd = !Rs1

**Exceptions**: none

**Notes**:

## OR[.] – Bitwise ‘Or’

**Description**:

Bitwise ‘Or’ two operand values and place the result in the target register, updating status flags. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand may be either a register specified by the Rs2 field of the instruction, or an immediate value. The immediate value is zero extended to the left from bit 13 to the machine width.

The status result of the inclusive or may optionally be copied to cr0.

**Formats Supported**: R2, RI

|  |  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | 15 | | Fmt3 | | | Rs2 | Rs1 | Rd | 02h | OR |
| r | 13 | Rs3 | | | | Rs2 | Rs1 | Rd | 03h | OR |
| r | Constant12..0 | | | | | | Rs1 | Rd | 09h | OR |
| r | Const4..0 | | | | | | Rs1 | Rd | 51h | OR |
| r | Rs2 | | | | | | Rs1 | Rd | 54h | OR |

**R2 Supported Formats**: .b, .w, .t, .o

**Execution Units**: Integer ALU

**Clock Cycles**: 1

**Exceptions**: none

## ORUI[.] – Or Upper Immediate

**Description**:

Bitwise or an immediate value to the general-purpose register Rd and place the result into the destination register Rd. The immediate constant is composed of 13 bits of zeros on the right-hand side and 51 constant bits for bits 13 to 63.
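
A rough Python model of the operation, assuming the 51 constant bits are simply shifted left by 13 and or’d into Rd (the function and parameter names are illustrative):

```python
MASK64 = (1 << 64) - 1  # models a 64-bit register

def orui(rd: int, const51: int) -> int:
    """Or a 51-bit constant into bits 13 to 63 of Rd; bits 0 to 12 are unchanged."""
    immediate = (const51 & ((1 << 51) - 1)) << 13   # 13 zero bits on the right-hand side
    return (rd | immediate) & MASK64

# The low 13 bits of Rd (0x0ABC) pass through untouched.
print(hex(orui(0x0ABC, 0x5)))  # 0xaabc
```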

**Formats Supported**: ADDUI

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

**Notes:**

## PERM[.] – Permute Bytes

**Description**:

This instruction allows any combination of bytes in a source register to be copied to a target register. The low order twenty-four bits of register Rs2 or a 12-bit immediate constant are used to identify which source bytes are copied to the destination. The twenty-four-bit value is composed of eight three-bit fields. Field S0 indicates the source byte for target byte position 0. S1 indicates the source byte for target byte position 1. S2 to S7 work similarly for the remaining target bytes. There are many interesting possibilities with this instruction. A single source byte could be copied to all target byte positions for instance. Or the order of bytes in a word could be reversed.
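
The byte selection can be sketched in Python as follows. The sketch models the register form with a full twenty-four-bit selector and assumes S0 occupies the lowest three bits, S1 the next three, and so on; the function name `perm` is illustrative.

```python
def perm(rs1: int, selectors: int) -> int:
    """Build the result byte by byte from eight three-bit source selectors."""
    result = 0
    for position in range(8):
        source_index = (selectors >> (3 * position)) & 0b111  # field S<position>
        byte = (rs1 >> (8 * source_index)) & 0xFF             # chosen source byte
        result |= byte << (8 * position)                      # place at target position
    return result

# Selector that reverses byte order: S0=7, S1=6, ..., S7=0.
reverse = sum((7 - position) << (3 * position) for position in range(8))
print(hex(perm(0x0102030405060708, reverse)))  # 0x807060504030201

# Selector of all zeros copies source byte 0 to every target byte.
print(hex(perm(0x0102030405060708, 0)))        # 0x808080808080808
```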

**Formats Supported**: R2

|  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | 0 | S3 | S2 | S1 | S0 | Rs1 | Rd | 17h | PERM right |
| r | 1 | S7 | S6 | S5 | S4 | Rs1 | Rd | 17h | PERM left |

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

## PTRDIF[.] – Difference Between Pointers

**Description**:

Subtract two values, then shift the result right. Both operands must be in registers. The right shift is provided to accommodate common object sizes. It may still be necessary to perform a divide operation after the PTRDIF to obtain an index into odd sized or large objects. Sc may vary from zero to fifteen.

**Instruction Format**: Integer R2

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | 24/255 | Sc4 | Rs2 | Rs1 | Rd | 02h | PTRDIF |

**Operation**:

Rd = Abs(Rs1 – Rs2) >> Sc
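
A short Python sketch of the operation, including the follow-up divide mentioned in the description for object sizes that are not a power of two. The names and the example addresses are hypothetical.

```python
def ptrdif(rs1: int, rs2: int, sc: int) -> int:
    """Absolute pointer difference, shifted right by Sc (0 to 15)."""
    if not 0 <= sc <= 15:
        raise ValueError("Sc must be between 0 and 15")
    return abs(rs1 - rs2) >> sc

# Eight-byte elements: the shift alone yields the element count.
print(ptrdif(0x1040, 0x1000, 3))       # 8

# 24-byte objects: shift by 3, then divide by the remaining odd factor of 3.
print(ptrdif(0x1048, 0x1000, 3) // 3)  # 3
```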

**Clock Cycles**: 1

**Execution Units**: Integer

**Exceptions**: none

## ROL[.] – Rotate Left

**Description**:

Rotate left one operand value by a second operand value and place the result in the target register, updating status flags. The most significant bits are placed in the least significant bits. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand may be either a register specified by the Rs2 field of the instruction, or an immediate value.

The first operand for the instruction may also be the carry bit of compare results register Cs1.
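
For the plain register form, the rotation can be modelled in Python as below. The status flag update and the carry-bit form are not modelled, and taking the rotate amount modulo 64 is an assumption.

```python
MASK64 = (1 << 64) - 1  # models a 64-bit register

def rol(rs1: int, amount: int) -> int:
    """Rotate left: bits leaving the top re-enter at the bottom."""
    amount &= 63                      # assumption: amount is taken modulo 64
    value = rs1 & MASK64
    return ((value << amount) | (value >> (64 - amount))) & MASK64

# The top bit wraps around to bit 0.
print(hex(rol(0x8000000000000001, 1)))  # 0x3
```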

**Formats Supported**: RR, RI6

|  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | Fmt3 | 24 | 0 | Rs2 | Rs1 | | Rd | 0Ch | ROL |
| r | Fmt3 | 24 | 1 | Rs2 | ~3 | Cs1 | Rd | 0Ch | ROL |
| r | Fmt3 | 104 | Const5..0 | | Rs1 | | Rd | 0Ch | ROL |

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

## ROR[.] – Rotate Right

**Description**:

Rotate right one operand value by a second operand value and place the result in the target register, updating status flags. The least significant bits are placed in the most significant bits. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand may be either a register specified by the Rs2 field of the instruction, or an immediate value.

**Formats Supported**: RR, RI6

|  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | Fmt3 | 34 | 0 | Rs2 | Rs1 | Rd | 0Ch | ROR |
| r | Fmt3 | 114 | Const5..0 | | Rs1 | Rd | 0Ch | ROR |

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

## SAND – Set if A And B

**Description**:

Combine two operands using a logical ‘and’ operation and place the result in the target compare results register. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand may be either a register specified by the Rs2 field of the instruction, or an immediate value. The immediate value is sign extended to the left from bit 13 to the machine width. A branch instruction may make use of the comparison result.

This instruction will set the compare results register as follows:

|  |  |  |
| --- | --- | --- |
| Bit | Flag | Value Set |
| 0 | C | 1 if both operands are non-zero, otherwise 0 |
| 1 | Z | 1 if both operands are non-zero, otherwise 0 |
| 2 | ~ | 0 |
| 3 | ~ | 0 |
| 4 | P | 1 if both operands are non-zero, otherwise 0 |
| 5 | O | 1 if both operands are non-zero, otherwise 0 |
| 6 | V | 0 |
| 7 | N | 0 |

The set instructions feature results merging, where the current value in the result register is logically combined with the new result. This allows several set comparisons to be easily cascaded for use with a branch operation. The direct copy merging operation is the default and does not need to be specified in assembler code.

|  |  |  |
| --- | --- | --- |
| MOp3 | Mnemonic |  |
| 0 | .CPY | Cd = compare result |
| 1 | .OR | Cd = Cd | compare result |
| 2 | .AND | Cd = Cd & compare result |
| 3 | .ORCM | Cd = Cd | ~compare result |
| 4 | .ANDCM | Cd = Cd & ~compare result |
| 5 to 7 |  | reserved |
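
The flag table and merge operations can be combined into a small Python model. The bit positions come from the flag table; the function names are illustrative, and masking the complemented result to eight bits is an assumption based on the compare results registers being eight bits wide.

```python
# Flag bit positions from the table: C=0, Z=1, P=4, O=5.
TRUE_RESULT = (1 << 0) | (1 << 1) | (1 << 4) | (1 << 5)   # 0x33

def sand_result(a: int, b: int) -> int:
    """Compare result produced by SAND: flags set when both operands are non-zero."""
    return TRUE_RESULT if (a != 0 and b != 0) else 0

def merge(cd: int, result: int, mop: int) -> int:
    """Combine the old register value Cd with a new compare result."""
    if mop == 0:                       # .CPY
        return result
    if mop == 1:                       # .OR
        return cd | result
    if mop == 2:                       # .AND
        return cd & result
    if mop == 3:                       # .ORCM
        return (cd | ~result) & 0xFF
    if mop == 4:                       # .ANDCM
        return cd & ~result & 0xFF
    raise ValueError("MOp3 values 5 to 7 are reserved")

# Cascade two tests: the second is and-merged into the first.
cd = merge(0, sand_result(1, 2), 0)    # .CPY
cd = merge(cd, sand_result(0, 5), 2)   # .AND
print(hex(cd))  # 0x0
```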

**Formats Supported**: R2, RI

**R2 Supported Formats**: .b, .w, .t, .o

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

## SEQ – Set if Equal

**Description**:

Test two operand values for equality and place the result in the target compare results register. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand may be either a register specified by the Rs2 field of the instruction, or an immediate value. The immediate value is sign extended to the left from bit 13 to the machine width. A branch instruction may make use of the comparison result.

This instruction will set the compare results register as follows:

|  |  |  |
| --- | --- | --- |
| Bit | Flag | Value Set |
| 0 | C | 1 if operands are equal, otherwise 0 |
| 1 | Z | 1 if operands are equal, otherwise 0 |
| 2 | ~ | 0 |
| 3 | ~ | 0 |
| 4 | P | 1 if operands are equal, otherwise 0 |
| 5 | O | 1 if operands are equal, otherwise 0 |
| 6 | V | 0 |
| 7 | N | 0 |

The set instructions feature results merging, where the current value in the result register is logically combined with the new result. This allows several set comparisons to be easily cascaded for use with a branch operation. The direct copy merging operation is the default and does not need to be specified in assembler code.

|  |  |  |
| --- | --- | --- |
| MOp3 | Mnemonic |  |
| 0 | .CPY | Cd = compare result |
| 1 | .OR | Cd = Cd | compare result |
| 2 | .AND | Cd = Cd & compare result |
| 3 | .ORCM | Cd = Cd | ~compare result |
| 4 | .ANDCM | Cd = Cd & ~compare result |
| 5 to 7 |  | reserved |

**Formats Supported**: R2, RI

**R2 Supported Formats**: .b, .w, .t, .o

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

## SGE – Set if Greater Than or Equal

**Description**:

Test if the first operand is greater than or equal to the second one and place the result in the target compare results register. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand may be either a register specified by the Rs2 field of the instruction, or an immediate value. The immediate value is sign extended to the left from bit 13 to the machine width. Both operands are treated as signed values. A branch instruction may make use of the comparison result. The R2 register form of this instruction may also be used to determine less than or equal ([SLE](#_SLE[.]_–_Set)) by swapping the operands.
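
The operand-swap identity can be checked with a small Python model. The helper names are illustrative; the sketch returns a plain boolean and does not model the compare results register.

```python
MASK64 = (1 << 64) - 1  # models a 64-bit register

def to_signed64(value: int) -> int:
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value

def sge(rs1: int, rs2: int) -> bool:
    """Signed greater-than-or-equal, as in the R2 form."""
    return to_signed64(rs1) >= to_signed64(rs2)

def sle(rs1: int, rs2: int) -> bool:
    """Less-than-or-equal obtained by swapping the operands of sge."""
    return sge(rs2, rs1)

print(sle(-3, 5))  # True
print(sle(5, -3))  # False
```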

This instruction will set the compare results register as follows:

|  |  |  |
| --- | --- | --- |
| Bit | Flag | Value Set |
| 0 | C | 1 if first operand greater than or equal to second, otherwise 0 |
| 1 | Z | 1 if first operand greater than or equal to second, otherwise 0 |
| 2 | ~ | 0 |
| 3 | ~ | 0 |
| 4 | P | 1 if first operand greater than or equal to second, otherwise 0 |
| 5 | O | 1 if first operand greater than or equal to second, otherwise 0 |
| 6 | V | 0 |
| 7 | N | 0 |

The set instructions feature results merging, where the current value in the result register is logically combined with the new result. This allows several set comparisons to be easily cascaded for use with a branch operation. The direct copy merging operation is the default and does not need to be specified in assembler code.

|  |  |  |
| --- | --- | --- |
| MOp3 | Mnemonic |  |
| 0 | .CPY | Cd = compare result |
| 1 | .OR | Cd = Cd | compare result |
| 2 | .AND | Cd = Cd & compare result |
| 3 | .ORCM | Cd = Cd | ~compare result |
| 4 | .ANDCM | Cd = Cd & ~compare result |
| 5 to 7 |  | reserved |

**Formats Supported**: R2, RI

**R2 Supported Formats**: .b, .w, .t, .o

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

## SGEU – Set if Greater Than or Equal Unsigned

**Description**:

Test if the first operand is greater than or equal to the second one and place the result in the target compare results register. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand may be either a register specified by the Rs2 field of the instruction, or an immediate value. The immediate value is sign extended to the left from bit 13 to the machine width. Both operands are treated as unsigned values. A branch instruction may make use of the comparison result. The R2 register form of this instruction may also be used to determine less than or equal ([SLEU](#_SLEU_–_Set)) by swapping the operands.
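
The difference between the signed and unsigned comparisons only shows up when the most significant bit is set. A Python sketch under that framing (function names are illustrative; only the boolean outcome is modelled, not the flag register):

```python
MASK64 = (1 << 64) - 1  # models a 64-bit register

def to_signed64(value: int) -> int:
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value

def sge_signed(rs1: int, rs2: int) -> bool:
    """Signed comparison, for contrast."""
    return to_signed64(rs1) >= to_signed64(rs2)

def sgeu(rs1: int, rs2: int) -> bool:
    """Unsigned comparison: both patterns are treated as non-negative."""
    return (rs1 & MASK64) >= (rs2 & MASK64)

ALL_ONES = MASK64  # -1 when signed, the largest value when unsigned
print(sge_signed(ALL_ONES, 1))  # False
print(sgeu(ALL_ONES, 1))        # True
```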

This instruction will set the compare results register as follows:

|  |  |  |
| --- | --- | --- |
| Bit | Flag | Value Set |
| 0 | C | 1 if first operand greater than or equal to second, otherwise 0 |
| 1 | Z | 1 if first operand greater than or equal to second, otherwise 0 |
| 2 | ~ | 0 |
| 3 | ~ | 0 |
| 4 | P | 1 if first operand greater than or equal to second, otherwise 0 |
| 5 | O | 1 if first operand greater than or equal to second, otherwise 0 |
| 6 | V | 0 |
| 7 | N | 0 |

The set instructions feature results merging, where the current value in the result register is logically combined with the new result. This allows several set comparisons to be easily cascaded for use with a branch operation. The direct copy merging operation is the default and does not need to be specified in assembler code.

|  |  |  |
| --- | --- | --- |
| MOp3 | Mnemonic |  |
| 0 | .CPY | Cd = compare result |
| 1 | .OR | Cd = Cd | compare result |
| 2 | .AND | Cd = Cd & compare result |
| 3 | .ORCM | Cd = Cd | ~compare result |
| 4 | .ANDCM | Cd = Cd & ~compare result |
| 5 to 7 |  | reserved |

**Formats Supported**: R2, RI

**R2 Supported Formats**: .b, .w, .t, .o

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

## SGT – Set if Greater Than

**Description**:

Test if the first operand is greater than the second one and place the result in the target compare results register. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand must be an immediate value. The immediate value is sign extended to the left from bit 13 to the machine width. Both operands are treated as signed values. A branch instruction may make use of the comparison result. There is no R2 register form of this instruction. The [SLT](#_SLT[.]_–_Set) instruction may be used to detect greater than by swapping operands.

This instruction will set the compare results register as follows:

|  |  |  |
| --- | --- | --- |
| Bit | Flag | Value Set |
| 0 | C | 1 if first operand greater than second, otherwise 0 |
| 1 | Z | 1 if first operand greater than second, otherwise 0 |
| 2 | ~ | 0 |
| 3 | ~ | 0 |
| 4 | P | 1 if first operand greater than second, otherwise 0 |
| 5 | O | 1 if first operand greater than second, otherwise 0 |
| 6 | V | 0 |
| 7 | N | 0 |

The set instructions feature results merging, where the current value in the result register is logically combined with the new result. This allows several set comparisons to be easily cascaded for use with a branch operation. The direct copy merging operation is the default and does not need to be specified in assembler code.

|  |  |  |
| --- | --- | --- |
| MOp3 | Mnemonic |  |
| 0 | .CPY | Cd = compare result |
| 1 | .OR | Cd = Cd | compare result |
| 2 | .AND | Cd = Cd & compare result |
| 3 | .ORCM | Cd = Cd | ~compare result |
| 4 | .ANDCM | Cd = Cd & ~compare result |
| 5 to 7 |  | reserved |

**Formats Supported**: RI

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

## SGTU – Set if Greater Than Unsigned

**Description**:

Test if the first operand is greater than the second one and place the result in the target compare results register. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand must be an immediate value. The immediate value is sign extended to the left from bit 13 to the machine width. Both operands are treated as unsigned values. A branch instruction may make use of the comparison result. There is no R2 register form of this instruction. The [SLTU](#_SLTU_–_Set) instruction may be used to detect greater than by swapping operands.

This instruction will set the compare results register as follows:

|  |  |  |
| --- | --- | --- |
| Bit | Flag | Value Set |
| 0 | C | 1 if first operand greater than second, otherwise 0 |
| 1 | Z | 1 if first operand greater than second, otherwise 0 |
| 2 | ~ | 0 |
| 3 | ~ | 0 |
| 4 | P | 1 if first operand greater than second, otherwise 0 |
| 5 | O | 1 if first operand greater than second, otherwise 0 |
| 6 | V | 0 |
| 7 | N | 0 |

The set instructions feature results merging, where the current value in the result register is logically combined with the new result. This allows several set comparisons to be easily cascaded for use with a branch operation. The direct copy merging operation is the default and does not need to be specified in assembler code.

|  |  |  |
| --- | --- | --- |
| MOp3 | Mnemonic |  |
| 0 | .CPY | Cd = compare result |
| 1 | .OR | Cd = Cd | compare result |
| 2 | .AND | Cd = Cd & compare result |
| 3 | .ORCM | Cd = Cd | ~compare result |
| 4 | .ANDCM | Cd = Cd & ~compare result |
| 5 to 7 |  | reserved |

**Formats Supported**: RI

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

## SLE – Set if Less Than or Equal

**Description**:

Test if the first operand is less than or equal to the second one and place the result in the target compare results register. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand must be an immediate value. The immediate value is sign extended to the left from bit 13 to the machine width. Both operands are treated as signed values. A branch instruction may make use of the comparison result. There is no R2 register form of this instruction. Instead the [SGE](#_SGE[.]_–_Set) instruction may be used with the operands swapped.

This instruction will set the compare results register as follows:

|  |  |  |
| --- | --- | --- |
| Bit | Flag | Value Set |
| 0 | C | 1 if first operand less than or equal to second, otherwise 0 |
| 1 | Z | 1 if first operand less than or equal to second, otherwise 0 |
| 2 | ~ | 0 |
| 3 | ~ | 0 |
| 4 | P | 1 if first operand less than or equal to second, otherwise 0 |
| 5 | O | 1 if first operand less than or equal to second, otherwise 0 |
| 6 | V | 0 |
| 7 | N | 0 |

The set instructions feature results merging, where the current value in the result register is logically combined with the new result. This allows several set comparisons to be easily cascaded for use with a branch operation. The direct copy merging operation is the default and does not need to be specified in assembler code.

|  |  |  |
| --- | --- | --- |
| MOp3 | Mnemonic |  |
| 0 | .CPY | Cd = compare result |
| 1 | .OR | Cd = Cd | compare result |
| 2 | .AND | Cd = Cd & compare result |
| 3 | .ORCM | Cd = Cd | ~compare result |
| 4 | .ANDCM | Cd = Cd & ~compare result |
| 5 to 7 |  | reserved |

**Formats Supported**: RI

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

## SLEU – Set if Less Than or Equal Unsigned

**Description**:

Test if the first operand is less than or equal to the second one and place the result in the target compare results register. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand must be an immediate value. The immediate value is sign extended to the left from bit 13 to the machine width. Both operands are treated as unsigned values. A branch instruction may make use of the comparison result. There is no R2 register form of this instruction. Instead the [SGEU](#_SGEU[.]_–_Set) instruction may be used with the operands swapped.

This instruction will set the compare results register as follows:

|  |  |  |
| --- | --- | --- |
| Bit | Flag | Value Set |
| 0 | C | 1 if first operand less than or equal to second, otherwise 0 |
| 1 | Z | 1 if first operand less than or equal to second, otherwise 0 |
| 2 | ~ | 0 |
| 3 | ~ | 0 |
| 4 | P | 1 if first operand less than or equal to second, otherwise 0 |
| 5 | O | 1 if first operand less than or equal to second, otherwise 0 |
| 6 | V | 0 |
| 7 | N | 0 |

The set instructions feature results merging, where the current value in the result register is logically combined with the new result. This allows several set comparisons to be easily cascaded for use with a branch operation. The direct copy merging operation is the default and does not need to be specified in assembler code.

|  |  |  |
| --- | --- | --- |
| MOp3 | Mnemonic |  |
| 0 | .CPY | Cd = compare result |
| 1 | .OR | Cd = Cd | compare result |
| 2 | .AND | Cd = Cd & compare result |
| 3 | .ORCM | Cd = Cd | ~compare result |
| 4 | .ANDCM | Cd = Cd & ~compare result |
| 5 to 7 |  | reserved |

**Formats Supported**: RI

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

## SLT – Set if Less Than

**Description**:

Test if the first operand is less than the second one and place the result in the target compare results register. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand may be either a register specified by the Rs2 field of the instruction, or an immediate value. The immediate value is sign extended to the left from bit 13 to the machine width. Both operands are treated as signed values. A branch instruction may make use of the comparison result.

This instruction will set the compare results register as follows:

|  |  |  |
| --- | --- | --- |
| Bit | Flag | Value Set |
| 0 | C | 1 if first operand less than second, otherwise 0 |
| 1 | Z | 1 if first operand less than second, otherwise 0 |
| 2 | ~ | 0 |
| 3 | ~ | 0 |
| 4 | P | 1 if first operand less than second, otherwise 0 |
| 5 | O | 1 if first operand less than second, otherwise 0 |
| 6 | V | 0 |
| 7 | N | 0 |

The set instructions feature results merging, where the current value in the result register is logically combined with the new result. This allows several set comparisons to be easily cascaded for use with a branch operation. The direct copy merging operation is the default and does not need to be specified in assembler code.

|  |  |  |
| --- | --- | --- |
| MOp3 | Mnemonic |  |
| 0 | .CPY | Cd = compare result |
| 1 | .OR | Cd = Cd | compare result |
| 2 | .AND | Cd = Cd & compare result |
| 3 | .ORCM | Cd = Cd | ~compare result |
| 4 | .ANDCM | Cd = Cd & ~compare result |
| 5 to 7 |  | reserved |

**Formats Supported**: R2, RI

**R2 Supported Formats**: .b, .w, .t, .o

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

## SLTU – Set if Less Than Unsigned

**Description**:

Test if the first operand is less than the second one and place the result in the target compare results register. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand may be either a register specified by the Rs2 field of the instruction, or an immediate value. The immediate value is sign extended to the left from bit 13 to the machine width. Both operands are treated as unsigned values. A branch instruction may make use of the comparison result.

This instruction will set the compare results register as follows:

|  |  |  |
| --- | --- | --- |
| Bit | Flag | Value Set |
| 0 | C | 1 if first operand less than second, otherwise 0 |
| 1 | Z | 1 if first operand less than second, otherwise 0 |
| 2 | ~ | 0 |
| 3 | ~ | 0 |
| 4 | P | 1 if first operand less than second, otherwise 0 |
| 5 | O | 1 if first operand less than second, otherwise 0 |
| 6 | V | 0 |
| 7 | N | 0 |

The set instructions feature results merging, where the current value in the result register is logically combined with the new result. This allows several set comparisons to be easily cascaded for use with a branch operation. The direct copy merging operation is the default and does not need to be specified in assembler code.

|  |  |  |
| --- | --- | --- |
| MOp3 | Mnemonic |  |
| 0 | .CPY | Cd = compare result |
| 1 | .OR | Cd = Cd | compare result |
| 2 | .AND | Cd = Cd & compare result |
| 3 | .ORCM | Cd = Cd | ~compare result |
| 4 | .ANDCM | Cd = Cd & ~compare result |
| 5 to 7 |  | reserved |

**Formats Supported**: R2, RI

**R2 Supported Formats**: .b, .w, .t, .o

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

## SNE – Set if Not Equal

**Description**:

Test two operand values for inequality and place the result in the target compare results register. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand may be either a register specified by the Rs2 field of the instruction, or an immediate value. The immediate value is sign extended to the left from bit 13 to the machine width. A branch instruction may make use of the comparison result.

This instruction will set the compare results register as follows:

|  |  |  |
| --- | --- | --- |
| Bit | Flag | Value Set |
| 0 | C | 1 if operands are not equal, otherwise 0 |
| 1 | Z | 1 if operands are not equal, otherwise 0 |
| 2 | ~ | 0 |
| 3 | ~ | 0 |
| 4 | P | 1 if operands are not equal, otherwise 0 |
| 5 | O | 1 if operands are not equal, otherwise 0 |
| 6 | V | 0 |
| 7 | N | 0 |

The set instructions feature results merging, where the current value in the result register is logically combined with the new result. This allows several set comparisons to be easily cascaded for use with a branch operation. The direct copy merging operation is the default and does not need to be specified in assembler code.

|  |  |  |
| --- | --- | --- |
| MOp3 | Mnemonic |  |
| 0 | .CPY | Cd = compare result |
| 1 | .OR | Cd = Cd | compare result |
| 2 | .AND | Cd = Cd & compare result |
| 3 | .ORCM | Cd = Cd | ~compare result |
| 4 | .ANDCM | Cd = Cd & ~compare result |
| 5 to 7 |  | reserved |

**Formats Supported**: R2, RI

**R2 Supported Formats**: .b, .w, .t, .o

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

## SOR – Set if A Or B

**Description**:

Combine two operands using a logical ‘or’ operation and place the result in the target compare results register. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand may be either a register specified by the Rs2 field of the instruction, or an immediate value. The immediate value is sign extended to the left from bit 13 to the machine width. A branch instruction may make use of the comparison result.

This instruction will set the compare results register as follows:

| Bit | Flag | Value Set |
| --- | --- | --- |
| 0 | C | 1 if either operand is non-zero, otherwise 0 |
| 1 | Z | 1 if either operand is non-zero, otherwise 0 |
| 2 | ~ | 0 |
| 3 | ~ | 0 |
| 4 | P | 1 if either operand is non-zero, otherwise 0 |
| 5 | O | 1 if either operand is non-zero, otherwise 0 |
| 6 | V | 0 |
| 7 | N | 0 |

The set instructions feature results merging, where the current value in the result register is logically combined with the new result. This allows several set comparisons to be easily cascaded for use with a branch operation. The direct copy merging operation is the default and does not need to be specified in assembler code.

| MOp3 | Mnemonic | Operation |
| --- | --- | --- |
| 0 | .CPY | Cd = compare result |
| 1 | .OR | Cd = Cd \| compare result |
| 2 | .AND | Cd = Cd & compare result |
| 3 | .ORCM | Cd = Cd \| ~compare result |
| 4 | .ANDCM | Cd = Cd & ~compare result |
| 5 to 7 |  | reserved |

**Formats Supported**: R2, RI

**R2 Supported Formats**: .b, .w, .t, .o

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

## SUB[.] – Subtraction

**Description**:

Subtract two operand values and place the result in the target register. Both operands must be in registers specified by the Rs1 and Rs2 fields of the instruction. There is no RI immediate form of this instruction. Subtracting an immediate value can be done with the ADD instruction.

The status result of the subtraction may optionally be copied to cr0.

**Formats Supported**: R2

**R2 Supported Formats**: .b, .w, .t, .o

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

## SUBF[.] – Subtraction from Immediate

**Description**:

Subtract two operand values and place the result in the target register. The first operand must be an immediate value specified in the instruction; the second operand is in the register specified by the Rs1 field of the instruction. There is no RR form of this instruction. A register-based subtract-from can be accomplished by swapping the operands of the SUB instruction.

The status result of the subtraction may optionally be copied to cr0.

**Formats Supported**: RI

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

## SXB[.] – Sign Extend Byte

**Description**:

This is an alternate mnemonic for the bitfield extract (EXT) operation.

**Instruction Format**: BFI

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | 0 | 86 | 06 | Rs1 | Rd | 1Ch | EXT |

A bitfield in the source specified by Rs1 is extracted and the result is copied to the target register. Bo specifies the bit offset. Bw specifies the bit width. Bo and Bw are constants supplied in the instruction.
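
The extraction can be sketched in Python. This is a minimal model under stated assumptions: Bw is taken to hold the field width directly (the encoding may store it differently), SXB is taken to correspond to Bo = 0 and Bw = 8, and `ext` and `sxb` are hypothetical helper names.

```python
MASK64 = (1 << 64) - 1  # assumed machine width of 64 bits


def ext(src: int, bo: int, bw: int) -> int:
    """Extract bw bits starting at bit bo and sign extend to 64 bits."""
    field = (src >> bo) & ((1 << bw) - 1)
    if field & (1 << (bw - 1)):  # top bit of the field is the sign
        field -= 1 << bw
    return field & MASK64


def sxb(src: int) -> int:
    """Sign extend the low byte, assuming Bo = 0 and Bw = 8."""
    return ext(src, 0, 8)
```

Under these assumptions `sxb(0x80)` gives `0xFFFFFFFFFFFFFF80`, while `sxb(0x17F)` gives `0x7F` because only the low byte takes part.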

**Clock Cycles**: 1

**Execution Units:** Integer ALU #0 Only

**Exceptions**: none

**Notes**:

## SXW[.] – Sign Extend Wyde

**Description**:

This is an alternate mnemonic for the bitfield extract (EXT) operation.

**Instruction Format**: BFI

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | 0 | 166 | 06 | Rs1 | Rd | 1Ch | EXT |

A bitfield in the source specified by Rs1 is extracted and the result is copied to the target register. Bo specifies the bit offset. Bw specifies the bit width. Bo and Bw are constants supplied in the instruction.

**Clock Cycles**: 1

**Execution Units:** Integer ALU #0 Only

**Exceptions**: none

**Notes**:

## SXT[.] – Sign Extend Tetra

**Description**:

This is an alternate mnemonic for the bitfield extract (EXT) operation.

**Instruction Format**: BFI

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | 0 | 326 | 06 | Rs1 | Rd | 1Ch | EXT |

A bitfield in the source specified by Rs1 is extracted and the result is copied to the target register. Bo specifies the bit offset. Bw specifies the bit width. Bo and Bw are constants supplied in the instruction.

**Clock Cycles**: 1

**Execution Units:** Integer ALU #0 Only

**Exceptions**: none

**Notes**:

## TST – Test Against Zero

**Description**:

Compare an operand value against zero and place the resulting status in a compare results register. The operand must be in a register specified by the Rs1 field of the instruction.

The Z flag of the compare result register is set if the operand is zero. The N flag of the result register is set if the most significant bit of the operand is set. The O flag of the result register is set if the least significant bit of the operand is set.
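
The three flags can be modelled as follows. This is a sketch, not the hardware definition: `tst_result` is a hypothetical name, the bit positions (Z = 1, O = 5, N = 7) are assumed to match the layout used by the set instructions, and the operand width defaults to 64 bits.

```python
Z_BIT, O_BIT, N_BIT = 1, 5, 7  # assumed compare result bit positions


def tst_result(value: int, width: int = 64) -> int:
    """Return the compare result TST is described as producing."""
    value &= (1 << width) - 1
    z = 1 if value == 0 else 0       # operand is zero
    n = (value >> (width - 1)) & 1   # most significant bit
    o = value & 1                    # least significant bit
    return (z << Z_BIT) | (o << O_BIT) | (n << N_BIT)
```

For instance, an operand of zero produces only the Z flag, and an operand of one produces only the O flag.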

The TST instruction features results merging, where the current value in the result register is logically combined with the new result. This allows several TST operations to be easily cascaded for use with a branch operation. The direct copy merging operation is the default and does not need to be specified in assembler code.

| MOp3 | Mnemonic | Operation |
| --- | --- | --- |
| 0 | .CPY | Cd = compare result |
| 1 | .OR | Cd = Cd \| compare result |
| 2 | .AND | Cd = Cd & compare result |
| 3 | .ORCM | Cd = Cd \| ~compare result |
| 4 | .ANDCM | Cd = Cd & ~compare result |
| 5 to 7 |  | reserved |

**Formats Supported**: R1

|  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ~ | 125 | Fmt3 | 115 | Rs1 | mop3 | Cd2 | 02h | BIT |

**R2 Supported Formats**: .b, .w, .t, .o

Example:

TST.CPY cr1,x10 ;

TST.AND cr1,x11 ; and bit six

BT cr1,target ; branch if both x10 and x11 are zero

**Execution Units**: Integer ALU

**Clock Cycles**: 1

**Exceptions**: none

## WYDNDX[.] – Wyde Index

**Description:**

This instruction searches Rs1, which is treated as an array of four wydes, for a wyde value specified by Rs2 or an immediate value and places the index of the wyde into the target register Rd. If the wyde is not found, -1 is placed in the target register. A common use would be to search for a null wyde. The index result may vary from -1 to +3. The index of the first matching wyde (the one closest to index zero) is returned.

**Instruction Format:** R2

**R2 Supported Formats**: .b, .w, .t, .o

**Clock Cycles:** 1

**Execution Units:** Integer ALU

**Operation:**

Rd = Index of (Rs2 in Rs1)
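
The search can be sketched in Python. The function name `wydndx` is hypothetical, and the sketch assumes that index 0 refers to the least significant wyde of Rs1; the description does not state which end is index zero.

```python
def wydndx(rs1: int, wyde: int) -> int:
    """Return the index (0 to 3) of the first wyde in rs1 equal to `wyde`.

    Returns -1 when no wyde matches. Index 0 is assumed to be the
    least significant 16 bits of rs1.
    """
    wyde &= 0xFFFF
    for index in range(4):
        if ((rs1 >> (16 * index)) & 0xFFFF) == wyde:
            return index
    return -1
```

Searching for a null wyde in a packed string of 16-bit characters is then `wydndx(value, 0)`.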

**Exceptions:** none

## XOR[.] – Bitwise Exclusive ‘Or’

**Description**:

This is an alternate mnemonic for the [EOR](#_EOR_–_Bitwise) function. Bitwise exclusive ‘Or’ two operand values and place the result in the target register. The first operand must be in a register specified by the Rs1 field of the instruction. The second operand may be either a register specified by the Rs2 field of the instruction, or an immediate value. The immediate value is zero extended to the left from bit 14 to the machine width.

The status result of the exclusive or may optionally be copied to cr0.

**Formats Supported**: R2, RI

**R2 Supported Formats**: .b, .w, .t, .o

**Execution Units**: Integer ALU

**Clock Cycles**: 1

**Exceptions**: none

## ZXB[.] – Zero Extend Byte

**Description**:

This is an alternate mnemonic for the bitfield extract (EXTU) operation.

**Instruction Format**: BFI

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | 1 | 86 | 06 | Rs1 | Rd | 1Ch | EXTU |

A bitfield in the source specified by Rs1 is extracted and the result is copied to the target register. Bo specifies the bit offset. Bw specifies the bit width. Bo and Bw are constants supplied in the instruction.
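
A minimal Python sketch of the unsigned extraction follows. It assumes Bw holds the field width directly and that ZXB corresponds to Bo = 0 and Bw = 8; `extu` and `zxb` are hypothetical helper names.

```python
def extu(src: int, bo: int, bw: int) -> int:
    """Extract bw bits starting at bit bo; the upper bits are zero."""
    return (src >> bo) & ((1 << bw) - 1)


def zxb(src: int) -> int:
    """Zero extend the low byte, assuming Bo = 0 and Bw = 8."""
    return extu(src, 0, 8)
```

Under these assumptions `zxb(0xFFFFFFFFFFFFFF80)` gives `0x80`, since every bit above the byte is cleared.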

**Clock Cycles**: 1

**Execution Units:** Integer ALU #0 Only

**Exceptions**: none

**Notes**:

## ZXW[.] – Zero Extend Wyde

**Description**:

This is an alternate mnemonic for the bitfield extract (EXTU) operation.

**Instruction Format**: BFI

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | 1 | 166 | 06 | Rs1 | Rd | 1Ch | EXTU |

A bitfield in the source specified by Rs1 is extracted and the result is copied to the target register. Bo specifies the bit offset. Bw specifies the bit width. Bo and Bw are constants supplied in the instruction.

**Clock Cycles**: 1

**Execution Units:** Integer ALU #0 Only

**Exceptions**: none

**Notes**:

## ZXT[.] – Zero Extend Tetra

**Description**:

This is an alternate mnemonic for the bitfield extract (EXTU) operation.

**Instruction Format**: BFI

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | 1 | 326 | 06 | Rs1 | Rd | 1Ch | EXTU |

A bitfield in the source specified by Rs1 is extracted and the result is copied to the target register. Bo specifies the bit offset. Bw specifies the bit width. Bo and Bw are constants supplied in the instruction.

**Clock Cycles**: 1

**Execution Units:** Integer ALU #0 Only

**Exceptions**: none

**Notes**:

# Memory Operations

## Overview

RTF64 is a load / store architecture. All memory data is accessed via load and store instructions, which are separate from other operations such as ALU or FPU operations. The load / store paradigm comes from RISC machines and is highly effective at simplifying the instruction interface to memory. If an issue arises during a load or store, the instruction may raise an exception and can easily be restarted.

## Address Modes

There are three address modes for loads and stores: register indirect with displacement, register indirect with displacement relative to the instruction pointer, and scaled indexed.
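
The three address calculations can be sketched in Python. This is an illustrative model only: the function names are hypothetical, addresses are assumed to wrap at 64 bits, and the displacement is assumed to be already sign extended.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit address arithmetic


def ea_displacement(rs1: int, disp: int) -> int:
    """Register indirect with displacement: Rs1 + d."""
    return (rs1 + disp) & MASK64


def ea_ip_relative(rs1: int, disp: int, ip: int) -> int:
    """Displacement relative to the instruction pointer: Rs1 + d + IP."""
    return (rs1 + disp + ip) & MASK64


def ea_indexed(rs1: int, rs3: int, scale: int) -> int:
    """Scaled indexed: Rs1 + Rs3 * Sc."""
    return (rs1 + rs3 * scale) & MASK64
```

For example, indexing the fourth octa of an array at `0x1000` is `ea_indexed(0x1000, 3, 8)`, which gives `0x1018`.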

## Instruction Variants

There are separate instructions to load data into or store data from the general-purpose register file, floating-point register file, or posit arithmetic register file. Each of the register file variants supports different load and store operation sizes.

## FLDO[.] – Float Load Octa (64 bits)

**Description**:

Data is loaded into Frd from the memory address which is the sum of Rs1 and an immediate value or the sum of Rs1 and Rs2 scaled by one or eight.

**Formats Supported**: RR,RI

|  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | Constant12..0 | | | | | Rs1 | Frd | 8Eh | FLDO |
| r | | ~2 | S | Rs3 | 145 | Rs1 | Frd | 8Fh | FLDO |

**Operation:**

Frd = Memory64[d+Rs1]

or

Frd = Memory64[Rs1+Rs3\*Sc]

**Execution Units**: Mem

**Clock Cycles**: 4 if data is in the cache.

**Exceptions**: none

## FSTO – Float Store Octa (64 bits)

**Description**:

Data from Frs2 is stored to the memory address which is either the sum of Rs1 and an immediate value or the sum of Rs1 and Rs3. Both register indirect with displacement and indexed addressing are supported. Rs3 may be scaled by either one or eight before use.

**Formats Supported**: STR, STI

|  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ~ | Constant12..5 | | | Frs2 | Rs1 | Const4..0 | ABh | FSTO |
| ~3 | | S | Rs3 | Frs2 | Rs1 | 115 | AFh | FSTO |

**Flags Affected**: none

**Operation:**

Memory64[d+Rs1] = Frs2

or

Memory64[d+Rs1+Rs3\*Sc] = Frs2

**Execution Units**: Mem

**Clock Cycles**: 4 if data is in the cache.

**Exceptions**: none

## LDB[.] – Load Byte (8 bits)

**Description**:

Data is loaded from the memory address which is the sum of Rs1 and an immediate value or the sum of Rs1 and Rs2. The instruction pointer may be added to the sum of Rs1 and the immediate to form instruction pointer relative addresses. The value loaded is sign extended from bit 7 to the machine width.

**Formats Supported**: RR,RI

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | Constant12..3 | | | R | Rd | 88h | LDB |
| Const14..7 | r | C2 | S | Rs3 | | | Cnst4..0 | Rs1 | | Rd | 80h | LDB |

**Operation:**

Rd = Memory8[d+Ra]

or

Rd = Memory8[Ra+Rb]

or

Rd = Memory8[d+Ra+IP]
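
The difference between a sign-extending and a zero-extending load can be sketched as follows. This is a toy model: `load` is a hypothetical helper, memory is a Python `bytes` object, and little-endian byte order is an assumption that the text above does not state.

```python
MASK64 = (1 << 64) - 1  # assumed machine width of 64 bits


def load(memory: bytes, address: int, size: int, signed: bool) -> int:
    """Load `size` bytes and extend the value to 64 bits.

    signed=True models LDB-style sign extension; signed=False models
    LDBU-style zero extension. Little-endian order is assumed.
    """
    raw = int.from_bytes(memory[address:address + size], "little", signed=signed)
    return raw & MASK64
```

Loading the byte `0x80` with `signed=True` gives `0xFFFFFFFFFFFFFF80`, while the unsigned form gives `0x80`.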

**Execution Units**: Mem

**Clock Cycles**: 4 if data is in the cache.

**Exceptions**: none

## LDBU[.] – Load Byte Unsigned (8 bits)

**Description**:

Data is loaded from the memory address which is the sum of Rs1 and an immediate value or the sum of Rs1 and Rs2. The value loaded is zero extended from bit 7 to the machine width.

**Formats Supported**: RR,RI

|  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | Constant12..0 | | | | | Rs1 | Rd | 81h |  |
| r | Constant12..0 | | | | | Rs1 | Rd | 99h | IP rel |
| r | | ~2 | S | Rs3 | 15 | Rs1 | Rd | 8Fh | indexed |

**Operation:**

Rd = Memory8[d+Ra]

or

Rd = Memory8[Ra+Rb]

or

Rd = Memory8[d+Ra+IP]

**Execution Units**: Mem

**Clock Cycles**: 4 if data is in the cache.

**Exceptions**: none

## LDO[.] – Load Octa (64 bits)

**Description**:

Data is loaded into Rd or the tagged register from the memory address which is the sum of Rs1 and an immediate value or the sum of Rs1 and Rs2 scaled by one or eight.

**Formats Supported**: RR,RI

|  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | Constant12..0 | | | | | Rs1 | Rd | 86h | LDO |
| r | Constant12..0 | | | | | Rs1 | Tag4..0 | 88h | LDO |
| r | | ~2 | S | Rs3 | 65 | Rs1 | Rd | 8Fh | LDO |
| r | | ~2 | S | Rs3 | 85 | Rs1 | Tag4..0 | 8Fh | LDO |

| Tag6..0 | Associated Register | Description |
| --- | --- | --- |
| 96 to 97 | RA0, RA1 | return address registers |
| 98 to 102 | reserved |  |
| 103 | EIP | exception instruction pointer |
| 104 to 111 | reserved |  |
| 112 to 115 | CR0 to CR3 | compare result registers |
| 116 to 123 | reserved |  |
| 124 | reserved |  |
| 125 | CR <all> | all compare results registers |
| 126 | reserved | not used |
| 127 | none | instruction without a target |

**Operation:**

Rd = Memory64[d+Rs1]

or

Rd = Memory64[Rs1+Rs3\*Sc]

**Execution Units**: Mem

**Clock Cycles**: 4 if data is in the cache.

**Exceptions**: none

## LDOR[.] – Load Octa (64 bits) and Reserve

**Description**:

Data is loaded from the memory address which is the sum of Rs1 and an immediate value or the sum of Rs1 and Rs2 scaled by one or eight. Additionally, a reservation is placed on the load address.

**Formats Supported**: RR,RI

|  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | Constant12..0 | | | | | Rs1 | Rd | 87h | LDOR |
| r | | ~2 | S | Rs3 | 75 | Rs1 | Rd | 8Fh | LDOR |

**Operation:**

Rd = Memory64[d+Rs1]

or

Rd = Memory64[Rs1+Rs3\*Sc]

**Execution Units**: Mem

**Clock Cycles**: 4 if data is in the cache.

**Exceptions**: none

## LDT[.] – Load Tetra (32 bits)

**Description**:

Data is loaded from the memory address which is the sum of Rs1 and an immediate value or the sum of Rs1 and Rs2 scaled by one or four. The value loaded is sign extended from bit 31 to the machine width.

**Formats Supported**: RR,RI

|  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | Constant12..0 | | | | | Rs1 | Rd | 84h | LDT |
| r | | ~2 | S | Rs3 | 45 | Rs1 | Rd | 8Fh | LDT |

**Operation:**

Rt = Memory32[d+Ra]

or

Rt = Memory32[Ra+Rb]

**Execution Units**: Mem

**Clock Cycles**: 4 if data is in the cache.

**Exceptions**: none

## LDTU[.] – Load Tetra Unsigned (32 bits)

**Description**:

Data is loaded from the memory address which is the sum of Rs1 and an immediate value or the sum of Rs1 and Rs2 scaled by one or four. The value loaded is zero extended from bit 31 to the machine width.

**Formats Supported**: RR,RI

|  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | Constant12..0 | | | | | Rs1 | Rd | 85h | LDTU |
| r | | ~2 | S | Rs3 | 55 | Rs1 | Rd | 8Fh | LDTU |

**Operation:**

Rt = Memory32[d+Ra]

or

Rt = Memory32[Ra+Rb]

**Execution Units**: Mem

**Clock Cycles**: 4 if data is in the cache.

**Exceptions**: none

## LDW[.] – Load Wyde (16 bits)

**Description**:

Data is loaded from the memory address which is the sum of Rs1 and an immediate value or the sum of Rs1 and Rs2 scaled by one or two. The value loaded is sign extended from bit 15 to the machine width.

**Formats Supported**: RR,RI

|  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | Constant12..0 | | | | | Rs1 | Rd | 82h | LDW |
| r | | ~2 | S | Rs3 | 25 | Rs1 | Rd | 8Fh | LDW |

**Operation:**

Rt = Memory16[d+Ra]

or

Rt = Memory16[Ra+Rb]

**Execution Units**: Mem

**Clock Cycles**: 4 if data is in the cache.

**Exceptions**: none

## LDWU[.] – Load Wyde Unsigned (16 bits)

**Description**:

Data is loaded from the memory address which is the sum of Rs1 and an immediate value or the sum of Rs1 and Rs2 scaled by one or two. The value loaded is zero extended from bit 15 to the machine width.

**Formats Supported**: RR,RI

|  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | Constant12..0 | | | | | Rs1 | Rd | 83h | LDWU |
| r | | ~2 | S | Rs3 | 35 | Rs1 | Rd | 8Fh | LDWU |

**Operation:**

Rt = Memory16[d+Ra]

or

Rt = Memory16[Ra+Rb\*Sc]

**Execution Units**: Mem

**Clock Cycles**: 4 if data is in the cache.

**Exceptions**: none

## LEA[.] – Load Effective Address

**Description**:

This instruction computes the effective address and loads it into the target register Rd. An effective address may also be calculated by the ADD instruction; however, LEA is executed by the address generation logic.

The status result of the calculation may optionally be copied to cr0.

**Formats Supported**: R2, RI

|  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | ~2 | S | Rs3 | 95 | Rs1 | Rd | 8Fh | LEA |
| r | Constant12..0 | | | | Rs1 | Rd | 89h | LEA |

**Execution Units**: ALU

**Clock Cycles**: 1

**Exceptions**: none

## PLDO[.] – Posit Load Octa (64 bits)

**Description**:

Data is loaded into posit register Prd from the memory address which is the sum of Rs1 and an immediate value or the sum of Rs1 and Rs2 scaled by one or eight.

**Formats Supported**: RR,RI

|  |  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | Constant12..0 | | | | | Rs1 | Prd | 96h | PLDO |
| r | | ~2 | S | Rs3 | 225 | Rs1 | Prd | 8Fh | PLDO |

**Operation:**

Prd = Memory64[d+Rs1]

or

Prd = Memory64[Rs1+Rs3\*Sc]

**Execution Units**: Mem

**Clock Cycles**: 4 if data is in the cache.

**Exceptions**: none

## PSTO – Posit Store Octa (64 bits)

**Description**:

Data from Prs2 is stored to the memory address which is either the sum of Rs1 and an immediate value or the sum of Rs1 and Rs3. Both register indirect with displacement and indexed addressing are supported. Rs3 may be scaled by either one or eight before use.

**Formats Supported**: STR, STI

|  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ~ | Constant12..5 | | | Prs2 | Rs1 | Const4..0 | B3h | PSTO |
| ~3 | | S | Rs3 | Prs2 | Rs1 | 195 | AFh | PSTO |

**Flags Affected**: none

**Operation:**

Memory64[d+Rs1] = Prs2

or

Memory64[d+Rs1+Rs3\*Sc] = Prs2

**Execution Units**: Mem

**Clock Cycles**: 4 if data is in the cache.

**Exceptions**: none

## STB – Store Byte (8 bits)

**Description**:

Data from Rs2 is stored to the memory address which is either the sum of Rs1 and an immediate value or the sum of Rs1 and Rs3. Both register indirect with displacement and indexed addressing are supported.

**Formats Supported**: STR, STI

|  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ~ | Constant12..5 | | | Rs2 | Rs1 | Const4..0 | A0h | STB |
| ~3 | | S | Rs3 | Rs2 | Rs1 | 05 | AFh | STB |

**Flags Affected**: none

**Operation:**

Memory8[d+Rs1] = Rs2

or

Memory8[d+Rs1+Rs3\*Sc] = Rs2

**Execution Units**: Mem

**Clock Cycles**: 4 if data is in the cache.

**Exceptions**: none

## STM – Store Multiple Registers

**Description**:

This instruction stores multiple registers to memory at the address which is the sum of Rs1 and an immediate constant, beginning with the register specified in Rs2 and continuing upwards for the immediate count specified in the Rs3 field of the instruction.

**Instruction Formats**: LM

**Clock Cycles**: 4 minimum depending on memory access time
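
The behaviour described above can be sketched in Python. This is a rough model with several assumptions that the description does not confirm: each register occupies 8 bytes, registers are written to ascending addresses, and byte order is little-endian. The name `stm` and its parameters are hypothetical.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit registers


def stm(memory: bytearray, regs: list, base: int, disp: int,
        first_reg: int, count: int) -> None:
    """Store `count` registers, starting at regs[first_reg], to memory.

    base is the value of Rs1, disp the immediate constant, first_reg the
    register number given by Rs2 and count the value in the Rs3 field.
    """
    address = base + disp
    for offset in range(count):
        value = regs[first_reg + offset] & MASK64
        memory[address:address + 8] = value.to_bytes(8, "little")
        address += 8  # assumed: consecutive octas at ascending addresses
```

For example, `stm(mem, regs, 8, 8, 3, 2)` writes registers 3 and 4 to byte offsets 16 and 24 of `mem`.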

## STO – Store Octa (64 bits)

**Description**:

Data from Rs2 is stored to the memory address which is either the sum of Rs1 and an immediate value or the sum of Rs1 and Rs3. Both register indirect with displacement and indexed addressing are supported. Rs3 may be scaled by either one or eight before use.

**Formats Supported**: STR, STI

|  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ~ | Constant12..5 | | | Rs2 | Rs1 | Const4..0 | A3h | STO |
| ~ | Constant12..5 | | | Tag4..0 | Rs1 | Const4..0 | A8h | STO |
| ~3 | | S | Rs3 | Rs2 | Rs1 | 35 | AFh | STO |
| ~3 | | S | Rs3 | Tag4..0 | Rs1 | 85 | AFh | STO |

| Tag6..0 | Associated Register | Description |
| --- | --- | --- |
| 96 to 97 | RA0, RA1 | return address registers |
| 98 to 102 | reserved |  |
| 103 | EIP | exception instruction pointer |
| 104 to 111 | reserved |  |
| 112 to 115 | CR0 to CR3 | compare result registers |
| 116 to 123 | reserved |  |
| 124 | reserved |  |
| 125 | CR <all> | all compare results registers |
| 126 | reserved | not used |
| 127 | none | instruction without a target |

**Flags Affected**: none

**Operation:**

Memory64[d+Rs1] = Rs2

or

Memory64[d+Rs1+Rs3\*Sc] = Rs2

**Execution Units**: Mem

**Clock Cycles**: 4 if data is in the cache.

**Exceptions**: none

## STOC – Store Octa (64 bits) and Clear Reservation

**Description**:

Conditionally store data from Rs2 to memory if an address reservation is present. If no reservation is present, the Z bit of cr0 is cleared and the store is not performed. Otherwise, the Z bit of cr0 is set. The address is either the sum of Rs1 and an immediate value or the sum of Rs1 and Rs3. Both register indirect with displacement and indexed addressing are supported. Rs3 may be scaled by either one or eight before use. Additionally, any reservation set on the address is cleared.

**Formats Supported**: STR, STI

|  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ~ | Constant12..5 | | | Rs2 | Rs1 | Const4..0 | A4h | STOC |
| ~3 | | S | Rs3 | Rs2 | Rs1 | 45 | AFh | STOC |

**Flags Affected**: Z flag of cr0

**Operation:**

Memory64[d+Rs1] = Rs2

or

Memory64[d+Rs1+Rs3\*Sc] = Rs2
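
The interaction between LDOR and STOC can be illustrated with a toy Python model. It assumes a single reservation that matches on the exact address and little-endian byte order; real hardware may track reservations differently. `ReservationModel` and its methods are hypothetical names.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit data


class ReservationModel:
    """Toy model of LDOR / STOC with one reservation (an assumption)."""

    def __init__(self, size: int) -> None:
        self.memory = bytearray(size)
        self.reserved_address = None  # no reservation held initially

    def ldor(self, address: int) -> int:
        """Load an octa and place a reservation on its address."""
        self.reserved_address = address
        return int.from_bytes(self.memory[address:address + 8], "little")

    def stoc(self, address: int, value: int) -> int:
        """Store only if the reservation is held; return the new cr0.Z."""
        success = self.reserved_address == address
        if success:
            self.memory[address:address + 8] = (value & MASK64).to_bytes(8, "little")
        self.reserved_address = None  # the reservation is cleared either way
        return 1 if success else 0
```

An atomic increment would loop on `ldor` followed by `stoc` until `stoc` returns 1, which corresponds to branching on the Z flag of cr0.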

**Execution Units**: Mem

**Clock Cycles**: 4 if data is in the cache.

**Exceptions**: none

## STT – Store Tetra (32 bits)

**Description**:

Data from Rs2 is stored to the memory address which is either the sum of Rs1 and an immediate value or the sum of Rs1 and Rs3. Both register indirect with displacement and indexed addressing are supported. Rs3 may be scaled by either one or four before use.

**Formats Supported**: STR, STI

|  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ~ | Constant12..5 | | | Rs2 | Rs1 | Const4..0 | A2h | STT |
| ~3 | | S | Rs3 | Rs2 | Rs1 | 25 | AFh | STT |

**Flags Affected**: none

**Operation:**

Memory32[d+Rs1] = Rs2

or

Memory32[d+Rs1+Rs3\*Sc] = Rs2

**Execution Units**: Mem

**Clock Cycles**: 4 if data is in the cache.

**Exceptions**: none

## STW – Store Wyde (16 bits)

**Description**:

Data from Rs2 is stored to the memory address which is either the sum of Rs1 and an immediate value or the sum of Rs1 and Rs3. Both register indirect with displacement and indexed addressing are supported. Rs3 may be scaled by either one or two before use.

**Formats Supported**: STR, STI

|  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ~ | Constant12..5 | | | Rs2 | Rs1 | Const4..0 | A1h | STW |
| ~3 | | S | Rs3 | Rs2 | Rs1 | 15 | AFh | STW |

**Flags Affected**: none

**Operation:**

Memory16[d+Rs1] = Rs2

or

Memory16[d+Rs1+Rs3\*Sc] = Rs2

**Execution Units**: Mem

**Clock Cycles**: 4 if data is in the cache.

**Exceptions**: none

## STPTR – Store Pointer (64 bits)

**Description**:

A pointer value is stored to the memory address which is either the sum of Rs1 and an immediate value or the sum of Rs1 and Rs3. Both register indirect with displacement and indexed addressing are supported. Rs3 may be scaled by either one or eight before use. Store pointer activates the card memory associated with garbage collection.

**Formats Supported**: STR, STI

|  |  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ~ | Constant12..5 | | | Rs2 | Rs1 | Const4..0 | A5h | STPTR |
| ~3 | | S | Rs3 | Rs2 | Rs1 | 55 | AFh | STPTR |

**Flags Affected**: none

**Operation:**

Memory64[d+Rs1] = Rs2

or

Memory64[d+Rs1+Rs3\*Sc] = Rs2

**Execution Units**: Mem

**Clock Cycles**: 4 if data is in the cache.

**Exceptions**: none

# Flow Control (Branch Unit) Operations

## ARTS – Alternate Return from Subroutine

**Description**:

Transfer program execution to an address which is an offset from the call address stored in return address register #1 (ra1). The return address register will have been previously set by a subroutine call (JSR) operation. Also add a constant to the stack pointer. This instruction, unlike other return operations, does not affect semaphores.

**Formats Supported**: RTS

|  |  |  |  |
| --- | --- | --- | --- |
| Constant14..0 | RO9 | 11 | 24h |

The constant field is shifted left by three bits and zero extended before being added to the stack pointer.

The RO9 field specifies an offset in words for the return point from the calling instruction. Typically, this value would be one to cause a return to the next instruction. The RO9 field is shifted left by two bits before being added to the return address register (ra1). To skip over more words at the return site, adjust the RO9 field accordingly. This may be useful to skip over inline parameters or a short code sequence, perhaps for exception handling. Up to 2kB may be skipped over.

**Flags Affected**: none

**Operation:**

IP = ra1 + RO9\*4

SP = SP + Constant \* 8
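
The operation can be sketched in Python. The function `arts` and its parameter names are hypothetical, and the sketch assumes 64-bit wraparound for both results.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit registers


def arts(ra1: int, sp: int, ro: int, constant: int) -> tuple:
    """Return (new instruction pointer, new stack pointer) for ARTS.

    ro is the return offset in words (shifted left by two bits);
    constant is the stack adjustment in octas (shifted left by three bits).
    """
    new_ip = (ra1 + (ro << 2)) & MASK64
    new_sp = (sp + (constant << 3)) & MASK64
    return new_ip, new_sp
```

With a return offset of one and a constant of two, `arts(0x1000, 0x8000, 1, 2)` gives `(0x1004, 0x8010)`: execution resumes at the instruction after the call and 16 bytes are released from the stack.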

**Execution Units**: Branch

**Clock Cycles**: 1

**Exceptions**: none

**Notes**:

## BCC – Branch if Carry Clear

**Description**:

This instruction branches to the target address if the C flag is clear in the specified condition register, otherwise program execution continues with the next instruction. The target address is an absolute address. The target field is loaded into the low order 24 bits of the instruction pointer. The remaining bits of the instruction pointer are unchanged. The branch range is 16MB.

**Formats Supported**: BR

**Operation:**

If (!Cr.C)

IP = {IP[63:24],Target[23:2],00}
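
The concatenation above can be written out in Python as a sketch. `branch_target` is a hypothetical name, and `target_field` is assumed to hold the 22-bit value Target[23:2] as it appears in the instruction.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit instruction pointer


def branch_target(ip: int, target_field: int) -> int:
    """Form {IP[63:24], Target[23:2], 00} from the current IP.

    The upper 40 bits of the instruction pointer are kept, and the low
    24 bits come from the target field with two zero bits appended.
    """
    upper = ip & ~0xFFFFFF & MASK64
    lower = (target_field & 0x3FFFFF) << 2
    return upper | lower
```

Because only the low 24 bits are replaced, a branch cannot leave the 16MB region selected by the upper bits of the current instruction pointer.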

**Execution Units**: Branch

**Clock Cycles**: 1

**Exceptions**: none

## BCS – Branch if Carry Set

**Description**:

This instruction branches to the target address if the C flag is set in the specified condition register, otherwise program execution continues with the next instruction. The target address is an absolute address. The target field is loaded into the low order 24 bits of the instruction pointer. The remaining bits of the instruction pointer are unchanged. The branch range is 16MB.

**Formats Supported**: BR

**Operation:**

If (Cr.C)

IP = {IP[63:24],Target[23:2],00}

**Execution Units**: Branch

**Clock Cycles**: 1

**Exceptions**: none

## BEQ – Branch if Equal to Zero

**Description**:

This instruction branches to the target address if the Z flag is set in the specified condition register, otherwise program execution continues with the next instruction. The target address is an absolute address. The target field is loaded into the low order 24 bits of the instruction pointer. The remaining bits of the instruction pointer are unchanged. The branch range is 16MB.

**Formats Supported**: BR

|  |  |  |  |
| --- | --- | --- | --- |
| Target23..2 | Cd2 | 28h | BEQ |

**Operation:**

If (Cr.Z)

IP = {IP[63:24],Target[23:2],00}

**Execution Units**: Branch

**Clock Cycles**: 1

**Exceptions**: none

## BF – Branch if False

**Description**:

This instruction is an alternate mnemonic for the BCC instruction. This instruction branches to the target address if the C flag is clear in the specified condition register, otherwise program execution continues with the next instruction. The target address is an absolute address. The target field is loaded into the low order 24 bits of the instruction pointer. The remaining bits of the instruction pointer are unchanged. The branch range is 16MB. One of the set instructions will set or clear the carry flag based on the result of the set comparison.

**Formats Supported**: BR

**Operation:**

If (!Cr.C)

IP = {IP[63:24],Target[23:2],00}

**Execution Units**: Branch

**Clock Cycles**: 1

**Exceptions**: none

## BGE – Branch if Greater Than or Equal

**Description**:

This is an alternate mnemonic for the [BPL](#_BPL_–_Branch) instruction. This instruction branches to the target address if the N flag is clear in the specified condition register, otherwise program execution continues with the next instruction. The target address is an absolute address. The target field is loaded into the low order 24 bits of the instruction pointer. The remaining bits of the instruction pointer are unchanged. The branch range is 16MB.

**Formats Supported**: BR

**Operation:**

If (!Cr.N)

IP = {IP[63:24],Target[23:2],00}

**Execution Units**: Branch

**Clock Cycles**: 1

**Exceptions**: none

## BLE – Branch if Less Than or Equal

**Description**:

This instruction tests two flags (Z and N) at the same time. This instruction branches to the target address if the N flag is set or the Z flag is set in the specified condition register, otherwise program execution continues with the next instruction. The target address is an absolute address. The target field is loaded into the low order 24 bits of the instruction pointer. The remaining bits of the instruction pointer are unchanged. The branch range is 16MB.

**Formats Supported**: BR

**Operation:**

If (Cr.N or Cr.Z)

IP = {IP[63:24],Target[23:2],00}
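
The two-flag test can be sketched in Python. The helper names are hypothetical, and the flag positions (Z in bit 1, N in bit 7) are assumed to follow the compare results layout used by the set instructions.

```python
Z_BIT, N_BIT = 1, 7  # assumed compare result bit positions


def flag(cr: int, bit: int) -> bool:
    """Return True if the given flag bit is set in compare result cr."""
    return bool((cr >> bit) & 1)


def ble_taken(cr: int) -> bool:
    """BLE is taken when either the N flag or the Z flag is set."""
    return flag(cr, N_BIT) or flag(cr, Z_BIT)
```

A compare result with only the carry flag set (`0x01`) therefore does not take the branch, while `0x02` (Z) and `0x80` (N) both do.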

**Execution Units**: Branch

**Clock Cycles**: 1

**Exceptions**: none

## BLT – Branch if Less Than

**Description**:

This is an alternate mnemonic for the [BMI](#_BMI_–_Branch) instruction. This instruction branches to the target address if the N flag is set in the specified condition register, otherwise program execution continues with the next instruction. The target address is an absolute address. The target field is loaded into the low order 24 bits of the instruction pointer. The remaining bits of the instruction pointer are unchanged. The branch range is 16MB.

**Formats Supported**: BR

**Operation:**

If (Cr.N)

IP = {IP[63:24],Target[23:2],00}

**Execution Units**: Branch

**Clock Cycles**: 1

**Exceptions**: none

## BMI – Branch if Minus

**Description**:

This instruction branches to the target address if the N flag is set in the specified condition register, otherwise program execution continues with the next instruction. The target address is an absolute address. The target field is loaded into the low order 24 bits of the instruction pointer. The remaining bits of the instruction pointer are unchanged. The branch range is 16MB.

**Formats Supported**: BR

**Operation:**

If (Cr.N)

IP = {IP[63:24],Target[23:2],00}

**Execution Units**: Branch

**Clock Cycles**: 1

**Exceptions**: none

## BNE – Branch if Not Equal to Zero

**Description**:

This instruction branches to the target address if the Z flag is clear in the specified condition register, otherwise program execution continues with the next instruction. The target address is an absolute address. The target field is loaded into the low order 24 bits of the instruction pointer. The remaining bits of the instruction pointer are unchanged. The branch range is 16MB.

**Formats Supported**: BR

**Operation:**

If (!Cr.Z)

IP = {IP[63:24],Target[23:2],00}

**Execution Units**: Branch

**Clock Cycles**: 1

**Exceptions**: none

## BOD – Branch if Odd

**Description**:

This instruction branches to the target address if the O flag is set in the specified condition register, otherwise program execution continues with the next instruction. The target address is an absolute address. The target field is loaded into the low order 24 bits of the instruction pointer. The remaining bits of the instruction pointer are unchanged. The branch range is 16MB.

**Formats Supported**: BR

**Operation:**

If (Cr.O)

IP = {IP[63:24],Target[23:2],00}

**Execution Units**: Branch

**Clock Cycles**: 1

**Exceptions**: none

## BPL – Branch if Plus

**Description**:

This instruction branches to the target address if the N flag is clear in the specified condition register, otherwise program execution continues with the next instruction. The target address is an absolute address. The target field is loaded into the low order 24 bits of the instruction pointer. The remaining bits of the instruction pointer are unchanged. The branch range is 16MB.

**Formats Supported**: BR

**Operation:**

If (!Cr.N)

IP = {IP[63:24],Target[23:2],00}

**Execution Units**: Branch

**Clock Cycles**: 1

**Exceptions**: none

## BVC – Branch if Overflow Clear

**Description**:

This instruction branches to the target address if the V flag is clear in the specified condition register, otherwise program execution continues with the next instruction. The target address is an absolute address. The target field is loaded into the low order 24 bits of the instruction pointer. The remaining bits of the instruction pointer are unchanged. The branch range is 16MB.

**Formats Supported**: BR

**Operation:**

If (!Cr.V)

IP = {IP[63:24],Target[23:2],00}

**Execution Units**: Branch

**Clock Cycles**: 1

**Exceptions**: none

## BPS – Branch if Parity Set

**Description**:

This instruction branches to the target address if the P flag is set in the specified condition register, otherwise program execution continues with the next instruction. The target address is an absolute address. The target field is loaded into the low order 24 bits of the instruction pointer. The remaining bits of the instruction pointer are unchanged. The branch range is 16MB.

**Formats Supported**: BR

**Operation:**

If (Cr.P)

IP = {IP[63:24],Target[23:2],00}

**Execution Units**: Branch

**Clock Cycles**: 1

**Exceptions**: none

## BT – Branch if True

**Description**:

This instruction is an alternate mnemonic for the BCS instruction. This instruction branches to the target address if the C flag is set in the specified condition register, otherwise program execution continues with the next instruction. The target address is an absolute address. The target field is loaded into the low order 24 bits of the instruction pointer. The remaining bits of the instruction pointer are unchanged. The branch range is 16MB. One of the set instructions will set or clear the carry flag based on the result of the set comparison.

**Formats Supported**: BR

**Operation:**

If (Cr.C)

IP = {IP[63:24],Target[23:2],00}

**Execution Units**: Branch

**Clock Cycles**: 1

**Exceptions**: none

## BVS – Branch if Overflow Set

**Description**:

This instruction branches to the target address if the V flag is set in the specified condition register, otherwise program execution continues with the next instruction. The target address is an absolute address. The target field is loaded into the low order 24 bits of the instruction pointer. The remaining bits of the instruction pointer are unchanged. The branch range is 16MB.

**Formats Supported**: BR

**Operation:**

If (Cr.V)

IP = {IP[63:24],Target[23:2],00}

**Execution Units**: Branch

**Clock Cycles**: 1

**Exceptions**: none

## BRK – Break

**Description**:

This instruction initiates the processor debug routine. The processor enters debug mode. The cause code register is set to the value specified in the instruction. Interrupts are disabled and register set #31 is selected. The instruction pointer is reset to the contents of tvec[5] and instructions begin executing. There should be a jump instruction placed at the break vector location. The address of the BRK instruction is stored in the EPC register.

**Formats Supported**: BRK

|  |  |  |  |
| --- | --- | --- | --- |
| Constant16 | Cause8 | 00h | BRK |

**Operation:**

PMSTACK = (PMSTACK << 4) | 10

RSSTACK = (RSSTACK << 5) | 31

CAUSE = Cause8

EPC = PC

PC = tvec[5]

**Execution Units**: Branch

**Clock Cycles**:

**Exceptions**: none

**Notes**:

## JLR – Jump to Leaf Subroutine

**Description**:

Store the address of the JLR instruction in the specified return address register (ra0 or ra1) then jump to the address specified in the instruction plus an optional index register value. The address specified is an absolute address. The address range is 24 bits or 16MB. The resulting calculated address is always instruction word aligned.

The return address register is assumed to be ra0 if not otherwise specified. The JLR instruction does not require space in branch predictor tables.

This instruction performs better than the JSR instruction because it does not have to write the return address to memory.

**Formats Supported**: JLR

|  |  |  |  |  |
| --- | --- | --- | --- | --- |
| Target23..2 | c | Lk1 | 30h | JLR |

**Flags Affected**: none

**Operation:**

Ra = IP

if (‘c’ set)

IP = {IP[63:24],Target[23:2],00} + Cn

else

IP = {IP[63:24],Target[23:2],00}

**Execution Units**: Branch

**Clock Cycles**: 1

**Exceptions**: none

**Notes**:

## JMP – Jump

**Description**:

Jumps to a target address. The address specified is an absolute address plus an optional index register value. The address range is 32 bits or 4GB. The resulting calculated address is always four-byte aligned.

**Formats Supported**: JMP

**Flags Affected**: none

**Operation:**

if (‘c’ set)

IP = {IP[63:32],Target[31:2],00} + Cn

else

IP = {IP[63:32],Target[31:2],00}

**Execution Units**: Branch

**Clock Cycles**: 0.5

**Exceptions**: none

**Notes**:

## JSR – Jump to Subroutine

**Description**:

Store the address of the JSR instruction on the stack then jump to the address specified in the instruction plus an optional index register value. The address specified is an absolute address. The address range is 32 bits or 4GB. The resulting calculated address is always four byte aligned.

The JSR instruction does not require space in branch predictor tables.

**Formats Supported**: JSR

**Flags Affected**: none

**Operation:**

SP = SP - 8

Memory64[SP] = IP

if (‘c’ set)

IP = {IP[63:32],Target[31:2],00} + Cn

else

IP = {IP[63:32],Target[31:2],00}

**Execution Units**: Branch

**Clock Cycles**: 1

**Exceptions**: none

**Notes**:
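
A minimal Python sketch of the JSR Operation pseudocode follows. It is a model under stated assumptions, not the hardware behaviour: memory is represented by a plain dictionary, `target_field` is assumed to hold Target[31:2] as a 30-bit value, and `cn` stands for the optional index register value.

```python
MASK64 = (1 << 64) - 1

def jsr(ip, sp, memory, target_field, c=False, cn=0):
    """Push the JSR address, then jump to the absolute target (plus optional index)."""
    sp = (sp - 8) & MASK64                       # SP = SP - 8
    memory[sp] = ip                              # Memory64[SP] = IP
    new_ip = (ip & ~0xFFFFFFFF & MASK64) | ((target_field & 0x3FFFFFFF) << 2)
    if c:                                        # 'c' set: add the index register value
        new_ip = (new_ip + cn) & MASK64
    return new_ip, sp

mem = {}
new_ip, new_sp = jsr(0x1_0000_1000, 0x8000, mem, 0x800)
print(hex(new_ip), hex(new_sp), hex(mem[new_sp]))
```

Note that the value pushed is the address of the JSR itself, which is why the matching return adds an offset of at least one word.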

## PFI – Poll for Interrupt

**Description**:

The poll for interrupt instruction polls the interrupt status lines and performs an interrupt service if an interrupt is present. Otherwise the PFI instruction is treated as a NOP operation. Polling for interrupts is performed by managed code. PFI provides a means to process interrupts at specific points in running software.

**Instruction Format: OSR2**

**Clock Cycles**:

**Execution Units: Branch**

## RET – Return from Subroutine

**Description**:

This instruction is an alternate mnemonic for the [RTS](#_RTS_–_Return) instruction.

**Formats Supported**: RTS

**Flags Affected**: none

**Operation:**

**Execution Units**: Branch

**Clock Cycles**:

**Exceptions**: none

**Notes**:

## REX – Redirect Exception

**Description**:

This instruction redirects an exception from an operating mode to a lower operating mode. If successful, this instruction jumps to the target exception handler and does not return. If it fails, execution continues with the next instruction.

This instruction may fail if exceptions are not enabled at the target level.

The location of the target exception handler is found in the trap vector register for that operating mode (tvec[xx]).

The cause (cause) and bad address (badaddr) registers of the originating mode are copied to the corresponding registers in the target mode.

**Instruction Format**: REX

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ~ | 8 | PL8 | Rs1 | ~2 | Tm3 | 7Ah | REX |

|  |  |
| --- | --- |
| Tm3 |  |
| 0 | redirect to user mode |
| 1 | redirect to supervisor mode |
| 2 | redirect to hypervisor mode |
| 3 | redirect to machine mode |
| 4 | redirect to interrupt mode |
| 5 to 7 | not used |

**Clock Cycles**: 4

**Execution Units: Branch**

Example:

REX 1 ; redirect to supervisor handler

; If the redirection failed, exceptions were likely disabled at the target level.

; Continue processing so the target level may complete its operation.

RTE ; redirection failed (exceptions disabled ?)

**Notes**:

Since all exceptions are initially handled in debug mode, the debug handler must check for disabled lower mode exceptions.

## RTD – Return from Debug Mode

**Description**:

This instruction is an alternate mnemonic for the [RTE](#_RTE_–_Return) instruction. Restore the previous interrupt enable setting, register set and operating mode and transfer program execution back to the address in the debug exception link register. One of sixty-four semaphore registers may also be cleared. Semaphore register zero is always cleared by this instruction.

**Formats Supported**: RTE

|  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- |
| ~10 | Sema6 | ~ | RO4 | ~ | 35h | RTE |

**Flags Affected**: none

**Operation:**

PMSTACK = PMSTACK >> 4

RSSTACK = RSSTACK >> 5

Semaphore[0] = 0

Semaphore[Sema6] = 0

PC = EPC

**Execution Units**: Branch

**Clock Cycles**:

**Exceptions**: none

**Notes**:

## RTE – Return from Exception

**Description**:

Restore the previous interrupt enable setting, register set and operating level and transfer program execution back to the address in the exception address register (EPC). One of sixty-four semaphore registers may also be cleared. Semaphore register zero is always cleared by this instruction.

This instruction may be encoded to return a short distance past the exception address point. This may be useful to return to the next instruction or return to a point past inline parameters. The RO4 field specifies a return offset in terms of instruction words.

There is really only a single instruction for returning from an exception in any mode, although there are several additional mnemonics. The processor “knows” which exception address to return to according to the register set currently in use.

**Formats Supported**: RTE

|  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- |
| ~10 | Sema6 | ~ | RO4 | ~ | 35h | RTE |

The constant field is not used.

**Flags Affected**: none

**Operation:**

PMSTACK = PMSTACK >> 4

RSSTACK = RSSTACK >> 5

Semaphore[0] = 0

Semaphore[Sema6] = 0

PC = EPC

**Execution Units**: Branch

**Clock Cycles**:

**Exceptions**: none

**Notes**:
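
The shifts in the Operation above undo the push performed at exception entry; the BRK operation, for example, pushes with `(PMSTACK << 4) | 10` and `(RSSTACK << 5) | 31`. The Python sketch below pairs the two. It is illustrative only: the function names are hypothetical, the stack registers are modelled as unbounded integers (their real width is not stated here), and the RO4 return offset is left out because the Operation pseudocode does not show it.

```python
def enter_exception(pmstack, rsstack, mode_bits, regset):
    """Push a 4-bit mode entry and a 5-bit register set number, as BRK does."""
    return (pmstack << 4) | (mode_bits & 0xF), (rsstack << 5) | (regset & 0x1F)

def rte(pmstack, rsstack, semaphores, sema6, epc):
    """Pop both stacks, clear semaphore 0 and the selected semaphore, return to EPC."""
    pmstack >>= 4
    rsstack >>= 5
    semaphores[0] = 0
    semaphores[sema6 & 0x3F] = 0
    return pmstack, rsstack, epc                 # PC = EPC

sems = [1] * 64
pm, rs = enter_exception(0x3, 0x2, 10, 31)
print(rte(pm, rs, sems, 5, 0x4000))
```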

## RTH – Return from Hypervisor Mode Subroutine

**Description**:

This instruction is an alternate mnemonic of the [RTE](#_RTE_–_Return) instruction.

Restore the previous interrupt enable setting, register set and operating mode and transfer program execution back to the address in the hypervisor exception link register. One of sixty-four semaphore registers may also be cleared. Semaphore register zero is always cleared by this instruction.

**Formats Supported**: RTE

**Flags Affected**: none

**Operation:**

**Execution Units**: Branch

**Clock Cycles**:

**Exceptions**: none

**Notes:**

## RTI – Return from Interrupt Subroutine

**Description**:

This instruction is an alternate mnemonic for the [RTE](#_RTE_–_Return) instruction. Restore the previous interrupt enable setting, register set and operating level and transfer program execution back to the address in the exception address register (EPC). One of sixty-four semaphore registers may also be cleared. Semaphore register zero is always cleared by this instruction.

**Formats Supported**: RTE

|  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- |
| ~10 | Sema6 | ~ | RO4 | ~ | 35h | RTE |

The constant field is not used.

**Flags Affected**: none

**Operation:**

PMSTACK = PMSTACK >> 4

RSSTACK = RSSTACK >> 5

Semaphore[0] = 0

Semaphore[Sema6] = 0

PC = EPC

**Execution Units**: Branch

**Clock Cycles**:

**Exceptions**: none

**Notes**:

## RTL – Return from Leaf Subroutine

**Description**:

Transfer program execution to an address which is the sum of a value stored in a return register (ra0) and an offset (RO4) specified in the instruction. The return address register will have been previously set by a subroutine call JLR operation. Also add a constant to the stack pointer. This instruction, unlike other return operations, does not affect semaphores. The assembler assumes ra0 with an offset of one word is used unless otherwise specified.

The RO4 field is used to return to a point past the normal return point of the next instruction. This is useful in some circumstances such as the presence of inline subroutine parameters or exception handling code.

**Formats Supported**: RTS

|  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- |
| ~ | Constant12..0 | 31 | RO4 | Lk1 | 34h | RTL |

The constant field is shifted left by three bits and zero extended before being added to the stack pointer. The stack pointer may then be adjusted by up to 64kB.

The RO4 field specifies an offset in words for the return point from the calling instruction. Typically, this value would be one to cause a return to the next instruction. The RO4 field is shifted left by two bits before being added to the return address register (ra0). To skip over more words at the return site, adjust the RO4 field accordingly. This may be useful to skip over inline parameters or a short code sequence, perhaps for exception handling.

**Flags Affected**: none

**Operation:**

PC = Ra + RO4 \* 4

SP = SP + Constant \* 8

**Examples:**

RTL ; return (to 4[ra0]) from the subroutine

RTL #$200 ; return and add $200 to the stack pointer

RTL 4[ra1],#$400 ; return using ra1 instead of ra0, add onto stack pointer

RTL 20[ra0],#$30 ; return 20 bytes past calling address, adjust stack pointer by $30

**Execution Units**: Branch

**Clock Cycles**: 1

**Exceptions**: none

**Notes**:
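
The arithmetic in the Operation and Examples above can be checked with a short Python sketch. It is a model only: `encode_fields` is a hypothetical helper showing how byte operands written in assembler would map onto the word-scaled RO4 field and the 8-byte-scaled constant field.

```python
MASK64 = (1 << 64) - 1

def encode_fields(byte_offset, sp_adjust):
    """Hypothetical helper: convert byte operands to the RO4 and Constant fields."""
    if byte_offset % 4 or sp_adjust % 8:
        raise ValueError("offset must be a multiple of 4, stack adjust a multiple of 8")
    return byte_offset // 4, sp_adjust // 8

def rtl(ra, sp, ro4=1, constant=0):
    """PC = Ra + RO4 * 4 ; SP = SP + Constant * 8."""
    pc = (ra + (ro4 & 0xF) * 4) & MASK64
    sp = (sp + (constant & 0x1FFF) * 8) & MASK64
    return pc, sp

ro4, const = encode_fields(20, 0x30)             # as in: RTL 20[ra0],#$30
print(ro4, const, [hex(v) for v in rtl(0x1000, 0x8000, ro4, const)])
```

With the default field values the return lands one word past the calling instruction and the stack pointer is unchanged.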

## RTM – Return from Machine Mode Subroutine

**Description**:

This instruction is an alternate mnemonic for the [RTE](#_RTE_–_Return) instruction. Restore the previous interrupt enable setting, register set and operating level and transfer program execution back to the address in the exception address register (EPC). One of sixty-four semaphore registers may also be cleared. Semaphore register zero is always cleared by this instruction.

**Formats Supported**: RTE

|  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- |
| ~10 | Sema6 | ~ | RO4 | ~ | 35h | RTE |

The constant field is not used.

**Flags Affected**: none

**Operation:**

PMSTACK = PMSTACK >> 4

RSSTACK = RSSTACK >> 5

Semaphore[0] = 0

Semaphore[Sema6] = 0

PC = EPC

**Execution Units**: Branch

**Clock Cycles**:

**Exceptions**: none

**Notes**:

## RTS – Return from Subroutine

**Description**:

Transfer program execution to an address which is the sum of the return address popped off the stack and an offset (RO4) specified in the instruction. The return address will have been previously pushed by a subroutine call JSR operation. Also add a constant to the stack pointer. This instruction, unlike other return operations, does not affect semaphores. The assembler assumes an offset of one word unless otherwise specified.

The RO4 field is used to return to a point past the normal return point of the next instruction. This is useful in some circumstances such as the presence of inline subroutine parameters or exception handling code.

**Formats Supported**: RTS

|  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- |
| ~ | Constant12..0 | 31 | RO4 | ~1 | 33h | RTS |

The constant field is shifted left by three bits and zero extended before being added to the stack pointer. The stack pointer may then be adjusted by up to 64kB.

The RO4 field specifies an offset in words for the return point from the calling instruction. Typically, this value would be one to cause a return to the next instruction. The RO4 field is shifted left by two bits before being added to the popped address. To skip over more words at the return site, adjust the RO4 field accordingly. This may be useful to skip over inline parameters or a short code sequence, perhaps for exception handling.

**Flags Affected**: none

**Operation:**

PC = Memory64[SP] + RO4 \* 4

SP = SP + Constant \* 8

**Examples:**

RTS ; return (to 4 bytes past the popped address) from the subroutine

RTS #$200 ; return and add $200 to the stack pointer

RTS 20,#$30 ; return 20 bytes past calling address, adjust stack pointer by $30

**Execution Units**: Branch

**Clock Cycles**: 1

**Exceptions**: none

**Notes**:

## RTU – Return from User Mode Exception

**Description**:

This instruction is an alternate mnemonic for the [RTE](#_RTE_–_Return) instruction. Restore the previous interrupt enable setting, register set and operating level and transfer program execution back to the address in the exception address register (EPC). One of sixty-four semaphore registers may also be cleared. Semaphore register zero is always cleared by this instruction.

**Formats Supported**: RTE

|  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- |
| ~10 | Sema6 | ~ | RO4 | ~ | 35h | RTE |

The constant field is not used.

**Flags Affected**: none

**Operation:**

PMSTACK = PMSTACK >> 4

RSSTACK = RSSTACK >> 5

Semaphore[0] = 0

Semaphore[Sema6] = 0

PC = EPC

**Execution Units**: Branch

**Clock Cycles**:

**Exceptions**: none

**Notes**:

## SRET – Return from Supervisor Mode Subroutine

**Description**:

This instruction is an alternate mnemonic for the [RTE](#_RTE_–_Return) instruction. Restore the previous interrupt enable setting, register set and operating level and transfer program execution back to the address in the exception address register (EPC). One of sixty-four semaphore registers may also be cleared. Semaphore register zero is always cleared by this instruction.

**Formats Supported**: RTE

|  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- |
| ~10 | Sema6 | ~ | RO4 | ~ | 35h | RTE |

The constant field is not used.

**Flags Affected**: none

**Operation:**

PMSTACK = PMSTACK >> 4

RSSTACK = RSSTACK >> 5

Semaphore[0] = 0

Semaphore[Sema6] = 0

PC = EPC

**Execution Units**: Branch

**Clock Cycles**:

**Exceptions**: none

**Notes**:

## WAI – Wait for Interrupt

**Description**:

The WAI instruction stops the processor clock until an interrupt occurs. This instruction is like the PFI instruction except that it stops and waits for an interrupt whereas PFI does not wait. WAI does not check for a non-maskable (NMI) interrupt or a reset (RST).

**Formats Supported**: OSR2

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r | 12h5 | ~3 | ~ | ~ | Rd | 7Ah | WAI |

**Flags Affected**: none

**Operation:**

If (IRQ)

Cause Code = 50h | IRQ Level

PMSTACK = (PMSTACK << 4) | 8

EPC = PC + 4

PC = tvec[5]

Else

PC = PC (clock stopped)

**Execution Units**: Fetch stage

**Clock Cycles**:

**Exceptions**: none

**Notes**:

## WFI – Wait for Interrupt

**Description**:

WFI is an alternate mnemonic for the [WAI](#_WAI_–_Wait) instruction.

**Formats Supported**: OSR2

**Flags Affected**: none

**Operation:**

**Execution Units**: Fetch stage

**Clock Cycles**:

**Exceptions**: none

**Notes**:

# Posit Arithmetic Instructions

## PABS – Posit Absolute Value

**Description:**

Take the absolute value of a posit number in register Prs1 and place the result into target register Prd. No rounding of the number occurs.

**Instruction Format: PST1**

**Clock Cycles: 1**

**Execution Units:** Posit Arithmetic

## PADD – Posit addition

**Description:**

Add two posit numbers in registers Prs1 and Prs2 or a short immediate value and place the result into target register Prd. The result is rounded.

**Instruction Format: PST2**

**Clock Cycles: 6**

**Execution Units:** Posit Arithmetic

## PCMP - Posit Compare

**Description:**

The register compare instruction compares two registers as posit values and sets the compare result register accordingly.

The compare instructions feature results merging, where the current value in the result register is logically combined with the new result. This allows several comparisons to be easily cascaded for use with a branch operation. The direct copy merging operation is the default and does not need to be specified in assembler code.

|  |  |  |
| --- | --- | --- |
| MOp3 | Mnemonic |  |
| 0 | .CPY | Cd = compare result |
| 1 | .OR | Cd = Cd \| compare result |
| 2 | .AND | Cd = Cd & compare result |
| 3 | .ORCM | Cd = Cd \| ~compare result |
| 4 | .ANDCM | Cd = Cd & ~compare result |
| 5 to 7 |  | reserved |
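
The merge operations in the table can be modelled as bitwise operations on an 8-bit compare result register. The Python sketch below is illustrative only: the function name is hypothetical and the flag bit positions used in the example are arbitrary, not taken from the register definition.

```python
def merge(cd: int, result: int, mop3: int) -> int:
    """Combine the current compare register value with a new compare result."""
    mask = 0xFF                                  # compare result registers are 8 bits
    if mop3 == 0:                                # .CPY
        out = result
    elif mop3 == 1:                              # .OR
        out = cd | result
    elif mop3 == 2:                              # .AND
        out = cd & result
    elif mop3 == 3:                              # .ORCM
        out = cd | ~result
    elif mop3 == 4:                              # .ANDCM
        out = cd & ~result
    else:
        raise ValueError("MOp3 values 5 to 7 are reserved")
    return out & mask

print(bin(merge(0b0101, 0b0011, 2)))
```

Cascading two compares with `.AND` leaves a flag set only when both comparisons set it, so a single branch can test a compound condition.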

**Instruction Format: PST2**

**Clock Cycles:** 0.5

**Execution Units:** Posit Arithmetic

**Operation:**

if Prs1 < Prs2

Cr.N = 1

else if Prs1 = Prs2

Cr.Z = 1

else

Cr.N = 0

Cr.Z = 0

if unordered (Prs1, Prs2)

Cr.V = 1

else

Cr.V = 0

## PDIV – Posit Divide

**Description:**

Divide two posit numbers in registers Prs1 and Prs2 and place the result into target register Prd.

**Instruction Format: PST2**

**Clock Cycles: 28 (est).**

**Execution Units:** Posit Arithmetic

## PMUL – Posit Multiplication

**Description:**

Multiply two posit numbers in registers Prs1 and Prs2 and place the result into target register Prd.

**Instruction Format: PST2**

**Clock Cycles: 7**

**Execution Units:** Posit Arithmetic

## PSUB – Posit Subtraction

**Description:**

Subtract two posit numbers in registers Prs1 and Prs2 and place the result into target register Prd.

**Instruction Format: PST2**

**Clock Cycles: 6**

**Execution Units:** Posit Arithmetic

# Floating Point Instructions

## Overview

The floating-point unit provides basic floating-point operations including addition, subtraction, multiplication, division, square root, and float to integer and integer to float conversions. The core contains two identical floating-point units. Only 64-bit precision floating-point operations are supported. The core features results caching: if an operation is performed on the same values as an operation already present in the cache, the result is returned in a single clock cycle.

The rounding mode is normally specified directly in the instruction. However, if the instruction indicates to use dynamic rounding mode then the rounding mode in the floating-point control and status register is used.

**Representation**

The floating-point format is like an IEEE-754 representation for double precision. Briefly,

**64-bit Precision Format:**

|  |  |  |  |
| --- | --- | --- | --- |
| 63 | 62 | 61..52 | 51..0 |
| SM | SE | Exponent | Mantissa |

SM – sign of mantissa

SE – sign of exponent

The exponent and mantissa are both represented as two’s complement numbers; however, the sign bit of the exponent is inverted.

|  |  |
| --- | --- |
| SeEEEEEEEEEE |  |
| 11111111111 | Maximum exponent |
| …. |  |
| 01111111111 | exponent of zero |
| …. |  |
| 00000000000 | Minimum exponent |

The exponent ranges from -1023 to +1024
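
The exponent table above corresponds to an excess-1023 encoding: the 11-bit field value minus 1023 gives the exponent. The Python sketch below splits a value into its fields using the host's IEEE-754 double as a stand-in for the RTF64 format, which is an assumption; the function name is hypothetical.

```python
import struct

def unpack_double(x: float):
    """Return (sign, exponent field, unbiased exponent, mantissa field)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = bits >> 63                            # SM, bit 63
    exp_field = (bits >> 52) & 0x7FF             # SE and Exponent, bits 62..52
    mantissa = bits & ((1 << 52) - 1)            # bits 51..0
    return sign, exp_field, exp_field - 1023, mantissa

print(unpack_double(1.0))
print(unpack_double(-2.0))
```

A field of all zeros gives -1023 and a field of all ones gives +1024, matching the stated range.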

### Short Immediates

Some floating-point operations allow a short immediate format to be used as the second operand. These instructions include FADD, FSUB, FCMP, FMUL, FDIV, FSEQ, FSNE, FSLT, FSLE. The short immediate format assumes a positive number with four bits for the exponent and four for the mantissa. The range of these numbers is 2^-7 to 2^8 with four bits of precision. The short immediate is converted into a 52-bit floating-point number before use.

|  |  |  |  |
| --- | --- | --- | --- |
|  | 7 | 6..4 | 3..0 |
| 0 | SE | Exp. | Mant. |
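
One way to read the short immediate layout is sketched below in Python. Two points are assumptions rather than statements from this manual: the four exponent bits (SE plus Exp.) are treated as an excess-7 value, which reproduces the stated 2^-7 to 2^8 range, and the mantissa is treated as a four-bit fraction with an implied leading one.

```python
def decode_short_imm(imm8: int) -> float:
    """Decode an 8-bit short immediate into a float (assumed encoding)."""
    exp_field = (imm8 >> 4) & 0xF                # SE (bit 7) and Exp. (bits 6..4)
    mant = imm8 & 0xF                            # Mant. (bits 3..0)
    return (1 + mant / 16) * 2.0 ** (exp_field - 7)   # assumed hidden one, bias 7

for code in (0x00, 0x70, 0x78, 0xF0):
    print(hex(code), decode_short_imm(code))
```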

## FABS – Floating Absolute Value

**Description:**

Take the absolute value of a floating-point number in register Frs1 and place the result into target register Frd. The sign bit (bit 63) of the register is set to zero. No rounding of the number occurs.

**Instruction Format: FLT1**

**Clock Cycles: 1**

**Execution Units:** Floating Point

## FADD – Floating point addition

**Description:**

Add two floating point numbers in registers Frs1 and Frs2 or a short immediate value and place the result into target register Frd. The result is rounded according to the rounding mode selected in the instruction. If the rounding mode is encoded as 7, the rounding mode in the floating-point control and status register is used instead.

**Instruction Format: FLT2**

**Clock Cycles: 6**

**Execution Units:** Floating Point

## FCLASS – Classify Value

**Description**:

FCLASS classifies the value in register Frs1 and returns the information as a bit vector in the integer register Rd.

|  |  |
| --- | --- |
| Bit | Meaning |
| 0 | 1 = negative infinity |
| 1 | 1 = negative number |
| 2 | 1 = negative subnormal number |
| 3 | 1 = negative zero |
| 4 | 1 = positive zero |
| 5 | 1 = positive subnormal number |
| 6 | 1 = positive number |
| 7 | 1 = positive infinity |
| 8 | 1 = signalling NaN |
| 9 | 1 = quiet NaN |
| 10 to 62 | not used |
| 63 | 1 = negative, 0 = positive number |
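
The bit vector in the table can be illustrated with a Python sketch that classifies a host double. Several points are assumptions: the host's IEEE-754 encoding stands in for the RTF64 format, a NaN is taken to be quiet when its top mantissa bit is set, and bits 1 and 6 are taken to mean normal (non-subnormal, non-zero, finite) numbers.

```python
import struct

def fclass(x: float) -> int:
    """Return an FCLASS-style bit vector for x (illustrative model)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = bits >> 63
    exp = (bits >> 52) & 0x7FF
    man = bits & ((1 << 52) - 1)
    if exp == 0x7FF and man != 0:                # NaN
        result = 1 << 9 if man >> 51 else 1 << 8
    elif exp == 0x7FF:                           # infinity
        result = 1 << 0 if sign else 1 << 7
    elif exp == 0 and man == 0:                  # zero
        result = 1 << 3 if sign else 1 << 4
    elif exp == 0:                               # subnormal
        result = 1 << 2 if sign else 1 << 5
    else:                                        # normal number
        result = 1 << 1 if sign else 1 << 6
    return result | (sign << 63)                 # bit 63 mirrors the sign

print(hex(fclass(-1.0)), hex(fclass(0.0)))
```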

## FCMP - Float Compare

**Description:**

The register compare instruction compares two registers as floating-point values and sets the compare result register accordingly.

The compare instructions feature results merging, where the current value in the result register is logically combined with the new result. This allows several comparisons to be easily cascaded for use with a branch operation. The direct copy merging operation is the default and does not need to be specified in assembler code.

|  |  |  |
| --- | --- | --- |
| MOp3 | Mnemonic |  |
| 0 | .CPY | Cd = compare result |
| 1 | .OR | Cd = Cd \| compare result |
| 2 | .AND | Cd = Cd & compare result |
| 3 | .ORCM | Cd = Cd \| ~compare result |
| 4 | .ANDCM | Cd = Cd & ~compare result |
| 5 to 7 |  | reserved |

**Instruction Format: FLT2**

**Clock Cycles:** 0.5

**Execution Units:** Floating Point

**Operation:**

if Frs1 < Frs2

Cr.N = 1

else if Frs1 = Frs2

Cr.Z = 1

else

Cr.N = 0

Cr.Z = 0

if unordered (Frs1, Frs2)

Cr.V = 1

else

Cr.V = 0
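
The flag settings above can be sketched in Python as follows. This is a reading of the pseudocode, not a definitive model: it assumes that N and Z are both cleared whenever their condition does not hold, and that an unordered comparison (either operand is a NaN) sets V and leaves N and Z clear.

```python
import math

def fcmp_flags(a: float, b: float) -> dict:
    """Return the N, Z and V flags produced by comparing a with b."""
    unordered = math.isnan(a) or math.isnan(b)
    return {
        "N": 1 if (not unordered and a < b) else 0,   # less than
        "Z": 1 if (not unordered and a == b) else 0,  # equal
        "V": 1 if unordered else 0,                   # unordered
    }

print(fcmp_flags(1.0, 2.0))
print(fcmp_flags(float("nan"), 2.0))
```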

## FCX – Clear Floating-Point Exceptions

**Description:**

This instruction clears floating point exceptions. The exceptions to clear are identified by the bits set in the union of integer register Rs1 and an immediate field in the instruction. Either the immediate or Rs1 should be zero.

**Instruction Format: FLT1**

**Execution Units:** All Floating Point

**Operation:**

**Exceptions:**

|  |  |
| --- | --- |
| Bit | Exception Enabled |
| 0 | global invalid operation; clears the following: division of infinities, zero divided by zero, subtraction of infinities, infinity times zero, NaN comparison, division by zero |
| 1 | overflow |
| 2 | underflow |
| 3 | divide by zero |
| 4 | inexact operation |
| 5 | summary exception |
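
The mask selection described above can be sketched in Python. It is a model under assumptions: the exception flags are held in a hypothetical status value whose bit positions match the table, and the function name is illustrative.

```python
def fcx(status: int, rs1: int = 0, imm: int = 0) -> int:
    """Clear the exception bits selected by the union of Rs1 and the immediate."""
    mask = (rs1 | imm) & 0x3F                    # bits 0..5 are defined in the table
    return status & ~mask

# Clear overflow (bit 1) and underflow (bit 2) using the immediate, with Rs1 zero.
print(bin(fcx(0b111111, imm=0b000110)))
```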

## FDX – Floating Disable Exceptions

**Description:**

This instruction disables floating point exceptions. The exceptions disabled are identified by the bits set in the union of integer register Rs1 and an immediate field in the instruction. Either the immediate or Rs1 should be zero. Exceptions will not be disabled until the instruction commits and the state of the machine is updated. This instruction should be followed by a synchronization instruction (FSYNC) to ensure that following floating point operations recognize the disabled exceptions.

|  |  |
| --- | --- |
| Bit | Exception Enabled |
| 0 | global invalid operation; clears the following: division of infinities, zero divided by zero, subtraction of infinities, infinity times zero, NaN comparison, division by zero |
| 1 | overflow |
| 2 | underflow |
| 3 | divide by zero |
| 4 | inexact operation |
| 5 | summary exception |

**Instruction Format: FXX**

**Clock Cycles: 2**

**Execution Units:** Floating Point

## FDIV – Floating point divide

**Description:**

Divide two floating point numbers in registers Frs1 and Frs2 and place the result into target register Frd.

**Instruction Format: FLT2**

**Clock Cycles: 28 (est).**

**Execution Units:** Floating Point

## FEX – Floating Enable Exceptions

**Description:**

This instruction enables floating point exceptions. The exceptions enabled are identified by the bits set in the union of integer register Rs1 and an immediate field in the instruction. Either the immediate or Rs1 should be zero. Exceptions will not be enabled until the instruction commits and the state of the machine is updated. This instruction should be followed by a synchronization instruction (FSYNC) to ensure that following floating point operations recognize the enabled exceptions.

**Instruction Format: FXX**

**Clock Cycles: 2**

**Execution Units:** Floating Point

## FINITE – Number is Finite

**Description:**

Test the value in Frs1 to see if it’s a finite number and return Z=1 or Z = 0 in compare result register Crt.

**Instruction Format: FLT1**

**Clock Cycles: 1**

**Execution Units:** Floating Point

**Example**:

finite $cr1,$f7

## FMUL – Floating point multiplication

**Description:**

Multiply two floating point numbers in registers Frs1 and Frs2 and place the result into target register Frd.

**Instruction Format: FLT2**

**Clock Cycles: 7**

**Execution Units:** Floating Point

## FNABS – Floating Negative Absolute Value

**Description:**

Take the negative absolute value of the floating-point number in register Frs1 and place the result into target register Frd. The sign bit (bit 63) of the register is set to one. No rounding of the number occurs.

**Instruction Format: FLT1**

**Clock Cycles: 1**

**Execution Units:** Floating Point

## FNEG – Floating Negative Value

**Description:**

Negate the value of the floating-point number in register Frs1 and place the result into target register Frd. The sign bit (bit 63) of the register is inverted. No rounding of the number occurs.

**Instruction Format: FLT1**

**Clock Cycles: 1**

**Execution Units:** Floating Point

## FRES – Reciprocal Estimate

**Description:**

This instruction approximates the reciprocal of the value in Frs1 using a 1024-entry, 16-bit precision lookup table with linear interpolation between entries, giving a piece-wise approximation of the reciprocal. The value is returned in Frd as a 64-bit floating-point value. The value returned is accurate to about eight bits.

**Instruction Format: FLT1**

**Clock Cycles: 5**

**Execution Units:** Floating Point
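
The general technique of a table lookup with linear interpolation can be sketched in Python as below. This illustrates the idea only and does not model the hardware table or its accuracy: the table contents, the 2^15 scaling, the indexing by the mantissa in [1, 2) and the handling of the last interval are all assumptions.

```python
import math

TABLE_SIZE = 1024
SCALE = 1 << 15                                  # assumed 16-bit fixed-point scaling
# Entry i holds 1 / (1 + i/1024), rounded to the fixed-point grid.
TABLE = [round(SCALE / (1 + i / TABLE_SIZE)) for i in range(TABLE_SIZE)]

def fres(x: float) -> float:
    """Estimate 1/x for a finite, non-zero x."""
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        raise ValueError("sketch handles finite, non-zero inputs only")
    m, e = math.frexp(abs(x))                    # abs(x) = m * 2**e with 0.5 <= m < 1
    m, e = m * 2.0, e - 1                        # renormalize so that 1 <= m < 2
    pos = (m - 1.0) * TABLE_SIZE
    i = int(pos)
    frac = pos - i
    lo = TABLE[i]
    hi = TABLE[i + 1] if i + 1 < TABLE_SIZE else SCALE // 2   # 1/2 at the end point
    recip_m = (lo + (hi - lo) * frac) / SCALE    # interpolate between table entries
    return math.copysign(math.ldexp(recip_m, -e), x)

print(fres(3.0), 1 / 3)
```

An estimate of this kind is typically used as the starting point for a refinement step such as Newton-Raphson iteration when more precision is needed.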

## FRSQRTE – Float Reciprocal Square Root Estimate

**Description:**

Estimate the reciprocal of the square root of the number in register Frs1 and place the result into target register Frd.

**Instruction Format: FLT1**

**Clock Cycles: 5**

**Execution Units:** Floating Point

**Notes**:

The estimate is only accurate to about 3%. The estimate is performed in single precision (32-bit) floating point, then converted to a 64-bit format. That means that input values must be in the range of a 32-bit floating point number. Values outside of this range will return infinity or zero as a result.

Taking the reciprocal square root of a negative number results in a NaN output.

## FSIGN – Floating Sign

**Description:**

FSIGN returns a value indicating the sign of the floating-point number. If the value is zero, the target register is set to zero. If the value is negative the target register is set to the floating-point value -1.0. Otherwise the target register is set to the floating-point value +1.0. No rounding of the result occurs.

**Instruction Format: FLT1**

**Clock Cycles: 1**

**Execution Units:** Floating Point

## FSQRT – Floating point square root

**Description:**

Take the square root of the floating-point number in register Frs1 and place the result into target register Frd. The sign bit (bit 63) of the register is set to zero. This instruction can generate NaNs.

**Instruction Format: FLT1**

**Clock Cycles: 64 (est).**

**Execution Units:** Floating Point

## FSUB – Floating point subtraction

**Description:**

Subtract two floating-point numbers in registers Frs1 and Frs2 and place the result into target register Frd.

**Instruction Format: FLT2**

**Clock Cycles: 6**

**Execution Units:** Floating Point

## FSYNC – Synchronize

**Description**:

All floating-point instructions before the FSYNC are completed and committed to the architectural state before floating-point instructions after the FSYNC are issued. This instruction is used to ensure that the machine state is valid before subsequent instructions are executed.

**Instruction Format**: FSYNC

**Clock Cycles**: varies depending on queue contents

## FTOI – Floating Convert to Integer

**Description:**

Convert the floating-point value in Frs1 into an integer and place the result into a target register. The target register may be either another floating-point register or an integer register. If the result overflows the value placed in the target is a maximum integer value. Note that the result in the target register is no longer of a floating-point representation.

**Instruction Format: FLT1**

**Clock Cycles: 3**

**Execution Units:** Floating Point

## FTRUNC – Truncate Value

**Description**:

The FTRUNC instruction truncates the fractional portion of the number, leaving only a whole value. For instance, ftrunc(1.5) equals 1.0. FTRUNC does not change the representation of the number. To convert a value to an integer in a fixed-point representation see the FTOI instruction.

**Instruction Format**: FLT1

**Clock Cycles**: 1

**Execution Units:** Floating Point

## ISNAN – Is Not a Number

**Description:**

Test the value in Frs1 to see if it is a NaN (not a number) and return true (Z=1) or false (Z=0) in compare result register Cd.

**Instruction Format: FLT1**

**Clock Cycles: 1**

**Execution Units:** Floating Point

**Example**:

isnan $cr1,$f7

## ITOF – Convert Integer to Float

**Description:**

Convert the integer value in Rs1 into a floating-point value and place the result into target register Frd. Rs1 is from either the floating-point register set or the integer register set; Frd is in the floating-point register set. Some precision of the converted integer may be lost if the integer is larger than 52 bits, as 64-bit precision floating-point values only have a precision of 52 bits.

**Instruction Format: FLT1**

**Clock Cycles: 3**

**Execution Units:** Floating Point
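
The precision loss can be observed with a small Python sketch. It assumes the 64-bit format behaves like IEEE 754 binary64, which is what Python floats use on mainstream platforms; that format stores 52 fraction bits plus an implicit leading one, so in this sketch the first integer that fails to convert exactly is 2^53 + 1. The helper name `converts_exactly` is hypothetical.

```python
def converts_exactly(n):
    """True when integer n survives a round trip through a binary64 float."""
    return int(float(n)) == n

# Integers that fit in the significand convert without loss.
small = 2**53
# One more significant bit than the significand holds: the low bit is rounded away.
large = 2**53 + 1

print(converts_exactly(small), converts_exactly(large))
```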

# Operating Systems Support

## Overview

There is a hodgepodge of instructions that are used primarily by the operating system. These instructions have been collected as the operating systems support group of instructions. Most of them cannot be executed by user code.

## CACHE – Cache Command

CACHE Cmd, [Rn]

**Description:**

This instruction commands the cache controller to perform an operation. Commands are summarized in the command table below. Commands may be issued to both the instruction and data cache at the same time. The address of the cache line to be invalidated is passed in Rs1 if needed.

**Instruction Formats**: CACHE

**Commands:**

|  |  |  |
| --- | --- | --- |
| IC2 | Mnemonic | Operation |
| 0 | NOP | no operation |
| 1 | invline | invalidate line associated with given address |
| 2 | invall | invalidate the entire cache (address is ignored) |

|  |  |  |
| --- | --- | --- |
| DC3 | Mnemonic | Operation |
| 0 | NOP | no operation |
| 1 | enable | enable cache (instruction cache is always enabled) |
| 2 | disable | disable cache (not valid for the instruction cache) |
| 3 | invline | invalidate line associated with given address |
| 4 | invall | invalidate the entire cache (address is ignored) |

Notes:

## GCCLR – Garbage Collect Clear Memory

**Description**:

The GCCLR instruction returns in Rd the current status of the card memory associated with garbage collection identified by Rs1, then sets the memory to the value in Rs2 (normally x0) in preparation for a subsequent garbage collection pass. The garbage collector can tell very quickly whether a pointer store has occurred in a memory region. It need only examine two bytes at the innermost layer. The GCCLR instruction returns the status 64 bits at a time.

**Rs1 Format:**

|  |  |  |  |
| --- | --- | --- | --- |
| 61 32 | 31 | 30 28 | 27 0 |
| ~ | P | Layer | Address27..0 |

The P bit causes the instruction to preserve the current contents of the card and allows reading the memory without clearing it.

|  |  |  |
| --- | --- | --- |
| Layer | Size | Address field (64-bit chunks) |
| 0 | 2 bytes | 0 |
| 1 | 256 bytes | 0 to 31 |
| 2 | 256kB | 0 to 32767 |
|  |  |  |

**Instruction Format**: GCCLR

**Exceptions**: none

## GCSUB – Garbage Collect Subtract

**Description**:

Subtract Rs2 or an immediate value from Rs1 and place the result in the destination register Rd. Also clear the garbage collect interrupt enable bit in the user interrupt enable CSR (CSR $004) and load a lockout count into an internal instruction count register. Once the lockout count has expired the interrupt enable bit will be set, enabling GC interrupts. The value loaded into the lockout count is four plus the value in Rs2 or the immediate value shifted right twice.

**Instruction Format**: R2, RI

**Exceptions:** none
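
The lockout count calculation can be written out as a short Python sketch. It assumes that only the operand (Rs2 or the immediate) is shifted and that four is added afterwards, which is one reading of the description; the function name `gc_lockout_count` is hypothetical.

```python
def gc_lockout_count(operand):
    """Lockout count under the assumed reading: 4 + (operand >> 2)."""
    if operand < 0:
        raise ValueError("sketch assumes a non-negative operand")
    return 4 + (operand >> 2)

# Example: subtracting 64 would lock out GC interrupts for 20 counts.
print(gc_lockout_count(64))
```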

## MVMAP – Move Mapping Register

**Description**:

The MVMAP instruction is used for mapping memory pages into the address space of a task.

MVMAP works in a manner like the CSR instruction, but is applied for mapping register access only. Register Rs2 indirectly identifies the map register to access. Note that Rs2 is an integer register that contains the map register number. Rs1 identifies new source data for the map register, and Rd specifies the register to put the current map register value into. New source data and the current data in the map register are swapped in an atomic fashion.

Specifying Rs1 as x0 causes the map move operation to only output the current map value without updating it.

The Rs2 field specifies a 32-bit value broken into two fields. The low order twelve bits are a map register number for a given task. Bits 16 to 20 specify the task number for which the map is updated. The mapping register is only nine bits wide. Upper bits from the source register are ignored.

### Rs2 Value Format

|  |  |  |  |
| --- | --- | --- | --- |
| 31 21 | 20 16 | 15 12 | 11 0 |
| ~ | ASID | ~ | Virtual Page Number |

### Rs1 / Rd Value Format

|  |  |  |  |
| --- | --- | --- | --- |
| 31 21 | 20 16 | 15 14 | 13 0 |
| ~ | ~ | ~ | Physical Page Number |

**Instruction Format**: OSR2

**Execution Units**: OSU

**Exceptions**: none
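
As an illustration of the Rs2 value format above, the following Python sketch packs a task number and a map register number into the value that would be placed in Rs2. The function name `mvmap_selector` and the range checks are assumptions made for the example.

```python
def mvmap_selector(asid, map_register):
    """Build the Rs2 value: bits 20..16 = ASID, bits 11..0 = map register number."""
    if not 0 <= asid < (1 << 5):
        raise ValueError("ASID must fit in five bits")
    if not 0 <= map_register < (1 << 12):
        raise ValueError("map register number must fit in twelve bits")
    return (asid << 16) | map_register

# Map register $10 of task 3.
print(hex(mvmap_selector(3, 0x10)))
```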

## MVSEG – Move Segment Register

**Description**:

MVSEG works in a manner like the CSR instruction, but is applied for segment register access only. Register Rs2 indirectly identifies the segment register to access. Note that Rs2 is an integer register that contains the segment register number. Rs1 identifies source data for the segment register, and Rd specifies the register to put the current segment register value into. New source data and the current data in the segment register are swapped in an atomic fashion.

*The MVSEG instruction works in an indirect fashion so that the segment register specified may come from a variable, possibly in a loop. There could also be a direct move to / from segment register instruction, but segment register manipulation is governed by the OS and infrequently done.*

**Instruction Format**: MVSEG

**Exceptions**: none

## PEEKQ – Peek at Queue

**Description**:

This instruction returns the top value into Rd from the hardware queue specified in Rs1. The hardware queue position is not advanced. Unused value bits should read as zero. Use the STATQ instruction to get the queue status.

**Instruction Format**: PEEKQ

**Exceptions:** none

## PFI – Poll for Interrupt

**Description**:

This instruction causes the processor to check for the presence of an interrupt, then perform interrupt processing if an interrupt is present. Otherwise program execution continues with the next instruction. Interrupts do not have to be enabled for the PFI instruction to perform interrupt processing. Effectively PFI temporarily enables interrupts for the duration of the instruction.

**Instruction Format**: PFI

**Exceptions**: none

## POPQ – Pop from Queue

**Description**:

This instruction pops a value into Rd from the hardware queue specified in Rs1. The hardware queue position is advanced. Unused value bits should read as zero. To check the queue status, use the STATQ instruction.

|  |
| --- |
| 63 0 |
| Value |

Value: the value that was pushed to the queue

**Instruction Format**: PUSHQ

**Exceptions:** none

## PUSHQ – Push on Queue

**Description**:

This instruction pushes an N-bit value in Rs1 onto the hardware queue specified in Rs2, where N is implementation defined between 1 and 64 bits. To check the queue status, use the STATQ instruction.

**Instruction Format**: PUSHQ

**Exceptions:** none

## SETKEY – Set Memory Key

**Description**:

The SETKEY instruction both gets the current key and sets the memory key for a given physical page of memory. Memory keys are stored in their own dedicated memory, one key for each memory page in the system. Register Rs1 is used to specify both which key to set and the key value to set. The lower 20 bits of Rs1 specify the key value, bits 32 to 45 specify the memory page. The current value of the key is returned in Rd.

It is possible to simply retrieve the key without setting it. If the G bit is set in Rs1 the key will be retrieved without setting it.

*The author feels that using a dedicated instruction for the purpose of setting a memory key is worth it for the conveyance of meaning in software programs. While the key memory could be part of the memory map and accessible with load / store instructions it is not implemented as such.*

*There is room to increase the size of the key or the number of memory pages in the register value format.*

**Rs1 Format:**

|  |  |  |  |  |
| --- | --- | --- | --- | --- |
| 63 46 | 45 32 | 31 | 30 20 | 19 0 |
| ~ | Physical Page Number | G | ~ | Key Value |

**Instruction Format**: OSR2

**Exceptions**: none

Example:

|  |
| --- |
| LDI $a1,#$040080000000<br>SETKEY $a0,$a1 ; get key for page $400 |
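
The Rs1 layout can also be expressed as a small Python sketch that builds the operand value. The function name `setkey_operand` and its parameters are hypothetical; only the bit positions come from the format table above.

```python
G_BIT = 1 << 31            # get-only flag
KEY_BITS = 20              # key value occupies bits 19..0
PAGE_BITS = 14             # physical page number occupies bits 45..32

def setkey_operand(page, key=0, get_only=False):
    """Pack a physical page number, key value and G bit into an Rs1 value."""
    if not 0 <= page < (1 << PAGE_BITS):
        raise ValueError("page number out of range")
    if not 0 <= key < (1 << KEY_BITS):
        raise ValueError("key value out of range")
    value = (page << 32) | key
    if get_only:
        value |= G_BIT
    return value

# Reproduces the constant loaded in the example: get the key for page $400.
print(hex(setkey_operand(0x400, get_only=True)))
```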

## STATQ – Get Status of Queue

**Description**:

This instruction returns a queue status value into Rd from the hardware queue specified in Rs1. The hardware queue position is not advanced. Unused value bits should read as zero.

|  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- |
| 63 | 62 | 61 54 | 53 48 | 47 10 | 9 0 |
| Qe | Dv | ~ | ~ | ~ | Data Count |

Fields

Qe: queue empty. If set, this bit indicates that the queue is empty.

Dv: data valid. If set, this bit indicates that valid data is present at the queue.

Dc: data count. The number of items left in the queue.

**Instruction Format**: POPQ

**Exceptions:** none
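
A status value returned by STATQ could be unpacked as in the Python sketch below. It assumes the data count occupies bits 9 to 0, as the status table shows; the function name `decode_statq` and the dictionary keys are hypothetical.

```python
def decode_statq(status):
    """Split a 64-bit STATQ result into its documented fields."""
    return {
        "queue_empty": bool((status >> 63) & 1),   # Qe
        "data_valid": bool((status >> 62) & 1),    # Dv
        "data_count": status & 0x3FF,              # Dc, bits 9..0
    }

# A queue holding five items with valid data at its head.
print(decode_statq((1 << 62) | 5))
```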

## TLBRW – Read / Write TLB

**Description**:

This instruction both reads and writes the TLB. The value in Rs1 selects the translation entry to update, and the update value comes from Rs2. Rs2 contains the virtual page number, ASID, and physical page number. The current value of the entry selected by Rs1 is copied to Rd. The TLB will be written only if bit 63 of Rs1 is set.

The entry number for Rs1 comes from virtual address bits 14 to 23.

Page numbers are in terms of a 16kB page size.

Rs1 Value Format

|  |  |  |  |
| --- | --- | --- | --- |
| 63 | 62 12 | 11 10 | 9 0 |
| w | ~ | way | entry no |

Rs2/Rd Value Format

|  |  |  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 63 56 | 55 | 54 | 53 | 52 48 | 47 32 | 31 20 | 19 0 |
| ASID | G | D | A | ~ | Virtual Page Number | ~ | Physical Page Number |

|  |  |  |
| --- | --- | --- |
| Bits |  | Meaning |
| 0 to 19 | PPN | Physical page number |
| 20 to 31 | ~ | reserved (expansion of physical page number) |
| 32 to 47 | VPN | Virtual page number, high-order virtual address bits 24 to 39 |
| 48 to 52 | ~ | reserved (expansion of virtual page number) |
| 53 | A | Accessed, set if translation was used |
| 54 | D | Dirty, set if a write occurred to the page |
| 55 | G | Global, global translation indicator |
| 56 to 63 | ASID | ASID address space identifier |

**Instruction Format**: OSR2

**Exceptions:** none
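
The value formats above can be illustrated with a Python sketch that derives the entry number from a virtual address and packs the Rs1 and Rs2 values. The function names are hypothetical, and the sketch assumes the VPN field occupies bits 47 to 32 as shown in the Rs2/Rd value format.

```python
def tlb_entry_number(vaddr):
    """Entry number: virtual address bits 23..14 (16kB pages)."""
    return (vaddr >> 14) & 0x3FF

def tlb_vpn_field(vaddr):
    """VPN field: virtual address bits 39..24."""
    return (vaddr >> 24) & 0xFFFF

def tlb_select(entry, way, write):
    """Rs1 value: bit 63 = write, bits 11..10 = way, bits 9..0 = entry number."""
    return (int(bool(write)) << 63) | ((way & 0x3) << 10) | (entry & 0x3FF)

def tlb_entry(asid, vpn, ppn, global_=False, dirty=False, accessed=False):
    """Rs2 value: ASID, G, D, A flags, virtual and physical page numbers."""
    return ((asid & 0xFF) << 56) | (int(global_) << 55) | (int(dirty) << 54) \
        | (int(accessed) << 53) | ((vpn & 0xFFFF) << 32) | (ppn & 0xFFFFF)

vaddr = 0x12345678
print(hex(tlb_select(tlb_entry_number(vaddr), way=2, write=True)))
print(hex(tlb_entry(asid=0x12, vpn=tlb_vpn_field(vaddr), ppn=0x56)))
```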

# Assembler

## Overview

The assembler is a flexible and powerful macro assembler capable of generating several types of output file from the input source. The assembler is flexible in its recognition of mnemonics. There are many alternate mnemonics to the standard set that are recognized to aid porting software from other architectures.

## Recognized Mnemonics

## Register Names

Registers may optionally be preceded by a ‘$’ as in $a0. But the assembler will recognize register names without the ‘$’ as well.

The assembler will recognize the general-purpose registers either by their ABI usage names or by the name ‘xn’ as in ‘x0’, ‘x1’, ‘x2’ and so on. General-purpose registers given specific usages in the ABI may also be referenced by that set of names. For instance, the first argument register may be referenced as ‘a0’ which is also ‘x20’. The names of the registers recognized by the assembler are shown in the table below.

|  |  |  |
| --- | --- | --- |
| Register | Description / Suggested Usage | Saver |
| x0 | always reads as zero (hardware) |  |
| x2 | constant building / temporary (cb) |  |
| x3-x9 | temporaries (t0-t6) | caller |
| x10-x19 | register variables (s0-s9) | callee |
| x20-x27 | function arguments (a0-a7) | caller |
| x28 | thread pointer (tp) |  |
| x29 | global data pointer (gp) | callee |
| x30 | base / frame pointer (fp) | callee |
| x31 | current stack pointer (sp) | callee |
|  |  |  |
| cr0-cr3 | compare results |  |
| ra0 | return address register |  |
| ra1 | alternate return address register |  |
| cn | code index register |  |
|  |  |  |
| eip | exceptioned instruction pointer |  |

$cr refers to all the compare results registers as an aggregate.

$cr0 to $cr3 refer to individual compare results registers. These registers may be referenced by load and store operations, but it is more efficient to load or store all the registers at the same time by referencing $cr in the load / store instruction.

The assembler will interpret $ra as if it were $ra0 when specifying the return address register.

## Pseudo Operations

The assembler supports an assortment of pseudo ops which give commands to the assembler as it assembles programs. Pseudo ops may be preceded with a dot ‘.’ as in ‘.include’.

|  |  |
| --- | --- |
| .align < num > | aligns the current segment according to the specified number. This causes the assembler to output bytes until the requested alignment is reached. |
| .bss | indicates the current section is uninitialized data |
| .code | indicates that the current section is a code section; the code section may also include read-only data. |
| .data | indicates the current section is pre-initialized data |
| .include <file path> | includes another file for processing |
| .message | displays a message to the console as the assembler is working. It can be used as a debug aid. |
| .org | sets the origin point (address) for the current section |
| .rodata | indicates that the current section is read-only data. |

### .align

Align the current segment according to the specified number. This causes the assembler to output bytes until the requested alignment is reached. The output byte depends on the section. For code segments a NOP byte is output to allow the code to be executable. An align may therefore be placed in the middle of a section of code, for instance to align loops. For other sections, a zero byte is output.

Example: aligning code

|  |
| --- |
| or $t5,$t5,$t3 ; t5 = max in ASID<br>; Align code to fit loop onto cache line (NOPs output)<br>**align 16**<br>.0001:<br>mvmap. $a0,$x0,$t2 ; get map entry into a0<br>beq $cr0,.empty0 ; is it empty?<br>add $t2,$t2,#1<br>cmp $cr0,$t2,$t5<br>bltu $cr0,.0001 |
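
The amount of padding an align directive emits can be worked out as in the Python sketch below. The function names and the `fill` parameter are hypothetical; the actual NOP byte value is not given here, so the caller supplies it.

```python
def padding_needed(address, alignment):
    """Bytes to emit so that address becomes a multiple of alignment."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return -address % alignment

def align_section(data, base, alignment, fill):
    """Append fill bytes to a section image that starts at address base."""
    pad = padding_needed(base + len(data), alignment)
    return data + bytes([fill]) * pad

# Three bytes at $1000 need thirteen more bytes to reach a 16-byte boundary.
print(padding_needed(0x1003, 16))
```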

### .bss

Specifies the current section is uninitialized data. The section will not contain any data and will not occupy space in the output file. However, address values are still calculated for the .bss section as if it were present.

### .org

.org will set the section’s address the first time it is encountered in a section. After the first encounter the .org directive will fill the section with zero bytes up until the specified origin point is reached. Care should be taken if specifying org more than once in a section as it can result in the generation of large files.

## Defining Constants

Constants may be defined in any section using one of the constant definition pseudo ops. Multiple constants may be listed under the same pseudo-op by separating them with commas. Strings of constants may be defined by enclosing them in double quotes. Individual character constants can be defined by enclosing the value in single quotes.

|  |  |
| --- | --- |
| Pseudo-op | Usage |
| dcb | define constant bytes (8-bit values) |
| dcw | define constant wydes (16-bit values) |
| dct | define constant tetra-bytes (32-bit values) |
| dco | define constant octa-bytes (64-bit values) |
| dch | define constant hexi-bytes (128-bit values) |

Example: the following example defines a table of wyde values, used to classify characters.

|  |
| --- |
| public rodata<br>dcw 0<br>\_\_ctyptbl:<br>dcw 0, \_BB, \_BB, \_BB, \_BB, \_BB, \_BB, \_BB<br>dcw \_BB, \_CN, \_CN, \_CN, \_CN, \_CN, \_BB, \_BB<br>dcw \_BB, \_BB, \_BB, \_BB, \_BB, \_BB, \_BB, \_BB<br>dcw \_BB, \_BB, \_BB, \_BB, \_BB, \_BB, \_BB, \_BB<br>dcw \_SP, \_PU, \_PU, \_PU, \_PU, \_PU, \_PU, \_PU<br>dcw \_PU, \_PU, \_PU, \_PU, \_PU, \_PU, \_PU, \_PU<br>dcw XDI, XDI, XDI, XDI, XDI, XDI, XDI, XDI<br>dcw XDI, XDI, \_PU, \_PU, \_PU, \_PU, \_PU, \_PU<br>dcw \_PU, XUP, XUP, XUP, XUP, XUP, XUP, \_UP<br>dcw \_UP, \_UP, \_UP, \_UP, \_UP, \_UP, \_UP, \_UP<br>dcw \_UP, \_UP, \_UP, \_UP, \_UP, \_UP, \_UP, \_UP<br>dcw \_UP, \_UP, \_UP, \_PU, \_PU, \_PU, \_PU, \_PU<br>dcw \_PU, XLO, XLO, XLO, XLO, XLO, XLO, \_LO<br>dcw \_LO, \_LO, \_LO, \_LO, \_LO, \_LO, \_LO, \_LO<br>dcw \_LO, \_LO, \_LO, \_LO, \_LO, \_LO, \_LO, \_LO<br>dcw \_LO, \_LO, \_LO, \_PU, \_PU, \_PU, \_PU, \_BB<br>endpublic |

Example: the following example defines a character string for an error message.

|  |
| --- |
| msgNumTooBig dcb "Number is too big",CR,0 |

## Macros

### Overview

Macros allow text associated with the name to be substituted wherever the macro name is found. Macros must be defined before they are referenced in the source text. Macros may be defined with parameters and accept arguments whenever the macro is invoked.

Macros allow short sequences of code that are repeated in several places to be written once and used like a subroutine call. Unlike a subroutine, a macro has no return instruction because its body is placed inline with the existing code. A macro may thus be used to copy source text to another place in the program. Macros are referenced by name in the source text; when the assembler encounters a macro name, it expands the name into the body of the macro.

### Nesting

Macros may contain references to other macros. Care must be taken as expanded macros could become quite large.

### Macro Definition Syntax

Macros start with the ‘macro’ keyword followed by the name of the macro and end with the ‘endm’ keyword. The code sample below shows basic macro usage.

|  |
| --- |
| macro mSleep(tm)<br>ldi $a0,#5 ; FMTK Sleep() function<br>ldi $a1,#tm<br>brk #240<br>endm |

#### Parameters

Parameters to a macro follow the macro name and are listed enclosed in round brackets separated by commas.

### Macro Instances

Each time a macro is instanced it is given an instance number to allow the generation of local variable and label names. The instance number is represented with an ‘@’ symbol in the text. The example in the Local Labels section shows the usage of the instance number.

### Local Labels

Labels local to the macro may be generated using the macro instance number indicator ‘@’ after the label name.

|  |
| --- |
| macro mWaitForFocus<br>.WFF1@:<br>mov $t2,$a1<br>mHasFocus<br>tst $a1<br>bne .HasFocus@<br>ldi $a0,#26 ; FMTK\_IO<br>mov $a1,$t2<br>ldi $a2,#9 ; peekchar function<br>brk #240<br>cmp $a0,#$14 ; CTRL-T<br>bne .WFF2@<br>; eat up the CTRL-T<br>ldi $a0,#26 ; FMTK\_IO<br>mov $a1,$t2<br>ldi $a2,#8 ; getchar function<br>brk #240<br>ldi $a0,#21 ; switch IO Focus<br>brk #240<br>jmp .WFF1@<br>.WFF2@:<br>mSleep(1)<br>jmp .WFF1@<br>.HasFocus@:<br>endm |
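
How parameter substitution and the instance number could combine during expansion is sketched below in Python. This is a simplified model, not the assembler's code: the `Macro` class is hypothetical, and the way the instance number is spelled in the generated label (a plain decimal number here) is an assumption.

```python
import re

class Macro:
    """Toy model of a macro with parameters and '@' instance numbering."""

    def __init__(self, name, params, body):
        self.name = name
        self.params = params
        self.body = body
        self.instances = 0          # number of expansions so far

    def expand(self, *args):
        if len(args) != len(self.params):
            raise ValueError("wrong number of macro arguments")
        self.instances += 1
        # Each expansion gets its own instance number in place of '@'.
        text = self.body.replace("@", str(self.instances))
        # Replace whole-word parameter names with the supplied arguments.
        for param, arg in zip(self.params, args):
            pattern = r"\b" + re.escape(param) + r"\b"
            text = re.sub(pattern, lambda _m, a=arg: str(a), text)
        return text

wait = Macro("mWait", ["tm"], ".loop@:\n ldi $a1,#tm\n jmp .loop@")
print(wait.expand(5))   # labels end in 1
print(wait.expand(7))   # labels end in 2, so the two expansions do not clash
```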

## Linking Capabilities

The assembler is capable of source code linking. Source code library files may be included using the ‘.include’ directive. The assembler will eventually filter out unreferenced pieces of code and data that are declared public, so the final version of the output only includes the sources used. The assembler loads all the source into memory and builds one master buffer full of all the included source code. Memory usage may be substantial depending on the sources included in the build. The assembler sorts and groups the same sections from different files together. All the code is placed at the beginning, followed by all the read-only data, followed by pre-initialized data. The output is placed in a format to aid the program loader.

# PIT – Programmable Interval Timer

## Overview

Many systems have at least one timer. The timing device may be built into the CPU, but it is frequently a separate component. The programmable interval timer has many potential uses in the system: it can perform several different timing operations, including pulse and waveform generation, along with measurements. While it is possible to manage timing events strictly through software, doing so is quite challenging; a hardware timer handles the difficult-to-manage timing events and can supply precise timing. Timers are often grouped together in a single component, and in the test system there are two groups of three timers. The PIT is a 32-bit peripheral, as that is all that is needed. While powerful, the PIT turns out to be one of the simpler peripherals in the system.

## System Usage

One programmable timer component, which includes three timers, is used to generate the system time slice interrupt and timing controls for system garbage collection. The second timer component is used to aid the paged memory management unit. There is a free timing channel on the second timer component.

Each PIT is given a 256-byte memory range to respond to for I/O access. As is typical for I/O devices part of the address range is not decoded to conserve hardware.

PIT#1 is located at $FFFFFFFDC11xx

PIT#2 is located at $FFFFFFFDC12xx

## Registers

The PIT has 12 registers addressed as 32-bit I/O cells. It occupies 64 consecutive I/O locations. All registers are read-write except for the current counts which are read-only. The control registers all refer to a single control register which is accessible at three different addresses. Current count, max count and on time are all 32-bit accessible; all 32 bits must be read or written. The control register is byte accessible. It is possible to update only part of the control register. There is a separate byte for control information for each counter in the control register.

|  |  |  |  |
| --- | --- | --- | --- |
| Regno | Access | Moniker | Purpose |
| 00 | R | CC0 | Current Count |
| 04 | RW | MC0 | Max count |
| 08 | RW | OT0 | On Time |
| 0C | RW | CTRL | Control |
| 10 | R | CC1 | Current Count |
| 14 | RW | MC1 | Max count |
| 18 | RW | OT1 | On Time |
| 1C | RW |  | Control – this is the same register as 0C |
| 20 | R | CC2 | Current Count |
| 24 | RW | MC2 | Max count |
| 28 | RW | OT2 | On Time |
| 2C | RW |  | Control – this is the same register as 0C |
| 30 to 3C |  |  | reserved |
|  |  |  |  |

### Control Register

The control register is split into four independent bytes. Three of the bytes control the individual timers. The fourth byte is unused. All three timer control bytes work in the same fashion so only one is described below. The very same control register is accessible at three different I/O locations. By locating all three timer controls in a single register it is possible to precisely synchronize timing operations.

|  |  |  |  |
| --- | --- | --- | --- |
| 31 24 | 23 16 | 15 8 | 7 0 |
| not used | Timer#2 Control | Timer#1 Control | Timer#0 Control |

#### Timer Control Byte

|  |  |  |
| --- | --- | --- |
| Bit |  | Purpose |
| 0 | LD | setting this bit will load max count into current count, this bit automatically resets to zero. |
| 1 | CE | count enable, if 1 counting will be enabled, if 0 counting is disabled and the current count register holds its value. On counter underflow this bit will be reset to zero causing the count to halt unless auto-reload is set. |
| 2 | AR | auto-reload, if 1 the max count will automatically be reloaded into the current count register when it underflows. |
| 3 | XC | external clock, if 1 the counter is clocked by an external clock source. The external clock source must be of lower frequency than the clock supplied to the PIT. The PIT contains edge detectors on the external clock source and counting occurs on the detection of a positive edge on the clock source. |
| 4 | GE | gating enable, if 1 an external gate signal will also be required to be active high for the counter to count, otherwise if 0 the external gate is ignored. Gating the counter using the external gate may allow pulse-width measurement. |
| 5 to 7 | ~ | not used, reserved |
|  |  |  |
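
The control byte layout can be illustrated with a Python sketch that builds one control byte per timer and combines them into the 32-bit control register value. The constant and function names are hypothetical; only the bit positions come from the tables above.

```python
LD = 1 << 0   # load max count into current count
CE = 1 << 1   # count enable
AR = 1 << 2   # auto-reload
XC = 1 << 3   # external clock
GE = 1 << 4   # gating enable

def control_register(timer0=0, timer1=0, timer2=0):
    """Combine three timer control bytes into one 32-bit register value."""
    word = 0
    for number, byte in enumerate((timer0, timer1, timer2)):
        if byte & ~0x1F:
            raise ValueError("bits 5 to 7 of a control byte are reserved")
        word |= byte << (8 * number)
    return word

# Load, enable and auto-reload timer 0; enable timer 2; leave timer 1 stopped.
print(hex(control_register(timer0=LD | CE | AR, timer2=CE)))
```

Writing the combined value in one access starts or stops several timers together, which is the synchronization benefit the shared register provides.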

### Current Count

This register reflects the current count value for the timer. The value in this register will change by counting downwards whenever a count signal is active. The current count may be automatically reloaded at underflow if the auto reload bit (bit #2) of the control byte is set. The current count may also be force loaded to the max count by setting the load bit (bit #0) of the counter control byte.

### Max Count

This register holds onto the maximum count for the timer. It is loaded by software and otherwise does not change. When the counter underflows the current count may be automatically reloaded from the max count register.

### On Time

The on-time register determines the output pulse width of the timer. The timer output is low until the on-time value is reached, at which point the timer output switches high. The timer output remains high until the counter reaches zero, at which point the timer output is reset back to zero. So, the on time reflects the length of time the timer output is high. The timer output is low for a number of clock cycles equal to the max count minus the on time.
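
As a worked example of the relationship between max count and on time, the Python sketch below derives both values for a wanted output frequency and duty cycle. The 100 MHz clock rate and the function name `pulse_settings` are assumptions for illustration, and any off-by-one in how the counter treats zero is ignored.

```python
def pulse_settings(clock_hz, frequency_hz, duty):
    """Return (max_count, on_time, low_cycles) for a periodic pulse."""
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty cycle must be between 0 and 1")
    max_count = round(clock_hz / frequency_hz)   # clock cycles per period
    on_time = round(max_count * duty)            # cycles the output is high
    low_cycles = max_count - on_time             # cycles the output is low
    return max_count, on_time, low_cycles

# 1 kHz output with a 25% high time from an assumed 100 MHz timer clock.
print(pulse_settings(100_000_000, 1000, 0.25))
```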

## Programming

The PIT is a memory-mapped I/O device. The PIT is programmed using 32-bit load and store instructions (LDT and STT). Byte loads and stores (LDB, STB) may be used for control register access. The PIT must reside in the non-cached address space of the system.

# PIC – Programmable Interrupt Controller

## Overview

The programmable interrupt controller manages interrupt sources in the system and presents an interrupt signal to the CPU. If two interrupts occur at the same time the controller resolves which interrupt the CPU sees. While the CPU’s interrupt input is only level sensitive, the PIC may process interrupts that are either level or edge sensitive. The PIC is a 32-bit I/O device.

## System Usage

There is just a single interrupt controller in the system. It supports 31 different interrupt sources plus a non-maskable interrupt source.

PIC#1 is located at $FFDC0Fxx.

### Priority Resolution

Interrupts have a fixed priority relationship with interrupt #1 having the highest priority and interrupt #31 the lowest. Note that interrupt priorities are only effective when two interrupts occur at the same time.
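
The fixed priority rule can be modelled in a few lines of Python. The sketch assumes pending interrupts are presented as a bit mask with bit n standing for interrupt #n; that representation and the function name `highest_priority` are assumptions, not a description of the hardware.

```python
def highest_priority(pending):
    """Return the winning interrupt number (1 to 31), or None if none pending."""
    for irq in range(1, 32):          # interrupt #1 wins over #31
        if pending & (1 << irq):
            return irq
    return None

# Interrupts #5 and #7 arrive together: #5 is the one presented.
print(highest_priority((1 << 5) | (1 << 7)))
```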

## Registers

The PIC contains 40 registers spread out through a 256 byte I/O region. All registers are 32-bit and only 32-bit accessible. There are two different means to control interrupt sources. One is a set of registers that works with bit masks enabling control of multiple interrupt sources at the same time using single I/O accesses. The other is a set of control registers, one for each interrupt source, allowing control of interrupts on a source by source basis.

|  |  |  |  |
| --- | --- | --- | --- |
| Regno | Access | Moniker | Purpose |
| 00 | R | CAUSE | interrupt cause code for currently interrupting source |
| 04 | RW | RE | request enable, a 1 bit indicates interrupt requesting is enabled for that interrupt, a 0 bit indicates the interrupt request is disabled. |
| 08 | W | ID | Disables interrupt identified by low order five data bits. |
| 0C | W | IE | enables interrupt identified by low order five data bits |
| 10 |  |  | reserved |
| 14 | W | RSTE | resets the edge-sense circuit for edge sensitive interrupts, 1 bit for each interrupt source. This register has no effect on level sensitive sources. This register automatically resets to zero. |
| 18 | W | TRIG | software trigger of the interrupt specified by the low order five data bits. |
| 20 | W | ESL | The low bit for edge sensitivity selection. ESL and ESH combine to form a two-bit select of the edge sensitivity (ESH,ESL): 00 = level sensitive interrupt, 01 = positive edge sensitive, 10 = negative edge sensitive, 11 = either edge sensitive. |
| 24 | W | ESH | The high bit for edge sensitivity selection |
| 80 | RW | CTRL0 | control register for interrupt #0 |
| 84 | RW | CTRL1 | control register for interrupt #1 |
| … |  | … |  |
| FC | RW | CTRL31 | control register for interrupt #31 |

### Control Register

All the control registers are identical for all interrupt sources, so only the first control register is described here.

|  |  |  |
| --- | --- | --- |
| Bits |  |  |
| 0 to 7 | CAUSE | The cause code associated with the interrupt; this field is copied to the cause register when the interrupt is selected. |
| 8 to 11 | IRQ | This field determines which signal lines of the CPU are activated for the interrupt. |
| 12 | IE | This is the interrupt enable bit: 1 enables the interrupt, 0 disables it. This is the same bit reflected in the IE register. |
| 13,14 | ES | These bits control edge sensitivity for the interrupt: 00 = level, 01 = pos. edge sensitive, 10 = neg. edge sensitive, 11 = either edge sensitive. These same bits are present in the ESL, ESH registers. |
| 15 to 31 |  | reserved |
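
A control register value can be assembled from its fields as in the Python sketch below. The function names and range checks are hypothetical; the field positions follow the table above.

```python
def pic_control(cause, irq, enable, edge):
    """Pack CAUSE (bits 7..0), IRQ (11..8), IE (12) and ES (14..13)."""
    if not (0 <= cause < 256 and 0 <= irq < 16 and 0 <= edge < 4):
        raise ValueError("field value out of range")
    return cause | (irq << 8) | (int(bool(enable)) << 12) | (edge << 13)

def pic_control_fields(word):
    """Unpack a control register value into its fields."""
    return {
        "cause": word & 0xFF,
        "irq": (word >> 8) & 0xF,
        "enable": bool((word >> 12) & 1),
        "edge": (word >> 13) & 0x3,
    }

# Cause code $1D, IRQ lines %0010, enabled, positive edge sensitive.
print(hex(pic_control(0x1D, 0x2, True, 0b01)))
```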

# UART – Universal Asynchronous Receiver / Transmitter

## Overview

A UART (Universal Asynchronous Receiver / Transmitter) component is used for the asynchronous transmission and reception of data. Asynchronous refers to the lack of a clock signal accompanying the data during transmission or reception.

uart6551 is a WDC6551 register compatible uart. The uart is a 32-bit peripheral device. It may be used as an eight-bit peripheral by connecting the high order 24-bit data input lines to ground, and grounding select lines one to three.

Baud rate is controlled by a clock divider which assumes a 200MHz baud reference clock input. If a different clock frequency is used, then the divider table will need to be updated. The baud rate may also be controlled via a clock divider register. This register is 24 bits wide, which gives a minimum frequency of 11.92 Hz assuming a 200MHz clock (200MHz / 2^24).
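
The minimum frequency figure can be reproduced with a short Python calculation. The helper `divider_for` additionally assumes that the output frequency is simply the reference clock divided by the register value, which is not stated here; both function names are hypothetical.

```python
DIVIDER_BITS = 24
REFERENCE_HZ = 200_000_000

def minimum_frequency(clock_hz=REFERENCE_HZ):
    """Lowest frequency a 24-bit divider can produce from the reference clock."""
    return clock_hz / (1 << DIVIDER_BITS)

def divider_for(target_hz, clock_hz=REFERENCE_HZ):
    """Divider value for a target frequency, assuming output = clock / divider."""
    divider = round(clock_hz / target_hz)
    if not 1 <= divider <= (1 << DIVIDER_BITS):
        raise ValueError("target frequency is outside the divider range")
    return divider

print(round(minimum_frequency(), 2))
```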

## Special Features

* WDC6551 register compatibility

## System Usage

The uart is located at $FFDC0A0x

## Registers

There are only four registers in the design. The function of the low order eight bits of the registers matches the 6551 function. The controller honors byte lane selects so only the portion of the register selected is written.

|  |  |  |
| --- | --- | --- |
| Reg | Moniker | Description |
| 00 | UART\_TRB | Transmit and receive buffer. Data written is transmitted; a read returns the received data. Also reads / writes the clock divider if access to the clock divider is enabled. |
| 04 | UART\_STAT | Status Register. Returns status bits on a read, a write of any value will cause a reset of some of the command register bits |
| 08 | UART\_CMD | Command register |
| 0C | UART\_CTRL | Control register |

### UART\_TRB

This register is 32 bits wide, of which only the lower eight bits are used to transmit or receive data by the uart. Data written to the register is transmitted. A register read returns data received by the uart. When the FIFOs are enabled, writing to this register writes to the transmit FIFO and reading this register reads the receive FIFO. If clock divider access is enabled (via control register bit 31) then this register allows modifying or reading the clock divider value. Writing a clock divider value to this register automatically switches the function back to transmit / receive.

### UART\_STAT

Uart status register. Writing any value to the status register resets some of the uart’s command bits.

| Bit | Status | Description |
| --- | --- | --- |
| 0 | Parity Error | 1 = parity error occurred, 0 = no error |
| 1 | Framing Error | 1 = framing error |
| 2 | Overrun | 1 = overrun |
| 3 | Rx Full | 1 = receiver data available |
| 4 | Tx Empty | 1 = open slot in transmit fifo |
| 5 | DCD | 0 = data carrier present |
| 6 | DSR | 0 = data set ready |
| 7 | IRQ | 1 = irq occurred |
|  | **Additional Line Status Byte** | |
| 8 | reserved |  |
| 9 | reserved |  |
| 10 | reserved |  |
| 11 | reserved |  |
| 12 | Break received | 1 if a break signal is received |
| 13 | Tx Full | 1 = transmit fifo full |
| 14 | reserved |  |
| 15 | G Rcv Err | 1 = global receiver error (set if any error status is set) |
|  | **Additional Modem Status Byte** | |
| 16 | CTS | 1 = CTS line changed state |
| 17 | DSR | 1 = DSR line changed state |
| 18 | RI | 1 = RI line changed state |
| 19 | DCD | 1 = DCD line changed state |
| 20 | CTS | CTS state |
| 21 | reserved |  |
| 22 | RI | RI state |
| 23 | reserved |  |
|  | **IRQ Status** | |
| 24,25 | zero | these two bits are zero |
| 26 to 28 | IRQENC | encoded irq value (0 to 7) |
| 29 to 30 | reserved |  |
| 31 | irq | IRQ is set |
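
A status word read from UART\_STAT could be decoded along these lines. This is a minimal sketch that follows the bit positions in the table above; the field names are made up for illustration.

```python
# Active-high single-bit flags, keyed by bit position.
STATUS_FLAGS = {
    0: "parity_error",
    1: "framing_error",
    2: "overrun",
    3: "rx_full",
    4: "tx_empty",
    12: "break_received",
    13: "tx_full",
    15: "global_rx_error",
    31: "irq",
}


def decode_status(word):
    """Split a 32-bit UART_STAT value into named fields."""
    fields = {name: bool((word >> bit) & 1) for bit, name in STATUS_FLAGS.items()}
    # DCD (bit 5) and DSR (bit 6) read 0 when the signal is present.
    fields["carrier_present"] = not ((word >> 5) & 1)
    fields["data_set_ready"] = not ((word >> 6) & 1)
    # Bits 26 to 28 hold the encoded irq value (0 to 7).
    fields["irq_encoded"] = (word >> 26) & 0x7
    return fields


status = decode_status(0x08 | (1 << 13) | (5 << 26))
print(status["rx_full"], status["tx_full"], status["irq_encoded"])
```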

### UART\_CMD

| Bit | Field | Description |
| --- | --- | --- |
| 0 | DTR | output 1 = low, 0 = high |
| 1 | RxIe | receiver interrupt enable 0 = enabled, 1 = disabled |
| 2,3 | RTS Control |  |
|  | 00 | output RTS high |
|  | 01 | output RTS low, enable transmit interrupt |
|  | 10 | output RTS low |
|  | 11 | output RTS low, send a break signal |
| 4 | LLB | 1 = local loopback (receiver echo) |
| 5 to 7 | Parity Control |  |
|  | 000 | no parity |
|  | 001 | odd parity |
|  | 011 | even parity |
|  | 101 | transmit mark parity (parity error disabled) |
|  | 111 | transmit space parity (parity error disabled) |
| 8 | LSIe | line status change interrupt enable 1 = enabled |
| 9 | MSIe | modem status change interrupt enable 1 = enabled |
| 10 | RxToIe | receiver timeout interrupt enable 1 = enabled |
| 11 to 31 | reserved |  |
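
Assembling a command word from these fields might look like the sketch below. The function and argument names are hypothetical; the bit positions and encodings come from the table above.

```python
PARITY = {"none": 0b000, "odd": 0b001, "even": 0b011, "mark": 0b101, "space": 0b111}


def make_cmd(dtr_low=True, rx_irq=False, rts=0b10, loopback=False,
             parity="none", line_irq=False, modem_irq=False,
             rx_timeout_irq=False):
    """Build a UART_CMD value from individual field settings."""
    word = int(dtr_low) << 0            # bit 0: 1 drives DTR low
    word |= int(not rx_irq) << 1        # bit 1: 0 = enabled, 1 = disabled
    word |= (rts & 0b11) << 2           # bits 2,3: RTS control
    word |= int(loopback) << 4          # bit 4: local loopback
    word |= PARITY[parity] << 5         # bits 5 to 7: parity control
    word |= int(line_irq) << 8          # bit 8: line status interrupt enable
    word |= int(modem_irq) << 9         # bit 9: modem status interrupt enable
    word |= int(rx_timeout_irq) << 10   # bit 10: receiver timeout interrupt enable
    return word


print(hex(make_cmd()))
print(hex(make_cmd(rx_irq=True, parity="even")))
```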

### UART\_CTRL

| Bit | Field | Description |
| --- | --- | --- |
| 0 to 3 | Baud Rate | 0000 = use 16x external clock; 0001 = 50; 0010 = 75; 0011 = 109.92; 0100 = 134.58; 0101 = 150; 0110 = 300; 0111 = 600; 1000 = 1200; 1001 = 1800; 1010 = 2400; 1011 = 3600; 1100 = 4800; 1101 = 7200; 1110 = 9600; 1111 = 19200. This table is extended by control bit 27. |
| 4 | Rx clock source | 1 = external, 0 = baud rate generator |
| 5,6 | Word length | code for word length in bits: 00 = 8; 01 = 7; 10 = 6; 11 = 5 |
| 7 | Stop Bit | 0 = 1 stop bit; 1 = 1 stop bit if 8 bits and parity, 1.5 if 5 bits and no parity, 2 otherwise |
| 8 to 15 | reserved | do not use |
| 16 | Fifo enable | 1 = FIFOs enabled |
| 17 | Rx Fifo Clear | 1 = clear receiver fifo |
| 18 | Tx Fifo Clear | 1 = clear transmit fifo |
| 19 | reserved |  |
| 20,21 | Transmit Threshold | 0 = 1 byte; 1 = ¼ full; 2 = ½ full; 3 = ¾ full. Threshold for DMA signal activation. If the transmit FIFO count is less than the threshold then a DMA transfer is triggered. |
| 22,23 | Receive Threshold | 0 = 1 byte; 1 = ¼ full; 2 = ½ full; 3 = ¾ full. Threshold for DMA signal activation. If the receive FIFO count is greater than the threshold then a DMA transfer is triggered. |
| 24 | hwfc | 1 = automatic hardware flow control |
| 25 | reserved |  |
| 26 | dmaEnable | 1 = dma enabled |
| 27 | Baud Rate bit 4 | Extended baud rate selection bit, used in combination with bits 0 to 3: 10000 = 38400; 10001 = 57600; 10010 = 115200; 10011 = 230400; 10100 = 460800; 10101 = 921600; 10110 = reserved; 10111 = reserved; 11xxx = reserved |
| 28,29 | reserved |  |
| 30 | selDV | 1 = use clock divider register, 0 = use baud table |
| 31 | accessDV | 1 = access clock divider via TRB register, 0 = normal TRB operation |

Selecting the clock divider register as the baud source allows any programmable baud rate.
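
Because the five-bit baud code is split between bits 0 to 3 and bit 27, building a control word takes a little care. The sketch below covers only a subset of the baud table; the function and dictionary names are hypothetical.

```python
# Subset of the baud table: rate -> five-bit code (bit 4 goes to control bit 27).
BAUD_CODES = {
    9600: 0b01110,
    19200: 0b01111,
    38400: 0b10000,
    57600: 0b10001,
    115200: 0b10010,
}
WORD_LENGTH = {8: 0b00, 7: 0b01, 6: 0b10, 5: 0b11}


def make_ctrl(baud=115200, word_length=8, extra_stop=False,
              fifo=True, hwfc=False, dma=False):
    """Build a UART_CTRL value that selects a rate from the baud table."""
    code = BAUD_CODES[baud]
    word = code & 0xF                       # bits 0 to 3: low part of baud code
    word |= WORD_LENGTH[word_length] << 5   # bits 5,6: word length
    word |= int(extra_stop) << 7            # bit 7: stop bit select
    word |= int(fifo) << 16                 # bit 16: fifo enable
    word |= int(hwfc) << 24                 # bit 24: hardware flow control
    word |= int(dma) << 26                  # bit 26: dma enable
    word |= (code >> 4) << 27               # bit 27: baud rate bit 4
    return word                             # bits 30, 31 left 0: use baud table


print(hex(make_ctrl()))
print(hex(make_ctrl(baud=9600, word_length=7, fifo=False)))
```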

# Step-by-Step

## Instruction Fetch:

### Cycle 1:

Update compare results from previous cycle

Check for external interrupts

Compute the linear address from the current pc and base register

Clear registered instruction decodes, set instruction decode as illegal

### Cycle 2:

First wait cycle for page map lookup

### Cycle 3:

Second wait cycle for page map lookup

### Cycle 4:

Check for a cache hit on physical address

If cache miss, begin cache line fetch operation

### Cycle 5:

Align instruction retrieved from cache

## Decode Stage:

### Cycle 1:

Set decoded instructions as legal

Set target register, register write flags

Set source register Rs1

## Register Fetch Stage:

### Cycle 1:

First wait cycle for register lookup

Select special registers onto ib register

### Cycle 2:

Second wait cycle for register lookup

### Cycle 3:

Load ALU/FPU operand registers from register file output

## Execute Stage:

### Cycle 1:

Directly execute simpler instructions

Branches go back to IFETCH stage

CSR register reads

### Cycle 2:

First math stage, sign adjust operands

Wait for an access to the page map ram or TLB to complete (2 cycles).

### Cycle 3:

Second math stage, wait for math operation to complete (divide, remainder), sign correct results and capture result

## Memory Stage:

### Cycle 1:

Compute linear address from effective address and base register

Check for access rights violation if in user mode

Shift select lines and data lines into position in temp regs

### Cycle 2:

Wait for page mapping ram access

### Cycle 3:

Wait for page mapping ram access

### Cycle 4:

Check for keyed memory access violation

### Cycle 5:

Begin memory access

### Cycle 6:

Check for ack, finish memory access, increment effective address for next potential cycle

State will transition to IFETCH, WRITEBACK, or DATA\_ALIGN depending on access alignment and operation type.

### Cycle 7 to 12:

Same as cycles 1 to 6; needed if an unaligned access crosses a word boundary

### Cycle 13 to 18:

Same as cycles 1 to 6; needed when the bus is 32-bit and the access crosses a word boundary

### Data Align Cycle

Needed to align input data for load operations
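
One way to read the repeated groups of six cycles is that each bus-width word touched by an access costs one six-cycle pass. The sketch below encodes that reading; it is an interpretation of the list above rather than a description of the actual state machine, it ignores the data align cycle and wait states, and the function names are hypothetical.

```python
CYCLES_PER_PASS = 6  # cycles 1 to 6 of the memory stage


def bus_passes(address, size_bytes, bus_bytes):
    """Number of bus-width words touched by an access."""
    first_word = address // bus_bytes
    last_word = (address + size_bytes - 1) // bus_bytes
    return last_word - first_word + 1


def memory_stage_cycles(address, size_bytes, bus_bytes=8):
    """Memory stage cycles under the one-pass-per-word assumption."""
    return CYCLES_PER_PASS * bus_passes(address, size_bytes, bus_bytes)


print(memory_stage_cycles(0x1000, 8))               # aligned, 64-bit bus
print(memory_stage_cycles(0x1004, 8))               # crosses a word boundary
print(memory_stage_cycles(0x1002, 8, bus_bytes=4))  # unaligned, 32-bit bus
```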

## Writeback:

### Cycle 1:

Set compare results bus

Raise an exception for illegal instructions

Update the CSR register file

Update register files.

# Glossary

## ATC

ATC stands for address translation cache. This buffer is used to cache address translations for fast memory access in a system with an mmu capable of performing address translations. The address translation cache is more commonly known as the TLB.

## Burst Access

A burst access is several bus accesses that occur rapidly in a row in a known sequence. If hardware supports burst access the cycle time for access to the device is drastically reduced. For instance, dynamic RAM memory access is fast for sequential burst access, and somewhat slower for random access.

## BTB

An acronym for Branch Target Buffer. The branch target buffer is used to improve the performance of a processing core. The BTB is a table that stores the branch target from previously executed branch instructions. A typical table may contain 1024 entries. The table is typically indexed by part of the branch address. Since the target address of a branch type instruction may not be known at fetch time, the address is speculated to be the address in the branch target buffer. This allows the machine to fetch instructions in a continuous fashion without pipeline bubbles. In many cases the calculated branch address from a previously executed instruction remains the same the next time the same instruction is executed. If the address from the BTB turns out to be incorrect, then the machine will have to flush the instruction queue or pipeline and begin fetching instructions from the correct address.
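
A direct-mapped BTB of the kind described can be modelled in a few lines. This is a generic sketch, not the RTF64 implementation: the index function (dropping the two low address bits of a 32-bit instruction address) and the full-address tag are assumptions.

```python
class BranchTargetBuffer:
    """Direct-mapped branch target buffer indexed by part of the branch address."""

    def __init__(self, entries=1024):
        self.entries = entries
        self.tags = [None] * entries
        self.targets = [0] * entries

    def _index(self, pc):
        # Assumes 4-byte instructions, so the two low bits carry no information.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        """Return the speculated target, or None if there is no entry for pc."""
        i = self._index(pc)
        return self.targets[i] if self.tags[i] == pc else None

    def update(self, pc, target):
        """Record the target computed when the branch actually executed."""
        i = self._index(pc)
        self.tags[i] = pc
        self.targets[i] = target


btb = BranchTargetBuffer()
btb.update(0x1000, 0x2000)
print(btb.predict(0x1000), btb.predict(0x1004))
```

A mispredicted target is detected at execute time, at which point `update` would be called with the correct address and the pipeline flushed.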

## Card Memory

A card memory is a memory reserved for recording the location of pointer stores in a garbage collection system. The card memory is much smaller than main memory; there is one card memory entry per block of main memory addresses. Card memory covers memory in 128 to 512-byte sized blocks. For performance reasons a byte is usually dedicated to recording the pointer store status, even though a bit would be adequate. The location of card memory to update is found by shifting the pointer value to the right some number of bits (7 to 9 bits) and then adding the base address of the table. The update to the card memory needs to be done with interrupts disabled.
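
The address calculation for a card update can be sketched as follows. The toy memory, the table base address and the function names are made up for illustration; a real implementation would perform the store with interrupts disabled.

```python
CARD_SHIFT = 9  # 512-byte cards; a shift of 7 would give 128-byte cards


def card_address(pointer, table_base, shift=CARD_SHIFT):
    """Card entry address: (pointer >> shift) + base address of the table."""
    return table_base + (pointer >> shift)


memory = bytearray(64 * 1024)  # toy 64 KiB memory
TABLE_BASE = 0xF000            # hypothetical location of the card table


def record_pointer_store(pointer):
    """Mark the card covering 'pointer' (one byte per card)."""
    memory[card_address(pointer, TABLE_BASE)] = 1


record_pointer_store(0x1234)
print(hex(card_address(0x1234, TABLE_BASE)))
```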

## FPGA

An acronym for Field Programmable Gate Array. FPGAs consist of a large number of small RAM tables, flip-flops, and other logic. These are all connected with a programmable connection network. FPGAs are ‘in the field’ programmable, and usually re-programmable. An FPGA’s re-programmability is typically RAM based. They are often used with configuration PROMs so they may be loaded to perform specific functions.

## HDL

An acronym that stands for ‘Hardware Description Language’. A hardware description language is used to describe hardware constructs at a high level.

## Instruction Bundle

A group of instructions. It is sometimes necessary to group instructions together into a bundle. For instance, all instructions in a bundle may be executed simultaneously on a processor as a unit. Instructions may also need to be grouped if they have an odd size, for example 41 bits, so that they fit evenly into memory. Typically, a bundle has some bits that are global to the bundle, such as template bits, in addition to the encoded instructions.

## Instruction Pointer

A processor register dedicated to addressing instructions in memory. It is also often called a program counter. The program counter got its name because it usually increments (or counts) automatically after an instruction is fetched. In some rare cases in early machines the program counter did not count in a sequential binary fashion, but instead used another form of counter such as a Gray code counter or a linear feedback shift register. In some machines the program counter addresses bundles of instructions rather than individual instructions. This is common with some stack machines where multiple instructions are packed into a memory word.

## ISA

An acronym for Instruction Set Architecture. The group of instructions that an architecture supports. ISAs are sometimes categorized at the extremes as RISC or CISC. RTF64 falls somewhere in between, with features of both RISC and CISC architectures.

## Keyed Memory

A memory system that has a key associated with each page to protect access to the page. A process must have a matching key in its key list in order to access the memory page. The key is often 20 bits or larger. Keys for pages are usually cached in the processor for performance reasons. The key may be part of the paging tables.
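
The access check amounts to a membership test of the page's key in the process's key list. The sketch below uses a made-up page-to-key table and a 20-bit key width; none of the names or values are taken from the RTF64 implementation.

```python
KEY_BITS = 20
KEY_MASK = (1 << KEY_BITS) - 1

# Hypothetical page number -> key table (in hardware this may live in the paging tables).
page_keys = {0x00010: 0x00ABC, 0x00011: 0x00DEF}


def access_allowed(page_number, process_keys):
    """True if the process holds a key matching the key on the page."""
    key = page_keys[page_number] & KEY_MASK
    return key in process_keys


process_keys = {0x00ABC, 0x00123}
print(access_allowed(0x00010, process_keys))
print(access_allowed(0x00011, process_keys))
```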

## Linear Address

A linear address is the resulting address from a virtual address after segmentation has been applied.

## Physical Address

A physical address is the final address seen by the memory system after both segmentation and paging have been applied to a virtual address. One can think of a physical address as one that is “physically” wired to the memory.
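
The relationship between virtual, linear and physical addresses can be sketched in two steps. The page size, the single-level page map and the function names are assumptions chosen for illustration, not the RTF64 memory management layout.

```python
PAGE_BITS = 12               # hypothetical 4 KiB pages
ADDRESS_MASK = (1 << 64) - 1


def to_linear(virtual, segment_base):
    """Segmentation: add the segment base to the virtual address."""
    return (virtual + segment_base) & ADDRESS_MASK


def to_physical(linear, page_map):
    """Paging: replace the page number with the mapped physical page."""
    page = linear >> PAGE_BITS
    offset = linear & ((1 << PAGE_BITS) - 1)
    return (page_map[page] << PAGE_BITS) | offset


linear = to_linear(0x1234, segment_base=0x10000)
physical = to_physical(linear, page_map={0x11: 0x80})
print(hex(linear), hex(physical))
```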

## Physical Memory Attributes

Memory usually has several characteristics associated with it. In the memory system there may be several different types of memory, rom, static ram, dynamic ram, eeprom, memory mapped I/O devices, and others. Each type of memory device is likely to have different characteristics. These characteristics are called the physical memory attributes. Physical memory attributes are associated with address ranges that the memory is located in. There may be a hardware unit dedicated to verifying software is adhering to the attributes associated with the memory range. The hardware unit is called a physical memory attributes checker (PMA checker).

## Program Counter

A processor register dedicated to addressing instructions in memory. It is also often, and perhaps more aptly, called an instruction pointer. The program counter got its name because it usually increments (or counts) automatically after an instruction is fetched. In some rare cases in early machines the program counter did not count in a sequential binary fashion, but instead used another form of counter such as a Gray code counter or a linear feedback shift register. In some machines the program counter addresses bundles of instructions rather than individual instructions. This is common with some stack machines where multiple instructions are packed into a memory word.

## ROB

An acronym for ReOrder Buffer. The re-order buffer allows instructions to execute out of order yet update the machine’s state in order by tracking instruction state and variables. In FT64 the re-order buffer is a circular queue with head and tail pointers. Instructions at the head are committed to the machine’s state once they are done, and the head is then advanced. New instructions are queued at the buffer’s tail as long as there is room in the queue. Instructions in the queue may be processed in a different order from the one in which they entered, depending on the availability of resources (register values and functional units).
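
The circular-queue behaviour can be illustrated with a small model. This is a generic sketch of in-order commit with out-of-order completion; the class, its method names and the queue size are hypothetical and do not reflect the FT64 source.

```python
class ReorderBuffer:
    """Circular queue: enqueue at the tail, commit in order from the head."""

    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size
        self.head = 0
        self.tail = 0
        self.count = 0

    def enqueue(self, name):
        """Queue an instruction; returns its slot, or None if the queue is full."""
        if self.count == self.size:
            return None
        slot = self.tail
        self.slots[slot] = {"name": name, "done": False}
        self.tail = (self.tail + 1) % self.size
        self.count += 1
        return slot

    def mark_done(self, slot):
        """Instructions may finish in any order."""
        self.slots[slot]["done"] = True

    def commit(self):
        """Commit finished instructions at the head, advancing the head."""
        committed = []
        while self.count and self.slots[self.head]["done"]:
            committed.append(self.slots[self.head]["name"])
            self.slots[self.head] = None
            self.head = (self.head + 1) % self.size
            self.count -= 1
        return committed


rob = ReorderBuffer(4)
load, add = rob.enqueue("load"), rob.enqueue("add")
rob.mark_done(add)   # the younger instruction finishes first
print(rob.commit())  # nothing commits until the head is done
rob.mark_done(load)
print(rob.commit())
```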

## RSB

An acronym that stands for return stack buffer. A buffer of addresses used to predict the return address which increases processor performance. The RSB is usually small, typically 16 entries. When a return instruction is detected at time of fetch the RSB is accessed to determine the address of the next instruction to fetch. Predicting the return address allows the processing core to continuously fetch instructions in a speculative fashion without bubbles in the pipeline. The return address in the RSB may turn out to be detected as incorrect during execution of the return instruction, in which case the pipeline or instruction queue will need to be flushed and instructions fetched from the proper address.
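
A return stack buffer behaves like a small bounded stack. In the sketch below the overflow policy (discarding the oldest entry) is an assumption, and the names are hypothetical.

```python
class ReturnStackBuffer:
    """Small stack of predicted return addresses."""

    def __init__(self, entries=16):
        self.entries = entries
        self.stack = []

    def push(self, return_address):
        """Called when a call instruction is fetched."""
        if len(self.stack) == self.entries:
            self.stack.pop(0)  # assumed policy: drop the oldest entry
        self.stack.append(return_address)

    def predict_return(self):
        """Called when a return is fetched; None means no prediction."""
        return self.stack.pop() if self.stack else None


rsb = ReturnStackBuffer()
rsb.push(0x1004)  # outer call
rsb.push(0x2008)  # nested call
print(hex(rsb.predict_return()), hex(rsb.predict_return()))
```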

## SIMD

An acronym that stands for ‘Single Instruction Multiple Data’. SIMD instructions are usually implemented with extra wide registers. The registers contain multiple data items, such as a 128-bit register containing four 32-bit numbers. The same instruction is applied to all the data items in the register at the same time. For some applications SIMD instructions can enhance performance considerably.
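
The 128-bit register holding four 32-bit numbers can be modelled with plain integers. The sketch below applies one add to all four lanes, with each lane wrapping independently; the helper names are hypothetical.

```python
LANES = 4
LANE_BITS = 32
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack four 32-bit values into one 128-bit integer, lane 0 lowest."""
    reg = 0
    for lane, value in enumerate(values):
        reg |= (value & LANE_MASK) << (lane * LANE_BITS)
    return reg


def unpack(reg):
    """Split a 128-bit integer back into four 32-bit lanes."""
    return [(reg >> (lane * LANE_BITS)) & LANE_MASK for lane in range(LANES)]


def simd_add(a, b):
    """Add corresponding lanes; carries do not cross lane boundaries."""
    result = 0
    for lane in range(LANES):
        shift = lane * LANE_BITS
        lane_sum = ((a >> shift) & LANE_MASK) + ((b >> shift) & LANE_MASK)
        result |= (lane_sum & LANE_MASK) << shift
    return result


print(unpack(simd_add(pack([1, 2, 3, 0xFFFFFFFF]), pack([10, 20, 30, 1]))))
```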

## Stack Pointer

A processor register dedicated to addressing stack memory. Sometimes this register is assigned by convention from the general register pool. This register may also sometimes index into a small dedicated stack memory that is not part of the main memory system. Sometimes machines have multiple stack pointers for different purposes, but they all work on the idea of a stack. For instance, in Forth machines there are typically two stacks, one for data and one for return addresses.

## Telescopic Memory

A memory system composed of layers where each layer contains simplified data from the topmost layer downwards. At the topmost layer data is represented verbatim. At the bottom layer there may be only a single bit to represent the presence of data. Each layer of the telescopic memory uses far less memory than the layer above. A telescopic memory could be used in garbage collection systems. Normally however the extra overhead of updating multiple layers of memory is not warranted.

## TLB

TLB stands for translation look-aside buffer. This buffer is used to store address translations for fast memory access in a system with an mmu capable of performing address translations.

## Vector Length (VL register)

The vector length register controls the maximum number of elements of a vector that are processed. The vector length register may not be set to a value greater than the number of elements supported by hardware. Vector registers often contain more elements than are required by program code. It would be wasteful to process all elements when only a few are needed. To improve the processing performance only the elements up to the vector length are examined.

## Vector Mask (VM)

A vector mask is used to restrict which elements of a vector are processed during a vector operation. A one bit in a mask register enables the processing for that element, a zero bit disables it. The mask register is commonly set using a vector set operation.
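
The combined effect of the vector length and the vector mask can be shown with a short loop. This is a generic sketch: leaving masked-off and out-of-range elements unchanged in the destination is an assumption, and the function name is hypothetical.

```python
def vector_add(va, vb, vdest, vl, vm):
    """Add va and vb element-wise under vector length vl and mask vm."""
    out = list(vdest)
    for i in range(min(vl, len(va))):  # only elements below the vector length
        if (vm >> i) & 1:              # a one bit enables the element
            out[i] = va[i] + vb[i]
    return out


print(vector_add([1, 2, 3, 4], [10, 20, 30, 40], [0, 0, 0, 0], vl=3, vm=0b0101))
```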

# Miscellaneous

## Reference Material

Below is a short list of some of the reading material the author has studied. The author has also downloaded a fair number of documents on computer architecture from the web, too many to list.

*Modern Processor Design Fundamentals of Superscalar Processors by John Paul Shen, Mikko H. Lipasti. Waveland Press, Inc.*

*Computer Architecture A Quantitative Approach, Second Edition, by John L Hennessy & David Patterson, published by Morgan Kaufmann Publishers, Inc. San Francisco, California* is a good book on computer architecture. There is a newer edition of the book available.

Memory Systems Cache, DRAM, Disk by Bruce Jacob, Spencer W. Ng, David T. Wang, Samuel Rodriguez, Morgan Kaufmann Publishers

PowerPC Microprocessor Developer’s Guide, SAMS publishing. 201 West 103rd Street, Indianapolis, Indiana, 46290

80386/80486 Programming Guide by Ross P. Nelson, Microsoft Press

Programming the 286, C. Vieillefond, SYBEX, 2021 Challenger Drive #100, Alameda, CA 94501

Tech. Report UMD-SCA-2000-02 ENEE 446: Digital Computer Design — An Out-of-Order RiSC-16

Programming the 65C816, David Eyes and Ron Lichty, Western Design Center Inc.

Microprocessor Manuals from Motorola and Intel

The SPARC Architecture Manual Version 8, SPARC International Inc, 535 Middlefield Road, Suite 210, Menlo Park, California, CA 94025

The SPARC Architecture Manual Version 9, SPARC International Inc, San Jose, California, PTR Prentice Hall, Englewood Cliffs, New Jersey, 07632

The MMIX processor: <http://mmix.cs.hm.edu/doc/instructions-en.html>

RISCV 2.0 Spec, Andrew Waterman, Yunsup Lee, David Patterson, Krste Asanović, CS Division, EECS Department, University of California, Berkeley, {waterman|yunsup|pattrsn|krste}@eecs.berkeley.edu

The Garbage Collection Handbook, Richard Jones, Antony Hosking, Eliot Moss published by CRC Press 2012

## Trademarks

IBM® is a registered trademark of International Business Machines Corporation. Intel® is a registered trademark of Intel Corporation. HP® is a registered trademark of Hewlett-Packard Development Company. SPARC® is a registered trademark of SPARC International, Inc.

# WISHBONE Compatibility Datasheet

The RTF64 core may be directly interfaced to a WISHBONE compatible bus.

| WISHBONE Datasheet, WISHBONE SoC Architecture Specification, Revision B.3 | | |
| --- | --- | --- |
| Description: | Specifications: | |
| General Description: | Central processing unit (CPU core) | |
| Supported Cycles: | MASTER, READ / WRITE | |
|  | MASTER, READ-MODIFY-WRITE | |
|  | MASTER, BLOCK READ / WRITE, BURST READ (FIXED ADDRESS) | |
| Data port, size: | 32 bit | |
| Data port, granularity: | 8 bit | |
| Data port, maximum operand size: | 32 bit | |
| Data transfer ordering: | Little Endian | |
| Data transfer sequencing: | any (undefined) | |
| Clock frequency constraints: |  | |
| Supported signal list and cross reference to equivalent WISHBONE signals | Signal Name: | WISHBONE Equiv. |
|  | ack\_i | ACK\_I |
|  | adr\_o(31:0) | ADR\_O() |
|  | clk\_i | CLK\_I |
|  | dat\_i(31:0) | DAT\_I() |
|  | dat\_o(31:0) | DAT\_O() |
|  | cyc\_o | CYC\_O |
|  | stb\_o | STB\_O |
|  | wr\_o | WE\_O |
|  | sel\_o(3:0) | SEL\_O |
|  | cti\_o(2:0) | CTI\_O |
|  | bte\_o(1:0) | BTE\_O |
| Special Requirements: |  | |