# Memory

The amount of memory is variable; 1024 bytes are used initially.

# CPU

The CPU holds the following values:

 * Register A
 * Register B
 * Program counter (PC)
 * Stack pointer (SP)
 * Zero flag (ZF)
 * Halt flag (HF)
 
_Register A_.

Holds a 32-bit integer.

_Register B_.

Holds a 32-bit integer.

_Program counter_.

The program counter (PC) points towards the next instruction.

_Stack pointer_.

The stack pointer (SP) points to the current element in the stack.

_Zero flag_.

The zero flag (ZF) is low if a tested equality/inequality is true, and high if the test returns false.

_Halt flag_.

The halt flag (HF) is high when the CPU is halted.

# Opcodes

_Integers_.

Any integer value can be written as `d`, e.g. `1`.

_Register_.

There are three registers: `A`, `B`, and `R`. Results of arithmetic operations are stored in `R`.

_Memory address_.

A memory address is denoted with `[0000]`, for example:

    MOV 16, [0003]

* Any integer value can be written as `1`.
* Any register can be written as `A`, `B`, `R`.
* Any memory address can be written in brackets, e.g. `[0000]`.

_Pointer_.

The value of a memory address can be resolved with `$`.

    MOV 10, [$A] ; Find the address stored in register A.
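
As a rough illustration of the `$` notation, the sketch below resolves `[$A]` by first reading register A and then using that value as the memory address. The `registers` dict, the `memory` list, and the `store_indirect` helper are hypothetical names chosen for this example; they are not part of the emulator further down.

```python
# Hypothetical state: three registers and a small memory.
registers = {'A': 3, 'B': 0, 'R': 0}
memory = [0] * 8

def store_indirect(value, reg):
    """Sketch of `MOV value, [$reg]`: write value to the address held in reg."""
    address = registers[reg]   # the register content is used as the address
    memory[address] = value
    return address

store_indirect(10, 'A')
print(memory)  # → [0, 0, 0, 10, 0, 0, 0, 0]
```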

_Mnemonics_.

The following mnemonics are defined.

|Mnemonic|Opcode|Size (Bytes)|Description|
|-|-|-|-|
| `MOV reg, reg`| `0x10` | 3 | Move a value from register 1 to register 2. |
| `MOV reg, [address]` | `0x11` | 3 | Move a value from register 1 to a memory address. |
| `MOV [address], reg` | `0x12` | 3 | Move a value from a memory address to register 1. |
| `MOV [$reg], reg` | `0x15` | 3 | Use the value in the first register as a memory address, and load the value from there into the second register. |
| `MOV byte, reg` | `0x13` | 3 | Move a dword into the register. |
| `MOV reg, [$reg]` | `0x14` | 3 | Use the value in the second register as a memory address, and store the value from register 1 there. |
| `CMP reg, reg` | `0x20` | 3 | Compare register 1 with register 2 for equality. |
| `GT reg, reg` | `0x21` | 3 | Check if register 1 is greater than register 2. |
| `LT reg, reg` | `0x22` | 3 | Check if register 1 is less than register 2. |
| `JMP address` | `0x50` | 2 | Jump to address. |
| `JZ address` | `0x51` | 2 | Jump to address if the zero flag is low. |
| `JNZ address` | `0x52` | 2 | Jump to address if the zero flag is high. |
| `ADD reg, reg` | `0x30` | 3 | Add register 1 to register 2 and store the result in register R. |
| `SUB reg, reg` | `0x31` | 3 | Subtract register 2 from register 1 and store the result in register R. |
| `MUL reg, reg` | `0x32` | 3 | Multiply register 1 with register 2 and store the result in register R. |
| `DIV reg, reg` | `0x33` | 3 | Divide register 1 by register 2 and store the result in register R. (Integer division) |
| `INC reg` | `0x34` | 2 | Increment the register by one. |
| `PUSH reg` | `0x40` | 2 | Push the value in the register onto the stack. |
| `POP reg` | `0x41` | 2 | Pop the value from the stack and store it in the register. |
| `HLT` | `0x99` | 1 | Halt the CPU. Set the halt flag high. |
| `CALL ref`| `0x90` | 2 | System call. |
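
The comparison and jump rows combine with the zero flag convention from the CPU section (low when the test is true, high when it is false). The sketch below, with hypothetical helper names `lt`, `jz`, and `jnz`, shows how `LT A, B` followed by `JNZ end` would leave a loop once A is no longer less than B. It models the flag as a Python `bool` where `True` means high, which is an assumption of this example.

```python
def lt(a, b):
    """Sketch of `LT`: return the zero flag, low (False) if a < b, else high (True)."""
    return not (a < b)

def jz(zf, pc, address):
    """Sketch of `JZ`: jump when the zero flag is low, otherwise keep pc."""
    return address if not zf else pc

def jnz(zf, pc, address):
    """Sketch of `JNZ`: jump when the zero flag is high, otherwise keep pc."""
    return address if zf else pc

zf = lt(3, 10)          # the test is true, so ZF is low
print(jnz(zf, 8, 20))   # → 8 (no jump)
zf = lt(10, 10)         # the test is false, so ZF is high
print(jnz(zf, 8, 20))   # → 20 (jump taken)
```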

_Labels_.

It is also possible to define a label. This should be on a new line, and looks like this:

    .start
    
Labels only support alphanumeric characters. If a jump is followed by a word, the assembler should check that a label with that name is defined and then replace the label with the corresponding memory address.
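
One possible way to resolve labels is a two-pass approach: the first pass records the address of every label, the second pass patches the jump operands. The sketch below is only an illustration under a few assumptions: the `SIZES` table copies the sizes from the mnemonic table, addresses are counted from 0 at the first instruction, and `resolve_labels` is a hypothetical helper that is not part of the assembler built later in this notebook.

```python
# Instruction sizes in bytes, copied from the mnemonic table.
SIZES = {'MOV': 3, 'CMP': 3, 'GT': 3, 'LT': 3, 'ADD': 3, 'SUB': 3, 'MUL': 3,
         'DIV': 3, 'JMP': 2, 'JZ': 2, 'JNZ': 2, 'INC': 2, 'PUSH': 2, 'POP': 2,
         'CALL': 2, 'HLT': 1}
JUMPS = {'JMP', 'JZ', 'JNZ'}

def resolve_labels(lines):
    """Replace label operands of jumps with addresses (two passes)."""
    stripped = [line.split(';')[0].strip() for line in lines]  # drop comments
    labels, address = {}, 0
    # Pass 1: record the address at which each label points.
    for line in stripped:
        if not line:
            continue
        if line.startswith('.'):
            labels[line[1:]] = address
        else:
            address += SIZES[line.split()[0]]
    # Pass 2: patch jump operands and drop the label lines.
    resolved = []
    for line in stripped:
        if not line or line.startswith('.'):
            continue
        mnemonic, *operands = line.split()
        if mnemonic in JUMPS and not operands[0].isdigit():
            if operands[0] not in labels:
                raise ValueError('Undefined label {}.'.format(operands[0]))
            line = '{} {}'.format(mnemonic, labels[operands[0]])
        resolved.append(line)
    return resolved

program = ['MOV 0, A', 'MOV 10, B', '.loop', 'LT A, B', 'JNZ end',
           'PUSH A', 'CALL print', 'INC A', 'JMP loop', '.end', 'HLT']
print(resolve_labels(program))
# → ['MOV 0, A', 'MOV 10, B', 'LT A, B', 'JNZ 19', 'PUSH A', 'CALL print', 'INC A', 'JMP 6', 'HLT']
```

Because all labels are collected before any jump is patched, a jump may refer to a label that appears later in the program.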

_Comments_.

Comments start with `;`. Everything after it is ignored until the end of the line. Example:

    MOV A, B ; Move the value in register A into register B.

# System calls

The following system calls are supported:

_Print_.

Function name:

    CALL print

Reads one value from the stack and prints it to the screen. The corresponding byte is `0x01`.
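
A minimal sketch of how such a system call could be dispatched is shown below. It assumes that "reads one value" means popping the value off the stack, and it models the stack as a plain Python list; `syscall`, `SYSCALL_PRINT`, and the `out` parameter are hypothetical names used only for this illustration.

```python
SYSCALL_PRINT = 0x01  # byte that follows the CALL opcode for `print`

def syscall(number, stack, out=print):
    """Sketch of a system call dispatcher; only `print` (0x01) is handled."""
    if number == SYSCALL_PRINT:
        value = stack.pop()   # assumption: reading removes the value from the stack
        out(value)
        return value
    raise ValueError('Unrecognized system call {:#04x}.'.format(number))

stack = [2]
syscall(SYSCALL_PRINT, stack)  # prints 2
```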

# Assembler

The assembler should map the mnemonics to the corresponding opcodes. The result is a binary file. Binaries that are assembled for this emulator have the value of `0x27` in the first byte.

## Example program 1

An example program that calculates `1+1` and prints `2`:

    MOV 1, A
    MOV 1, B
    ADD
    PUSH
    CALL print
    HLT

In [1]:
prog = """
MOV 1 A
MOV 1,B
ADD ; Add A and B and store in R
PUSH
CALL print
HLT
"""

In [2]:
prog

'\nMOV 1 A\nMOV 1,B\nADD ; Add A and B and store in R\nPUSH\nCALL print\nHLT\n'

## Example program 2

To print 0 to 9 and then halt:

    MOV 0, A
    MOV 10, B
    
    .loop
    LT A, B
    JNZ end
    PUSH A
    CALL print
    INC A
    JMP loop
    
    .end
    HLT

In [3]:
prog2 = """
;init
MOV 0, A
MOV 10, B

; for(i; i<10; i++); print i
.loop
LT A, B
JNZ end
PUSH A
CALL print
INC A
JMP loop

.end
HLT
"""

In [4]:
prog2

'\n;init\nMOV 0, A\nMOV 10, B\n\n; for(i; i<10; i++); print i\n.loop\nLT A, B\nJNZ end\nPUSH A\nCALL print\nINC A\nJMP loop\n\n.end\nHLT\n'

## Example program 3

Print the Fibonacci sequence forever.

    MOV 1, A
    MOV 1, B
    PUSH A
    CALL print
    PUSH B 
    CALL print
    .loop
    ADD A, B
    MOV B, A
    MOV R, B
    PUSH B
    CALL print
    JMP loop
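
For reference, the following Python generator mirrors the register moves of the listing above, so the expected output can be checked by hand. It is only a sketch: the generator name is hypothetical, and it ignores what happens when a value no longer fits in a 32-bit register, which the specification does not define.

```python
from itertools import islice

def fibonacci_like_program():
    """Mirror the register moves of the listing and yield each printed value."""
    a, b = 1, 1      # MOV 1, A / MOV 1, B
    yield a          # PUSH A / CALL print
    yield b          # PUSH B / CALL print
    while True:
        r = a + b    # ADD A, B (result in R)
        a = b        # MOV B, A
        b = r        # MOV R, B
        yield b      # PUSH B / CALL print

print(list(islice(fibonacci_like_program(), 8)))  # → [1, 1, 2, 3, 5, 8, 13, 21]
```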

## Parser

Here we parse the program and return either an `int` or `string`. These are called symbols.

In [5]:
class Parser():
    def __init__(self, prog):
        self.prog = prog
        self.index = 0

    def eat_whitespace(self):
        # Skip spaces and line breaks, then skip a `;` comment up to the end of the line.
        while self.index < len(self.prog) and self.prog[self.index] in ['\n', '\r', ' ']:
            self.index += 1
        if self.index < len(self.prog) and self.prog[self.index] == ';':
            while self.index < len(self.prog) and self.prog[self.index] != '\n':
                self.index += 1

    def eat_word(self):
        # Collect consecutive letters; stop at the end of the program.
        chars = []
        while self.index < len(self.prog) and self.prog[self.index].isalpha():
            chars.append(self.prog[self.index])
            self.index += 1
        return ''.join(chars)

    def eat_number(self):
        # Collect consecutive digits and convert them to an int.
        chars = []
        while self.index < len(self.prog) and self.prog[self.index].isnumeric():
            chars.append(self.prog[self.index])
            self.index += 1
        return int(''.join(chars))

    def next_symbol(self):
        self.eat_whitespace()
        while self.index < len(self.prog):
            char = self.prog[self.index]
            if char.isalpha():
                yield self.eat_word()
            elif char.isnumeric():
                yield self.eat_number()
            else:
                # Skip separators such as `,` and the `.` label prefix.
                self.index += 1
            self.eat_whitespace()

## Tokenizer

Based on the symbols, we create the corresponding tokens. A token type can be `op`, `reg`, `int`, `mem`, or `word`.

In [6]:
class Token():
    def __init__(self, t, value):
        self.type = t
        self.value = value
    def __repr__(self):
        return '{}: {}'.format(self.type, self.value)

Now we will tokenize the symbols.

In [7]:
class Tokenizer():
    # Only these mnemonics become `op` tokens; any other name is tokenized as a `word`.
    mnemonics = ['MOV', 'ADD', 'PUSH', 'CALL', 'HLT']
    registers = ['A', 'B', 'R']

    def __init__(self, prog):
        self.prog = prog
        self.tokens = []
        self.index = 0
        for symbol in Parser(self.prog).next_symbol():
            if symbol in self.mnemonics:
                self.tokens.append(Token('op', symbol))
            elif symbol in self.registers:
                self.tokens.append(Token('reg', symbol))
            elif isinstance(symbol, int):
                self.tokens.append(Token('int', symbol))
            elif isinstance(symbol, str):
                self.tokens.append(Token('word', symbol))

    def last_token(self):
        # True once every token has been consumed.
        return not self.index < len(self.tokens)

    def next_token(self):
        token = self.tokens[self.index]
        self.index += 1
        return token

    def reset(self):
        self.index = 0

In [8]:
tokenizer = Tokenizer(prog)
tokens = tokenizer.tokens
tokens

[op: MOV,
 int: 1,
 reg: A,
 op: MOV,
 int: 1,
 reg: B,
 op: ADD,
 op: PUSH,
 op: CALL,
 word: print,
 op: HLT]

## Translator

Here we translate the tokens into bytecode.

In [9]:
class Translator():
    @staticmethod
    def translate_to_opcodes(tokenizer):
        tokenizer.reset()
        opcodes = [0x27]
        set_allowed_mov = ['mem', 'reg', 'int']
        lookup_register_label = {'A': 0x00, 'B': 0x01, 'R': 0x02}
        while not tokenizer.last_token():
            token = tokenizer.next_token()
            print('Translating token [{}]'.format(token))
            if token.value == 'MOV':
                source = tokenizer.next_token()
                dest = tokenizer.next_token()
                if source.type not in set_allowed_mov:
                    raise ValueError('Invalid MOV source.')
                if dest.type not in set_allowed_mov:
                    raise ValueError('Invalid MOV dest.')
                print('    Branching MOV into {}, {}:'.format(source.type, dest.type))
                if source.type == 'int' and dest.type == 'reg':
                    print('    MOV value {} into register {}.'.format(source.value, dest.value))
                    opcodes.append(0x13)
                    opcodes.append(source.value)
                    opcodes.append(lookup_register_label[dest.value])
                if source.type == 'reg' and dest.type == 'reg':
                    print('    MOV value from register {} to register {}.'.format(source.value, dest.value))
                    opcodes.append(0x10)
                    opcodes.append(lookup_register_label[source.value])
                    opcodes.append(lookup_register_label[dest.value])
                if source.type == 'mem' and dest.type == 'reg':
                    print('    MOV value from memory address {} to register {}.'.format(source.value, dest.value))
                    opcodes.append(0x12)
                    opcodes.append(source.value)
                    opcodes.append(lookup_register_label[dest.value])
                if source.type == 'reg' and dest.type == 'mem':
                    print('    MOV value from register {} to memory address {}.'.format(source.value, dest.value))
                    opcodes.append(0x11)
                    # Encode the register by its number, as in the other register branches.
                    opcodes.append(lookup_register_label[source.value])
                    opcodes.append(dest.value)
                if source.type == 'mem' and dest.type == 'mem':
                    # Note: the mnemonic table assigns 0x14 to `MOV reg, [$reg]` and defines no memory-to-memory MOV.
                    print('    MOV value from memory address {} to memory address {}.'.format(source.value, dest.value))
                    opcodes.append(0x14)
                    opcodes.append(source.value)
                    opcodes.append(dest.value)
            if token.value == 'ADD':
                print('    Add register A and B.')
                opcodes.append(0x30)
            if token.value == 'PUSH':
                print('    Pushing register A onto the stack.')
                opcodes.append(0x40)
            if token.value == 'CALL':
                call = tokenizer.next_token()
                print('    System call to `{}`.'.format(call.value))
                if call.value == 'print':
                    opcodes.append(0x90)
                    opcodes.append(0x01)
                else:
                    raise ValueError('Unrecognized system call {}.'.format(call.value))
            if token.value == 'HLT':
                opcodes.append(0x99)

        return opcodes

## Assembler

In [10]:
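class Assembler():
    @staticmethod
    def assemble(prog):
        tokenizer = Tokenizer(prog)
        return Translator.translate_to_opcodes(tokenizer)
    @staticmethod
    def encode(opcodes):
        # Render the opcodes as space-separated hex strings, e.g. '0x27 0x13'.
        return ' '.join(hex(o) for o in opcodes)
    @staticmethod
    def decode(string):
        # Inverse of encode: parse space-separated hex strings back into ints.
        return [int(o, 16) for o in string.split(' ')]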

In [97]:
def rom_to_hex_file(rom, file):
    # Write the ROM as raw bytes; every value must fit in one byte (0-255).
    with open(file, 'wb') as f:
        f.write(bytes(rom))

In [68]:
rom = Assembler.assemble(prog)
encoded = Assembler.encode(rom)
decoded = Assembler.decode(encoded)

Translating token [op: MOV]
    Branching MOV into int, reg:
    MOV value 1 into register A.
Translating token [op: MOV]
    Branching MOV into int, reg:
    MOV value 1 into register B.
Translating token [op: ADD]
    Add register A and B.
Translating token [op: PUSH]
    Pushing register A onto the stack.
Translating token [op: CALL]
    System call to `print`.
Translating token [op: HLT]


In [69]:
rom, encoded, decoded

([39, 19, 1, 0, 19, 1, 1, 48, 64, 144, 1, 153],
 '0x27 0x13 0x1 0x0 0x13 0x1 0x1 0x30 0x40 0x90 0x1 0x99',
 [39, 19, 1, 0, 19, 1, 1, 48, 64, 144, 1, 153])
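
To double-check a ROM like the one above, a small disassembler can walk the bytes and print one instruction per line. The sketch below is only an illustration: `EMITTED` and `disassemble` are hypothetical names, and the instruction sizes follow what the `Translator` above actually emits (for example a one-byte `ADD` and `PUSH`), not the sizes listed in the mnemonic table.

```python
# Opcode -> (mnemonic, number of bytes emitted by the Translator above).
EMITTED = {0x13: ('MOV', 3), 0x30: ('ADD', 1), 0x40: ('PUSH', 1),
           0x90: ('CALL', 2), 0x99: ('HLT', 1)}

def disassemble(rom):
    """Walk a ROM and return a list of (address, mnemonic, operands)."""
    if rom[0] != 0x27:
        raise ValueError('Unsupported ROM type.')
    body, pc, result = rom[1:], 0, []   # the header byte is not part of the code
    while pc < len(body):
        name, size = EMITTED[body[pc]]
        result.append((pc, name, body[pc + 1:pc + size]))
        pc += size
    return result

rom = [39, 19, 1, 0, 19, 1, 1, 48, 64, 144, 1, 153]
for entry in disassemble(rom):
    print(entry)
```

Addresses are counted from the first byte after the `0x27` header, which matches how the ROM is later loaded into memory.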

# Emulator

## Memory

In [14]:
class Memory():
    def __init__(self, size):
        self.ram = [0x00] * size
        self.size = size  # number of addressable cells
    def read(self, address):
        return self.ram[address]
    def write(self, address, value):
        if type(value) is not int: raise ValueError('Unable to cast value to 32-bit binary representation.')
        self.ram[address] = value
    def load_rom(self, rom, start_address = 0x00):
        # The first byte is the 0x27 header; it is checked but not copied into RAM.
        if rom[0x00] != 0x27: raise ValueError('Unsupported ROM type.')
        if len(rom) - 1 + start_address > self.size: raise ValueError('Insufficient memory for ROM.')
        for offset, byte in enumerate(rom[1:]):
            self.ram[start_address + offset] = byte
    def print_range(self,a,b):
        # Print the cells from address a (inclusive) to b (exclusive).
        result = ""
        for i in range(b-a):
            result += '[{:04x}] {:08x}\n'.format(a+i, self.ram[a+i])
        print(result)

In [15]:
memory = Memory(1024)
memory.write(0x10, 5)
memory.load_rom(rom)
memory.print_range(0,25)

[0000] 00000013
[0001] 00000001
[0002] 00000000
[0003] 00000013
[0004] 00000001
[0005] 00000001
[0006] 00000030
[0007] 00000040
[0008] 00000090
[0009] 00000001
[000a] 00000099
[000b] 00000000
[000c] 00000000
[000d] 00000000
[000e] 00000000
[000f] 00000000
[0010] 00000005
[0011] 00000000
[0012] 00000000
[0013] 00000000
[0014] 00000000
[0015] 00000000
[0016] 00000000
[0017] 00000000
[0018] 00000000



## CPU emulator 

In [16]:
class CPU():
    lookup_register_to_label = {0x00: 'A', 0x01: 'B', 0x02: 'R'}
    
    def __init__(self, memory):
        self.memory = memory
        self.A = self.B = self.PC = self.SP = 0x00
        self.ZF = self.HF = False
        print('CPU init [available memory: {} bytes]'.format(self.memory.size))
        print('Starting at memory address: {:04x}'.format(self.PC))
        
    def run(self):
        self.print_hr()
        while not self.HF:
            if self.PC >= self.memory.size:
                print('Program counter overflow.')
                break
            incr = True
            opcode = self.memory.read(self.PC)
            print('Opcode: {:02x}'.format(opcode))
            
            # MOV int, reg: load an immediate value into a register.
            if opcode == 0x13:
                source = self.memory.read(self.PC+1)
                dest = self.lookup_register_to_label[self.memory.read(self.PC+2)]
                setattr(self, dest, source)
                self.PC += 3
                incr = False
                print('Command: MOV {}, {}'.format(source, dest))
                
            # HLT: set the halt flag, which ends the loop.
            if opcode == 0x99:
                self.HF = True
                incr = False
                print('Command: HLT')
                
            # Opcodes that are not handled above are skipped one byte at a time.
            if incr: self.PC+=1
            self.print_state()
            self.print_hr()
        else:
            # Only reached when the loop ends through HLT, not through the overflow break.
            print('CPU terminated normally.')
    
    def print_hr(self):
        print('-------------------------------------------------')
    def print_state(self):
        print('State: A: 0x{:02x}, B: 0x{:02x}, PC: 0x{:02x}, SP: 0x{:02x}, ZF: {}, HF: {}'.format(self.A, self.B, self.PC, self.SP, self.ZF, self.HF))

## Emulator

In [17]:
program = prog2
print('-- Program --\r\n{}'.format(program))
print('-- Assembler --\r\n')
rom = Assembler.assemble(program)

print('\r\n-- Memory -- \r\n')
memory = Memory(1024)
memory.load_rom(rom)
memory.print_range(0, 10)

print('-- Emulator --\r\n')
cpu = CPU(memory)
cpu.run()

-- Program --

;init
MOV 0, A
MOV 10, B

; for(i; i<10; i++); print i
.loop
LT A, B
JNZ end
PUSH A
CALL print
INC A
JMP loop

.end
HLT

-- Assembler --

Translating token [op: MOV]
    Branching MOV into int, reg:
    MOV value 0 into register A.
Translating token [op: MOV]
    Branching MOV into int, reg:
    MOV value 10 into register B.
Translating token [word: loop]
Translating token [word: LT]
Translating token [reg: A]
Translating token [reg: B]
Translating token [word: JNZ]
Translating token [word: end]
Translating token [op: PUSH]
    Pushing register A onto the stack.
Translating token [reg: A]
Translating token [op: CALL]
    System call to `print`.
Translating token [word: INC]
Translating token [reg: A]
Translating token [word: JMP]
Translating token [word: loop]
Translating token [word: end]
Translating token [op: HLT]

-- Memory -- 

[0000] 00000013
[0001] 00000000
[0002] 00000000
[0003] 00000013
[0004] 0000000a
[0005] 00000001
[0006] 00000040
[0007] 00000090
[0008] 0

In [99]:
'load file.dat'.split()

['load', 'file.dat']

In [100]:
'load'.split()

['load']

In [104]:
cmd, *arg = 'load file.dat r'.split()

In [105]:
cmd, arg

('load', ['file.dat', 'r'])

In [106]:
arg[0]

'file.dat'