# Project 7: A Virtual Machine

While working on Project 6 you were told to 'reset the machine' (by pressing the button next to stop at the top of the screen) when you got stuck. But pressing the button clearly doesn't restart your computer. Where is the machine that you're resetting?

As you might have guessed from this week's readings, the machine in question is actually an emulated virtual machine: the `Python` interpreter. It's a much more powerful and complex sort of virtual machine than the system we'll build today (kind of like the difference between a CISC and a RISC CPU design), but the principle is similar.

## What are Virtual Machines?

High level languages like `Python` allow programmers to concisely express complex ideas. But more than this, the ability to concisely express these ideas helps programmers to *think*. Imagine trying to implement the Hack Assembler from Project 6 in a low level language like Hack assembly itself. You'd end up worrying about all kinds of little details, because even something as simple as reading one line would involve many different operations.

Translating a high level language into a low level language is tricky because of the very different types of thoughts being represented. To make this easier, a virtual machine defines an architecture that can more easily translate between these two kinds of concepts.

In Project 7, we'll implement the first part of a virtual machine, which will allow the creation of a *stack machine*, a useful abstraction.  Next week, Project 8 will focus on the implementation of subroutines on the virtual machine, greatly expanding the kinds of ideas that can be expressed in the language.
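To make the idea of a stack machine concrete, here is a minimal sketch in `Python` that simulates a few VM commands using a list as the stack. It is only an illustration: `run_stack_program` is a made-up name, it handles just "push constant", "add" and "sub", and it is not the translator we will build (which writes Hack assembly rather than computing the answer itself).

```python
def run_stack_program(commands):
    """Simulate a few VM commands, using a Python list as the stack."""
    stack = []
    for command in commands:
        words = command.split()  # break the line into its separate words
        if words[0] == "push" and words[1] == "constant":
            stack.append(int(words[2]))
        elif words[0] == "add":
            y = stack.pop()  # the top item comes off first
            x = stack.pop()  # then the item that was below it
            stack.append(x + y)
        elif words[0] == "sub":
            y = stack.pop()
            x = stack.pop()
            stack.append(x - y)
        else:
            raise ValueError("unsupported command: " + command)
    return stack

result = run_stack_program(["push constant 7", "push constant 8", "add"])
print(result)  # prints [15]
```

Notice that the commands never name the numbers they operate on. They always work on whatever is currently on top of the stack.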

## Tools and Goals

If you already know one or more high level languages, you should be able to complete the project exactly as outlined in Section 7.5 of your textbook. Reading further in this document may be unhelpful, as there are more elegant designs for the VM translator than the one used here. This is because we will only be using a subset of the features available in high level languages like `Python`.

If you didn't know a high level language before starting the course, read on!

As with Project 6, we'll be using `Python` today to write a *virtual machine translator*. This program will take a file written in the virtual machine language (described in Chapter 7 of your textbook), and convert (or *compile*) it to Hack assembly. Once this program is done, we'll be able to write higher level programs in the virtual machine language, translate them to Hack assembly, and then use the assembler we wrote in Project 6 to convert them to binary commands.

You already know almost all of the `Python` commands needed to complete this program, but if you've forgotten them, you can always re-read the Chapter 6 notebook for a refresher. Two other helpful commands are the `split` and `+` operations for strings.

`split` will divide a string into pieces that correspond to individual words. For example:

```python
s="push constant 5"
s.split()[0] # The string "push"
s.split()[1] # The string "constant"
s.split()[2] # The string "5"
```

The `+` operator joins two strings together into a new string. Both sides must be strings, so convert a number with `str` first; otherwise `Python` raises a `TypeError`:

```python
index = 5
foo = "@" + str(index)  # foo is now the string "@5"
bar = foo + " is the same as " + foo  # bar is now the string "@5 is the same as @5"
```

If you're feeling confident, you can work on the project without using the templates below. Begin by reading Chapter 7, and then making three blank notebooks: one for the Parser Module, one for the CodeWriter module, and one for the main program. Section 7.5 outlines a reasonable way to go about writing these modules, and including

```python
import Project7IO as IO
```

at the start of your Parser module will give you access to all the same input and output subroutines you were able to use in Project 6.


If you want to practice your `Python` with a little more structure, the three notebooks below provide a template for completing these files that you can fill in, much like with Project 6 last week.



# The Parser Module

The first module we'll write is the Parser Module. This module is much like the Parser Module from Project 6.

In the VM language, each command we'll implement this week consists of either 1 or 3 words on a single line, separated by spaces. The line represents a command, which will either push something on the stack, pop something off the stack, or perform some arithmetic. In this week's project, we will only implement push, pop, and the nine arithmetic commands.

The Parser contains an *advance()* sub-routine, which populates three state variables, corresponding to the first (nexttype), second (nextarg1), and third (nextarg2) parts of the current command. If a command does not have values for all these variables, set the unneeded ones to the empty string "". For example, the command "add" has no arguments, so after reading "add", nexttype should be "add", nextarg1 should be "", and nextarg2 should be "".

The other subroutines should simply return the current values of nexttype, nextarg1, and nextarg2. The book defines three types of commands: "C_PUSH", "C_POP", and "C_ARITHMETIC" that should be returned by the commandType() function in place of the raw values of nexttype. Note also the special behaviour of arg1() when commandType() is "C_ARITHMETIC".



```python
# The Parser Module
import Project7IO as IO

# State shared by the parser sub-routines.
line = ""       # the raw text of the current command
nexttype = ""   # first word of the command, e.g. "push" or "add"
nextarg1 = ""   # second word, or "" if there is none
nextarg2 = ""   # third word, or "" if there is none

def hasMoreCommands():
    # advance() leaves line equal to "EOF" once the input has run out.
    return line != "EOF"

def advance():
    global nexttype, nextarg1, nextarg2, line
    line = IO.nextLine()
    if line == "EOF":
        return

    # Split the command into words and fill in whichever parts are present.
    pieces = line.split()
    numpieces = len(pieces)
    nexttype = pieces[0]
    if numpieces == 1:
        nextarg1 = ''
        nextarg2 = ''
    elif numpieces == 2:
        nextarg1 = pieces[1]
        nextarg2 = ''
    else:
        nextarg1 = pieces[1]
        nextarg2 = pieces[2]

def commandType():
    if nexttype == "push":
        return 'C_PUSH'
    elif nexttype == "pop":
        return 'C_POP'
    else:
        return 'C_ARITHMETIC'

def arg1():
    # For arithmetic commands, arg1 is the command itself ("add", "sub", ...).
    if commandType() == 'C_ARITHMETIC':
        return nexttype
    else:
        return nextarg1

def arg2():
    return nextarg2
```

```python
# Testing the Parser

def partadvance(line):
    # Same logic as advance(), but takes the line directly instead of reading a file.
    global nexttype, nextarg1, nextarg2
    pieces = line.split()
    numpieces = len(pieces)
    nexttype = pieces[0]
    if numpieces == 1:
        nextarg1 = ''
        nextarg2 = ''
    elif numpieces == 2:
        nextarg1 = pieces[1]
        nextarg2 = ''
    else:
        nextarg1 = pieces[1]
        nextarg2 = pieces[2]

def check(a, b, c):
    assert commandType() == a
    assert arg1() == b
    assert arg2() == c

line = 'push constant 0'
partadvance(line)
check('C_PUSH', 'constant', '0')

line = 'pop local 1'
partadvance(line)
check('C_POP', 'local', '1')

line = 'add'
partadvance(line)
check('C_ARITHMETIC', 'add', '')
```

# Testing the Parser

Before proceeding, write some tests for your parser, and make sure the functions work the way you expect. This approach to writing software is called "incremental development". By making sure that your Parser works now, you can safely write code that uses the Parser in the next module.

If you don't test your code before writing the CodeWriter Module, you may have a very hard time figuring out whether errors are coming from the Parser or the CodeWriter.

# The CodeWriter Module

The only other module we need for this week's project is the CodeWriter module. Predictably, this module writes assembly code out, in response to VM language commands that are read in by the Parser module. Basically it translates from VM Language to Hack Assembly, in much the same way the assembler translated from Hack Assembly to Binary Hack Machine Code. However, while each Hack Assembly instruction was mapped to just a single Hack Machine Code instruction, each VM Language instruction will produce *many* Hack Assembly instructions. This is what gives the VM Language its extra power: a single thought can produce many effects at once.

As an example, consider the "add" command in the VM Language. Conceptually, the "add" command is supposed to pop two items from the top of the stack, add them together, and put the result back on the stack. By convention, register 0 (@SP) contains the memory address of the next free slot just above the top of the stack, so the topmost item lives at the address one less than that. So the assembly code to carry out the VM Language "add" might be:

```code
// Get the first number from the top of the stack and put it in the R13 register (R13-R15 are reserved for this kind of use)
@SP
A=M
A=A-1
D=M
@R13
M=D

// Decrease the stack pointer by one, since we took something off
@SP
M=M-1

// Get the second number from the top of the stack, and put it in the R14 register
@SP
A=M
A=A-1
D=M
@R14
M=D

// Decrease the stack pointer again, since we took something off
@SP
M=M-1

// Compute the sum
@R13
D=M
@R14
D=D+M

// Store the sum in the free slot that the stack pointer points to
@SP
A=M
M=D

// Increase the stack pointer again, since we put something back on
@SP
M=M+1
```


If this seems messy to write, that's because it is!

There are two approaches you can use to make these commands easier to write. First, you can try to find the smallest possible assembly program that accomplishes your goals. In this case, the goals are that:

 - The stack pointer is 1 less than it was before the command (-2 for popping twice, +1 for putting the answer back on)
 - The number at the top of the stack is the sum of the two that were on top before.
 
A smaller assembly program that accomplishes this goal is:

```code
// Read the top of the stack into D.
@SP
A=M
A=A-1
D=M

// Add the second item from the top to D
A=A-1
D=D+M

// Reduce the stack pointer by 1
@SP
M=M-1

// Overwrite what was the second item from the top with the sum.
A=M-1
M=D
```

If you spend some time writing different possible programs on paper for each command first, you can save a lot of time later when you're coding. 
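One way to check a candidate program against those two goals before coding it up is to trace it on a toy model of memory. The sketch below is an illustration only and not part of the course tools: `run` is a hypothetical helper that understands just the handful of instructions used in the shorter "add" program, and the starting stack contents (7 and 8, with the stack pointer at 258) are made-up values.

```python
def run(program, ram):
    """Trace a tiny subset of Hack assembly against a list standing in for RAM."""
    a = 0  # the A register
    d = 0  # the D register
    for ins in program:
        if ins == "@SP":
            a = 0  # SP is register 0
        elif ins == "A=M":
            a = ram[a]
        elif ins == "A=A-1":
            a = a - 1
        elif ins == "A=M-1":
            a = ram[a] - 1
        elif ins == "D=M":
            d = ram[a]
        elif ins == "D=D+M":
            d = d + ram[a]
        elif ins == "M=M-1":
            ram[a] = ram[a] - 1
        elif ins == "M=D":
            ram[a] = d
        else:
            raise ValueError("unsupported instruction: " + ins)
    return ram

short_add = ["@SP", "A=M", "A=A-1", "D=M",
             "A=A-1", "D=D+M",
             "@SP", "M=M-1",
             "A=M-1", "M=D"]

ram = [0] * 300
ram[0] = 258    # stack pointer: the next free slot
ram[256] = 7    # second item from the top
ram[257] = 8    # top item

run(short_add, ram)
print(ram[0])    # prints 257: the stack pointer went down by one
print(ram[256])  # prints 15: the new top item is the sum
```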

The second approach you could use is to write some sub-routines to help. For example, you might write a pair of sub-routines like this:

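```python
def pop_store(reg):
    # Copy the top item of the stack into the given register, then shrink the stack.
    print("@SP")
    print("A=M")
    print("A=A-1")
    print("D=M")
    print("@" + reg)
    print("M=D")
    print("@SP")
    print("M=M-1")

def push_store():
    # Write D into the free slot that SP points to, then grow the stack.
    print("@SP")
    print("A=M")
    print("M=D")
    print("@SP")
    print("M=M+1")
```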

You could then write the code to print out the Hack Assembly instructions for "add" like this:

```python

def add_code():
    pop_store("R13")
    pop_store("R14")
    
    print("@R13")
    print("D=M")
    print("@R14")
    print("D=D+M")
    
    push_store()
```

and the code to print out the Hack Assembly instructions for "and" like this:


```python

def and_code():
    pop_store("R13")
    pop_store("R14")
    
    print("@R13")
    print("D=M")
    print("@R14")
    print("D=D&M")
    
    push_store()
```


To start with, implement the writeArithmetic sub-routine, for the six commands other than "eq", "gt" and "lt". Test your code.

Then, add the "eq", "gt" and "lt" commands, and test again.
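When testing these commands it helps to know what answer to expect. The sketch below is a plain `Python` model of the three comparisons, assuming (as the code in this notebook does) that true is stored as -1 and false as 0. The name `compare` is made up for illustration. The order of the operands matters: `x` is the item that was pushed first and `y` is the item on top of the stack, and the same ordering applies to "sub", which computes `x - y`.

```python
def compare(command, x, y):
    """Model a VM comparison: returns -1 for true and 0 for false.

    x is the item pushed first (deeper in the stack); y is the item on top.
    """
    if command == "eq":
        holds = x == y
    elif command == "gt":
        holds = x > y
    elif command == "lt":
        holds = x < y
    else:
        raise ValueError("not a comparison: " + command)
    if holds:
        return -1
    return 0

# "push constant 5", "push constant 3", "gt" asks whether 5 > 3.
print(compare("gt", 5, 3))  # prints -1
print(compare("lt", 5, 3))  # prints 0
```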

Then implement the "push constant" command in the writePushPop sub-routine. At this point, you should be ready to use the main module (provided below) to test your program with the provided test programs.
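One possible way to test a sub-routine that prints assembly is to capture what it prints and compare it with the lines you worked out on paper. The sketch below uses the standard library's `contextlib.redirect_stdout`. It assumes your sub-routines write with the ordinary `print` to standard output; if `Project7IO` sends output somewhere else, this would need adapting. Both `capture` and `write_neg_example` are hypothetical names used only for illustration.

```python
import io
from contextlib import redirect_stdout

def write_neg_example():
    # Hypothetical stand-in for a CodeWriter sub-routine: negate the top of the stack.
    print("@SP")
    print("A=M-1")
    print("M=-M")

def capture(func):
    """Run func() and return everything it printed, as a list of lines."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func()
    return buffer.getvalue().splitlines()

lines = capture(write_neg_example)
assert lines == ["@SP", "A=M-1", "M=-M"]
```

If the assertion fails, `Python` stops with an error, just like the `check` function used to test the Parser.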
    

```python
cnt = 0  # counter used to give every comparison its own pair of jump labels

def writeArithmetic(command):
    global cnt
    print('@SP')
    if command == 'neg' or command == 'not':
        # Unary commands change the top item in place; SP does not move.
        print('A=M-1')
        if command == "neg":
            print('M=-M')
        else:
            print('M=!M')
    else:
        # Binary commands: pop y (the top item) into D.
        print('AM=M-1')
        print('D=M')
        if command in ['add', 'sub', 'and', 'or']:
            # Combine y with x, which sits just below, leaving the result in x's slot.
            print('A=A-1')
            if command == "add":
                print('M=D+M')
            elif command == "sub":
                print('M=M-D')
            elif command == "and":
                print('M=D&M')
            else:
                print('M=D|M')
        else:
            # Comparisons (eq, gt, lt): compute x - y, then jump on the result.
            print('@SP')
            print('AM=M-1')
            print('A=M')    # A now holds the value x
            print('D=A-D')  # D = x - y
            true_label = 'JMP' + str(cnt)
            end_label = 'JMP' + str(cnt + 1)
            cnt += 2
            print('@' + true_label)
            print('D;J' + command.upper())  # JEQ, JGT or JLT
            print('@SP')
            print('A=M')
            print('M=0')    # false
            print('@' + end_label)
            print('0;JMP')
            print('(' + true_label + ')')
            print('@SP')
            print('A=M')
            print('M=-1')   # true
            print('(' + end_label + ')')
            print('@SP')
            print('M=M+1')

def writePushPop(type, segment, index):
    # Step 1: set up D (and A) so the shared push/pop code below can finish the job.
    if segment == "constant":
        print("@" + index)
        print("D=A")
    elif segment in ["local", "argument", "this", "that"]:
        # These segments are reached through a base pointer stored in RAM.
        if segment == "local":
            print("@LCL")
        elif segment == "argument":
            print("@ARG")
        elif segment == "this":
            print("@THIS")
        else:
            print("@THAT")
        print("D=M")
        print("@" + index)
    elif segment == "temp":
        # temp starts at the fixed address 5, so use the address itself (D=A),
        # not the contents of RAM[5].
        print("@5")
        print("D=A")
        print("@" + index)
    elif segment == "pointer" or segment == "static":
        # Each entry is a single named location: THIS/THAT, or a variable
        # that the assembler places in RAM for us.
        if segment == "static":
            print("@STATIC." + index)
        elif index == '0':
            print("@THIS")
        else:
            print("@THAT")
        if type == "pop":
            print("D=A")  # address to store into
        else:
            print("D=M")  # value to push

    # For these segments D is already the final value (push) or address (pop).
    direct = segment in ["constant", "pointer", "static"]

    if type == "push":
        if not direct:
            print("A=D+A")  # address = base + index
            print("D=M")
        print("@SP")
        print("A=M")
        print("M=D")
        print("@SP")
        print("M=M+1")
    elif type == "pop":
        if not direct:
            print("D=D+A")  # address = base + index
        # Park the target address in R13, pop the value, then store it there.
        print("@R13")
        print("M=D")
        print("@SP")
        print("AM=M-1")
        print("D=M")
        print("@R13")
        print("A=M")
        print("M=D")
```


# The Main Module

The main module provided below will use your program to translate the test files. You can then use the CPU emulator to test whether the assembly code your program has produced is correct or not.

```python
import os

def processFile(testtype, fname):
    global line
    IO.setFile(os.path.join('..', testtype, fname, fname + '.vm'))
    IO.setSaveFile(os.path.join('..', testtype, fname, fname + '.asm'))
    line = ""  # reset the parser state so hasMoreCommands() starts out True
    advance()
    while hasMoreCommands():
        if commandType() == "C_ARITHMETIC":
            writeArithmetic(arg1())
        elif commandType() == "C_PUSH":
            writePushPop("push", arg1(), arg2())
        elif commandType() == "C_POP":
            writePushPop("pop", arg1(), arg2())
        advance()

# Comment out any of the calls below until the matching part of your translator is ready.

# These tests need the memory segments.
processFile('MemoryAccess', 'BasicTest')
processFile('MemoryAccess', 'PointerTest')
processFile('MemoryAccess', 'StaticTest')

# These tests need only the arithmetic commands and "push constant".
processFile('StackArithmetic', 'SimpleAdd')
processFile('StackArithmetic', 'StackTest')
```

# Handling Memory Regions

Now that we can compile (translate) VM Language programs that only do simple calculations, it's time to extend our translator so that it can write to many different parts of memory. This might seem like a strange thing to want right now, but it will prove extremely useful in the coming weeks. Section 7.3.1 contains a detailed description of how these segments should work.

You have already handled "push" for the "constant" segment, and it's not possible to "pop" a constant.

Next, implement push and pop for "local", "argument", "this" and "that". These are very similar to one another, and you should be able to reuse a lot of the same assembly code.

After that, implement push and pop for the "pointer" and "temp" segments. These are slightly trickier. Read the textbook descriptions carefully.

Finally, implement the "static" segment, which is slightly trickier.
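Before writing any assembly for these segments, it can help to work out which RAM address each `segment index` pair refers to. The sketch below models that calculation in plain `Python`, using the standard Hack layout (LCL, ARG, THIS and THAT in RAM[1] to RAM[4], and temp starting at RAM[5]). The function name `segment_address` and the base addresses in the example are made up for illustration. The "static" segment is left out because the assembler decides where those variables live.

```python
def segment_address(ram, segment, index):
    """Return the RAM address that `segment index` refers to."""
    if segment == "local":
        return ram[1] + index   # base address is stored in LCL (RAM[1])
    elif segment == "argument":
        return ram[2] + index   # base address is stored in ARG (RAM[2])
    elif segment == "this":
        return ram[3] + index   # base address is stored in THIS (RAM[3])
    elif segment == "that":
        return ram[4] + index   # base address is stored in THAT (RAM[4])
    elif segment == "temp":
        return 5 + index        # fixed block starting at RAM[5]
    elif segment == "pointer":
        return 3 + index        # pointer 0 is THIS, pointer 1 is THAT
    else:
        raise ValueError("no fixed rule for segment: " + segment)

# Example base addresses, chosen arbitrarily for this illustration.
ram = [0] * 4096
ram[1] = 300    # LCL
ram[2] = 400    # ARG
ram[3] = 3000   # THIS
ram[4] = 3010   # THAT

print(segment_address(ram, "local", 2))    # prints 302
print(segment_address(ram, "temp", 6))     # prints 11
print(segment_address(ram, "pointer", 1))  # prints 4
```

The first four segments look up a base address stored *in* RAM, while "temp" and "pointer" use a fixed address directly. That difference is exactly what makes them slightly trickier to translate.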

When you're done, you can test your code with the additional `processFile` calls in the Main Module above. Good luck!