## Parsing files with Python

A major component of the Chapter 6 Assembler project involves writing a program that will

* read a file,
* ignore white space and comments in the file,
* break each line, or command, in the file into mnemonics,
* determine how the mnemonics map into various binary commands,
* write the equivalent binary code to a second file.

Because you may not have worked with this specific problem in your CSCI135 class, or because you may not remember that much Python, I've prepared this short notebook for you to study.

### Opening and reading a file

First, the `.asm` file has to be opened and read. This must be done with a Python script called `hasm.py` (hack assembler) that accepts a command line argument specifying the assembly file (`.asm`) that you want to convert to binary code (`.hack`). The following code opens a file specified by a command line argument and reads it line by line. Specific features in this code include:

* `sys.argv` is a list containing each term that was typed at the command line. Individual list entries were separated by spaces at the command line.
* There is error checking to see that only one source (`.asm`) file is specified, by checking that the command line was `hasm.py filename` and that the file exists and can be opened.
* The `with` keyword in Python is used, along with `try` and `except`, to do error handling.
* Control is turned over to a function, `Pass1`, where the actual processing will take place. Later, this function should probably be moved to another file for clarity.
* The `Pass1` function has been equipped with some basic comment processing and white space handling as follows:
  * `find('//')` is used to get the location of the comment string `//`. Python slicing is then used to eliminate the part of a line that follows the comment string.
  * `strip()` is used to remove leading and trailing white space from the resulting line.
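
The features listed above can be sketched roughly as follows. This is a minimal illustration, not the finished `hasm.py`: the body of `Pass1`, the `main` helper, and the throwaway `demo.asm` file are assumptions made for the example. So that it runs inside a notebook, it passes a hand-built argument list to `main`; a real script would pass `sys.argv` instead.

```python
import os
import tempfile


def Pass1(source):
    """Return the non-empty commands in `source`, minus comments and white space."""
    commands = []
    for line in source:
        comment_at = line.find('//')   # -1 when the line has no comment
        if comment_at != -1:
            line = line[:comment_at]   # keep only the text before '//'
        line = line.strip()            # trim leading and trailing white space
        if line:                       # skip lines that are now empty
            commands.append(line)
    return commands


def main(argv):
    # Hypothetical helper: expects exactly `hasm.py filename`.
    if len(argv) != 2:
        print("usage: hasm.py filename.asm")
        return None
    try:
        with open(argv[1]) as source:
            return Pass1(source)
    except OSError as err:
        print("could not open", argv[1], "-", err)
        return None


# Demonstration with a temporary file standing in for a real .asm file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.asm")
    with open(path, "w") as f:
        f.write("// add two numbers\n@2\n   D=A   // load 2\n\n@3\nD=D+A\n")
    commands = main(["hasm.py", path])
print(commands)
```

Note that `strip()` only trims the ends of a line, so white space inside a command (for example `D = A`) would still need separate handling.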

In [4]:
test = []
test.append("testing")
test.append("testline2")
test.append("testline3")


for line in test:
    print(line)

testing
testline2
testline3


### Dictionaries
The main construct for managing translation of assembly to binary will be the dictionary, or symbol table. Let us demonstrate how these will be used with an example, a dictionary of symbols associated with ROM and RAM addresses.


In [1]:
symbols = {
         "R0" :  "0",
         "R1" :  "1",
         "R2" :  "2",
         "R3" :  "3",
         "R4" :  "4",
         "R5" :  "5",
         "R6" :  "6",
         "R7" :  "7",
         "R8" :  "8",
         "R9" :  "9",
         "R10" :  "10",
         "R11" :  "11",
         "R12" :  "12",
         "R13" :  "13",
         "R14" :  "14",
         "R15" :  "15",
         "SCREEN" : "16384",
         "KBD" :  "24576",
         "SP" : "0",
         "LCL" :  "1",
         "ARG" : "2",
         "THIS" : "3",
         "THAT" : "4"
}
 
# Test if an entry is present:
print("Is R3 in symbols? ","R3" in symbols)
print("Is i in symbols? ","i" in symbols)

# Add i to symbols:
symbol = "i"
next_RAM = 16
symbols[symbol] = str(next_RAM)  # stored as a string, like the other entries
next_RAM += 1

# Print the value of "i"
print(symbol,symbols[symbol])

Is R3 in symbols?  True
Is i in symbols?  False
i 16


### The format mini-language

Python has a built-in `format` function that will be very useful for this and other assignments in this course. Essentially, you face the problem of writing a 16-bit binary number, given a decimal value. There are a number of ways of handling this, but `format` is probably the cleanest. The following example converts a value to a binary string; zero-padding the output string is left as an exercise for the student.


In [7]:
RAM_address = 90
print(format(RAM_address,'b'))
# Above, the string 'b' creates a string representing RAM_address in
# binary format. You should research the format command to learn how to 
# make that string 16 bits long, and to 'pad' the places that aren't 
# needed to express RAM_address with zeros.

1011010


### The assembler API

So that we can all agree on what's to be done, and share our work, I am insisting that portions of the API defined in the book be upheld. The following provides them. Note that I use `pass` so that an unimplemented function runs without error. You'll need to replace it with your code.

In [None]:
def dest2bin(mnemonic):
    # returns the binary code for the destination part of a C-instruction
    pass
def comp2bin(mnemonic):
    # returns the binary code for the comp part of a C-instruction
    pass
def jump2bin(mnemonic):
    # returns the binary code for the jump part of a C-instruction
    pass
    
def commandType(command):
    # returns "A_COMMAND", "C_COMMAND", or "L_COMMAND"
    # depending on the contents of the 'command' string
    pass
def getSymbol(command):
    # given an A_COMMAND or L_COMMAND type, returns the symbol as a string,
    # eg (XXX) returns 'XXX'
    # @sum returns 'sum'
    pass
def getDest(command):
    # return the dest mnemonic in the C-instruction 'command'
    pass
def getComp(command):
    # return the comp mnemonic in the C-instruction 'command'
    pass
def getJump(command):
    # return the jump mnemonic in the C-instruction 'command'
    pass
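
As an illustration of how a dictionary could back one of these functions, here is a possible sketch of `jump2bin`. The three-bit codes are the ones from the Hack C-instruction jump table; the name `JUMP_CODES` and the use of the string `"null"` to stand for "no jump" are assumptions made for this example, so adapt them to however your `getJump` reports a missing jump field.

```python
# Hypothetical lookup table: jump mnemonic -> three jump bits.
JUMP_CODES = {
    "null": "000",  # assumed placeholder for an instruction with no jump
    "JGT": "001",
    "JEQ": "010",
    "JGE": "011",
    "JLT": "100",
    "JNE": "101",
    "JLE": "110",
    "JMP": "111",
}


def jump2bin(mnemonic):
    # returns the binary code for the jump part of a C-instruction
    return JUMP_CODES[mnemonic]


print(jump2bin("JGT"), jump2bin("JMP"))
```

An unknown mnemonic raises a `KeyError` here, which is one simple way to surface a typo in the assembly source.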
    

### Bringing it all together

What's left? Well, implement all the functions specified, and then finish the `Pass1` and `Pass2` functions. These will have to work on globally defined dictionaries to keep track of symbols and manage the translation to binary. You'll also need to create and manage a `.hack` output file where the binary instructions are written. In addition to the functions mentioned in the API and here, you may decide to write some helper functions, or even some classes. You're free to do as you like, provided the functions in the API are complete, and `Pass1` and `Pass2` are complete and well defined. I recommend breaking your code up into several files and using Python's `import` for manageability and readability.
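
For the `.hack` output file, one possible approach is to derive its name from the `.asm` name and write one binary string per line. The helper names `hack_path` and `write_hack` below are hypothetical, and the two instructions are hand-translated (`@2` and `D=A`) purely for the demonstration.

```python
import os
import tempfile


def hack_path(asm_path):
    # Swap the extension: 'Add.asm' -> 'Add.hack'
    root, _ext = os.path.splitext(asm_path)
    return root + ".hack"


def write_hack(asm_path, binary_lines):
    """Write each 16-bit string on its own line; return the output path."""
    out_path = hack_path(asm_path)
    with open(out_path, "w") as out:
        for bits in binary_lines:
            out.write(bits + "\n")
    return out_path


# Demonstration in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    out_path = write_hack(os.path.join(tmp, "Add.asm"),
                          ["0000000000000010", "1110110000010000"])
    with open(out_path) as f:
        written = f.read().splitlines()
print(os.path.basename(out_path), written)
```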