## Chapter 6 Assembler

This project involves writing a program that will

* read a file,
* ignore white space and comments in the file,
* break each line, or command, in the file into mnemonics,
* determine how the mnemonics map into various binary commands,
* write the equivalent binary code to a second file.

This must be done with a Python script called `hasm.py` (hack assembler) that accepts a command line argument specifying the assembly file (`.asm`) that you want to convert to binary code (`.hack`).
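As a rough sketch of the file naming involved, the output name can be derived from the input name by swapping the extension. The helper name `hack_path_for` is hypothetical and not part of the required API; it only assumes that the `.hack` file should sit next to the `.asm` file.

```python
from pathlib import Path


def hack_path_for(asm_name):
    """Return the output path for an .asm file (hypothetical helper).

    Assumes the .hack file is written alongside the .asm file.
    """
    asm_path = Path(asm_name)
    if asm_path.suffix != ".asm":
        raise ValueError(f"expected a .asm file, got {asm_name!r}")
    # with_suffix() replaces only the final extension
    return asm_path.with_suffix(".hack")


print(hack_path_for("Add.asm").name)  # Add.hack
```

In `hasm.py` this would typically be called with `sys.argv[1]`, the first term typed after the script name.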


### Opening and reading a file

First, the `.asm` file has to be opened and read. You may do this however you prefer, but the following code provides one option: it opens the file named by a command line argument and builds a list of all the lines in that file. Specific features of this code include:
* `sys.argv` is a list containing each term that was typed at the command line. The terms are separated by spaces at the command line, and `sys.argv[0]` is the name of the script itself.
* Python's `with` statement is used together with `try` and `except` for error handling.
```python
import sys

inFileName = sys.argv[1]
try:
    # 'r' opens the file read-only; the assembler never writes to the .asm file
    with open(inFileName, 'r') as inFile:
        lines = inFile.readlines()

except OSError:
    print("Not a good file.")
```

In [1]:
# Within this notebook, the filename will be used directly in place of a command line argument

inFileName = 'Add.asm'

try:
    with open(inFileName, 'r') as inFile:
        lines = inFile.readlines() # returns a list of all lines in inFile

except OSError:
    print("Not a good file.")
    
lines

['// This file is part of www.nand2tetris.org\n',
 '// and the book "The Elements of Computing Systems"\n',
 '// by Nisan and Schocken, MIT Press.\n',
 '\n',
 '// Computes R0 = 2 + 3  (R0 refers to RAM[0])\n',
 '\n',
 '@2\n',
 'D=A\n',
 '@3\n',
 'D=D+A\n',
 '@0\n',
 'M=D']

### Parsing Text

Much of the information in the `.asm` file, such as whitespace, line breaks, and comments, is not relevant to the actual translation process and should be disregarded. Python's built-in string methods offer basic parsing capabilities:

* `split()`: Splits a string into a list of substrings based on a delimiter.
* `strip()`: Removes leading and trailing whitespace or specified characters.
* `find()/index()`: Locates the position of a substring.
* `replace()`: Replaces occurrences of a substring.
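A small illustration of the four methods on a single made-up line of assembly. The sample string and variable names are only for demonstration.

```python
raw = "   D=D+A   // add A to D\n"

# strip() with no arguments removes leading/trailing whitespace, including "\n"
stripped = raw.strip()

# find() returns the index of the first match, or -1 when there is none
comment_start = stripped.find("//")
code = stripped[:comment_start].strip() if comment_start != -1 else stripped

# split() breaks a string into a list around a delimiter
dest, comp = code.split("=")

# replace() swaps every occurrence of a substring
compact = "D = D + A".replace(" ", "")

print(code, dest, comp, compact)  # D=D+A D D+A D=D+A
```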

Clean up the list `lines` to retain only important information.

In [2]:
fixed = []
for line in lines:
    #remove whitespace
    line = line.strip()

    #remove \n
    line = line.replace("\n", "")
    # print("fixed line: " + line)
    
    #remove //
    if "//" in line:
        index = line.find("//")
        line = line[:index].strip()

    #remove empty lines (an empty string is falsy)
    if line:
        fixed.append(line)
    
fixed
    

['@2', 'D=A', '@3', 'D=D+A', '@0', 'M=D']

In [3]:
lines = ['@2',
         'D=A',
         '@3',
         'D=D+A',
         '@0',
         'M=D']

### Dictionaries
The main construct for managing translation of assembly to binary will be the dictionary, or symbol table. Create any dictionaries that may be useful in this program.


In [4]:
comp_dictionary = {
               '':  '0000000',
              '0':  '0101010',
              '1':  '0111111',
              '-1': '0111010',
              'D':  '0001100',
              'A':  '0110000',
              'M':  '1110000',
              '!D': '0001101',
              '!A': '0110001',
              '!M': '1110001',
              '-D': '0001111',
              '-A': '0110011',
              '-M': '1110011',
              
              'D+1': '0011111',
              '1+D': '0011111',

              'A+1': '0110111',
              '1+A': '0110111',

              'M+1': '1110111',
              '1+M': '1110111',

              'D-1': '0001110',
              'A-1': '0110010',
              'M-1': '1110010',

              'D+A': '0000010',
              'A+D': '0000010',

              'D+M': '1000010',
              'M+D': '1000010',

              'D-A': '0010011',
              'D-M': '1010011',

              'A-D': '0000111',
              'M-D': '1000111',

              'D&A': '0000000',
              'A&D': '0000000',

              'D&M': '1000000',
              'M&D': '1000000',

              'D|A': '0010101',
              'A|D': '0010101',

              'D|M': '1010101',
              'M|D': '1010101',           
}

In [5]:
dest_dictionary = {
    'null': '000',
    'M': '001',
    'D': '010',

    'DM': '011',
    'MD': '011',

    'A': '100',

    'AM': '101',
    'MA': '101',

    'AD': '110',
    'DA': '110',

    'ADM': '111',
    'AMD': '111',
    'DMA': '111',
    'DAM': '111',
    'MDA': '111',
    'MAD': '111',
}
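As an alternative to listing every ordering by hand, the three dest bits can be computed from the letters present, since each bit corresponds to one register (A, then D, then M). This is only a sketch: `dest_bits` is a hypothetical helper, and it does not reject invalid mnemonics.

```python
def dest_bits(mnemonic):
    """Return the 3-bit dest field for a mnemonic such as 'MD' or 'null'.

    The first bit is set when A is a destination, the second for D,
    the third for M, so the order of the letters does not matter.
    """
    if mnemonic in ("", "null"):
        return "000"
    return "".join("1" if register in mnemonic else "0" for register in "ADM")


print(dest_bits("MD"), dest_bits("DM"), dest_bits("AMD"))  # 011 011 111
```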

In [6]:
jump_dictionary = {
    'null': '000',
    'JGT': '001',
    'JEQ': '010',
    'JGE': '011',
    'JLT': '100',
    'JNE': '101',
    'JLE': '110',
    'JMP': '111'
}

In [7]:
symbol_dictionary = {
    '': '000',
    'R0': '0000000000000000',
    'R1': '0000000000000001',
    'R2': '0000000000000010',
    'R3': '0000000000000011',
    'R4': '0000000000000100',
    'R5': '0000000000000101',
    'R6': '0000000000000110',
    'R7': '0000000000000111',
    'R8': '0000000000001000',
    'R9': '0000000000001001',
    'R10': '0000000000001010',
    'R11': '0000000000001011',
    'R12': '0000000000001100',
    'R13': '0000000000001101',
    'R14': '0000000000001110',
    'R15': '0000000000001111',
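    'SP': '0000000000000000',
    'LCL': '0000000000000001',
    'ARG': '0000000000000010',
    'THIS': '0000000000000011',
    'THAT': '0000000000000100',
    'SCREEN': '0100000000000000',   # RAM address 16384
    'KBD': '0110000000000000',      # RAM address 24576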
}
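Typing sixteen-bit strings by hand is error-prone, so one option is to generate the predefined symbols from their decimal addresses. The sketch below assumes the Hack platform's predefined symbols (R0 to R15, the five pointer names, SCREEN at 16384, and KBD at 24576); the function names are hypothetical.

```python
def to_binary16(value):
    """Format a non-negative integer as a 16-character binary string."""
    return format(value, "016b")


def build_symbol_table():
    """Return a dict mapping each predefined symbol to a 16-bit string."""
    table = {f"R{i}": to_binary16(i) for i in range(16)}
    pointers = {"SP": 0, "LCL": 1, "ARG": 2, "THIS": 3, "THAT": 4}
    memory_maps = {"SCREEN": 16384, "KBD": 24576}
    for name, address in {**pointers, **memory_maps}.items():
        table[name] = to_binary16(address)
    return table


table = build_symbol_table()
print(table["R15"], table["SCREEN"], table["KBD"])
# 0000000000001111 0100000000000000 0110000000000000
```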

### Decimal to Binary

When converting an A-instruction to binary, this program must change a decimal value into a 16-bit code: a 0 followed by a 15-bit address. There are a number of ways of handling this, but Python's built-in `format` function is probably the cleanest. The following example converts a decimal address to binary and zero-pads the result to 16 bits with `zfill`.


In [8]:
RAM_address = 18
format(RAM_address,'b').zfill(16) #goes from decimal to binary

'0000000000010010'
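The padding can also be folded into the format specification itself. The following sketch wraps that in a hypothetical `encode_a` helper and adds a range check, on the assumption that an address must fit in 15 bits.

```python
def encode_a(address):
    """Encode a numeric A-instruction: '0' followed by a 15-bit address."""
    if not 0 <= address < 2 ** 15:
        raise ValueError(f"address {address} does not fit in 15 bits")
    # '015b' means: binary, zero-padded to a width of 15 characters
    return "0" + format(address, "015b")


print(encode_a(18))  # 0000000000010010
```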

### The assembler API

So that we can all agree on what's to be done, and share our work, I am insisting that portions of the API defined in the book be upheld. The following provides them. Note that I use `pass` to let an unimplemented function run without error. You'll need to replace it with your own code.

Implement the following functions and test them on various lines of assembly.

In [9]:
def dest2bin(mnemonic):
    # returns the binary code for the dest part of a C-instruction
    # sort the mnemonic to overcome destinations like AD, which is 
    # the same as DA. This is probably easier to do by adding more
    # entries in the dictionary
    return dest_dictionary.get(mnemonic, '000')         #ex. enter AM and get 101

def comp2bin(mnemonic):
    # returns the binary code for the comp part of a C-instruction
    return comp_dictionary.get(mnemonic, '0000000') 

def jump2bin(mnemonic):
    # returns the binary code for the jump part of a C-instruction
    return jump_dictionary.get(mnemonic, '000') 
    
def commandType(command):
    # returns "A_COMMAND", "C_COMMAND", or "L_COMMAND"
    # depending on the contents of the 'command' string
    if (command[0] == '@'):
        ans = "A_COMMAND"
    
    elif (command[0] == "(" and ")" in command): #maybe add ) condition
        ans = "L_COMMAND"
    
    else:
        # each membership test must be written out; ("=" or ";" in command) is always true
        if "=" in command or ";" in command:
            ans = "C_COMMAND"
        else:
            return "error_unidentifiable command"
    
    return ans

def getSymbol(command):
    # given an A_COMMAND or L_COMMAND type, returns the symbol as a string,
    # eg (XXX) returns 'XXX'
    # @sum returns 'sum'
    if (command[0] == "(" and ")" in command): #maybe fix this in future
        command = command.replace("(", "")
        command = command.replace(")", "")

    elif (command[0] == "@"):
        return command[1:]

    else:
        return "error"
    
    return command

def getDest(command):
    # return the dest mnemonic in the C-instruction 'command'
    if "=" in command:
        lhs, rhs = command.split("=")
        return lhs
    
    else:
        return "null"
    
def getComp(command):
    # return the comp mnemonic in the C-instruction 'command'
    # lhs=rhs;jhs

    #check for equal first
    if "=" in command:
        lhs, rhs = command.split("=")

    else:
        rhs = command

    #now check for semicolon
    if ";" in rhs:
        jhs, unused = rhs.split(";")
        return jhs
    
    else:
        return rhs
   
def getJump(command):
    # return the jump mnemonic in the C-instruction 'command'
    if ";" in command:
        lhs, rhs = command.split(";")
        return rhs
    
    else:
        return "null"
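To see how the pieces fit, here is a compact sketch that splits a C-instruction with `str.partition` and concatenates the three fields after the `111` prefix. The names `split_c` and `encode_c` are hypothetical, and the tables are deliberately cut down to a few entries to keep the example short.

```python
# Small subsets of the comp/dest/jump tables, just enough for the examples.
COMP = {"0": "0101010", "A": "0110000", "D": "0001100", "D+A": "0000010"}
DEST = {"null": "000", "M": "001", "D": "010"}
JUMP = {"null": "000", "JGT": "001", "JMP": "111"}


def split_c(command):
    """Split 'dest=comp;jump' into its parts; missing parts become 'null'."""
    dest, equals, rest = command.partition("=")
    if not equals:  # no '=' means there is no dest field
        dest, rest = "null", command
    comp, _, jump = rest.partition(";")
    return dest, comp, jump or "null"


def encode_c(command):
    """Return the 16-bit code: 111, then comp (7), dest (3), jump (3)."""
    dest, comp, jump = split_c(command)
    return "111" + COMP[comp] + DEST[dest] + JUMP[jump]


print(split_c("0;JMP"))   # ('null', '0', 'JMP')
print(encode_c("D=D+A"))  # 1110000010010000
```

Unlike `split`, `partition` always returns three items, so a line with no `=` or no `;` needs no special unpacking.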

### Bringing it all together

What's left? Well, implement all the functions specified, and then finish the functions for `pass_1` and `pass_2`. These will have to work on globally defined dictionaries to keep track of symbols and manage translation to binary. You'll also need to create and manage a `.hack` output file where the binary instructions are written. In addition to the functions mentioned in the API, and here, it's possible that you'll decide to write some helper functions, or even some classes. You're free to do as you like, provided the functions in the API are complete, and `pass_1` as well as `pass_2` are complete and well defined. I recommend breaking your code up into several files and using Python's `import` for manageability and readability.

In [10]:
def processA(line,lineNo):
    # Convert an A-instruction line of assembly to a binary code that is
    # 0 followed by a 15 bit address. Will use the symbol table to look up
    # a symbol and replace it with a value. If the symbol is not in the symbol
    # table, add it with the correct RAM address (next in sequence)
    return None


def processC(line):
    # Convert a C-instruction line of code to the correct computation, destination,
    # and jump binary codes. These should be preceded by 111, which signifies the
    # C-instruction
    return None

def processL(line,lineNo):
    # When an L-Instruction (label in the form (LABEL)) is encountered, 
    # the label should be placed into the symbol table with the correct line
    pass

def pass_1(file):
    # scan each line of file and find L_COMMANDS
    # place them in the symbol table with appropriate ROM numbers
    pass

def pass_2(file):
    # Scan file and write correct binary code to file.
    pass
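One possible shape for the two passes is sketched below. It is not the required implementation: the function names are hypothetical, the symbol table stores integer addresses rather than binary strings, and it only resolves symbols (it returns numeric `@` commands instead of writing binary). It assumes that new variables are allocated from RAM address 16 upward.

```python
def first_pass(commands, symbols):
    """Record each (LABEL) with the ROM address of the next instruction."""
    rom_address = 0
    for command in commands:
        if command.startswith("(") and command.endswith(")"):
            symbols[command[1:-1]] = rom_address
        else:
            rom_address += 1  # labels do not occupy a ROM slot


def second_pass(commands, symbols):
    """Drop labels and replace every @symbol with a numeric @address."""
    next_variable = 16  # assumed first free RAM address for variables
    resolved = []
    for command in commands:
        if command.startswith("("):
            continue
        if command.startswith("@"):
            symbol = command[1:]
            if symbol.isdigit():
                address = int(symbol)
            else:
                if symbol not in symbols:
                    symbols[symbol] = next_variable
                    next_variable += 1
                address = symbols[symbol]
            resolved.append("@" + str(address))
        else:
            resolved.append(command)
    return resolved


program = ["@i", "M=1", "(LOOP)", "@i", "D=M", "@LOOP", "D;JGT", "@R0"]
symbols = {"R0": 0}
first_pass(program, symbols)
print(second_pass(program, symbols))
# ['@16', 'M=1', '@16', 'D=M', '@2', 'D;JGT', '@0']
```

Running the label pass first is what allows `@LOOP` to be used before `(LOOP)` is defined; any symbol still unknown in the second pass is treated as a new variable.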