
Abonite/MACPU-Assembler


macpu_assembler

An assembler for my MACPU


About

To drive MACPU and give it a wider range of applications, we need an assembler for the MACPU assembly language. I chose Rust, a high-performance and memory-safe language, to implement it.

For the algorithmic model of MACPU, see here; for the FPGA implementation of MACPU, see here.

At present, the assembler only supports single-file assembly; it does not yet provide linking.


MACPU assembly syntax

Naming, whitespace and comments

A name must begin with a letter or an underscore, and the whole name may contain only letters, digits and underscores.
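As a minimal sketch (not the project's actual code), the naming rule can be checked in Rust as follows. The function name is_valid_identifier is hypothetical, and "letters" is assumed to mean ASCII letters.

```rust
/// Hypothetical check for the MACPU naming rule: the first character is a
/// letter or '_', and every character is a letter, digit or '_'.
/// Assumption: "letters" means ASCII letters only.
fn is_valid_identifier(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        // The first character must be a letter or an underscore.
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        // Empty names, or names starting with a digit or symbol, are rejected.
        _ => return false,
    }
    // The remaining characters may also be digits.
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}
```

For example, "LENTH" and "_tmp1" are accepted, while "1abc" and "my-var" are rejected.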

MACPU assembly places no special requirements on whitespace at the beginning of or within a line: spaces (" ") and tabs ("\t") are treated the same. The code on a line ends either at a newline character ("\n") or where a line comment (";") begins.

In MACPU assembly, comments start with a semicolon (";") and run to the end of the line.
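A rough sketch of how a single line could be reduced to its code tokens under these rules. The helper code_tokens is hypothetical, and it assumes ";" never appears inside a string literal; a real implementation handling .STR values would need to account for that.

```rust
/// Hypothetical helper: drops the line comment (everything from the first ';')
/// and splits the remaining code into whitespace-separated tokens.
/// Assumption: ';' never occurs inside a string literal.
fn code_tokens(line: &str) -> Vec<&str> {
    // Keep only the part of the line before the comment marker.
    let code = match line.find(';') {
        Some(pos) => &line[..pos],
        None => line,
    };
    // split_whitespace treats spaces, tabs and runs of them identically.
    code.split_whitespace().collect()
}
```

Note that commas stay attached to operands (for example "%A1,"); a fuller tokenizer would also split on ",".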

Data types

MACPU supports several data markers. Some indicate the base of a number: "hex" for hexadecimal, "oct" for octal and "bin" for binary. To use one, write it directly in front of the number, as in "hex7FFF" or "oct756"; a number without a prefix is decimal.
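A minimal sketch of parsing such literals in Rust with the standard library's u32::from_str_radix. The function name parse_number is hypothetical, the u32 result type is an assumption based on the 32-bit "dword" default storage form, and prefixes are matched in lower case only, as written in the examples.

```rust
/// Hypothetical parser for numeric literals with an optional base prefix:
/// "hex" (base 16), "oct" (base 8), "bin" (base 2), or none (decimal).
/// Returns None for invalid digits, an empty digit string, or overflow.
/// Assumption: values are unsigned and must fit in 32 bits.
fn parse_number(literal: &str) -> Option<u32> {
    let (digits, radix): (&str, u32) = if let Some(rest) = literal.strip_prefix("hex") {
        (rest, 16)
    } else if let Some(rest) = literal.strip_prefix("oct") {
        (rest, 8)
    } else if let Some(rest) = literal.strip_prefix("bin") {
        (rest, 2)
    } else {
        (literal, 10)
    };
    // Reject empty input and signs such as "+", which from_str_radix would accept.
    if digits.is_empty() || !digits.chars().all(|c| c.is_digit(radix)) {
        return None;
    }
    // from_str_radix reports values above u32::MAX as an error.
    u32::from_str_radix(digits, radix).ok()
}
```

With this sketch, "hex7FFF" parses to 32767, "oct756" to 494, and "hexZZ" is rejected.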

Other tags specify how data is stored in memory: "byte", "word" and "dword". If a data definition does not specify a storage form, "dword" is used by default, which corresponds to a 32-bit unsigned integer in a high-level language. Whether a value is treated as signed affects the assembler's error checking and optimization.
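The storage tags could be modelled as a small enum, as sketched below. Only the 32-bit size of "dword" comes from the text; the sizes of "byte" (8 bits) and "word" (16 bits) are assumptions, as is case-insensitive tag matching (suggested by the "Byte" spelling in the ARR example). The fits method illustrates the kind of range check the assembler's error checking might perform.

```rust
/// Hypothetical storage forms. Only Dword = 32 bits is stated in the text;
/// Byte = 8 bits and Word = 16 bits are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Storage {
    Byte,
    Word,
    Dword,
}

impl Storage {
    /// Maps a tag to a storage form; a missing tag means the default, Dword.
    /// Assumption: tags are case-insensitive ("Byte" == "byte").
    fn from_tag(tag: Option<&str>) -> Option<Storage> {
        match tag.map(|t| t.to_ascii_lowercase()) {
            None => Some(Storage::Dword),
            Some(t) => match t.as_str() {
                "byte" => Some(Storage::Byte),
                "word" => Some(Storage::Word),
                "dword" => Some(Storage::Dword),
                _ => None, // unknown tag
            },
        }
    }

    /// Number of bytes one value occupies in memory.
    fn size_in_bytes(self) -> u32 {
        match self {
            Storage::Byte => 1,
            Storage::Word => 2,
            Storage::Dword => 4,
        }
    }

    /// Range check for an unsigned value against this storage form.
    fn fits(self, value: u32) -> bool {
        match self {
            Storage::Byte => value <= u8::MAX as u32,
            Storage::Word => value <= u16::MAX as u32,
            Storage::Dword => true, // every u32 fits in 32 bits
        }
    }
}
```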

Preprocessing commands

Preprocessing commands are special commands that set assembler properties, give the assembler information about the program, or provide conveniences for developers. They start with a period ("."), as in ".SET", and are processed or recorded by the assembler before assembly begins.
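Because preprocessing commands are recognized by their leading period, they can be separated from ordinary code before assembly starts. The sketch below is hypothetical (split_directives is not from the project): it keeps each line's original line number so that later errors can point back to the source, and it skips blank and comment-only lines.

```rust
/// Hypothetical helper: splits source text into preprocessing commands
/// (first non-whitespace character is '.') and ordinary code lines,
/// keeping each line's 1-based line number for error reporting.
fn split_directives(source: &str) -> (Vec<(usize, String)>, Vec<(usize, String)>) {
    let mut directives = Vec::new();
    let mut code = Vec::new();
    for (index, line) in source.lines().enumerate() {
        let trimmed = line.trim();
        // Blank lines and comment-only lines carry no code.
        if trimmed.is_empty() || trimmed.starts_with(';') {
            continue;
        }
        let entry = (index + 1, trimmed.to_string());
        if trimmed.starts_with('.') {
            directives.push(entry);
        } else {
            code.push(entry);
        }
    }
    (directives, code)
}
```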

In MACPU assembly language, the following preprocessing commands are currently supported:

  • SET - Sets attributes of the assembler and the program. For example, ".SET CODESEGMENT hex1000" tells the assembler that the code segment of the program being assembled must start at address 1000 in hexadecimal.
  • VAR - Defines a variable, for example ".VAR LENTH 10". The variable is placed in the data segment at an offset chosen automatically by the assembler, and its value can be modified. Later in the program, you can refer to the value directly by the name "LENTH".
  • STR - Defines a character string, for example '.STR NAME "ALAN TURING"'. The string is also stored in the data segment, and the "\0" character marking the end of the string is appended automatically. In this example, using "NAME" yields the memory address of the first character of the string.
  • ARR - Creates a contiguous block of data, like an array in C. As in C, all elements must be of the same type, for example ".ARR Byte MYDATA 0,1,2,3,4". The element type does not change how you work with the array afterwards, because processing and using the array is still up to your own code. It does affect the assembler, however: different data types occupy different amounts of memory, and the assembler checks the values against the declared type. When using ARR, it is therefore recommended to record the length of the array as well, to prevent out-of-bounds access. As with "STR", using "MYDATA" yields the memory address of the first element of the array.
  • DEF - Works like a macro definition in C and only performs string replacement. The replacement is carried out after the other preprocessing commands have been processed and before actual assembly begins.
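To illustrate how VAR, STR and ARR could share one data segment, here is a hypothetical layout sketch. It assumes little-endian byte order, no alignment padding, VAR values stored as 4-byte dwords, and ARR element sizes of 1, 2 or 4 bytes; none of these details are specified above. Each name maps to the offset of its data (for STR and ARR, of the first character or element), and strings get their "\0" terminator automatically.

```rust
use std::collections::HashMap;

/// Hypothetical parsed form of the data-defining commands.
enum DataDef {
    /// .VAR name value; assumed to be stored as a 4-byte dword (the default).
    Var(String, u32),
    /// .STR name "text"; stored as its bytes plus an automatic '\0'.
    Str(String, String),
    /// .ARR type name v0,v1,...; the u32 here is the element size in bytes.
    Arr(String, u32, Vec<u32>),
}

/// Places the definitions back to back in the data segment and records each
/// name's offset. Assumptions: little-endian order, no alignment padding, and
/// values already range-checked against their storage type.
fn layout(defs: &[DataDef]) -> Result<(Vec<u8>, HashMap<String, u32>), String> {
    let mut segment: Vec<u8> = Vec::new();
    let mut symbols: HashMap<String, u32> = HashMap::new();
    for def in defs {
        let name = match def {
            DataDef::Var(n, _) | DataDef::Str(n, _) | DataDef::Arr(n, _, _) => n,
        };
        // Names must be unique across all data definitions.
        if symbols.contains_key(name) {
            return Err(format!("duplicate name: {name}"));
        }
        // The name refers to the offset of its first byte.
        symbols.insert(name.clone(), segment.len() as u32);
        match def {
            DataDef::Var(_, value) => segment.extend_from_slice(&value.to_le_bytes()),
            DataDef::Str(_, text) => {
                segment.extend_from_slice(text.as_bytes());
                segment.push(0); // automatic "\0" terminator
            }
            DataDef::Arr(_, size, values) => {
                if !matches!(*size, 1 | 2 | 4) {
                    return Err(format!("unsupported element size: {size}"));
                }
                for value in values {
                    // Keep only the low `size` bytes of each element.
                    segment.extend_from_slice(&value.to_le_bytes()[..*size as usize]);
                }
            }
        }
    }
    Ok((segment, symbols))
}
```

Under these assumptions, ".VAR LENTH 10", '.STR NAME "HI"' and ".ARR Byte MYDATA 0,1,2,3,4" would place LENTH at offset 0, NAME at offset 4 and MYDATA at offset 7.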

Representation of various elements

When writing assembly code, you will encounter several kinds of elements: instructions, registers, immediate numbers, addresses, labels and so on. They should be written as follows:

  • instruction - Each instruction appears at the beginning of a line, as in classic 8086 assembly, for example "ADD %A1, %A2, %AR1". All instructions must be in upper case.
  • register - Register names start with a percent sign ("%") and must be in upper case.
  • immediate number - Immediate numbers need no marker; the assembler recognizes them automatically.
  • address - Addresses are enclosed in square brackets ("[]"), for example "[%A1]" or "[hex889]".
  • label - A label is not an instruction; it only marks an important point in the program for the assembler, which simplifies the use of jump instructions such as "JMP". Labels must end with a colon (":"), for example "LOOP:", and may be upper or lower case.
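A sketch of classifying individual operand and label tokens by these markers. The Element enum and classify function are hypothetical; the function assumes trailing commas have already been stripped and that variable names were already replaced by their values during preprocessing, so any unmarked token left over is treated as an immediate.

```rust
/// Hypothetical classification of operand and label tokens.
#[derive(Debug, PartialEq)]
enum Element<'a> {
    Register(&'a str),  // "%A1" -> "A1"
    Address(&'a str),   // "[hex889]" -> "hex889" (inner text, not yet parsed)
    Label(&'a str),     // "LOOP:" -> "LOOP"
    Immediate(&'a str), // any unmarked token, e.g. "10" or "hex7FFF"
}

/// Classifies one token by its marker. Assumes trailing commas were already
/// removed and variable names were already replaced during preprocessing.
fn classify(token: &str) -> Result<Element<'_>, String> {
    if let Some(name) = token.strip_prefix('%') {
        // Register names must be non-empty and upper case.
        if name.is_empty() || name != name.to_ascii_uppercase() {
            return Err(format!("invalid register: {token}"));
        }
        Ok(Element::Register(name))
    } else if let Some(inner) = token.strip_prefix('[').and_then(|t| t.strip_suffix(']')) {
        Ok(Element::Address(inner))
    } else if let Some(name) = token.strip_suffix(':') {
        Ok(Element::Label(name))
    } else {
        Ok(Element::Immediate(token))
    }
}
```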

How it works

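  1. Read the file and split it into lines, producing a Vector in which each element is a tuple of the line string and its line number.
  2. Iterate over all lines that begin with a period and parse them according to the rules of the preprocessing directives (DEFINE, DATA, ARRAY, SET), removing valid directives from the Vector. Create a coroutine that uses regular expressions to carry out the preprocessing directives that require string replacement (DEFINE, DATA, ARRAY). DEFINE directives are processed first, then DATA, then ARRAY. Names defined by these three directives must not be duplicated and must not begin with a digit or a quote, and these rules must be checked. While processing DATA and ARRAY, the assembler should also build the data-segment array and a hashmap that records the variables. After DEFINE replacement is done, the remaining non-DEFINE variables in the code are replaced using the hashmap.
  3. Read the settings from the SET directives and use them as the basis for the subsequent steps.
  4. Read the instructions line by line. First count the valid lines and create an array in which each element is a syntax tree. As each line is read, give it a new consecutive index, then run lexical analysis in a coroutine. The lexer should process the line character by character with a state machine and build the syntax tree; when the coroutine finishes, the tree is written into the array at its index. Each coroutine returns a Result, and all errors are collected separately and reported together.
  5. If no coroutine reports an error, read the syntax trees and generate the binary data, writing it into an array whose size has been estimated in advance and recording the position of each label. All instructions that reference labels are processed separately, at the end.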
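Step 5 resolves labels only after the addresses of all instructions are known. A minimal two-pass sketch of that idea follows; it is not the project's implementation. It assumes every instruction occupies a fixed 4 bytes (INSTR_SIZE, a hypothetical constant), attaches each label to the instruction that follows it, and takes the code start address from a setting such as ".SET CODESEGMENT". Like step 4, it collects all errors and reports them together instead of stopping at the first one.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified instruction; the real assembler uses syntax trees.
struct Instr<'a> {
    label: Option<&'a str>,  // label attached to this instruction, e.g. "LOOP"
    mnemonic: &'a str,       // e.g. "JMP"
    target: Option<&'a str>, // label referenced by this instruction, if any
}

/// Assumption: every instruction is encoded in a fixed 4 bytes.
const INSTR_SIZE: u32 = 4;

/// Pass 1 records the address of every label; pass 2 fills in label operands,
/// so forward references (jumps to later labels) also work.
fn resolve_labels(
    code_start: u32,
    program: &[Instr<'_>],
) -> Result<Vec<(String, Option<u32>)>, Vec<String>> {
    let mut errors = Vec::new();

    // Pass 1: label name -> absolute address.
    let mut labels: HashMap<&str, u32> = HashMap::new();
    for (i, ins) in program.iter().enumerate() {
        if let Some(name) = ins.label {
            let addr = code_start + i as u32 * INSTR_SIZE;
            if labels.insert(name, addr).is_some() {
                errors.push(format!("duplicate label: {name}"));
            }
        }
    }

    // Pass 2: resolve the label operand of each instruction.
    let mut resolved = Vec::new();
    for ins in program {
        let addr = match ins.target {
            None => None,
            Some(name) => match labels.get(name) {
                Some(&addr) => Some(addr),
                None => {
                    errors.push(format!("undefined label: {name}"));
                    None
                }
            },
        };
        resolved.push((ins.mnemonic.to_string(), addr));
    }

    if errors.is_empty() {
        Ok(resolved)
    } else {
        Err(errors)
    }
}
```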
