Date: February 10, 2015 Author: David Harmon
The focus of this project is to build a fully functional compiler for the miniJava language, a subset of the Java language.
The miniJava language is a subset of Java. Every miniJava program is a legal Java program with Java semantics. Following is an informal summary of the syntactic restrictions of Java that define miniJava. Later assignments will modify restrictions.
A miniJava program is a single file without a package declaration (hence corresponds to an unnamed or anonymous package), and has no imports. It consists of Java classes. Classes are simple; there are no interface classes, subclasses, or nested classes.
The members of a class are fields and methods. Member declarations can specify public or private access, and can specify static instantiation. Fields do not have an initializing expression in their declaration. Methods have a parameter list and a body. There are no constructor methods.
The types of miniJava are primitive types, class types, and array types. The primitive types are limited to int and boolean, and the array types are limited to the integer array int[] and the class[] array where class is any class type.
The statements of miniJava are limited to the statement block, declaration statement, assignment statement, method invocation, conditional statement (if), and the repetitive statement (while). A declaration of a local variable can only appear as a statement within a statement block and must include an initial value assignment. The return statement, if present at all, can only appear as the last statement in a method and yields a result.
The expressions of miniJava consist of operations applied to literals and references (including indexed and qualified references), method invocation, and new arrays and objects. Expressions may be parenthesized to specify evaluation order. The operators in miniJava are limited to
relational operations: > < == <= >= !=
logical operations: && || !
arithmetic operations: + - * /
All operators are infix binary operators (binop), with the exception of the unary prefix operators (unop): logical negation (!) and arithmetic negation (-). The latter is both a unary and a binary operator.
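The restrictions summarized above can be illustrated with a small program that stays inside them. This is only a sketch: the class, field, and method names (Accumulator, MiniJavaExample, run, and so on) are hypothetical, and the use of a void method is an assumption, since the summary does not list void explicitly and only implies it by saying a return statement need not be present.

```java
// Hypothetical miniJava-style program; all names are illustrative only.
class Accumulator {
    private int total;       // fields have no initializing expression
    private int size;
    public int[] history;    // int[] is one of the permitted array types

    // Assumption: a method without a result is declared void.
    public void reset(int capacity) {
        total = 0;
        size = capacity;
        history = new int[capacity];
    }

    public int add(int value, int slot) {
        // Local declarations must carry an initial value.
        boolean inRange = slot >= 0 && slot < size;
        if (inRange) {
            history[slot] = value;   // indexed reference on the left-hand side
        }
        total = total + value;
        return total;                // return appears only as the last statement
    }
}

class MiniJavaExample {
    public static int run() {
        Accumulator acc = new Accumulator();  // no constructor is declared
        acc.reset(3);                         // method invocation as a statement
        int i = 0;
        int result = 0;
        while (i < 3) {
            // '-' is used as a unary operator here, '*' and '+' as binary ones.
            result = acc.add(-i * 2 + 10, i);
            i = i + 1;
        }
        return result;
    }
}
```

Because every miniJava program is also a legal Java program, a file like this can be checked with an ordinary Java compiler before it is fed to the miniJava compiler.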
Lexical Rules
The terminals in the miniJava grammar are the tokens produced by the scanner. The token id stands for any identifier formed from a sequence of letters, digits, and underscores, starting with a letter. Uppercase letters are distinguished from lowercase letters. The token num stands for any integer literal that is a sequence of decimal digits. Tokens binop and unop stand for the operators listed above, and the token eot stands for the end of the input text. The remaining tokens stand for themselves (i.e. for the sequence of characters that are used to spell them). Keywords of the language are shown in bold for readability only; they are written in regular lowercase text.
Whitespace and comments may appear before or after any token. Whitespace is limited to spaces, tabs (\t), newlines (\n) and carriage returns (\r). There are two forms of comments. One starts with /* and ends with */, while the other begins with // and extends to the end of the line.
The text of miniJava programs is written in ASCII. Characters other than those that are part of a token, whitespace or a comment are erroneous.
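The lexical rules above can be sketched as a small hand-written scanner. This is a minimal illustration, not the project's actual scanner: the class name MiniScanner, the string spellings of tokens ("id:...", "num:...", "eot"), and the exact set of punctuation characters accepted are all assumptions made for the example. Keywords are not separated from identifiers here, and an unterminated /* comment is treated as an error.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner sketch; names and token spellings are illustrative only.
class MiniScanner {
    private final String src;
    private int pos = 0;

    MiniScanner(String src) { this.src = src; }

    // Scans the whole text and returns its tokens, ending with "eot".
    static List<String> tokenize(String text) {
        MiniScanner s = new MiniScanner(text);
        List<String> out = new ArrayList<>();
        String t;
        do {
            t = s.next();
            out.add(t);
        } while (!t.equals("eot"));
        return out;
    }

    // ASCII-only tests, so non-ASCII input falls through to the error case.
    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // Skips spaces, tabs, newlines, carriage returns, and both comment forms.
    private void skipIgnorable() {
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                pos++;
            } else if (src.startsWith("//", pos)) {
                while (pos < src.length() && src.charAt(pos) != '\n'
                        && src.charAt(pos) != '\r') pos++;
            } else if (src.startsWith("/*", pos)) {
                int end = src.indexOf("*/", pos + 2);
                if (end < 0) throw new IllegalArgumentException("unterminated comment");
                pos = end + 2;
            } else {
                return;
            }
        }
    }

    String next() {
        skipIgnorable();
        if (pos >= src.length()) return "eot";
        char c = src.charAt(pos);
        int start = pos;
        if (isLetter(c)) {  // id: a letter, then letters, digits, or underscores
            while (pos < src.length() && (isLetter(src.charAt(pos))
                    || isDigit(src.charAt(pos)) || src.charAt(pos) == '_')) pos++;
            return "id:" + src.substring(start, pos);
        }
        if (isDigit(c)) {   // num: a sequence of decimal digits
            while (pos < src.length() && isDigit(src.charAt(pos))) pos++;
            return "num:" + src.substring(start, pos);
        }
        // Try two-character operators before single characters.
        for (String op : new String[] {"==", "<=", ">=", "!=", "&&", "||"}) {
            if (src.startsWith(op, pos)) { pos += 2; return op; }
        }
        // Assumed set of single-character operators and punctuation.
        if ("<>!+-*/=;,.(){}[]".indexOf(c) >= 0) { pos++; return String.valueOf(c); }
        throw new IllegalArgumentException("illegal character '" + c + "' at offset " + pos);
    }
}
```

Note that the scanner reports "-" as a single token in both its unary and binary roles; under this design, deciding between unop and binop is left to the parser, which knows whether an operand precedes the operator.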