MiniJava-Compiler

A compiler for a small subset of Java.

The compiler was written as the main project in an undergraduate compiler course.
minijavac parses MiniJava source code, reports syntax and semantic errors, and checks that variables are initialized before they are used.
If the source code is error-free, Java bytecode (class files) is generated in the current working directory.
As a language addition, MiniJava developers can use the power operator (2³ is written as 2**3).

Run

The jar file can be found here

minijavac requires Java 1.8 to run.
MiniJava source code can be compiled as follows:

java -jar minijavac.jar [source-file-name]

The generated bytecode is backwards compatible with Java 1.1, so it runs on Java 1.1 or any later version. Run the compiled program with:

java [main-class-name]

Overview

Compilers operate in three main stages:

  1. Parse source files to generate a parse tree. Report any syntax errors.

  2. Traverse the parse tree, building a symbol table from the source code and reporting all semantic errors.

  3. Generate code in the target language (Java bytecode in this case).

Parsing and Syntax Analysis

To parse MiniJava source code, a parser was generated with the Antlr4 parser generator tool. Antlr generates a parser in Java from the MiniJava grammar. The parser attempts to generate a parse tree from the given MiniJava source code. If any syntax errors are encountered during parsing, an error message is printed that underlines the offending tokens. If there are no syntax errors, the parser builds a parse tree and the compiler proceeds to semantic analysis.
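
The underlined error report can be sketched in plain Java. This is a minimal illustration, not minijavac's actual code: the class ErrorUnderliner and its underline method are hypothetical names, and the line number and column range are assumed to come from the parser's error callback (in Antlr4, an error listener's syntaxError method receives the line and the character position of the error).

```java
// Hypothetical helper, not part of minijavac: formats a syntax error by
// underlining the offending span of a source line with '^' characters.
final class ErrorUnderliner {
    /**
     * @param line   the source line containing the error
     * @param lineNo 1-based line number
     * @param start  0-based column of the first offending character
     * @param stop   0-based column of the last offending character (inclusive)
     * @param msg    the error message to print in the header
     */
    static String underline(String line, int lineNo, int start, int stop, String msg) {
        StringBuilder sb = new StringBuilder();
        sb.append("line ").append(lineNo).append(':').append(start)
          .append(' ').append(msg).append('\n');
        sb.append(line).append('\n');
        for (int i = 0; i < start; i++) {
            // Copy tabs so the carets line up under tab-indented code.
            boolean tab = i < line.length() && line.charAt(i) == '\t';
            sb.append(tab ? '\t' : ' ');
        }
        for (int i = start; i <= stop; i++) {
            sb.append('^');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The stray ';' at column 8 is reported as the offending token.
        System.out.println(underline("int x = ;", 3, 8, 8, "missing expression"));
    }
}
```

Keeping the formatting separate from the parser makes it easy to reuse the same report for semantic errors, which also carry a source position.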

Static Semantics and the Symbol Table

Semantic analysis proceeds in the following fashion.

  1. A symbol table is constructed by traversing the parse tree. During this phase, the program is checked for:
    1. Duplicate classes
    2. Cyclic inheritance errors
    3. Declaration before use
  2. The program is type checked to provide as many static typing guarantees as possible. For example, x = x + y; is checked to ensure that:
    • x and y are both integers (+ operates only on ints in MiniJava)
    • The left-hand side of the assignment, x, has the same type as the expression on the right-hand side.
  3. Variables are checked to ensure that they are initialized before use.
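
The cyclic inheritance check from step 1 can be sketched as a walk up each class's extends chain. This is only an illustration under assumptions: the names InheritanceCheck, fromPairs, and findCycle are hypothetical, and the symbol table is reduced to a map from each class name to its superclass name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; class and method names are not taken from minijavac.
final class InheritanceCheck {
    /** Builds a child -> parent map from alternating (child, parent) names. */
    static Map<String, String> fromPairs(String... names) {
        Map<String, String> parentOf = new HashMap<>();
        for (int i = 0; i + 1 < names.length; i += 2) {
            parentOf.put(names[i], names[i + 1]);
        }
        return parentOf;
    }

    /** Returns the name of a class that lies on an inheritance cycle, or null if there is none. */
    static String findCycle(Map<String, String> parentOf) {
        for (String start : parentOf.keySet()) {
            Set<String> seen = new HashSet<>();
            // Follow the extends chain upward; revisiting a class means a cycle.
            for (String c = start; c != null; c = parentOf.get(c)) {
                if (!seen.add(c)) {
                    return c;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // class B extends A, class C extends B: an acyclic chain
        System.out.println(findCycle(fromPairs("B", "A", "C", "B")));
        // class A extends B, class B extends A: a cycle
        System.out.println(findCycle(fromPairs("A", "B", "B", "A")));
    }
}
```

Restarting the walk from every class is quadratic in the worst case, which is unlikely to matter for programs of the size a course compiler handles.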

To illustrate the initialization check, consider a method f() that takes no arguments and returns an integer. The following version:

	int f(){
		int x;    //declaration
		x=0;      //initialization
		return x; //use
	}

should compile, while

	int f(){
		int x;    //declaration
		return x; //error, variable x may not have been initialized.
	}

should print an initialization-before-use error.
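
One way to implement such a check is a forward walk over the method body that tracks the set of definitely assigned variables. The sketch below is not minijavac's implementation: the InitCheck class and its tiny statement model (Assign, Use, If) are invented for illustration, and it assumes every if has an else branch and that all locals start out uninitialized.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; the statement model below is invented for illustration.
final class InitCheck {
    static abstract class Stmt {}

    static final class Assign extends Stmt {   // x = ...;
        final String var;
        Assign(String var) { this.var = var; }
    }

    static final class Use extends Stmt {      // any read of x, e.g. return x;
        final String var;
        Use(String var) { this.var = var; }
    }

    static final class If extends Stmt {       // if (...) thenBranch else elseBranch
        final Stmt[] thenBranch;
        final Stmt[] elseBranch;
        If(Stmt[] thenBranch, Stmt[] elseBranch) {
            this.thenBranch = thenBranch;
            this.elseBranch = elseBranch;
        }
    }

    /** Walks statements in order; init holds the variables definitely assigned so far. */
    static void check(Stmt[] stmts, Set<String> init, List<String> errors) {
        for (Stmt s : stmts) {
            if (s instanceof Assign) {
                init.add(((Assign) s).var);
            } else if (s instanceof Use) {
                String v = ((Use) s).var;
                if (!init.contains(v)) {
                    errors.add("variable " + v + " may not have been initialized");
                }
            } else if (s instanceof If) {
                If branch = (If) s;
                Set<String> afterThen = new HashSet<>(init);
                Set<String> afterElse = new HashSet<>(init);
                check(branch.thenBranch, afterThen, errors);
                check(branch.elseBranch, afterElse, errors);
                // Only variables assigned on both paths are definitely assigned afterwards.
                afterThen.retainAll(afterElse);
                init.addAll(afterThen);
            }
        }
    }

    /** Checks a method body whose locals all start out uninitialized. */
    static List<String> checkBody(Stmt... body) {
        List<String> errors = new ArrayList<>();
        check(body, new HashSet<String>(), errors);
        return errors;
    }

    public static void main(String[] args) {
        // int x; x = 0; return x;
        System.out.println(checkBody(new Assign("x"), new Use("x")));
        // int x; return x;
        System.out.println(checkBody(new Use("x")));
    }
}
```

Taking the intersection at branches is what makes the analysis conservative: a variable assigned in only one branch is still reported when it is read afterwards.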

Bytecode Generation

This is where the magic happens (see the CodeGenerator documentation). The parse tree is traversed a final time to generate bytecode.
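
The shape of that final traversal can be shown with a small stand-in. The sketch below is an assumption-laden illustration, not the project's CodeGenerator: EmitSketch and its node classes are hypothetical, and it collects JVM mnemonics as strings instead of calling ASM. With ASM, each string would correspond to a call on a MethodVisitor, such as visitVarInsn for iload/istore and visitInsn for iadd.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: emits JVM mnemonics as text instead of calling ASM.
final class EmitSketch {
    interface Expr {
        void emit(List<String> out);
    }

    static final class Local implements Expr {   // read of an int local variable
        final int slot;
        Local(int slot) { this.slot = slot; }
        public void emit(List<String> out) { out.add("iload " + slot); }
    }

    static final class Add implements Expr {     // left + right
        final Expr left, right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
        public void emit(List<String> out) {
            // Post-order: both operands are pushed before the operator runs.
            left.emit(out);
            right.emit(out);
            out.add("iadd");
        }
    }

    /** Emits code for "local[targetSlot] = rhs;". */
    static List<String> compileAssign(int targetSlot, Expr rhs) {
        List<String> out = new ArrayList<>();
        rhs.emit(out);
        out.add("istore " + targetSlot);
        return out;
    }

    public static void main(String[] args) {
        // x = x + y; with x in slot 1 and y in slot 2 (slot numbers are arbitrary here)
        for (String insn : compileAssign(1, new Add(new Local(1), new Local(2)))) {
            System.out.println(insn);
        }
    }
}
```

Because the JVM is a stack machine, a post-order walk of the expression tree yields the instructions in exactly the order they must execute.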


Build

Building the tool from source requires the Antlr4 Java library and the ASM Java library.

  1. Clone the project from the git repository

  2. Download Antlr4 and ASM

  3. Unzip asm-5.0.3-bin.zip and add asm-5.0.3/lib/all/asm-all-5.0.3.jar and the Antlr jar to your classpath.

  4. Generate the lexer and parser by feeding Antlr4 the minijava grammar.

    java -jar ~/path/to/antlr-4.4-complete.jar -visitor Minijava.g4

  5. Compile with the Java 1.8 compiler.

    javac *.java

And you're done. If you want to package it as a jar file:

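jar cf minijavac.jar *.class

Note that java -jar needs a Main-Class entry in the jar's manifest, which jar cf does not add on its own; jar cfe, which takes the entry-point class name as an extra argument, sets one.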