EasyCC

An Easy Compiler Compiler implemented in Java

Introduction

EasyCC is a Compiler Compiler developed mainly to be an educational tool for students learning about compilers. The software has a command line interface and a graphical user interface. EasyCC is composed of four components: Lexical analysis, Syntax analysis, Semantic analysis and Code generation.

Lexical analysis

Lexical analysis is the first step in building a compiler. In this step the user has to build a finite automaton representing the possible tokens in the language. The finite automaton should be provided as a JSON file following this structure:

{
  "states": [
    {
      "types": ..., # initial, normal, final
      "name": ..., # unique name for a state
      "token": ..., # token name describing the target input
      "backtrack": ... # true | false. Usually a true is selected if the input has an unfixed length (e.g. variable, number).
    },
    {
      ... # Another state
    }
  ],
  "edges": [
    {
      "from": ..., # name of `from` state
      "to": ...,  # name of `to` state
      "value": ... # value of the edge
    },
    {
      ... # Another edge
    }
  ]
}

Edge special values:

SPACE Space or tab
EOL End of line
EOF End of file
LETTER A-Za-z
NONZERO 1-9
OTHER Any character not covered by an out-edge from a state

Auto-generate JSON

Writting a JSON file manually can be difficult for a large finite automaton. Therefore, a graphical user interface is developed to auto-generate the lexical JSON file by entering the states and the edges connecting them.

To run the lexical generator GUI:

./gradlew :lexical-analysis:lexical-generator:build :lexical-analysis:lexical-generator:run

Note: The states on the GUI are displayed using the BFS algorithm assuming that the graph generated is connected. Therefore a state is only visible once it is connected to another visible one

Syntax analysis

After defining the set of lexical tokens for a language, the next step is writting the language grammar. Each production in the grammar is composed of terminals and non-terminals tokens. The terminals tokens should match one of the lexical tokens defined in the previous step. The syntax for a grammar file is defined as follow:

% 'My language' Grammar file
% Author: EasyCC
A -> B 'C' D
B -> 'B' | EPSILON
D -> 'D'
   | EPSILON

Explanation:

Starting with % marks the line as a comment
First production is always the starting point for a grammar. (e.g. A -> B 'C' D)
A non-terminal is a word without single quote (e.g. A, B and D)
A terminal is a word with single quote (e.g. 'B', 'C' and 'D')
Multiple productions sharing the same left hand side can be written on one or more lines seperated by | symbol.
An epsilon is represented by the word EPSILON.
Spaces between words and symbols are important, otherwise an exception is thrown.

Name		Name	Last commit message	Last commit date
Latest commit History 253 Commits
buildSrc/src/main/groovy		buildSrc/src/main/groovy
code-generation		code-generation
common		common
dev-gui		dev-gui
easycc		easycc
gradle/wrapper		gradle/wrapper
lexical-analysis		lexical-analysis
semantic-analysis		semantic-analysis
syntax-analysis		syntax-analysis
.gitignore		.gitignore
.travis.yml		.travis.yml
LICENSE		LICENSE
README.md		README.md
build.gradle		build.gradle
gradle.properties		gradle.properties
gradlew		gradlew
gradlew.bat		gradlew.bat
settings.gradle		settings.gradle

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

EasyCC

Table of contents

Introduction

Lexical analysis

Auto-generate JSON

Syntax analysis

About

Releases

Packages

Languages

License

amirbawab/EasyCC

Folders and files

Latest commit

History

Repository files navigation

EasyCC

Table of contents

Introduction

Lexical analysis

Auto-generate JSON

Syntax analysis

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages