A comprehensive compiler implementation for OPLang, a simple programming language, using the ANTLR4 parser generator.
This is a mini project for the Principle of Programming Languages course (CO3005) at Ho Chi Minh City University of Technology (VNU-HCM) that implements a compiler for OPLang, a custom programming language designed for educational purposes.
📋 For detailed language specification, see OPLang Specification
The project demonstrates fundamental concepts of compiler construction including:
- Lexical Analysis: Tokenization and error handling for invalid characters, unclosed strings, and illegal escape sequences
- Syntax Analysis: Grammar-based parsing using ANTLR4 (ANother Tool for Language Recognition)
- Error Handling: Comprehensive error reporting for both lexical and syntactic errors
- Testing Framework: Automated testing with HTML report generation
1. Read the language specification carefully
   - Study the detailed OPLang Specification document
   - Understand the syntax and semantics of the OPLang language
   - Master the lexical and syntax rules
2. Implement the OPLang.g4 file
   - Complete the ANTLR4 grammar file in src/grammar/OPLang.g4
     - Define lexical rules (tokens)
     - Define parser rules (grammar rules)
     - Handle precedence and associativity
3. Write 100 lexer tests and 100 parser tests
   - 100 test cases for lexer in tests/test_lexer.py
     - Test valid and invalid tokens
     - Test error handling (unclosed strings, illegal escape sequences, etc.)
     - Test edge cases and boundary conditions
   - 100 test cases for parser in tests/test_parser.py
     - Test valid grammar structures
     - Test syntax errors and error recovery
     - Test nested structures and complex expressions
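The "Handle precedence and associativity" task can be easier to reason about with a small model outside ANTLR. The sketch below uses precedence climbing on a space-separated expression and returns the grouping as nested tuples. The operator table is hypothetical and is not taken from the OPLang specification; in the project itself these choices are expressed in the rules of OPLang.g4.

```python
# Hypothetical operator table: symbol -> (precedence, associativity).
# Illustrative values only; OPLang's real operators are defined in its spec.
OPS = {
    "+": (1, "left"),
    "-": (1, "left"),
    "*": (2, "left"),
    "^": (3, "right"),
}


def parse_expr(tokens, pos=0, min_prec=1):
    """Parse tokens[pos:] and return (tree, next_pos) via precedence climbing."""
    left = tokens[pos]  # operands are single tokens in this sketch
    pos += 1
    while pos < len(tokens) and tokens[pos] in OPS:
        op = tokens[pos]
        prec, assoc = OPS[op]
        if prec < min_prec:
            break
        # A left-associative operator needs a strictly tighter right operand;
        # a right-associative one accepts the same precedence level again.
        next_min = prec + 1 if assoc == "left" else prec
        right, pos = parse_expr(tokens, pos + 1, next_min)
        left = (op, left, right)
    return left, pos


def group(source):
    """Return the grouping of a space-separated expression as nested tuples."""
    tree, _ = parse_expr(source.split())
    return tree
```

With this table, `group("1 - 2 - 3")` groups to the left and `group("2 ^ 3 ^ 4")` groups to the right, which is the kind of behaviour worth covering in parser tests.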
For lexical errors, the lexer must return the following tokens with specific lexemes:
- ERROR_TOKEN with <unrecognized char> lexeme: when the lexer detects an unrecognized character.
- UNCLOSE_STRING with <unclosed string> lexeme: when the lexer detects an unterminated string. The <unclosed string> lexeme does not include the opening quote.
- ILLEGAL_ESCAPE with <wrong string> lexeme: when the lexer detects an illegal escape in string. The wrong string is from the beginning of the string (without the opening quote) to the illegal escape.
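The lexeme rules for string errors can be modelled in plain Python before they are encoded in the grammar. The sketch below is only an illustration under several assumptions: strings are delimited by double quotes and end at a newline, the set of legal escapes is made up, the `STRING_LIT` token name is hypothetical, and the offending escape is included in the `ILLEGAL_ESCAPE` lexeme (one reading of "to the illegal escape"). The OPLang specification is the authority on all of these.

```python
# Assumption: these characters may follow a backslash. The real set is
# defined by the OPLang specification and may differ.
LEGAL_ESCAPES = set('bfrnt"\\')


def scan_string(text):
    """Classify a string literal that starts at text[0] == '"'.

    Returns (token_name, lexeme). Error lexemes omit the opening quote.
    """
    assert text.startswith('"')
    i = 1
    while i < len(text):
        ch = text[i]
        if ch == '"':
            # Properly closed string (STRING_LIT is a hypothetical token name).
            return "STRING_LIT", text[1:i]
        if ch == "\n":
            break  # a newline ends the string without a closing quote
        if ch == "\\":
            if i + 1 < len(text) and text[i + 1] not in LEGAL_ESCAPES:
                # Lexeme runs from after the opening quote through the bad escape.
                return "ILLEGAL_ESCAPE", text[1:i + 2]
            i += 2  # skip the backslash and the escaped character
            continue
        i += 1
    return "UNCLOSE_STRING", text[1:i]
```

For example, `scan_string('"abc')` reports `UNCLOSE_STRING` with lexeme `abc`, which matches the rule that the opening quote is dropped.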
- Grammar Implementation: Accuracy and completeness of the OPLang.g4 file
- Test Coverage: Quantity and quality of test cases (200 tests total)
- Error Handling: Capability to handle lexical and syntax errors
1. Study the AST Node Structure
   - Read carefully all node classes in src/utils/nodes.py
     - Understand the AST node hierarchy and their properties
     - Master how different language constructs map to AST nodes
2. Implement the ASTGeneration Class
   - Create a class ASTGeneration in src/astgen/ast_generation.py
     - Inherit from OPLangVisitor (generated from ANTLR4)
     - Override visitor methods to construct appropriate AST nodes
     - Handle all language constructs defined in the OPLang specification
3. Write 100 AST Generation Test Cases
   - Implement 100 test cases in tests/test_ast_gen.py
     - Test AST generation for all language features
     - Verify correct node types and structure
     - Test edge cases and complex nested structures
The ASTGeneration class must:
- Inherit from OPLangVisitor: Use the visitor pattern to traverse parse trees
- Return AST nodes: Each visit method should return appropriate node objects from nodes.py
- Handle all constructs: Support all language features defined in the grammar
- Maintain structure: Preserve the logical structure and relationships between language elements
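The requirements above follow the usual visitor shape: each visit method converts one parse-tree node into one AST node, after visiting its children. The sketch below imitates that shape without ANTLR so it can run on its own. Every class name in it (`IntLiteral`, `BinaryOp`, `IntContext`, `AddContext`, `SketchASTGeneration`) is hypothetical; the real node classes live in src/utils/nodes.py and the real contexts are generated from OPLang.g4.

```python
from dataclasses import dataclass


# --- Hypothetical AST nodes (stand-ins for classes in src/utils/nodes.py) ---
@dataclass
class IntLiteral:
    value: int


@dataclass
class BinaryOp:
    op: str
    left: object
    right: object


# --- Stand-ins for ANTLR-generated parse-tree contexts ---
class IntContext:
    def __init__(self, text):
        self.text = text

    def accept(self, visitor):
        return visitor.visitInt(self)


class AddContext:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def accept(self, visitor):
        return visitor.visitAdd(self)


class SketchASTGeneration:
    """Mimics a visitor subclass: every visit method returns an AST node."""

    def visit(self, ctx):
        # Double dispatch: the context picks the matching visit method.
        return ctx.accept(self)

    def visitInt(self, ctx):
        return IntLiteral(int(ctx.text))

    def visitAdd(self, ctx):
        # Visit the children first, then wrap their AST nodes in a parent node.
        return BinaryOp("+", self.visit(ctx.left), self.visit(ctx.right))


tree = AddContext(IntContext("1"), AddContext(IntContext("2"), IntContext("3")))
result_ast = SketchASTGeneration().visit(tree)
```

In the actual assignment the class would inherit from the generated OPLangVisitor, and child nodes would be reached through the accessor methods ANTLR generates on each context rather than the plain attributes used here.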
- AST Implementation: Correctness and completeness of the ASTGeneration class
- Node Usage: Proper utilization of node classes from nodes.py
- Test Coverage: Quality and comprehensiveness of 100 AST generation test cases
- Structure Accuracy: AST must correctly represent the source program structure
.
├── Makefile # Cross-platform build automation (Windows, macOS, Linux)
├── run.py # Main project entrypoint for build and test operations
├── README.md # Project documentation
├── requirements.txt # Python dependencies
├── venv/ # Python virtual environment (auto-generated)
├── build/ # Generated parser and lexer code
│   └── src/
│       └── grammar/ # Compiled ANTLR4 output
│           ├── OPLangLexer.py # Generated lexer
│           ├── OPLangParser.py # Generated parser
│           ├── OPLangVisitor.py # Generated visitor
│           └── *.tokens # Token definitions
├── external/ # External dependencies
│   └── antlr-4.13.2-complete.jar # ANTLR4 tool
├── reports/ # Automated test reports (HTML format)
│   ├── lexer/ # Lexer test reports with coverage
│   ├── parser/ # Parser test reports with coverage
│   ├── ast/ # AST generation test reports with coverage
│   ├── checker/ # Semantic checker test reports with coverage
│   └── codegen/ # Code generation test reports with coverage
├── src/ # Source code
│   ├── astgen/ # AST generation module
│   │   ├── __init__.py # Package initialization
│   │   └── ast_generation.py # ASTGeneration class implementation
│   ├── codegen/ # Code generation module
│   │   ├── __init__.py # Package initialization
│   │   ├── codegen.py # CodeGenerator class implementation
│   │   ├── emitter.py # Emitter class for JVM bytecode generation
│   │   ├── error.py # Code generation error definitions
│   │   ├── frame.py # Stack frame management
│   │   ├── io.py # I/O symbol definitions
│   │   ├── jasmin_code.py # Jasmin instruction generation
│   │   └── utils.py # Code generation utilities
│   ├── runtime/ # Runtime environment
│   │   ├── OPLang.class # Main runtime class (compiled)
│   │   ├── OPLang.j # Jasmin source for main class
│   │   ├── io.class # I/O runtime class (compiled)
│   │   └── jasmin.jar # Jasmin assembler
│   ├── semantics/ # Semantic analysis module
│   │   ├── __init__.py # Package initialization
│   │   ├── static_checker.py # StaticChecker class implementation
│   │   └── static_error.py # Semantic error definitions
│   ├── utils/ # Utility modules
│   │   ├── __init__.py # Package initialization
│   │   ├── nodes.py # AST node class definitions
│   │   └── visitor.py # Base visitor classes
│   └── grammar/ # Grammar definitions
│       ├── OPLang.g4 # ANTLR4 grammar specification
│       └── lexererr.py # Custom lexer error classes
└── tests/ # Comprehensive test suite
    ├── test_ast_gen.py # AST generation tests
    ├── test_checker.py # Semantic analysis tests
    ├── test_codegen.py # Code generation tests
    ├── test_lexer.py # Lexer functionality tests
    ├── test_parser.py # Parser functionality tests
    └── utils.py # Testing utilities and helper classes
- Python 3.12+ (recommended) or Python 3.8+
- Java Runtime Environment (JRE) 8+ (required for ANTLR4)
- Git (for cloning the repository)
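If it helps to confirm these prerequisites by hand, a few lines of Python can do a rough check. This is a minimal sketch and not the logic behind `make check`; the function name `check_prerequisites` is made up for illustration, and it only tests that `java` and `git` are on PATH, not which versions they are.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Return a dict mapping each prerequisite to True (found) or False."""
    return {
        # Compare the running interpreter against the minimum version.
        "python": sys.version_info >= min_python,
        # shutil.which returns None when the executable is not on PATH.
        "java": shutil.which("java") is not None,
        "git": shutil.which("git") is not None,
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```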
The project includes a comprehensive Makefile that supports:
- ✅ Windows (PowerShell/CMD)
- ✅ macOS (Terminal/Zsh/Bash)
- ✅ Linux (Bash/Zsh)
1. Clone the repository:

       git clone <repository-url>
       cd project

2. Check system requirements:

       make check
       # OR using the entrypoint script:
       # Windows:
       python run.py check
       # macOS/Linux:
       python3 run.py check

3. Set up the environment and install dependencies:

       make setup
       # OR using the entrypoint script:
       # Windows:
       python run.py setup
       # macOS/Linux:
       python3 run.py setup

   This command:
   - Creates a Python virtual environment
   - Installs required Python packages
   - Downloads ANTLR4 JAR file automatically

4. Activate the virtual environment (REQUIRED before building and testing):

       # On macOS/Linux:
       source venv/bin/activate
       # On Windows:
       venv\Scripts\activate

5. Build the compiler:

       make build
       # OR using the entrypoint script:
       # Windows:
       python run.py build
       # macOS/Linux:
       python3 run.py build

6. Run tests:

       make test-lexer   # Test lexical analysis
       make test-parser  # Test syntax analysis
       make test-ast     # Test AST generation
       # OR using the entrypoint script:
       # Windows:
       python run.py test-lexer
       python run.py test-parser
       python run.py test-ast
       # macOS/Linux:
       python3 run.py test-lexer
       python3 run.py test-parser
       python3 run.py test-ast
Using Makefile (recommended):
make help # Get a full list of available commands
Using run.py entrypoint:
# Windows:
python run.py help # Get help for run.py commands
python run.py setup # Setup environment
python run.py build # Build compiler
python run.py test-lexer # Test lexer
python run.py test-parser # Test parser
python run.py test-ast # Test AST generation
python run.py clean # Clean build files
# macOS/Linux:
python3 run.py help # Get help for run.py commands
python3 run.py setup # Setup environment
python3 run.py build # Build compiler
python3 run.py test-lexer # Test lexer
python3 run.py test-parser # Test parser
python3 run.py test-ast # Test AST generation
python3 run.py clean # Clean build files
⚠️ Important: Always activate the virtual environment before running build and test commands:
    # On macOS/Linux:
    source venv/bin/activate
    # On Windows:
    venv\Scripts\activate
- make setup or python run.py setup (Windows) / python3 run.py setup (macOS/Linux) - Install dependencies and set up environment
- make build or python run.py build (Windows) / python3 run.py build (macOS/Linux) - Compile ANTLR grammar files to Python code
- make check or python run.py check (Windows) / python3 run.py check (macOS/Linux) - Verify required tools are installed
- make test-lexer or python run.py test-lexer (Windows) / python3 run.py test-lexer (macOS/Linux) - Run lexer tests with HTML report generation
- make test-parser or python run.py test-parser (Windows) / python3 run.py test-parser (macOS/Linux) - Run parser tests with HTML report generation
- make test-ast or python run.py test-ast (Windows) / python3 run.py test-ast (macOS/Linux) - Run AST generation tests with HTML report generation
- make test-checker or python run.py test-checker (Windows) / python3 run.py test-checker (macOS/Linux) - Run semantic checker tests with HTML report generation
- make test-codegen or python run.py test-codegen (Windows) / python3 run.py test-codegen (macOS/Linux) - Run code generation tests with HTML report generation
- make clean or python run.py clean (Windows) / python3 run.py clean (macOS/Linux) - Remove build directories
- make clean-cache or python run.py clean-cache (Windows) / python3 run.py clean-cache (macOS/Linux) - Clean Python cache files (__pycache__, .pyc)
- make clean-reports or python run.py clean-reports (Windows) / python3 run.py clean-reports (macOS/Linux) - Remove generated test reports
- make clean-venv or python run.py clean-venv (Windows) / python3 run.py clean-venv (macOS/Linux) - Remove virtual environment
The project includes a comprehensive testing framework with:
- Unit Tests: Individual component testing using pytest
- Integration Tests: End-to-end compilation testing
- HTML Reports: Detailed test results with coverage information
- Automated CI: Ready for continuous integration setup
- tests/test_lexer.py - Lexical analysis tests
- tests/test_parser.py - Syntax analysis tests
- tests/test_ast_gen.py - AST generation tests
- tests/test_checker.py - Semantic analysis tests
- tests/test_codegen.py - Code generation tests
- tests/utils.py - Testing utilities and helper classes
# Activate virtual environment first (REQUIRED)
source venv/bin/activate # macOS/Linux
venv\Scripts\activate # Windows
# Run lexer tests
make test-lexer
# OR
# Windows:
python run.py test-lexer
# macOS/Linux:
python3 run.py test-lexer
# Run parser tests
make test-parser
# OR
# Windows:
python run.py test-parser
# macOS/Linux:
python3 run.py test-parser
# Run AST generation tests
make test-ast
# OR
# Windows:
python run.py test-ast
# macOS/Linux:
python3 run.py test-ast
# Run semantic checker tests
make test-checker
# OR
# Windows:
python run.py test-checker
# macOS/Linux:
python3 run.py test-checker
# Run code generation tests
make test-codegen
# OR
# Windows:
python run.py test-codegen
# macOS/Linux:
python3 run.py test-codegen
# View reports
# Windows:
start reports/lexer/index.html
start reports/parser/index.html
start reports/ast/index.html
start reports/checker/index.html
start reports/codegen/index.html
# macOS:
open reports/lexer/index.html
open reports/parser/index.html
open reports/ast/index.html
open reports/checker/index.html
open reports/codegen/index.html
# Linux:
xdg-open reports/lexer/index.html
xdg-open reports/parser/index.html
xdg-open reports/ast/index.html
xdg-open reports/checker/index.html
xdg-open reports/codegen/index.html
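Since the command for viewing reports differs per platform, a small Python helper can open them the same way everywhere through the standard `webbrowser` module. This is a sketch, not part of the project: the function names are invented here, and it assumes each report sits at reports/<kind>/index.html as in the commands above.

```python
import webbrowser
from pathlib import Path

# Report folders as listed in the project structure.
REPORT_KINDS = ["lexer", "parser", "ast", "checker", "codegen"]


def report_paths(root="reports"):
    """Return the expected index.html path for each report kind."""
    return [Path(root) / kind / "index.html" for kind in REPORT_KINDS]


def open_reports(root="reports"):
    """Open every report that exists; return the paths that were opened."""
    opened = []
    for path in report_paths(root):
        if path.exists():
            # as_uri() needs an absolute path, hence resolve().
            webbrowser.open(path.resolve().as_uri())
            opened.append(path)
    return opened
```

Calling `open_reports()` from the project root opens whichever reports have been generated so far and silently skips the rest.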
- ✅ Pass/Fail Status for each test case
- ✅ Execution Time measurements
- ✅ Error Messages with stack traces
- ✅ Code Coverage analysis
- ✅ HTML Export for easy sharing
The OPLang compiler follows a traditional compiler architecture:
Source Code (.OPLang)
↓
Lexical Analysis (OPLangLexer)
↓
Token Stream
↓
Syntax Analysis (OPLangParser)
↓
Parse Tree
↓
AST Generation (ASTGeneration) ← Assignment 2
↓
Abstract Syntax Tree (AST)
↓
Semantic Analysis (StaticChecker) ← Assignment 3
↓
Semantically Validated AST
↓
Code Generation (CodeGenerator) ← Assignment 4
↓
Jasmin Assembly Code (.j)
↓
JVM Bytecode (.class)
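To see how the stages hand data to one another, here is a toy end-to-end pipeline for a made-up language of integer additions. It is only a sketch of the architecture: the real project uses the ANTLR-generated OPLangLexer and OPLangParser, the ASTGeneration, StaticChecker, and CodeGenerator classes, and Jasmin. All function names below are hypothetical, and the output is a list of JVM-style instruction strings rather than a complete Jasmin file.

```python
import re


def _is_int(tok):
    return tok.isascii() and tok.isdigit()


def lex(source):
    """Lexical analysis: split the source into integer and '+' tokens."""
    tokens = re.findall(r"[0-9]+|\S", source)
    for tok in tokens:
        if not (_is_int(tok) or tok == "+"):
            raise ValueError(f"unrecognized token: {tok}")
    return tokens


def parse(tokens):
    """Syntax analysis: build a left-nested AST for INT ('+' INT)*."""
    if not tokens or not _is_int(tokens[0]):
        raise SyntaxError("expected an integer")
    ast = ("int", int(tokens[0]))
    i = 1
    while i < len(tokens):
        if tokens[i] != "+" or i + 1 >= len(tokens) or not _is_int(tokens[i + 1]):
            raise SyntaxError("expected '+' followed by an integer")
        ast = ("add", ast, ("int", int(tokens[i + 1])))
        i += 2
    return ast


def check(ast):
    """Semantic analysis: reject literals that do not fit a 32-bit JVM int."""
    if ast[0] == "int":
        if not -2**31 <= ast[1] < 2**31:
            raise OverflowError(f"integer literal out of range: {ast[1]}")
    else:
        check(ast[1])
        check(ast[2])
    return ast


def generate(ast):
    """Code generation: emit stack instructions in post-order."""
    if ast[0] == "int":
        return [f"ldc {ast[1]}"]
    _, left, right = ast
    return generate(left) + generate(right) + ["iadd"]


def compile_source(source):
    """Run the stages in the same order as the diagram above."""
    return generate(check(parse(lex(source))))
```

Each stage consumes exactly what the previous one produced, so an error raised in an early stage stops the pipeline before later stages run.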
To add new language features:
1. Modify the grammar in src/grammar/OPLang.g4:

       // Add new rule
       assignment: ID '=' exp ';' ;
       // Add new token
       ASSIGN: '=' ;

2. Rebuild the parser:

       # Activate virtual environment first
       source venv/bin/activate  # macOS/Linux
       # venv\Scripts\activate   # Windows
       make build
       # OR
       # Windows:
       python run.py build
       # macOS/Linux:
       python3 run.py build

3. Add test cases in tests/:

       def test_assignment():
           source = "x = 42;"
           expected = "success"
           assert Parser(source).parse() == expected

4. Run tests to verify:

       # Activate virtual environment first
       source venv/bin/activate  # macOS/Linux
       # venv\Scripts\activate   # Windows
       make test-parser
       # OR
       # Windows:
       python run.py test-parser
       # macOS/Linux:
       python3 run.py test-parser
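# Tokenizer and Parser are assumed to be the helper classes from tests/utils.py.

# Lexer test: compare the token stream against an expected string.
def test_new_feature():
    source = "your_test_input"
    expected = "expected,tokens,EOF"
    assert Tokenizer(source).get_tokens_as_string() == expected

# Parser test: expect "success" or a specific error message.
def test_new_syntax():
    source = """your test program"""
    expected = "success"  # or specific error message
    assert Parser(source).parse() == expected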
- Test functions must start with test_
- Use descriptive names: test_variable_declaration(), test_function_call()
- Number tests sequentially: test_001(), test_002(), etc.
- antlr4-python3-runtime==4.13.2 - ANTLR4 Python runtime for generated parsers
- pytest - Testing framework for unit and integration tests
- pytest-html - HTML report generation for test results
- pytest-timeout - Test timeout handling for long-running tests
- ANTLR 4.13.2 - Parser generator tool (auto-downloaded)
- Java Runtime Environment - Required to run ANTLR4 tool
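Because the Python runtime is pinned to the same version as the ANTLR tool, it can be worth confirming what is actually installed in the active environment. The sketch below uses the standard `importlib.metadata` module (Python 3.8+); the helper names are invented for illustration and are not part of the project.

```python
from importlib import metadata

# Version pinned in the dependency list above.
EXPECTED_ANTLR_RUNTIME = "4.13.2"


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def runtime_matches(expected=EXPECTED_ANTLR_RUNTIME):
    """True only if antlr4-python3-runtime is installed at the expected version."""
    return installed_version("antlr4-python3-runtime") == expected
```

Run it with the virtual environment activated; otherwise it reports on the system interpreter's packages instead.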
The project automatically creates and manages a Python virtual environment to isolate dependencies.
# Install Java (macOS with Homebrew)
brew install openjdk
# Install Java (Ubuntu/Debian)
sudo apt update && sudo apt install openjdk-11-jre
# Install Java (Windows)
# Download from: https://www.oracle.com/java/technologies/downloads/
# macOS with Homebrew
brew install python@3.12
# Ubuntu/Debian
sudo apt install python3.12
# Windows
# Download from: https://www.python.org/downloads/
# Manual download if auto-download fails
mkdir -p external
cd external
curl -O https://www.antlr.org/download/antlr-4.13.2-complete.jar
cd ..
make build
# Clean and recreate virtual environment
make clean-venv
make setup
# Remember to activate before building/testing
source venv/bin/activate # macOS/Linux
# venv\Scripts\activate # Windows
# Ensure you have write permissions in the project directory (macOS/Linux)
chmod -R u+w .
- Check Prerequisites: Run make check to verify system setup
- View Logs: Check terminal output for detailed error messages
- Clean Build: Try make clean && make setup && make build
- Check Java: Ensure Java is properly installed and in PATH
- Virtual Environment: Always activate the virtual environment before running build/test commands:
      source venv/bin/activate  # macOS/Linux
      venv\Scripts\activate     # Windows
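When it is unclear whether the virtual environment is active, Python itself can answer. This is a minimal sketch with a hypothetical helper name; it relies on the fact that inside an environment created by `venv`, `sys.prefix` points at the environment while `sys.base_prefix` points at the base installation.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


print("virtual environment active" if in_virtualenv() else "virtual environment NOT active")
```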
This project is developed for educational purposes as part of the Principle of Programming Languages course (CO3005) at the Department of Computer Science, Faculty of Computer Science and Engineering - Ho Chi Minh City University of Technology (VNU-HCM).
- ANTLR Project: For providing an excellent parser generator tool
- Course Instructors: For guidance and project requirements
- Python Community: For the robust ecosystem of testing and development tools
Course: CO3005 - Principle of Programming Languages
Institution: Ho Chi Minh City University of Technology (VNU-HCM)
Department: Computer Science, Faculty of Computer Science and Engineering