Skip to content

Mohamed-code-13/compiler-project

Repository files navigation

Compiler Project - Lexical Analyzer Generator

Overview

A lexical analyzer generator that converts regular expression definitions into a Non-deterministic Finite Automaton (NFA) for token recognition.

Supports: Keywords, punctuations, identifiers, numbers, operators, character ranges (a-z), and standard regex operations (|, *, +, concatenation).


Project Structure

compiler_project/
├── CMakeLists.txt
├── main.cpp
├── README.md
│
├── LexicalAnalyzerGenerator/
│   ├── PostfixConverter.cpp       # Infix to postfix converter
│   ├── FiniteAutomaton.cpp        # State and FA classes
│   ├── NFA.cpp                    # NFA with regex operations
│   ├── RegexToNfaConverter.cpp    # Converts postfix to NFA
│   ├── LexicalParser.cpp          # Parses regex files
├── utils
│   └── StringUtils.cpp            # String utilities
│
└── test/
    ├── test_postfix.cpp
    ├── test_fa.cpp
    ├── test_nfa_operations.cpp
    ├── test_utils.cpp
    ├── test_parser.cpp
    └── test_integration.cpp

File Descriptions

LexicalAnalyzerGenerator/

File Purpose Key Components
PostfixConverter.cpp Converts infix regex to postfix using Shunting Yard algorithm PostfixConverter class, handles operator precedence and parentheses
FiniteAutomaton.cpp Basic automaton structure State class (transitions, epsilon closure), FiniteAutomaton class (start/end states, state counting)
NFA.cpp NFA with regex operations Wraps FiniteAutomaton, operators: + (positive closure), * (Kleene star), `
RegexToNfaConverter.cpp Converts postfix expressions to NFAs Stack-based Thompson's Construction, handles definitions and lambda transitions
LexicalParser.cpp Parses regex specification files Reads keywords {...}, punctuations [...], definitions =, expressions :, expands ranges (a-z), normalizes spaces
StringUtils.cpp String manipulation utilities split()

main.cpp

Entry point - parses input file and builds complete NFA.


Input File Format

{keyword1 keyword2}          # Keywords (priority 0)
[punct1 punct2]              # Punctuations (priority 1)
name = expression            # Regular definition
name : expression            # Token pattern (priority 2+)

Example:

{if while for}
[; , ( )]
letter = a-z | A-Z
digit = 0-9
id: letter (letter|digit)*
num: digit+

How to Build and Run

Build with CMake

# Create build directory
mkdir build && cd build

# Configure
cmake ..

# Build
make

# Run main program
./compiler_project

# Run tests
./test_postfix
./test_fa
./test_nfa_operations
./test_parser
./test_integration

Build without CMake

g++ -std=c++20 -I. main.cpp LexicalAnalyzerGenerator/*.cpp -o compiler_project

Testing

Test File Tests
test_postfix.cpp Infix to postfix conversion
test_fa.cpp Basic automaton operations
test_nfa_operations.cpp NFA regex operations (union, concatenation, closures)
test_utils.cpp String utilities
test_parser.cpp File parsing and token priorities
test_integration.cpp Complete pipeline

Run all tests:

cd build
./test_postfix && ./test_fa && ./test_nfa_operations && ./test_parser && ./test_integration

Pipeline Flow

Input File → LexicalParser → PostfixConverter → RegexToNfaConverter → Complete NFA
                ↓              ↓                      ↓
          Parse tokens   Infix→Postfix      Postfix→NFA (Thompson's)
          Expand ranges  Handle operators   Stack-based construction
          Normalize spaces

Key Algorithms

  • Shunting Yard: Infix to postfix conversion with operator precedence
  • Thompson's Construction: Postfix to NFA using stack-based approach
  • Epsilon Closure: DFS to compute ε-closure for NFA states
  • Range Expansion: a-za|b|c|...|z

Token Priority System

  • Priority 0: Keywords (highest)
  • Priority 1: Punctuations
  • Priority 2+: Regular expressions (order of definition)

Used to resolve conflicts during token recognition.


Quick Reference

Regex Operators

  • | - Alternation (union)
  • % - Concatenation
  • * - Kleene closure (0 or more)
  • + - Positive closure (1 or more)
  • () - Grouping
  • \ - Escape character

Special Tokens

  • \L - Lambda (epsilon/empty string)
  • a-z - Character range
  • \( - Escaped parenthesis

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors