Parser is a recursive descent parser for a C-like programming language. It implements a lexical analysis and syntax analysis pipeline that generates Abstract Syntax Trees (ASTs) for use in code generation, interpretation, or static analysis.
- Complete C-subset Support: Variables, functions, control flow, expressions
- Recursive Descent Parsing: Predictable, maintainable parsing strategy
- Comprehensive AST: Full tree representation with line/column tracking
- Symbol Table Management: Type checking and scope management
- Error Recovery: Synchronization points for continued parsing after errors
- Standard Library Integration: Built-in support for printf, scanf, malloc, free
- Operator Precedence: Correct handling of arithmetic and logical operators
- Type System: Support for int, float, char, void, and pointer types
- Version: 3.0
- Status: Stable
- Author: ALI RAZA
- Repository: https://github.com/0xAli-Raza/AdvancedParser
flowchart TD
A["Token Stream (from Lexer)"] --> B["Parser (Recursive Descent)"]
B --> C["AST Nodes (Program Tree)"]
C --> D["Symbol Table (Type Info)"]
- Token Layer: Lexical tokens with metadata (line, column)
- Parser Layer: Syntax analysis and AST construction
- AST Layer: Tree representation of program structure
- Symbol Table Layer: Semantic information and type tracking
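
The examples in this README pass hand-written token tuples to the parser. As a rough illustration of the Token Layer, the sketch below shows how a small regex-based tokenizer could produce tuples in the same `(token_type, lexeme, line, column)` shape. The function name `tokenize`, the `TOKEN_SPEC` table, and the `FLOAT_LITERAL` category are assumptions made for this example and are not taken from the project's lexer.

```python
import re

# Hypothetical keyword set and token categories; the project's lexer may differ.
KEYWORDS = {"int", "float", "char", "void", "if", "else", "while", "for",
            "return", "break", "continue"}

TOKEN_SPEC = [
    ("FLOAT_LITERAL",   r"\d+\.\d+"),
    ("INTEGER_LITERAL", r"\d+"),
    ("IDENTIFIER",      r"[A-Za-z_]\w*"),
    ("OPERATOR",        r"==|!=|<=|>=|&&|\|\||[+\-*/]=|[+\-*/%<>=!]"),
    ("LPAREN",          r"\("),
    ("RPAREN",          r"\)"),
    ("LBRACE",          r"\{"),
    ("RBRACE",          r"\}"),
    ("COMMA",           r","),
    ("SEMICOLON",       r";"),
    ("NEWLINE",         r"\n"),
    ("SKIP",            r"[ \t]+"),
    ("MISMATCH",        r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Turn source text into (token_type, lexeme, line, column) tuples."""
    tokens = []
    line, line_start = 1, 0
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        col = match.start() - line_start  # zero-based column within the line
        if kind == "NEWLINE":
            line += 1
            line_start = match.end()
            continue
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError(f"Unexpected character {lexeme!r} at {line}:{col}")
        if kind == "IDENTIFIER" and lexeme in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, lexeme, line, col))
    # Close the stream with an EOF marker, as the README examples do.
    tokens.append(("EOF", "", line, len(source) - line_start))
    return tokens


if __name__ == "__main__":
    for token in tokenize("int x = 10;"):
        print(token)
```

For the input `int x = 10;` this sketch yields the same six tuples as the variable declaration example in this README.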
Python 3.7+
colorama>=0.4.0

# Clone the repository
git clone https://github.com/0xAli-Raza/AdvancedParser.git
cd AdvancedParser
# Install dependencies
pip install colorama
# Run the parser
python parser.py

from parser import Parser, print_ast
# Define your tokens (token_type, lexeme, line, column)
tokens = [
('KEYWORD', 'int', 1, 0),
('IDENTIFIER', 'main', 1, 4),
('LPAREN', '(', 1, 8),
('RPAREN', ')', 1, 9),
('LBRACE', '{', 1, 11),
('KEYWORD', 'return', 2, 4),
('INTEGER_LITERAL', '0', 2, 11),
('SEMICOLON', ';', 2, 12),
('RBRACE', '}', 3, 0),
('EOF', '', 3, 1)
]
# Create parser and parse
parser = Parser(tokens)
ast = parser.parse()
# Print the AST
if not parser.errors:
    print_ast(ast)
else:
    for error in parser.errors:
        print(error)

Represents individual lexical tokens with position information.
class Token:
    def __init__(self, token_type, lexeme, line, col):
        self.type = token_type
        self.lexeme = lexeme
        self.line = line
        self.col = col

Hierarchical node classes representing different language constructs:
- Program: Root node containing all statements
- FunctionDeclaration: Function definitions
- VarDeclaration: Variable declarations
- IfStatement, WhileStatement, ForStatement: Control structures
- BinaryOp, ComparisonOp, LogicalOp: Expressions
- Assignment, CompoundAssignment: Assignment operations
- FunctionCall, ReturnStatement: Function operations
Parser: Recursive descent parser implementing a subset of the C grammar.
SymbolTable: Manages declared symbols with type information and scope.
The parser implements a subset of C grammar including:
- Declarations: Variable and function declarations
- Statements: Assignment, compound assignment, expression statements
- Control Flow: if-else, while, for loops
- Loop Control: break, continue
- Functions: Declaration, calls, return statements
- Expressions: Binary operations, comparisons, logical operations
- Literals: Integers, floats, strings
- int: Integer type
- float: Floating-point type
- char: Character type
- void: Void type (for functions)
Arithmetic: +, -, *, /, %
Comparison: ==, !=, <, >, <=, >=
Logical: &&, ||, !
Assignment: =, +=, -=, *=, /=
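
To make the precedence rules concrete, here is a minimal sketch of precedence climbing, a compact technique often used inside recursive descent parsers for binary expressions. It is not the project's implementation: the `PRECEDENCE` table follows conventional C ordering as an assumption, the `ExprParser` class is hypothetical, and the result is a nested tuple rather than the project's AST node classes.

```python
# Binding strength per binary operator; a higher number binds tighter.
# The ordering follows conventional C precedence and is an assumption here.
PRECEDENCE = {
    "||": 1,
    "&&": 2,
    "==": 3, "!=": 3,
    "<": 4, ">": 4, "<=": 4, ">=": 4,
    "+": 5, "-": 5,
    "*": 6, "/": 6, "%": 6,
}


class ExprParser:
    """Parses a flat list of lexemes into nested (operator, left, right) tuples."""

    def __init__(self, lexemes):
        self.lexemes = lexemes
        self.pos = 0

    def peek(self):
        return self.lexemes[self.pos] if self.pos < len(self.lexemes) else None

    def advance(self):
        lexeme = self.peek()
        self.pos += 1
        return lexeme

    def parse_primary(self):
        lexeme = self.advance()
        if lexeme is None:
            raise SyntaxError("Unexpected end of expression")
        if lexeme == "(":
            node = self.parse_expression()
            if self.advance() != ")":
                raise SyntaxError("Expected ')'")
            return node
        if lexeme in ("!", "-"):
            # Prefix operators bind tighter than any binary operator.
            return (lexeme, self.parse_primary())
        return lexeme  # identifier or literal

    def parse_expression(self, min_prec=1):
        left = self.parse_primary()
        while True:
            op = self.peek()
            prec = PRECEDENCE.get(op)
            if prec is None or prec < min_prec:
                return left
            self.advance()
            # Requiring prec + 1 on the right side makes operators left-associative.
            right = self.parse_expression(prec + 1)
            left = (op, left, right)


if __name__ == "__main__":
    print(ExprParser(["a", "+", "b", "*", "c"]).parse_expression())
    # prints ('+', 'a', ('*', 'b', 'c'))
```

Assignment operators are left out of the table because they are right-associative and are usually handled at the statement level.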
Conditional:
if (condition) {
    // statements
} else {
    // statements
}
Loops:
while (condition) {
    // statements
}
for (init; condition; update) {
    // statements
}
Loop Control:
break; // Exit loop
continue; // Skip to next iteration
Declaration:
int functionName(int param1, float param2) {
    // statements
    return value;
}
Function Calls:
result = functionName(arg1, arg2);

tokens = [
('KEYWORD', 'int', 1, 0),
('IDENTIFIER', 'x', 1, 4),
('OPERATOR', '=', 1, 6),
('INTEGER_LITERAL', '10', 1, 8),
('SEMICOLON', ';', 1, 10),
('EOF', '', 1, 11)
]
parser = Parser(tokens)
ast = parser.parse()

tokens = [
('KEYWORD', 'int', 1, 0),
('IDENTIFIER', 'max', 1, 4),
('LPAREN', '(', 1, 7),
('KEYWORD', 'int', 1, 8),
('IDENTIFIER', 'a', 1, 12),
('COMMA', ',', 1, 13),
('KEYWORD', 'int', 1, 15),
('IDENTIFIER', 'b', 1, 19),
('RPAREN', ')', 1, 20),
('LBRACE', '{', 1, 22),
('KEYWORD', 'if', 2, 4),
('LPAREN', '(', 2, 7),
('IDENTIFIER', 'a', 2, 8),
('OPERATOR', '>', 2, 10),
('IDENTIFIER', 'b', 2, 12),
('RPAREN', ')', 2, 13),
('LBRACE', '{', 2, 15),
('KEYWORD', 'return', 3, 8),
('IDENTIFIER', 'a', 3, 15),
('SEMICOLON', ';', 3, 16),
('RBRACE', '}', 4, 4),
('KEYWORD', 'return', 5, 4),
('IDENTIFIER', 'b', 5, 11),
('SEMICOLON', ';', 5, 12),
('RBRACE', '}', 6, 0),
('EOF', '', 6, 1)
]
parser = Parser(tokens)
ast = parser.parse()
print_ast(ast)

from parser import run_parser_tests
# Run all 15 built-in test cases
run_parser_tests()

Parser(tokens: List[Tuple])
Creates a new parser instance with the given token list.
Parameters:
- tokens: List of tuples in the format (token_type, lexeme, line, col)
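
Since the constructor takes plain tuples, a small validation step before parsing can catch malformed entries early. The sketch below assumes a hypothetical helper named `to_tokens` and re-declares a `Token` class with the same fields as the one described in this README. How the project converts tuples internally is not shown in the source.

```python
from typing import List, Tuple


class Token:
    """Same fields as the Token class described in this README."""

    def __init__(self, token_type, lexeme, line, col):
        self.type = token_type
        self.lexeme = lexeme
        self.line = line
        self.col = col


def to_tokens(raw: List[Tuple]) -> List[Token]:
    """Validate raw tuples and wrap them in Token objects (hypothetical helper)."""
    tokens = []
    for index, entry in enumerate(raw):
        if len(entry) != 4:
            raise ValueError(f"Token #{index} must have 4 fields, got {len(entry)}")
        token_type, lexeme, line, col = entry
        if not (isinstance(line, int) and isinstance(col, int)):
            raise TypeError(f"Token #{index} needs integer line and col values")
        tokens.append(Token(token_type, lexeme, line, col))
    return tokens


if __name__ == "__main__":
    checked = to_tokens([("KEYWORD", "int", 1, 0), ("EOF", "", 1, 3)])
    print([(t.type, t.lexeme, t.line, t.col) for t in checked])
```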
parse() -> Program
Main entry point for parsing. Returns the root AST node.
parser = Parser(tokens)
ast = parser.parse()

report_error(message: str)
Records a parsing error with line/column information.
- errors: List of error messages encountered during parsing
- symbol_table: Current symbol table instance
- current_token: Token currently being processed
- loop_depth: Current nesting level of loops (for break/continue validation)
- function_depth: Current nesting level of functions (for return validation)
All AST nodes inherit from ASTNode and include line and col attributes for position tracking.
class Program(ASTNode):
    def __init__(self, statements: List[ASTNode]): ...

Root node containing all top-level statements.

class FunctionDeclaration(ASTNode):
    def __init__(self, return_type: str, name: str,
                 parameters: List[Parameter], body: Block): ...

class VarDeclaration(ASTNode):
    def __init__(self, var_type: str, identifier: str,
                 value: ASTNode = None): ...

class BinaryOp(ASTNode):
    def __init__(self, operator: str, left: ASTNode, right: ASTNode): ...

class SymbolTable:
    def declare(self, name: str, var_type: str, value=None,
                is_function=False, params=None) -> Symbol: ...
    def lookup(self, name: str) -> Symbol: ...
    def exists(self, name: str) -> bool: ...

Built-in Functions:
- printf (int, variadic)
- scanf (int, variadic)
- malloc (void*)
- free (void)
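
The following sketch shows one way a scoped symbol table with the `declare`, `lookup`, and `exists` methods listed above could work. It is an illustration under assumptions, not the project's code: the class name `ScopedSymbolTable`, the `enter_scope` and `exit_scope` methods, the scope-stack design, and the choice to return `None` for unknown names are all hypothetical. The built-in entries reuse the types listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Symbol:
    name: str
    var_type: str
    value: object = None
    is_function: bool = False
    params: List[str] = field(default_factory=list)


class ScopedSymbolTable:
    """Stack of dictionaries; the last entry is the innermost scope."""

    # Built-in functions and the types listed for them in this README.
    BUILTINS = {"printf": "int", "scanf": "int", "malloc": "void*", "free": "void"}

    def __init__(self):
        self.scopes: List[Dict[str, Symbol]] = [{}]
        for name, return_type in self.BUILTINS.items():
            self.declare(name, return_type, is_function=True)

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        if len(self.scopes) == 1:
            raise RuntimeError("Cannot exit the global scope")
        self.scopes.pop()

    def declare(self, name, var_type, value=None, is_function=False, params=None):
        current = self.scopes[-1]
        if name in current:
            raise KeyError(f"'{name}' is already declared in this scope")
        symbol = Symbol(name, var_type, value, is_function, list(params or []))
        current[name] = symbol
        return symbol

    def lookup(self, name) -> Optional[Symbol]:
        # Search from the innermost scope outward so inner names shadow outer ones.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

    def exists(self, name) -> bool:
        return self.lookup(name) is not None


if __name__ == "__main__":
    table = ScopedSymbolTable()
    table.declare("x", "int", value=10)
    table.enter_scope()
    table.declare("x", "float")          # shadows the outer x
    print(table.lookup("x").var_type)    # prints float
    table.exit_scope()
    print(table.lookup("x").var_type)    # prints int
```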
def print_ast(node: ASTNode, indent: int = 0, prefix: str = "")
Pretty-prints the AST structure with indentation.
The parser implements panic-mode error recovery with intelligent synchronization:
- Synchronization Points:
  - Semicolons (;)
  - Block delimiters ({, })
  - Statement keywords (int, float, if, while, for, return)
- Error Limit: Parsing stops after 10 errors to prevent excessive output
- Context Preservation: Error recovery maintains parser state to continue parsing
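
The recovery strategy described above can be sketched as a function that skips tokens until it reaches one of the listed synchronization points. This is a simplified illustration, not the project's code: the names `synchronize`, `limit_reached`, `SYNC_KEYWORDS`, and `MAX_ERRORS` are hypothetical, and the decision to resume after a semicolon but at a brace or keyword is an assumption.

```python
# Statement keywords listed above as synchronization points.
SYNC_KEYWORDS = {"int", "float", "if", "while", "for", "return"}
MAX_ERRORS = 10  # error limit stated above


def synchronize(tokens, pos):
    """Return the index where parsing could resume after an error at pos.

    tokens is a list of (token_type, lexeme, line, col) tuples.
    """
    while pos < len(tokens):
        token_type, lexeme = tokens[pos][0], tokens[pos][1]
        if token_type == "EOF":
            return pos
        if token_type == "SEMICOLON":
            return pos + 1  # the statement ended; resume after the terminator
        if token_type in ("LBRACE", "RBRACE"):
            return pos      # let the block parser handle the delimiter
        if token_type == "KEYWORD" and lexeme in SYNC_KEYWORDS:
            return pos      # a new statement starts here
        pos += 1
    return pos


def limit_reached(errors):
    """True once enough errors have been recorded to stop parsing."""
    return len(errors) >= MAX_ERRORS


if __name__ == "__main__":
    stream = [
        ("IDENTIFIER", "x", 1, 0),
        ("IDENTIFIER", "y", 1, 2),   # unexpected token: the error is detected here
        ("SEMICOLON", ";", 1, 3),
        ("KEYWORD", "int", 2, 0),
        ("EOF", "", 2, 3),
    ]
    print(synchronize(stream, 1))  # prints 3, the index of the 'int' keyword
```

A caller using this approach has to make sure it consumes at least one token when the error occurs at a synchronization point itself, otherwise recovery can loop without making progress.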
Errors include precise location information:
Syntax Error at 5:12 - Expected SEMICOLON, got IDENTIFIER:'x'
- Break/Continue: Must be inside a loop
- Return: Must be inside a function
- Empty Identifiers: Detected and reported
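
The break/continue and return checks can be implemented with simple depth counters, in the spirit of the `loop_depth` and `function_depth` attributes listed in the API reference. The sketch below is a hypothetical stand-alone class, `ContextChecker`. The method names and message wording are assumptions, and the message prefix is borrowed from the error format shown above.

```python
class ContextChecker:
    """Tracks nesting depth so misplaced statements can be reported."""

    def __init__(self):
        self.loop_depth = 0
        self.function_depth = 0
        self.errors = []

    def report_error(self, message, line, col):
        self.errors.append(f"Syntax Error at {line}:{col} - {message}")

    # The parser would call these when it starts or finishes a loop or function body.
    def enter_loop(self):
        self.loop_depth += 1

    def exit_loop(self):
        self.loop_depth -= 1

    def enter_function(self):
        self.function_depth += 1

    def exit_function(self):
        self.function_depth -= 1

    def check_loop_control(self, keyword, line, col):
        """Validate a 'break' or 'continue' statement."""
        if self.loop_depth == 0:
            self.report_error(f"'{keyword}' used outside of a loop", line, col)

    def check_return(self, line, col):
        """Validate a 'return' statement."""
        if self.function_depth == 0:
            self.report_error("'return' used outside of a function", line, col)


if __name__ == "__main__":
    checker = ContextChecker()
    checker.check_loop_control("break", 1, 0)   # reported: not inside a loop
    checker.enter_loop()
    checker.check_loop_control("break", 2, 4)   # accepted
    checker.exit_loop()
    print(checker.errors)
```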
The parser includes 15 comprehensive test cases:
- Simple Main Function
- Multiple Variable Types
- Arithmetic Operations
- Variable Assignment
- Compound Assignment
- If Statement
- If-Else Statement
- While Loop
- For Loop
- Break Statement
- Continue Statement
- Function Declaration with Call
- Standard Library Function Call
- Logical Operators
- Nested Loops
python parser.py

Each test displays:
- Input C code
- Token stream
- Generated AST
- Parse status (success/failure)
- Summary statistics
test_case = {
    "name": "Your Test Name",
    "code": """your C code here""",
    "tokens": [
        ('TOKEN_TYPE', 'lexeme', line, col),
        # ... more tokens
    ]
}

- GitHub: @0xAli-Raza
- Repository: AdvancedParser
- Last Update: 22 Jan, 2026
