A command-line C lexical analyzer that tokenizes source code and detects syntax errors.
This lexical analyzer scans C source files and classifies each token as a keyword, identifier, operator, literal, or symbol. It validates numeric literal formats (decimal, hexadecimal, octal, binary), detects mismatched brackets and unclosed quotes, and reports each error with its line number.
- Tokenizes C source code into categories (keywords, identifiers, literals, operators, symbols)
- Supports multiple numeric formats (hex, octal, binary, float)
- Validates bracket matching: (), {}, []
- Detects unclosed strings and character literals
- Handles single-line and multi-line comments
- Reports errors with line numbers
- Ignores preprocessor directives
Clone the repository:
git clone https://github.com/ArjunVasavan/Lexical_analyzer.git
cd Lexical_analyzer
Compile:
gcc *.c -o lexer
Run:
./lexer test.c
Token types:
- Keywords: C reserved words (int, float, if, while, etc.)
- Identifiers: Variable and function names
- Numeric Literals: Integers, floats, hex (0x), octal (0), binary (0b)
- String Literals: Text in double quotes
- Character Literals: Characters in single quotes
- Operators: Arithmetic, logical, bitwise operators
- Symbols: Special characters (semicolons, commas, etc.)
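As an illustration of how a scanned word might be sorted into the Keywords or Identifiers category, here is a minimal sketch. The names `WordKind`, `KEYWORDS`, and `classify_word` are hypothetical and are not taken from this repository, and the keyword table is deliberately a small subset.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical categories for this sketch (not the project's own types). */
typedef enum { TOK_KEYWORD, TOK_IDENTIFIER, TOK_INVALID } WordKind;

/* A small subset of C keywords; a full table would list all of them. */
static const char *const KEYWORDS[] = {
    "int", "float", "char", "if", "else", "while", "for", "return", "void"
};

/* Classify a complete word as a keyword, an identifier, or invalid. */
WordKind classify_word(const char *word)
{
    assert(word != NULL);

    /* The first character must be a letter or underscore, so words
       that start with a digit (or are empty) are rejected here. */
    if (!(isalpha((unsigned char)word[0]) || word[0] == '_'))
        return TOK_INVALID;

    /* Remaining characters may be letters, digits, or underscores. */
    for (size_t i = 1; word[i] != '\0'; i++) {
        if (!(isalnum((unsigned char)word[i]) || word[i] == '_'))
            return TOK_INVALID;
    }

    /* A well-formed word is a keyword only if it appears in the table. */
    for (size_t i = 0; i < sizeof KEYWORDS / sizeof KEYWORDS[0]; i++) {
        if (strcmp(word, KEYWORDS[i]) == 0)
            return TOK_KEYWORD;
    }
    return TOK_IDENTIFIER;
}
```

With this sketch, `classify_word("int")` yields `TOK_KEYWORD`, `classify_word("main")` yields `TOK_IDENTIFIER`, and `classify_word("2abc")` yields `TOK_INVALID`.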
Errors detected:
- Invalid numeric formats
- Unclosed strings or character literals
- Mismatched brackets/parentheses
- Identifiers starting with digits
- Brackets or quotes whose closing match is not on the same line
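To make the "invalid numeric formats" check more concrete, the following sketch validates integer literals by prefix: `0x` for hex, `0b` for binary, a leading `0` for octal, and plain decimal otherwise. The function names `all_digits` and `is_valid_integer_literal` are hypothetical, and the sketch assumes no suffixes (such as `u` or `L`) and no floating-point literals, so the project's actual rules may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* True if s is non-empty and every character is a valid digit in the
   given base (2, 8, or 16; any other value is treated as decimal). */
static bool all_digits(const char *s, int base)
{
    if (*s == '\0')
        return false;               /* a bare prefix such as "0x" is invalid */
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        bool ok;
        switch (base) {
        case 2:  ok = (c == '0' || c == '1'); break;
        case 8:  ok = (c >= '0' && c <= '7'); break;
        case 16: ok = isxdigit(c) != 0;       break;
        default: ok = isdigit(c) != 0;        break;
        }
        if (!ok)
            return false;
    }
    return true;
}

/* Validate an integer literal according to its prefix. */
bool is_valid_integer_literal(const char *lit)
{
    assert(lit != NULL);

    if (lit[0] == '0' && (lit[1] == 'x' || lit[1] == 'X'))
        return all_digits(lit + 2, 16);     /* hexadecimal */
    if (lit[0] == '0' && (lit[1] == 'b' || lit[1] == 'B'))
        return all_digits(lit + 2, 2);      /* binary */
    if (lit[0] == '0' && lit[1] != '\0')
        return all_digits(lit + 1, 8);      /* octal */
    return all_digits(lit, 10);             /* decimal, including "0" */
}
```

Under these assumptions, `0x1A`, `0b101`, `017`, and `123` are accepted, while `0x`, `0b102`, `089`, and `12a` are rejected.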
The analyzer reads the input file character by character, building tokens based on patterns. It tracks bracket counts and line numbers to detect mismatches, validates numeric literals against their format rules, and reports errors immediately when found.
Input (test.c):
int main() {
    int x = 0x1A;
    printf("Hello");
    return 0;
}
Output:
Keyword : int
Identifier : main
Symbols : (
Symbols : )
Symbols : {
...
Author: Arjun Vasavan
This project is licensed under the MIT License.
© 2025 Arjun Vasavan