Lexical Analyzer

The main job of a compiler is to translate high-level code, i.e., code written in a programming language, into a format that a computer can understand: binary code. Put simply, a compiler can be split into 3 parts:

Lexical Analyzer (LA)
Syntax Analyzer (SA)
Semantic Analyzer (SMA)

The Lexical Analyzer is responsible for separating the source code into lexemes, which are the words that compose the code. After separating all lexemes, the LA assigns each one a token class. Keywords, Special Symbols, Identifiers and Operators are examples of tokens. The Lexical Analyzer also removes white space and comments from the source code. The output of this process is a table containing the lexemes and their token classification. The LA also catches lexical errors, i.e., invalid lexemes such as '12variableName' or 'na;;me'.
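
The process described above can be sketched in a few lines of Java. This is a minimal illustration, not the project's actual code: the class name `LexemeTableSketch`, the `Row` record, and the identifier rule (a letter or underscore followed by letters, digits, or underscores) are assumptions made for the example, and only a handful of token classes are covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class LexemeTableSketch {
    // Keyword list taken from this README; everything else here is illustrative.
    private static final Set<String> KEYWORDS = Set.of(
            "for", "while", "do", "if", "else", "print", "switch", "case", "default", "null");

    /** One row of the output table: a lexeme and its token class. */
    record Row(String lexeme, String token) {}

    /** Classifies a single lexeme into a small subset of the token classes. */
    static String classify(String lexeme) {
        if (KEYWORDS.contains(lexeme)) return "KEYWORD";
        // Assumed identifier rule: must not start with a digit.
        if (lexeme.matches("[A-Za-z_][A-Za-z0-9_]*")) return "IDENTIFIER";
        if (lexeme.matches("[0-9]+")) return "INTEGER";
        if (lexeme.equals("=")) return "ASSIGN_OP";
        if (lexeme.equals(";")) return "SEMICOLON";
        return "INVALID";
    }

    /** Splits the source on white space and builds the lexeme/token table. */
    static List<Row> analyze(String source) {
        List<Row> table = new ArrayList<>();
        for (String lexeme : source.trim().split("\\s+")) {
            if (!lexeme.isEmpty()) {
                table.add(new Row(lexeme, classify(lexeme)));
            }
        }
        return table;
    }

    public static void main(String[] args) {
        // Prints one "lexeme<TAB>token" line per lexeme.
        for (Row row : analyze("while count = 12variableName ;")) {
            System.out.println(row.lexeme() + "\t" + row.token());
        }
    }
}
```

In this sketch '12variableName' and 'na;;me' both fall through every rule and end up as INVALID, which mirrors the lexical errors mentioned above.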

This project is a simple Lexical Analyzer implemented in Java. It provides a GUI where the user can type code and get its tokens. It is also possible to load the code from a file and analyze it.

Recognized Tokens

The Lexical Analyzer of this project recognizes the following classes of tokens:

IDENTIFIER - Variable names;
STRING - Words between double quotes "";
INTEGER - Numbers without a dot ( . );
FLOAT - Floating-point numbers;
PLUS - ( + );
MINUS - ( - );
TIMES - ( * );
DIVIDE - ( / );
KEYWORD - for, while, do, if, else, print, switch, case, default and null;
INVALID;
ASSIGN_OP - Assignment operator ( = );
SEMICOLON - ( ; );
LEFT_PARENTHESIS - '(';
RIGHT_PARENTHESIS - ')';
LEFT_BRACE - ( { );
RIGHT_BRACE - ( } );
COMMA - ( , );
DOT - ( . );
DOTDOT - ( .. );
COLON - ( : );
EQUAL - ( == );
LOWER_OR_EQUALS - ( <= );
GREATER_OR_EQUALS - ( >= );
NOT_EQUALS - ( <> );
GREATER_THAN - ( > );
LOWER_THAN - ( < );
AT_SIGN - ( @ ).
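
Because every symbol token in the list corresponds to exactly one fixed lexeme, these classes can be recognized with a plain lookup table. The sketch below assumes hypothetical names (`SymbolTokensSketch`, `TokenType`, `classifySymbol`) and covers only the symbol classes; it is one possible way to organize them, not necessarily how this project does it.

```java
import java.util.Map;

final class SymbolTokensSketch {
    /** Symbol token classes copied from the list above, plus INVALID as a fallback. */
    enum TokenType {
        PLUS, MINUS, TIMES, DIVIDE, ASSIGN_OP, SEMICOLON,
        LEFT_PARENTHESIS, RIGHT_PARENTHESIS, LEFT_BRACE, RIGHT_BRACE,
        COMMA, DOT, DOTDOT, COLON, EQUAL, LOWER_OR_EQUALS, GREATER_OR_EQUALS,
        NOT_EQUALS, GREATER_THAN, LOWER_THAN, AT_SIGN, INVALID
    }

    // Fixed lexeme -> token class table (hypothetical layout).
    private static final Map<String, TokenType> SYMBOLS = Map.ofEntries(
            Map.entry("+", TokenType.PLUS),
            Map.entry("-", TokenType.MINUS),
            Map.entry("*", TokenType.TIMES),
            Map.entry("/", TokenType.DIVIDE),
            Map.entry("=", TokenType.ASSIGN_OP),
            Map.entry(";", TokenType.SEMICOLON),
            Map.entry("(", TokenType.LEFT_PARENTHESIS),
            Map.entry(")", TokenType.RIGHT_PARENTHESIS),
            Map.entry("{", TokenType.LEFT_BRACE),
            Map.entry("}", TokenType.RIGHT_BRACE),
            Map.entry(",", TokenType.COMMA),
            Map.entry(".", TokenType.DOT),
            Map.entry("..", TokenType.DOTDOT),
            Map.entry(":", TokenType.COLON),
            Map.entry("==", TokenType.EQUAL),
            Map.entry("<=", TokenType.LOWER_OR_EQUALS),
            Map.entry(">=", TokenType.GREATER_OR_EQUALS),
            Map.entry("<>", TokenType.NOT_EQUALS),
            Map.entry(">", TokenType.GREATER_THAN),
            Map.entry("<", TokenType.LOWER_THAN),
            Map.entry("@", TokenType.AT_SIGN));

    /** Returns the token class of a symbol lexeme, or INVALID if it is not in the table. */
    static TokenType classifySymbol(String lexeme) {
        return SYMBOLS.getOrDefault(lexeme, TokenType.INVALID);
    }

    public static void main(String[] args) {
        for (String lexeme : new String[] {"=", "==", "<>", ".", ".."}) {
            System.out.println(lexeme + "\t" + classifySymbol(lexeme));
        }
    }
}
```

An exact-match lookup is enough here only because lexemes are already separated by white space, so ( . ) and ( .. ), or ( = ) and ( == ), never need to be told apart character by character.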

P.S. 1: Text following // and blocks of text between /* and */ are treated as comments and do not appear in the output.
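
A comment-removal pass along these lines could look like the sketch below. It assumes that line comments start with // and that block comments are delimited by /* and */; the class and method names are hypothetical, and the project may handle comments differently (for example inside the automaton itself).

```java
final class CommentStripperSketch {
    /** Returns the source with line comments and block comments removed. */
    static String stripComments(String source) {
        StringBuilder out = new StringBuilder();
        int n = source.length();
        int i = 0;
        while (i < n) {
            if (source.startsWith("//", i)) {
                // Line comment: skip up to, but not including, the line break.
                while (i < n && source.charAt(i) != '\n') {
                    i++;
                }
            } else if (source.startsWith("/*", i)) {
                // Block comment: skip past the closing delimiter,
                // or to the end of the input if it is never closed.
                int end = source.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;
                out.append(' '); // keep the neighbouring lexemes separated
            } else {
                out.append(source.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripComments("a = 1 ; // set a\nb /* note */ = 2 ;"));
    }
}
```

This sketch does not treat STRING lexemes specially, so a // inside double quotes would also be cut; a fuller version would have to track whether the scanner is currently inside a string.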

P.S. 2: Lexemes must be separated by at least one white space (' ') to be recognized as separate lexemes.

Screenshot

Conclusion

This is a very simple example of how a Lexical Analyzer can be implemented. The project also serves as a usage example of Finite-State Automata, a powerful tool for recognizing patterns in text.
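
To make the Finite-State Automata idea concrete, here is a small sketch of an automaton that tells INTEGER, FLOAT, and IDENTIFIER lexemes apart. The states, the transition rules, and the names (`LexemeAutomatonSketch`, `State`, `step`) are assumptions chosen for illustration and are not taken from the project's source.

```java
final class LexemeAutomatonSketch {
    /** Automaton states; INTEGER, FRACTION and IDENTIFIER are the accepting ones. */
    enum State { START, INTEGER, DOT, FRACTION, IDENTIFIER, REJECT }

    /** Transition function: the next state for the current state and input character. */
    static State step(State state, char c) {
        boolean digit = c >= '0' && c <= '9';
        boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
        switch (state) {
            case START:
                return digit ? State.INTEGER : letter ? State.IDENTIFIER : State.REJECT;
            case INTEGER:
                return digit ? State.INTEGER : c == '.' ? State.DOT : State.REJECT;
            case DOT:
                // A dot must be followed by at least one digit.
                return digit ? State.FRACTION : State.REJECT;
            case FRACTION:
                return digit ? State.FRACTION : State.REJECT;
            case IDENTIFIER:
                return (digit || letter) ? State.IDENTIFIER : State.REJECT;
            default:
                // REJECT is a trap state: no input leaves it.
                return State.REJECT;
        }
    }

    /** Runs the automaton over the whole lexeme and maps the final state to a token class. */
    static String classify(String lexeme) {
        State state = State.START;
        for (int i = 0; i < lexeme.length(); i++) {
            state = step(state, lexeme.charAt(i));
        }
        switch (state) {
            case INTEGER:
                return "INTEGER";
            case FRACTION:
                return "FLOAT";
            case IDENTIFIER:
                return "IDENTIFIER";
            default:
                return "INVALID";
        }
    }

    public static void main(String[] args) {
        for (String lexeme : new String[] {"42", "3.14", "count1", "12variableName", "3."}) {
            System.out.println(lexeme + "\t" + classify(lexeme));
        }
    }
}
```

Keywords are not distinguished in this sketch and would be reported as IDENTIFIER; a common approach is to check accepted identifiers against the keyword list afterwards.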
