Interpreter for the Declaration Section of a C program
We’ll use 2 language tools: Lex and Yacc (or flex and bison) to accomplish this.
The LEX file
The lex file has three main parts, separated by double percent signs (%%). The first part is a list of header files and function definitions enclosed between %{ and %}. The second part lists the acceptable tokens, and the third holds user-defined C functions.
y.tab.h is generated by Yacc and it defines the list of tokens that we mention here.
In the next section, we start by defining datatype tokens. Although struct can be considered a datatype token, we treat it as a separate STRUCT token because of its unique syntax. Then we define the single-character tokens, followed by character, integer and floating point values. From here, the patterns get more generic: Array_identifiers, identifiers and strings. When a linefeed character (\n in this case) is encountered, yylineno is incremented; yylineno keeps track of the line number in the input C file. We just ignore any other token.
Here is how this works. The text of the token being matched is stored in a variable called yytext. The input is compared against the list of acceptable token patterns: the rule that matches the longest stretch of input is chosen, and when several rules match the same length, the one listed first wins. The C code on the right-hand side of that rule is then executed. Say we encounter the token “int”: it is stored in yytext and matched against the token list. Since it appears right at the beginning of the list, the corresponding C code in curly braces is executed. Here, “int” is stored in yylval.dataType. yylval is required to pass the values of tokens recognized by the lexer to the parser, so this variable is the bridge between the two. You only need to pass values through yylval if you plan to use the value in the parser; for other tokens, you don't need to worry about it. And this “dataType” could have been anything else, like “string_value” or “blah_blah”. I used dataType because it just makes sense to me.
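To make the matching idea concrete, here is a small hand-written imitation in plain C. It is not what lex generates (the real scanner is a table-driven automaton), and it only mimics two things from the description above: the keyword rules are tried before the generic identifier rule, and the matched text is handed over through yylval. The token codes, the identifier member of the union and the next_token function are assumptions made up for this sketch.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes; in the real project they come from y.tab.h. */
enum { DATA_TYPE = 258, IDENTIFIER = 259 };

/* Stand-in for the union that yacc builds from %union. */
union { char *dataType; char *identifier; } yylval;

char yytext[64];   /* text of the token just matched */

/* Keyword rules come first, just as they are listed first in the lex file. */
const char *datatype_rules[] = { "int", "char", "float", "double" };

/* Read one token from *input, classify it and advance *input.
   Returns 0 at the end of the input. */
int next_token(const char **input)
{
    const char *p = *input;
    size_t n = 0;

    while (isspace((unsigned char)*p))      /* skip blanks */
        p++;
    if (*p == '\0') {
        *input = p;
        return 0;
    }

    /* Collect a run of letters, digits and underscores. */
    while ((isalnum((unsigned char)*p) || *p == '_') && n < sizeof yytext - 1)
        yytext[n++] = *p++;
    yytext[n] = '\0';

    if (n == 0) {                           /* single-character token such as ';' */
        yytext[0] = *p++;
        yytext[1] = '\0';
        *input = p;
        return yytext[0];
    }
    *input = p;

    /* Try the datatype rules before falling back to the identifier rule. */
    for (size_t i = 0; i < sizeof datatype_rules / sizeof *datatype_rules; i++) {
        if (strcmp(yytext, datatype_rules[i]) == 0) {
            yylval.dataType = yytext;       /* value the parser will read */
            return DATA_TYPE;
        }
    }
    yylval.identifier = yytext;
    return IDENTIFIER;
}
```

Calling next_token repeatedly on "int count;" would yield DATA_TYPE, then IDENTIFIER, then the character ';', then 0. Note that yylval points into yytext here, so the value is only valid until the next call; a real scanner action would usually copy the text first.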
In the third and final section, we have implemented a few functions. yywrap is called when the end of the input file is reached. It returns 1 to signal the end of input, which is right for our case since we process one file at a time. yyerror is invoked when an invalid token or sequence is encountered. It takes the error string as input, and we display the line number and error message just like you would see in a compiler. I also defined an InvalidToken function to catch stray tokens in case they don't invoke yyerror. This may not be the most efficient way to define tokens or write functions, but it's not half bad.
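For reference, these three functions could look roughly like the sketch below. The message format, the exact signatures and the format_error helper are my assumptions for illustration, not a copy of the project code. yylineno is defined here only so the snippet stands on its own; in the real lex file the scanner provides it.

```c
#include <stdio.h>
#include <stdlib.h>

int yylineno = 1;   /* normally provided by the generated scanner */

/* Called by the scanner at end of input; 1 means "no more files". */
int yywrap(void)
{
    return 1;
}

/* Hypothetical helper: builds the compiler-style message in buf. */
int format_error(char *buf, size_t size, int line, const char *msg)
{
    return snprintf(buf, size, "Error on line %d: %s", line, msg);
}

/* Called by the parser on a syntax error: report it and stop. */
void yyerror(const char *msg)
{
    char buf[256];

    format_error(buf, sizeof buf, yylineno, msg);
    fprintf(stderr, "%s\n", buf);
    exit(EXIT_FAILURE);
}

/* Meant to be called from a catch-all scanner rule for text
   that none of the other rules accept. */
void InvalidToken(const char *text)
{
    char buf[256];

    snprintf(buf, sizeof buf, "invalid token '%s'", text);
    yyerror(buf);
}
```

Because yyerror exits, the first problem found ends the run, which fits the stop-at-first-error behaviour of this program.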
The YACC file
Like the lex file, this also consists of three sections. The first part is a list of header files and function definitions enclosed between %{ and %}. After that, we declare a number of variables and functions. If “extern” precedes a declaration, that variable or function is actually defined externally, that is, not in this file. Between the first and second sections, we define the nature of the error message generated by YACC.
%union allows us to define the members of yylval. yylval is actually of type “union”.
%token is used to define the tokens passed from the lex file. If the value of the token is passed, be sure to indicate the type in angle brackets.
%type is used to define the symbols used below that are not tokens passed from the lex file. They are a combination of those tokens.
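Why does the type in angle brackets matter? A C union stores only one member at a time and keeps no record of which one, so the parser has to be told which member belongs to which token. The sketch below shows roughly the kind of union that %union turns into, and how the token code decides which member to read. Only dataType is a name from this project; the other members, the token codes and describe_token are hypothetical.

```c
#include <stdio.h>

/* Roughly what yacc emits for a %union with these three members.
   Only dataType is named in the text; the others are made up here. */
typedef union {
    char  *dataType;     /* e.g. "int", set by the scanner for type keywords */
    int    intValue;     /* hypothetical: value of an integer literal */
    double floatValue;   /* hypothetical: value of a floating point literal */
} YYSTYPE;

YYSTYPE yylval;

/* Hypothetical token codes standing in for those in y.tab.h. */
enum { DATA_TYPE = 258, INT_VALUE = 259, FLOAT_VALUE = 260 };

/* The union has no tag, so the token code decides which member is valid.
   A declaration such as %token <intValue> INT_VALUE records that link. */
int describe_token(int token, char *out, size_t size)
{
    switch (token) {
    case DATA_TYPE:
        return snprintf(out, size, "type %s", yylval.dataType);
    case INT_VALUE:
        return snprintf(out, size, "int %d", yylval.intValue);
    case FLOAT_VALUE:
        return snprintf(out, size, "float %.2f", yylval.floatValue);
    default:
        return snprintf(out, size, "token %d", token);
    }
}
```

Reading a member other than the one the scanner wrote would give garbage, which is why each valued token needs its type declared.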
Everything mentioned here was between the first and second sections. Now let's take a look at the second section, which is the meat of this file, if not the entire application. It is here that we define the grammar for our language, which is C in our case.
Since we are dealing with declaration statements, it makes sense to make “Declaration” the root of the tree. Every statement in our input C files must be a declaration statement of one of the forms listed. If a statement does not conform to any of these forms, yyerror is invoked, which stops execution. Since it stops checking for errors after the first one is encountered, I guess it's more like an interpreter than a compiler.
In the third section, we have a main function, which is the starting point of execution. Here, we call yyparse(), which initiates all the tokenizing and parsing discussed until now. It's an amazing function. Since the program stops on encountering an error, yyparse will only return if no errors are found. So it's safe to say that if the program reaches this point, there are no errors in the input C file.
Data_Type : character array that holds the datatype for the current declaration statement.
noOfIdentifiers : number of identifiers in the input file.
clearBuffers() : clears the stored datatype.
storeDataType() : stores the datatype of the current declaration statement.
retrieveDataType() : returns the stored datatype; created to make things look uniform.
isDuplicate() : checks if the newly encountered identifier has already been declared before.
extractIdentifier() : extracts the name of the array.
storeIdentifier() : adds the encountered identifier to the list of identifiers.
AssignmentError() : called in case an invalid assignment is made.
DuplicateIdentifierError() : called if the isDuplicate() function returns true in the yacc file.
isValidAssignment() : checks if the datatype passed in from the later part of the yacc file is the same as the current datatype of the identifier.
itoa, ftoa and ctoa : convert integers, floating point numbers and characters into ASCII strings.
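To give an idea of how little is needed behind these helpers, here is a minimal sketch of the bookkeeping ones. The function and variable names come from the list above, but the bodies, the signatures, the fixed table sizes and the identifiers array are assumptions for illustration; the project's real code may differ.

```c
#include <string.h>

#define MAX_IDENTIFIERS 100   /* assumed capacity */
#define MAX_NAME_LEN     32   /* assumed maximum name length */

char Data_Type[MAX_NAME_LEN];                       /* type of the current declaration */
char identifiers[MAX_IDENTIFIERS][MAX_NAME_LEN];    /* hypothetical: names seen so far */
int  noOfIdentifiers = 0;

void clearBuffers(void) { Data_Type[0] = '\0'; }

void storeDataType(const char *type)
{
    strncpy(Data_Type, type, MAX_NAME_LEN - 1);
    Data_Type[MAX_NAME_LEN - 1] = '\0';
}

const char *retrieveDataType(void) { return Data_Type; }

/* Returns 1 if name was declared earlier, 0 otherwise. */
int isDuplicate(const char *name)
{
    for (int i = 0; i < noOfIdentifiers; i++)
        if (strcmp(identifiers[i], name) == 0)
            return 1;
    return 0;
}

/* Appends name to the list; returns 0 when the table is full. */
int storeIdentifier(const char *name)
{
    if (noOfIdentifiers >= MAX_IDENTIFIERS)
        return 0;
    strncpy(identifiers[noOfIdentifiers], name, MAX_NAME_LEN - 1);
    identifiers[noOfIdentifiers][MAX_NAME_LEN - 1] = '\0';
    noOfIdentifiers++;
    return 1;
}

/* Copies the array name out of text such as "marks[10]",
   that is, everything before the first '['. */
void extractIdentifier(const char *arrayText, char *out, size_t size)
{
    size_t len;

    if (size == 0)
        return;
    len = strcspn(arrayText, "[");
    if (len >= size)
        len = size - 1;
    memcpy(out, arrayText, len);
    out[len] = '\0';
}

/* Returns 1 when the type of the assigned value matches the declared type. */
int isValidAssignment(const char *valueType)
{
    return strcmp(Data_Type, valueType) == 0;
}
```

In the grammar actions, the idea would be to call storeDataType() when the datatype token is seen, then isDuplicate() followed by storeIdentifier() for each identifier, and clearBuffers() once the statement ends.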