Skip to content

SijiaHoe/Logic_expression_compiler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Logic Expression Compiler

Experiment Contents

​ The input is a logical expression containing a number, and the output is the boolean value after calculation and the number of comparisons skipped due to short-circuit operations. The output format is "Output: [true or false], [number of times]". Note that the evaluation of the logical expression conforms to the short-circuit algorithm. Among them, the operater "!" counts the number of logical operations, and "==" or "!=" are only used for non-boolean numeric comparisons.

Lexical Analysis

​ Flex is an open source lexical analyzer. The compiler converts the pattern input by the developer into a state transition diagram, and generates the corresponding implementation code, which is stored in the .yy, c files.

token symbol meaning
dight [0-9]|[1-9][0-9]* number
LT < less than
GT > greater than
LE <= less or equal to
GE >= greater or equal to
EQ == equal to
NE != not equal to
AND && and
OR || or
NOT ! not
LP ( left parentheses
RP ) right parentheses
newline \n\r line-termination
whitespace \t\f\v whitespace

Grammar Analysis

Bison

​ By using bison and flex together, it can convert the grammar rules provided by the user into a parser, read the productions of the grammar provided by the user, generate a LALR(1) action table in C language format, and include it in a C function named yyparse. The function of yyparse is to use this action table to parse the token stream, and the token stream is obtained by scanning the source program by the lexical analyzer generated by flex.

​ In bison, ":" represents a "->", different productions of the same non-terminal symbol are separated by "|", and the end with ";" indicates the end of a non-terminal symbol production; each production is followed by a curly bracket is a piece of C code that will be executed when the production is applied. This code is called an action. Comments can be inserted between the production and inside the C code (more on this code in this document later) ; When the right-hand side of the production is ε, there is no need to write any symbol, and a comment /* empty */ is generally used instead.

​ bison treats the nonterminal symbol on the left of the first production in the Productions section (Program in this document) as the start symbol of the grammar, and, to ensure that the start symbol is not on the right side of any production, bison will automatically generate the augmented grammar, and the newly added symbol is used as the starting symbol for parsing.

Grammatical precedence and associativity

​ The precedence and associativity of these operator tokens are defined in the .y file. In the code, the first declaration has a lower precedence, and the tokens declared at the same time have the same precedence. The rules are as follows:

Priority(top-down) simbol Associativity
1 OR Left
2 AND Left
3 LT GT LE GE Left
4 EQ NE Left
5 NOT Right
6 LP RP Left

Grammar Design

​ According to the form of logical expressions, the following grammar is summarized:

● Program => Exp

● Exp => NOT Exp

​ | Exp AND Exp

​ | Exp OR Exp

​ | Exp LT Exp

​ | Exp LE Exp

​ | Exp GT Exp

​ | Exp GE Exp

​ | Exp EQ Exp

​ | Exp NE Exp

​ | LP Exp Rp

​ | VALUE

​ Program is the start symbol of the grammar, and the corresponding augmented grammar Program' will be automatically generated after bison compilation.

Records of short circuit times

​ The number of short circuits is only relevant to the AND and OR operators. First, in the .y file, define a global variable to count the number of short circuits:

int short_cut = 0;	// record the number of shortcuts 

​ For AND operation, if the result of the first expression is False, a short circuit is performed. Then the short_cut number is incremented by 1, and the return is True; for OR operation, if the result of the first expression is True, a short circuit is performed, and the short_cut value is incremented 1, returned as True. In other cases, the short-circuit operation is not considered. The part of the code corresponding to the short-circuit operation is as follows:

	| Exp AND Exp {			// AND operation
		if ( $1 == 0 ) {	// The first operand is False, so short-circuit
			short_cut++;
			$$ = 0;
		}
		else
		{
			if ($3 == 1) { $$ = 1; }
			else { $$ = 0; }
		}
	}
	| Exp OR Exp {  	// OR operation
		if ($1 == 1) {	// The first operand is True, so short-circuit
			short_cut++;
			$$ = 1;
		}
		else {
			if ($3 == 0) { $$ = 0; }
			else { $$ = 1; }
		}
	}

Error Handling

​ All input symbols accepted by the program are as follows:

%token LT 	// <
%token GT	// >
%token LE	// <=
%token GE	// >=
%token EQ	// ==
%token NE	// !=
%token AND	// &&
%token OR	// ||
%token NOT	// !
%token LP	// (
%token RP	// )
digit	[0-9]|[1-9][0-9]*	// Numbers
whitespace [ \t\f\v]		// Whitespace
newline [\r\n]  		// Newline

​ When the input character is not in the above list, the program outputs an error message and displays the content of the first input error character. When there are multiple errors, only the first error will be displayed, and the operation will stop immediately when an error is detected. Code shows as below:

/* error, incorrect character */
.	{
	printf("Error! Wrong token '%s'.\n", yytext);
	exit(0);
}

compilation and Runtime output

Compile outputs

​ Compile files:

image

​ Compile succeeds, generates files listed below:

image

Operation Result

​ Test program output:

image image image

​ Let's test some illegal grammars:

image

​ The program works as expected.

About

Logic Expression Compiler Project

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published