This assignment is to write a scanner for the μC language (NOT C language) with lex. This document gives the lexical definition of the language, while the syntactic definition and code generation will follow in subsequent assignments.
Tokens are divided into two classes:
- tokens that will be passed to the parser, and
- tokens that will be discarded by the scanner (i.e., recognized but not passed to the parser).
The following tokens will be recognized by the scanner and will be eventually passed to the parser.
An identifier is a string of letters (a ~ z, A ~ Z, _) and digits (0 ~ 9) that begins with a letter or underscore. Identifiers are case-sensitive; for example, ident, Ident, and IDENT are not the same identifier. Note that keywords are not identifiers.
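As a way to check your understanding of the identifier rule, here is a minimal hand-written C sketch of it. It is not the lex specification the assignment asks for; in a lex rule the same shape would typically be written as the pattern [A-Za-z_][A-Za-z0-9_]*. The function name match_identifier is hypothetical, and keyword handling is left out because the keyword list is not part of this section.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns the length of the identifier that starts
 * at s, or 0 if s does not start with an identifier.
 * Keywords are NOT filtered out here. */
size_t match_identifier(const char *s)
{
    size_t n;

    /* The first character must be a letter or an underscore. */
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_'))
        return 0;

    /* The remaining characters may be letters, digits, or underscores. */
    n = 1;
    while (isalnum((unsigned char)s[n]) || s[n] == '_')
        n++;

    return n;
}
```

For example, match_identifier("_x1 = 3") consumes the three characters of _x1, while match_identifier("9abc") returns 0 because an identifier cannot start with a digit.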
Integer literals: a sequence of one or more digits, such as 1, 23, and 666.
Floating-point literals: numbers that contain a decimal point, such as 0.2 and 3.141.
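The two numeric literal rules can be sketched in plain C as follows. This is an illustration only, not the required lex specification. It assumes that a floating-point literal has at least one digit on each side of the decimal point, as in the examples 0.2 and 3.141; the text above does not say whether forms such as 1. or .5 are allowed. The names NumKind and match_number are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result type for the sketch. */
typedef enum { NUM_NONE, NUM_INT, NUM_FLOAT } NumKind;

/* Classifies the numeric literal at the start of s and stores its
 * length in *len (0 when there is no literal). */
NumKind match_number(const char *s, size_t *len)
{
    size_t n = 0;

    /* Integer part: one or more digits. */
    while (isdigit((unsigned char)s[n]))
        n++;
    if (n == 0) {
        *len = 0;
        return NUM_NONE;
    }

    /* Assumption: a '.' followed by at least one digit makes it a float. */
    if (s[n] == '.' && isdigit((unsigned char)s[n + 1])) {
        n++;
        while (isdigit((unsigned char)s[n]))
            n++;
        *len = n;
        return NUM_FLOAT;
    }

    *len = n;
    return NUM_INT;
}
```

Under that assumption, the input 1. is read as the integer literal 1 followed by a separate '.' character.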
A string literal is a sequence of zero or more ASCII characters appearing between double-quote ( " ) delimiters. A double-quote appearing within a string must be written after a backslash ( \ ), e.g., "abc", "Hello world", and "She is a \"girl\"".
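The escaping rule is the part of string literals that is easiest to get wrong, so here is a small C sketch of it. It is an illustration, not the lex rule you are asked to write. It assumes that a backslash escapes whatever single character follows it and that an unterminated literal is simply not matched; neither detail is specified above. The name match_string is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: returns the length of the string literal that
 * starts at s, including both quotes, or 0 if s does not start with a
 * complete literal. */
size_t match_string(const char *s)
{
    size_t n = 1;

    if (s[0] != '"')
        return 0;

    while (s[n] != '\0') {
        if (s[n] == '\\' && s[n + 1] != '\0') {
            n += 2;           /* skip the backslash and the escaped character */
        } else if (s[n] == '"') {
            return n + 1;     /* unescaped quote closes the literal */
        } else {
            n++;
        }
    }

    return 0;                 /* assumption: unterminated literal is no match */
}
```

With this logic the escaped quotes in "She is a \"girl\"" do not end the literal; only the final unescaped double-quote does.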
The following tokens will be recognized by the scanner, but should be discarded rather than returned to the parser.
Whitespace: a sequence of blanks (spaces), tabs, and newlines.
Comments can be added in two ways:
- C-style comments are text surrounded by /* and */ delimiters, which may span more than one line.
- C++-style comments are text following a // delimiter, running up to the end of the line.
Whichever comment style is encountered first remains in effect until the appropriate comment close is encountered. For example,
// this is a comment // line */ /* with /* delimiters */ before the end
and
/* this is a comment // line with some and C delimiters */
are both valid comments.
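The "first style wins" behavior can be sketched in C as below. This is a hand-written illustration rather than the lex rules the assignment requires. It assumes a C++-style comment ends just before the newline (the newline itself is left to the whitespace rule) and that an unclosed C-style comment is not matched. The name match_comment is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the length of the comment that starts at
 * s, or 0 if s does not start with a complete comment. */
size_t match_comment(const char *s)
{
    if (s[0] != '/')
        return 0;

    if (s[1] == '/') {
        /* C++ style: everything up to, but not including, the newline.
         * Any C-style delimiters on the line are ordinary text. */
        return strcspn(s, "\n");
    }

    if (s[1] == '*') {
        /* C style: ends at the first closing delimiter after the opener.
         * Any // inside is ordinary text, and newlines are allowed. */
        const char *end = strstr(s + 2, "*/");
        return end ? (size_t)(end - s) + 2 : 0;
    }

    return 0;
}
```

Searching from s + 2 matters: it prevents the input /*/ from being treated as a complete comment, since the '*' of the opener cannot also serve as part of the closer.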
Any other undefined characters or strings should be discarded by your scanner during scanning.
We have prepared 11 μC programs to test the functionality of your scanner.
$ python3 judge/judge.py input/in01_arithmetic.c output/in01.out
Build with the Makefile
$ make clean && make
Execute
$ ./myscanner < input/in01_arithmetic.c > output/in01.out
Check diff
$ diff -y output/in01.out answer/in01_arithmetic.out
$ od -c answer/in05_comment.out
- Ubuntu 20.04 LTS
- Install dependencies:
$ sudo apt install gcc flex bison python3 git
- Generating a lexical analyzer with the lex command: (https://www.ibm.com/docs/en/aix/7.1?topic=information-generating-lexical-analyzer-lex-command)
- C tokens, Keywords, Identifiers: (https://www.guru99.com/c-tokens-keywords-identifier.html)