Handling Omitted Semicolons in Go

Adrian Thurston edited this page Dec 2, 2019 · 2 revisions

The Go Programming Language allows the programmer to omit semicolons at the end of a line. While this is convenient and familiar to most programmers, it can make a real mess of an otherwise elegant grammar.

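Go has two simple rules for when semis can be omitted. The first is purely lexical: a semicolon is inserted automatically after a line's final token when that token is an identifier, a literal, one of the keywords break, continue, fallthrough or return, or one of the closing operators ++, --, ), ] or }. We can implement this first rule in Colm with a relatively self-contained token definition that finds the places where a semi was omitted and inserts it into the input stream. This lets us keep the elegant grammar that contains the semis.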
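
To see the first rule in action on the Go side, the standard go/scanner package can be used: it reports an automatically inserted semicolon as a SEMICOLON token whose literal is "\n" rather than ";". The following is a small sketch; the helper name autoSemis and the file name example.go are made up for illustration.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// autoSemis (hypothetical helper) scans src with Go's own scanner and
// returns the line numbers on which a semicolon was inserted automatically.
func autoSemis(src string) []int {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0) // no error handler, comments skipped

	var lines []int
	for {
		pos, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		// An explicit semicolon has the literal ";", an inserted one "\n".
		if tok == token.SEMICOLON && lit == "\n" {
			lines = append(lines, fset.Position(pos).Line)
		}
	}
	return lines
}

func main() {
	// Line 2 ends in an operator, so no semicolon is inserted there.
	src := "x := 1\ny := x +\n\t2\nf(x)\n"
	fmt.Println(autoSemis(src))
}
```

The scanner only tokenizes, so the fragment does not need to be a complete Go file.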

In Colm, the insertion rule looks like this:

token insert_semi
        / ( <tokens-before-optional-semi> ) '\n' /
    {
        Prefix: str = input->pull( match_length - 1 )
        input->push( ";" )
        input->push( Prefix )
    }

The <tokens-before-optional-semi> pattern can reference the various token patterns by name, which is a nice aspect of this approach. I say relatively self-contained above because one must also exclude the keywords from the identifier pattern; otherwise a keyword at the end of a line would match as an identifier and get a semi after it.

Note that in this case we are inserting the semi as text. We could also insert it as a token with zero-length text if we didn't want to disturb the unparsed text.

Here is an example of how to do it in a fictional language called Tiny Imperative Language (TIL).

lex
	literal `var `if `then `else `while `do `for `read `write
			`end `to `goto

	literal `:= `!= `+ `- `* `/ `= `( `) `: `;

	ignore /'//' [^\n]* '\n'/
	ignore /[\n\t ]+/

	token id /[a-zA-Z_]+/
	token integer /[0-9]+/
	token stringlit /'"' [^"]* '"'/

	token insert_semi /
		( ( id -
			'var' - 'if' - 'then' - 'else' - 'while' -
			'do' - 'for' - 'read' - 'write' - 'end' -
			'to' - 'goto' ) |
		integer | stringlit | ')' ) '\n' /
	{
		Prefix: str = input->pull( match_length - 1 )
		input->push( ";" )
		input->push( Prefix )
	}

end

def program
	[statement*]

def statement
	[declaration]
|	[assignment_statement]
|	[if_statement]
|	[while_statement]
|	[do_statement]
|	[for_statement]
|	[read_statement]
|	[write_statement]
|	[labelled_statement]
|	[goto_statement]

def declaration
	[`var id `;]

def assignment_statement
	[id `:= expression `;]

def if_statement
	[`if expression `then statement* opt_else_statement `end]

def opt_else_statement
	[`else statement*]
|	[]

def while_statement
	[`while expression `do statement* `end]

def do_statement
	[`do statement* `while expression `;]

def for_statement
	[`for id `:= expression `to expression `do statement* `end]

def read_statement
	[`read id `;]

def write_statement
	[`write expression `;]

def expression
	[term]
|	[expression eqop term]

def eqop [`=] | [`!=]

def term
	[factor]
|	[term addop factor]

def addop [`+] | [`-]

def factor
	[primary]
|	[factor mulop primary]

def mulop [`*] | [`/]

def primary
	[id]
|	[lit]
|	[`( expression `)]

def lit
	[integer]
|	[stringlit]

def labelled_statement
	[id `: statement]

def goto_statement
	[`goto id `;]

parse P: program[stdin]

if P 
	print[ P ]
else {
	print "[error]
	exit( 1 )
}
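
For readers who do not have Colm at hand, the effect of the insert_semi token above can be approximated as a plain text pass. The sketch below is written in Go and is not how Colm works internally: it assumes line-at-a-time processing, the names insertSemis, lastToken and keywords are made up, and it simplifies by skipping any line that contains a // comment and by ignoring string literals that span lines.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// keywords mirrors the TIL keyword list that the Colm pattern subtracts from id.
var keywords = map[string]bool{
	"var": true, "if": true, "then": true, "else": true, "while": true,
	"do": true, "for": true, "read": true, "write": true, "end": true,
	"to": true, "goto": true,
}

// lastToken matches an id, integer, stringlit or ')' sitting directly
// before the end of the line, like the alternatives in insert_semi.
var lastToken = regexp.MustCompile(`([a-zA-Z_]+|[0-9]+|"[^"]*"|\))$`)

// insertSemis (hypothetical helper) appends ";" to every newline-terminated
// line whose final token may end a statement.
func insertSemis(src string) string {
	lines := strings.Split(src, "\n")
	// The last element has no newline after it, so it is never changed.
	for i, line := range lines[:len(lines)-1] {
		if strings.Contains(line, "//") {
			continue // simplification: treat the whole line as commented
		}
		tok := lastToken.FindString(line)
		if tok != "" && !keywords[tok] {
			lines[i] = line + ";"
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	src := "var x\nx := 1\nif x = 1 then\n\twrite (x + 2)\nend\n"
	fmt.Print(insertSemis(src))
}
```

As in the Colm rule, a line ending in then or end is left alone because those words are excluded as keywords, and a token with trailing whitespace before the newline does not trigger an insertion.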