Separate code generation from parsing in expressions. #387

Merged (1 commit) on Oct 1, 2019

dvander (Member) commented Sep 30, 2019

The Achilles' heel of the compiler has always been how it interleaves
parsing with code generation. This causes so many intricate problems
that it is almost impossible to significantly improve the language. It
also makes it difficult to know what the actual semantics of the
language are.

This series of patches aims to correct this long-standing behavior by
finally separating code generation from parsing. To make this change as
incremental as possible, a new parser has only been introduced for
expressions (that is, not statements). In addition, code generation has
been kept as similar as possible, even when there are obvious
improvements to make.

Expressions are now parsed, bound, analyzed, and emitted in distinct
steps. Since the compiler is still multi-pass, these steps
occur within the old compiler framework, to make them as transparent as
possible.

The new parse step is the simplest, and builds an AST. The bind step
resolves any symbolic names. The analysis step performs type checks,
computes "values" and constants, and prepares the AST for emitting code.
The AST retains many quirks of the old parser related to l-values and
constants, with a few exceptions.
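
As a rough illustration of the phase separation described above, here is a
minimal C++ sketch. Every name in it (Context, Expr, NumberExpr, NameExpr,
AddExpr, CompileExpr) and the pseudo-assembly strings are hypothetical and
chosen for the example; they are not the compiler's actual classes or
opcodes. Parsing is assumed to have already produced the tree.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical state shared by the phases; not the compiler's real types.
struct Context {
    std::map<std::string, int> locals;  // name -> stack slot
    std::vector<std::string> code;      // illustrative pseudo-assembly
};

struct Expr {
    virtual ~Expr() = default;
    virtual bool Bind(Context& cx) = 0;     // resolve symbolic names
    virtual bool Analyze(Context& cx) = 0;  // type checks, constants
    virtual void Emit(Context& cx) = 0;     // code generation only
};

struct NumberExpr final : Expr {
    int value;
    explicit NumberExpr(int v) : value(v) {}
    bool Bind(Context&) override { return true; }
    bool Analyze(Context&) override { return true; }
    void Emit(Context& cx) override {
        cx.code.push_back("push_const " + std::to_string(value));
    }
};

struct NameExpr final : Expr {
    std::string name;
    int slot = -1;  // filled in by Bind
    explicit NameExpr(std::string n) : name(std::move(n)) {}
    bool Bind(Context& cx) override {
        auto it = cx.locals.find(name);
        if (it == cx.locals.end())
            return false;  // undefined symbol: stop before analysis
        slot = it->second;
        return true;
    }
    bool Analyze(Context&) override { return true; }
    void Emit(Context& cx) override {
        cx.code.push_back("push_local " + std::to_string(slot));
    }
};

struct AddExpr final : Expr {
    std::unique_ptr<Expr> lhs, rhs;
    AddExpr(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : lhs(std::move(l)), rhs(std::move(r)) {}
    bool Bind(Context& cx) override { return lhs->Bind(cx) && rhs->Bind(cx); }
    bool Analyze(Context& cx) override {
        return lhs->Analyze(cx) && rhs->Analyze(cx);
    }
    void Emit(Context& cx) override {
        lhs->Emit(cx);
        rhs->Emit(cx);
        cx.code.push_back("add");
    }
};

// Runs the phases in order; nothing is emitted unless bind and analysis pass.
bool CompileExpr(Expr& e, Context& cx) {
    if (!e.Bind(cx) || !e.Analyze(cx))
        return false;
    e.Emit(cx);
    return true;
}
```

The point of the sketch is only that Emit never runs on a tree that failed
to bind or analyze, so no assembly has to be deleted after the fact.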

The old parser would load constants, then delete the resulting assembly
if they could be folded later. The new parser does not do this.
Constants are folded during analysis, and if they make it to code
generation, are emitted there. Callers are responsible for folding
constants opportunistically. A very desirable goal is to entirely remove
the staging pipeline and peephole optimizer, and this is a major step
towards that.
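
A minimal sketch of folding during analysis, assuming a simplified node
type. The Expr layout, the is_const/const_value fields, MakeMul, and the
pseudo-assembly are all invented for this example and do not reflect the
real AST; integer overflow is ignored for brevity.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type for illustration only.
struct Expr {
    enum Kind { Number, Local, Mul } kind;
    int value = 0;  // Number: literal value; Local: stack slot
    std::unique_ptr<Expr> lhs, rhs;
    bool is_const = false;  // set by Analyze when the value is known
    int const_value = 0;
    explicit Expr(Kind k, int v = 0) : kind(k), value(v) {}
};

std::unique_ptr<Expr> MakeMul(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>(Expr::Mul);
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Analysis computes constants bottom-up; no code exists yet to undo.
void Analyze(Expr& e) {
    switch (e.kind) {
    case Expr::Number:
        e.is_const = true;
        e.const_value = e.value;
        break;
    case Expr::Local:
        break;
    case Expr::Mul:
        Analyze(*e.lhs);
        Analyze(*e.rhs);
        if (e.lhs->is_const && e.rhs->is_const) {
            e.is_const = true;
            e.const_value = e.lhs->const_value * e.rhs->const_value;
        }
        break;
    }
}

// The emitter checks for a folded value first, then falls back to the tree.
void Emit(const Expr& e, std::vector<std::string>& out) {
    if (e.is_const) {
        out.push_back("push_const " + std::to_string(e.const_value));
        return;
    }
    switch (e.kind) {
    case Expr::Number:
        out.push_back("push_const " + std::to_string(e.value));
        break;
    case Expr::Local:
        out.push_back("push_local " + std::to_string(e.value));
        break;
    case Expr::Mul:
        Emit(*e.lhs, out);
        Emit(*e.rhs, out);
        out.push_back("mul");
        break;
    }
}
```

For a tree shaped like (3 * 4) * x, the inner product is folded during
analysis and the emitter only ever sees a single constant for it.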

Type checking is essentially identical to the old parser.

Even though code generation is mostly identical, there are a few cases
where it's not. For example, function calls were a particular nightmare
in the old parser, because arguments are evaluated right-to-left but
parsed left-to-right. This led to an extraordinary series of hacks,
involving a special stage-and-reorder algorithm in the assembler, and a
mechanism to keep moving the stack around so |this| landed in the
correct position. With an AST, absolutely none of this is necessary.
The generated code will look roughly the same except for member function
calls.
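
To show why an AST removes the need for reordering, here is a small
sketch that walks an argument list backwards at emit time. CallExpr,
EmitCall, and the push/call pseudo-instructions are hypothetical, and
arguments are reduced to strings to keep the example short. Treating the
receiver of a member call as args[0] is an assumption made for this
example, not a statement about the real calling convention.

```cpp
#include <string>
#include <vector>

// Hypothetical call node: arguments are stored in source (left-to-right)
// order, exactly as the parser saw them.
struct CallExpr {
    std::string callee;
    std::vector<std::string> args;
};

// Emission walks the stored arguments in reverse, so they are evaluated
// right-to-left without staging or reordering any generated code.
// Assumption for this sketch: a member call stores its receiver as
// args[0], so it is simply the last value pushed.
std::vector<std::string> EmitCall(const CallExpr& call) {
    std::vector<std::string> out;
    for (auto it = call.args.rbegin(); it != call.args.rend(); ++it)
        out.push_back("push " + *it);
    out.push_back("push_argc " + std::to_string(call.args.size()));
    out.push_back("call " + call.callee);
    return out;
}
```

Because the whole argument list exists before any code is generated, the
evaluation order becomes a property of the emitter loop rather than of
the parse order.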

Another place code generation has changed is for logical operators (||
and &&). The emitter code for this has been borrowed from exp/compiler
since it is very simple and efficient.
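
The exp/compiler emitter is not reproduced here. As a generic
illustration of how short-circuit operators are commonly emitted from an
AST, the sketch below passes "true" and "false" jump targets down the
tree. Expr, Emitter, EmitTest, and the pseudo-instructions are all
hypothetical names for this example.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical condition tree: leaves test a local, inner nodes are && / ||.
struct Expr {
    enum Kind { Local, And, Or } kind;
    int slot = 0;  // Local only
    std::unique_ptr<Expr> lhs, rhs;
    explicit Expr(Kind k, int s = 0) : kind(k), slot(s) {}
};

struct Emitter {
    std::vector<std::string> code;
    int next_label = 0;

    int NewLabel() { return next_label++; }
    static std::string Label(int id) { return "L" + std::to_string(id); }

    // Emits code that jumps to on_true or on_false depending on e.
    void EmitTest(const Expr& e, int on_true, int on_false) {
        switch (e.kind) {
        case Expr::Local:
            code.push_back("load_local " + std::to_string(e.slot));
            code.push_back("jump_if_zero " + Label(on_false));
            code.push_back("jump " + Label(on_true));
            break;
        case Expr::And: {
            // lhs false: whole expression is false; lhs true: test rhs.
            int mid = NewLabel();
            EmitTest(*e.lhs, mid, on_false);
            code.push_back(Label(mid) + ":");
            EmitTest(*e.rhs, on_true, on_false);
            break;
        }
        case Expr::Or: {
            // lhs true: whole expression is true; lhs false: test rhs.
            int mid = NewLabel();
            EmitTest(*e.lhs, on_true, mid);
            code.push_back(Label(mid) + ":");
            EmitTest(*e.rhs, on_true, on_false);
            break;
        }
        }
    }
};
```

The sketch makes no attempt to elide jumps to the immediately following
label; a real emitter would likely fall through instead.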

Finally...

The old parser would usually ignore errors and keep parsing. This was
necessary because the first pass does not have full type information yet
and is usually guaranteed to throw errors. However, the more progress
it can make, the more accurate future passes will be. The new parser aborts
analysis immediately on any type error (currently limited to the scope
of an expression). This has a few implications. One is that, on badly
broken content, the exact error message that bubbles up may change. The
second is that we are aggressive about calls to markusage() since the
correctness of the second pass relies on them. These calls happen during
binding and as early during analysis as possible.
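
A minimal sketch of the ordering concern, under heavy simplification: the
"used" flag below stands in for whatever markusage() records, and Symbol,
AddOperands, Bind, and Analyze are hypothetical names. It only shows that
usage is recorded during binding, before analysis has a chance to abort
on the first type error.

```cpp
#include <string>
#include <vector>

// Hypothetical symbol record; "used" stands in for usage tracking.
struct Symbol {
    std::string name;
    bool is_int;
    bool used;
};

// Hypothetical bound expression: lhs + rhs, both already resolved.
struct AddOperands {
    Symbol* lhs;
    Symbol* rhs;
};

// Binding records usage immediately, so a later abort cannot lose it.
void Bind(AddOperands& e) {
    e.lhs->used = true;
    e.rhs->used = true;
}

// Analysis stops at the first type error instead of continuing.
bool Analyze(const AddOperands& e, std::vector<std::string>& errors) {
    if (!e.lhs->is_int) {
        errors.push_back("type mismatch: " + e.lhs->name);
        return false;
    }
    if (!e.rhs->is_int) {
        errors.push_back("type mismatch: " + e.rhs->name);
        return false;
    }
    return true;
}
```

With two badly typed operands, only the first error is reported, yet both
symbols are still marked as used for the next pass.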

So, what does this all mean for exp/compiler (aka spcomp2)? Nothing
changes: it is inactive. However, we will look to borrow more ideas
and code from the experimental tree.

The new parser is hidden behind a flag (-N, for "new parser"), and is
disabled by default until full test coverage is available.

Fyren (Contributor) commented Sep 30, 2019

Designated initializers are actually C++20, though g++ and clang++ have supported them as an extension for a long time. Looks like MSVC doesn't.

dvander force-pushed the part-the-sea-ch-1 branch 4 times, most recently from b516654 to f5491bf (October 1, 2019 05:16)
dvander force-pushed the part-the-sea-ch-1 branch from f5491bf to d169840 (October 1, 2019 05:25)
dvander merged commit 1de2266 into master (Oct 1, 2019)
dvander deleted the part-the-sea-ch-1 branch (October 1, 2019 05:39)