Separate lexer from parser #9475

jacob-carlborg · 2019-03-21T19:26:04Z

Use composition instead of inheritance.

dlang-bot · 2019-03-21T19:26:07Z

Thanks for your pull request and interest in making D better, @jacob-carlborg! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

My PR is fully covered with tests (you can see the annotated coverage diff directly on GitHub with CodeCov's browser extension
My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
I have provided a detailed rationale explaining my changes
New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.

If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub fetch digger
dub run digger -- build "master + dmd#9475"

thewilsonator · 2019-03-22T04:02:07Z

... compilable/testcolor.sh
==============================
Test compilable/testcolor.sh failed. The logged output:
Expected: __stdin.d\(2\):\ Error:\ no\ identifier\ for\ declarator\ \`test\`
Actual:   $'__stdin.d(2): Error: no identifier for declarator ` r\241(`'
00000000  5f 5f 73 74 64 69 6e 2e  64 28 32 29 3a 20 45 72  |__stdin.d(2): Er|
00000010  72 6f 72 3a 20 6e 6f 20  69 64 65 6e 74 69 66 69  |ror: no identifi|
00000020  65 72 20 66 6f 72 20 64  65 63 6c 61 72 61 74 6f  |er for declarato|
00000030  72 20 60 74 65 73 74 60  0a                       |r `test`.|
00000039
00000000  5f 5f 73 74 64 69 6e 2e  64 28 32 29 3a 20 45 72  |__stdin.d(2): Er|
00000010  72 6f 72 3a 20 6e 6f 20  69 64 65 6e 74 69 66 69  |ror: no identifi|
00000020  65 72 20 66 6f 72 20 64  65 63 6c 61 72 61 74 6f  |er for declarato|
00000030  72 20 60 20 72 a1 28 60  0a                       |r ` r.(`.|
00000039
==============================
Test compilable/testcolor.sh failed. The xtrace output:
+ source compilable/testcolor.sh
++ set -eo pipefail
++ [[ freebsd = *\w\i\n* ]]
++ expectedWithoutColor='__stdin.d(2): Error: no identifier for declarator `test`'
++ expectedWithColor='�[1m__stdin.d(2): �[1;31mError: �[mno identifier for declarator �[0;36m�[m�[1mtest�[0;36m�[m'
++ check -c test '__stdin.d(2): Error: no identifier for declarator `test`'
++ local actual expected
+++ echo test
+++ normalize
+++ tr -d '\n\r'
+++ /home/braddr/sandbox/at-client/pull-3572445-FreeBSD_32/dmd/generated/freebsd/release/32/dmd -c -o- -c -
+++ true
++ actual='__stdin.d(2): Error: no identifier for declarator ` r�(`'
++ compare '__stdin.d(2): Error: no identifier for declarator ` r�(`' '__stdin.d(2): Error: no identifier for declarator `test`'
++ local 'actual=__stdin.d(2): Error: no identifier for declarator ` r�(`'
++ local 'expected=__stdin.d(2): Error: no identifier for declarator `test`'
++ '[' '__stdin.d(2): Error: no identifier for declarator ` r�(`' '!=' '__stdin.d(2): Error: no identifier for declarator `test`' ']'
++ printf 'Expected: %q\n' '__stdin.d(2): Error: no identifier for declarator `test`'
++ printf 'Actual:   %q\n' '__stdin.d(2): Error: no identifier for declarator ` r�(`'

thewilsonator

Please merge when green.

jacob-carlborg · 2019-03-22T05:32:20Z

Yeah, I haven't figured out what's wrong yet.

WalterBright · 2019-03-22T05:49:27Z

Use composition instead of inheritance.

Why?

jacob-carlborg · 2019-03-22T07:10:31Z

Why?

I think it's better practice. I don't think a lexer and parser has a "isa" relation.

For a more practical reason. I'm planning to change the error handling in the lexer. To minimize the code that needs to change I need to implement wrapper functions in the parser for things like peek and peekNext. That's not possible with inheritance because the difference will only be the return type.

jacob-carlborg · 2019-03-22T07:20:52Z

With this change it's then possible to change both the parser and lexer to structs. Might give some minor performance improvements.

WalterBright · 2019-03-22T07:31:50Z

With this change it's then possible to change both the parser and lexer to structs. Might give some minor performance improvements.

I don't see how. The performance functions shouldn't be virtual.

I think it's better practice. I don't think a lexer and parser has a "isa" relation.

They share code, and it's apparently 120 fewer lines of source.

For a more practical reason. I'm planning to change the error handling in the lexer.

Why?

To minimize the code that needs to change I need to implement wrapper functions in the parser for things like peek and peekNext. That's not possible with inheritance because the difference will only be the return type.

Have you already made these changes and are submitting the PRs piece by piece? If so, I'd like to see what the purpose behind all this reshuffling is. If not, I think this needs to show some demonstrable value before submitting these changes.

ibuclaw · 2019-03-22T07:35:49Z

Aren't they both already final classes? And new'd as scope classes at that, so I'm not sure whether there's any measurable benefit (just thinking out loud however).

andralex

Seems like this change is aspirationally nice but practically a net negative. There should be a palpable benefit to adding boilerplate. I'm opposed to it.

andralex · 2019-03-22T07:41:04Z

src/dmd/parse.d

+        va_start(args, format);
+        lexer.deprecationSupplemental(loc, format, args);
+        va_end(args);
+    }


These 19 methods that now need awkward wrappers are evidence that this is not a good path of action. For some, probably callers could be replaced to spell parser.lexer.method instead of parser.method. That would run afoul of the law of Demeter (https://en.wikipedia.org/wiki/Law_of_Demeter) though.

I added the wrappers to reduce the number of changes required. Walter often complains that PRs are too big. Some of these methods are called in hundreds of places (peekNext is called 424 times just within the parser). With the wrappers they can be removed one at the time, each in a separate PR. Without the wrappers, I need to make all changes at once. It's also not just the parser that needs changing. There is code outside of the parser the access these methods as well:

dmd/src/dmd/compiler.d

Line 92 in 782dc94

p.scanloc = Loc.initial;

dmd/src/dmd/dsymbolsem.d

Line 1894 in 782dc94

if (p.token.value != TOK.endOfFile)

Although, one can question if a lexer needs to have a public API of 19 methods and fields.

Usually everything in the DMD code base is public. This might lead to code starting to use methods and fields they actually have no business of using. It leads to leaky abstractions and goes agains Law of Demeter, as you mentioned above.

Usually everything in the DMD code base is public.

Making more stuff private would be a better way of going about things.

The stuff is already used outside of the lexer, making it more difficult.

jacob-carlborg · 2019-03-22T10:36:55Z

I don't see how. The performance functions shouldn't be virtual.

I said "might", not "will". There was one method that could have been final but wasn't, releaseToken [1]. Although I don't think that has an impact on the performance.

They share code, and it's apparently 120 fewer lines of source.

I don't think that's a reason for inheritance. Object orientation is not used for sharing code, it's used to model objects and their relations. Perhaps it's a reason in C++ but in D we have better alternatives.

Why?

Have you already made these changes

Some of them, not all.

and are submitting the PRs piece by piece?

Yes. But I think this is an improvement regardless of those other changes.

If so, I'd like to see what the purpose behind all this reshuffling is. If not, I think this needs to show some demonstrable value before submitting these changes.

Here's the full story:

I would like the compiler to be more usable as a library. One of the current issues with the compiler is the global state (based on your PRs Walter, it looks like you want the same). One way to identify the global state is to add pure annotations to as many functions as possible. One of the reasons that many of the functions in the lexer cannot be made pure is due to the current error handling. Although it doesn't really have any state but it performs IO so the error functions cannot be made pure.

To solve this I plan to collect all the errors and return them from the appropriate functions and push the error printing up to the caller, in this case the caller is usually the parser. This is for functions like peek, scan and similar. The way I would do that is to combine the current return value with an array of diagnostics:

struct Diagnosed(T)
{
    T value;
    Diagnostic[] diagnostics;
}

final Diagnosed!(Token*) peek(Token* ct)

In the long run I want to do the same thing for the parser. But, to reduce the amount of code that need to be changed at once I would add a wrapper in the parser that prints the error:

// This interface is exactly the same as it is currently in the lexer
Token* peek(Token* ct)
{
    auto diagnosed = lexer.peek(ct);
    printDiagnostics(diagnosed.diagnostics);
    return diagnosed.value;
}

You can see the current progress here [2]. It's a bit messy but it's a work in progress.

[1] https://github.com/dlang/dmd/pull/9468/files#diff-b24a7d8934fc062aaa3f71e4e5f96081R515
[2] https://github.com/jacob-carlborg/dmd/tree/lexer-diagnostics

jacob-carlborg · 2019-03-22T10:40:31Z

Aren't they both already final classes?

Only the parser. The lexer cannot definitely be final since the parser inherits from it.

And new'd as scope classes at that, so I'm not sure whether there's any measurable benefit (just thinking out loud however).

Yes, it seems like scope is used. But I think it only complicates everything. One needs to remember to add final and always use scope.

In my opinion the least powerful abstraction should be used. In this case I don't think there's a reason for inheritance and therefore no reason for classes.

Use composition instead of inheritance.

codecov · 2019-04-04T20:48:54Z

Codecov Report

Merging #9475 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #9475   +/-   ##
=======================================
  Coverage   85.58%   85.58%           
=======================================
  Files         143      143           
  Lines       73776    73776           
=======================================
  Hits        63141    63141           
  Misses      10635    10635

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 782dc94...bbb76f3. Read the comment docs.

RazvanN7 · 2021-01-12T12:53:47Z

@jacob-carlborg Is it ok if we close this in favor of #9899 ?

jacob-carlborg · 2021-01-12T13:14:50Z

Is it ok if we close this in favor of #9899 ?

Close both of them.

jacob-carlborg added the Refactoring No semantic changes to code label Mar 21, 2019

jacob-carlborg force-pushed the separate-lexer-parser branch from 65ebfdd to 6fe42ae Compare March 21, 2019 19:36

thewilsonator approved these changes Mar 22, 2019

View reviewed changes

andralex requested changes Mar 22, 2019

View reviewed changes

Separate lexer from parser

bbb76f3

Use composition instead of inheritance.

jacob-carlborg force-pushed the separate-lexer-parser branch from 6fe42ae to bbb76f3 Compare March 22, 2019 11:48

thewilsonator mentioned this pull request May 27, 2019

Separate lexer from parser #9899

Closed

jacob-carlborg closed this Jan 12, 2021

RazvanN7 mentioned this pull request Mar 10, 2022

Template Parser with Lexer argument #13792

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Separate lexer from parser #9475

Separate lexer from parser #9475

jacob-carlborg commented Mar 21, 2019

dlang-bot commented Mar 21, 2019

thewilsonator commented Mar 22, 2019

thewilsonator left a comment

jacob-carlborg commented Mar 22, 2019

WalterBright commented Mar 22, 2019

jacob-carlborg commented Mar 22, 2019

jacob-carlborg commented Mar 22, 2019

WalterBright commented Mar 22, 2019

ibuclaw commented Mar 22, 2019

andralex left a comment •

edited

andralex Mar 22, 2019

jacob-carlborg Mar 22, 2019 •

edited

andralex Mar 28, 2019

jacob-carlborg Mar 29, 2019

jacob-carlborg commented Mar 22, 2019

jacob-carlborg commented Mar 22, 2019 •

edited

codecov bot commented Apr 4, 2019 •

edited

RazvanN7 commented Jan 12, 2021

jacob-carlborg commented Jan 12, 2021

Separate lexer from parser #9475

Separate lexer from parser #9475

Conversation

jacob-carlborg commented Mar 21, 2019

dlang-bot commented Mar 21, 2019

Bugzilla references

Testing this PR locally

thewilsonator commented Mar 22, 2019

thewilsonator left a comment

Choose a reason for hiding this comment

jacob-carlborg commented Mar 22, 2019

WalterBright commented Mar 22, 2019

jacob-carlborg commented Mar 22, 2019

jacob-carlborg commented Mar 22, 2019

WalterBright commented Mar 22, 2019

ibuclaw commented Mar 22, 2019

andralex left a comment • edited

Choose a reason for hiding this comment

andralex Mar 22, 2019

Choose a reason for hiding this comment

jacob-carlborg Mar 22, 2019 • edited

Choose a reason for hiding this comment

andralex Mar 28, 2019

Choose a reason for hiding this comment

jacob-carlborg Mar 29, 2019

Choose a reason for hiding this comment

jacob-carlborg commented Mar 22, 2019

jacob-carlborg commented Mar 22, 2019 • edited

codecov bot commented Apr 4, 2019 • edited

Codecov Report

RazvanN7 commented Jan 12, 2021

jacob-carlborg commented Jan 12, 2021

andralex left a comment •

edited

jacob-carlborg Mar 22, 2019 •

edited

jacob-carlborg commented Mar 22, 2019 •

edited

codecov bot commented Apr 4, 2019 •

edited