Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate lexer from parser #9475

Closed

Conversation

jacob-carlborg
Copy link
Contributor

Use composition instead of inheritance.

@jacob-carlborg jacob-carlborg added the Refactoring No semantic changes to code label Mar 21, 2019
@dlang-bot
Copy link
Contributor

Thanks for your pull request and interest in making D better, @jacob-carlborg! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the annotated coverage diff directly on GitHub with CodeCov's browser extension
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub fetch digger
dub run digger -- build "master + dmd#9475"

@thewilsonator
Copy link
Contributor

... compilable/testcolor.sh
==============================
Test compilable/testcolor.sh failed. The logged output:
Expected: __stdin.d\(2\):\ Error:\ no\ identifier\ for\ declarator\ \`test\`
Actual:   $'__stdin.d(2): Error: no identifier for declarator ` r\241(`'
00000000  5f 5f 73 74 64 69 6e 2e  64 28 32 29 3a 20 45 72  |__stdin.d(2): Er|
00000010  72 6f 72 3a 20 6e 6f 20  69 64 65 6e 74 69 66 69  |ror: no identifi|
00000020  65 72 20 66 6f 72 20 64  65 63 6c 61 72 61 74 6f  |er for declarato|
00000030  72 20 60 74 65 73 74 60  0a                       |r `test`.|
00000039
00000000  5f 5f 73 74 64 69 6e 2e  64 28 32 29 3a 20 45 72  |__stdin.d(2): Er|
00000010  72 6f 72 3a 20 6e 6f 20  69 64 65 6e 74 69 66 69  |ror: no identifi|
00000020  65 72 20 66 6f 72 20 64  65 63 6c 61 72 61 74 6f  |er for declarato|
00000030  72 20 60 20 72 a1 28 60  0a                       |r ` r.(`.|
00000039
==============================
Test compilable/testcolor.sh failed. The xtrace output:
+ source compilable/testcolor.sh
++ set -eo pipefail
++ [[ freebsd = *\w\i\n* ]]
++ expectedWithoutColor='__stdin.d(2): Error: no identifier for declarator `test`'
++ expectedWithColor='�[1m__stdin.d(2): �[1;31mError: �[mno identifier for declarator �[0;36m�[m�[1mtest�[0;36m�[m'
++ check -c test '__stdin.d(2): Error: no identifier for declarator `test`'
++ local actual expected
+++ echo test
+++ normalize
+++ tr -d '\n\r'
+++ /home/braddr/sandbox/at-client/pull-3572445-FreeBSD_32/dmd/generated/freebsd/release/32/dmd -c -o- -c -
+++ true
++ actual='__stdin.d(2): Error: no identifier for declarator ` r�(`'
++ compare '__stdin.d(2): Error: no identifier for declarator ` r�(`' '__stdin.d(2): Error: no identifier for declarator `test`'
++ local 'actual=__stdin.d(2): Error: no identifier for declarator ` r�(`'
++ local 'expected=__stdin.d(2): Error: no identifier for declarator `test`'
++ '[' '__stdin.d(2): Error: no identifier for declarator ` r�(`' '!=' '__stdin.d(2): Error: no identifier for declarator `test`' ']'
++ printf 'Expected: %q\n' '__stdin.d(2): Error: no identifier for declarator `test`'
++ printf 'Actual:   %q\n' '__stdin.d(2): Error: no identifier for declarator ` r�(`'

Copy link
Contributor

@thewilsonator thewilsonator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please merge when green.

@jacob-carlborg
Copy link
Contributor Author

Yeah, I haven't figured out what's wrong yet.

@WalterBright
Copy link
Member

Use composition instead of inheritance.

Why?

@jacob-carlborg
Copy link
Contributor Author

Why?

I think it's better practice. I don't think a lexer and parser has a "isa" relation.

For a more practical reason. I'm planning to change the error handling in the lexer. To minimize the code that needs to change I need to implement wrapper functions in the parser for things like peek and peekNext. That's not possible with inheritance because the difference will only be the return type.

@jacob-carlborg
Copy link
Contributor Author

With this change it's then possible to change both the parser and lexer to structs. Might give some minor performance improvements.

@WalterBright
Copy link
Member

With this change it's then possible to change both the parser and lexer to structs. Might give some minor performance improvements.

I don't see how. The performance functions shouldn't be virtual.

I think it's better practice. I don't think a lexer and parser has a "isa" relation.

They share code, and it's apparently 120 fewer lines of source.

For a more practical reason. I'm planning to change the error handling in the lexer.

Why?

To minimize the code that needs to change I need to implement wrapper functions in the parser for things like peek and peekNext. That's not possible with inheritance because the difference will only be the return type.

Have you already made these changes and are submitting the PRs piece by piece? If so, I'd like to see what the purpose behind all this reshuffling is. If not, I think this needs to show some demonstrable value before submitting these changes.

@ibuclaw
Copy link
Member

ibuclaw commented Mar 22, 2019

Aren't they both already final classes? And new'd as scope classes at that, so I'm not sure whether there's any measurable benefit (just thinking out loud however).

Copy link
Member

@andralex andralex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this change is aspirationally nice but practically a net negative. There should be a palpable benefit to adding boilerplate. I'm opposed to it.

va_start(args, format);
lexer.deprecationSupplemental(loc, format, args);
va_end(args);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 19 methods that now need awkward wrappers are evidence that this is not a good path of action. For some, probably callers could be replaced to spell parser.lexer.method instead of parser.method. That would run afoul of the law of Demeter (https://en.wikipedia.org/wiki/Law_of_Demeter) though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the wrappers to reduce the number of changes required. Walter often complains that PRs are too big. Some of these methods are called in hundreds of places (peekNext is called 424 times just within the parser). With the wrappers they can be removed one at the time, each in a separate PR. Without the wrappers, I need to make all changes at once. It's also not just the parser that needs changing. There is code outside of the parser the access these methods as well:

p.scanloc = Loc.initial;

if (p.token.value != TOK.endOfFile)

Although, one can question if a lexer needs to have a public API of 19 methods and fields.

Usually everything in the DMD code base is public. This might lead to code starting to use methods and fields they actually have no business of using. It leads to leaky abstractions and goes agains Law of Demeter, as you mentioned above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually everything in the DMD code base is public.

Making more stuff private would be a better way of going about things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stuff is already used outside of the lexer, making it more difficult.

@jacob-carlborg
Copy link
Contributor Author

I don't see how. The performance functions shouldn't be virtual.

I said "might", not "will". There was one method that could have been final but wasn't, releaseToken [1]. Although I don't think that has an impact on the performance.

They share code, and it's apparently 120 fewer lines of source.

I don't think that's a reason for inheritance. Object orientation is not used for sharing code, it's used to model objects and their relations. Perhaps it's a reason in C++ but in D we have better alternatives.

Why?

Have you already made these changes

Some of them, not all.

and are submitting the PRs piece by piece?

Yes. But I think this is an improvement regardless of those other changes.

If so, I'd like to see what the purpose behind all this reshuffling is. If not, I think this needs to show some demonstrable value before submitting these changes.

Here's the full story:

I would like the compiler to be more usable as a library. One of the current issues with the compiler is the global state (based on your PRs Walter, it looks like you want the same). One way to identify the global state is to add pure annotations to as many functions as possible. One of the reasons that many of the functions in the lexer cannot be made pure is due to the current error handling. Although it doesn't really have any state but it performs IO so the error functions cannot be made pure.

To solve this I plan to collect all the errors and return them from the appropriate functions and push the error printing up to the caller, in this case the caller is usually the parser. This is for functions like peek, scan and similar. The way I would do that is to combine the current return value with an array of diagnostics:

struct Diagnosed(T)
{
    T value;
    Diagnostic[] diagnostics;
}

final Diagnosed!(Token*) peek(Token* ct)

In the long run I want to do the same thing for the parser. But, to reduce the amount of code that need to be changed at once I would add a wrapper in the parser that prints the error:

// This interface is exactly the same as it is currently in the lexer
Token* peek(Token* ct)
{
    auto diagnosed = lexer.peek(ct);
    printDiagnostics(diagnosed.diagnostics);
    return diagnosed.value;
}

You can see the current progress here [2]. It's a bit messy but it's a work in progress.

[1] https://github.com/dlang/dmd/pull/9468/files#diff-b24a7d8934fc062aaa3f71e4e5f96081R515
[2] https://github.com/jacob-carlborg/dmd/tree/lexer-diagnostics

@jacob-carlborg
Copy link
Contributor Author

jacob-carlborg commented Mar 22, 2019

Aren't they both already final classes?

Only the parser. The lexer cannot definitely be final since the parser inherits from it.

And new'd as scope classes at that, so I'm not sure whether there's any measurable benefit (just thinking out loud however).

Yes, it seems like scope is used. But I think it only complicates everything. One needs to remember to add final and always use scope.

In my opinion the least powerful abstraction should be used. In this case I don't think there's a reason for inheritance and therefore no reason for classes.

Use composition instead of inheritance.
@codecov
Copy link

codecov bot commented Apr 4, 2019

Codecov Report

Merging #9475 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #9475   +/-   ##
=======================================
  Coverage   85.58%   85.58%           
=======================================
  Files         143      143           
  Lines       73776    73776           
=======================================
  Hits        63141    63141           
  Misses      10635    10635

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 782dc94...bbb76f3. Read the comment docs.

@RazvanN7
Copy link
Contributor

@jacob-carlborg Is it ok if we close this in favor of #9899 ?

@jacob-carlborg
Copy link
Contributor Author

Is it ok if we close this in favor of #9899 ?

Close both of them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Refactoring No semantic changes to code
Projects
None yet
7 participants