Separate lexer from parser #9475
Thanks for your pull request and interest in making D better, @jacob-carlborg! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.
Bugzilla references
Your PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.
Testing this PR locally
If you don't have a local development environment set up, you can use Digger to test this PR:
dub fetch digger
dub run digger -- build "master + dmd#9475"
65ebfdd
to
6fe42ae
Please merge when green.
Yeah, I haven't figured out what's wrong yet.
Why?
I think it's better practice. I don't think a lexer and a parser have an "is-a" relation. There is also a more practical reason: I'm planning to change the error handling in the lexer. To minimize the code that needs to change I need to implement wrapper functions in the parser for things like
With this change it then becomes possible to turn both the parser and the lexer into structs, which might give some minor performance improvements.
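As a minimal sketch of the "has-a" arrangement described above, assuming heavily simplified stand-ins for the real dmd types (the `TOK` members, the pre-tokenized `input` array, and `countIdentifiers` are all hypothetical and exist only for illustration), a struct-based parser could own its lexer as a field:

```d
// Hypothetical, heavily simplified stand-ins; not the real dmd types.
enum TOK { identifier, endOfFile }

struct Token
{
    TOK value;
}

struct Lexer
{
    private const(TOK)[] input; // pre-tokenized input, for illustration only
    Token token;                // the current token

    this(const(TOK)[] input)
    {
        this.input = input;
    }

    // Advance to the next token and return its kind.
    TOK nextToken()
    {
        if (input.length == 0)
        {
            token.value = TOK.endOfFile;
        }
        else
        {
            token.value = input[0];
            input = input[1 .. $];
        }
        return token.value;
    }
}

struct Parser
{
    Lexer lexer; // composition: the parser has a lexer, it is not one

    this(const(TOK)[] input)
    {
        lexer = Lexer(input);
    }

    // Count identifier tokens until end of file.
    size_t countIdentifiers()
    {
        size_t n;
        while (lexer.nextToken() != TOK.endOfFile)
        {
            if (lexer.token.value == TOK.identifier)
                ++n;
        }
        return n;
    }
}
```

Whether this layout is measurably faster than final classes allocated with `scope` is exactly what the replies below question; the sketch only shows that no class hierarchy is needed to express the relationship.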
I don't see how. The performance functions shouldn't be virtual.
They share code, and it's apparently 120 fewer lines of source.
Why?
Have you already made these changes and are submitting the PRs piece by piece? If so, I'd like to see what the purpose behind all this reshuffling is. If not, I think this needs to show some demonstrable value before submitting these changes.
Aren't they both already final classes? And new'd as scope classes at that, so I'm not sure whether there's any measurable benefit (just thinking out loud however).
Seems like this change is aspirationally nice but practically a net negative. There should be a palpable benefit to adding boilerplate. I'm opposed to it.
    va_start(args, format);
    lexer.deprecationSupplemental(loc, format, args);
    va_end(args);
}
These 19 methods that now need awkward wrappers are evidence that this is not a good path of action. For some, callers could probably be changed to spell parser.lexer.method instead of parser.method. That would run afoul of the law of Demeter (https://en.wikipedia.org/wiki/Law_of_Demeter) though.
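To make the two spellings concrete, here is a small sketch, assuming hypothetical simplified `Lexer` and `Parser` structs (the `tokens` array, the `pos` field, and the `-1` end-of-input marker are illustrative assumptions, not dmd's actual API):

```d
// Hypothetical, simplified stand-ins; not the real dmd Lexer/Parser.
struct Lexer
{
    int[] tokens; // token kinds, for illustration only
    size_t pos;   // index of the current token

    // Look at the next token without consuming it; -1 means end of input.
    int peekNext() const
    {
        return pos + 1 < tokens.length ? tokens[pos + 1] : -1;
    }
}

struct Parser
{
    Lexer lexer;

    // Forwarding wrapper: existing callers keep writing parser.peekNext().
    int peekNext() const
    {
        return lexer.peekNext();
    }
}
```

With the wrapper, a caller writes `parser.peekNext()`; without it, the caller has to reach through the parser and write `parser.lexer.peekNext()`, which is the spelling the Law of Demeter argues against.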
I added the wrappers to reduce the number of changes required. Walter often complains that PRs are too big. Some of these methods are called in hundreds of places (peekNext is called 424 times just within the parser). With the wrappers they can be removed one at a time, each in a separate PR. Without the wrappers, I need to make all changes at once. It's also not just the parser that needs changing. There is code outside of the parser that accesses these methods as well:
Line 92 in 782dc94
p.scanloc = Loc.initial;
Line 1894 in 782dc94
if (p.token.value != TOK.endOfFile)
Although, one can question if a lexer needs to have a public API of 19 methods and fields.
Usually everything in the DMD code base is public. This might lead to code starting to use methods and fields it actually has no business using. It leads to leaky abstractions and goes against the Law of Demeter, as you mentioned above.
Usually everything in the DMD code base is public.
Making more stuff private would be a better way of going about things.
The stuff is already used outside of the lexer, making it more difficult.
I said "might", not "will". There was one method that could have been
I don't think that's a reason for inheritance. Object orientation is not used for sharing code, it's used to model objects and their relations. Perhaps it's a reason in C++ but in D we have better alternatives.
Some of them, not all.
Yes. But I think this is an improvement regardless of those other changes.
Here's the full story: I would like the compiler to be more usable as a library. One of the current issues with the compiler is the global state (based on your PRs Walter, it looks like you want the same). One way to identify the global state is to add
To solve this I plan to collect all the errors and return them from the appropriate functions and push the error printing up to the caller; in this case the caller is usually the parser. This is for functions like
struct Diagnosed(T)
{
    T value;
    Diagnostic[] diagnostics;
}
final Diagnosed!(Token*) peek(Token* ct)
In the long run I want to do the same thing for the parser. But, to reduce the amount of code that needs to be changed at once, I would add a wrapper in the parser that prints the error:
// This interface is exactly the same as it is currently in the lexer
Token* peek(Token* ct)
{
    auto diagnosed = lexer.peek(ct);
    printDiagnostics(diagnosed.diagnostics);
    return diagnosed.value;
}
You can see the current progress here [2]. It's a bit messy but it's a work in progress.
[1] https://github.com/dlang/dmd/pull/9468/files#diff-b24a7d8934fc062aaa3f71e4e5f96081R515
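The plan above could be sketched end to end roughly as follows. This is only an illustration under stated assumptions: `Token`, `Diagnostic`, the index-based `peek`, and the `printed` array (which stands in for actually printing errors) are hypothetical simplifications, not the code in the linked branch.

```d
// Hypothetical stand-ins; the real Token, Diagnostic and Lexer in dmd differ.
struct Token
{
    string text;
}

struct Diagnostic
{
    string message;
}

// A result value bundled with the diagnostics produced while computing it.
struct Diagnosed(T)
{
    T value;
    Diagnostic[] diagnostics;
}

struct Lexer
{
    Token[] tokens;

    // Return the token after `index` plus any diagnostics; never prints.
    Diagnosed!(Token*) peek(size_t index)
    {
        if (index + 1 < tokens.length)
            return Diagnosed!(Token*)(&tokens[index + 1], null);
        return Diagnosed!(Token*)(null, [Diagnostic("unexpected end of input")]);
    }
}

struct Parser
{
    Lexer lexer;
    string[] printed; // collected here instead of written to stderr

    // Wrapper keeping the old "returns a Token*" shape; reporting happens here.
    Token* peek(size_t index)
    {
        auto diagnosed = lexer.peek(index);
        foreach (d; diagnosed.diagnostics)
            printed ~= d.message;
        return diagnosed.value;
    }
}
```

In this shape the lexer has no side effects of its own, so a library user could call `lexer.peek` directly and decide what to do with the diagnostics, while existing parser callers are unaffected.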
Only the parser. The lexer definitely cannot be final since the parser inherits from it.
Yes, it seems like In my opinion the least powerful abstraction should be used. In this case I don't think there's a reason for inheritance and therefore no reason for classes.
Use composition instead of inheritance.
6fe42ae
to
bbb76f3
Codecov Report
@@           Coverage Diff           @@
##           master    #9475   +/-   ##
=======================================
  Coverage   85.58%   85.58%
=======================================
  Files         143      143
  Lines       73776    73776
=======================================
  Hits        63141    63141
  Misses      10635    10635
@jacob-carlborg Is it ok if we close this in favor of #9899?
Close both of them.