New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(frontend): store raw comments #15944
base: master
Are you sure you want to change the base?
Conversation
Previously, non-doc comments were discarded and not available in the AST. Now, it is stored in a `rawComment` field in the top-level AST node associated with the comment, gated behind the `DMDLIB` version flag. Signed-off-by: Prajwal S N <prajwalnadig21@gmail.com>
Thanks for your pull request and interest in making D better, @snprajwal! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment. Bugzilla referencesYour PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog. Testing this PR locallyIf you don't have a local development environment setup, you can use Digger to test this PR: dub run digger -- build "master + dmd#15944" |
Do you need comments on the AST in dfmt? |
/*************************************************** | ||
* Parse a comment embedded between t.ptr and p. | ||
* Remove trailing blanks and tabs from lines. | ||
* Replace all newlines with \n. | ||
* Preserve leading comment string in each line. | ||
* Append to previous one for this token. | ||
*/ | ||
private void getRawComment(Token* t) pure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't intended properly, also you don't need that many stars
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, missed this. Will fix.
/*************************************************** | ||
* Parse a comment embedded between t.ptr and p. | ||
* Remove trailing blanks and tabs from lines. | ||
* Replace all newlines with \n. | ||
* Preserve leading comment string in each line. | ||
* Append to previous one for this token. | ||
*/ | ||
private void getRawComment(Token* t) pure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a unittest
|
||
for (; q < p /* start of next token */ ; q++) | ||
{ | ||
char c = *q; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this stuff not be reused from the existing comment lexing stuff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing function strips the comment prefix, and stores only the string. There's no way for us to figure out whether a comment is //
, /**/
, ///
, or any of the other types. I couldn't see a way for us to store that information apart from using a different function to lex the comment and store it as a raw string, prefix included.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you want to take inspiration from libdparse, it stores the raw tokens (as slice of tokens) for comments and whitespace and has a getter function that extracts the ddoc comments and strips them of the borders (once, then memoizes the result)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SDC keeps comments as a token.
For here I would get all the comment lexing logic into one place and then return the info anything else would need rather than just having basically the same logic twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WebFreak001 what do you mean by slice of tokens (wrt comments)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An important consideration for a lexer is raw speed. It's better to pay the price in extra code and have specific lexers hand-tuned for exactly what it needs, rather than have more generic code with multiple uses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps, but classifying (say) a comment after when it has been lexed (in all but name) is not where slowness comes from.
SDC (as mentioned above), does lots of SIMD/SWAR tricks that make it very fast at lexing, it's still all pretty much generic — the building blocks are the same everywhere, and just complicated enough that having them done 5 times over would preclude make all 5 versions faster.
So I've had a bit of a nasty bug where the comments get attached to the wrong AST node (the previous language item, to be specific). I've not had the time to debug this due to some other stuff going on, I'll get to it in the coming week. Once I iron that out it should be good to go. |
Previously, non-doc comments were discarded and not available in the AST. Now, it is stored in a
rawComment
field in the top-level AST node associated with the comment, gated behind theDMDLIB
version flag.@rikkimax also suggested that we should put it behind a different version flag, but I'm not sure if everyone's okay with introducing another gate into the frontend, so wanted to clarify it once here.
Also, I've been running into an interesting bug while using these changes - the doc comments are correctly attached to their respective node, but the regular comments get attached to the node above the respective node. E.g.
The first comment disappears after parsing, and the second comment is attached to the struct instead of the function. Any ideas as to what's happening here? I've tried debugging it across this week but not had much luck.
cc @WebFreak001 @RazvanN7