Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(frontend): store raw comments #15944

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

snprajwal
Copy link
Contributor

Previously, non-doc comments were discarded and not available in the AST. Now, it is stored in a rawComment field in the top-level AST node associated with the comment, gated behind the DMDLIB version flag.

@rikkimax also suggested that we should put it behind a different version flag, but I'm not sure if everyone's okay with introducing another gate into the frontend, so wanted to clarify it once here.

Also, I've been running into an interesting bug while using these changes - the doc comments are correctly attached to their respective node, but the regular comments get attached to the node above the respective node. E.g.

// This is a comment for foo
struct Foo {}
// This is a coment for bar
void bar() {}

The first comment disappears after parsing, and the second comment is attached to the struct instead of the function. Any ideas as to what's happening here? I've tried debugging it across this week but not had much luck.

cc @WebFreak001 @RazvanN7

Previously, non-doc comments were discarded and not available in the
AST. Now, it is stored in a `rawComment` field in the top-level AST node
associated with the comment, gated behind the `DMDLIB` version flag.

Signed-off-by: Prajwal S N <prajwalnadig21@gmail.com>
@dlang-bot
Copy link
Contributor

Thanks for your pull request and interest in making D better, @snprajwal! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#15944"

@maxhaton
Copy link
Member

Do you need comments on the AST in dfmt?

Comment on lines +3093 to +3100
/***************************************************
* Parse a comment embedded between t.ptr and p.
* Remove trailing blanks and tabs from lines.
* Replace all newlines with \n.
* Preserve leading comment string in each line.
* Append to previous one for this token.
*/
private void getRawComment(Token* t) pure
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't intended properly, also you don't need that many stars

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, missed this. Will fix.

Comment on lines +3093 to +3100
/***************************************************
* Parse a comment embedded between t.ptr and p.
* Remove trailing blanks and tabs from lines.
* Replace all newlines with \n.
* Preserve leading comment string in each line.
* Append to previous one for this token.
*/
private void getRawComment(Token* t) pure
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a unittest


for (; q < p /* start of next token */ ; q++)
{
char c = *q;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this stuff not be reused from the existing comment lexing stuff?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing function strips the comment prefix, and stores only the string. There's no way for us to figure out whether a comment is //, /**/, ///, or any of the other types. I couldn't see a way for us to store that information apart from using a different function to lex the comment and store it as a raw string, prefix included.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you want to take inspiration from libdparse, it stores the raw tokens (as slice of tokens) for comments and whitespace and has a getter function that extracts the ddoc comments and strips them of the borders (once, then memoizes the result)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SDC keeps comments as a token.

For here I would get all the comment lexing logic into one place and then return the info anything else would need rather than just having basically the same logic twice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WebFreak001 what do you mean by slice of tokens (wrt comments)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An important consideration for a lexer is raw speed. It's better to pay the price in extra code and have specific lexers hand-tuned for exactly what it needs, rather than have more generic code with multiple uses.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, but classifying (say) a comment after when it has been lexed (in all but name) is not where slowness comes from.

SDC (as mentioned above), does lots of SIMD/SWAR tricks that make it very fast at lexing, it's still all pretty much generic — the building blocks are the same everywhere, and just complicated enough that having them done 5 times over would preclude make all 5 versions faster.

@snprajwal
Copy link
Contributor Author

So I've had a bit of a nasty bug where the comments get attached to the wrong AST node (the previous language item, to be specific). I've not had the time to debug this due to some other stuff going on, I'll get to it in the coming week. Once I iron that out it should be good to go.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants