feat(frontend): store raw comments #15944

snprajwal · 2023-12-22T17:48:39Z

Previously, non-doc comments were discarded and were not available in the AST. Now, each one is stored in a `rawComment` field on the top-level AST node associated with the comment, gated behind the `DMDLIB` version flag.

@rikkimax also suggested putting it behind a different version flag, but I'm not sure everyone is okay with introducing another gate into the frontend, so I wanted to raise it here once.

Also, I've been running into an interesting bug while using these changes: the doc comments are correctly attached to their respective nodes, but the regular comments get attached to the node above the intended one. For example:

// This is a comment for foo
struct Foo {}
// This is a comment for bar
void bar() {}

The first comment disappears after parsing, and the second comment is attached to the struct instead of the function. Any ideas as to what's happening here? I've been trying to debug it this week but haven't had much luck.

cc @WebFreak001 @RazvanN7

Previously, non-doc comments were discarded and were not available in the AST. Now, each one is stored in a `rawComment` field on the top-level AST node associated with the comment, gated behind the `DMDLIB` version flag.

Signed-off-by: Prajwal S N <prajwalnadig21@gmail.com>
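For readers unfamiliar with how the gate works: a D `version` block compiles its declarations only when the matching identifier is set (for example with `-version=DMDLIB`), so the extra field costs nothing in an ordinary compiler build. The sketch below uses a hypothetical `Node` class and a hypothetical `comment` field as stand-ins for dmd's real AST classes; only the `rawComment` name and the `DMDLIB` identifier come from this PR.

```d
// Hypothetical stand-in for a top-level AST node; dmd's real classes differ.
class Node
{
    // Hypothetical field for the stripped Ddoc text.
    const(char)* comment;

    version (DMDLIB)
    {
        // Raw non-doc comment, delimiters included (e.g. "// a note").
        // Only compiled in when the DMDLIB version identifier is set.
        const(char)[] rawComment;
    }
}
```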

dlang-bot · 2023-12-22T17:48:42Z

Thanks for your pull request and interest in making D better, @snprajwal! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
I have provided a detailed rationale explaining my changes
New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.

If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#15944"

maxhaton · 2023-12-23T22:41:27Z

Do you need comments on the AST in dfmt?

maxhaton · 2023-12-23T22:42:41Z

compiler/src/dmd/lexer.d

+        /***************************************************
+     * Parse a comment embedded between t.ptr and p.
+     * Remove trailing blanks and tabs from lines.
+     * Replace all newlines with \n.
+     * Preserve leading comment string in each line.
+     * Append to previous one for this token.
+     */
+        private void getRawComment(Token* t) pure


This isn't indented properly; also, you don't need that many stars.

Oops, missed this. Will fix.

maxhaton · 2023-12-23T22:43:07Z

compiler/src/dmd/lexer.d

+        /***************************************************
+     * Parse a comment embedded between t.ptr and p.
+     * Remove trailing blanks and tabs from lines.
+     * Replace all newlines with \n.
+     * Preserve leading comment string in each line.
+     * Append to previous one for this token.
+     */
+        private void getRawComment(Token* t) pure


This needs a unittest

maxhaton · 2023-12-23T22:43:46Z

compiler/src/dmd/lexer.d

+
+            for (; q < p /* start of next token */ ; q++)
+            {
+                char c = *q;


Can't this reuse the existing comment-lexing code?

The existing function strips the comment prefix and stores only the text, so there is no way to tell afterwards whether a comment was `//`, `/* */`, `///`, or one of the other forms. The only way I could see to keep that information was a separate function that lexes the comment and stores it as a raw string, prefix included.
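To illustrate why keeping the prefix matters: once the delimiters are preserved, the comment kind can be recovered from the raw text alone. This is a sketch with a hypothetical `CommentKind` enum and `classify` function, not code from the PR. It treats `/**/` and `/++/` as ordinary comments, which I believe matches the lexer's rule that these empty forms are not Ddoc comments; the `unittest` block is the sort of test requested in the review above.

```d
/// Comment forms, told apart by their delimiters (hypothetical enum).
enum CommentKind
{
    line,      // `// ...`
    docLine,   // `/// ...`
    block,     // `/* ... */`
    docBlock,  // `/** ... */`
    nested,    // `/+ ... +/`
    docNested, // `/++ ... +/`
    unknown,   // not a comment at all
}

/// Classify a raw comment (delimiters included) by its prefix.
CommentKind classify(const(char)[] raw) pure nothrow @safe @nogc
{
    static bool startsWith(const(char)[] s, const(char)[] prefix) pure nothrow @safe @nogc
    {
        return s.length >= prefix.length && s[0 .. prefix.length] == prefix;
    }

    // `/**/` and `/++/` are empty ordinary comments, not Ddoc comments.
    if (raw == "/**/") return CommentKind.block;
    if (raw == "/++/") return CommentKind.nested;

    // Check the longer (doc) prefixes before the shorter ones.
    if (startsWith(raw, "///")) return CommentKind.docLine;
    if (startsWith(raw, "//"))  return CommentKind.line;
    if (startsWith(raw, "/**")) return CommentKind.docBlock;
    if (startsWith(raw, "/*"))  return CommentKind.block;
    if (startsWith(raw, "/++")) return CommentKind.docNested;
    if (startsWith(raw, "/+"))  return CommentKind.nested;
    return CommentKind.unknown;
}

unittest
{
    assert(classify("// note") == CommentKind.line);
    assert(classify("/// doc") == CommentKind.docLine);
    assert(classify("/** doc */") == CommentKind.docBlock);
    assert(classify("/**/") == CommentKind.block);
    assert(classify("/+ a /+ b +/ +/") == CommentKind.nested);
}
```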

If you want inspiration, libdparse stores comments and whitespace as raw tokens (a slice of tokens) and has a getter function that extracts the Ddoc comments and strips their borders (once, then memoizes the result).
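A simplified sketch of the memoize-on-first-access idea. It stores the raw comment as a string rather than as a slice of tokens, and `CommentedNode`, `docComment` and `stripBorders` are hypothetical names, not libdparse's API. `stripBorders` only removes the opening and closing delimiters of a single comment plus surrounding whitespace; it does not handle per-line `*` margins or runs of several `///` lines.

```d
import std.algorithm.searching : endsWith, startsWith;
import std.string : strip;

/// Remove the opening/closing delimiters of a single Ddoc comment and trim
/// surrounding whitespace. Per-line margins are not handled (simplification).
string stripBorders(string raw)
{
    string s = raw;
    if (s.startsWith("///"))
    {
        s = s[3 .. $];
    }
    else if (s.startsWith("/**") || s.startsWith("/++"))
    {
        s = s[3 .. $];
        if (s.endsWith("*/") || s.endsWith("+/"))
            s = s[0 .. $ - 2];
    }
    return s.strip;
}

/// Hypothetical AST node that keeps the raw comment and computes the
/// stripped Ddoc text only on first request.
struct CommentedNode
{
    string rawComment; // e.g. "/// Adds two numbers."
    string docCache;   // memoized result of stripBorders
    bool docComputed;  // whether docCache is valid

    string docComment()
    {
        if (!docComputed)
        {
            docCache = stripBorders(rawComment);
            docComputed = true;
        }
        return docCache;
    }
}
```

Keeping the raw text as the stored form means the stripping cost is paid only by tools that actually ask for the Ddoc text.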

SDC keeps comments as a token.

Here, I would move all the comment-lexing logic into one place and have it return whatever information the callers need, rather than having essentially the same logic twice.
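One possible shape for a single comment-lexing routine that returns everything callers need: the raw slice (for the new `rawComment` field) plus whether the comment is a Ddoc comment (so the existing path could strip borders from the same slice). `lexComment` and `LexedComment` are hypothetical names, not dmd code; the sketch only treats `\n` as a line terminator and ignores line-number tracking and the lexer's other bookkeeping.

```d
/// Result of lexing one comment (hypothetical type).
struct LexedComment
{
    const(char)[] raw; // slice of the source, delimiters included; null if unterminated
    bool isDoc;        // `///`, `/** */` or `/++ +/`, but not `/**/` or `/++/`
}

/// Lex the comment starting at src[i]. Only '\n' ends a line comment here.
LexedComment lexComment(const(char)[] src, size_t i) pure nothrow @safe @nogc
{
    assert(i + 1 < src.length && src[i] == '/');
    const start = i;
    const kind = src[i + 1];
    i += 2;

    if (kind == '/') // line comment: runs to end of line
    {
        while (i < src.length && src[i] != '\n')
            i++;
        const raw = src[start .. i];
        return LexedComment(raw, raw.length >= 3 && raw[2] == '/');
    }

    if (kind == '*') // block comment: runs to the first `*/`
    {
        while (i + 1 < src.length && !(src[i] == '*' && src[i + 1] == '/'))
            i++;
        if (i + 1 >= src.length)
            return LexedComment(null, false); // unterminated
        const raw = src[start .. i + 2];
        return LexedComment(raw, raw.length > 4 && raw[2] == '*');
    }

    if (kind == '+') // nesting comment: track depth until it closes
    {
        int depth = 1;
        while (depth > 0)
        {
            if (i + 1 >= src.length)
                return LexedComment(null, false); // unterminated
            if (src[i] == '/' && src[i + 1] == '+')      { depth++; i += 2; }
            else if (src[i] == '+' && src[i + 1] == '/') { depth--; i += 2; }
            else i++;
        }
        const raw = src[start .. i];
        return LexedComment(raw, raw.length > 4 && raw[2] == '+');
    }

    return LexedComment(null, false); // not a comment
}
```

Because the delimiters are scanned once, the Ddoc path could strip borders from `raw` when `isDoc` is set and the raw-comment path could store `raw` unchanged, so the two would not duplicate delimiter handling. Whether this costs anything compared with separately hand-tuned routines would need to be measured.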

@WebFreak001 what do you mean by "slice of tokens" with respect to comments?

An important consideration for a lexer is raw speed. It's better to pay the price in extra code and have specific lexers hand-tuned for exactly what each one needs than to have more generic code with multiple uses.

Perhaps, but classifying a comment after it has (in all but name) already been lexed is not where the slowness comes from.

SDC (as mentioned above) uses lots of SIMD/SWAR tricks that make it very fast at lexing, yet it's still pretty much all generic: the building blocks are the same everywhere, and just complicated enough that duplicating them five times over would preclude making all five versions faster.

snprajwal · 2024-02-02T17:30:02Z

I've run into a nasty bug where comments get attached to the wrong AST node (specifically, the previous item). I haven't had time to debug it because of other commitments, but I'll get to it in the coming week. Once I iron that out, this should be good to go.

maxhaton reviewed Dec 23, 2023


WalterBright approved these changes Feb 2, 2024


dlang-bot added the stalled label May 3, 2024


snprajwal commented Dec 22, 2023

dlang-bot commented Dec 22, 2023

maxhaton commented Dec 23, 2023

maxhaton Dec 23, 2023

snprajwal Dec 25, 2023

maxhaton Dec 23, 2023

maxhaton Dec 23, 2023

snprajwal Dec 25, 2023

WebFreak001 Dec 26, 2023

maxhaton Dec 26, 2023

maxhaton Dec 26, 2023

WalterBright Feb 2, 2024

maxhaton Feb 2, 2024

snprajwal commented Feb 2, 2024
