Added location to DVCondition #11907

AsterMiha · 2020-10-25T16:15:21Z

Some symbols still lack location information. This is an important feature when using dmd as a library, together with the file offset and end location allowing, for example, code refactoring.

dlang-bot · 2020-10-25T16:15:22Z

Thanks for your pull request and interest in making D better, @AsterMiha! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
I have provided a detailed rationale explaining my changes
New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.

If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#11907"

ibuclaw

loc should be the first parameter to be consistent with other nodes.
const ref Loc loc

Geod24

Not sure about the location stored, CC @rainers .

Geod24 · 2020-10-25T19:57:40Z

test/unit/parser/dvcondition_location.d

+import dmd.globals : Loc;
+import dmd.visitor : SemanticTimeTransitiveVisitor;
+
+@beforeEach initializeFrontend()


Suggested change

@beforeEach initializeFrontend()

@beforeEach

void initializeFrontend()

Ditto for the other func

Geod24 · 2020-10-25T20:00:53Z

test/unit/parser/dvcondition_location.d

+    t.module_.accept(visitor);
+
+    assert(visitor.l.linnum == 1);
+    assert(visitor.l.charnum == 9);


So linnum and charnum are that of the condition ? What happens if I write:

version ( Hello )

Likewise, looking at the code, this is inconsistent with the way plain debug are treated, where the linnum / charnum will be on the start of the debug token.

(Aside: at some point, we should really have location ranges in all nodes, so that the start and end locations are recorded).

So linnum and charnum are that of the condition

I agree. The location should point to the first byte of version.

(Aside: at some point, we should really have location ranges in all nodes, so that the start and end locations are recorded).

Yes, that's what I've already suggested. It looks like @AsterMiha is working in that direction based on my feedback: #11788 (comment).

So linnum and charnum are that of the condition

I agree. The location should point to the first byte of version.

I was thinking of adding the location of version and debug to the ConditionalDeclaration nodes as these are the ones that would contain these conditions

rainers · 2020-10-26T06:54:25Z

Not sure about the location stored, CC @rainers .

The identifier location is the most important one, I have very similar changes in this commit: rainers@d4b6df8#diff-25a6634263c1b1f6fc4697a04e2b9904ea4b042a89af59dc93ec1f5d44848a26

I chose not to store a location for debug without identifier or number, but if it is set, it's likely going to be ignored anyway.

Addressed

Geod24 · 2020-10-26T07:04:07Z

Yeah, that makes sense.
I guess we can't solve that problem until we have ranges anyway, and this is better than the status quo.

jacob-carlborg · 2020-10-29T08:39:29Z

The identifier location is the most important one

@rainers @Geod24 I disagree. The most important thing is, as Iain mention as well, to have start and end location of all AST nodes. From the first byte to the last byte, not from some arbitrary location in the middle of the node. Otherwise source code refactoring tools won't work. I imagine that if you then want the location of the identifier, or any other part of an AST node, you would re-lex the range that the AST node occupies in the source buffer/file and look at the location of each individual token. It's also possible to add location to the Identifier class.

Or is it better to have a concrete syntax tree instead to build refactoring tools on?

ibuclaw · 2020-10-29T09:07:07Z

It's also possible to add location to the Identifier class.

I'd discourage that because identifiers are stored in a pool to prevent duplication. An IdentifierExp on the other hand...

The most important thing is, as Iain mention as well, to have start and end location of all AST nodes. From the first byte to the last byte, not from some arbitrary location in the middle of the node. Otherwise source code refactoring tools won't work.

Pooling locations would be the primer to that.

rainers · 2020-10-29T10:50:46Z

@rainers @Geod24 I disagree. The most important thing is, as Iain mention as well, to have start and end location of all AST nodes. From the first byte to the last byte, not from some arbitrary location in the middle of the node. Otherwise source code refactoring tools won't work.

Source range information is valuable, too, but before you can even think about refactoring, you have to make sense of all the identifiers within the code range.

I imagine that if you then want the location of the identifier, or any other part of an AST node, you would re-lex the range that the AST node occupies in the source buffer/file and look at the location of each individual token.

I don't think that's a feasible approach. You always have to interpret code in context, that's what the full frontend is needed for.

It's also possible to add location to the Identifier class.

As Iain mentioned, it's not possible for the existing Identifier class, but I've replaced Identifier references with a IdentifierAtLoc struct almost everywhere else. For the DVCondition I just filled the unused loc member, but I'm also fine with making it a separate identifier location member and reserving the existing loc for the full range of the condition.

jacob-carlborg · 2020-10-29T10:55:15Z

I don't think that's a feasible approach. You always have to interpret code in context, that's what the full frontend is needed for.

You already have the context. It's just that the location information has been lost.

rainers · 2020-10-29T10:56:56Z

You already have the context. It's just that the location information has been lost.

The same identifier can have multliple very different meanings in a very short code snippet, e.g. a.a.a.

jacob-carlborg · 2020-10-30T19:21:15Z

The same identifier can have multliple very different meanings in a very short code snippet, e.g. a.a.a.

We already have the AST nodes and the semantic analyzer to figure out the meaning of these identifiers. I'm saying that after those steps, after lexing, parsing and running semantic analysis one should be able to extract the tokens from the AST nodes. One way to do that, since the original tokens are not store in the AST nodes, is to re-lex the source range that the AST node occupies.

Either one needs to be build a concrete syntax tree or use the approach I mentioned above. Or do you have other suggestions how to implement source code refactoring?

rainers · 2020-10-31T07:23:38Z

Or do you have other suggestions how to implement source code refactoring?

I haven't implemented any refactoring yet, or even thought about it much. Providing good introspection and completion is already tough ;-)

I agree the source range can probably help, but might not even be good enough (e.g. when comments before/after the range need to be kept). Also beware of lowerings during the semantic phase that can make the connection to the original parsed syntax tree difficult. I suspect that you have to look at the requirements for each type of refactoring, e.g. for renaming a symbol, the identifier locations are probably good enough. Others might need to work on the token level, and lexing the range again can help.

BTW: the C/C++-header/idl conversion tool used when building Visual D keeps the range of the AST nodes as pointers to a list of tokens with whitespace and comments attached to the token.

jacob-carlborg · 2020-10-31T07:40:45Z

I haven't implemented any refactoring yet, or even thought about it much. Providing good introspection and completion is already tough ;-)

Yes, I know 😃.

I agree the source range can probably help, but might not even be good enough (e.g. when comments before/after the range need to be kept)

The lexer can be configured to produce tokens for comments. Not sure if that's enough for the scenario you have in mind.

Also beware of lowerings during the semantic phase that can make the connection to the original parsed syntax tree difficult.

Yes, I know. Unfortunately the semantic analyzer is destructive. I already had to deal with this problem in one of my tools. In that case I could iterate the AST twice, once just after parsing (to collect the data I needed) and once after semantic analysis. Then more or less compare the data before and after the semantic analysis.

I suspect that you have to look at the requirements for each type of refactoring, e.g. for renaming a symbol, the identifier locations are probably good enough. Others might need to work on the token level, and lexing the range again can help.

Yes, most likely.

As far as I see. If you have an AST node representing a version condition, the full source range of the whole node and want to rename the identifier used in the version condition. You can re-lex the source range, pick the first identifier token, get the location of that token and then just modify the source buffer. Should be enough for this use case at least. Since you're starting out with a known AST node and deriving the tokens from that, you have more knowledge of the tokens than during the initial lexing in the beginning.

BTW: the C/C++-header/idl conversion tool used when building Visual D keeps the range of the AST nodes as pointers to a list of tokens with whitespace and comments attached to the token.

Almost sounds like a concrete syntax tree.

AsterMiha requested review from Geod24 and RazvanN7 as code owners October 25, 2020 16:15

AsterMiha force-pushed the DVConditionLoc branch from ae32072 to 599f87a Compare October 25, 2020 16:30

ibuclaw previously requested changes Oct 25, 2020

View reviewed changes

Added location to DVCondition

c9b5d88

AsterMiha force-pushed the DVConditionLoc branch from 599f87a to c9b5d88 Compare October 25, 2020 19:16

Geod24 reviewed Oct 25, 2020

View reviewed changes

Geod24 approved these changes Oct 26, 2020

View reviewed changes

Geod24 merged commit e6075d3 into dlang:master Oct 26, 2020

AsterMiha mentioned this pull request Oct 29, 2020

Added location to ConditionalDeclaration and StaticIfDeclaration #11914

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added location to DVCondition #11907

Added location to DVCondition #11907

AsterMiha commented Oct 25, 2020

dlang-bot commented Oct 25, 2020

ibuclaw left a comment

Geod24 left a comment

Geod24 Oct 25, 2020

Geod24 Oct 25, 2020

ibuclaw Oct 25, 2020

jacob-carlborg Oct 29, 2020

jacob-carlborg Oct 29, 2020

AsterMiha Oct 29, 2020

rainers commented Oct 26, 2020

Geod24 commented Oct 26, 2020

jacob-carlborg commented Oct 29, 2020 •

edited

ibuclaw commented Oct 29, 2020

rainers commented Oct 29, 2020

jacob-carlborg commented Oct 29, 2020

rainers commented Oct 29, 2020

jacob-carlborg commented Oct 30, 2020

rainers commented Oct 31, 2020

jacob-carlborg commented Oct 31, 2020

	@beforeEach initializeFrontend()
	@beforeEach
	void initializeFrontend()

Added location to DVCondition #11907

Added location to DVCondition #11907

Conversation

AsterMiha commented Oct 25, 2020

dlang-bot commented Oct 25, 2020

Bugzilla references

Testing this PR locally

ibuclaw left a comment

Choose a reason for hiding this comment

Geod24 left a comment

Choose a reason for hiding this comment

Geod24 Oct 25, 2020

Choose a reason for hiding this comment

Geod24 Oct 25, 2020

Choose a reason for hiding this comment

ibuclaw Oct 25, 2020

Choose a reason for hiding this comment

jacob-carlborg Oct 29, 2020

Choose a reason for hiding this comment

jacob-carlborg Oct 29, 2020

Choose a reason for hiding this comment

AsterMiha Oct 29, 2020

Choose a reason for hiding this comment

rainers commented Oct 26, 2020

Geod24 commented Oct 26, 2020

jacob-carlborg commented Oct 29, 2020 • edited

ibuclaw commented Oct 29, 2020

rainers commented Oct 29, 2020

jacob-carlborg commented Oct 29, 2020

rainers commented Oct 29, 2020

jacob-carlborg commented Oct 30, 2020

rainers commented Oct 31, 2020

jacob-carlborg commented Oct 31, 2020

jacob-carlborg commented Oct 29, 2020 •

edited