Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added location to DVCondition #11907

Merged
merged 1 commit into from Oct 26, 2020
Merged

Added location to DVCondition #11907

merged 1 commit into from Oct 26, 2020

Conversation

AsterMiha
Copy link
Contributor

Some symbols still lack location information. This is an important feature when using dmd as a library, together with the file offset and end location allowing, for example, code refactoring.

@dlang-bot
Copy link
Contributor

Thanks for your pull request and interest in making D better, @AsterMiha! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#11907"

Copy link
Member

@ibuclaw ibuclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. loc should be the first parameter to be consistent with other nodes.
  2. const ref Loc loc

Copy link
Member

@Geod24 Geod24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the location stored, CC @rainers .

import dmd.globals : Loc;
import dmd.visitor : SemanticTimeTransitiveVisitor;

@beforeEach initializeFrontend()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@beforeEach initializeFrontend()
@beforeEach
void initializeFrontend()

Ditto for the other func

t.module_.accept(visitor);

assert(visitor.l.linnum == 1);
assert(visitor.l.charnum == 9);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So linnum and charnum are that of the condition ? What happens if I write:

version (
Hello
)

Likewise, looking at the code, this is inconsistent with the way plain debug are treated, where the linnum / charnum will be on the start of the debug token.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Aside: at some point, we should really have location ranges in all nodes, so that the start and end locations are recorded).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So linnum and charnum are that of the condition

I agree. The location should point to the first byte of version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Aside: at some point, we should really have location ranges in all nodes, so that the start and end locations are recorded).

Yes, that's what I've already suggested. It looks like @AsterMiha is working in that direction based on my feedback: #11788 (comment).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So linnum and charnum are that of the condition

I agree. The location should point to the first byte of version.

I was thinking of adding the location of version and debug to the ConditionalDeclaration nodes as these are the ones that would contain these conditions

@rainers
Copy link
Member

rainers commented Oct 26, 2020

Not sure about the location stored, CC @rainers .

The identifier location is the most important one, I have very similar changes in this commit: rainers@d4b6df8#diff-25a6634263c1b1f6fc4697a04e2b9904ea4b042a89af59dc93ec1f5d44848a26

I chose not to store a location for debug without identifier or number, but if it is set, it's likely going to be ignored anyway.

@Geod24
Copy link
Member

Geod24 commented Oct 26, 2020

Yeah, that makes sense.
I guess we can't solve that problem until we have ranges anyway, and this is better than the status quo.

@Geod24 Geod24 merged commit e6075d3 into dlang:master Oct 26, 2020
@jacob-carlborg
Copy link
Contributor

jacob-carlborg commented Oct 29, 2020

The identifier location is the most important one

@rainers @Geod24 I disagree. The most important thing is, as Iain mention as well, to have start and end location of all AST nodes. From the first byte to the last byte, not from some arbitrary location in the middle of the node. Otherwise source code refactoring tools won't work. I imagine that if you then want the location of the identifier, or any other part of an AST node, you would re-lex the range that the AST node occupies in the source buffer/file and look at the location of each individual token. It's also possible to add location to the Identifier class.

Or is it better to have a concrete syntax tree instead to build refactoring tools on?

@ibuclaw
Copy link
Member

ibuclaw commented Oct 29, 2020

It's also possible to add location to the Identifier class.

I'd discourage that because identifiers are stored in a pool to prevent duplication. An IdentifierExp on the other hand...

The most important thing is, as Iain mention as well, to have start and end location of all AST nodes. From the first byte to the last byte, not from some arbitrary location in the middle of the node. Otherwise source code refactoring tools won't work.

Pooling locations would be the primer to that.

@rainers
Copy link
Member

rainers commented Oct 29, 2020

@rainers @Geod24 I disagree. The most important thing is, as Iain mention as well, to have start and end location of all AST nodes. From the first byte to the last byte, not from some arbitrary location in the middle of the node. Otherwise source code refactoring tools won't work.

Source range information is valuable, too, but before you can even think about refactoring, you have to make sense of all the identifiers within the code range.

I imagine that if you then want the location of the identifier, or any other part of an AST node, you would re-lex the range that the AST node occupies in the source buffer/file and look at the location of each individual token.

I don't think that's a feasible approach. You always have to interpret code in context, that's what the full frontend is needed for.

It's also possible to add location to the Identifier class.

As Iain mentioned, it's not possible for the existing Identifier class, but I've replaced Identifier references with a IdentifierAtLoc struct almost everywhere else. For the DVCondition I just filled the unused loc member, but I'm also fine with making it a separate identifier location member and reserving the existing loc for the full range of the condition.

@jacob-carlborg
Copy link
Contributor

I don't think that's a feasible approach. You always have to interpret code in context, that's what the full frontend is needed for.

You already have the context. It's just that the location information has been lost.

@rainers
Copy link
Member

rainers commented Oct 29, 2020

You already have the context. It's just that the location information has been lost.

The same identifier can have multliple very different meanings in a very short code snippet, e.g. a.a.a.

@jacob-carlborg
Copy link
Contributor

The same identifier can have multliple very different meanings in a very short code snippet, e.g. a.a.a.

We already have the AST nodes and the semantic analyzer to figure out the meaning of these identifiers. I'm saying that after those steps, after lexing, parsing and running semantic analysis one should be able to extract the tokens from the AST nodes. One way to do that, since the original tokens are not store in the AST nodes, is to re-lex the source range that the AST node occupies.

Either one needs to be build a concrete syntax tree or use the approach I mentioned above. Or do you have other suggestions how to implement source code refactoring?

@rainers
Copy link
Member

rainers commented Oct 31, 2020

Or do you have other suggestions how to implement source code refactoring?

I haven't implemented any refactoring yet, or even thought about it much. Providing good introspection and completion is already tough ;-)

I agree the source range can probably help, but might not even be good enough (e.g. when comments before/after the range need to be kept). Also beware of lowerings during the semantic phase that can make the connection to the original parsed syntax tree difficult. I suspect that you have to look at the requirements for each type of refactoring, e.g. for renaming a symbol, the identifier locations are probably good enough. Others might need to work on the token level, and lexing the range again can help.

BTW: the C/C++-header/idl conversion tool used when building Visual D keeps the range of the AST nodes as pointers to a list of tokens with whitespace and comments attached to the token.

@jacob-carlborg
Copy link
Contributor

I haven't implemented any refactoring yet, or even thought about it much. Providing good introspection and completion is already tough ;-)

Yes, I know 😃.

I agree the source range can probably help, but might not even be good enough (e.g. when comments before/after the range need to be kept)

The lexer can be configured to produce tokens for comments. Not sure if that's enough for the scenario you have in mind.

Also beware of lowerings during the semantic phase that can make the connection to the original parsed syntax tree difficult.

Yes, I know. Unfortunately the semantic analyzer is destructive. I already had to deal with this problem in one of my tools. In that case I could iterate the AST twice, once just after parsing (to collect the data I needed) and once after semantic analysis. Then more or less compare the data before and after the semantic analysis.

I suspect that you have to look at the requirements for each type of refactoring, e.g. for renaming a symbol, the identifier locations are probably good enough. Others might need to work on the token level, and lexing the range again can help.

Yes, most likely.

As far as I see. If you have an AST node representing a version condition, the full source range of the whole node and want to rename the identifier used in the version condition. You can re-lex the source range, pick the first identifier token, get the location of that token and then just modify the source buffer. Should be enough for this use case at least. Since you're starting out with a known AST node and deriving the tokens from that, you have more knowledge of the tokens than during the initial lexing in the beginning.

BTW: the C/C++-header/idl conversion tool used when building Visual D keeps the range of the AST nodes as pointers to a list of tokens with whitespace and comments attached to the token.

Almost sounds like a concrete syntax tree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants