New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added location to DVCondition #11907
Conversation
Thanks for your pull request and interest in making D better, @AsterMiha! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment. Bugzilla referencesYour PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog. Testing this PR locallyIf you don't have a local development environment setup, you can use Digger to test this PR: dub run digger -- build "master + dmd#11907" |
ae32072
to
599f87a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loc
should be the first parameter to be consistent with other nodes.const ref Loc loc
599f87a
to
c9b5d88
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about the location stored, CC @rainers .
import dmd.globals : Loc; | ||
import dmd.visitor : SemanticTimeTransitiveVisitor; | ||
|
||
@beforeEach initializeFrontend() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@beforeEach initializeFrontend() | |
@beforeEach | |
void initializeFrontend() |
Ditto for the other func
t.module_.accept(visitor); | ||
|
||
assert(visitor.l.linnum == 1); | ||
assert(visitor.l.charnum == 9); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So linnum and charnum are that of the condition ? What happens if I write:
version (
Hello
)
Likewise, looking at the code, this is inconsistent with the way plain debug
are treated, where the linnum / charnum will be on the start of the debug
token.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Aside: at some point, we should really have location ranges in all nodes, so that the start and end locations are recorded).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So linnum and charnum are that of the condition
I agree. The location should point to the first byte of version
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Aside: at some point, we should really have location ranges in all nodes, so that the start and end locations are recorded).
Yes, that's what I've already suggested. It looks like @AsterMiha is working in that direction based on my feedback: #11788 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So linnum and charnum are that of the condition
I agree. The location should point to the first byte of
version
.
I was thinking of adding the location of version
and debug
to the ConditionalDeclaration nodes as these are the ones that would contain these conditions
The identifier location is the most important one, I have very similar changes in this commit: rainers@d4b6df8#diff-25a6634263c1b1f6fc4697a04e2b9904ea4b042a89af59dc93ec1f5d44848a26 I chose not to store a location for |
Yeah, that makes sense. |
@rainers @Geod24 I disagree. The most important thing is, as Iain mention as well, to have start and end location of all AST nodes. From the first byte to the last byte, not from some arbitrary location in the middle of the node. Otherwise source code refactoring tools won't work. I imagine that if you then want the location of the identifier, or any other part of an AST node, you would re-lex the range that the AST node occupies in the source buffer/file and look at the location of each individual token. It's also possible to add location to the Or is it better to have a concrete syntax tree instead to build refactoring tools on? |
I'd discourage that because identifiers are stored in a pool to prevent duplication. An IdentifierExp on the other hand...
Pooling locations would be the primer to that. |
Source range information is valuable, too, but before you can even think about refactoring, you have to make sense of all the identifiers within the code range.
I don't think that's a feasible approach. You always have to interpret code in context, that's what the full frontend is needed for.
As Iain mentioned, it's not possible for the existing Identifier class, but I've replaced Identifier references with a IdentifierAtLoc struct almost everywhere else. For the DVCondition I just filled the unused loc member, but I'm also fine with making it a separate identifier location member and reserving the existing loc for the full range of the condition. |
You already have the context. It's just that the location information has been lost. |
The same identifier can have multliple very different meanings in a very short code snippet, e.g. |
We already have the AST nodes and the semantic analyzer to figure out the meaning of these identifiers. I'm saying that after those steps, after lexing, parsing and running semantic analysis one should be able to extract the tokens from the AST nodes. One way to do that, since the original tokens are not store in the AST nodes, is to re-lex the source range that the AST node occupies. Either one needs to be build a concrete syntax tree or use the approach I mentioned above. Or do you have other suggestions how to implement source code refactoring? |
I haven't implemented any refactoring yet, or even thought about it much. Providing good introspection and completion is already tough ;-) I agree the source range can probably help, but might not even be good enough (e.g. when comments before/after the range need to be kept). Also beware of lowerings during the semantic phase that can make the connection to the original parsed syntax tree difficult. I suspect that you have to look at the requirements for each type of refactoring, e.g. for renaming a symbol, the identifier locations are probably good enough. Others might need to work on the token level, and lexing the range again can help. BTW: the C/C++-header/idl conversion tool used when building Visual D keeps the range of the AST nodes as pointers to a list of tokens with whitespace and comments attached to the token. |
Yes, I know 😃.
The lexer can be configured to produce tokens for comments. Not sure if that's enough for the scenario you have in mind.
Yes, I know. Unfortunately the semantic analyzer is destructive. I already had to deal with this problem in one of my tools. In that case I could iterate the AST twice, once just after parsing (to collect the data I needed) and once after semantic analysis. Then more or less compare the data before and after the semantic analysis.
Yes, most likely. As far as I see. If you have an AST node representing a version condition, the full source range of the whole node and want to rename the identifier used in the version condition. You can re-lex the source range, pick the first identifier token, get the location of that token and then just modify the source buffer. Should be enough for this use case at least. Since you're starting out with a known AST node and deriving the tokens from that, you have more knowledge of the tokens than during the initial lexing in the beginning.
Almost sounds like a concrete syntax tree. |
Some symbols still lack location information. This is an important feature when using dmd as a library, together with the file offset and end location allowing, for example, code refactoring.