Make the _split_prefix public and documented to make comment parsing easy. #53
Comments
There are also some issues getting the line number for a comment in the current system. You'd think I could just subtract 1 from the line of the node I have, but this didn't work for me. It seems the end-line node at the very end of a file is different.
I absolutely agree with you that a class would be nice. It's just not really possible to write a proper parser with all the comment semantics. Or if you write one, you will end up with a very weird syntax file. My solution for this problem is an unofficial API: It will give you positions of all comments/whitespace/other special symbols before a leaf. Let me know what you think. Otherwise I'm very happy to discuss ideas for how we could make such an API usable for a larger number of people. Maybe a |
I don't understand what you mean... is the parsing of comments incorrect in parso today?
Ah, well that seems like exactly what I need. Thanks!
That would probably do, yea. It's a bit awkward because you go past the point of the comment and then kind of backtrack, so traversing the tree no longer visits positions in increasing order. But I don't know if that actually matters, it just feels a bit weird :P
Yep, I just now had time to try out |
Maybe this API should just be stabilized and documented and then this ticket can be closed that way?
Yeah probably. I will think about that.
It's definitely awkward. But I don't have any idea how that could be done better. The comment nodes would just be annoying, because they would be everywhere (e.g. between the operands of a multiplication). This is definitely not what you want. So there's really no other way than parsing them later. If you have better ideas about the API (maybe comments on the node that contains them? or the leaf before), I'm happy to hear them.
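To illustrate the "comments would be everywhere" point, here is a small sketch using the standard library's `tokenize` module rather than parso (so it shows where Python allows comments, not how parso represents them). The `source` string and the helper names `comment_tokens` and `token_kinds` are made up for this example.

```python
import io
import tokenize

# A comment sitting between the operands of a multiplication.
source = "total = (price  # unit price\n         * quantity)\n"


def comment_tokens(code):
    """Return (start_position, text) for every comment token in code."""
    tokens = tokenize.generate_tokens(io.StringIO(code).readline)
    return [(tok.start, tok.string) for tok in tokens
            if tok.type == tokenize.COMMENT]


def token_kinds(code):
    """Return the generic token type names in source order."""
    tokens = tokenize.generate_tokens(io.StringIO(code).readline)
    return [tokenize.tok_name[tok.type] for tok in tokens]


print(comment_tokens(source))
print(token_kinds(source))
```

The comment token shows up in the middle of the expression, between the `price` name and the `*` operator. A tree that kept comments as regular leaves would need a place for them inside that expression node.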
Yea ok, that is not what I want indeed!
Having them on a node that is on the same line as the comment would make it a lot less confusing at least! And if there was a |
I can definitely see this. Maybe we just need to add a property like However if we try to generalize this, it's a bit more complicated. What about comments on lines without nodes? These would then also be after the node. The problem we end up with is where do we put comments at the beginning of the file i.e. before any node. Thanks for the ideas, though. |
Yea, a public and documented API is enough for now I think. I managed to get it working with the existing API so it seems fine.
I also noticed that FStringStart doesn't have |
You are absolutely right. Changed in b5d8175. FStringStart actually just had the wrong classes. Sorry for that. I currently don't have the time to make |
I'm moving Considering that
Could you maybe outline the complicated cases where it won't work? It would at least help a lot when I work on edge cases for UPD.
I can understand your concern, but I guarantee you that it will be a nightmare. Try creating just the grammar for that. That's not fun. It's actually possible with parso, just modify the grammar that already exists and modify the tokenizer. Working with a CST (concrete syntax tree) that has comments everywhere is also just really really horrible. So that actually leaves us with much better tooling than we have now.
(1) kind of makes sense, because it's the only way we can access comments at the beginning. (2) is something we could improve, we could just return comments. If you have good ideas for an API for comments I'm happy to discuss.
What about instead of modifying grammars, just do
So,
will be parsed into
If there's no
parsed as
All other stuff, like form feeds and whitespace, would stay in the prefixes of the corresponding objects.
While this would be possible, I'm strongly opposed to it because it modifies the syntax tree in ways that are very random in some cases (like comment leaves somewhere in parentheses, etc.). I feel like even in this case there are so many edge cases that nobody except you would be happy with it. The reason you would be happy with it is probably that you want to work only with imports. If you only work with imports, it's not nearly as bad where comments can appear (except maybe when you use stuff like:
In this case your comments would just be in some crazy places.
Ok, you're right. I think I just don't have enough perspective of how parso is used in the wild. Anyway, |
Ok, last attempt, won't pursue it further.
Goals:
Currently, there are two unobvious things for me:
all comments will be in prefix of first leaf of func scope.
comment will be in the first leaf of func scope. In addition, nodes in the
So, what makes sense to me is to add
|
@davidhalter Would you approve
if I submit a PR? Everything else is not critical for me (isort), |
No, I don't approve that. I feel like I want a solution that also works for all cases and is not just a band-aid. Comments are not semantically relevant for tree nodes. For something like that the parser has to significantly change (as well as the tokenizer).
I'm closing this one, because having done a lot of work on refactoring now in Jedi, I don't think this is needed for now. So people should just implement their own "split_prefix" logic and work with comments.
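For anyone taking the "implement your own" route, a minimal sketch of such logic might look like the following. This is not parso's implementation: `PrefixPart` and `split_prefix` are hypothetical names, and the sketch assumes the prefix is the string from a leaf's `prefix` attribute, with the caller supplying the position where that prefix starts. Anything that is not a comment, line break, or plain whitespace (a backslash continuation, for example) is labelled `"other"` one character at a time.

```python
import re
from typing import List, NamedTuple, Tuple


class PrefixPart(NamedTuple):
    kind: str                    # "comment", "newline", "spacing", or "other"
    value: str
    start_pos: Tuple[int, int]   # (line, column) where this part starts


# Alternatives are tried left to right: comments, line breaks, other whitespace.
_PART_RE = re.compile(
    r"(?P<comment>#[^\r\n]*)|(?P<newline>\r\n|\r|\n)|(?P<spacing>[ \t\f]+)"
)


def split_prefix(prefix: str,
                 start_pos: Tuple[int, int] = (1, 0)) -> List[PrefixPart]:
    """Split a prefix string into typed parts with (line, column) positions."""
    line, column = start_pos
    parts = []
    pos = 0
    while pos < len(prefix):
        match = _PART_RE.match(prefix, pos)
        if match is None:
            # Assumption: unknown characters are kept one at a time as "other".
            kind, value = "other", prefix[pos]
        else:
            kind, value = match.lastgroup, match.group()
        parts.append(PrefixPart(kind, value, (line, column)))
        if kind == "newline":
            line += 1
            column = 0
        else:
            column += len(value)
        pos += len(value)
    return parts


example_prefix = "  # first\n\n    # second\n    "
for part in split_prefix(example_prefix, start_pos=(3, 0)):
    if part.kind == "comment":
        print(part.start_pos, part.value)
```

Joining the `value` fields of the returned parts gives back the original prefix, so nothing is lost, and the comment positions come out in increasing order even though they are read off the leaf that follows them.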
Comments being baked into the prefix member of nodes is unintuitive and awkward to use for parsing I think. It would make much more sense to have a Comment class.
(This has come up while I'm working on mutmut).