Skip to content
This repository has been archived by the owner on Mar 8, 2020. It is now read-only.

On EndPosition-s again #142

Open
vmarkovtsev opened this issue Jan 5, 2018 · 12 comments
Open

On EndPosition-s again #142

vmarkovtsev opened this issue Jan 5, 2018 · 12 comments

Comments

@vmarkovtsev
Copy link

For example, function save() in https://github.com/src-d/jgscm/blob/master/jgscm/__init__.py has StartPosition but does not have EndPosition.

Fact: I need EndPosition otherwise Hercules is not able to build the function->line mapping.

I think of two ways of getting the ending line:

  1. Take StartPosition of the next node
  2. Take the biggest StartPosition of all the children

Which way is better in your opinion? Is there a third way?

@vmarkovtsev
Copy link
Author

I worked around this following (2). https://github.com/src-d/hercules/blob/master/shotness.go#L229

So Python driver behaves in three different ways:

  1. EndPosition is present and is valid - rare, but happens
  2. EndPosition is present but the line equals to StartPosition
  3. EndPosition is null

@juanjux
Copy link
Contributor

juanjux commented Jan 8, 2018

After a certain meeting with the CTO it was decided that at the current stage we would stop to do time expensive and hard to maintain position workarounds to correct missing positions from the native driver. Actually, before that was decided the function position was calculated as you told, with the position of the last element of the subtree (so the code for that specific case is in the history and could be recovered if its decided that we provide the EndPosition for functions).

One exception that was done for the Python driver, which doesn't provide end positions at all, was to keep a pass we do with the tokenizer to synchronize the ending positions of nodes with tokens. Actually in the current bblfsh specification the End Position is the position where the node Token ends:

"End position, if present in a token node, is the position of the last character of the token in the original source code." So this is why most End Positions are in the same line. The ones that doesn't have and EndPosition is because the tokenizer doesn't provide one.

@juanjux
Copy link
Contributor

juanjux commented Jan 8, 2018

PS: If you want to check what elements have or have not positions to see what you would need without going mad with complex UASTs the best way is to check each driver integration tests which has the source code, native AST, and UAST fixtures:

https://github.com/bblfsh/python-driver/tree/master/fixtures

@vmarkovtsev
Copy link
Author

Wow, this means my Java driver may be outdated because it returns correct EndPositions. Thanks for the consultation! This means that I cannot rely on EndPosition at all and always calculate it myself, doesn't it?

@juanjux
Copy link
Contributor

juanjux commented Jan 8, 2018

The Java native AST provides EndPositions, unlike the Python one. No need for snarky remarks.

@vmarkovtsev
Copy link
Author

No-no, not being snarky at all. I am really asking about Java driver because I don't want to break after the next release and I want to understand the situation better. And really being grateful for your help :)

So seriously, should I rather remove the usage of EndPosition-s from my code completely and always calculate them? As you cited in the docs, I should...

@juanjux
Copy link
Contributor

juanjux commented Jan 8, 2018

I don't think that's the best option. If you ask me, we should probably check what EndPositions you need more urgently for Hercules and readd the code for them if it's manageable (for example, functions, classes and other nodes with subtrees were not very problematic, other nodes were more hard to fix) but I'm not the one calling the shots.

@vmarkovtsev
Copy link
Author

OK, I got it. Not changing then. Let's return to this on the SoS/retrospective/any other meeting that we should have. Not closing till the final decision is made.

@vmarkovtsev
Copy link
Author

Thanks again!

@abeaumont
Copy link
Contributor

We currently provide the positions that the native driver provides and it was decided that it was not worth the effort to change that per driver for the time being. This could be done in a language agnostic way by the user that needs it, so it's something we won't provide for now. @mcuadros can confirm if that's still the case.

I think it's best to reuse those positions that are already provided by the driver and recalculate only when that's not the case, but that's up to you, of course.

@vmarkovtsev
Copy link
Author

I have just thought: maybe we should return some meta information in the parse response which tells if I can or cannot rely on the EndPosition-s? That would simplify and strengthen the client code.

@vmarkovtsev
Copy link
Author

@creachadair To give you some context.

The current Python driver does not provide accurate positions for each node, which makes it unusable for e.g. style-analyzer. @juanjux proposed switching to jedi a long time ago.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants