Skip to content
This repository has been archived by the owner on Mar 8, 2020. It is now read-only.

Node positions not being according with specs #30

Closed
dpordomingo opened this issue Jun 27, 2017 · 10 comments
Closed

Node positions not being according with specs #30

dpordomingo opened this issue Jun 27, 2017 · 10 comments
Assignees
Labels

Comments

@dpordomingo
Copy link
Member

dpordomingo commented Jun 27, 2017

It is not retrieved start_position.offset field nor end_position as documented by https://doc.bblf.sh/uast/specification.html
All token positions on the same line has the same start_position

[...] it is guaranteed that nodes in a UAST either have no position attached or they have a position with valid offset, line and col. [...]
End position, if present in a token node, is the position of the last character of the token in the original source code.

pasted image at 2017_06_27 08_58 am

example:

import sys
sys.stdout.write("Hello world!\n")

client: client-python
It is retrieved the same positions for second line sys and stdout Nodes

start_position {
    line: 2
    col: 1
}
@abeaumont
Copy link
Contributor

abeaumont commented Jun 27, 2017

As commented in bblfsh/java-driver#27, it's probably a serialization problem.

@abeaumont abeaumont self-assigned this Jun 27, 2017
@abeaumont abeaumont added the bug label Jun 27, 2017
@juanjux
Copy link
Contributor

juanjux commented Jun 27, 2017

Unfortunately, in this case it's given like that from the Python native AST of attributed calls:

from ast import parse
root = parse("sys.stdout.write()")
stdout_node = root.body[0].value.func.value
sys_node = root.body[0].value.func.value.value
# wont fail:
assert(stdout_node.col_offset == sys_node.col_offset)

So the issue is really a Python one.

One possible solution for this case would be to extend the synchronized tokenizer that I use for adding the comments and noops to the UAST to also update the correct positions. But that would be a workaround for an external issue, so I'm changing this to feature.

@juanjux juanjux added enhancement and removed bug labels Jun 27, 2017
@juanjux
Copy link
Contributor

juanjux commented Jun 27, 2017

Ok, so it's a bug.

@juanjux juanjux added bug and removed enhancement labels Jun 27, 2017
@juanjux
Copy link
Contributor

juanjux commented Jun 27, 2017

Looks like their concept of col_offset for attributes is a little different than ours:

https://bugs.python.org/issue21295

ast.Attribute node actually means "the atribute of something", ie. the node includes this "something" as subnode

So the best solution will probably be to extend the sinchronized tokenizer for this (and maybe we can also get the end_position too that way, trough I'm not sure if that's possible without reimplementing a full python parser).

@smola
Copy link
Member

smola commented Jun 28, 2017

See my comment here for clarification on end position, as well as possible fix for missing offset (FillOffsetFromLineCol).
bblfsh/java-driver#27 (comment)

@juanjux
Copy link
Contributor

juanjux commented Jun 28, 2017

@smola, will try it, thanks.

@abeaumont abeaumont assigned juanjux and unassigned abeaumont Jun 28, 2017
@juanjux
Copy link
Contributor

juanjux commented Jul 3, 2017

Starting to work on this. It'll probably take a while so I'll keep this subtask list updated:

  • Use the SDK, if possible, to calculate the offset.
  • Fix newly discovered issue Some nodes have the Column location at 0 #35 so the above really works.
  • Modify the synchronized tokenizer to also add (or correct) the startposition of nodes.

@juanjux
Copy link
Contributor

juanjux commented Jul 4, 2017

Since after the checking the specs endposition is not mandatory I'll open a feature request for it and remove it from this bug task list.

@juanjux
Copy link
Contributor

juanjux commented Jul 6, 2017

PR #37 should fix two of the three problems related with this issue (offsets at 0 and some columns at 0 too). Next PR should fix the remaining issue (questionable positions given by the Python native AST module).

@juanjux
Copy link
Contributor

juanjux commented Jul 12, 2017

PR bblfsh/python-pydetector/pull/9 should fix the third problem when merged so this could be closed. I've also seen an option for implementing #33 for most, if not all, Python nodes; I'll work on that soon.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

4 participants