[feature] Include the information about full tokens #291

vmarkovtsev · 2018-08-14T09:40:13Z

Strings, comments, etc. have their Token set to the inner value of the token. For example, in Python, "hello" has the Token hello (without quotes). This is all good and logical.

However, we discard information about the real token (quote characters, comment characters, etc.), and that information is needed to reproduce the original source code from a UAST. I have two proposals:

1. Add "FullToken" for the nodes that need it.
2. Add "TokenPrefix" and "TokenSuffix".
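A minimal sketch of how the two proposals relate, assuming a hypothetical `TokenInfo` record (not part of any bblfsh API): if a node stores the prefix and suffix, the full token is simply their concatenation with the inner token, so "FullToken" can be derived from "TokenPrefix"/"TokenSuffix" without storing it separately.

```python
from dataclasses import dataclass


@dataclass
class TokenInfo:
    """Hypothetical per-node token fields illustrating both proposals."""
    token: str        # inner value, e.g. hello for the Python literal "hello"
    prefix: str = ""  # proposal 2: text before the inner value, e.g. '"'
    suffix: str = ""  # proposal 2: text after the inner value, e.g. '"'

    @property
    def full_token(self) -> str:
        # Proposal 1 ("FullToken") can be computed from proposal 2.
        return self.prefix + self.token + self.suffix


lit = TokenInfo(token="hello", prefix='"', suffix='"')
comment = TokenInfo(token=" a comment", prefix="#")
print(lit.full_token)      # "hello"
print(comment.full_token)  # # a comment
```

The reverse direction is harder: going from a stored FullToken back to prefix and suffix means locating the inner token inside it, which is ambiguous in some cases.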


juanjux · 2018-08-14T09:57:10Z

Comments should have the comment characters, prefix and suffix in the semantic UAST "Comment" object. For strings, unfortunately, the native AST doesn't provide the string type (e.g. which quotes were used), at least in the Python and Ruby drivers, so this won't be possible for all drivers unless we parse the source code ourselves.

I'll leave this open just in case we find a workable solution in the future.

vmarkovtsev · 2018-08-14T10:07:42Z

The current workaround is simple: I compare file_contents[start_position.offset:end_position.offset] with Token and record the prefixes and suffixes.
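The workaround can be sketched as below. The function name `split_full_token` is hypothetical, and it assumes `start`/`end` are character offsets into a `str` (if the UAST reports byte offsets, slice the encoded bytes instead). It also assumes the inner token appears verbatim in the slice, which fails for string literals containing escape sequences, and it takes the first occurrence, which can be ambiguous (e.g. Python `b"b"`).

```python
def split_full_token(file_contents: str, start: int, end: int, token: str):
    """Return (prefix, suffix) around token inside file_contents[start:end].

    Assumes character offsets and a token that appears verbatim in the
    slice; the first occurrence is used.
    """
    raw = file_contents[start:end]
    idx = raw.find(token)
    if idx < 0:
        # e.g. the literal "a\nb" in source vs. a Token with a real newline
        raise ValueError(f"token {token!r} not found verbatim in {raw!r}")
    return raw[:idx], raw[idx + len(token):]


src = 'x = "hello"  # greet\n'
print(split_full_token(src, 4, 11, "hello"))    # ('"', '"')
print(split_full_token(src, 13, 20, " greet"))  # ('#', '')
```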

dennwc · 2018-08-14T10:15:48Z

Token as a concept won't work in the long run, so I think we should provide a helper that selects the source file content based on node positions, as @vmarkovtsev mentioned.

For example, what is the token of a do ... while loop? This will get more and more complex once we start working with semantic concepts for classes.
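A minimal sketch of such a helper, assuming a UAST-v2-like dict layout where `node["@pos"]["start"]["offset"]` and `node["@pos"]["end"]["offset"]` are byte offsets into the file. Both this layout and the `"DoWhile"` type name are assumptions for illustration. The point is that it works for any node with positions, including statements like do ... while that have no single token.

```python
from typing import Any, Mapping


def node_text(source: bytes, node: Mapping[str, Any]) -> str:
    """Return the exact source text covered by a node (hypothetical helper).

    Assumes node["@pos"]["start"/"end"]["offset"] are byte offsets.
    """
    pos = node["@pos"]
    start, end = pos["start"]["offset"], pos["end"]["offset"]
    return source[start:end].decode("utf-8")


src = "do { i++; } while (i < 3);".encode("utf-8")
loop = {"@type": "DoWhile",
        "@pos": {"start": {"offset": 0}, "end": {"offset": len(src)}}}
print(node_text(src, loop))  # do { i++; } while (i < 3);
```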

juanjux · 2018-08-14T10:18:23Z

They work pretty well... for identifiers and literals. For statements and reserved words, as you showed, they're problematic (the same happens with "from x import y" in Python, which is a single node with children).

Maybe we should make a distinction between a token and a representation.

dennwc · 2018-08-14T10:22:54Z

A token is something that exists in the source code. Egor has mentioned a few times that he expects tokens to be valid for all node types, which cannot be the case with the current model.

I would rather go with semantic concepts, so a Comment has text, prefix, etc., and a String (literal) has a value and quotes. Tokens can be provided as positional info. Since UAST v2 allows more than two positional fields, we can define a few more to represent the start/end positions of the different keywords in a statement.
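A rough sketch of that model, using hypothetical `String` and `DoWhile` classes (not the actual UAST v2 types): the string literal keeps its quotes as a semantic field, and the statement carries extra named positional fields, one `(start, end)` pair per keyword, so each keyword's text can be recovered from the source.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class String:
    """Hypothetical semantic string literal: value plus the quotes used."""
    value: str
    quote: str = '"'

    def render(self) -> str:
        # Naive: does not re-escape quote characters inside value.
        return self.quote + self.value + self.quote


@dataclass
class DoWhile:
    """Hypothetical statement node with extra named positional fields."""
    start: int
    end: int
    # keyword name -> (start, end) character offsets in the source
    keywords: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def keyword_text(self, source: str, name: str) -> str:
        s, e = self.keywords[name]
        return source[s:e]


src = "do { i++; } while (i < 3);"
loop = DoWhile(0, len(src), {"do": (0, 2), "while": (12, 17)})
print(loop.keyword_text(src, "while"))  # while
print(String("hello", "'").render())    # 'hello'
```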

juanjux · 2018-08-14T10:27:14Z

Even with semantic objects, it would be nice to keep the concept, either as a single unified field name or as some kind of field metadata, so that XPath queries don't have to match every semantic object type to retrieve a different field from each, which is what happens now, as @smacker said the other day.
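The "field metadata" idea could look roughly like this: a hypothetical table records which field plays the token role for each semantic type, and one accessor reads it, so callers don't special-case every type. The type and field names below mirror UAST v2 semantic nodes (`Identifier.Name`, `String.Value`, `Comment.Text`) but are used here only for illustration.

```python
from typing import Any, Mapping, Optional

# Hypothetical field metadata: which field holds the "token" of each type.
TOKEN_FIELD = {
    "uast:Identifier": "Name",
    "uast:String": "Value",
    "uast:Comment": "Text",
}


def token_of(node: Mapping[str, Any]) -> Optional[str]:
    """Return the token-like field of a node, or None if its type has none."""
    name = TOKEN_FIELD.get(node.get("@type", ""))
    return node.get(name) if name else None


nodes = [
    {"@type": "uast:Identifier", "Name": "foo"},
    {"@type": "uast:String", "Value": "hello"},
    {"@type": "uast:Comment", "Text": " note"},
    {"@type": "uast:Block"},
]
print([token_of(n) for n in nodes])  # ['foo', 'hello', ' note', None]
```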

juanjux added the enhancement label Aug 14, 2018

dennwc self-assigned this Aug 14, 2018

dennwc mentioned this issue Oct 17, 2018: Proposal: remove @token in favor of positional info #318 (Open)

EgorBu mentioned this issue Jan 3, 2019: Umbrella issue for ML team bblfsh/bblfshd#231 (Closed, 36 tasks)

bzz mentioned this issue Apr 9, 2019: Fix other escape strings from JS bblfsh/javascript-driver#81 (Merged)