
untokenize returns all tokens up to a given one #4

Closed
jrmoserbaltimore opened this issue Apr 3, 2021 · 1 comment

Comments

@jrmoserbaltimore

The following code fragment, used in a context where tokens is a list of tokens:

t = tokens[4]
print(token_utils.untokenize([t]))

will, for some reason, print the source corresponding to tokens[0:4].

It is unclear to me precisely how this works: untokenize can recreate the source from the full list tokens, yet passing it a list containing only one token from tokens untokenizes all elements of tokens up to that point. When iterating over tokens, untokenize also seems to emit nothing if given tokens[n] where n hasn't been reached yet in the iteration.

I have no idea how this thing behaves. It's doing nothing logical.
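Here is a minimal script showing what I see (I am assuming token_utils.tokenize is the intended way to produce the token list):

import token_utils

source = "a = b + c * d"
tokens = token_utils.tokenize(source)

t = tokens[4]                       # a single token from the middle of the line
print(token_utils.untokenize([t]))  # I expected just t's own text, but this
                                    # prints the source up to that token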

@aroberge
Owner

aroberge commented Apr 3, 2021

token_utils is designed to (1) recreate exactly the original source from a list of tokens, and (2) make it easy to substitute the string content of one or more tokens, recreating a transformed source in which the spacing between tokens is preserved exactly as in the original.
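For example, here is a sketch of the substitution use case (tokenize, untokenize, and the mutable string attribute are the names I mean here):

import token_utils

source = "if x == 1 :   print( 'hello' )"
tokens = token_utils.tokenize(source)

for tok in tokens:
    if tok.string == "==":
        tok.string = "!="  # substitute the content of a single token

# The transformed source keeps the original (unusual) spacing intact.
print(token_utils.untokenize(tokens))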

Single tokens are objects that carry information including the entire content of the line where they came from. The untokenize function makes use of this content to know where to insert the string content of a given token, thus preserving the spacing between tokens.
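To illustrate (a small sketch; string and line are the attribute names I am referring to):

import token_utils

tokens = token_utils.tokenize("a = 1")
tok = tokens[2]          # the NUMBER token for "1"
print(repr(tok.string))  # '1'      -- the token's own text
print(repr(tok.line))    # 'a = 1'  -- the entire line it came from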

For token_utils, untokenizing and printing a single token amounts to simply doing print(token.string).
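In your example, that would simply be:

t = tokens[4]
print(t.string)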

Alternatively, to untokenize the single token at index 4, as you do, an overcomplicated way of doing it would be:

new_tokens = []
for index, tok in enumerate(tokens):
    if index == 4:
        new_tokens.append(tok)  # keep only the token we care about
    else:
        new_tokens.append("")   # an empty string stands in for every other token
print(token_utils.untokenize(new_tokens))

However, instead of using token_utils, Python's own tokenize module would likely be a more appropriate tool for this kind of operation.
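For instance, something along these lines (standard library only; tokenize.untokenize accepts (type, string) pairs when positional information is not needed):

import io
import tokenize

source = "a = b + c * d"
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

t = tokens[4]
print(t.string)  # a single token's own text

# Reconstruct source from selected tokens, ignoring positional information:
print(tokenize.untokenize([(t.type, t.string)]))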

@aroberge aroberge closed this as completed Apr 3, 2021