Tweak to parser skipped_idx + PEP8 cleanup #435
One part of the cleanup mentioned in PR #419. I implemented the
def _recombine_skipped_queue(tokens, skipped_idxs):
    """
    >>> tokens = ["foo", " ", "bar", " ", "19June2000", "baz"]
    >>> skipped_idxs = set([0, 1, 2, 5])
    >>> _recombine_skipped_queue(tokens, skipped_idxs)
    ['foo bar', 'baz']
    """
    # This groups consecutive values.
    # Note: relies on skipped_idxs being iterated in ascending order.
    skipped_tokens = []
    idx_queue = []
    for idx in skipped_idxs:
        if len(idx_queue) and idx - 1 != idx_queue[-1]:
            skipped_tokens.append(''.join(map(tokens.__getitem__, idx_queue)))
            idx_queue = []
        idx_queue.append(idx)
    if idx_queue:
        skipped_tokens.append(''.join(map(tokens.__getitem__, idx_queue)))
    return skipped_tokens
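For anyone who wants to try the grouping logic in isolation, here is a minimal standalone sketch. It is not the code in this PR: the name `recombine_skipped_sorted` is hypothetical, and it assumes the indices should be processed in ascending order, so it calls `sorted()` explicitly instead of depending on how a set happens to iterate.

```python
def recombine_skipped_sorted(tokens, skipped_idxs):
    """Join tokens at runs of consecutive skipped indices into single strings.

    Hypothetical helper for illustration; sorts the indices explicitly.
    """
    skipped_tokens = []
    idx_queue = []  # indices of the run currently being collected
    for idx in sorted(skipped_idxs):
        # A gap between this index and the previous one ends the current run.
        if idx_queue and idx - 1 != idx_queue[-1]:
            skipped_tokens.append(''.join(tokens[i] for i in idx_queue))
            idx_queue = []
        idx_queue.append(idx)
    # Flush the final run, if any.
    if idx_queue:
        skipped_tokens.append(''.join(tokens[i] for i in idx_queue))
    return skipped_tokens


tokens = ["foo", " ", "bar", " ", "19June2000", "baz"]
print(recombine_skipped_sorted(tokens, {0, 1, 2, 5}))
```

Because of the explicit sort, this version accepts any iterable of indices (list, set, tuple), at the cost of an O(n log n) sort on each call.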
I think we should keep
Although this is not part of the Python spec, it appears to be implemented in Python 2.7 and 3.6 as well as pypy2 and pypy3. I tested this with a loop that randomly generates token strings and skipped indices; here are some profiling results:
The code to run this can be found here.