don't unnecessary fetch all tokens when grabbing an interval of text … #2532

WalterCouto · 2019-04-18T00:13:46Z

…(match C# logic)

When an exception is thrown, it calls getText to get the text of the tokens invloved but the current C++ and Java implemention first fetches the rest of the tokens instead of just using the tokens it already has. C#'s version is correct (fill() is called only when asking for the entire stream's text).

…(match C# logic) When an exception is thrown, it calls getText to get the text of the tokens invloved but the current C++ and Java implemention first fetches the rest of the tokens instead of just using the tokens it already has. C#'s version is correct (fill() is called only when asking for the entire stream's text).

mike-lischke · 2019-12-09T07:56:23Z

@parrt This patch can be merged.

parrt · 2019-12-09T17:11:52Z

Hmm...i'm a bit nervous about changing getTest() (at least for Java). not sure what ramifications are for fetching text normally. maybe slower? maybe different?

mike-lischke · 2019-12-09T18:20:59Z

This is only about filling the token buffer and it mirrors what's already used in C#. The tests succeed so I think it should be save.

parrt · 2019-12-09T19:04:43Z

Doesn't the 2nd file affect the java target?

WalterCouto · 2019-12-09T22:10:37Z

Yes but the call to the overridden method handles the fetching of the tokens for the given interval so this call was duplicating logic. That overriden method always called fill() prior to this change which fetches all tokens instead of fetching only up to the stop index.

We caught this as a performance problem when the exception happened at the beginning of a file and it went to get the text of the token at fault it ended up tokenizing the entire file at that time which is completely unnecessary. I guess the C# implementors caught this issue and resolved it independently of the Java and C++ runtimes which both had this problem.

parrt · 2019-12-09T22:22:48Z

Hmm...yeah, I'm just too scared to change it. can you just override it?

parrt · 2019-12-09T23:58:20Z

@WalterCouto can you back out the Java change? thanks!

WalterCouto · 2019-12-10T00:56:07Z

I could back out of the java change attached to this change, but I don't agree that it should be reversed. The bug is a large performance hit on both Java and C++.

The java bug is what started the performance hit for us; we also use the C++ runtime and it had the same problem. We started using the C# runtime which doesn't have the bug. getText() with no params should "The text of all tokens in the stream." so it should be calling fill() which will fill the tokens to the end of the stream, not doing so is a bug and that is what Java and C++ had.

That bug was not noticed before because of the second bug which is the getText() that takes a range was calling fill() but according to that method, it will retrieve "The text of all tokens lying between the specified start and stop tokens" so it should not have fetched all the tokens to the end of the stream but only up to the stop index using sync()

So the "fear" should be leaving that bug as is, with our parser tests across multiple grammars with large files shows huge performance hit to stop after the first error as it had to tokenize the large file completely to get the token text the error occurred on before return back to us it failed early on. If you were waiting for the full parse, you wouldn't have noticed that early on you took the hit of tokenizing the whole file, but if you are using callbacks you will notice a wait as memory is allocated and lexing occurs for the entire file.

If you still feel like to leave this change out for this PR, I can but the original bug back in and raise an java only PR.

parrt · 2019-12-10T01:41:11Z

Well, if @sharwell has the time to approve, I'm ok with it but I'm too distracted to review such an important issue.

parrt · 2019-12-10T21:31:17Z

Ok, looks like Sam is busy. @WalterCouto thanks for your thoughtful response. Can you make separate PR for me to evaluate in future? thanks!

WalterCouto · 2019-12-11T05:36:56Z

@parrt only the C++ change is part of the PR request now

…(match C# logic) When an exception is thrown, it calls getText to get the text of the tokens invloved but the current Java implemention first fetches the rest of the tokens instead of just using the tokens it already has. C#'s version is correct (fill() is called only when asking for the entire stream's text). C++ runtime had the same bug but was fixed by antlr#2532

mike-lischke approved these changes Oct 14, 2019

View reviewed changes

parrt added the target:cpp label Dec 9, 2019

parrt added this to the 4.8 milestone Dec 9, 2019

Separate bug fix to Java runtimes

f650a0b

WalterCouto mentioned this pull request Dec 11, 2019

Java: don't unnecessary fetch all tokens when grabbing an interval of text (match C# logic) #2699

Open

parrt merged commit 8be6080 into antlr:master Dec 11, 2019

WalterCouto deleted the test2 branch December 12, 2019 18:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

don't unnecessary fetch all tokens when grabbing an interval of text … #2532

don't unnecessary fetch all tokens when grabbing an interval of text … #2532

WalterCouto commented Apr 18, 2019

mike-lischke commented Dec 9, 2019

parrt commented Dec 9, 2019

mike-lischke commented Dec 9, 2019 •

edited

Loading

parrt commented Dec 9, 2019

WalterCouto commented Dec 9, 2019

parrt commented Dec 9, 2019

parrt commented Dec 9, 2019

WalterCouto commented Dec 10, 2019

parrt commented Dec 10, 2019

parrt commented Dec 10, 2019

WalterCouto commented Dec 11, 2019 •

edited

Loading

don't unnecessary fetch all tokens when grabbing an interval of text … #2532

don't unnecessary fetch all tokens when grabbing an interval of text … #2532

Conversation

WalterCouto commented Apr 18, 2019

mike-lischke commented Dec 9, 2019

parrt commented Dec 9, 2019

mike-lischke commented Dec 9, 2019 • edited Loading

parrt commented Dec 9, 2019

WalterCouto commented Dec 9, 2019

parrt commented Dec 9, 2019

parrt commented Dec 9, 2019

WalterCouto commented Dec 10, 2019

parrt commented Dec 10, 2019

parrt commented Dec 10, 2019

WalterCouto commented Dec 11, 2019 • edited Loading

mike-lischke commented Dec 9, 2019 •

edited

Loading

WalterCouto commented Dec 11, 2019 •

edited

Loading