Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

don't unnecessary fetch all tokens when grabbing an interval of text … #2532

Merged
merged 2 commits into from
Dec 11, 2019

Conversation

WalterCouto
Copy link
Contributor

…(match C# logic)

When an exception is thrown, it calls getText to get the text of the tokens invloved but the current C++ and Java implemention first fetches the rest of the tokens instead of just using the tokens it already has. C#'s version is correct (fill() is called only when asking for the entire stream's text).

…(match C# logic)

When an exception is thrown, it calls getText to get the text of the tokens invloved but the current C++ and Java implemention first fetches the rest of the tokens instead of  just using the tokens it already has.  C#'s version is correct (fill() is called only when asking for the entire stream's text).
@mike-lischke
Copy link
Member

@parrt This patch can be merged.

@parrt
Copy link
Member

parrt commented Dec 9, 2019

Hmm...i'm a bit nervous about changing getTest() (at least for Java). not sure what ramifications are for fetching text normally. maybe slower? maybe different?

@mike-lischke
Copy link
Member

mike-lischke commented Dec 9, 2019

This is only about filling the token buffer and it mirrors what's already used in C#. The tests succeed so I think it should be save.

@parrt
Copy link
Member

parrt commented Dec 9, 2019

Doesn't the 2nd file affect the java target?

Screen Shot 2019-12-09 at 11 04 09 AM

@WalterCouto
Copy link
Contributor Author

Yes but the call to the overridden method handles the fetching of the tokens for the given interval so this call was duplicating logic. That overriden method always called fill() prior to this change which fetches all tokens instead of fetching only up to the stop index.

We caught this as a performance problem when the exception happened at the beginning of a file and it went to get the text of the token at fault it ended up tokenizing the entire file at that time which is completely unnecessary. I guess the C# implementors caught this issue and resolved it independently of the Java and C++ runtimes which both had this problem.

@parrt
Copy link
Member

parrt commented Dec 9, 2019

Hmm...yeah, I'm just too scared to change it. can you just override it?

@parrt
Copy link
Member

parrt commented Dec 9, 2019

@WalterCouto can you back out the Java change? thanks!

@parrt parrt added this to the 4.8 milestone Dec 9, 2019
@WalterCouto
Copy link
Contributor Author

I could back out of the java change attached to this change, but I don't agree that it should be reversed. The bug is a large performance hit on both Java and C++.

The java bug is what started the performance hit for us; we also use the C++ runtime and it had the same problem. We started using the C# runtime which doesn't have the bug. getText() with no params should "The text of all tokens in the stream." so it should be calling fill() which will fill the tokens to the end of the stream, not doing so is a bug and that is what Java and C++ had.

That bug was not noticed before because of the second bug which is the getText() that takes a range was calling fill() but according to that method, it will retrieve "The text of all tokens lying between the specified start and stop tokens" so it should not have fetched all the tokens to the end of the stream but only up to the stop index using sync()

So the "fear" should be leaving that bug as is, with our parser tests across multiple grammars with large files shows huge performance hit to stop after the first error as it had to tokenize the large file completely to get the token text the error occurred on before return back to us it failed early on. If you were waiting for the full parse, you wouldn't have noticed that early on you took the hit of tokenizing the whole file, but if you are using callbacks you will notice a wait as memory is allocated and lexing occurs for the entire file.

If you still feel like to leave this change out for this PR, I can but the original bug back in and raise an java only PR.

@parrt
Copy link
Member

parrt commented Dec 10, 2019

Well, if @sharwell has the time to approve, I'm ok with it but I'm too distracted to review such an important issue.

@parrt
Copy link
Member

parrt commented Dec 10, 2019

Ok, looks like Sam is busy. @WalterCouto thanks for your thoughtful response. Can you make separate PR for me to evaluate in future? thanks!

@WalterCouto
Copy link
Contributor Author

WalterCouto commented Dec 11, 2019

@parrt only the C++ change is part of the PR request now

WalterCouto added a commit to WalterCouto/antlr4 that referenced this pull request Dec 11, 2019
…(match C# logic)

When an exception is thrown, it calls getText to get the text of the tokens invloved but the current Java implemention first fetches the rest of the tokens instead of just using the tokens it already has. C#'s version is correct (fill() is called only when asking for the entire stream's text).  C++ runtime had the same bug but was fixed by antlr#2532
@parrt parrt merged commit 8be6080 into antlr:master Dec 11, 2019
@WalterCouto WalterCouto deleted the test2 branch December 12, 2019 18:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants