Memory Leak in PredictionContextCache #499
After a bit more research, this issue appears to be caused by two variables that are declared static in the generated Java parser. Changing these to instance variables (and moving the initializer of "decisionToDFA" to the constructor) has solved my issue.
|
That change would result in an enormous (negative) performance impact when parsing large numbers of files. If you don't want to use the shared cache, you can give each parser instance its own after construction:

```java
MyParser parser = new MyParser(tokens);
DFA[] decisionToDFA = parser.getInterpreter().decisionToDFA;
parser.setInterpreter(new ParserATNSimulator(
    parser, parser.getATN(), decisionToDFA, new PredictionContextCache()));
``` |
Sam, should we make a function that wipes the DFA cache? Otherwise it grows forever in a long-running server. It'll simply adapt to new input. |
I'm planning to modify the caching mechanisms in a later release. Changing it now would just increase the chances that the future change will break users' applications. Considering that a workaround already exists for server applications and the like, I don't think there's any need to do anything else at this point. |
Given that I can provide my own implementation of the cache, would it cause a problem if items disappear from the cache? In other words, what happens when get() returns null? Can I write a SoftReference cache? |
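The soft-reference idea being asked about can be sketched with only the JDK, independent of ANTLR's actual `PredictionContextCache` API. The class and method names below are hypothetical, purely to illustrate the semantics in question: `get()` may return `null` either because the key was never added or because the GC cleared the entry under memory pressure.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a soft-valued cache: entries can vanish after GC
// pressure, which is exactly the "what happens when get() returns null?"
// question above.
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();

    public synchronized void put(K key, V value) {
        map.put(key, new SoftReference<>(value));
    }

    // May return null: missing key, or the soft reference was cleared.
    public synchronized V get(K key) {
        SoftReference<V> ref = map.get(key);
        return ref == null ? null : ref.get();
    }
}
```

A caller would have to treat a `null` result as "rebuild and re-add", which is the tolerance question posed in this comment.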
I tried your suggested workaround; however, I'm still getting an out-of-memory exception. I haven't been able to track down exactly why, but I think it has something to do with the DFA array ("decisionToDFA"). The problem with the PredictionContextCache goes away when using my own, but something else is still running free, causing the DFA array to grow. I'm going to have to stick with making those variables instance rather than static for the time being. I may lose performance, but I won't crash. Please consider re-opening this issue unless there's something else with your workaround that I'm missing. |
The antlr team has plans to enhance the caching mechanism in the future. In the meantime, this workaround cleans the cache for each instantiation. There may be performance implications, but that's certainly better than crashing. antlr/antlr4#499
@danielsun1106 It would be difficult to use a cache eviction policy for […] The use of […] |
IMO, the increased memory usage from using SoftReference is more acceptable than an OutOfMemoryError :) @sharwell |
@danielsun1106 There can easily be hundreds of thousands or even millions of […] The […] For reference, are you using the reference release or my fork for your evaluation? The most substantial work in my fork is on memory reduction. It's not uncommon to see 100:1 ratios in memory requirements between the two for large grammars. In many parsing scenarios the memory requirements are so small it doesn't make much difference, but it sounds like you may fall into the group of known exceptions. |
@sharwell When we try to apply the two-stage parsing strategy, the reference release hangs but the optimized release is quite efficient, which is very impressive to me. It's a pity that the optimized release has to deserialize the ATN string by calling the following code to avoid OutOfMemoryError. As a result, the performance of the optimized release (with the two-stage parsing strategy) is almost the same as the reference release (without the two-stage parsing strategy). I wish the optimized release would provide constructors like the following in the future; then we could create the decisionToDFA array from the deserialized ATN and manage the cache ourselves:

```java
public ParserATNSimulator(Parser parser, ATN atn, DFA[] decisionToDFA, PredictionContextCache sharedContextCache)
public LexerATNSimulator(Lexer recog, ATN atn, DFA[] decisionToDFA, PredictionContextCache sharedContextCache)
```

As for the usage of memory, how about wrapping the DFA with a DfaWrapper? The count of DFA instances is not very many (about …):

```java
import java.lang.ref.SoftReference;
import java.util.List;

import org.antlr.v4.runtime.atn.ATN;
import org.antlr.v4.runtime.dfa.DFA;
import org.antlr.v4.runtime.dfa.DFAState;

/**
 * The rationale of DfaWrapper is to use SoftReference to avoid the DFA cache
 * growing forever. If the DFA instance is GCed, recreate one when it is needed.
 */
class DfaWrapper extends DFA {
    private volatile SoftReference<DFA> dfaSR;
    private final ATN atn;
    private final int decision;

    public DfaWrapper(ATN atn, int decision) {
        super(atn.getDecisionState(decision), decision);
        this.atn = atn;
        this.decision = decision;
        this.dfaSR = new SoftReference<>(new DFA(atn.getDecisionState(decision), decision));
    }

    public DFA getDFA() {
        DFA dfa = dfaSR.get();
        if (dfa != null) return dfa;
        synchronized (this) {
            dfa = dfaSR.get();
            if (dfa == null) {
                // the soft reference was cleared by the GC; rebuild the DFA
                dfa = new DFA(atn.getDecisionState(decision), decision);
                dfaSR = new SoftReference<>(dfa);
            }
            return dfa;
        }
    }

    // delegate DfaWrapper's methods to the wrapped DFA's methods
    public List<DFAState> getStates() {
        return this.getDFA().getStates();
    }
    // ...
}
```

```java
// in the generated GroovyParser
// create the _decisionToDFA
static {
    _decisionToDFA = new DfaWrapper[_ATN.getNumberOfDecisions()];
    for (int i = 0; i < _ATN.getNumberOfDecisions(); i++) {
        _decisionToDFA[i] = new DfaWrapper(_ATN, i);
    }
}
```

P.S. The brand new groovy parser's repository is hosted at https://github.com/danielsun1106/groovy-parser |
@parrt @sharwell

```java
public class XXXParser extends Parser {
    static { RuntimeMetaData.checkVersion("4.5.3", RuntimeMetaData.VERSION); }
    protected static final DFA[] _decisionToDFA;
    protected static final PredictionContextCache _sharedContextCache =
        new PredictionContextCache();
    ...
}
```

I think switching to WeakHashMap (with a dummy value) or Guava's WeakInterner would solve the memory issue, although they can introduce a little memory overhead because of the […] Thoughts? |
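The WeakHashMap-with-dummy-value idea mentioned above can be sketched with plain JDK classes. The `WeakInterner` name here is hypothetical (it is not Guava's class): the map holds its keys weakly, so a context that nothing else references becomes collectable, at the cost of one extra reference object per entry. Storing a `WeakReference` to the key as the value avoids a strong value-to-key edge that would defeat the WeakHashMap.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical sketch of "WeakHashMap with dummy value" interning.
class WeakInterner<T> {
    private final Map<T, WeakReference<T>> map = new WeakHashMap<>();

    // Returns the canonical instance for values that compare equal.
    public synchronized T intern(T sample) {
        WeakReference<T> ref = map.get(sample);
        T canonical = (ref != null) ? ref.get() : null;
        if (canonical != null) {
            return canonical;
        }
        map.put(sample, new WeakReference<>(sample));
        return sample;
    }
}
```

Whether ANTLR's DFA states and prediction contexts could be interned this way without a performance hit is the open question in this thread; the sketch only shows the memory semantics being proposed.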
@hsyuan I don't think we can really make these DFA states weak refs, and it would be a huge change, which I can't consider at this time. |
BTW, I'm assuming you know to clear the cache? |
Thank you for your quick response, @parrt.
I am not sure if I understand your question correctly. Clearing the cache is the last thing I'd like to do, because that may impact the performance.
By 'huge change', do you mean it is a breaking change, not because of the amount of code to be changed? I see […] |
I'd graph the memory growth and clear the cache if it gets close to too big. I'm not sure why it's growing forever, but it could with a huge grammar (SQL?) and a long-running server. "Huge change" means possibly breaking, possibly slower, and too much for me to consider doing. |
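The "clear it when it gets close to too big" policy suggested here can be sketched generically. The class below is hypothetical; applied to ANTLR it would translate to installing a fresh interpreter (new DFA array plus new PredictionContextCache) once the cached state crosses a size threshold.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: wholesale-clear the cache when it crosses a size
// threshold; the cache then re-adapts to whatever input comes next, just as
// a reset DFA would.
class ThresholdCache<K, V> {
    private final int maxEntries;
    private final Map<K, V> map = new HashMap<>();

    ThresholdCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    V computeIfAbsent(K key, Function<? super K, ? extends V> compute) {
        if (map.size() >= maxEntries) {
            map.clear(); // crude eviction: drop everything at once
        }
        return map.computeIfAbsent(key, compute);
    }

    int size() {
        return map.size();
    }
}
```

The trade-off matches the discussion above: occasional re-warming cost after a clear, in exchange for a hard upper bound on memory.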
Correct. Tons of huge SQL queries and long running server. |
Looks like Presto faces similar issue: trinodb/trino#3186 |
We have long-running servers with large queries too.
Should we consider changing the DFA array to a different structure? And allowing a flexible cache/structure for DFA states, so that we could plug in an external cache (like Redis) to help? |
@xuanziranhan Typically when the DFA is getting very large, one or more rules requires long lookahead to make a decision. ANTLR only caches the minimum amount of information necessary to make decisions for the actual inputs seen since the last time the DFA was cleared, so restructuring the rules to reduce lookahead in those cases could also reduce the DFA size. Another option you might try is using my optimized fork of ANTLR. It contains some logic to reduce the size of the cached DFA for some common cases we've seen. It often doesn't matter in practice, but for the edge cases where a grammar happens to fall into a specific pattern which is very bad in the reference version of ANTLR but closely matches the optimizations in my fork, you could see an order of magnitude working set reduction or better. |
After using an antlr-generated parser to process several files, the JVM eventually throws an out-of-memory exception. Analyzing the heap shows the PredictionContextCache containing a HashMap with over 1 million items. There needs to be some (hopefully thread-safe) way to clear this cache between runs.
https://github.com/antlr/antlr4/blob/master/runtime/Java/src/org/antlr/v4/runtime/atn/PredictionContextCache.java#L40-66
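The "thread-safe way to clear this cache between runs" requested in the issue could look like the following JDK-only sketch. The `add`/`get` pair mirrors the shape of PredictionContextCache's interning methods, but the class itself is hypothetical, not part of the ANTLR runtime.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of an explicitly clearable, thread-safe interning cache.
class ClearableContextCache<T> {
    private final ConcurrentHashMap<T, T> cache = new ConcurrentHashMap<>();

    // Returns the canonical instance, adding the value if absent.
    public T add(T value) {
        T existing = cache.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    public T get(T value) {
        return cache.get(value);
    }

    // Called between runs, e.g. after each batch of files.
    public void clear() {
        cache.clear();
    }

    public int size() {
        return cache.size();
    }
}
```

Clearing is safe only between parses: contexts interned before a clear remain valid objects, but a concurrent parse would stop sharing them, so the clear should be coordinated with parser usage.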