OpenTSDB v2.0.0: java.lang.StackOverflowError: null #334
Ended up increasing the stack size in JVMARGS.
Likely need to rework the query code so that it isn't overflowing the stack. Users should never have to change the stack size.
What part of the query code causes the stack overflow? I'm also getting this exception and would like to work on a patch.
…that we can continue in a different thread. It has a thread pool for now but I'd like to eventually share that with Netty. This avoids the stack overflow problem in OpenTSDB#334 where the callback chain simply grew too long.
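The commit above only describes the approach, so here is a minimal sketch of the idea, assuming suasync 1.4. The class name StackHop, the pool, and the hop method are invented for illustration; they are not OpenTSDB's actual code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import com.stumbleupon.async.Deferred;

final class StackHop {
  // Illustrative pool; the commit mentions eventually sharing one with Netty.
  static final ExecutorService POOL = Executors.newFixedThreadPool(4);

  private StackHop() {}

  // Completes a fresh Deferred on a pool thread. Anything chained after
  // this Deferred resumes on that thread's shallow stack instead of
  // growing the caller's stack.
  static <T> Deferred<T> hop(final T result) {
    final Deferred<T> d = new Deferred<T>();
    POOL.execute(new Runnable() {
      public void run() {
        d.callback(result);
      }
    });
    return d;
  }
}
```

A callback that would otherwise return its result directly returns hop(result) instead. Because the returned Deferred is still pending, the chain pauses, and when the pool thread fires it, the remaining callbacks run on a fresh stack, which is what keeps the chain from overflowing.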
I happened to discuss this issue with @manolama today. I'm not sure I agree with the fix and the analysis being done here. Can you post a longer section of the stack trace, to understand why there was a …
Hello, I'm experiencing the same issue. The version is: OpenTSDB version [2.0.0] built from revision 14fd1b1 in a MINT state. Here is my stack:
… and so on.
Thanks, this helps. So we see that we have a long chain of Deferreds. Let me dig some more in the code...
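To make the failure mode concrete, here is a minimal, self-contained sketch that overflows the stack the same way, assuming suasync 1.4 on the classpath. It is not OpenTSDB's actual code path (there the links come from per-row and per-series callbacks), and the class name and link count are invented:

```java
import com.stumbleupon.async.Callback;
import com.stumbleupon.async.Deferred;

public class DeferredChainOverflow {
  public static void main(final String[] args) {
    final int LINKS = 100000;  // far more links than one thread's stack can hold

    // Build a chain of pending Deferreds, where each callback
    // synchronously completes the next Deferred in the chain.
    final Deferred<Object> head = new Deferred<Object>();
    Deferred<Object> prev = head;
    for (int i = 0; i < LINKS; i++) {
      final Deferred<Object> next = new Deferred<Object>();
      prev.addCallback(new Callback<Object, Object>() {
        public Object call(final Object arg) {
          next.callback(arg);  // recurses: runCallbacks -> doCall -> call -> ...
          return arg;
        }
      });
      prev = next;
    }

    // Firing the head resolves every link on this one thread's stack,
    // so a long enough chain throws java.lang.StackOverflowError.
    head.callback("go");
  }
}
```

Each link costs a few stack frames and nothing unwinds until the whole chain is done. That is the same depth-per-link growth that shows up as the repeating doCall/runCallbacks/handleContinuation frames in the traces below, where the real chain recurses through continuations rather than direct callback calls.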
I've fixed it by increasing the stack size in JVMARGS, as @dennismphil did.
Curiously, I upped my stack size and that didn't address the issue. Still getting a StackOverflow, with exactly 1024 Deferred entries in the chain.
Additional curiosity: bumping Xss didn't help me, but bumping VMThreadStackSize to 4096 did result in an unexpected change. It still threw a StackOverflow, but there were now 2048 Deferred entries in the StackOverflow error output instead of 1024.
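A back-of-envelope reading of that observation (my arithmetic, not a measurement): if doubling the thread stack doubles the visible chain length from 1024 to 2048 Deferred entries, each link is consuming a roughly fixed slice of stack. Taking the 4096 as KB, that is about 4096 KB / 2048 links, i.e. roughly 2 KB per link, which is plausibly a handful of runCallbacks/doCall/handleContinuation frames. In other words the depth is stack-bound rather than data-bound, which is why raising the stack size only moves the ceiling instead of removing it.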
Also running into this in production with 2.1.0 final. Any new info/workarounds?
I chatted with @manolama about this in IRC. My issue is that I'm making a query which fetches a tonne of rows, and it will always hit the stack overflow. For my case, he recommended I try out this patch: manolama@c591503. Once I get that built and give it a try, I'll provide an update.
I suspect this cluster might have an HBase issue, but it's hard to see from the OpenTSDB log itself. Running an fsck now, and we get a couple of the following per second: … The HBase cluster is busy, but not that busy.
Still investigating, but it seems that our issues were caused by some queries of certain metrics (potentially over invalid or future time ranges) which were failing, causing valid queries (maybe in the same RPC batch?) to hang forever too.
Hi, … Many thanks!
Hi, we're seeing:
Exception in thread "pool-7-thread-1" java.lang.OutOfMemoryError: Java heap space
In jhat we noticed a great many Callback objects. Please advise.
Hi all, from the OpenTSDB log: … Can someone help us?
I'm also having this issue, same logs as those given above. Running OpenTSDB 2.1 on CentOS 6 with an HDP 2.0 cluster. The HBase cluster looks completely fine.
Hello, do you have any news on this problem?
I finally changed the JVM args to work around this problem: JVMARGS ... -Xss16m ...
Already done for me (-Xss100m), and the problem is still present.
Found one case of this happening with 2.0 where too many time series in the output (e.g. 16k or more) will throw this SOFE: https://github.com/OpenTSDB/opentsdb/blob/master/src/tsd/HttpJsonSerializer.java#L842-L850
We are hitting this issue on 2.2. Could you please give an update on when this will be fixed? Thanks.
We are using collectd with a "tag" patch that allows adding tags to any metric. Our stack failure occurs when the tag value set is huge (30k values for one tag) and a query is requested for a day or two of data.
Same issue on v2.3.0:
2017-11-20 15:21:07,485 ERROR [AsyncHBase I/O Worker #9] RegionClient: Unexpected exception from downstream on [id: 0x8d650fd6, /10.108.2.18:46612 => /10.105.39.139:60020]
java.lang.StackOverflowError: null
at net.opentsdb.core.Span.seekRow(Span.java:365) ~[tsdb-2.3.0.jar:]
at net.opentsdb.core.Span.access$100(Span.java:36) ~[tsdb-2.3.0.jar:]
at net.opentsdb.core.Span$Iterator.seek(Span.java:444) ~[tsdb-2.3.0.jar:]
at net.opentsdb.core.AggregationIterator.<init>(AggregationIterator.java:363) ~[tsdb-2.3.0.jar:]
at net.opentsdb.core.AggregationIterator.create(AggregationIterator.java:325) ~[tsdb-2.3.0.jar:]
at net.opentsdb.core.SpanGroup.iterator(SpanGroup.java:487) ~[tsdb-2.3.0.jar:]
at net.opentsdb.core.SpanGroup.iterator(SpanGroup.java:54) ~[tsdb-2.3.0.jar:]
at net.opentsdb.tsd.HttpJsonSerializer$1DPsResolver$WriteToBuffer.call(HttpJsonSerializer.java:787) ~[tsdb-2.3.0.jar:]
at net.opentsdb.tsd.HttpJsonSerializer$1DPsResolver$WriteToBuffer.call(HttpJsonSerializer.java:671) ~[tsdb-2.3.0.jar:]
at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278) ~[async-1.4.0.jar:na]
at com.stumbleupon.async.Deferred.addCallbacks(Deferred.java:688) ~[async-1.4.0.jar:na]
at com.stumbleupon.async.Deferred.addCallback(Deferred.java:724) ~[async-1.4.0.jar:na]
at net.opentsdb.tsd.HttpJsonSerializer$1DPsResolver.call(HttpJsonSerializer.java:860) ~[tsdb-2.3.0.jar:]
at net.opentsdb.tsd.HttpJsonSerializer$1DPsResolver.call(HttpJsonSerializer.java:625) ~[tsdb-2.3.0.jar:]
at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278) ~[async-1.4.0.jar:na]
at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257) ~[async-1.4.0.jar:na]
at com.stumbleupon.async.Deferred.handleContinuation(Deferred.java:1313) ~[async-1.4.0.jar:na]
at com.stumbleupon.async.Deferred.doCall(Deferred.java:1284) ~[async-1.4.0.jar:na]
at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257) ~[async-1.4.0.jar:na]
at com.stumbleupon.async.Deferred.handleContinuation(Deferred.java:1313) ~[async-1.4.0.jar:na]
at com.stumbleupon.async.Deferred.doCall(Deferred.java:1284) ~[async-1.4.0.jar:na]
at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257) ~[async-1.4.0.jar:na]
at com.stumbleupon.async.Deferred.handleContinuation(Deferred.java:1313) ~[async-1.4.0.jar:na]
at com.stumbleupon.async.Deferred.doCall(Deferred.java:1284) ~[async-1.4.0.jar:na]
@opsun, seeing the same as you.
FWIW, I can confirm @todd-richmond's finding above (very high tagv cardinality, in my case almost 1 million): querying hours worked, but not days. I restructured the data using "shift to metric" (i.e. folding the high-cardinality tag value into the metric name rather than keeping it as a tag) as discussed here: http://opentsdb.net/docs/build/html/user_guide/writing/index.html#time-series-cardinality (losing the aggregation in this dimension is no issue for me) ... and it works lightning-fast again. HTH
Yeah, this is better in 3.x but still has an issue with the UID resolution if all of the UIDs are in cache. I'm reworking the pipeline one more time in 3.x and that'll clean it up.
Hi @manolama, I tried the latest version in 3.x and it still failed.
@manolama After adding the …
Using OpenTSDB version [2.0.0] (modified Const.java MAX_NUM_TAGS = 16, if that matters).
From OpenTSDB logs: …
OpenTSDB eventually becomes unresponsive until a service opentsdb restart.