
OpenTSDB v2.0.0: java.lang.StackOverflowError: null #334

Open · dennismphil opened this issue May 12, 2014 · 29 comments


dennismphil commented May 12, 2014

Using OpenTSDB version [2.0.0] (Const.java modified with MAX_NUM_TAGS = 16, if that matters)

From OpenTSDB logs:

10:52:52.111 INFO  [TsdbQuery.call] - TsdbQuery(start_time=1084632741867, end_time=1398916800000, metric=[0, 0, 1] (usage), tags={}, rate=false, aggregator=sum, group_bys=(app [0, 0, 2], user_id [0, 0, 6], )) matched 2120235 rows in 494634 spans in 9464ms
10:52:55.847 ERROR [RegionClient.exceptionCaught] - Unexpected exception from downstream on [id: 0x7f739d1e, /127.0.0.1:37253 => /127.0.0.1:60020]
java.lang.StackOverflowError: null
    at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278) ~[suasync-1.4.0.jar:fe17b98]
...
...

OpenTSDB eventually becomes unresponsive until a `service opentsdb restart`.

dennismphil (Author) commented May 14, 2014

Ended up increasing the stack size via JVMARGS in /usr/share/opentsdb/bin/tsdb.
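For reference, this workaround amounts to adding an -Xss option to the JVMARGS line in the startup script. A hypothetical excerpt (the exact line varies by package and version):

```shell
# Hypothetical excerpt from /usr/share/opentsdb/bin/tsdb; the actual
# contents vary by package. Adding -Xss raises the per-thread stack
# size (16 MB here, versus a JVM default of well under 1 MB).
JVMARGS="-Xss16m -enableassertions -enablesystemassertions"
```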

manolama (Member) commented Jun 3, 2014

Likely need to re-work the query code so that it isn't overflowing the stack. Users should never have to change the stack size.

manolama added the bug label Jun 3, 2014

also (Contributor) commented Jul 23, 2014

What part of the query code causes the stack overflow? I'm also getting this exception, and would like to work on a patch.

manolama added a commit to manolama/opentsdb that referenced this issue Dec 8, 2014

Modify the TsdbQuery scanner to stop at 1024 calls to .nextRows() so that we can continue in a different thread. It has a thread pool for now but I'd like to eventually share that with Netty. This avoids the stack overflow problem in OpenTSDB#334 where the callback chain simply grew too long.
tsuna (Member) commented Feb 13, 2015

I happened to discuss this issue with @manolama today, I'm not sure I agree with the fix and the analysis being done here. Can you post a longer section of the stack trace, to understand why there was a StackOverflowError?

scicco commented Feb 13, 2015

Hello, I'm experiencing the same issue. The version is:

OpenTSDB version [2.0.0] built from revision 14fd1b1 in a MINT state
Built on Wed Mar 05 18:55:53 GMT+100 2014 by ..

Here is my stack trace:

2015-02-13 14:30:32,706 ERROR [New I/O worker #32] RegionClient: Unexpected exception from downstream on [id: 0x282632ab, /192.168.0.113:37836 => /192.168.0.113:60020]
java.lang.StackOverflowError: null
    at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257) ~[suasync-1.4.0.jar:fe17b98]
    at com.stumbleupon.async.Deferred.access$300(Deferred.java:430) ~[suasync-1.4.0.jar:fe17b98]
    at com.stumbleupon.async.Deferred$Continue.call(Deferred.java:1366) ~[suasync-1.4.0.jar:fe17b98]
    at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278) ~[suasync-1.4.0.jar:fe17b98]
    at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257) ~[suasync-1.4.0.jar:fe17b98]
    at com.stumbleupon.async.Deferred.access$300(Deferred.java:430) ~[suasync-1.4.0.jar:fe17b98]
    at com.stumbleupon.async.Deferred$Continue.call(Deferred.java:1366) ~[suasync-1.4.0.jar:fe17b98]
    at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278) ~[suasync-1.4.0.jar:fe17b98]
    [... the same four Deferred frames repeat for the remainder of the trace]

and so on

tsuna (Member) commented Feb 13, 2015

Thanks, this helps. So we see that we have a long chain of Deferreds strung together and they all seem to be firing immediately. This means it's very likely the bug is in AsyncHBase, and the right fix is probably there, not in OpenTSDB.

Let me dig some more in the code...
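The failure mode described here, a long chain of Deferreds all firing synchronously, can be reproduced in miniature. This is only a sketch, not the real suasync API: each callback completion invokes the next via a direct recursive call instead of a loop or trampoline, so every link in the chain holds a stack frame and a long enough chain throws StackOverflowError.

```java
// Minimal sketch of the failure mode (NOT the real suasync API): if each
// Deferred delivers its result to the next callback with a direct
// recursive call, a long enough chain exhausts the thread stack.
public class CallbackChainDemo {

    // Simulates firing a chain of `remaining` callbacks via direct recursion.
    static void fire(int remaining) {
        if (remaining > 0) {
            fire(remaining - 1); // one stack frame per callback in the chain
        }
    }

    // Returns true if firing a chain of the given length overflowed the stack.
    static boolean overflows(int chainLength) {
        try {
            fire(chainLength);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("chain of 100:         overflows = " + overflows(100));
        System.out.println("chain of 100 million: overflows = " + overflows(100_000_000));
    }
}
```

Raising -Xss only moves the threshold; a trampoline (running continuations in a loop, or handing off to another thread as manolama's patch does) removes the recursion entirely.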

scicco commented Feb 24, 2015

I've fixed it by increasing JVMARGS as @dennismphil did.

alienth (Contributor) commented Apr 22, 2015

Curiously, I upped my stack size and that didn't address the issue. I'm still getting a StackOverflowError, with exactly 1024 Deferred entries in the chain.

alienth (Contributor) commented May 15, 2015

Additional curiosity: bumping -Xss didn't help me, but bumping VMThreadStackSize to 4096 did result in an unexpected change. It still threw a StackOverflowError, but there are now 2048 Deferred entries in the error output instead of 1024.

alberts commented May 21, 2015

Also running into this in production with 2.1.0 final. Any new info/workarounds?

alienth (Contributor) commented May 21, 2015

I chatted with @manolama about this in IRC. My issue is that I'm making a query which is fetching a tonne of rows and it will always hit the stackoverflow. For my case, he recommended I try out this patch: manolama@c591503

Once I get that built and give it a try I'll provide an update.

alberts commented May 21, 2015

I suspect this cluster might have an HBase issue, but it's hard to see from the OpenTSDB log itself.

Running an fsck now, and we get a couple of the following per second.

The HBase cluster is busy, but not that busy.

2015-05-21 22:54:43,061 WARN  [New I/O worker #5] Scanner: RegionInfo(table="tsdb", region_name="tsdb,\x01\x86\xFDT\xEDd\xE0\x00\x00\x01\x00\x00\x01\x00\x00\x02\x00\x05\x9E\x00\x00\x03\x00\x00\xE4\x00\x00\x05\x00\x01w\x00\x00\x1B\x00\x01o,1426021282127.d50ad79cc9dc959bf4b329f62b88b19d.", stop_key=[1, -114, 46, 84, -6, 63, -128, 0, 0, 1, 0, 0, 1, 0, 0, 2, 0, 7, -45, 0, 0, 3, 0, 4, 99, 0, 0, 24, 0, 0, -79]) pretends to not know Scanner(table="tsdb", start_key=[1, -117, 125, 84, -13, 111, -64, 0, 0, 23, 0, 0, 100, 0, 0, 100, 0, -43, 13], stop_key="\x01\x94z", columns={"t"}, populate_blockcache=true, max_num_rows=128, max_num_kvs=4096, region=null, filter=null, scanner_id=0x000000000000016E).  I will retry to open a scanner but this is typically because you've been holding the scanner open and idle for too long (possibly due to a long GC pause on your side or in the RegionServer)
org.hbase.async.UnknownScannerException: org.apache.hadoop.hbase.UnknownScannerException: Name: 33134, already closed?
        at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3166)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30808)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2029)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
        at java.lang.Thread.run(Thread.java:745)

Caused by RPC: GetNextRowsRequest(scanner_id=0x000000000000016E, max_num_rows=128, region=null, attempt=0)
        at org.hbase.async.UnknownScannerException.make(UnknownScannerException.java:60) ~[asynchbase-1.6.0.jar:na]
        at org.hbase.async.UnknownScannerException.make(UnknownScannerException.java:32) ~[asynchbase-1.6.0.jar:na]
        at org.hbase.async.RegionClient.makeException(RegionClient.java:1448) [asynchbase-1.6.0.jar:na]
        at org.hbase.async.RegionClient.decodeException(RegionClient.java:1468) [asynchbase-1.6.0.jar:na]
        at org.hbase.async.RegionClient.decode(RegionClient.java:1299) [asynchbase-1.6.0.jar:na]
        at org.hbase.async.RegionClient.decode(RegionClient.java:89) [asynchbase-1.6.0.jar:na]
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:500) [netty-3.9.4.Final.jar:na]
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435) [netty-3.9.4.Final.jar:na]
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) [netty-3.9.4.Final.jar:na]
        at org.hbase.async.RegionClient.handleUpstream(RegionClient.java:1082) [asynchbase-1.6.0.jar:na]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.9.4.Final.jar:na]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) [netty-3.9.4.Final.jar:na]
        at org.hbase.async.HBaseClient$RegionClientPipeline.sendUpstream(HBaseClient.java:2677) [asynchbase-1.6.0.jar:na]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) [netty-3.9.4.Final.jar:na]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) [netty-3.9.4.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) [netty-3.9.4.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) [netty-3.9.4.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318) [netty-3.9.4.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) [netty-3.9.4.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) [netty-3.9.4.Final.jar:na]
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.9.4.Final.jar:na]
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [netty-3.9.4.Final.jar:na]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_75]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
alberts commented May 23, 2015

Still investigating, but it seems that our issues were caused by some queries of certain metrics (potentially over invalid or future time ranges) which were failing, causing valid queries (maybe in the same RPC batch?) to hang forever too.

manolama added a commit to manolama/opentsdb that referenced this issue May 28, 2015

Modify the TsdbQuery scanner to stop at 1024 calls to .nextRows() so that we can continue in a different thread. It has a thread pool for now but I'd like to eventually share that with Netty. This avoids the stack overflow problem in OpenTSDB#334 where the callback chain simply grew too long.

liorsav commented Jun 2, 2015

Hi,
We are working with v2.1.0 and getting the same StackOverflowError.
Could you please advise on the recommended solution or workaround? Is there a patch that can be installed?

Many Thanks,
Lior

liorsav commented Jun 8, 2015

Hi,
After applying manolama's patch (manolama/opentsdb@c591503), the freezes stopped; however, we started getting an OOM error.

Exception in thread "pool-7-thread-1" java.lang.OutOfMemoryError: Java heap space

Using jhat, we noticed a very large number of Callback objects.

Please advise,
Thanks,
Lior

GiuVi commented Jun 12, 2015

Hi all,
we are working with a 4-node Hadoop cluster and a metric with 5 tags: domain, node (cardinality = 33,000), group, counter (cardinality = 10), subCounter (cardinality = 3).
10,000 data points per second are written, each about 100 bytes in size.
That is a production of 33,000 × 10 × 3 ≈ 1,000,000 data points per minute.

From OpenTSDB log:
2015-06-11 17:18:34,215 ERROR [New I/O worker #34] RegionClient: Unexpected exception from downstream on [id: 0xd72ae84b, /163.162.107.244:52916 => /163.162.107.240:60020]
java.lang.StackOverflowError: null
at com.stumbleupon.async.Deferred.access$100(Deferred.java:430) ~[async-1.4.0.jar:na]
at com.stumbleupon.async.Deferred$Continue.call(Deferred.java:1358) ~[async-1.4.0.jar:na]

Can someone help us?
Thanks,
Giulia

fbobobo commented Aug 14, 2015

I'm also having this issue, with the same logs as those given above. Running OpenTSDB 2.1 on CentOS 6 with an HDP 2.0 cluster. The HBase cluster looks completely fine.

fbobobo commented Sep 15, 2015

Hello, is there any news on this problem?

xicabin commented Sep 15, 2015

I finally changed the JVM args to work around this problem:

JVMARGS ... -Xss16m ...

fbobobo commented Sep 15, 2015

Already done for me (-Xss100m), and the problem is still present.
We will do a regular daily rolling restart; it's somewhat ugly, but we have no other choice for the moment.

manolama (Member) commented Apr 7, 2016

Found one case of this happening with 2.0 where too many time series in the output (i.e. 16k or more) will throw this StackOverflowError: https://github.com/OpenTSDB/opentsdb/blob/master/src/tsd/HttpJsonSerializer.java#L842-L850

ntirupattur commented Jul 6, 2016

We are hitting this issue on 2.2. Could you please give an update on when this will be fixed? Thanks.

todd-richmond commented Jul 7, 2016

We are using collectd with a "tag" patch that allows adding tags to any metric. Our stack failure occurs in a case where the value set for one tag is huge (30k values) and a query requests a day or two of data.

opsun (Contributor) commented Nov 20, 2017

Same issue on v2.3.0.

2017-11-20 15:21:07,485 ERROR [AsyncHBase I/O Worker #9] RegionClient: Unexpected exception from downstream on [id: 0x8d650fd6, /10.108.2.18:46612 => /10.105.39.139:60020]
java.lang.StackOverflowError: null
        at net.opentsdb.core.Span.seekRow(Span.java:365) ~[tsdb-2.3.0.jar:]
        at net.opentsdb.core.Span.access$100(Span.java:36) ~[tsdb-2.3.0.jar:]
        at net.opentsdb.core.Span$Iterator.seek(Span.java:444) ~[tsdb-2.3.0.jar:]
        at net.opentsdb.core.AggregationIterator.<init>(AggregationIterator.java:363) ~[tsdb-2.3.0.jar:]
        at net.opentsdb.core.AggregationIterator.create(AggregationIterator.java:325) ~[tsdb-2.3.0.jar:]
        at net.opentsdb.core.SpanGroup.iterator(SpanGroup.java:487) ~[tsdb-2.3.0.jar:]
        at net.opentsdb.core.SpanGroup.iterator(SpanGroup.java:54) ~[tsdb-2.3.0.jar:]
        at net.opentsdb.tsd.HttpJsonSerializer$1DPsResolver$WriteToBuffer.call(HttpJsonSerializer.java:787) ~[tsdb-2.3.0.jar:]
        at net.opentsdb.tsd.HttpJsonSerializer$1DPsResolver$WriteToBuffer.call(HttpJsonSerializer.java:671) ~[tsdb-2.3.0.jar:]
        at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278) ~[async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.addCallbacks(Deferred.java:688) ~[async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.addCallback(Deferred.java:724) ~[async-1.4.0.jar:na]
        at net.opentsdb.tsd.HttpJsonSerializer$1DPsResolver.call(HttpJsonSerializer.java:860) ~[tsdb-2.3.0.jar:]
        at net.opentsdb.tsd.HttpJsonSerializer$1DPsResolver.call(HttpJsonSerializer.java:625) ~[tsdb-2.3.0.jar:]
        at com.stumbleupon.async.Deferred.doCall(Deferred.java:1278) ~[async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257) ~[async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.handleContinuation(Deferred.java:1313) ~[async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.doCall(Deferred.java:1284) ~[async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257) ~[async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.handleContinuation(Deferred.java:1313) ~[async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.doCall(Deferred.java:1284) ~[async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1257) ~[async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.handleContinuation(Deferred.java:1313) ~[async-1.4.0.jar:na]
        at com.stumbleupon.async.Deferred.doCall(Deferred.java:1284) ~[async-1.4.0.jar:na]
yangzj commented Mar 2, 2018

@opsun — same issue here.

chroth7 commented Jan 29, 2019

FWIW, I can confirm @todd-richmond's finding above (very high tagv cardinality, in my case almost 1 million): querying hours worked, but not days.

I restructured the data using "Shift to Metric" as discussed here: http://opentsdb.net/docs/build/html/user_guide/writing/index.html#time-series-cardinality

(losing the aggregation in this dimension is no issue for me)

... and it works lightning-fast again.

HTH
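The "shift to metric" restructuring mentioned above can be sketched as follows. This is a hedged illustration; the helper name and the dot-joining rule are assumptions, not the exact procedure from the OpenTSDB docs.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of the "shift to metric" restructuring: a high-cardinality
// tag value is folded into the metric name, so each series gets its own
// metric (and row-key prefix) instead of sharing one metric with huge
// tag-value fan-out. Names here are illustrative.
public class ShiftToMetric {

    // Removes the given tag and appends its value to the metric name.
    static String shiftToMetric(String metric, String tagKey, Map<String, String> tags) {
        String value = tags.remove(tagKey); // drop the high-cardinality tag
        return value == null ? metric : metric + "." + value;
    }

    public static void main(String[] args) {
        Map<String, String> tags = new HashMap<>();
        tags.put("user_id", "42");
        tags.put("app", "web");
        // before: usage{user_id=42, app=web}; after: usage.42{app=web}
        System.out.println(shiftToMetric("usage", "user_id", tags) + " " + tags);
    }
}
```

With the high-cardinality tag folded into the metric name, a query for one value no longer scans every row under the shared metric; the trade-off, as noted above, is losing aggregation across that dimension.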

manolama (Member) commented Feb 1, 2019

Yeah this is better in 3.x but still has an issue with the UID resolution if all of the UIDs are in cache. I'm re-working the pipeline one more time in 3.x and that'll clean it up.

asdf2014 commented Apr 12, 2019

Hi @manolama, I tried the latest version of 3.x and it still failed.

asdf2014 commented Apr 12, 2019

@manolama After adding the -Xss16m option, it works! However, the same JVM arguments are used on OpenTSDB 2.x and it still fails… 😅
