Added support for timing out long-running queries (disabled by default) #290

Closed
wants to merge 1 commit into from

2 participants

Jan Mangs Chris Larsen
Jan Mangs

Addresses feature request in #289

Problem Description:

We had a Ruby script running against our DB looking for metrics to clean up, issuing 8 concurrent requests to OpenTSDB. We hit a few metrics that have millions of data points, and OpenTSDB kept retrieving data from HBase even after a query had already been running for 500-1000 seconds.

The VM that this node was on ran out of memory, so random parts started throwing OOM errors without actually crashing the Deferred queries; we had 8 queries, each occupying about 400-500MB of JVM memory. In our case, we'd rather have these queries fail so we don't tie up all of our workers on queries that will never finish in a reasonable time.

Feature:
I added a configurable timeout property that errors out ScannerCB() after a certain amount of time.

The property tsd.query.timeout defaults to -1, which preserves the normal OpenTSDB behavior: ScannerCB() never times out.

If the time spent in HBase exceeds this timeout, an exception is thrown:

if (timeout >= 0 && hbase_time > timeout) {
    throw new InterruptedException("Query timeout exceeded!");
}
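
The guard can be isolated into a small predicate to make its edge cases explicit. This is a minimal sketch rather than the PR's code: the class name `ScanTimeoutGuard` and the method `isExceeded` are hypothetical, and it assumes, as in the snippet, that a negative timeout disables the check and that both values are in milliseconds.

```java
/** Hypothetical helper mirroring the guard in the snippet; not part of OpenTSDB. */
final class ScanTimeoutGuard {
  private ScanTimeoutGuard() {}

  /**
   * Returns true when the check is enabled (timeout >= 0) and the time
   * spent in HBase is strictly greater than the timeout.
   */
  static boolean isExceeded(final long timeoutMs, final long hbaseTimeMs) {
    return timeoutMs >= 0 && hbaseTimeMs > timeoutMs;
  }

  public static void main(final String[] args) {
    System.out.println(isExceeded(-1, 900_000));  // disabled: false
    System.out.println(isExceeded(1_000, 1_000)); // equal to the limit: false
    System.out.println(isExceeded(1_000, 1_001)); // over the limit: true
    System.out.println(isExceeded(0, 1));         // 0 is an active limit here: true
  }
}
```

Under this convention a timeout of 0 is still an active limit, so a scan is aborted as soon as any HBase time has been recorded.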

Hopefully InterruptedException is acceptable. Right now the timeout surfaces as "java.lang.RuntimeException: Shouldn't be here" because there's no specific catch for this exception, which makes sense to me since it shouldn't be a common occurrence unless you set the timeout to 10ms or something.

I'd consider this a useful feature for production databases. We have many different teams using these metrics, and someone could accidentally query something with millions of data points and eat up a lot of resources on a node. In my case, I ran into this problem while searching a wide date range for unused metrics (I was checking every single metric one by one).

Chris Larsen
Owner

Awesome, thanks! Would it make sense to use a default of 0 to disable it instead of -1?

Jan Mangs

Yea I agree. I can push a commit tomorrow to change it to 0 for disable (I originally had 0 as the disable and switched it to -1 later, hah)

Chris Larsen
Owner

Merged in 0bdb855. Note that this only handles long running scans whereas a lot of time is actually spent in the aggregation and serialization routines. We'll ultimately need timeouts there as well.
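
One way to extend the timeout beyond scans, sketched under assumptions, is a single deadline created when the query starts and checked at the beginning of each stage. `QueryDeadline`, `QueryTimeoutException`, and the stage names are hypothetical and do not come from OpenTSDB; a timeout of 0 disables the check, in line with the default discussed in this thread. The clock reading is passed in explicitly so the behavior is deterministic.

```java
/** Hypothetical unchecked exception so callers can catch timeouts specifically. */
final class QueryTimeoutException extends RuntimeException {
  QueryTimeoutException(final String message) {
    super(message);
  }
}

/** Hypothetical per-query deadline shared by scan, aggregation and serialization. */
final class QueryDeadline {
  private final long timeoutMs;  // 0 or negative disables the check
  private final long startNanos; // clock reading taken when the query started

  QueryDeadline(final long timeoutMs, final long startNanos) {
    this.timeoutMs = timeoutMs;
    this.startNanos = startNanos;
  }

  /** Throws if more than timeoutMs elapsed between the start and nowNanos. */
  void check(final String stage, final long nowNanos) {
    if (timeoutMs <= 0) {
      return; // timeout disabled
    }
    final long elapsedMs = (nowNanos - startNanos) / 1_000_000L;
    if (elapsedMs > timeoutMs) {
      throw new QueryTimeoutException(
          "Query timeout exceeded during " + stage + " after " + elapsedMs + " ms");
    }
  }

  public static void main(final String[] args) {
    // Simulated clock: the query starts at 0 ns with a 50 ms budget.
    final QueryDeadline deadline = new QueryDeadline(50, 0L);
    deadline.check("scan", 10_000_000L); // 10 ms elapsed: within budget
    try {
      deadline.check("aggregation", 60_000_000L); // 60 ms elapsed: over budget
    } catch (QueryTimeoutException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

A dedicated exception type also lets the caller report a timeout distinctly instead of falling through to a generic error path.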

Chris Larsen manolama closed this
Commits on Feb 10, 2015
  1. Jan Mangs

    Added support for timing out long-running queries (disabled by default)

    jan-mangs authored jmangs committed
    Conflicts:
    	src/utils/Config.java
Showing with 6 additions and 0 deletions.
  1. +5 −0 src/core/TsdbQuery.java
  2. +1 −0  src/utils/Config.java
5 src/core/TsdbQuery.java
@@ -367,6 +367,7 @@ private void findGroupBys(final Map<String, String> tags) {
       boolean seenAnnotation = false;
       int hbase_time = 0; // milliseconds.
       long starttime = System.nanoTime();
+      long timeout = tsdb.getConfig().getLong("tsd.query.timeout");
       /**
        * Starts the scanner and is called recursively to fetch the next set of
@@ -403,6 +404,10 @@ public Object call(final ArrayList<ArrayList<KeyValue>> rows)
             return null;
           }
+          if (timeout > 0 && hbase_time > timeout) {
+            throw new InterruptedException("Query timeout exceeded!");
+          }
+
           for (final ArrayList<KeyValue> row : rows) {
             final byte[] key = row.get(0).key();
             if (Bytes.memcmp(metric, key, 0, metric_width) != 0) {
1  src/utils/Config.java
@@ -457,6 +457,7 @@ protected void setDefaults() {
     default_map.put("tsd.http.request.cors_headers", "Authorization, "
       + "Content-Type, Accept, Origin, User-Agent, DNT, Cache-Control, "
       + "X-Mx-ReqToken, Keep-Alive, X-Requested-With, If-Modified-Since");
+    default_map.put("tsd.query.timeout", "0");
     for (Map.Entry<String, String> entry : default_map.entrySet()) {
       if (!properties.containsKey(entry.getKey()))
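
The truncated loop suggests that a default is applied only when the key has not already been set. A minimal, self-contained sketch of that fill-in pattern follows, assuming a plain `java.util.Properties` object; `TimeoutDefaults`, `withDefaults`, and `timeoutMs` are hypothetical names, and this is not OpenTSDB's `Config` class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical illustration of default fill-in for tsd.query.timeout. */
final class TimeoutDefaults {
  private TimeoutDefaults() {}

  /** Copies each default into props unless the key is already present. */
  static Properties withDefaults(final Properties props) {
    final Map<String, String> defaults = new HashMap<>();
    defaults.put("tsd.query.timeout", "0"); // 0 = timeout disabled
    for (final Map.Entry<String, String> entry : defaults.entrySet()) {
      if (!props.containsKey(entry.getKey())) {
        props.setProperty(entry.getKey(), entry.getValue());
      }
    }
    return props;
  }

  /** Reads the timeout, which the scanner compares against milliseconds. */
  static long timeoutMs(final Properties props) {
    return Long.parseLong(props.getProperty("tsd.query.timeout"));
  }

  public static void main(final String[] args) {
    // No explicit setting: the default of 0 leaves the timeout disabled.
    System.out.println(timeoutMs(withDefaults(new Properties())));

    // Explicit setting: the configured value is kept.
    final Properties custom = new Properties();
    custom.setProperty("tsd.query.timeout", "60000");
    System.out.println(timeoutMs(withDefaults(custom)));
  }
}
```

With the merged check `timeout > 0 && hbase_time > timeout`, the default of 0 never triggers, while a value such as 60000 would abort scans that spend more than 60 seconds in HBase.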