Returned data inconsistent with provided "opentsdb.interval" option #4
Comments
adr.zip: my source data file. All values are in millisecond resolution.
Hi Alexey,
I'll take a look. In the test cases I tried to check exactly this kind of scenario, but managing time zones can sometimes be tricky.
BTW, adding additional tests definitely helps.
I'll let you know
David
I've discovered a pattern: the returned data interval starts at the 'from' boundary rounded up to the next hour, except when 'from' is set exactly to 00mm00ss000ms of an hour. For example, if I set 'from' to 12:00:00, everything is OK; if I set it to 12:00:01, I get values starting at the 13:00:00 timestamp. The 'to' boundary is handled correctly. Similarly, on my sample dataset I can request data from 13:00:00 to 13:00:01, but not in the range 13:00:01 to 13:00:02.
In fact, googling around the OpenTSDB documentation, I understood that the row key is generated from the hour, so all the values for that hour end up in the same row; in other words, the data are organised per hour.
So it shouldn't be a defect in my implementation, but we should understand how to formulate the right query. You could try running a similar query through the daemon.
If you run the daemon on the same HBase instance, don't forget to configure it with the right number of salting buckets.
David
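To make the hourly row layout concrete, here is a minimal Python sketch of how a 'from' boundary relates to the row that holds it. It assumes timestamps in epoch seconds, hourly rows whose base time is the timestamp rounded down to the hour, and a half-open interval [from, to). The function names are hypothetical and this is not the code of spark-opentsdb or of the referenced pull request; it only illustrates why a scan that starts at the next hour would skip data that sits in the row containing 'from'.

```python
from typing import List, Tuple

HOUR_SECONDS = 3600  # assumption: one storage row spans one hour


def row_base_time(ts_seconds: int) -> int:
    """Round a timestamp down to the start of its hour (the row's base time)."""
    return ts_seconds - (ts_seconds % HOUR_SECONDS)


def rows_to_scan(from_seconds: int, to_seconds: int) -> List[int]:
    """Base times of every hourly row that can hold points in [from, to)."""
    first = row_base_time(from_seconds)
    last = row_base_time(to_seconds - 1)
    return list(range(first, last + HOUR_SECONDS, HOUR_SECONDS))


def select_points(points: List[Tuple[int, float]],
                  from_seconds: int,
                  to_seconds: int) -> List[Tuple[int, float]]:
    """After reading whole rows, keep only points inside the requested interval."""
    return [(ts, v) for ts, v in points if from_seconds <= ts < to_seconds]
```

For instance, with 'from' at 12:00:01 local time (1486026001 in epoch seconds), `row_base_time` gives 1486026000, the 12:00:00 row. Starting the scan at that row and then filtering with `select_points` keeps the 12:00:01 value, whereas starting at the next hour boundary would first return data at 13:00:00.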
Thanks for pointing out the cause of the problem. It turns out the issue can be fixed quite easily on the library side. See the referenced pull request.
I merged your pull request into the spark-2.x branch.
I'm trying to get a specific range of data with spark-opentsdb by providing the "opentsdb.interval" option, but the returned data doesn't exactly match the interval.
The code I use to insert data:
adr.csv contains data from 2017-02-02T09:20:00.000Z (12:20:00 in my time zone, GMT+3) to 2017-02-02T10:20:00.000Z (13:20:00 in my time zone).
The code I use to read data:
Results are:
I'm trying to read data starting from around 12:02 local time, but the results start from 13:00. If I vary the from/to values I get different ranges, but they look more or less random. Omitting the interval option returns all the data.
I run the code on Spark 2.1.0, Hadoop 2.6.0-cdh5.10.0, HBase 1.2.0-cdh5.10.0 and OpenTSDB 2.3.0, in yarn-client mode in a Zeppelin notebook. Running in local mode in the Spark shell gives the same results.
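Since the report mixes UTC timestamps with GMT+3 wall-clock times, a small Python sketch may help when cross-checking boundaries. It converts the ISO-8601 UTC timestamps quoted above to epoch milliseconds and back to local wall-clock time. The helper names are hypothetical, and the fixed +3 offset is taken from the report rather than from any library setting.

```python
from datetime import datetime, timedelta, timezone

UTC_PLUS_3 = timezone(timedelta(hours=3))  # offset stated in the report (GMT+3)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(iso_utc: str) -> int:
    """Parse a UTC timestamp like 2017-02-02T09:20:00.000Z into epoch milliseconds."""
    parsed = datetime.strptime(iso_utc, "%Y-%m-%dT%H:%M:%S.%fZ")
    parsed = parsed.replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def local_wall_clock(epoch_millis: int) -> str:
    """Format epoch milliseconds as HH:MM:SS in the GMT+3 zone."""
    moment = EPOCH + timedelta(milliseconds=epoch_millis)
    return moment.astimezone(UTC_PLUS_3).strftime("%H:%M:%S")
```

With these helpers, `to_epoch_millis("2017-02-02T09:20:00.000Z")` gives 1486027200000, and `local_wall_clock(1486027200000)` gives "12:20:00", matching the first data point described above.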