Align downsampling intervals to the Gregorian calendar. #657

@cpdevoto
Contributor

This feature builds on the skeleton provided by @moarcaccio in
Pull Request #548, adding in all of the functionality requested by
@manolama in his Pull Request comments, and resolving a number of
defects which rendered the Pull Request unusable. After waiting
several months for @moarcaccio to complete the proposed
feature, we decided to move forward with our own Pull Request.

This feature supports the alignment of downsampling intervals to the
Gregorian calendar based on four different time categories:

  • DAILY: The start time of each interval is computed as the start of the
    day in which the first data point occurs, based on a specified time zone
    (or the default JVM time zone, if no time zone has been specified).
    The end time of each interval is computed as the end of the day in which
    the first data point occurs. For instance, if the specified time zone
    is UTC, and the timestamp of the first data point is 2016-01-05T05:32:00Z,
    then the start of the interval will be computed as 2016-01-05T00:00:00.000Z,
    while the end of the interval will be computed as 2016-01-05T23:59:59.999Z.
  • WEEKLY: The start time of each interval is computed as the start of the
    week in which the first data point occurs, based on a specified time zone
    (or the default JVM time zone, if no time zone has been specified).
    The end time of each interval is computed as the end of the week in which
    the first data point occurs. Weeks are considered to begin on Sundays (in
    the future, it might be a good idea to allow for variations based on a
    configuration setting). For instance, if the specified time zone is UTC,
    and the timestamp of the first data point is 2016-01-05T05:32:00Z, then
    the start of the interval will be computed as 2016-01-03T00:00:00.000Z,
    while the end of the interval will be computed as 2016-01-09T23:59:59.999Z.
  • MONTHLY: The start time of each interval is computed as the start of the
    month in which the first data point occurs, based on a specified time zone
    (or the default JVM time zone, if no time zone has been specified).
    The end time of each interval is computed as the end of the month in which
    the first data point occurs. For instance, if the specified time zone
    is UTC, and the timestamp of the first data point is 2016-01-05T05:32:00Z,
    then the start of the interval will be computed as 2016-01-01T00:00:00.000Z,
    while the end of the interval will be computed as 2016-01-31T23:59:59.999Z.
  • YEARLY: The start time of each interval is computed as the start of the
    year in which the first data point occurs, based on a specified time zone
    (or the default JVM time zone, if no time zone has been specified).
    The end time of each interval is computed as the end of the year in which
    the first data point occurs. For instance, if the specified time zone
    is UTC, and the timestamp of the first data point is 2016-01-05T05:32:00Z,
    then the start of the interval will be computed as 2016-01-01T00:00:00.000Z,
    while the end of the interval will be computed as 2016-12-31T23:59:59.999Z.
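The four categories above can be illustrated with a minimal sketch built on java.util.Calendar. The class, enum, and method names here are hypothetical and are not taken from the patch; the sketch only assumes the rules stated in the list (Sunday as the first day of the week, and an end time one millisecond before the next boundary).

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch; names are illustrative and not taken from the patch.
class CalendarAlignmentSketch {
  enum Category { DAILY, WEEKLY, MONTHLY, YEARLY }

  /** Start of the day, week, month, or year that contains timestampMs. */
  static long intervalStart(long timestampMs, Category category, TimeZone tz) {
    Calendar cal = new GregorianCalendar(tz, Locale.US);
    cal.setTimeInMillis(timestampMs);
    // Truncate to local midnight in the requested time zone.
    cal.set(Calendar.HOUR_OF_DAY, 0);
    cal.set(Calendar.MINUTE, 0);
    cal.set(Calendar.SECOND, 0);
    cal.set(Calendar.MILLISECOND, 0);
    switch (category) {
      case WEEKLY:
        // Weeks begin on Sunday: step back to the most recent Sunday.
        cal.add(Calendar.DAY_OF_MONTH,
            Calendar.SUNDAY - cal.get(Calendar.DAY_OF_WEEK));
        break;
      case MONTHLY:
        cal.set(Calendar.DAY_OF_MONTH, 1);
        break;
      case YEARLY:
        cal.set(Calendar.DAY_OF_YEAR, 1);
        break;
      default:
        break; // DAILY: midnight is already the boundary.
    }
    return cal.getTimeInMillis();
  }

  /** Last millisecond of the interval that contains timestampMs. */
  static long intervalEnd(long timestampMs, Category category, TimeZone tz) {
    Calendar cal = new GregorianCalendar(tz, Locale.US);
    cal.setTimeInMillis(intervalStart(timestampMs, category, tz));
    // Advance to the start of the next interval, then back off by 1 ms.
    switch (category) {
      case DAILY:   cal.add(Calendar.DAY_OF_MONTH, 1); break;
      case WEEKLY:  cal.add(Calendar.DAY_OF_MONTH, 7); break;
      case MONTHLY: cal.add(Calendar.MONTH, 1);        break;
      case YEARLY:  cal.add(Calendar.YEAR, 1);         break;
    }
    return cal.getTimeInMillis() - 1;
  }

  public static void main(String[] args) {
    TimeZone utc = TimeZone.getTimeZone("UTC");
    long ts = 1451971920000L; // 2016-01-05T05:32:00Z
    for (Category c : Category.values()) {
      System.out.println(c + ": "
          + java.time.Instant.ofEpochMilli(intervalStart(ts, c, utc)) + " .. "
          + java.time.Instant.ofEpochMilli(intervalEnd(ts, c, utc)));
    }
  }
}
```

Passing a different TimeZone moves each boundary to local midnight in that zone, which is why the time zone matters for every category, not just DAILY.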

This feature also allows for the alignment of intervals that are multiples
of one year, one month, one week, or one day. In cases where a given
interval is a multiple of more than one time category, the larger time
category will be used. For instance, an interval of 24 months will be
interpreted as an interval of two years, and will be aligned to the calendar
accordingly. As such, if the specified time zone is UTC,
and the timestamp of the first data point is 2016-03-05T05:32:00Z, then
the start of the interval will be computed as 2016-01-01T00:00:00.000Z,
while the end of the interval will be computed as 2017-12-31T23:59:59.999Z.
This is in keeping with the principle of least astonishment.
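As a rough illustration of the "larger category wins" rule, the sketch below promotes an interval to the larger unit before aligning it. Everything in it is an assumption made for illustration: the class and method names are hypothetical, intervals are modeled as a count plus a unit, and the promotion rules (months divisible by 12 become years, days divisible by 7 become weeks) are one reading of the paragraph above rather than the patch's actual logic.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch; names and promotion rules are illustrative assumptions.
class MultiUnitIntervalSketch {
  enum Unit { DAY, WEEK, MONTH, YEAR }

  static final class Interval {
    final int count;
    final Unit unit;
    Interval(int count, Unit unit) { this.count = count; this.unit = unit; }
  }

  /** Promote to the larger category when the count divides evenly. */
  static Interval normalize(int count, Unit unit) {
    if (unit == Unit.MONTH && count % 12 == 0) {
      return new Interval(count / 12, Unit.YEAR);
    }
    if (unit == Unit.DAY && count % 7 == 0) {
      return new Interval(count / 7, Unit.WEEK);
    }
    return new Interval(count, unit);
  }

  /** Returns {start, end} in epoch milliseconds; count must be positive. */
  static long[] bounds(long firstTimestampMs, int count, Unit unit, TimeZone tz) {
    Interval iv = normalize(count, unit);
    Calendar cal = new GregorianCalendar(tz, Locale.US);
    cal.setTimeInMillis(firstTimestampMs);
    // Truncate to local midnight, then to the start of the (promoted) unit.
    cal.set(Calendar.HOUR_OF_DAY, 0);
    cal.set(Calendar.MINUTE, 0);
    cal.set(Calendar.SECOND, 0);
    cal.set(Calendar.MILLISECOND, 0);
    switch (iv.unit) {
      case WEEK:
        cal.add(Calendar.DAY_OF_MONTH,
            Calendar.SUNDAY - cal.get(Calendar.DAY_OF_WEEK));
        break;
      case MONTH:
        cal.set(Calendar.DAY_OF_MONTH, 1);
        break;
      case YEAR:
        cal.set(Calendar.DAY_OF_YEAR, 1);
        break;
      default:
        break; // DAY: midnight is already the boundary.
    }
    long start = cal.getTimeInMillis();
    // Advance by the whole multiple, then back off by 1 ms for the end.
    switch (iv.unit) {
      case DAY:   cal.add(Calendar.DAY_OF_MONTH, iv.count);     break;
      case WEEK:  cal.add(Calendar.DAY_OF_MONTH, 7 * iv.count); break;
      case MONTH: cal.add(Calendar.MONTH, iv.count);            break;
      case YEAR:  cal.add(Calendar.YEAR, iv.count);             break;
    }
    return new long[] { start, cal.getTimeInMillis() - 1 };
  }
}
```

Calling bounds with 24 and Unit.MONTH for a first data point in March 2016 aligns to the start of the year rather than the start of March, matching the example above.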

To specify the time zone for a given HTTP query, include a query string
parameter named "tz" with a value equal to a JVM time zone id (e.g. "UTC").
If a time zone is not included in the query string, the default JVM time zone
will be used.
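One detail worth keeping in mind when handling the "tz" value: java.util.TimeZone.getTimeZone silently returns GMT for ids it does not recognize. The sketch below shows one way the parameter could be resolved with an explicit check. The class and method names are hypothetical, and rejecting unknown ids is an assumption for illustration; this thread does not say how the patch treats them.

```java
import java.util.TimeZone;

// Hypothetical sketch; the patch's actual handling of unknown ids is not
// described in this thread.
class TimeZoneParamSketch {
  /** Resolves the "tz" query parameter, falling back to the JVM default. */
  static TimeZone resolve(String tzParam) {
    if (tzParam == null || tzParam.isEmpty()) {
      return TimeZone.getDefault();
    }
    TimeZone tz = TimeZone.getTimeZone(tzParam);
    // getTimeZone returns GMT for unrecognized ids, so detect that case.
    if ("GMT".equals(tz.getID()) && !"GMT".equals(tzParam)) {
      throw new IllegalArgumentException("Unknown time zone id: " + tzParam);
    }
    return tz;
  }

  public static void main(String[] args) {
    System.out.println(resolve("UTC").getID());
    System.out.println(resolve(null).getID()); // JVM default time zone
  }
}
```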

To specify that a given HTTP query should use the calendar alignment feature
for downsampling, include a query string parameter named "use_calendar" with
a value of "true". You can stipulate that all HTTP queries should use the
calendar alignment feature by including a "tsd.query.downsample.use_calendar"
configuration setting within the opentsdb.conf file and by setting its value
to "true" (the default value is "false"). This config file setting can be
overridden on a per-query basis by including the "use_calendar" parameter in
the query string as specified above.
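The precedence described above (query parameter first, then the config file setting, then the default of "false") can be sketched as follows. The class and method names are hypothetical, and the config is modeled as a plain Map rather than OpenTSDB's own configuration class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the config is modeled as a Map for illustration only.
class UseCalendarSettingSketch {
  static final String CONFIG_KEY = "tsd.query.downsample.use_calendar";

  /**
   * queryParam is the raw "use_calendar" query string value, or null when
   * the parameter is absent from the request.
   */
  static boolean useCalendar(String queryParam, Map<String, String> config) {
    if (queryParam != null) {
      // A per-query value always overrides the config file setting.
      return Boolean.parseBoolean(queryParam);
    }
    // Fall back to the config file; the default is "false".
    return Boolean.parseBoolean(config.getOrDefault(CONFIG_KEY, "false"));
  }

  public static void main(String[] args) {
    Map<String, String> config = new HashMap<>();
    config.put(CONFIG_KEY, "true");
    System.out.println(useCalendar(null, config));    // config applies
    System.out.println(useCalendar("false", config)); // query overrides
  }
}
```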

@cpdevoto
Contributor

Note that, for performance reasons, it might be desirable to use Joda DateTime
instead of java.util.Calendar for the interval computations. This
would, of course, entail introducing a new third-party dependency, albeit one
that is widely accepted as a de facto standard.

@cpdevoto cpdevoto Align downsampling intervals to the Gregorian calendar.
749a54d
@johann8384
Member

@manolama Could we get this merged into the 2.3.0 branch (currently put)?

I'm pulling it into our internal build and will test it there.

@cpdevoto
Contributor
cpdevoto commented Feb 8, 2016

@johann8384 Out of curiosity, how did your testing go? We have been using this patch in a production system for a couple of months now, and have encountered no problems so far.

@johann8384
Member

I didn't find any issues with it.

@manolama
Member
manolama commented Mar 8, 2016

Finally taking a crack at this for v2.3. I'm rebasing it to work with the added "all" downsampling and there are some test cases I need to run it through to make sure it's good. Thanks!

@manolama
Member

Ok, it looked pretty good but there were a few tweaks I needed to make such as handling hourly downsampling and aligning to useful boundaries if someone gives an odd interval. Also cleaned it up a bit so it's a little faster, advancing calendars instead of creating a new one. I'll post that after I fix up the UTs.
Committed yours in 5012f7d. Thanks!

@manolama manolama closed this Mar 13, 2016