
Smart query layer with rolled up data #6368

@amitchopraait

Description

Our use case is to let users slice and dice data over varying time intervals. A user may look at a metric over the last month and then zoom in to a specific week, day, or hour to analyze the data further.

For this, we plan to store raw segments and also run rollup jobs (using minion) that produce aggregated data per day, month, etc. With this approach, we lose only the granularity of the time column and keep all of the original dimensions.

To take an example:

| Event Timestamp | Org | Device | Rule Id | Process Name | Process Hash | Count |
| --- | --- | --- | --- | --- | --- | --- |
| 2020/05/01 00:13:11 | Coke | Amit-01 | 111 | cmd.exe | 12345678 | 3 |
| 2020/05/01 00:20:11 | Pepsi | Rahul-01 | 222 | java.exe | 98765432 | 1 |
| 2020/05/01 00:30:11 | Coke | Amit-01 | 111 | cmd.exe | 12345678 | 1 |
| 2020/05/01 00:44:11 | Coke | Amit-01 | 111 | cmd.exe | 12345678 | 1 |
| 2020/05/01 00:55:11 | Coke | Amit-01 | 222 | java.exe | 98765432 | 1 |

If we roll up the data in the above example from second granularity to hour granularity, the rolled-up segment will contain the following data. As you can see, no dimensions are lost; only the time granularity is reduced:

| Event Timestamp | Org | Device | Rule Id | Process Name | Process Hash | Count |
| --- | --- | --- | --- | --- | --- | --- |
| 2020/05/01 0000 | Coke | Amit-01 | 111 | cmd.exe | 12345678 | 5 |
| 2020/05/01 0000 | Pepsi | Rahul-01 | 222 | java.exe | 98765432 | 1 |
| 2020/05/01 0000 | Coke | Amit-01 | 222 | java.exe | 98765432 | 1 |
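
To make the rollup semantics concrete, here is a minimal Python sketch of the aggregation described above. It is only an illustration of the idea, not how the minion job is implemented: the function name `rollup_to_hour` and the tuple layout are assumptions made for this example. It truncates each timestamp to the hour, keeps every dimension as part of the grouping key, and sums the count.

```python
from collections import defaultdict
from datetime import datetime

# The raw rows from the example above:
# (timestamp, org, device, rule id, process name, process hash, count)
RAW_EVENTS = [
    ("2020/05/01 00:13:11", "Coke", "Amit-01", 111, "cmd.exe", "12345678", 3),
    ("2020/05/01 00:20:11", "Pepsi", "Rahul-01", 222, "java.exe", "98765432", 1),
    ("2020/05/01 00:30:11", "Coke", "Amit-01", 111, "cmd.exe", "12345678", 1),
    ("2020/05/01 00:44:11", "Coke", "Amit-01", 111, "cmd.exe", "12345678", 1),
    ("2020/05/01 00:55:11", "Coke", "Amit-01", 222, "java.exe", "98765432", 1),
]


def rollup_to_hour(events):
    """Hypothetical helper: group by (hour, all dimensions) and sum the count."""
    totals = defaultdict(int)
    for ts, *dims, count in events:
        # Drop minutes and seconds so every event in the same hour shares a key.
        hour = datetime.strptime(ts, "%Y/%m/%d %H:%M:%S").replace(minute=0, second=0)
        totals[(hour, *dims)] += count
    return dict(totals)


if __name__ == "__main__":
    for key, count in rollup_to_hour(RAW_EVENTS).items():
        print(key, count)
```

The five raw rows collapse into three rolled-up rows because only rows that agree on every dimension are merged.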

Now, given these raw as well as rolled-up segments (for hour, day, week), it would be great if the broker could understand and decide which segments to use, depending on the query time interval.
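
One possible selection rule, sketched in Python purely to illustrate the request: pick the coarsest granularity whose bucket boundaries line up with both ends of the query interval, and fall back to raw segments otherwise. Everything here is an assumption for discussion (the `pick_granularity` name, the `GRANULARITY_SECONDS` table, and UTC-aligned buckets); it is not an existing broker API. Week and month rollups are left out because their boundaries are not a fixed number of seconds from the epoch.

```python
from datetime import datetime, timezone

# Hypothetical catalogue of available segment granularities, in seconds.
GRANULARITY_SECONDS = {"raw": 1, "hour": 3600, "day": 86400}


def pick_granularity(start, end):
    """Return the coarsest granularity aligned with [start, end).

    Both arguments are expected to be timezone-aware datetimes; bucket
    boundaries are assumed to be aligned to UTC.
    """
    start_s = int(start.timestamp())
    end_s = int(end.timestamp())
    if end_s <= start_s:
        raise ValueError("end must be after start")

    best = "raw"
    # Walk from finest to coarsest and keep the last granularity that fits.
    for name, size in sorted(GRANULARITY_SECONDS.items(), key=lambda kv: kv[1]):
        if start_s % size == 0 and end_s % size == 0:
            best = name
    return best


if __name__ == "__main__":
    utc = timezone.utc
    print(pick_granularity(datetime(2020, 5, 1, tzinfo=utc), datetime(2020, 6, 1, tzinfo=utc)))
    print(pick_granularity(datetime(2020, 5, 1, 3, tzinfo=utc), datetime(2020, 5, 1, 5, tzinfo=utc)))
```

A fuller version could split a misaligned interval into an aligned middle part served from rolled-up segments and raw edges, but that is beyond this sketch.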

I have also attached a diagram that shows the rollup and the smart query pictorially:
Screen Shot 2020-12-17 at 10 46 34 PM
