METRON-690: Create a DSL-based timestamp lookup for profiler to enable sparse windows #450
Conversation
For reviewers, please note that as this is an antlr DSL, you may skip the generated code (the java files in the |
I wanted to point out one thing; while this is a DSL called from a DSL (stellar), the way the parser is written, we cache the output of the parser and can reuse them for evaluation, thus saving us much of the impact of the parsing of this DSL. |
…line...and it's the right thing to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a great addition, thanks a lot. Couple minor comments to look over.
I'll want to run this up, so I look forward to the acceptance testing plan
add(Range.between(20L, 30L)); | ||
add(Range.between(40L, 50L)); | ||
}}; | ||
IntervalPredicate predicate = new IntervalPredicate.Identity(intervals); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make this (and the similar instances) IntervalPredicate so my compiler likes me again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but your compiler will never like you.
LinkedList<Function<Long, Predicate<Long>>> predicates = new LinkedList<>(); | ||
while (true) { | ||
Token<?> token = stack.pop(); | ||
if (token == LIST_MARKER) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be .equals()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, it's reference equals. LIST_MARKER
is a specific instance of Token.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should say, it's a specific static instance used for the parser to denote the end of the list of values in the stack.
|
||
while (true) { | ||
Token<?> token = stack.pop(); | ||
if (token == DAY_SPECIFIER_MARKER) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same deal, .equals()
I think the API looks great. Can we provide a grammar in the README for constructing the PROFILE_WINDOW function? I think the API is so flexible that it may be hard to wrap your head around how to construct the profile retrieve string |
The Readme additions were intended to break the expressions down into the possible phrases. Do you think those sections need to be structured differently? |
I think taking the string as an argument is really powerful, but it's also really flexible. "1 hour window every 24 hours starting from 14 days ago including the current day of the week excluding weekends, holidays:us" is a really long string. I was thinking maybe define a grammar to look something like: cmd = <window_duration> <window_period> <window_limit> <window_inclusions> <window_exlcusions> <holidays_spec> ...or something like that...a grammar for constructing this string |
@james-sirota Yep, agreed. I'll modify the docs. |
Testing Instructions beyond the normal smoke test (i.e. letting data Free Up Space on the virtual machineFirst, let's free up some headroom on the virtual machine. If you are running this on a
Preliminaries
This will generate random JSON maps with a numeric field called
Deploy the custom parser
Start the profiler
Test Case
For me, the following was the result:
Here we've validated that the new window function can be called from the relevant topologies as well as the REPL and give consistent results that make sense. |
Not for nothing @cestella but those steps should just be a script if we are going to need them all the time |
INCLUDE : 'include' | 'INCLUDE' | 'includes' | 'INCLUDES' | 'including' | 'INCLUDING'; | ||
EXCLUDE : 'exclude' | 'EXCLUDE' | 'excludes' | 'EXCLUDES' | 'excluding' | 'EXCLUDING'; | ||
|
||
NOW : 'NOW' | 'now'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like "NOW" is used. Can we drop this token?
Ran this up and tested per @cestella instructions, along with testing out the language a bit more. Everything worked well for me. My only gotcha, and this is more a general Stellar issue, rather than an issue here is that if you make a mistake with a syntax error, e.g. Otherwise, I'm +1 on this. It was very flexible, easy to use, and a great contribution. Thanks a lot! |
@justinleet yeah, there's a bug right now with stellar not bubbling up the inner exception. I created https://issues.apache.org/jira/browse/METRON-732 to track it. |
@ottobackwards Yes, your observation is a good one. I wanted to toe the line between "obvious step-by-step" and "automated" and ended up on the side of "step-by-step obvious" because
It's probably worth discussing acceptance testing scripts as a general category of testing that we might be able to do with some of these scripts forming the beginnings of them. |
Honestly, I was thinking more about the 'kill everything' part, which seems to be repeated a lot |
Oh, well, I was honestly hoping that'd go away naturally if we can trim down vagrant to give us enough headroom to not have to kill things to have it run smoothly, but your point is well taken. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice grammar. Seems like it was a lot of work. Nice job.
@@ -0,0 +1,17 @@ | |||
COMMA=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there no way to put Window.tokens
and WindowLexer.tokens
in the package that you created (org/apache/metron/profiler/client/window
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't found a way. I looked into it with Stellar and it didn't seem possible at the time.
import java.util.function.Function; | ||
import java.util.function.Predicate; | ||
|
||
public class IntervalPredicate<T> implements Predicate<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class seems important. Can you add a brief javadoc describing what this class does? Same for Window
, WindowProcessor
, HolidaysPredicate
, etc. Would go along way helping someone figure out how all this logic fits together without having to read through the entirety of the code to understand it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fair ask. I'll rethink the docs and add javadocs tomorrow AM.
@Stellar( | ||
namespace="PROFILE", | ||
name="WINDOW", | ||
description="The profiler periods associated with a fixed lookback starting from now.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description needs updated. Seems like copy-paste from PROFILE_LOOKBACK
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch
return ret; | ||
} | ||
|
||
public static Window parse(String statement) throws ParseException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you call this parse
? If I am reading the code correctly it seems like this tokenizes, parses, executes the expression and returns a result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would you like it to be called? Maybe process
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure process
or execute
sound good to me.
return treeBuilder.getWindow(); | ||
} | ||
|
||
public static String syntaxTree(String statement) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe my in-browser search function isn't working, but I don't see who callssyntaxTree
. I also don't understand how it is different from parse
. When do I use this versus parse
? Comments explaining this would be super awesome.
There also seems to be some common code between parse
and syntaxTree
that could be factored out somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nobody is using syntaxTree, that's currently only for debugging issues with the grammar going forward for people adapting the language for new use-cases. Also, @mattf-horton asked for it during the dev discussion. It was marginally useful during the construction of the grammar to understand how things were being parsed. I'll document it as such.
|
||
* <span style="color:red">`time_interval WINDOW?`</span><span style="color:purple">`(INCLUDING specifier_list)? (EXCLUDING specifier_list)?`</span> | ||
* <span style="color:red">`time_interval WINDOW?`</span><span style="color:green">`EVERY time_interval`</span><span style="color:blue">`FROM time_interval (TO time_interval)?`</span><span style="color:purple">`(INCLUDING specifier_list)? (EXCLUDING specifier_list)?`</span> | ||
* <span style="color:blue">`FROM time_interval (TO time_interval)?`</span> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heck of a job trying to explain the grammar. I unfortunately still find it a little hard to grasp through the README. Unfortunately, this is probably part of the burden of defining a DSL.
I had a few random thoughts on what might help.
(1) You defined the 4 major components; Total Temporal Duration, Temporal Window Width, etc. I don't see how those relate to where you are describing "the language fits the following three forms". Would it help to define a high-level view of the DSL using those exact terms?
(2) Adding self-referencing links within the document itself might go a long way. For example, when describing the "language fits the following three forms" include links to where each section is described.
(3) It seems like time_interval
and specifier_list
are sub-components of your 4 major components. Maybe don't use those terms until describing each major component specifically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding 2, that is the intention of the color coding of the examples, which is lost in github but available in the site-book.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to think on these good suggestions and tackle them tomorrow am.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to checkout the site book. Its too bad, Github doesn't render that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was bummed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the color coding should link the three major forms with the types of clauses used to construct the major forms, so for 1 it becomes more clear if you look at it from the site-book as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and made it color coded and made links. I also verified that it renders properly in the site-book.
* <span style="color:blue">`from 1 hour ago until 30 minutes ago`</span> | ||
* <span style="color:blue">`from 30 minutes ago until 1 hour ago`</span> | ||
* <span style="color:blue">`starting from 1 hour ago to 30 minutes ago`</span> | ||
* <span style="color:blue">`starting from 1 hour to 30 minutes`</span> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I be able to do this? It does not seem to like floats and throws an exception.
from 1.5 hours ago
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not support floats just yet as we utilize TimeUnit
for our time conversion. If you want more granular than hours, then it was intended that you go down to minutes: 90 minutes ago
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this more explicit in the docs.
Due to the vagaries of the english language, the from and the to portions, if both specified, are interchangeable | ||
with regard to which one specifies the start and which specifies the end. | ||
|
||
In other words <span style="color:blue">`starting from 1 hour ago to 30 minutes ago`</span> and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the time intervals exclusive or inclusive? Might be good to call that out specifically, if you haven't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All time intervals are inclusive, but it's worth mentioning in the docs.
* `TO` - Can be the words "until" or "to" | ||
* `AGO` - Optionally the word "ago" | ||
|
||
The `TO time_interval AGO` portion is optional. If unspecified then it is expected that the time interval ends now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing to consider is that we seem to be assuming "processing time" with the grammar. For example, in the testing notes that you provided, we are enriching the message with an expression like this.
STATS_MEAN(STATS_MERGE(PROFILE_GET('stat', 'global', PROFILE_WINDOW('5 minute window every 10 minutes starting from 2 minutes ago until 32 minutes ago excluding holidays:us'))))
When we grab the profile from "2 minutes ago" this will be 2 minutes ago from processing time (aka system time) rather than the event time of the message.
Is there any way to support event time now? I'm not sure that this is needed right now, but thought we should have a discussion around this at the very least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you can pass the event time as the second argument to PROFILE_WINDOW
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's great.
|
||
**Examples** | ||
|
||
* A repeating 30 minute window starting 2 hours ago and repeating every hour until now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do I select the last 1 hour window for the past 8 Tuesdays (assuming today is Tuesday)?
How do I select the last 1 hour window for the past 8 'current day of the week'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 hour window every 24 hours starting from 56 days ago including tuesdays
1 hour window every 24 hours starting from 56 days ago including this day of the week
This was by far my favorite question so far. A subtle overflow bug was caught during its answer. There's now a unit test for this one. ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice. If there is not an example like this in the README (I probably missed it) it would be good to add. This is closer to the type of use case that you were trying to solve initially with this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good feedback, I'll add that.
Ok, good catch, I've made the functionality and some of the doc changes requested @nickwallen and I'll finish the rest in the morning. |
Ok @nickwallen I think I have adjusted the docs and functionality sufficiently well that I have addressed your issues. Would you please give it another look and let me know if I missed anything? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 Very nice. Love it.
Creating a small DSL to allow specifying profiles from windows of time that may:
This also provides a
PROFILE_WINDOW
Stellar function which accepts this DSL as a string and can return the set of profiler periods based on the times selected and suitable for passing toPROFILE_GET
. To be clear, you can compose the two functions.i.e.
PROFILE_GET('profile', 'entity', PROFILE_WINDOW('1 hour window every 24 hours starting from 14 days ago including the current day of the week excluding weekends, holidays:us'))
would retrieve all the profile measurements written for the profile "profile" and entity "entity" for the last hour on the same weekday excluding weekends and US holidays across the last 14 daysFor a complete description with examples, see here.
Acceptance testing plan will be submitted as a follow-on comment.