Has anyone given any thought to adding an RPN capability to OpenTSDB? This would be an RPN expression, specified per query, that is applied to the values retrieved during the query. As an example, if the query wanted to return the queried values scaled against a literal, the RPN "1234,/" would be specified. The first operand is implicitly the value retrieved from the scan, so this would divide each value by the literal 1234.
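To make the semantics concrete, here is a minimal sketch (not OpenTSDB code, just an illustration) of an RPN evaluator where the scanned value is the implicit first stack entry, so "1234,/" divides the value by 1234:

```python
def eval_rpn(value, expr):
    """Evaluate a comma-separated RPN expression against one data point.

    The scanned value is pushed first, so eval_rpn(v, "1234,/") is v / 1234.
    """
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    stack = [value]
    for token in expr.split(","):
        if token in ops:
            b = stack.pop()  # right operand is on top of the stack
            a = stack.pop()
            stack.append(ops[token](a, b))
        else:
            stack.append(float(token))  # literal operand
    return stack.pop()
```

For instance, `eval_rpn(617.0, "1234,/")` yields 0.5, and a percentage could be expressed as "1234,/,100,*".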
Is there any reason why you'd want reverse polish notation rather than just plain normal syntax for arithmetic expressions?
I was actually thinking along the lines of a "pluggable" calculation engine. Maybe you could specify the expression as "rpn:100,/" or "arith:/100" and use the prefix to load / call the appropriate calculation engine. The only reason I suggested RPN is that it might make it easier to port from, say, RRDTool to OpenTSDB, as RRD already supports RPN. RPN is also easier to implement, IMHO, since you don't have to deal with precedence rules, etc.
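The prefix dispatch could be as simple as splitting on the first colon and looking the prefix up in a registry. A hypothetical sketch (the engine names and the `ENGINES` table are assumptions for illustration, not anything OpenTSDB provides):

```python
def rpn_engine(value, expr):
    # Minimal RPN engine: the scanned value is the implicit first operand.
    stack = [value]
    for tok in expr.split(","):
        if tok in ("+", "-", "*", "/"):
            b, a = stack.pop(), stack.pop()
            stack.append({"+": a + b, "-": a - b,
                          "*": a * b, "/": a / b}[tok])
        else:
            stack.append(float(tok))
    return stack.pop()

def arith_engine(value, expr):
    # Minimal infix engine with an implicit left operand: "/100" -> value / 100.
    op, literal = expr[0], float(expr[1:])
    return {"+": value + literal, "-": value - literal,
            "*": value * literal, "/": value / literal}[op]

# Hypothetical registry mapping prefixes to calculation engines.
ENGINES = {"rpn": rpn_engine, "arith": arith_engine}

def apply_expression(value, spec):
    """Dispatch 'prefix:expression' (e.g. 'rpn:100,/') to its engine."""
    prefix, expr = spec.split(":", 1)
    return ENGINES[prefix](value, expr)
```

With this, `apply_expression(50.0, "rpn:100,/")` and `apply_expression(50.0, "arith:/100")` both return 0.5.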
Ah, I didn't even know RRDTool supported RPN. What we talked about with Chris (@manolama), and what the guys at Turn partially implemented, is the ability to bind a time series to a variable, so you can use it in an arithmetic expression with a normal notation.
I don't think parsing with precedence rules and such is hard. In fact it's one of the first exercises you do when you learn about parsers in CS curricula. The hard part here is defining exactly how the operations are carried out when they involve multiple time series, aggregation, etc.
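For reference, handling precedence really is only a few lines: the classic shunting-yard algorithm converts infix tokens to RPN. A minimal sketch (illustrative only, nothing to do with the time-series semantics question):

```python
def to_rpn(tokens):
    """Shunting-yard: convert infix tokens to RPN, honoring precedence.

    tokens: a list like ["a", "+", "b", "*", "c"].
    """
    prec = {"+": 1, "-": 1, "*": 2, "/": 2}
    out, ops = [], []
    for tok in tokens:
        if tok in prec:
            # Pop higher-or-equal precedence operators first (left assoc.).
            while ops and ops[-1] != "(" and prec[ops[-1]] >= prec[tok]:
                out.append(ops.pop())
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                out.append(ops.pop())
            ops.pop()  # discard the "("
        else:
            out.append(tok)  # operand
    while ops:
        out.append(ops.pop())
    return out
```

So `to_rpn(["a", "+", "b", "*", "c"])` gives `["a", "b", "c", "*", "+"]`, i.e. multiplication binds tighter as expected.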
Has it been decided what to do about this feature request, or are we really just waiting for an implementation to show up? I have built this capability on top of OpenTSDB (outside of OpenTSDB). Some interesting things to consider include how to pair up metrics when they are referenced in an arithmetic operation. For example, if metric1 is collected at t0 and metric2 is collected at t0+1 minute, and I want a "virtual" metric (metric3) that is metric1 + metric2, what is the timestamp and what is the result?
I have addressed this by using bucketing, which I consider similar to downsampling. Essentially, I round each timestamp down by a configurable factor and average all values for a given metric within that bucket. These buckets then provide the timestamps for the results. I can walk these time buckets, and where a bucket contains both metric1 and metric2 I can generate a value for virtual metric3. If a bucket doesn't contain all the metrics, I could (but don't currently) use interpolation to fill in a value.
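The bucketing approach described above can be sketched as follows (a simplified illustration of the idea, not the actual implementation; function names are mine):

```python
from collections import defaultdict

def bucketize(points, width):
    """Round timestamps down to bucket boundaries; average within a bucket.

    points: iterable of (timestamp, value) pairs, timestamps in seconds.
    width:  the configurable bucket size in seconds.
    Returns {bucket_timestamp: mean_value}.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for ts, val in points:
        b = ts - ts % width          # round down to the bucket boundary
        sums[b][0] += val
        sums[b][1] += 1
    return {b: s / n for b, (s, n) in sums.items()}

def virtual_sum(metric1, metric2, width):
    """metric3 = metric1 + metric2, emitted only for buckets holding both.

    Buckets missing either metric are skipped (no interpolation), matching
    the behavior described above.
    """
    b1 = bucketize(metric1, width)
    b2 = bucketize(metric2, width)
    return {b: b1[b] + b2[b] for b in sorted(b1.keys() & b2.keys())}
```

With a 5-minute bucket, metric1 at t0 and metric2 at t0+1 minute land in the same bucket, so metric3 gets the bucket boundary as its timestamp and the sum of the two averages as its value.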
Additionally, I allow an option on each metric that determines if the results are emitted, so in the example, it might be that the only result emitted back to the client is metric3.
So, why all this description? Does it make sense, and I think it does, to move this capability into core OpenTSDB? Is someone else working on this or something similar? Should I be thinking about putting this into OpenTSDB?