Add support for metric hierarchies with more than 2 levels#119
unleashed left a comment
There was a problem hiding this comment.
Looking good; check the comments. My only minor concerns are ensuring we always behave as if usages from the full hierarchy are taken into account (i.e. see the comment about adding children usages) and making the tests a bit more generic so they cover N-level depths, with N > 3, possibly chosen at random.
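A test along those lines could look roughly like the sketch below. It is only an illustration: build_chain and this descendants method are hypothetical helpers written for the example, not code from the PR, and the hierarchy is assumed to be a parent => [children] hash.

```ruby
# Hypothetical helper: builds a single chain root -> m1 -> ... -> m<depth>,
# stored as a parent => [children] hash (assumed shape, not taken from the PR).
def build_chain(depth)
  names = ['root'] + (1..depth).map { |i| "m#{i}" }
  names.each_cons(2).to_h { |parent, child| [parent, [child]] }
end

# Breadth-first collection of every descendant of `metric`.
# Elements pushed onto `res` during the iteration are visited as well.
def descendants(hierarchy, metric)
  res = (hierarchy[metric] || []).dup
  res.each { |m| res.push(*(hierarchy[m] || [])) }
end

depth = rand(4..10) # N > 3, chosen at random
hierarchy = build_chain(depth)
all = descendants(hierarchy, 'root')

puts "depth=#{depth} descendants=#{all.inspect}"
```

Printing the randomly chosen depth makes a failing run reproducible by hand.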
  res += children
  pending += children
end
There was a problem hiding this comment.
This is pretty unidiomatic code and can be improved a fair bit. The suggestions below are totally untested, so take them with a grain of salt and consider any needed fixes when comparing to the original above.
Instead of keeping a separate pending array, an integer index could point to the next res element to handle, breaking out when that element is nil. This avoids the duplicated memory and deals only with integers:
idx = 0
loop do
  metric = res[idx] or break res # `or` binds looser than `=`; `break res` makes the loop return `res`
  idx += 1 # advance before `next` can skip the rest of the body, or a childless metric would loop forever
  children = metric_hiearchy[metric] || next
  res += children
end
You could do a similar thing with an enumerator over res, as long as you keep the same object. In Ruby it is safe for our purposes to modify the underlying array while enumerating it, because the enumerator only keeps a reference to the object (this is only useful to us for pushing at the end). Creating a new array and assigning it to res (i.e. using +) would not work. Pushing also avoids creating temporary arrays.
metrics = res.each
loop do
  metric = metrics.next # StopIteration is handled by `loop`, no rescues needed
  children = metric_hiearchy[metric] || next
  res.push(*children) # splat children to have a flattened push
end
res # this way we have to return res here
You can shrink it further by using the usual enumerator form:
res.each do |m|
  children = metric_hiearchy[m] || next
  res.push(*children)
end # return value is already the `res` array we've been working on
There was a problem hiding this comment.
I don't agree. I think that my solution clearly separates the result and the queue of pending metrics. It has more lines, yes, but I think it's easier to understand.
In the solutions that you proposed you are iterating over a growing array directly or via pointers. I find that more difficult to reason about.
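For readers unfamiliar with the behaviour under discussion, here is a minimal sketch of what happens when an array grows while Array#each is iterating over it, and of why += does not have the same effect. The hierarchy hash and its metric names are made up for the example.

```ruby
# Hypothetical hierarchy: parent => [children].
hierarchy = { 'a' => %w[b c], 'b' => %w[d], 'd' => %w[e] }

# Pushing onto the array being iterated: the new elements are visited too.
visited = []
queue = hierarchy['a'].dup
queue.each do |m|
  visited << m
  queue.push(*(hierarchy[m] || []))
end

# Using += instead: `res` is rebound to a brand-new array on every pass,
# while `each` keeps walking the original two-element array.
seen = []
res = hierarchy['a'].dup
res.each do |m|
  seen << m
  res += hierarchy[m] || []
end

puts visited.inspect # every descendant of 'a'
puts seen.inspect    # only the direct children of 'a'
```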
There was a problem hiding this comment.
The number of lines is not super important, but it is an indicator of complexity. I remember that I struggled to understand this code when I went over it. The concept of "readability" can be very subjective, sometimes depending on the familiarity of people with different code and styles, so I have gone over the code again and noted down some things. For the sake of discussion I've extracted the relevant code into a method, and lacking a better term, I've named it zipped_deep_map:
def descendants(service_id, metric_name)
  metrics_hierarchy = hierarchy(service_id)
  children = metrics_hierarchy[metric_name] || []
  zipped_deep_map(children) do |m|
    metrics_hierarchy[m] || []
  end
end

private

# zip each element with its mapped results
def zipped_deep_map(ary, &blk)
  # ...
end
Ok, here's the PR's version:
def zipped_deep_map(ary, &blk)
  result = ary
  pending = ary.dup
  until pending.empty?
    e = pending.shift
    new_elements = blk.call e
    pending.push(*new_elements)
    result.push(*new_elements)
  end
  result
end
Note: I've fixed the 2x duplicate array creation in each loop done by the += operations by replacing them with Array#push with a splat argument.
So this is what I've noticed from the POV of readability/ease of understanding:
- The loop is not an enumerator method or a combination of them, but a manual loop in imperative style.
- Because of the above, the loop condition depends on understanding what effect the body has.
- The loop requires a set-up prologue that is only understood when you understand the loop body and condition.
- The body performs 2 different operations 3 times involving 3 different arrays (actually 4 but result is an alias of ary).
- The original array is modified as an intended side effect of the loop.
- The array over which the iteration is being done is modified twice in each iteration.
Well, here's a version in fewer lines that clearly separates the queue of pending metrics, does not even modify the array being looped over, uses a well-known enumerator rather than a manual loop requiring context or set-up, and only does one explicit unary array modification referred to by method name, flatten. This is arguably a more understandable version by the criteria mentioned above:
def zipped_deep_map(ary, &blk)
  ary.map do |e|
    new_elements = blk.call e
    [e, zipped_deep_map(new_elements, &blk)]
  end.flatten
end
Of course, this snippet uses recursion, which in this particular case might be acceptable considering the depth levels, and it ends up combining a lot of arrays and generating a different order, but the key in readability is that it is not some arbitrary behaviour that needs to be parsed and analysed with multiple pieces needing to be considered together. It uses the well-known map and flatten methods instead.
The snippet below is the last proposed solution in my original comment:
def zipped_deep_map(ary, &blk)
  ary.each do |e|
    new_elements = blk.call e
    ary.push(*new_elements)
  end
end
This is shorter and simpler than the previous snippets in terms of number of lines and number of operations. The things I've noted are:
- It modifies the input array.
- The modified array is the one being iterated on.
- The loop only does 1 single explicit array operation involving 2 different arrays.
- It exploits the not-so-well-known yet logical behaviour of pushing to the array being enumerated.
Importantly, there is no explicit manual control over the loop, the behaviour of each is well-known, and understanding the method would at most require the clarification in the last bullet point. The body is small, there is no other duplicated array, and it is more efficient in time and especially in space. I'd prefer this version overall because of that space efficiency, the avoidance of recursion, the reduced surface for human error, and the fact that you can embed it directly with the same complexity as invoking it through a method.
I don't feel strongly about this, so at this point I'll just accept any of the three. The point is that readability/understandability is subjective, and it helps to note down what you feel subtracts from it so that you can approach an objective notion. For example, last&.save looks odd to someone used to pre-2.3 code and is more common in recent versions; it just depends on whether your brain is used to it and how it feels in combination with the rest of the code.
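To make the comparison easier to follow, the three variants can be run side by side on a small input. This is a sketch under assumptions: the HIERARCHY hash and the method names queue_version, recursive_version and each_version are invented for the example, and the inputs are duplicated so that none of the variants modifies the caller's array.

```ruby
# Hypothetical hierarchy: parent => [children].
HIERARCHY = { 'm1' => %w[m2 m3], 'm2' => %w[m4], 'm4' => %w[m5] }.freeze

# The PR's approach: a result array plus an explicit queue of pending elements.
def queue_version(ary, &blk)
  result = ary.dup
  pending = ary.dup
  until pending.empty?
    new_elements = blk.call(pending.shift)
    pending.push(*new_elements)
    result.push(*new_elements)
  end
  result
end

# The recursive map/flatten approach (depth-first order).
def recursive_version(ary, &blk)
  ary.map { |e| [e, recursive_version(blk.call(e), &blk)] }.flatten
end

# The approach that pushes onto the array being iterated.
def each_version(ary, &blk)
  res = ary.dup
  res.each { |e| res.push(*blk.call(e)) }
end

children_of = ->(m) { HIERARCHY[m] || [] }

bfs  = queue_version(%w[m1], &children_of)
rec  = recursive_version(%w[m1], &children_of)
each = each_version(%w[m1], &children_of)

puts bfs.inspect
puts rec.inspect
puts each.inspect
```

The queue and each variants visit elements level by level, while the recursive one goes depth-first, which is the "different order" mentioned above. All three return the same set of elements.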
There was a problem hiding this comment.
Regarding your first solution: I also thought about a recursive solution. I thought that it should not be a problem given the number of levels that we should expect, but I was not too sure. If you think that it's not a problem, I think that something like this is better because it clearly expresses the idea that the descendants of a metric are its children plus the descendants of each child:
def descendants(service_id, metric_name)
  metrics_hierarchy = hierarchy(service_id)
  children = metrics_hierarchy[metric_name] || []
  children.reduce(children) do |acc, child|
    acc + descendants(service_id, child)
  end
end
There was a problem hiding this comment.
My concern is not the "clearly expresses" part but the performance - recursion is probably never going to be very deep, but the overhead of that in performance should be controlled (ie. you are adding a big K in time). Plus we should use Array#push rather than + to avoid duplicating arrays while dropping temporary ones all the time.
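The difference between the two operations, and one possible way of keeping the recursion while accumulating with Array#push, is sketched below. The hierarchy is passed in as a plain hash for the sake of a self-contained example; the signature is an assumption and not the one used in the PR.

```ruby
# Array#push mutates the receiver; + allocates a new array each time.
a = [1, 2]
same = a
a.push(*[3, 4]) # `same` still refers to the very same, now longer, array

b = [1, 2]
alias_b = b
b += [3, 4]     # `b` now points to a new array; `alias_b` is unchanged

# Recursive variant that shares one accumulator instead of building a
# temporary array per call. `hierarchy` is a hypothetical parent => [children] hash.
def descendants(hierarchy, metric, acc = [])
  (hierarchy[metric] || []).each do |child|
    acc.push(child)
    descendants(hierarchy, child, acc)
  end
  acc
end

hierarchy = { 'm1' => %w[m2 m3], 'm2' => %w[m4], 'm4' => %w[m5] }
result = descendants(hierarchy, 'm1')
puts result.inspect
```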
There was a problem hiding this comment.
We can't measure everything, so we need to develop some sense about what is performant and what is not. The sentence "not worrying (much) about performance until we demonstrate an issue" is featured in many horror tales ending with "and that's why we rewrote the whole thing". :P
My sense of performance says this is not O(kN) but O(Kn) with a big K, and the recursive function not being tail-call means an extra penalty (but I think Ruby does not even optimize it if it was).
I'd accept the recursive version with Array#push with the caveat that the next guy that will touch this code will not realize that it is recursive when we finally have arbitrary depths, so at least this would warrant a big note in case someone comes back at you saying "this crashed in production and you didn't warn me!".
There was a problem hiding this comment.
For completeness, here's a quick benchmark:
Comparison:
each: 2163979.4 i/s
orig: 1428006.8 i/s - 1.52x slower
rec: 1317247.2 i/s - 1.64x slower
Benchmarks finished
The first being the .each solution, the second the original PR and the last the fully recursive version (all of them using Array#push).
Not that important unless this sits in the hot path of auths.
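The thread does not include the benchmark script, and the output format resembles that of the benchmark-ips gem, which is an assumption. A rough, dependency-free way of getting comparable numbers is sketched below, using a made-up hierarchy and a monotonic clock. Absolute timings will vary by machine and Ruby version, so only the relative order is of interest.

```ruby
# Hypothetical hierarchy: parent => [children].
HIERARCHY = { 'm1' => %w[m2 m3], 'm2' => %w[m4], 'm3' => %w[m5] }.freeze
CHILDREN = ->(m) { HIERARCHY[m] || [] }

def each_version(ary)
  ary.each { |e| ary.push(*CHILDREN.call(e)) }
end

def orig_version(ary)
  result = ary
  pending = ary.dup
  until pending.empty?
    new_elements = CHILDREN.call(pending.shift)
    pending.push(*new_elements)
    result.push(*new_elements)
  end
  result
end

def rec_version(ary)
  ary.map { |e| [e, rec_version(CHILDREN.call(e))] }.flatten
end

# Runs the block `iterations` times and returns the elapsed seconds.
def time_it(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# %w[m1] builds a fresh array on every call, so in-place mutation is harmless.
timings = {
  each: time_it(20_000) { each_version(%w[m1]) },
  orig: time_it(20_000) { orig_version(%w[m1]) },
  rec:  time_it(20_000) { rec_version(%w[m1]) }
}

timings.each { |name, secs| puts format('%-5s %.4fs', name, secs) }
```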
There was a problem hiding this comment.
It might be ~1.5x slower but we don't really know if that is a problem.
What if this method represents only ~0.00...% of the total CPU time?
There was a problem hiding this comment.
What if it does not? What if it does represent a lot in certain cases with lots of metrics? :D
Also, why should one piece of code be slow just because every other place is slow? :/
Death by a thousand cuts, ok, you win now - add it already!
There was a problem hiding this comment.
That's why I mentioned several comments above that we need profiling to be sure about these kinds of things.
Changed.
Force-pushed from 111b878 to 2d011d0.
memo[p_id] = memo[id]
is_set_op = Usage.is_set?(val)

while p_id
There was a problem hiding this comment.
This would be a legitimate case for while p_id = parent_id(id) removing the assignments both above and below.
There was a problem hiding this comment.
Not really, because id is defined in the outer loop.
There was a problem hiding this comment.
But this is p_id, which you only use inside the loop :?
There was a problem hiding this comment.
I did. It doesn't work because id is in the outer loop.
So if we wanted to put the condition in the while, we'd need to do something like
...
current_metric_id = id
while (current_metric_id = parent_id(current_metric_id))
  ...
end
...
Notice that we'd need to initialize the variable anyway, and whether that makes the code more readable is debatable.
There was a problem hiding this comment.
You don't need to, it's enough to just name it id and avoid the special casing of the first id not being a real parent id, since you don't do anything particular with it. There's enough context to not need any distinction.
Edit: while id = parent_id(id)
There was a problem hiding this comment.
How's that more readable? It shadows the id defined in do |memo, (id, val)|
There was a problem hiding this comment.
It does not shadow it, the id yielded is never used again at all, just to bootstrap the process. IOW, you can extract the inner loop to a method and you'll have exactly the same code, and more readable.
There was a problem hiding this comment.
I don't have a strong opinion about this. To me, the 2 solutions are roughly equivalent and it kind of comes to personal preference. I've changed it so we can move on.
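For reference, the idiom being debated, an assignment used as the while condition to walk up a chain of parents, can be sketched in isolation as follows. The PARENTS hash and this parent_id implementation are stand-ins invented for the example; the real method in the codebase is not shown in this thread.

```ruby
# Hypothetical data: m1 is a child of m2, which is a child of m3.
PARENTS = { 'm1' => 'm2', 'm2' => 'm3' }.freeze

# Stand-in for the real lookup; returns nil for a top-level metric.
def parent_id(id)
  PARENTS[id]
end

# Collects every ancestor of `id`, nearest first.
def ancestors(id)
  result = []
  # The assignment doubles as the loop condition: the loop ends as soon as
  # parent_id returns nil. The parentheses signal that `=` is intentional.
  while (id = parent_id(id))
    result << id
  end
  result
end

puts ancestors('m1').inspect
```

Reassigning id inside the method only affects the method's local variable, which is why extracting the inner loop sidesteps the question of the outer block's id.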
Force-pushed from 2d011d0 to 1535cc9.
Force-pushed from 50a4562 to df760e7.
Force-pushed from df760e7 to 19264ee.
Force-pushed from 19264ee to a57ab99.
bors r=@unleashed
119: Add support for metric hierarchies with more than 2 levels r=unleashed a=davidor
This PR addresses some of the points in #114
The PR adds support for metric hierarchies with more than 2 levels. For example:
`m1 --child_of--> m2 --child_of --> m3`
In particular:
- Auth, authrep, and report calls now take into account the whole hierarchy when applying limits.
- The XML returned in authrep calls now shows an updated value for the `current_value` field in all the metrics affected in the hierarchy.
This PR does not adapt the limits and the hierarchy extensions to work with metric hierarchies of more than 2 levels. That will be done in a separate PR.
Co-authored-by: David Ortiz <z.david.ortiz@gmail.com>
Build succeeded
This PR addresses some of the points in #114
The PR adds support for metric hierarchies with more than 2 levels. For example:
m1 --child_of--> m2 --child_of --> m3
In particular:
- Auth, authrep, and report calls now take into account the whole hierarchy when applying limits.
- The XML returned in authrep calls now shows an updated value for the current_value field in all the metrics affected in the hierarchy.
This PR does not adapt the limits and the hierarchy extensions to work with metric hierarchies of more than 2 levels. That will be done in a separate PR.
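As a rough illustration of what taking the whole hierarchy into account means for the m1, m2, m3 example, the sketch below propagates a reported value from a metric to all of its ancestors. It is not the project's implementation: PARENT_OF, report and the usage hash are hypothetical names chosen for the example.

```ruby
# Hypothetical hierarchy from the example: m1 child_of m2 child_of m3.
PARENT_OF = { 'm1' => 'm2', 'm2' => 'm3' }.freeze

# Adds `value` to `metric` and to every ancestor above it.
def report(usage, metric, value)
  while metric
    usage[metric] += value
    metric = PARENT_OF[metric]
  end
  usage
end

usage = Hash.new(0)
report(usage, 'm1', 5) # counts towards m1, m2 and m3
report(usage, 'm2', 2) # counts towards m2 and m3 only

puts usage.inspect
```

With only 2 levels supported, the first call would have stopped at m2; supporting deeper hierarchies means the value also reaches m3.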