Add support for metric hierarchies with more than 2 levels by davidor · Pull Request #119 · 3scale/apisonator

davidor · 2019-08-28T08:44:34Z

This PR addresses some of the points in #114

The PR adds support for metric hierarchies with more than 2 levels. For example:
m1 --child_of--> m2 --child_of --> m3

In particular:

- Auth, authrep, and report calls now take into account the whole hierarchy when applying limits.
- The XML returned in authrep calls now shows an updated value for the current_value field in all the metrics affected in the hierarchy.

This PR does not adapt the limits and the hierarchy extensions to work with metric hierarchies of more than 2 levels. That will be done in a separate PR.

unleashed

Looking good - check comments. Only minor concerns are ensuring we always behave as if usages from the full hierarchy are being taken into account (i.e. see comment about adding children usages) and making tests a bit more generic to cover N-level depths, with N > 3, possibly chosen at random.
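A test helper along the lines suggested here could build a single chain of N levels, with N picked at random. The sketch below is only an illustration: `build_metric_chain` and the `{ parent => [children] }` hash shape are assumptions, not the helper that was eventually added to the test suite.

```ruby
# Hypothetical helper: builds a linear hierarchy
#   m1 --child_of--> m2 --child_of--> ... --child_of--> mN
# and returns it as { parent_name => [child_name] }.
def build_metric_chain(levels)
  names = (1..levels).map { |i| "m#{i}" }
  # each_cons(2) yields adjacent [child, parent] pairs
  names.each_cons(2).to_h { |child, parent| [parent, [child]] }
end

levels = rand(4..10) # N > 3, chosen at random
chain = build_metric_chain(levels)
```

A chain only exercises depth; covering metrics with several children would need a different generator.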

unleashed · 2019-08-28T15:54:37Z

+
+            res += children
+            pending += children
+          end


This is pretty unidiomatic code and can be improved a fair bit. The suggestions below are totally untested, so take them with a grain of salt and consider any needed fixes when comparing to the original above.

The pending array could just use indexes to point to the next res element to handle, breaking when the result is nil, avoiding the duplicated memory and instead dealing with integers:

    idx = 0
    loop do
      metric = res[idx] || break res # `res` as return value will not be needed after the loop
      idx += 1 # advance first, so that `next` below cannot stall on a childless metric
      children = metric_hiearchy[metric] || next
      res += children
    end

You could do something similar with an enumerator over res, keeping the same object. Note that in Ruby it is safe for our purposes to modify the underlying array, since the enumerator only keeps a reference to it (we only push at the end). Creating a new array and assigning it to res (i.e. using +) would not work. This also avoids duplicating temporary arrays.

    metrics = res.each
    loop do
      metric = metrics.next # StopIteration is handled by `loop`, no rescues needed
      children = metric_hiearchy[metric] || next
      res.push(*children) # splat children to have a flattened push
    end
    res # this way we have to return res here

You can shrink it further by using the usual enumerator form:

    res.each do |m|
      children = metric_hiearchy[m] || next
      res.push(*children)
    end # return value is already the `res` array we've been working on

I don't agree. I think that my solution clearly separates the result and the queue of pending metrics. It has more lines, yes, but I think it's easier to understand.
In the solutions that you proposed you are iterating over a growing array directly or via pointers. I find that more difficult to reason about.

The number of lines is not super important, but it is an indicator of complexity. I remember that I struggled to understand this code when I went over it. The concept of "readability" can be very subjective, sometimes depending on the familiarity of people with different code and styles, so I have gone over the code again and noted down some things. For the sake of discussion I've extracted the relevant code into a method, and lacking a better term, I've named it zipped_deep_map:

    def descendants(service_id, metric_name)
      metrics_hierarchy = hierarchy(service_id)
      children = metrics_hierarchy[metric_name] || []
      zipped_deep_map(children) do |m|
        metrics_hierarchy[m] || []
      end
    end

    private

    # zip each element with its mapped results
    def zipped_deep_map(ary, &blk)
      # ...
    end

Ok, here's the PR's version:

    def zipped_deep_map(ary, &blk)
      result = ary
      pending = ary.dup
      until pending.empty?
        e = pending.shift
        new_elements = blk.call e
        pending.push(*new_elements)
        result.push(*new_elements)
      end
      result
    end

Note: I've fixed the 2x duplicate array creation in each loop done by the += operations by replacing them with Array#push with a splat argument.
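To illustrate the difference mentioned in this note, here is a minimal sketch (the metric names are made up) showing that `Array#push` mutates the existing array, while `+=` builds a new array and rebinds the variable:

```ruby
res = %w[m1 m2]
same = res            # second reference to the same array object

res.push(*%w[m3 m4])  # mutates in place; `same` sees the new elements
pushed_in_place = same.equal?(res)

res += %w[m5]         # builds a new array and rebinds `res`
rebound = !same.equal?(res)
```

After the `+=`, `same` still points at the four-element array, which is why code that relies on a shared reference has to use `push`.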

So this is what I've noticed from the POV of readability/ease of understanding:

- The loop is not an enumerator method or a combination of them, but a manual loop in imperative style.
- Because of the above, the loop condition depends on understanding what effect the body has.
- The loop requires a set-up prologue that is only understood when you understand the loop body and condition.
- The body performs 2 different operations 3 times involving 3 different arrays (actually 4 but result is an alias of ary).
- The original array is modified as an intended side effect of the loop.
- The array over which the iteration is being done is modified twice in each iteration.

Well, here's a version in fewer lines that clearly separates the queue of pending metrics, does not even modify the looping array, uses a well-known enumerator rather than a manual loop requiring context or set-up, and only does one explicit unary array modification referred to by method name, flatten. This is arguably a more understandable version by the mentioned metrics:

    def zipped_deep_map(ary, &blk)
      ary.map do |e|
        new_elements = blk.call e
        [e, zipped_deep_map(new_elements, &blk)]
      end.flatten
    end

Of course, this snippet uses recursion, which in this particular case might be acceptable considering the depth levels, and it ends up combining a lot of arrays and generating a different order, but the key in readability is that it is not some arbitrary behaviour that needs to be parsed and analysed with multiple pieces needing to be considered together. It uses the well-known map and flatten methods instead.
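The "different order" can be seen by running both approaches on a small hierarchy. This is a sketch with made-up metric names; `queue_descendants` and `recursive_descendants` are illustrative names, and the queue version copies its input here (unlike the PR's version) so that both can be run on the same data:

```ruby
hierarchy = { 'a' => %w[a1 a2], 'a1' => %w[a1x] }
children_of = ->(m) { hierarchy[m] || [] }

# Queue-based approach: elements come out level by level (breadth-first).
def queue_descendants(ary, &blk)
  result = ary.dup
  pending = ary.dup
  until pending.empty?
    new_elements = blk.call(pending.shift)
    pending.push(*new_elements)
    result.push(*new_elements)
  end
  result
end

# Recursive map/flatten approach: each element is followed by its whole
# subtree (depth-first).
def recursive_descendants(ary, &blk)
  ary.map { |e| [e, recursive_descendants(blk.call(e), &blk)] }.flatten
end

breadth_first = queue_descendants(%w[a b], &children_of)
# => ["a", "b", "a1", "a2", "a1x"]
depth_first = recursive_descendants(%w[a b], &children_of)
# => ["a", "a1", "a1x", "a2", "b"]
```

Both return the same set of metrics, so the ordering only matters if a caller depends on it.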

The snippet below is the last proposed solution in my original comment:

    def zipped_deep_map(ary, &blk)
      ary.each do |e|
        new_elements = blk.call e
        ary.push(*new_elements)
      end
    end

This is shorter and simpler than the previous snippets in terms of number of lines and number of operations. The things I've noted are:

- It modifies the input array.
- The modified array is the one being iterated on.
- The loop only does 1 single explicit array operation involving 2 different arrays.
- It exploits the not-so-well-known yet logical behaviour of pushing to the array being enumerated.

Importantly, there is no explicit manual control over the loop: the behaviour of each is well-known, and understanding the method would at most require the clarification in the last bullet point. The body is small, there is no other duplicated array, and it is more efficient in time and especially in space. That space efficiency, the avoidance of recursion, the reduced surface for human error, and the fact that you can embed it directly at the same complexity as invoking it as a method are the reasons I'd prefer this version overall.
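The behaviour this version relies on, namely that elements pushed during `each` are visited by the same `each`, can be checked with a small sketch. The hierarchy and metric names below are made up for illustration:

```ruby
hierarchy = { 'hits' => %w[search list], 'search' => %w[search_by_name] }

visited = []
res = hierarchy['hits'].dup   # copy so the hierarchy itself is not modified
res.each do |m|
  visited << m
  children = hierarchy[m] || next
  res.push(*children)         # appended elements are visited later by the same `each`
end
```

Here `visited` ends up containing `search_by_name`, even though it was not in the array when the iteration started. The loop terminates only because the hierarchy has no cycles.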

I don't feel strongly about this, so at this point I'll just accept any of the three. The point is that readability/understandability is subjective, and it helps to note down what you feel detracts from it so that you can approach an objective notion (e.g. last&.save, which is odd to someone used to pre-2.3 code and more common in recent versions: it just depends on whether your brain is used to it and how it feels in combination with the rest of the code).

Regarding your first solution: I also thought about a recursive solution. I thought that it should not be a problem given the number of levels that we should expect, but was not too sure. If you think that it's not a problem, I think that something like this is better because it clearly expresses the idea that the descendants of a metric are its children plus the descendants of each child:

    def descendants(service_id, metric_name)
      metrics_hierarchy = hierarchy(service_id)
      children = metrics_hierarchy[metric_name] || []
      children.reduce(children) do |acc, child|
        acc + descendants(service_id, child)
      end
    end

My concern is not the "clearly expresses" part but the performance - recursion is probably never going to be very deep, but the overhead of that in performance should be controlled (ie. you are adding a big K in time). Plus we should use Array#push rather than + to avoid duplicating arrays while dropping temporary ones all the time.

We can't measure everything, so we need to develop some sense about what is performant and what is not. The sentence "not worrying (much) about performance until we demonstrate an issue" is featured in many horror tales ending with "and that's why we rewrote the whole thing". :P

My sense of performance says this is not O(kN) but O(Kn) with a big K, and the recursive function not being tail-call means an extra penalty (but I think Ruby does not even optimize it if it was).

I'd accept the recursive version with Array#push with the caveat that the next guy that will touch this code will not realize that it is recursive when we finally have arbitrary depths, so at least this would warrant a big note in case someone comes back at you saying "this crashed in production and you didn't warn me!".
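A recursive version using `Array#push` and carrying the kind of note asked for here might look like the sketch below. It is not the code that was merged: `collect_descendants` is a hypothetical name, and the `hierarchy` hash stands in for the result of `hierarchy(service_id)`.

```ruby
# Recursive sketch: the descendants of a metric are its children plus the
# descendants of each child.
#
# NOTE: this method is recursive. Its recursion depth equals the depth of the
# hierarchy, so a very deep (or cyclic) hierarchy would raise SystemStackError.
def collect_descendants(hierarchy, metric, acc = [])
  (hierarchy[metric] || []).each do |child|
    acc.push(child) # append to one shared array instead of building temporaries with `+`
    collect_descendants(hierarchy, child, acc)
  end
  acc
end

hierarchy = { 'm3' => %w[m2], 'm2' => %w[m1] }
result = collect_descendants(hierarchy, 'm3')
# => ["m2", "m1"]
```

Passing the accumulator down means a single result array is allocated per top-level call.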

For completeness, here's a quick benchmark:

    Comparison:
        each:  2163979.4 i/s
        orig:  1428006.8 i/s - 1.52x slower
         rec:  1317247.2 i/s - 1.64x slower

    Benchmarks finished

The first being the .each solution, the second the original PR and the last the fully recursive version (all of them using Array#push).
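The harness used for these numbers is not shown in the thread. As a rough sketch of how such a comparison could be set up with only core Ruby, the following times the three variants on a made-up 4-level hierarchy; the hierarchy, the iteration count, and the use of `Process.clock_gettime` are all assumptions, and the timings it prints will not match the figures above.

```ruby
hierarchy = { 'm4' => %w[m3], 'm3' => %w[m2], 'm2' => %w[m1] }
children_of = ->(m) { hierarchy[m] || [] }

variants = {
  # `each` over the array that is being pushed to
  each: lambda { |ary|
    ary.each { |e| ary.push(*children_of.call(e)) }
  },
  # original PR approach with a separate queue of pending elements
  orig: lambda { |ary|
    pending = ary.dup
    until pending.empty?
      new_elements = children_of.call(pending.shift)
      pending.push(*new_elements)
      ary.push(*new_elements)
    end
    ary
  }
}
# The recursive variant refers to itself, so it is added once the hash exists.
variants[:rec] = lambda { |ary|
  ary.map { |e| [e, variants[:rec].call(children_of.call(e))] }.flatten
}

iterations = 10_000
timings = variants.transform_values do |variant|
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { variant.call(hierarchy['m4'].dup) } # fresh copy per run
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

timings.each { |name, secs| puts format('%-5s %.4fs', name, secs) }
```

Wall-clock timings like these are noisy, so they are only useful for spotting large differences.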

Not that important unless this sits in the hot path of auths.

It might be ~1.5x slower but we don't really know if that is a problem.
What if this method represents only ~0.00...% of the total CPU time?

What if it does not? What if it does represent a lot in certain cases with lots of metrics? :D
Also, why should one piece of code be slow just because every other place is slow? :/

Death by a thousand cuts, ok, you win now - add it already!

That's why I mentioned several comments above that we need profiling to be sure about these kinds of things.

Changed.

unleashed · 2019-09-01T19:01:47Z


unleashed · 2019-09-01T22:38:51Z

-                memo[p_id] = memo[id]
+            is_set_op = Usage.is_set?(val)
+
+            while p_id


This would be a legitimate case for while p_id = parent_id(id) removing the assignments both above and below.

Not really, because id is defined in the outer loop.

But this is p_id, which you only use inside the loop :?

You haven't answered this comment, @davidor

I did. It doesn't work because id is in the outer loop.

So if we wanted to put the condition in the while, we'd need to do something like

    ...
    current_metric_id = id
    while (current_metric_id = parent_id(current_metric_id))
      ...
    end
    ...

Notice that we'd need to initialize the variable anyway, and whether that makes the code more readable is debatable.

You don't need to, it's enough to just name it id and avoid the special casing of the first id not being a real parent id, since you don't do anything particular with it. There's enough context to not need any distinction.

Edit: while id = parent_id(id)

How's that more readable? It shadows the id defined in do |memo, (id, val)|

It does not shadow it, the id yielded is never used again at all, just to bootstrap the process. IOW, you can extract the inner loop to a method and you'll have exactly the same code, and more readable.

I don't have a strong opinion about this. To me, the 2 solutions are roughly equivalent and it kind of comes to personal preference. I've changed it so we can move on.
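For reference, the `while id = parent_id(id)` form discussed in this thread can be sketched as follows. This is a simplified illustration, not the code in `lib/3scale/backend/metric/collection.rb`: the `parents` hash and the `parent_id` lambda are hypothetical stand-ins, and set operations (`is_set_op` in the diff) are ignored.

```ruby
# Hypothetical child => parent lookup standing in for the real parent_id.
parents = { 'm1' => 'm2', 'm2' => 'm3' }
parent_id = ->(id) { parents[id] }

usage = { 'm1' => 5 } # usage reported for the leaf metric only
usage.keys.each do |id| # iterate over a snapshot of the keys, since the hash grows
  val = usage[id]
  # Assignment in the condition: the loop ends when there is no parent (nil).
  while (id = parent_id.call(id))
    usage[id] = usage.fetch(id, 0) + val
  end
end
# usage is now { "m1" => 5, "m2" => 5, "m3" => 5 }
```

The yielded `id` is only used to start the walk up the hierarchy, which is the point made above about not needing a separate variable.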

unleashed

👍

davidor · 2019-09-03T14:09:58Z

bors r=@unleashed

119: Add support for metric hierarchies with more than 2 levels r=unleashed a=davidor

This PR addresses some of the points in #114

The PR adds support for metric hierarchies with more than 2 levels. For example:
`m1 --child_of--> m2 --child_of --> m3`

In particular:
- Auth, authrep, and report calls now take into account the whole hierarchy when applying limits.
- The XML returned in authrep calls now shows an updated value for the `current_value` field in all the metrics affected in the hierarchy.

This PR does not adapt the limits and the hierarchy extensions to work with metric hierarchies of more than 2 levels. That will be done in a separate PR.

Co-authored-by: David Ortiz <z.david.ortiz@gmail.com>

bors · 2019-09-03T14:13:18Z

Build succeeded

ci/circleci

davidor requested a review from unleashed August 28, 2019 08:44

unleashed reviewed Aug 28, 2019

davidor force-pushed the multi-level-metric-hierarchies branch 4 times, most recently from 111b878 to 2d011d0 Compare August 29, 2019 14:39

davidor requested a review from unleashed August 29, 2019 14:59

unleashed reviewed Sep 2, 2019

test/test_helpers: add helper to generate a metrics hierarchy

ee6bc0d

davidor force-pushed the multi-level-metric-hierarchies branch from 2d011d0 to 1535cc9 Compare September 2, 2019 09:46

davidor mentioned this pull request Sep 2, 2019

Test that the hierarchy extension works well for hierarchies with more than 2 levels #121

Merged

davidor force-pushed the multi-level-metric-hierarchies branch 2 times, most recently from 50a4562 to df760e7 Compare September 2, 2019 17:41

davidor requested a review from unleashed September 2, 2019 21:12

davidor force-pushed the multi-level-metric-hierarchies branch from df760e7 to 19264ee Compare September 3, 2019 10:04

davidor mentioned this pull request Sep 3, 2019

Limits Header extension: add support for metric hierarchies with more than 2 levels #122

Merged

davidor added 10 commits September 3, 2019 15:43

metric/collection: change #process_parents to process all the hierarchy

cb7f89f

test/unit/metric/collection: test hierarchies with n_levels >2

b2457bd

metric: add #descendants

4715357

test/unit/metric: test #descendants

259716e

usage_report: calculate usages for all descendants

aa8f02c

test/test_helpers/fixtures: add helper to set service with a metrics hierarchy

011697a

test/integration/authorize: test metric hierarchies with >2 levels

78c7e18

test/integration/oauth: test metric hierarchies with >2 levels

b1f6034

test/integration/authrep: test metric hierarchies with >2 levels

c8b3d00

test/integration/report: test metric hierarchies with >2 levels

a57ab99

davidor force-pushed the multi-level-metric-hierarchies branch from 19264ee to a57ab99 Compare September 3, 2019 13:43

unleashed approved these changes Sep 3, 2019

Comment thread lib/3scale/backend/metric/collection.rb

bors Bot merged commit a57ab99 into master Sep 3, 2019

bors Bot deleted the multi-level-metric-hierarchies branch September 3, 2019 14:13

davidor mentioned this pull request Sep 3, 2019

Ensure that metric hierarchies of more than 2 levels are supported #114

Closed
