Skip to content

Add support for metric hierarchies with more than 2 levels#119

Merged
bors[bot] merged 11 commits intomasterfrom
multi-level-metric-hierarchies
Sep 3, 2019
Merged

Add support for metric hierarchies with more than 2 levels#119
bors[bot] merged 11 commits intomasterfrom
multi-level-metric-hierarchies

Conversation

@davidor
Copy link
Copy Markdown
Contributor

@davidor davidor commented Aug 28, 2019

This PR addresses some of the points in #114

The PR adds support for metric hierarchies with more than 2 levels. For example:
m1 --child_of--> m2 --child_of --> m3

In particular:

  • Auth, authrep, and report calls now take into account the whole hierarchy when applying limits.
  • The XML returned in authrep calls now shows an updated value for the current_value field in all the metrics affected in the hierarchy.

This PR does not adapt the limits and the hierarchy extensions to work with metric hierarchies of more than 2 levels. That will be done in a separate PR.

@davidor davidor requested a review from unleashed August 28, 2019 08:44
Copy link
Copy Markdown
Contributor

@unleashed unleashed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good - check comments. Only minor concerns are ensuring we always behave like usages from the full hierarchy are being taken into account (ie. see comment about adding children usages) and making tests a bit more generic to cover N-level depths, with N > 3, possibly chosen at random.

Comment thread lib/3scale/backend/metric/collection.rb Outdated
Comment thread test/unit/metric/collection_test.rb Outdated
Comment thread test/test_helpers/fixtures.rb Outdated
Comment thread lib/3scale/backend/metric.rb Outdated
Comment thread lib/3scale/backend/metric.rb Outdated

res += children
pending += children
end
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty unidiomatic code and can be improved a fair bit. The suggestions below are totally untested, so take them with a grain of salt and consider any needed fixes when comparing to the original above.

The pending array could just use indexes to point to the next res element to handle, breaking when the result is nil, avoiding the duplicated memory and instead dealing with integers:

idx = 0
loop do
  metric = res[idx] || break res  # `res` as return value will not be needed after the loop
  children = metric_hiearchy[metric] || next
  res += children
  idx += 1
end

You could do a similar thing if you use an enumerator over res and keep the same object. Note that in Ruby it is safe to modify the underlying array for our purposes as only a reference to the object is kept (but only useful for us to push at the end), but creating a new array and assigning res to it wouldn't work (ie. using +). This also avoids duplicating temporary arrays.

metrics = res.each
loop do
  metric = metrics.next # StopIteration is handled by `loop`, no rescues needed
  children = metric_hiearchy[metric] || next
  res.push(*children) # splat children to have a flattened push
end
res  # this way we have to return res here

You can shrink it further by using the usual enumerator form:

res.each do |m|
  children = metric_hiearchy[m] || next
  res.push(*children)
end  # return value is already the `res` array we've been working on

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree. I think that my solution clearly separates the result and the queue of pending metrics. It has more lines, yes, but I think it's easier to understand.
In the solutions that you proposed you are iterating over a growing array directly or via pointers. I find that more difficult to reason about.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number of lines is not super important, but it is an indicator of complexity. I remember that I struggled to understand this code when I went over it. The concept of "readability" can be very subjective, sometimes depending on the familiarity of people with different code and styles, so I have gone over the code again and noted down some things. For the sake of discussion I've extracted the relevant code into a method, and lacking a better term, I've named it zipped_deep_map:

def descendants(service_id, metric_name)
  metrics_hierarchy = hierarchy(service_id)
  children = metrics_hierarchy[metric_name] || []

  zipped_deep_map(children) do |m|
    metrics_hierarchy[m] || []
  end
end

private

# zip each element with its mapped results
def zipped_deep_map(ary, &blk)
  # ...
end

Ok, here's the PR's version:

def zipped_deep_map(ary, &blk)
  result = ary
  pending = ary.dup

  until pending.empty?
    e = pending.shift
    new_elements = blk.call e
    pending.push(*new_elements)
    result.push(*new_elements)
  end

  result
end

Note: I've fixed the 2x duplicate array creation in each loop done by the += operations by replacing them with Array#push with a splat argument.

So this is what I've noticed from the POV of readability/ease of understanding:

  • The loop is not an enumerator method or a combination of them, but a manual loop in imperative style.
  • Because of the above, the loop condition depends on understanding what effect the body has.
  • The loop requires a set-up prologue that is only understood when you understand the loop body and condition.
  • The body performs 2 different operations 3 times involving 3 different arrays (actually 4 but result is an alias of ary).
  • The original array is modified as an intended side effect of the loop.
  • The array over which the iteration is being done is modified twice in each iteration.

Well, here's a version in less lines that clearly separates the queue of pending metrics, does not even modify the looping array, uses a well-known enumerator rather than a manual loop requiring context or set-up, and only does one explicit unary array modification referred to by method name, flatten. This is arguably a more understandable version by the mentioned metrics:

def zipped_deep_map(ary, &blk)
  ary.map do |e|
    new_elements = blk.call e
    [e, zipped_deep_map(new_elements, &blk)]
  end.flatten
end

Of course, this snippet uses recursion, which in this particular case might be acceptable considering the depth levels, and it ends up combining a lot of arrays and generating a different order, but the key in readability is that it is not some arbitrary behaviour that needs to be parsed and analysed with multiple pieces needing to be considered together. It uses the well-known map and flatten methods instead.

The snippet below is the last proposed solution in my original comment:

def zipped_deep_map(ary, &blk)
  ary.each do |e|
    new_elements = blk.call e
    ary.push(*new_elements)
  end
end

This is shorter and simpler than the previous snippets in terms of number of lines and number of operations. The things I've noted are:

  • It modifies the input array.
  • The modified array is the one being iterated on.
  • The loop only does 1 single explicit array operation involving 2 different arrays.
  • It exploits the not-so-well-known yet logical behaviour of pushing to the array being enumerated.

Importantly, there is no explicit manual control over the loop, the behaviour of each is well-known and understanding the method would at most require the clarification in the last bullet point. The body is small, there is no other duplicated array, and it is more efficient in both time and very much in space, with the latter point, the avoidance of recursion and reduced surface for human error, and the fact you can embed it directly and be the same complexity as invoking it with a method being the reasons I'd prefer this version overall.

I don't feel strongly about this, so at this point I'll just accept any of the three, but the point is that readability/understandability has a subjective meaning and it helps noting down what you feel subtracts from it so that you can approach an objective notion (ie. like last&.save, which is odd to someone used to pre-2.3 code and is more common in recent versions, just depends on whether your brain is used to it and how it feels in combination of the rest of the code).

Copy link
Copy Markdown
Contributor Author

@davidor davidor Sep 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding your first solution. I also thought about a recursive solution. I thought that it should not be a problem given the number of levels that we should expect, but was not too sure. If you think that it's not a problem, I think that something like this is better because it clearly expresses the idea that the descendants of a metrics are its children plus the descendants of each child:

        def descendants(service_id, metric_name)
          metrics_hierarchy = hierarchy(service_id)
          children = metrics_hierarchy[metric_name] || []

          children.reduce(children) do |acc, child|
            acc + descendants(service_id, child)
          end
        end

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is not the "clearly expresses" part but the performance - recursion is probably never going to be very deep, but the overhead of that in performance should be controlled (ie. you are adding a big K in time). Plus we should use Array#push rather than + to avoid duplicating arrays while dropping temporary ones all the time.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't measure everything, so we need to develop some sense about what is performant and what is not. The sentence "not worrying (much) about performance until we demonstrate an issue" is featured in many horror tales ending with "and that's why we rewrote the whole thing". :P

My sense of performance says this is not O(kN) but O(Kn) with a big K, and the recursive function not being tail-call means an extra penalty (but I think Ruby does not even optimize it if it was).

I'd accept the recursive version with Array#push with the caveat that the next guy that will touch this code will not realize that it is recursive when we finally have arbitrary depths, so at least this would warrant a big note in case someone comes back at you saying "this crashed in production and you didn't warn me!".

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For completeness, here's a quick benchmark:

Comparison:
                each:  2163979.4 i/s
                orig:  1428006.8 i/s - 1.52x  slower
                 rec:  1317247.2 i/s - 1.64x  slower

Benchmarks finished

The first being the .each solution, the second the original PR and the last the fully recursive version (all of them using Array#push).

Not that important unless this sits in the hot path of auths.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be ~1.5x slower but we don't really know if that is a problem.
What if this method represents only ~0.00...% of the total CPU time?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if it does not? What if it does represent a lot in certain cases with lots of metrics? :D
Also, why should one piece of code be slow just because every other place is slow? :/

Death by a thousand cuts, ok, you win now - add it already!

Copy link
Copy Markdown
Contributor Author

@davidor davidor Sep 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why I mentioned several comments above that we need profiling to be sure about this kind of things.

Changed.

Comment thread lib/3scale/backend/metric.rb Outdated
Comment thread test/unit/metric_test.rb Outdated
Comment thread lib/3scale/backend/transactor/usage_report.rb
Comment thread lib/3scale/backend/transactor/usage_report.rb
@davidor davidor force-pushed the multi-level-metric-hierarchies branch 4 times, most recently from 111b878 to 2d011d0 Compare August 29, 2019 14:39
@davidor davidor requested a review from unleashed August 29, 2019 14:59

res += children
pending += children
end
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number of lines is not super important, but it is an indicator of complexity. I remember that I struggled to understand this code when I went over it. The concept of "readability" can be very subjective, sometimes depending on the familiarity of people with different code and styles, so I have gone over the code again and noted down some things. For the sake of discussion I've extracted the relevant code into a method, and lacking a better term, I've named it zipped_deep_map:

def descendants(service_id, metric_name)
  metrics_hierarchy = hierarchy(service_id)
  children = metrics_hierarchy[metric_name] || []

  zipped_deep_map(children) do |m|
    metrics_hierarchy[m] || []
  end
end

private

# zip each element with its mapped results
def zipped_deep_map(ary, &blk)
  # ...
end

Ok, here's the PR's version:

def zipped_deep_map(ary, &blk)
  result = ary
  pending = ary.dup

  until pending.empty?
    e = pending.shift
    new_elements = blk.call e
    pending.push(*new_elements)
    result.push(*new_elements)
  end

  result
end

Note: I've fixed the 2x duplicate array creation in each loop done by the += operations by replacing them with Array#push with a splat argument.

So this is what I've noticed from the POV of readability/ease of understanding:

  • The loop is not an enumerator method or a combination of them, but a manual loop in imperative style.
  • Because of the above, the loop condition depends on understanding what effect the body has.
  • The loop requires a set-up prologue that is only understood when you understand the loop body and condition.
  • The body performs 2 different operations 3 times involving 3 different arrays (actually 4 but result is an alias of ary).
  • The original array is modified as an intended side effect of the loop.
  • The array over which the iteration is being done is modified twice in each iteration.

Well, here's a version in less lines that clearly separates the queue of pending metrics, does not even modify the looping array, uses a well-known enumerator rather than a manual loop requiring context or set-up, and only does one explicit unary array modification referred to by method name, flatten. This is arguably a more understandable version by the mentioned metrics:

def zipped_deep_map(ary, &blk)
  ary.map do |e|
    new_elements = blk.call e
    [e, zipped_deep_map(new_elements, &blk)]
  end.flatten
end

Of course, this snippet uses recursion, which in this particular case might be acceptable considering the depth levels, and it ends up combining a lot of arrays and generating a different order, but the key in readability is that it is not some arbitrary behaviour that needs to be parsed and analysed with multiple pieces needing to be considered together. It uses the well-known map and flatten methods instead.

The snippet below is the last proposed solution in my original comment:

def zipped_deep_map(ary, &blk)
  ary.each do |e|
    new_elements = blk.call e
    ary.push(*new_elements)
  end
end

This is shorter and simpler than the previous snippets in terms of number of lines and number of operations. The things I've noted are:

  • It modifies the input array.
  • The modified array is the one being iterated on.
  • The loop only does 1 single explicit array operation involving 2 different arrays.
  • It exploits the not-so-well-known yet logical behaviour of pushing to the array being enumerated.

Importantly, there is no explicit manual control over the loop, the behaviour of each is well-known and understanding the method would at most require the clarification in the last bullet point. The body is small, there is no other duplicated array, and it is more efficient in both time and very much in space, with the latter point, the avoidance of recursion and reduced surface for human error, and the fact you can embed it directly and be the same complexity as invoking it with a method being the reasons I'd prefer this version overall.

I don't feel strongly about this, so at this point I'll just accept any of the three, but the point is that readability/understandability has a subjective meaning and it helps noting down what you feel subtracts from it so that you can approach an objective notion (ie. like last&.save, which is odd to someone used to pre-2.3 code and is more common in recent versions, just depends on whether your brain is used to it and how it feels in combination of the rest of the code).

Comment thread lib/3scale/backend/transactor/usage_report.rb
Comment thread test/test_helpers/metrics_hierarchy.rb
Comment thread test/unit/metric/collection_test.rb Outdated
Comment thread test/unit/metric_test.rb Outdated
Comment thread test/test_helpers/fixtures.rb Outdated
Comment thread test/test_helpers/fixtures.rb Outdated
Comment thread lib/3scale/backend/metric/collection.rb Outdated
memo[p_id] = memo[id]
is_set_op = Usage.is_set?(val)

while p_id
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a legitimate case for while p_id = parent_id(id) removing the assignments both above and below.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, because id is defined in the outer loop.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is p_id, which you only use inside the loop :?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You haven't answered this comment, @davidor

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did. It doesn't work because id is in the outer loop.

So if we wanted to put the condition in the while, we'd need to do something like

...
current_metric_id = id
while (current_metric_id = parent_id(current_metric_id))
...
end
...

Notice that we'd need to initialize the variable anyway, and whether that makes the code more readable is debatable.

Copy link
Copy Markdown
Contributor

@unleashed unleashed Sep 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to, it's enough to just name it id and avoid the special casing of the first id not being a real parent id, since you don't do anything particular with it. There's enough context to not need any distinction.

Edit: while id = parent_id(id)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How's that more readable? It shadows the id defined in do |memo, (id, val)|

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not shadow it, the id yielded is never used again at all, just to bootstrap the process. IOW, you can extract the inner loop to a method and you'll have exactly the same code, and more readable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion about this. To me, the 2 solutions are roughly equivalent and it kind of comes to personal preference. I've changed it so we can move on.

@davidor davidor force-pushed the multi-level-metric-hierarchies branch from 19264ee to a57ab99 Compare September 3, 2019 13:43
Copy link
Copy Markdown
Contributor

@unleashed unleashed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment thread lib/3scale/backend/metric/collection.rb
@davidor
Copy link
Copy Markdown
Contributor Author

davidor commented Sep 3, 2019

bors r=@unleashed

bors Bot added a commit that referenced this pull request Sep 3, 2019
119: Add support for metric hierarchies with more than 2 levels r=unleashed a=davidor

This PR addresses some of the points in #114 

The PR adds support for metric hierarchies with more than 2 levels. For example:
`m1 --child_of--> m2 --child_of --> m3`

In particular:
- Auth, authrep, and report calls now take into account the whole hierarchy when applying limits.
- The XML returned in authrep calls now shows an updated value for the `current_value` field in all the metrics affected in the hierarchy.

This PR does not adapt the limits and the hierarchy extensions to work with metric hierarchies of more than 2 levels. That will be done in a separate PR.

Co-authored-by: David Ortiz <z.david.ortiz@gmail.com>
@bors
Copy link
Copy Markdown
Contributor

bors Bot commented Sep 3, 2019

Build succeeded

@bors bors Bot merged commit a57ab99 into master Sep 3, 2019
@bors bors Bot deleted the multi-level-metric-hierarchies branch September 3, 2019 14:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants