Feature/delete service stats keys generator #74
Conversation
In general, I found the code a bit hard to follow. Here are a few reasons why I think the code might be more complex than it should be:
Some comments: 👍 WIP 👎 The algorithm is challenging. IMO the Enumerators technique is an elegant tool to implement key generators, especially when there are different key types and each key contains several parts, each of which is itself a generator. Let's put off this discussion until you see the tests and have an overview of all the requirements of the algorithm. 👎 My friend Rubocop would complain about functions with too many arguments. They are all encapsulated in a job, which is a data container. Each generator takes only the necessary part from that container. 👍 Done
More comments: 👍 Will be updated. 👎 We need to be very careful: the granularity set applied for a generator depends on the key type. For instance, service type keys do not have […]
Force-pushed from b31d7e4 to f1f2cb5
That's all about some serialization methods and an internal private validation method; I do not see coupling issues.
mikz left a comment:
For an outside contributor, this seems overcomplicated. I don't really see the need for enumerators or generators and factories.
This snippet does 80% of the work with 20% of the code. The only missing part is generating periods, which still can be done quite nicely in a few lines of code.
```ruby
class SimpleKeyGenerator
  attr_reader :service_id, :applications, :metrics, :users

  def initialize(service_id:, applications: [], metrics: [], users: [], from:, to:)
    @service_id = service_id
    @applications = applications
    @metrics = metrics
    @users = users
    @from = from
    @to = to
  end

  def periods
    {
      hour: ['generate hours'],
      day: ['generate days']
    }
  end

  def to_a
    periods.flat_map do |granularity, start_time|
      RESPONSE_CODES.flat_map do |response_code|
        # stats/{service:#{service_id}}/response_code:#{response_code}/#{period_granularity}[:#{period_start_time_compacted_to_seconds}]
        %W[
          stats/{service:#{service_id}}/response_code:#{response_code}/#{granularity}[:#{start_time}]
        ] +
        applications.flat_map do |application_id|
          # stats/{service:#{service_id}}/cinstance:#{application_id}/response_code:#{response_code}/#{period_granularity}[:#{period_start_time_compacted_to_seconds}]
          "stats/{service:#{service_id}}/cinstance:#{application_id}/response_code:#{response_code}/#{granularity}[:#{start_time}]"
        end +
        users.flat_map do |user_id|
          # stats/{service:#{service_id}}/uinstance:#{user_id}/response_code:#{response_code}/#{period_granularity}[:#{period_start_time_compacted_to_seconds}]
          "stats/{service:#{service_id}}/uinstance:#{user_id}/response_code:#{response_code}/#{granularity}[:#{start_time}]"
        end
      end +
      metrics.flat_map do |metric_id|
        # stats/{service:#{service_id}}/metric:#{metric_id}/#{period_granularity}[:#{period_start_time_compacted_to_seconds}]
        %W[
          stats/{service:#{service_id}}/metric:#{metric_id}/#{granularity}[:#{start_time}]
        ] +
        applications.flat_map do |application_id|
          # stats/{service:#{service_id}}/cinstance:#{application_id}/metric:#{metric_id}/#{period_granularity}[:#{period_start_time_compacted_to_seconds}]
          "stats/{service:#{service_id}}/cinstance:#{application_id}/metric:#{metric_id}/#{granularity}[:#{start_time}]"
        end +
        users.flat_map do |user_id|
          # stats/{service:#{service_id}}/uinstance:#{user_id}/metric:#{metric_id}/#{period_granularity}[:#{period_start_time_compacted_to_seconds}]
          "stats/{service:#{service_id}}/uinstance:#{user_id}/metric:#{metric_id}/#{granularity}[:#{start_time}]"
        end
      end
    end
  end
end
```
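The snippet leaves `periods` as a placeholder. A minimal sketch of how the period start times could be generated, assuming fixed-size steps for `hour` and `day` (the class name and the compact `%Y%m%d%H` format are illustrative assumptions, not the PR's code; calendar-aware granularities like week, month or eternity would need extra logic):

```ruby
require 'time'

# Hypothetical sketch (not the PR's implementation) of generating period
# start times for the placeholder `periods` method. Fixed-size steps are
# assumed for hour/day; the compacted timestamp format is an assumption.
class PeriodEnumerator
  STEPS = { hour: 3600, day: 86_400 }.freeze

  def initialize(from, to)
    @from = from.to_i
    @to = to.to_i
  end

  # Returns e.g. { hour: ["2019010100", ...], day: [...] }
  def periods
    STEPS.each_with_object({}) do |(granularity, step), acc|
      aligned = (@from / step) * step # snap to the period boundary
      acc[granularity] = (aligned..@to).step(step).map do |t|
        Time.at(t).utc.strftime('%Y%m%d%H')
      end
    end
  end
end
```

These start times could then be plugged into the `periods` hash of the snippet above in place of the `'generate hours'` placeholders.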
@mikz fair enough. I guess this needs some explanation. My first implementation was the same, using nested iterators and appending results:

```ruby
applications.each do |app|
  metrics.each do |metric|
    periods.each do |period|
      key_formatter(metric, app, period)
    end
  end
end + ...
```

You can even use:

```ruby
applications.product(metrics, periods).map do |app, metric, period|
  key_formatter(metric, app, period)
end
```

This is a simple implementation and easy to understand. However, IMO, it is tightly coupled and has no separation of concerns at all. If you ever need to extend it, you would need to modify it. It breaks the S, O and D of the SOLID principles. Does it work? Yeah, it does. So I took a step back and analyzed the problem to give a more generalized solution. We need to generate keys. There are different key types:

```yaml
- keytype:
    name: A
    parts:
      - apps:
          generators:
            - Apps Generator
      - metrics:
          generators:
            - Metric Generator
      - period:
          generators:
            - year Generator
            - day Generator
            - month Generator
- keytype:
    name: B
    parts:
      - metrics:
          generators:
            - Metric Generator
      - period:
          generators:
            - day Generator
            - month Generator
```

So, based on these concepts, defining new keys or updating existing keys is not a task of developing new nested iterators; it is a definition/configuration task. A more declarative style. Let's see how we define some KeyType:

```ruby
response_code_keypart = KeyPart.new(:response_code)
response_code_keypart << ResponseCodeKeyPartGenerator.new(job)

application_keypart = KeyPart.new(:application)
application_keypart << AppKeyPartGenerator.new(job)

key_type = KeyType.new(key_formatter)
key_type << response_code_keypart
key_type << application_keypart
```

This can be improved, no doubt. It should be very easy to implement a factory that generates this code out of a definition like the YAML above. Regarding an outside contributor who needs to add a new key type or update an existing one: he/she only needs to work on configuration issues, not on the implementation details of nested iterators. I would be more than happy to hear the opinion of an outside contributor. This is the value proposal of this PR. If you do not approve it, it has to be thrown away and reimplemented from scratch. This proposal works. It generates keys in a streaming way, instead of loading them all in memory, and it tries to provide a flexible and easy way to define/update keys. Finally, I propose to go on with this proposal. Once merged, feel free to open a PR with an alternative implementation that does 80% of the work with 20% of the code. I will be more than happy to discuss it.
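For readers following along, the composition idea can be sketched roughly like this (class bodies here are illustrative guesses, not the PR's actual code): each `KeyPart` aggregates generators, and a `KeyType` walks the cartesian product of its parts lazily via `Enumerator`, so keys are produced one at a time instead of being materialized up-front:

```ruby
# Illustrative sketch of the KeyPart/KeyType composition, not the PR's code.
class KeyPart
  attr_reader :id, :generators

  def initialize(id)
    @id = id
    @generators = []
  end

  def <<(generator)
    generators << generator
    self
  end

  # Lazily yields every value produced by every generator of this part
  def each_value
    return enum_for(:each_value) unless block_given?
    generators.each { |g| g.each { |v| yield v } }
  end
end

class KeyType
  def initialize(formatter)
    @formatter = formatter
    @parts = []
  end

  def <<(part)
    @parts << part
    self
  end

  # Enumerates the cartesian product of all parts without materializing it
  def each_key
    return enum_for(:each_key) unless block_given?
    combine(@parts.map(&:each_value)) { |values| yield @formatter.call(values) }
  end

  private

  # Depth-first walk over the product of the part enumerators
  def combine(enums, acc = [], &blk)
    return blk.call(acc) if enums.empty?
    enums.first.each { |v| combine(enums[1..], acc + [v], &blk) }
  end
end
```

With plain arrays standing in for generators, `KeyType#each_key` returns an `Enumerator` that can be consumed incrementally (`first(n)`) or fully (`to_a`).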
@eguzki No matter how you keep explaining it, it is still overcomplicated. There are no new keys being added and there is no need for a generalized solution. The root of all evil is imaginary problems. You are right that there is no need for generating this from YAML, as there is no need for the generators and factories. It is a useless wall of code that is hard to understand and make sense of, when it can be implemented in 15 lines of code.
Using Array#product loads all the keys in memory: 300K stats keys per metric, application, user and year.
@eguzki from what I see, your enumerators are doing pretty much the same; eventually the array has to be constructed anyway, right?
@mikz enumerators only generate keys as long as you keep requesting them.
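The memory argument can be seen with a toy comparison (illustrative, not code from this PR):

```ruby
# Illustrative contrast between eager and lazy key generation.
# Array#product materializes the full cross product as an array up-front;
# Enumerator::Lazy yields pairs only as they are requested.
apps    = (1..1_000).to_a
metrics = %w[m1 m2]

# Eager: builds the whole 2000-element array, then takes 3
eager = apps.product(metrics).first(3)

# Lazy: computes only the 3 requested pairs
lazy = apps.lazy.flat_map { |a| metrics.map { |m| [a, m] } }.first(3)
```

Both yield the same first pairs, but only the eager version pays the full allocation cost regardless of how many keys the consumer actually reads.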
@eguzki sure, but why can't you do that with […]? What do others think? @davidor @unleashed
@eguzki I don't know whether you saw my comment above, but I don't think this is in a mergeable state, and I think we can wait to land this with some degree of consensus. I agree with Michal that this looks overly complex for the task, and it could also reuse existing code.
I prefer @mikz's solution because it can be easily understood. Although I think we'd need to adapt it a bit, at least to use the Key helpers defined in https://github.com/3scale/apisonator/blob/master/lib/3scale/backend/stats/keys.rb or, alternatively, if that module cannot be easily used from the code we need in this PR, it should be adapted. I think we should adopt @mikz's solution unless there's a compelling reason against it.
```ruby
context 'responsecode_service keys' do
  let(:expected_keys) do
    %w[200 2XX 403 404 4XX 500 503 5XX].product(%i[hour day week month eternity]).map do |code, gr|
      ThreeScale::Backend::Stats::Keys.service_response_code_value_key(service_id, code, ThreeScale::Backend::Period[gr].new(from))
```
Would be good to have a comment somewhere explaining that these tests just check that a subset of the keys that are supposed to be generated are there.
Each context checks a subset of keys. Is that what you mean?
@eguzki The code looks good to me 👍 I have just one comment. I saw that in the tests, if I'm not mistaken, you're not verifying that all the keys that should be generated actually were. Also, you're not checking that unnecessary keys were not generated. You're just checking a subset. Any reason to do that? It should probably be documented.
I am verifying that all the keys that should be generated actually are. Am I missing something? True that I am not verifying that unnecessary keys are not generated. Will do it.
@eguzki Take, for example, […]
For simplicity, the time frame was one unit: one day, one hour, one month, one year... I will open the time window to a month.
@davidor tests now check a time window of one month (do you want one year? :=) ) and check that extra keys are not generated.
```ruby
    is_expected.to include(*expected_keys_usage_user)
  end
end

context 'usage user keys' do
```
Would the contain_exactly matcher be more appropriate? https://relishapp.com/rspec/rspec-expectations/v/3-8/docs/built-in-matchers/contain-exactly-matcher
```ruby
RSpec.describe [1, 2, 3] do
  it { is_expected.to contain_exactly(1, 2) }
end
```

does not pass. We are checking subsets because it is easier to detect errors when one subset is not generated appropriately.
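The difference between the two matchers, expressed in plain Ruby terms (illustrative, not the PR's specs): `include` only requires the expected keys to be a subset of what was generated, while `contain_exactly` additionally fails when extra keys are present.

```ruby
# Plain-Ruby illustration of the matcher semantics under discussion.
generated       = %w[key_a key_b key_c]
expected_subset = %w[key_a key_b]

# Like include(*expected_subset): passes if every expected key was generated
subset_ok = (expected_subset - generated).empty?

# Like contain_exactly(*expected_subset): also fails on extras such as key_c
exact_ok = generated.sort == expected_subset.sort
```

So `contain_exactly` would need the full expected key set per example, whereas `include` lets each context assert its own subset.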
@eguzki I think that the […]. Everything else looks good to me. Can you please squash the old commits before merging?
Merging to the integration branch.
@davidor I forgot to squash the old commits. Will do it before opening the PR to master.
Stats key generator used by the `Stats::PartitionEraserJob` and `Stats::PartitionGeneratorJob` background jobs.