
Feature/delete service stats keys generator#74

Merged
eguzki merged 28 commits into feature/delete-service-stats-integration from feature/delete-service-stats-keys-generator
Feb 28, 2019
Conversation

@eguzki
Member

@eguzki eguzki commented Feb 6, 2019

Stats key generator used by Stats::PartitionEraserJob and Stats::PartitionGeneratorJob background jobs

@eguzki eguzki changed the base branch from master to feature/delete-service-stats-integration February 6, 2019 11:39
@eguzki eguzki requested review from davidor and unleashed and removed request for unleashed February 6, 2019 11:40
@eguzki eguzki mentioned this pull request Feb 6, 2019
3 tasks
Comment thread lib/3scale/backend/stats/commons.rb Outdated
Comment thread lib/3scale/backend/stats/commons.rb Outdated
@davidor
Contributor

davidor commented Feb 18, 2019

In general, I found the code a bit hard to follow. Here are a few reasons why I think the code might be more complex than it should be:

  • There are no tests.
  • In my opinion, the way Enumerators are used makes it more difficult to understand the code and to know which keys are generated where.
  • KeyTypesFactory is coupled with DeleteJobDef. I think that the generator of keys should only receive the params it cares about (service, apps, metrics, users, to, from).
  • The code generates invalid keys. It generates invalid response code "buckets" ("1XX", "9XX", etc.), and it also generates keys for specific codes like 200, 403, etc. which are not used.
  • PartitionGenerator receives an instance of KeyGenerator, but it only needs to know the total number of keys. I'd say this is an unnecessary coupling. Moreover, this is a class that only calls .step; I think we should not define a new class just for this.
  • There are a few things that can be simplified. For example, there's a ServiceKeyPartGenerator, but it feels unnecessary to me because, in the job, there's only one service ID. Same for ResponseCodeKeyPartGenerator: the list of "buckets" for response codes is fixed and defined in a constant of the Stats::Commons module.
  • I think that the KeyPartGenerators duplicate code already present in the Period module. For example, the .succ method of that module could be used to replace code in KeyPartGenerators.

@eguzki
Member Author

eguzki commented Feb 21, 2019

Some comments:

There are no tests.

👍 WIP

In my opinion, the way Enumerators are used makes the code more difficult to understand and know which keys are generated where.

👎 The algorithm is challenging. IMO, the Enumerator technique is an elegant way to implement key generators, especially when there are different key types and each key contains several parts, each of which is itself a generator. Let's put off this discussion until you see the tests and have an overview of all the requirements of the algorithm.
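To illustrate the idea (a simplified sketch with hypothetical names, not the PR's actual classes): chained Enumerators let each key part produce its elements on demand, so the full key list never has to live in memory.

```ruby
# Simplified sketch of the Enumerator-based approach (hypothetical names,
# not the PR's classes): key parts feed a lazy key stream.
metrics = %w[10 20]       # part data taken from the job
periods = %w[day month]   # part data taken from a fixed list

keys = Enumerator.new do |yielder|
  metrics.each do |metric_id|
    periods.each do |granularity|
      yielder << "stats/metric:#{metric_id}/#{granularity}"
    end
  end
end

# Keys are produced only as they are requested:
keys.first(3)
# => ["stats/metric:10/day", "stats/metric:10/month", "stats/metric:20/day"]
```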

KeyTypesFactory is coupled with DeleteJobDef. I think that the generator of keys should only receive the params it cares about (service, apps, metrics, users, to, from).

👎 My friend Rubocop would complain about functions with too many arguments. The params are all encapsulated in a job, which is a data container, and each generator takes only the parts it needs from that container.

The code generates invalid keys. It generates invalid response code "buckets" ("1XX", "9XX", etc.), and also, it generates keys for specific codes like 200, 403, etc. which are not used.

👍 Done

@eguzki
Member Author

eguzki commented Feb 21, 2019

more comments

PartitionGenerator receives an instance of KeyGenerator, but it only needs to know the total number of keys. I'd say this is an unnecessary coupling. Moreover, this is a class that only calls .step. I think we should not define a new class just for this.

👍 will be updated.

There are a few things that can be simplified. For example, there's a ServiceKeyPartGenerator but it feels unnecessary to me because, in the job, there's only one service ID. Same for ResponseCodeKeyPartGenerator, the list of "buckets" for response codes is fixed and defined in a constant of the Stats::Commons module.

👍 about ServiceKeyPartGenerator. The service ID could be passed to the key formatters (those building the key string out of all the parts) instead of being another part of the key with its own generator. Will update.

👎 Regarding ResponseCodeKeyPartGenerator, I disagree. This is all about generators. In this specific case, the generator just needs to iterate over a list of response codes, but it is still a generator. The only difference is the source of the list: other generators take their data from the Resque job, while ResponseCodeKeyPartGenerator takes its data from an array defined in Commons. But it still needs to generate items in the same way.

I think that the KeyPartGenerators duplicates code already present in the Period module. For example, the .succ method of that module could be used to replace code in KeyPartGenerators

👎 We need to be very careful. The granularity set applied to a generator depends on the key type. For instance, service-type keys do not have year as a period. We cannot use the .succ method; we need to define closed groups of granularities to be used by each key type. This is done in the Stats::Commons module.
Anyway, if there is another place where we can reuse code from the Period module, I would be happy to hear about it and avoid duplication.
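Those "closed groups of granularities" could be sketched like this (constant and method names are made up for illustration; the real definitions live in Stats::Commons):

```ruby
# Hypothetical granularity groups per key type; service keys omit :year,
# as described above. Names are illustrative, not the real constants.
SERVICE_GRANULARITIES = %i[eternity month week day hour].freeze
EXPANDED_GRANULARITIES = (SERVICE_GRANULARITIES + %i[year]).freeze

def granularities_for(key_type)
  key_type == :service ? SERVICE_GRANULARITIES : EXPANDED_GRANULARITIES
end
```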

@eguzki eguzki force-pushed the feature/delete-service-stats-keys-generator branch from b31d7e4 to f1f2cb5 on February 25, 2019 15:38
@eguzki
Member Author

eguzki commented Feb 26, 2019

That's all there is to ThreeScale::Backend::Stats::DeleteJobDef. A few attributes:

ATTRIBUTES = %i[service_id applications metrics users from to context_info].freeze

Some serialization methods, an internal private validation method, and a run_async method. Maybe run_async should not be part of ThreeScale::Backend::Stats::DeleteJobDef, but this is a very simple class.

I do not see coupling issues.
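As an illustration of how small such a data container can be, here is a sketch using a Struct with the attributes listed above (illustrative only; not the actual DeleteJobDef implementation):

```ruby
require 'json'

# Sketch of a job-definition data container with the attributes listed
# above (hypothetical; not the actual DeleteJobDef implementation).
DeleteJob = Struct.new(:service_id, :applications, :metrics, :users,
                       :from, :to, :context_info, keyword_init: true) do
  # Serialize the job to JSON for enqueueing.
  def to_json(*args)
    to_h.to_json(*args)
  end

  # Rebuild the job from a deserialized hash (string or symbol keys).
  def self.from_hash(hash)
    new(**hash.transform_keys(&:to_sym))
  end
end

job = DeleteJob.new(service_id: '7', applications: %w[1 2], metrics: %w[10],
                    users: [], from: 1, to: 2, context_info: {})
restored = DeleteJob.from_hash(JSON.parse(job.to_json))
```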

Contributor

@mikz mikz left a comment

For an outside contributor, this seems overcomplicated. I don't really see the need for enumerators or generators and factories.

This snippet does 80% of the work with 20% of the code. The only missing part is generating periods, which can still be done quite nicely in a few lines of code.

class SimpleKeyGenerator
  attr_reader :service_id, :applications, :metrics, :users

  def initialize(service_id:, applications: [], metrics: [], users: [], from:, to:)
    @service_id = service_id
    @applications = applications
    @metrics = metrics
    @users = users
    @from = from
    @until = to
  end

  def periods
    {
        hour: [ 'generate hours' ],
        day: [ 'generate days']
    }
  end

  def to_a
    periods.flat_map do |granularity, start_time|
      RESPONSE_CODES.flat_map do |response_code|
        # stats/{service:#{service_id}}/response_code:#{response_code}/#{period_granularity}[:#{period_start_time_compacted_to_seconds}]
        %W[
          stats/{service:#{service_id}}/response_code:#{response_code}/#{granularity}[:#{start_time}]
        ] +

          applications.flat_map do |application_id|
            # stats/{service:#{service_id}}/cinstance:#{application_id}/response_code:#{response_code}/#{period_granularity}[:#{period_start_time_compacted_to_seconds}]
            "stats/{service:#{service_id}}/cinstance:#{application_id}/response_code:#{response_code}/#{granularity}[:#{start_time}]"
          end +

          users.flat_map do |user_id|
            # stats/{service:#{service_id}}/uinstance:#{user_id}/response_code:#{response_code}/#{period_granularity}[:#{period_start_time_compacted_to_seconds}]
            "stats/{service:#{service_id}}/uinstance:#{user_id}/response_code:#{response_code}/#{granularity}[:#{start_time}]"
          end
      end +

        metrics.flat_map do |metric_id|
          # stats/{service:#{service_id}}/metric:#{metric_id}/#{period_granularity}[:#{period_start_time_compacted_to_seconds}]
          %W[
            stats/{service:#{service_id}}/metric:#{metric_id}/#{granularity}[:#{start_time}]
          ] +

            applications.flat_map do |application_id|
              # stats/{service:#{service_id}}/cinstance:#{application_id}/metric:#{metric_id}/#{period_granularity}[:#{period_start_time_compacted_to_seconds}]
              "stats/{service:#{service_id}}/cinstance:#{application_id}/metric:#{metric_id}/#{granularity}[:#{start_time}]"
            end +

            users.flat_map do |user_id|
              # stats/{service:#{service_id}}/uinstance:#{user_id}/metric:#{metric_id}/#{period_granularity}[:#{period_start_time_compacted_to_seconds}]
              "stats/{service:#{service_id}}/uinstance:#{user_id}/metric:#{metric_id}/#{granularity}[:#{start_time}]"
            end
        end
    end
  end
end

Comment thread lib/3scale/backend/stats/commons.rb Outdated
Comment thread lib/3scale/backend/stats/key_part_generators.rb Outdated
@eguzki
Member Author

eguzki commented Feb 27, 2019

@mikz fair enough. I guess this needs some explanation.

My first implementation was the same, using nested iterators and appending results.

applications.flat_map do |app|
  metrics.flat_map do |metric|
    periods.map do |period|
      key_formatter(metric, app, period)
    end
  end
end + ...

You can even use Array#product, although you would then have to load all the keys in memory:

applications.product(metrics, periods).map do |app, metric, period|
  key_formatter(metric, app, period)
end

This is a simple implementation and easy to understand. However, IMO, it is tightly coupled and has no separation of concerns at all: if you ever need to extend it, you have to modify it, which breaks the S, O and D of the SOLID principles. Does it work? Yeah, it does.

So I took a step back and analyzed the problem to give a more generalized solution. We need to generate keys. There are different key types. Each type is identified by a string formatter, and the formatter needs parts to compute the final key. Thus, a key type includes some Key Parts. Elements of the Key Parts are produced by Generators, and not only one: each part can be composed of several Generators. This is the case for the period. A single key part, period, includes part elements from several generators, i.e., year, month, day, etc. So the abstract diagram would be:

- keytype:
    name: A
    parts:
      - apps:
          generators:
            - Apps Generator
      - metrics:
          generators:
            - Metric Generator
      - period:
          generators:
            - Year Generator
            - Month Generator
            - Day Generator
- keytype:
    name: B
    parts:
      - metrics:
          generators:
            - Metric Generator
      - period:
          generators:
            - Month Generator
            - Day Generator

So, based on the concepts of KeyType, KeyPart and Generator, this PR is the implementation of that idea. It is the engine of a KeyGenerator whose scope need not be limited to just stats; we can use this generator for any kind of keys.

Defining new keys or updating existing keys is not a task of developing new nested iterators; it is a definition/configuration task, in a more declarative style. Let's see how we define a KeyType:

response_code_keypart = KeyPart.new(:response_code)           
response_code_keypart << ResponseCodeKeyPartGenerator.new(job)

application_keypart = KeyPart.new(:application)    
application_keypart << AppKeyPartGenerator.new(job)

key_type = KeyType.new(key_formatter)
key_type << response_code_keypart
key_type << application_keypart

This can be improved, no doubt. It should be very easy to implement a factory that generates this code out of a YAML object, for instance, but I thought there was no need for it. Anyway, it is easier to work with key parts and generators than with nested iterators and appended results, which is error-prone.
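For reference, the two classes used in the definition above could look roughly like this (a hypothetical minimal sketch, not the PR's implementation; here plain arrays stand in for generators, and part combinations are streamed through the formatter rather than collected up front):

```ruby
# Minimal sketch of the KeyPart/KeyType idea (hypothetical, not the PR's code).
class KeyPart
  def initialize(name)
    @name = name
    @generators = []
  end

  def <<(generator)
    @generators << generator
    self
  end

  # Yield every element from every generator of this part.
  def each
    return enum_for(:each) unless block_given?
    @generators.each { |g| g.each { |item| yield item } }
  end
end

class KeyType
  def initialize(formatter)
    @formatter = formatter
    @parts = []
  end

  def <<(part)
    @parts << part
    self
  end

  # Combine all parts and stream each combination through the formatter.
  def each
    return enum_for(:each) unless block_given?
    head, *rest = @parts.map { |p| p.each.to_a }
    head.product(*rest) { |combo| yield @formatter.call(*combo) }
  end
end

# Usage: anything that responds to #each can act as a generator.
formatter = ->(metric, granularity) { "stats/metric:#{metric}/#{granularity}" }
metric_part = KeyPart.new(:metric) << %w[10 20]
period_part = KeyPart.new(:period) << %w[day] << %w[month]
key_type = KeyType.new(formatter) << metric_part << period_part
```

Calling `key_type.each` then yields the four formatted keys, one combination at a time.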

Regarding an outside contributor who needs to add a new key type or update an existing one: they only need to work on configuration, not on the implementation details of nested iterators. I would be more than happy to hear the opinion of an outside contributor.

This is the value proposition of this PR. If you do not approve it, it has to be thrown away and reimplemented from scratch. This proposal works: it generates keys in a streaming way instead of loading them all in memory, and it tries to provide a flexible and easy way to define/update keys.

Finally, I propose to go on with this proposal. Once merged, feel free to open a PR with an alternative implementation that does 80% of the work with 20% of the code. I will be more than happy to discuss it.

@mikz
Contributor

mikz commented Feb 27, 2019

@eguzki No matter how you explain it, it is still overcomplicated. There are no new keys being added and there is no need for a generalized solution. The root of all evil is imaginary problems.

Using Array#product would be a nice improvement.

You are right that there is no need for generating this from YAML, just as there is no need for the generators and factories. It is a useless wall of code that is hard to understand and make sense of, when it could be implemented in 15 lines of code.

@eguzki
Member Author

eguzki commented Feb 27, 2019

Using Array#product loads all keys in memory.

300K keys of stats per metric, application, user and year.

@mikz
Contributor

mikz commented Feb 27, 2019

@eguzki from what I see, your enumerators are doing pretty much the same; eventually the array has to be constructed anyway, right?

@eguzki
Member Author

eguzki commented Feb 27, 2019

@mikz enumerators only generate keys as long as you keep requesting them.

@mikz
Contributor

mikz commented Feb 27, 2019

@eguzki sure, but why can't you do that with Array#product? ([ 1 ] * 300).enum_for(:product, [2] * 1000) gives you an enumerator that generates keys as you take them.
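That behavior is easy to check (a small illustration: Array#product with a block yields each combination to the block instead of building the full result array, so wrapping it with enum_for gives a lazy stream):

```ruby
apps    = %w[1 2 3]
metrics = %w[10 20]

# enum_for wraps product in an Enumerator, so combinations are yielded
# one by one as they are requested instead of being materialized at once.
pairs = apps.enum_for(:product, metrics)

pairs.first(3)
# => [["1", "10"], ["1", "20"], ["2", "10"]]
```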

@eguzki
Member Author

eguzki commented Feb 27, 2019

Finally, I propose to go on with this proposal. Once merged, feel free to open PR with alternative implementation that does 80% of the work with 20% of the code. I will be more than happy to discuss about it.

what do other think? @davidor @unleashed

@unleashed
Contributor

@eguzki I don't know whether you saw my comment above, but I don't think this is in a mergeable state and I think we can wait to land this with some degree of consensus. I agree with Michal this looks overly complex for the task and it could also reuse existing code.

@davidor
Contributor

davidor commented Feb 27, 2019

I prefer @mikz 's solution because it can be easily understood. Although I think we'd need to adapt it a bit, at least to use the Key helpers defined in https://github.com/3scale/apisonator/blob/master/lib/3scale/backend/stats/keys.rb or, alternatively, if that module cannot be easily used from the code we need in this PR, it should be adapted.

I think we should adopt @mikz 's solution unless there's a compelling reason against it.

@eguzki
Member Author

eguzki commented Feb 27, 2019

ready for review @mikz @davidor

Comment thread spec/unit/stats/key_generator_spec.rb Outdated
context 'responsecode_service keys' do
  let(:expected_keys) do
    %w[200 2XX 403 404 4XX 500 503 5XX].product(%i[hour day week month eternity]).map do |code, gr|
      ThreeScale::Backend::Stats::Keys.service_response_code_value_key(service_id, code, ThreeScale::Backend::Period[gr].new(from))
Contributor

Would be good to have a comment somewhere explaining that these tests just check that a subset of the keys that are supposed to be generated are there.

Member Author

Each context checks a subset of keys. Is that what you mean?

@davidor
Contributor

davidor commented Feb 28, 2019

@eguzki The code looks good to me 👍

I have just one comment. I saw that in the tests, if I'm not mistaken, you're not verifying that all the keys that should be generated actually were. Also, you're not checking that unnecessary keys were not generated. You're just checking a subset. Any reason to do that? It should probably be documented.

@eguzki
Member Author

eguzki commented Feb 28, 2019

I am verifying that all the keys that should be generated actually are. Am I missing something?

It's true that I am not verifying that unnecessary keys are not generated. Will do it.

@davidor
Contributor

davidor commented Feb 28, 2019

@eguzki Take, for example, context 'responsecode_service keys'. Any one of them passes because all of them follow the same pattern.

expected_keys uses the from parameter but not the to one. So, for example, if the period spans several months, only the first one will appear in the keys in expected_keys.

@eguzki
Member Author

eguzki commented Feb 28, 2019

For simplicity, the time frame spanned one unit: one day, one hour, one month, one year...

I will widen the time window to a month.

@eguzki eguzki requested a review from davidor February 28, 2019 10:56
@eguzki
Member Author

eguzki commented Feb 28, 2019

@davidor tests now check a time window of one month (do you want one year? :=) ) and check that extra keys are not generated.

is_expected.to include(*expected_keys_usage_user)
end
end
context 'usage user keys' do
Contributor

Member Author

@eguzki eguzki Feb 28, 2019


@davidor

RSpec.describe [1, 2, 3] do
  it { is_expected.to contain_exactly(1, 2) }
end

does not pass.

Why are we checking subsets? It is easier to detect errors when one subset is not generated appropriately.

Contributor

Ah right, of course 👍

@davidor
Contributor

davidor commented Feb 28, 2019

@eguzki I think that the contain_exactly matcher could help simplify some of the tests.

Everything else looks good to me. Can you please squash the old commits before merging?

@eguzki
Member Author

eguzki commented Feb 28, 2019

merging to integration branch

@eguzki eguzki merged commit 0e109f1 into feature/delete-service-stats-integration Feb 28, 2019
@bors bors Bot deleted the feature/delete-service-stats-keys-generator branch February 28, 2019 11:12
@eguzki
Member Author

eguzki commented Feb 28, 2019

@davidor forgot to squash the old commits; will do it before opening the PR to master.
