Feature/delete service stats worker jobs#73

Merged
eguzki merged 19 commits into feature/delete-service-stats-integration from
feature/delete-service-stats-worker-jobs
Mar 1, 2019

Conversation

@eguzki
Member

@eguzki eguzki commented Feb 6, 2019

  • Stats::PartitionEraserJob
    Background job to delete stats keys

  • Stats::PartitionGeneratorJob
    Background job that generates one Stats::PartitionEraserJob for each subset of stats keys
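A minimal, self-contained sketch of how the two jobs fit together. The real jobs are Resque jobs; KeyGenerator, the key format, and every parameter below are illustrative stand-ins, not the actual 3scale backend code.

```ruby
# Stand-ins for the generator/eraser split described above. Only the
# (offset, length) contract mirrors the PR discussion; names are made up.

PARTITION_BATCH_SIZE = 3

# stand-in for Stats::KeyGenerator: enumerates every stats key to delete
def stats_keys(service_id)
  (0...8).map { |i| "stats/{service:#{service_id}}/key#{i}" }
end

# Stats::PartitionGeneratorJob: computes (offset, length) partitions and
# would enqueue one Stats::PartitionEraserJob per partition
def partition_generator(service_id)
  total = stats_keys(service_id).size
  (0...total).step(PARTITION_BATCH_SIZE).map do |offset|
    [offset, [PARTITION_BATCH_SIZE, total - offset].min]
  end
end

# Stats::PartitionEraserJob: regenerates the keys and deletes only its slice
def partition_eraser(service_id, offset, length)
  stats_keys(service_id).drop(offset).take(length) # would be DELeted in Redis
end

partition_generator("42") # => [[0, 3], [3, 3], [6, 2]]
```

Note that each eraser regenerates the full key list and keeps only its slice — the duplication debated in the review below.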

@eguzki eguzki mentioned this pull request Feb 6, 2019
3 tasks
Comment thread lib/3scale/backend/stats/partition_eraser_job.rb Outdated
Comment thread lib/3scale/backend/stats/partition_eraser_job.rb Outdated
Comment thread lib/3scale/backend/stats/partition_generator_job.rb Outdated
Comment thread lib/3scale/backend/stats/partition_eraser_job.rb Outdated
@davidor
Contributor

davidor commented Feb 18, 2019

There are two main aspects of this PR that I think should be addressed before merging.

The first is that there are no tests. The second is that, in my opinion, the responsibilities of the two job types (PartitionGeneratorJob and PartitionEraserJob) are not well defined, and they do a lot of duplicated work.

PartitionGeneratorJob generates all the keys, but it discards them and passes each PartitionEraserJob only an offset. Then each PartitionEraserJob generates all the keys again and discards all of them except a few (it just picks N starting from the received offset). That is a lot of duplicated work.

Wouldn't it be better to generate all the keys in the generator job and just pass X of them to each of the delete jobs? That way, the jobs would not perform any duplicated work. I also think it would be conceptually easier to understand, because one job type would be responsible for generating all the keys, and the other just for deleting the keys it received.
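A sketch of this alternative (not what the PR implements): the generator materializes all keys once and hands each eraser job its slice directly, so nothing is regenerated. The key format and batch size are illustrative stand-ins.

```ruby
# Alternative proposed in the review: generate once, pass slices in job bodies.

all_keys = (0...10).map { |i| "stats/key#{i}" } # generated exactly once

batch_size = 4
# each slice would be serialized into one eraser job's body
eraser_payloads = all_keys.each_slice(batch_size).to_a

eraser_payloads.map(&:size) # => [4, 4, 2]
```

The cost, raised in the reply below, is that every key travels through the job body instead of being recomputed by the worker.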

@eguzki
Member Author

eguzki commented Feb 21, 2019

The design was made to avoid serializing/deserializing all the keys to be deleted to/from Redis. You are proposing to do exactly that.

Besides, the Resque job size (basically the JSON body size) can get big. The body size is controlled by the PARTITION_BATCH_SIZE configuration parameter. With PARTITION_BATCH_SIZE ~ 1000 keys, a PartitionEraserJob Resque job is roughly 1000 * (50 bytes per key) = 50 KB.

The trade-off is:

  • small PARTITION_BATCH_SIZE => small job bodies and lots of PartitionEraserJob jobs
  • big PARTITION_BATCH_SIZE => big job bodies and few PartitionEraserJob jobs. If a job fails, a new worker will process it again and the big partition will be erased again (many keys may already have been deleted)

Another implementation would pass indexes along with the jobs, so the eraser worker would only have to generate the required keys, hence no duplicated work. Resque jobs would stay small and no keys would be serialized to Redis. However, implementing indexes increases the complexity of the key generator, so I decided to leave that for now. Maybe a future improvement.
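A hypothetical sketch of what the index-based variant could look like: when the key space is a cross product (metrics × periods here, purely illustrative), the i-th key can be computed directly, so each eraser generates only its own slice instead of everything up to offset + length. None of these names exist in the PR.

```ruby
# Hypothetical index-based key generation: random access into the key space.

METRICS = %w[hits transfers errors].freeze
PERIODS = %w[day week month year].freeze

# compute the i-th key without generating the keys before it
def key_at(i)
  "stats/metric:#{METRICS[i / PERIODS.size]}/period:#{PERIODS[i % PERIODS.size]}"
end

def keys_for_partition(offset, length)
  (offset...offset + length).map { |i| key_at(i) } # only `length` keys built
end

keys_for_partition(5, 3)
# => ["stats/metric:transfers/period:week",
#     "stats/metric:transfers/period:month",
#     "stats/metric:transfers/period:year"]
```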

WDYT @davidor ? Can we afford big jobs and serializing all the keys including them in job bodies?

@eguzki eguzki force-pushed the feature/delete-service-stats-worker-jobs branch from 80526e4 to f6e960c Compare February 21, 2019 17:22
@eguzki
Member Author

eguzki commented Feb 21, 2019

Implemented PARTITION_BATCH_SIZE and DELETE_BATCH_SIZE as configuration params with default values.

@davidor
Contributor

davidor commented Feb 22, 2019

Serializing/deserializing could be a problem too. This is the kind of thing that needs to be measured.

What if Stats::PartitionGeneratorJob generated from/to timestamps instead of indexes? Would that be a valid option? That way, each job would be responsible for generating and deleting part of the keys without duplication and without needing to serialize/deserialize the keys.

For example, if we needed to delete a whole year of stats, we could generate one job for each month, or for each day. I'm not sure what the most appropriate granularity would be. Or maybe split by apps or metrics?
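A tiny sketch of this time-based partitioning idea (not in the PR): one eraser job per day of the deletion interval. The granularity and payload shape are assumptions.

```ruby
require 'date'

# Split a deletion interval into daily from/to partitions, one per job.
def time_partitions(from, to)
  (Date.parse(from)...Date.parse(to)).map do |day|
    { from: day.iso8601, to: (day + 1).iso8601 }
  end
end

time_partitions('2019-01-01', '2020-01-01').size # => 365 daily jobs for 2019
```

As noted in the reply below, the catch is that the number of keys per partition is not bounded by configuration this way.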

@eguzki
Member Author

eguzki commented Feb 22, 2019

Partitioning by time, applications, metrics or users is a coarse way of partitioning: you cannot control the number of keys each worker receives for deletion. With the current implementation, the partition size is controlled by a configuration parameter.

Anyway, yes, it is a valid partitioning.

We could leave it as it is now and open an issue to implement indexes. WDYT @davidor ?

@eguzki eguzki force-pushed the feature/delete-service-stats-worker-jobs branch from f6e960c to 5f642f1 Compare February 24, 2019 17:40
Comment thread lib/3scale/backend/stats/partition_eraser_job.rb Outdated
@eguzki eguzki force-pushed the feature/delete-service-stats-worker-jobs branch from 6b0dbd7 to 800b26c Compare February 28, 2019 11:24
Comment thread spec/integration/job_queues_spec.rb Outdated
@eguzki eguzki force-pushed the feature/delete-service-stats-worker-jobs branch from 29cb976 to 80d8627 Compare February 28, 2019 18:29
@eguzki eguzki requested a review from davidor February 28, 2019 18:32
@eguzki
Member Author

eguzki commented Feb 28, 2019

@davidor ready for a review

Comment thread openshift/3scale_backend.conf
Comment thread lib/3scale/backend/stats/key_generator.rb
Comment thread lib/3scale/backend/stats/partition_eraser_job.rb
Comment thread lib/3scale/backend/stats/partition_generator_job.rb

stats_key_gen = KeyGenerator.new(job.to_hash)

stats_key_gen.keys.drop(offset).take(length).each_slice(configuration.stats.delete_batch_size) do |slice|
Contributor

I'm still not convinced about the idea of generating the whole set of keys on every PartitionEraserJob. It'd be good to hear other opinions on this. @unleashed ? @miguelsorianod ?

This might be good enough for a first version, but we'll need to closely measure the impact on CPU time consumed by this kind of worker with something like stackprof. It's difficult to tell now because many factors can have an impact, like the number of jobs of this kind generated, the average from-to interval, etc.

At least we should add a TODO here explaining what this does and mention that this is probably something that could be improved.

Contributor

Isn't just a subset of the keys generated/removed (length)?

Member Author

It is generating all the keys up to offset + length.

I will open an issue about improving the algorithm to use indexes.

Deleting service stats is not a job that will be created often.

Contributor

Only a subset is deleted, but all of them are generated by stats_key_gen.keys.

Contributor

👍 , I was thinking of the old implementation, where enumerators were used and not all the keys were generated.
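An illustration of the difference being discussed (key names are made up): with a plain Array, every key is generated before drop/take selects the slice; with a lazy enumerator, keys before the offset are still produced one by one and discarded, but generation stops right after the slice.

```ruby
# Eager vs lazy key generation for a drop(offset).take(length) slice.

eager_calls = 0
eager_slice = (0...100).map { |i| eager_calls += 1; "key#{i}" }
                       .drop(90).take(5)   # all 100 keys were built first

lazy_calls = 0
lazy_slice = (0...100).lazy
                      .map { |i| lazy_calls += 1; "key#{i}" }
                      .drop(90).first(5)   # stops after offset + length keys

[eager_calls, lazy_calls] # => [100, 95]
```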

Comment thread spec/integration/stats_partition_generator_job_spec.rb Outdated
Comment thread spec/integration/stats_partition_generator_job_spec.rb
Comment thread spec/integration/stats_partition_eraser_job_spec.rb Outdated
Comment thread spec/integration/stats_partition_eraser_job_spec.rb Outdated
@davidor
Contributor

davidor commented Mar 1, 2019

Good job @eguzki 👍
I think this can be merged to the integration branch.

@eguzki eguzki merged commit c1fec4d into feature/delete-service-stats-integration Mar 1, 2019
@bors bors Bot deleted the feature/delete-service-stats-worker-jobs branch March 1, 2019 14:35