Batch planning of sub tasks #214

Merged
merged 8 commits into from Mar 19, 2017

adamruzicka
Contributor

@adamruzicka adamruzicka commented Feb 9, 2017

Quickstart Guide

  • Include the ::Dynflow::Action::WithBulkSubPlans module in the action.
  • Implement a total_count method returning the total number of plans we want to create.
  • Implement an entries(FROM, SIZE) method returning the batch that starts at position FROM and spans SIZE items.
  • Redefine the BATCH_SIZE constant to change the size of the batch.
  • Use current_batch inside create_sub_plans to get the current batch.
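
The steps above can be sketched without Dynflow itself. The snippet below is a minimal, hypothetical stand-in: `BulkBatching` is not Dynflow's module, and `planned_count`, `plan_next_batch`, `done_planning?` and `NotifyHosts` are names invented for illustration. It only shows how `total_count`, `entries(from, size)`, `BATCH_SIZE` and `current_batch` relate to each other.

```ruby
# Hypothetical stand-in for the batching contract described above.
# This is NOT Dynflow::Action::WithBulkSubPlans; it only mimics the
# relationship between the methods the quickstart lists.
module BulkBatching
  BATCH_SIZE = 10

  # Number of items planned so far (assumed bookkeeping, for illustration).
  def planned_count
    @planned_count ||= 0
  end

  # Items of the batch that is about to be planned.
  def current_batch
    size = [self.class::BATCH_SIZE, total_count - planned_count].min
    entries(planned_count, size)
  end

  # Plans one batch and advances the position; returns the planned items.
  def plan_next_batch
    planned = create_sub_plans
    @planned_count = planned_count + planned.size
    planned
  end

  def done_planning?
    planned_count >= total_count
  end
end

class NotifyHosts
  include BulkBatching

  BATCH_SIZE = 3 # redefined to change the size of the batch

  HOSTS = %w[alpha beta gamma delta epsilon zeta eta].freeze

  def total_count
    HOSTS.size
  end

  # Slice of `size` items starting at index `from`.
  def entries(from, size)
    HOSTS[from, size]
  end

  # A real action would trigger one sub plan per item here;
  # this sketch just returns a description of each.
  def create_sub_plans
    current_batch.map { |host| "plan for #{host}" }
  end
end

action = NotifyHosts.new
batches = []
batches << action.plan_next_batch until action.done_planning?
```

With seven items and a batch size of three, the loop plans two full batches and one final batch holding the single remaining item.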

TODO

  • resuming paused tasks
  • tests
  • fix cancelling
  • fix concurrency control after restart

@iNecas
Member

iNecas commented Feb 9, 2017

ACK on the approach. Please proceed with finishing the PRs

end

def run_progress
  if counts_set? && output[:total_count] > 0
Member

Shouldn't we use total_count method in the condition?

Contributor Author

The condition was only there to avoid dividing by zero. That shouldn't happen anymore, so I removed it.
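
For context, a guard of this kind usually protects a ratio computation. The helper below is a minimal sketch, not the PR's `run_progress`: the name `progress`, its arguments, and the choice to report an empty task as complete are all assumptions made for illustration.

```ruby
# Hypothetical helper, not the PR's run_progress: returns the fraction of
# finished sub plans as a Float between 0.0 and 1.0.
def progress(finished_count, total_count)
  # Without this guard, 0.0 / 0 evaluates to NaN in Ruby.
  # Reporting an empty task as complete is an assumption of this sketch.
  return 1.0 if total_count.zero?

  finished_count.to_f / total_count
end
```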

BATCH_SIZE=10

# Should return a slice of size items starting from item with index from
def entries(from, size)
Member

What about calling this method simply batch?

Contributor Author

sounds good

@iNecas
Member

iNecas commented Feb 20, 2017

The builds are failing on RuboCop.

@iNecas
Member

iNecas commented Feb 24, 2017

While testing, I noticed that the current implementation plans the next batch only once some plans have finished. We can do better by making sure we send the event to plan the next batch sooner: https://gist.github.com/iNecas/4ad20c1451f46ef91b198ed5e9fdd413

Member

@iNecas iNecas left a comment

We don't need to wait for some plans to finish before we proceed with planning the next batch; see https://gist.github.com/iNecas/4ad20c1451f46ef91b198ed5e9fdd413

Member

@iNecas iNecas left a comment

Another thing I've noticed: after the executor is restarted while the task is running, the concurrency level is not taken into account when the task is resumed.

Member

@iNecas iNecas left a comment

Another request: after the job is cancelled, no new batch should be planned.
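
The requested behaviour can be illustrated with a small, self-contained sketch. `BatchPlanner` and all of its methods are hypothetical and do not reflect how Dynflow handles cancellation internally; the point is only that a cancelled planner stops handing out batches even though items remain.

```ruby
# Hypothetical planner, not Dynflow code: once cancelled, it refuses to
# plan further batches even though unplanned items remain.
class BatchPlanner
  attr_reader :planned

  def initialize(items, batch_size)
    @items = items
    @batch_size = batch_size
    @planned = []
    @cancelled = false
  end

  def cancel!
    @cancelled = true
  end

  def remaining?
    @planned.size < @items.size
  end

  # Returns the newly planned batch, or nil when nothing was planned.
  def plan_next_batch
    return nil if @cancelled || !remaining?

    batch = @items[@planned.size, @batch_size]
    @planned.concat(batch)
    batch
  end
end

planner = BatchPlanner.new((1..10).to_a, 4)
first = planner.plan_next_batch  # plans items 1 to 4
planner.cancel!
second = planner.plan_next_batch # nothing is planned after the cancel
```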

@adamruzicka
Contributor Author

Cancelling should now work properly.

I'm afraid the concurrency control after the executor's restart never worked. I'll create a new PR for it.

Member

@iNecas iNecas left a comment

Two last comments. I have other changes prepared that build on top of this, but we can handle those in separate PRs.


# Returns the items in the current batch
def current_batch
  start_position = output[:total_count]
Member

I suggest keeping total_count in output so that it matches the total_count method, and introducing planned_count to track the number of tasks we have planned. Using total_count for that is misleading.
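
A minimal sketch of the suggested bookkeeping follows. It is not the merged code: `BulkAction` and `plan_current_batch` are hypothetical, and `output` is a plain Hash here rather than a Dynflow action's output. `total_count` stays fixed while `planned_count` marks where the next batch starts.

```ruby
# Hypothetical sketch of the suggested naming; `output` is a plain Hash
# here, not the output of a real Dynflow action.
class BulkAction
  attr_reader :output

  def initialize(items, batch_size)
    @items = items
    @batch_size = batch_size
    @output = { :total_count => items.size, :planned_count => 0 }
  end

  def total_count
    @items.size
  end

  # Slice of `size` items starting at index `from`.
  def batch(from, size)
    @items[from, size]
  end

  # Returns the items in the current batch, starting where planning stopped.
  def current_batch
    start_position = output[:planned_count]
    size = [@batch_size, total_count - start_position].min
    batch(start_position, size)
  end

  # Records the current batch as planned and returns its items.
  def plan_current_batch
    items = current_batch
    output[:planned_count] += items.size
    items
  end
end

action = BulkAction.new(%w[a b c d e], 2)
action.plan_current_batch
action.plan_current_batch
```

After two batches of two, `output[:planned_count]` is 4 while `output[:total_count]` is still 5, so the two numbers never get confused.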

module Action::WithBulkSubPlans
  include Dynflow::Action::Cancellable

  BATCH_SIZE = 10
Member

Let's rename this to DEFAULT_BATCH_SIZE, as it can be changed by overriding the batch_size method. Also, I would recommend making the default a bit higher: even 100 should be safe for most cases, as long as the planning phase doesn't do too much.
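
The pattern being proposed, a default constant behind an overridable method, might look like the sketch below. The module `Batching` and both action classes are hypothetical; only the names `DEFAULT_BATCH_SIZE` and `batch_size` and the value 100 come from the comment above.

```ruby
# Hypothetical module illustrating the suggestion; not the merged code.
module Batching
  DEFAULT_BATCH_SIZE = 100

  # Actions override this method instead of redefining a constant.
  def batch_size
    DEFAULT_BATCH_SIZE
  end
end

class DefaultAction
  include Batching
end

class HeavyPlanningAction
  include Batching

  # Planning each item is assumed to be expensive here,
  # so this action opts for smaller batches.
  def batch_size
    5
  end
end
```

An overridable method also lets an action compute its batch size at run time, which a constant cannot do.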

@@ -26,16 +26,20 @@ def run(event = nil)
end

def initiate
  sub_plans = create_sub_plans
  sub_plans = Array[sub_plans] unless sub_plans.is_a? Array
  output.update(:total_count => 0,
Member

Do we need to reset the numbers here? It would break the done? method when it is called before any planning happens. If we go with the planned_count number, we should not need to reset these numbers at the beginning (just initialize planned_count in WithBulkSubPlans).

@adamruzicka
Contributor Author

Updated

@iNecas
Member

iNecas commented Mar 14, 2017

Tests are failing.

@iNecas iNecas merged commit 401c744 into Dynflow:master Mar 19, 2017
@iNecas
Member

iNecas commented Mar 19, 2017

There are still some tasks/rex changes to be made, but this one should be good to go. Thanks @adamruzicka

@iNecas
Member

iNecas commented Mar 19, 2017

dynflow-0.8.22 with this code pushed to rubygems
