
Add in_parallel step #3580

Merged
merged 8 commits into concourse:master from itsdalmo:add-parallel-step on Apr 17, 2019

Conversation

itsdalmo (Contributor) commented Mar 24, 2019

Closes #2628 by adding support for the following:

jobs:
  - name: example
    plan:
    - get: every-minute
      trigger: true
    - in_parallel:
        limit: 1
        fail_fast: true
        steps:
        - task: echo
          config:
           ...
        - task: echo
          config:
            ...

InParallel ensures that at most limit steps are running at any one time, until all steps are complete. If limit is not specified, the default behaviour is identical to an aggregate step and runs everything at once. Setting limit to 1 makes it behave like do. When a job is cancelled, in_parallel will not schedule (Run()) the pending tasks and instead returns ctx.Err() as soon as possible. When fail_fast is set, in_parallel will abort all currently running tasks and refrain from scheduling pending tasks.
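For illustration, here is a minimal sketch of the scheduling behaviour described above; it is not the actual exec implementation, and the Step interface and runInParallel helper are simplified assumptions:

package sketch

import (
	"context"
	"errors"
)

// Step is an illustrative stand-in for Concourse's exec step interface.
type Step interface {
	Run(ctx context.Context) error
	Succeeded() bool
}

// runInParallel runs at most limit steps at a time (limit < 1 behaves like
// aggregate and runs everything at once). With failFast, the first failing
// step cancels the shared context, aborting running steps and preventing
// pending steps from being scheduled. A cancelled build returns ctx.Err()
// before scheduling any further steps.
func runInParallel(ctx context.Context, steps []Step, limit int, failFast bool) error {
	if limit < 1 {
		limit = len(steps)
	}

	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	sem := make(chan struct{}, limit)      // counting semaphore bounding parallelism
	results := make(chan bool, len(steps)) // buffered so finished goroutines never block

	for _, step := range steps {
		select {
		case <-ctx.Done():
			return ctx.Err() // stop scheduling pending steps
		case sem <- struct{}{}:
		}

		go func(s Step) {
			defer func() { <-sem }()
			_ = s.Run(ctx)      // Run does not necessarily report script failure...
			ok := s.Succeeded() // ...so success is checked via Succeeded()
			if !ok && failFast {
				cancel() // abort the remaining running steps
			}
			results <- ok
		}(step)
	}

	allSucceeded := true
	for range steps {
		if !<-results {
			allSucceeded = false
		}
	}
	if !allSucceeded {
		return errors.New("one or more steps failed")
	}
	return nil
}

The sketch tries to capture three things: a counting semaphore bounds parallelism at limit, cancellation short-circuits scheduling, and success is determined via Succeeded() rather than the error returned by Run().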

Todo

  • Revert changes to examples/hello-world-every-minute.yml.
  • Submit a PR to concourse/docs.
  • Decide on YAML structure and naming (questions in this comment).

PS: Sorry for the big PR, but parallel is essentially a copy/paste of aggregate with very slight modifications. I was originally thinking I could refactor so both used the same underlying implementation, but I think that would only let us reuse the code in exec, which would not have much of an impact on the size of the diff.

@itsdalmo itsdalmo force-pushed the itsdalmo:add-parallel-step branch from d34b313 to 4593125 Mar 24, 2019
@itsdalmo itsdalmo changed the title from "[WIP] Add parallel step" to "Add parallel step" Mar 24, 2019
@vito vito requested review from clarafu and pivotal-jwinters Mar 25, 2019
vito (Member) left a comment

Haven't thoroughly reviewed this yet, but here's some early feedback; I've also asked a couple of other Core team members to take a look. 👍 Thanks for working on this!

re: all the tests, @clarafu and @pivotal-jwinters may have some opinions there. My take is that the coverage may be a bit excessive owing to aggregate's storied history (it's one of the oldest step types). It might make sense to replace some of the more verbose/brittle tests with testflight tests instead. Or maybe they're actually fine and not the brittle/verbose tests I'm remembering from aggregate. Like I said I haven't reviewed thoroughly yet. 🙂

atc/plan.go Outdated
@@ -5,6 +5,7 @@ type Plan struct {
Attempts []int `json:"attempts,omitempty"`

Aggregate *AggregatePlan `json:"aggregate,omitempty"`
Parallel *ParallelPlan `json:"parallel,omitempty"`

vito (Member) commented Mar 25, 2019

🤔 The name feels slightly off; most of them are adjectives or verbs (except task I guess).

What do you think of in_parallel? I kinda like that as it fits alongside on_success/do/etc.

itsdalmo (Author, Contributor) commented Mar 25, 2019

Makes sense, I've changed it to in_parallel 👍

// a nested chain of steps to run in parallel
Parallel *PlanSequence `yaml:"parallel,omitempty" json:"parallel,omitempty" mapstructure:"parallel"`
// limit parallelism for a Parallel plan
MaxInParallel int `yaml:"max_in_parallel,omitempty" json:"max_in_parallel,omitempty" mapstructure:"max_in_parallel"`

vito (Member) commented Mar 25, 2019

This name feels awkward but I'm not sure what else to suggest. Maybe it just sounds weird because I'm used to max_in_flight. I thought it might make sense to reuse the same name, but @clarafu pointed out that it might be confusing to have both set - someone could assume max_in_flight: 3 at the job level is like setting it on all the steps in the plan. Maybe we should reserve 'in flight' terminology for builds, then? 🤔 If so this name is probably fine.

itsdalmo (Author, Contributor) commented Mar 25, 2019

I see what you mean, and I agree that it's best to keep max_in_flight separate since it could get a bit confusing, especially when you consider indentation errors in the YAML, which would produce unexpected results if the keyword were recognised at both the job and the plan level.

However, right now we have in_parallel and max_in_parallel - not sure if I like it or not 🤔 If I could dream up some YAML that is easy on the eyes, it would look something like this:

jobs:
  - name: example
    plan:
    - in_parallel: 3
      steps:
      - task: echo
        config:
         ...
    - in_parallel: # <- Same as aggregate.
      steps:
      - task: echo
        config:
         ...
    - in_sequence: # <- Equivalent to "do"?
      steps:
      - task: echo
        config:
         ...

vito (Member) commented Mar 26, 2019

I think the last two options there are a little awkward: knowing how the YAML is parsed, those keys actually point to blank values, and Concourse's parser would have to be careful to notice that they're there at all (since they'd be set to either empty strings or nil, I don't remember which).

The first option feels natural but forces you to pick a number in scenarios where you don't really care (unless we support the latter two options).

I agree that in_parallel and max_in_parallel feel a little weird, but it at least allows the common case of "I just want things to run in parallel" to look simple.

What if we allow two forms of in_parallel?

The first, simplest form which takes no configuration:

in_parallel:
- task: echo1
  file: config.yml
- task: echo2
  file: config.yml
- task: echo3
  file: config.yml

...and in more complicated scenarios, it's instead an object with limit and steps:

in_parallel:
  limit: 3
  steps:
  - task: echo1
    file: config.yml
  - task: echo2
    file: config.yml
  - task: echo3
    file: config.yml

This could also be extended to support fail_fast: true (#165).

On the other hand I could see the repetition of in_parallel in max_in_parallel being a strength of the name, given that they are very closely tied together. 🤷‍♂

Any preference?

itsdalmo (Author, Contributor) commented Mar 26, 2019

I think in your example, in_parallel would only be able to cover the current aggregate and the proposed limited parallelisation, and in that case I would suggest we just stick with in_parallel and max_in_parallel - mainly because I suspect indentation might trip users up?

We could still add fail_fast to this PR and perhaps throw an error if this key is present in a non-parallel context? However, it seems like the current atc.Validate() doesn't really do strict validation for the plan configuration, so perhaps it's fine to just ignore fail_fast if it's specified in another context? 🤔

@vito vito self-assigned this Mar 25, 2019

itsdalmo (Author, Contributor) commented Mar 25, 2019

My take is that the coverage may be a bit excessive owing to aggregate's storied history (it's one of the oldest step types). It might make sense to replace some of the more verbose/brittle tests with testflight tests instead. Or maybe they're actually fine and not the brittle/verbose tests I'm remembering from aggregate. Like I said I haven't reviewed thoroughly yet. 🙂

You are right, they are the very same tests as in aggregate 😂 I'm happy to go over the tests and rewrite them so that we only keep the essential ones; there are a lot of tests to ensure that other steps/hooks work within a parallel, which are probably safe to remove since each step is just executed with step.Run(ctx, state).

I can leave a comment on the tests I think can be deleted.

atc/scheduler/factory/factory_parallel_test.go: 3 resolved review threads (outdated)
@@ -502,6 +502,86 @@ var _ = Describe("JobConfig", func() {
})
})

Context("when a simple parallel plan is the first step", func() {

itsdalmo (Author, Contributor) commented Mar 25, 2019

I think this test, and the ones following, are mostly testing get in relation to PlanSequence and not aggregate or parallel specifically. But if the desire is to move from aggregate to parallel in the future, then I think it makes sense to just keep them?

itsdalmo (Author, Contributor) commented Mar 25, 2019

Same with the exec_engine tests. If the plan is to eventually move from aggregate to parallel then I think it makes sense to keep the tests; unless they can be cut completely?

vito (Member) commented Mar 26, 2019

I think these tests are just covering the recursion through all the different types of steps which contain other steps. It probably makes sense to keep them, since we need to recurse through the Parallel fields as well. This is necessary for finding all the 'job inputs', which are determined by looking through the entire plan for get steps.
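
For illustration, a rough sketch of the kind of recursion involved; the types are pared down from the PlanConfig snippet shown earlier in this review, and the collectGets helper is hypothetical rather than the actual factory code:

package sketch

// Pared-down, assumed versions of atc's plan config types.
type PlanSequence []PlanConfig

type PlanConfig struct {
	Get       string
	Aggregate *PlanSequence
	Parallel  *PlanSequence
	Do        *PlanSequence
}

// collectGets walks a plan recursively and gathers every get step. Determining
// a job's inputs requires a walk like this, which is why every nesting step
// type, including the new Parallel field, has to be traversed.
func collectGets(plan PlanSequence) []string {
	var gets []string
	for _, step := range plan {
		if step.Get != "" {
			gets = append(gets, step.Get)
		}
		for _, nested := range []*PlanSequence{step.Aggregate, step.Parallel, step.Do} {
			if nested != nil {
				gets = append(gets, collectGets(*nested)...)
			}
		}
	}
	return gets
}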

@itsdalmo itsdalmo force-pushed the itsdalmo:add-parallel-step branch from 30c37e8 to 293aa8d Mar 25, 2019

jchesterpivotal (Contributor) commented Mar 28, 2019

I'd feel more comfortable if aggregate got an extra dial (max_in_flight, throttle, whatever), rather than having an entirely new step name that has overlapping functionality.

marco-m (Contributor) commented Mar 28, 2019

I agree with @jchesterpivotal here. Although the name parallel is more suitable than aggregate, having both names adds user confusion on top of code duplication, and I can already picture the same pipeline sporting both. +1 to adding max_in_flight to the existing aggregate.

pivotal-jwinters (Contributor) commented Mar 28, 2019

@vito I tend to agree with @jchesterpivotal as well. I know you wanted to migrate away from the aggregate step as per this comment, but maybe we can just alias the name and deprecate the aggregate name over time?

itsdalmo (Author, Contributor) commented Apr 2, 2019

The code changes are minimal, so all we currently need is a decision on:

  • Add parallel step or extend aggregate?
  • What should the config flag be called: max_in_flight, max_in_parallel, limit or throttle?

And keep in mind that if we extend aggregate, we have to add the above flag at the root of atc.PlanConfig (not nested inside a struct), lest we break existing pipelines. If we add parallel we are free to do whatever we want, at the cost of some code duplication and confusion (unless we are saying that aggregate is deprecated?).

I'm happy to implement this whichever way you want so as to get #2628 closed. The only thing I have to add is that max_in_flight might get "overloaded" if it is used in too many places. Where do you want to go with this, @vito?
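
To make the structural constraint concrete, here is a rough sketch of the two shapes under discussion; the Aggregate, Parallel and MaxInParallel fields mirror the diffs above, while InParallelConfig and the exact tags are assumptions:

package sketch

type PlanSequence []PlanConfig

// Option 1: extend aggregate. Because `aggregate:` already maps directly to a
// list of steps, any new knob has to live at the root of the step config,
// alongside every other step's fields.
// Option 2: a new in_parallel step, which is free to nest its own config.
type PlanConfig struct {
	Aggregate     *PlanSequence `yaml:"aggregate,omitempty"`
	MaxInParallel int           `yaml:"max_in_parallel,omitempty"` // option 1: sibling of aggregate

	InParallel *InParallelConfig `yaml:"in_parallel,omitempty"` // option 2
}

// InParallelConfig namespaces the knobs together with the steps they apply to.
type InParallelConfig struct {
	Steps    PlanSequence `yaml:"steps,omitempty"`
	Limit    int          `yaml:"limit,omitempty"`
	FailFast bool         `yaml:"fail_fast,omitempty"`
}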

vito (Member) commented Apr 3, 2019

@jchesterpivotal @marco-m @pivotal-jwinters I don't plan on supporting both aggregate and in_parallel forever - the point was to give users a reason to migrate to the new name by giving it more useful functionality (max-in-flight, fail-fast) and deprecate aggregate as in_parallel adoption grows.

Having an alias seems like it would be even more confusing - you've then shared the internal implementation, which users don't see, but duplicated all functionality, which users do see. Wouldn't it be clearer to have a deprecated, less-useful step and a newer step with a more meaningful name and more functionality?

marco-m (Contributor) commented Apr 3, 2019

@vito then I would suggest the following:

  • Introduce a check that returns an error if the same pipeline has both aggregate and in_parallel. This check should be introduced in this PR or immediately after. The error message should mention that aggregate is deprecated.
  • Deprecate aggregate immediately starting with 5.1. Add a check to fly that prints a deprecation warning and mentions when aggregate will be removed.
  • Remove aggregate in 6.0.

vito (Member) commented Apr 3, 2019

@marco-m Agreed. I'm not too sure about mixed aggregate/in_parallel being an error vs. a deprecation warning, but we at least have all the plumbing for emitting warnings.

@itsdalmo How's that sound?

Plumbing for deprecations:

https://github.com/concourse/concourse/blob/41f54f59330f2b6be55e05eb56166cb97494a64a/atc/validate.go#L27:17

->

warnings, errorMessages := config.Validate()

->

displayhelpers.ShowWarnings(warnings)
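
For illustration, a rough sketch of how such a deprecation warning could plug into that flow; the ConfigWarning shape and the helper are assumptions based on the snippets above:

package sketch

import "fmt"

// Assumed shape of the warnings returned by config.Validate() and printed by
// displayhelpers.ShowWarnings during fly set-pipeline.
type ConfigWarning struct {
	Type    string
	Message string
}

// Minimal stand-in for a step in a job's build plan; only the field this
// sketch needs.
type StepConfig struct {
	Aggregate []StepConfig
}

// warnDeprecatedAggregate walks a job's plan and emits a warning (rather than
// an error) for every aggregate step it finds; a full version would also
// recurse through do, in_parallel, hooks, and so on.
func warnDeprecatedAggregate(jobName string, plan []StepConfig) []ConfigWarning {
	var warnings []ConfigWarning
	for i, step := range plan {
		if step.Aggregate != nil {
			warnings = append(warnings, ConfigWarning{
				Type: "pipeline",
				Message: fmt.Sprintf(
					"jobs.%s.plan[%d]: the aggregate step is deprecated, use in_parallel instead",
					jobName, i,
				),
			})
			warnings = append(warnings, warnDeprecatedAggregate(jobName, step.Aggregate)...)
		}
	}
	return warnings
}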

clarafu (Contributor) commented Apr 3, 2019

If we are trying to deprecate the aggregate step, then I would think that creating a new parallel step makes sense.

I feel like using the max_in_flight keyword again would be quite confusing, especially if you have a very long pipeline config and you aren't sure whether the max_in_flight is for the aggregate step or for the build plan, in which case the keyword means two completely different things.

marco-m (Contributor) commented Apr 4, 2019

@clarafu I agree, max_in_parallel seems more appropriate.

itsdalmo (Author, Contributor) commented Apr 8, 2019

@vito - I've added fail_fast in the latest commits. I struggled a bit with this one because it seems like Run() for the steps does not return an error when the task script fails, i.e. the only way I could get the expected behaviour was by using s.Succeeded(). Am I doing something wrong, or is this working as intended?

In the current commit, in_parallel with fail_fast: true will interrupt currently running steps and stop scheduling pending tasks. This is the behaviour I would expect myself, so that is what I implemented, but perhaps you had something else in mind?

Also, I'm wondering if perhaps the configuration should only be your second suggestion:

in_parallel:
  limit: 3
  steps:
  - task: echo1
    file: config.yml
  - task: echo2
    file: config.yml
  - task: echo3
    file: config.yml
  fail_fast: true

Seems cleaner when related configuration goes in a map like this? What do you think?

@itsdalmo itsdalmo force-pushed the itsdalmo:add-parallel-step branch from 5101201 to 95b4eb8 Apr 9, 2019
@itsdalmo itsdalmo changed the title from "Add parallel step" to "Add in_parallel step" Apr 9, 2019
itsdalmo added 6 commits Apr 9, 2019
@itsdalmo itsdalmo force-pushed the itsdalmo:add-parallel-step branch from 8935a4d to 8267ab2 Apr 9, 2019

vito (Member) commented Apr 9, 2019

@itsdalmo It feels cleaner that way when other settings are present - I like that it namespaces the fields under in_parallel, since putting them at the top level pollutes the shared step field namespace and separates the config from the steps list.

I think it's too verbose for the common case though, and it wouldn't feel nice to switch from aggregate: to in_parallel: {steps: ...} everywhere.

If we define a special unmarshaler we could allow both:

in_parallel:
- step1
- step2
- step3

in_parallel:
  limit: 3
  fail_fast: true
  steps:
  - step1
  - step2
  - step3

It's kind of annoying to implement this unmarshaling logic right now, though, since our config unmarshaling logic is pretty hairy and tied to mapstructure, yaml, and json. I'm trying to clean this up in #3505 so that it can be just YAML.

We could always start with the latter format in this PR and add polish later to support the shorthand form. 🤔 But I wouldn't want to deprecate aggregate until we have the shorthand.
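
For illustration, a minimal sketch of what that unmarshaler could look like, using plain encoding/json and ignoring the mapstructure plumbing mentioned above; the pared-down types here are assumptions. (The PR ultimately implements this as a mapstructure decode hook, InParallelConfigDecodeHook, visible further down in the diff.)

package sketch

import "encoding/json"

// Pared-down, assumed types for this sketch.
type PlanSequence []PlanConfig

type PlanConfig struct {
	Task string `json:"task,omitempty"`
	File string `json:"file,omitempty"`
}

type InParallelConfig struct {
	Steps    PlanSequence `json:"steps,omitempty"`
	Limit    int          `json:"limit,omitempty"`
	FailFast bool         `json:"fail_fast,omitempty"`
}

// UnmarshalJSON accepts both forms of in_parallel: the shorthand (a bare list
// of steps) and the full object with limit, fail_fast and steps.
func (c *InParallelConfig) UnmarshalJSON(data []byte) error {
	// Try the shorthand form first: in_parallel: [step, step, ...]
	var steps PlanSequence
	if err := json.Unmarshal(data, &steps); err == nil {
		*c = InParallelConfig{Steps: steps}
		return nil
	}

	// Fall back to the object form. The alias type drops the custom
	// unmarshaler so this method does not recurse into itself.
	type alias InParallelConfig
	var full alias
	if err := json.Unmarshal(data, &full); err != nil {
		return err
	}
	*c = InParallelConfig(full)
	return nil
}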

itsdalmo (Author, Contributor) commented Apr 11, 2019

@vito I have added support for the shorthand form of in_parallel in the latest commit. I have not added custom serialization for YAML/JSON, so get-pipeline and set-pipeline will always return/diff in_parallel: [ ... ] as in_parallel: { steps: [ ... ] }. Returning the long form of the config feels clean to me, since YAML anchors are also expanded on get-pipeline and set-pipeline - but let me know if you want it changed. I have also added a deprecation warning, emitted during validation, when aggregate is set.

Also, let me know if you want me to squash the commits in the PR (or will you squash & merge?) once you are happy with the PR.

itsdalmo (Author, Contributor) commented Apr 16, 2019

Thoughts on this @vito? Are we approaching mergeable? 😅

vito (Member) commented Apr 17, 2019

@itsdalmo Sorry! I was focusing on getting 5.1 out the door, which we were on the verge of for a while, but it stretched on for longer than expected. I'll take another look at this today and merge if it's good to go. 👍

vito (Member) approved these changes Apr 17, 2019 and left a comment

@itsdalmo This looks great!

  • The deprecation warning works as expected and is easy to notice.
  • limit: X works great - kinda fun watching it chew through a bunch of tasks in batches.
  • fail_fast: true also works great and results in the build failing, similar to a timeout. 👍

Showing the expanded form in get-pipeline feels a little weird because it's a structural change and not a YAML-level thing like anchors, but it at least doesn't seem to cause set-pipeline to think there's a diff to show, which is what I was primarily worried about. I'm happy to merge as-is and iterate on it. 👍 Don't want to force another review cycle, especially since I'll be on vacation for a bit soon.

Merging! Thanks a bunch, and thanks for waiting!

@@ -203,6 +203,29 @@ var InputsConfigDecodeHook = func(
return data, nil
}

var InParallelConfigDecodeHook = func(

vito (Member) commented Apr 17, 2019

Hopefully this can go away with #3505. Makes sense though. 👍

@vito vito merged commit d64843c into concourse:master Apr 17, 2019
5 checks passed:

  • DCO
  • WIP (Ready for review)
  • concourse-ci/testflight (Concourse CI build success)
  • concourse-ci/unit (Concourse CI build success)
  • concourse-ci/watsjs (Concourse CI build success)
@itsdalmo itsdalmo deleted the itsdalmo:add-parallel-step branch Apr 17, 2019