Concurrent tasks modifying shared volumes produce non-deterministic results #3058

Open
JohannesRudolph opened this Issue Jan 16, 2019 · 5 comments

JohannesRudolph commented Jan 16, 2019

Bug Report

Steps to Reproduce

Suppose I have the following job that concurrently runs two deployments, each consisting of:

    1. processing a manifest stored in the repo
    2. deploying something based on the processed manifest from the output of task 1)
- name: "deploy"
  max_in_flight: 1
  plan:
  - aggregate:
    - do:
      - task: process manifest a # 1.1)
        file: repo/ci/util/manifest.yml
        params:
          MANIFEST: ci/x/manifests/a.yml
      - put: cf # 1.2)
        params:
          manifest: processed-manifests/a.yml
    - do:
      - task: process manifest b # 2.1)
        file: repo/ci/util/manifest.yml
        params:
          MANIFEST: ci/x/manifests/b.yml
      - put: cf # 2.2)
        params:
          manifest: processed-manifests/b.yml

(some details cut for brevity).

We noticed that sometimes the step 2)s in each aggregate/do pair would fail to find the manifest from step 1). What I think is happening is that both step 1)s concurrently write to the processed-manifests volume whenever they happen to run on the same worker. Which "branch" (assuming volumes are copy-on-write) of the processed-manifests volume ends up in step 2) is not deterministic and depends on how tasks are scheduled onto workers.

I think we are hit by this issue so often because we use the random-worker scheduling strategy to achieve an even task distribution across our cluster.

Expected Results

I feel one of these two things should happen:

  • copy-on-write volumes/layers should correctly follow "task step branches", i.e. the volume a task step sees is guaranteed to have been produced by the preceding task step
  • warn the user that their tasks are concurrently accessing overlapping volumes

I'm leaning towards the second option, as I think it's possible to come up with a case where the first option can't be defined in a sound way.

Actual Results

The task outputs on the shared volume clobber each other in a non-deterministic way (depending on task execution order and worker volumes), leading to pipeline failures. I think this is similar to #483 but also related to #1799.

Version Info

  • Concourse version: 4.2.1
  • Deployment type (BOSH/Docker/binary): K8s Helm
  • Infrastructure/IaaS: Meshcloud OpenStack
JohannesRudolph commented Jan 16, 2019

Note that it's possible to work around this issue by providing the tasks with output_mappings to separate output folders (= volume mounts), as this forces a guaranteed "sequence" of volume modifications.
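
For illustration, here is a minimal sketch of that workaround applied to the plan above, assuming the task config in repo/ci/util/manifest.yml declares an output named processed-manifests (that output name is an assumption on my part):

- aggregate:
  - do:
    - task: process manifest a
      file: repo/ci/util/manifest.yml
      params:
        MANIFEST: ci/x/manifests/a.yml
      # remap the assumed "processed-manifests" output to a branch-specific artifact name
      output_mapping:
        processed-manifests: processed-manifests-a
    - put: cf
      params:
        manifest: processed-manifests-a/a.yml
  - do:
    - task: process manifest b
      file: repo/ci/util/manifest.yml
      params:
        MANIFEST: ci/x/manifests/b.yml
      output_mapping:
        processed-manifests: processed-manifests-b
    - put: cf
      params:
        manifest: processed-manifests-b/b.yml

Each branch now writes to a distinct artifact name, so the two do steps no longer race for the same named output.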

Another question that has popped up: should task inputs be read-only so they can't be modified (or are they already)? If they are writable, they can potentially exhibit the same behavior.

vito commented Jan 16, 2019

I'm pretty sure this is behaving as expected. aggregate / do don't imply scoping, only concurrency/serialization. Everything operates in one scope when it comes to artifact names, so if you have two tasks that produce the same output names in parallel, you'll get a race. output_mapping and input_mapping are the correct way to do this.

I don't think copy-on-write semantics are at fault here, and I'm pretty sure it'd happen regardless of how many workers there are; it's a fundamental race with shared in-memory state (the name -> artifact mapping). Concourse already ensures tasks get copy-on-write volumes and don't share the data, but the in-memory state for tracking named artifacts is subject to cases like this, where two tasks concurrently create the same named artifact.
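
For completeness, a hedged sketch of the input_mapping side: if a later task's config declared an input named processed-manifests, the pipeline could point that input at one of the branch-specific artifacts from above (the deploy.yml task file here is hypothetical):

- task: deploy a                  # hypothetical follow-up task
  file: repo/ci/util/deploy.yml   # hypothetical task config declaring an input named "processed-manifests"
  input_mapping:
    processed-manifests: processed-manifests-a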

@vito removed the triage and bug labels Jan 16, 2019

vito commented Jan 16, 2019

Maybe the docs can be more careful about this? Not sure where that would go - maybe aggregate docs?

JohannesRudolph commented Jan 16, 2019

Thanks for the fast response @vito. This issue is clearly a corner case. However, because it's a bit of a heisenbug for users with pipelines like this, I'd very much appreciate it if Concourse could print a warning when it detects that the task graph has this condition (aggregate tasks producing the same output folders).

In hindsight it's pretty obvious what happens, but it took us quite a while to track down. A warning in the build log, like the one we already have for unspecified params (credentials), would probably be the ideal solution.

A simple addition to the aggregate docs warning about clobbered output volumes would be a good 20/80 fix in the meantime, though!

vito commented Jan 16, 2019

Ah, yeah that would be a pretty cool warning!

Probably not a high priority at the moment but a PR would be super welcome.
