Use the 'artifacts' abstraction for all volumes #3607

Open
ddadlani opened this issue Mar 27, 2019 · 6 comments

@ddadlani

As part of #3424 , in order to create ephemeral containers directly in Garden that are not tracked by the ATC, we need to have a way to create persistent volumes (for the Garden container images) that are not owned by any db containers.

Currently, this is not possible because of the following lines:

func (i *imageFromResource) FetchForContainer(
	ctx context.Context,
	logger lager.Logger,
	container db.CreatingContainer,
) (worker.FetchedImage, error) {

Creating an image requires a container ID for GC purposes. Creating a resource cache also requires a container as its ContainerOwner.

We had previously discussed extending the Artifacts abstraction to encompass all types of volumes. Then, the Artifact would deal with GC logic etc, making it easier to create and manage new 'types' of volumes. We already introduced worker_artifacts for #3307.

This spike is to investigate whether extending this artifacts abstraction makes it easier to create image volumes for #3424, and to flesh out what it would look like.

@ddadlani ddadlani added this to Icebox in Runtime via automation Mar 27, 2019
@ddadlani ddadlani moved this from Icebox to In Flight in Runtime Mar 27, 2019
@ddadlani

ddadlani commented Apr 8, 2019

For now, we are leaning toward keeping worker_artifacts without any managed states and leaving the 'Creating' and 'Created' states on the volume. Artifacts that are not attached to volumes can simply be left around until the 12-hour mark has passed.

Artifacts are owned by builds and containers, so we can rely on ON DELETE CASCADE once the container gets deleted. The difference from the current implementation is that the new flow does not require a creating container to create an artifact, which means that artifacts and volumes can be created before db containers.

@ddadlani

Artifact lifecycle flow:

  • an artifact can exist without an underlying volume. The assumption is that a volume will eventually be created that points to the artifact.
  • an artifact can exist without being associated to anything. If it never gets associated, it will be deleted after 12h
  • an artifact can be owned by the following objects:
    • build
    • resource cache
    • task cache
    • base resource type
    • resource cert
  • once an artifact is owned by something, it becomes initialized. Initialized volumes can be DELETED in the following cases:
    • owned by build:
      • status: aborted
      • status: failed
        • if build is not latest build for job
      • status: errored
        • if build is not latest build for job
      • status: succeeded
      • end_time: not null
    • owned by resource cache
      • resource cache is deleted
    • owned by task cache
      • task cache is deleted
    • owned by base resource type
      • never
    • owned by resource certs
      • never
    • owned by nothing
      • always
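
The deletion rules above can be read as a single predicate over an artifact's owner. The following is only a rough sketch of that reading, not the actual Concourse implementation: the `OwnerKind`, `Owner`, and `CanDelete` names are hypothetical, and the `end_time: not null` case is left out because the list does not make clear how it combines with the build status.

```go
package main

import "fmt"

// OwnerKind names what an artifact is owned by. These names are illustrative
// and do not come from the Concourse codebase.
type OwnerKind int

const (
	OwnedByNothing OwnerKind = iota
	OwnedByBuild
	OwnedByResourceCache
	OwnedByTaskCache
	OwnedByBaseResourceType
	OwnedByResourceCert
)

// Owner is a hypothetical snapshot of the owning object's state.
type Owner struct {
	Kind         OwnerKind
	BuildStatus  string // only meaningful when Kind == OwnedByBuild
	LatestForJob bool   // only meaningful when Kind == OwnedByBuild
	OwnerDeleted bool   // only meaningful for resource and task caches
}

// CanDelete reports whether an artifact with the given owner may be deleted,
// following the cases listed in the lifecycle flow.
func CanDelete(o Owner) bool {
	switch o.Kind {
	case OwnedByNothing:
		return true // owned by nothing: always
	case OwnedByBuild:
		switch o.BuildStatus {
		case "aborted", "succeeded":
			return true
		case "failed", "errored":
			// Keep artifacts of the latest build so it can still be inspected.
			return !o.LatestForJob
		}
		return false // any other status (e.g. still running) is kept
	case OwnedByResourceCache, OwnedByTaskCache:
		return o.OwnerDeleted
	case OwnedByBaseResourceType, OwnedByResourceCert:
		return false // never
	}
	return false
}

func main() {
	latestFailed := Owner{Kind: OwnedByBuild, BuildStatus: "failed", LatestForJob: true}
	olderFailed := Owner{Kind: OwnedByBuild, BuildStatus: "failed", LatestForJob: false}
	fmt.Println("latest failed build:", CanDelete(latestFailed))
	fmt.Println("older failed build:", CanDelete(olderFailed))
	fmt.Println("unowned:", CanDelete(Owner{Kind: OwnedByNothing}))
}
```

Keeping the rules in one function like this would make it easy to see which owner types need lifecycle knowledge, which is the concern the later comments in this thread come back to.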

@ddadlani ddadlani changed the title Spike: use the 'artifacts' abstraction for all volumes Use the 'artifacts' abstraction for all volumes Apr 10, 2019
@ddadlani

Quick update:

So far, our work has been to expand the worker_artifact abstraction to include all volumes rather than just the ones uploaded by fly execute. We ran into a few roadblocks:

  1. We originally wanted to make artifacts non-worker specific, so that they can represent "storage" before a worker needs to be chosen.

Currently we need to choose a worker in atc/exec/*_step.Run() before we can create the container. This exposes the worker selection strategy and the pool directly to the exec package, which seems unnecessary: worker selection is a runtime concern, and the step shouldn't care about details such as the worker it should run on.

We wanted to remove this coupling by having steps create non-worker-dependent artifacts. However, if an artifact is meant to govern the volume lifecycle, we need to use the concepts of worker_resource_cache, worker_base_resource_type, etc., which are inherently worker dependent. This means that the worker still needs to be exposed to step.Run().

  2. We do not want artifacts to be owned by containers, since containers are a database concept and artifacts shouldn't be coupled to one kind of runtime (e.g. what if we needed to create Kubernetes pods?). But this would mean that artifact creation has to occur during container creation, or else there is no way to know which artifacts are needed for a container.

To overcome this, we are narrowing the definition of "artifact" to only encompass "storage types that Core concepts care about". This means that container-specific volumes such as scratch volumes (and maybe COW volumes) will not have artifacts associated to them. There will still be a containerID field in the volumes table to account for garbage collection of these volumes.

Doing this allows us to create all volumes that exist beyond the container lifecycle before container creation, while still creating "temporary" container-specific volumes during container creation.

There were some other concerns that we were hoping to address with this story, e.g. a "runtime" interface to abstract away the actual execution of steps, which will be put on hold for another story.

@ddadlani

After some discussion with the team, we have decided to only implement artifacts without any GC lifecycle. Artifacts will have a UUID which will be attached by core logic to any of the core concepts (resource caches, task caches etc). The artifact deletion lifecycle will be handled in the core, whereas the runtime will only handle volume deletion.

This removes the need for us to have knowledge of the type of information stored in the artifact and the information's associated lifecycle.
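
To make the split concrete, here is a minimal sketch of how it might look, assuming core hands out artifact UUIDs and the runtime only tracks which volume belongs to which artifact. All of the names (`ArtifactID`, `Core`, `Runtime`, `Sweep`) are hypothetical, and the idea that the runtime removes volumes whose artifact no longer exists is an assumption made for illustration, not something decided in this thread.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"sort"
)

// ArtifactID is an opaque identifier handed out by core (hypothetical type).
type ArtifactID string

// NewArtifactID returns a random version 4 UUID in its textual form.
func NewArtifactID() (ArtifactID, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return ArtifactID(fmt.Sprintf("%x-%x-%x-%x-%x",
		b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])), nil
}

// Core owns the artifact lifecycle: it knows which core concept
// (resource cache, task cache, ...) each artifact is attached to.
type Core struct {
	artifacts map[ArtifactID]string
}

func NewCore() *Core { return &Core{artifacts: map[ArtifactID]string{}} }

// Attach creates an artifact and attaches it to the named core concept.
func (c *Core) Attach(owner string) (ArtifactID, error) {
	id, err := NewArtifactID()
	if err != nil {
		return "", err
	}
	c.artifacts[id] = owner
	return id, nil
}

// Release deletes the artifact; only core decides when this happens.
func (c *Core) Release(id ArtifactID) { delete(c.artifacts, id) }

// Exists reports whether the artifact is still known to core.
func (c *Core) Exists(id ArtifactID) bool { _, ok := c.artifacts[id]; return ok }

// Runtime only knows volume handles and the artifact each one backs.
// It has no knowledge of what the artifact contains or who owns it.
type Runtime struct {
	volumes map[string]ArtifactID
}

func NewRuntime() *Runtime { return &Runtime{volumes: map[string]ArtifactID{}} }

func (r *Runtime) Track(handle string, id ArtifactID) { r.volumes[handle] = id }

// Sweep removes volumes whose artifact no longer exists and returns the
// removed handles in sorted order.
func (r *Runtime) Sweep(exists func(ArtifactID) bool) []string {
	var removed []string
	for handle, id := range r.volumes {
		if !exists(id) {
			delete(r.volumes, handle)
			removed = append(removed, handle)
		}
	}
	sort.Strings(removed)
	return removed
}

func main() {
	core, rt := NewCore(), NewRuntime()
	cacheID, err := core.Attach("resource-cache:1")
	if err != nil {
		panic(err)
	}
	rt.Track("vol-a", cacheID)
	core.Release(cacheID)
	fmt.Println("volumes removed:", rt.Sweep(core.Exists))
}
```

In this sketch the runtime never inspects the owner string, which mirrors the point above: it needs no knowledge of what the artifact stores or how long that information should live.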

@kcmannem

kcmannem commented May 1, 2019

In the core logic, all caches, inputs, outputs etc will have an artifact associated with them. Because artifacts are the storage "interface" between runtime and core, they should not depend on a particular worker (Core doesn't care about workers). Volumes should be owned by "artifacts", which are a worker-independent storage abstraction. This leads to a few changes in our storage model:

  • If volumes are owned by worker-independent artifacts, we cannot also have them depend on worker-specific types like worker_resource_caches. Since Runtime shouldn't care about concepts such as resource caches or task caches, the worker_resource_cache, worker_task_cache, and worker_base_resource_types should all go away.

  • We had worker_resource_caches because, for cache invalidation, we wanted to know the version of the base resource type a cache was holding on a worker. This need exists because the workers advertise their base resource types to the ATC upon registration, which means they can all have different versions.

  • However, if the ATC provides base resource types to all workers by streaming them in, there will be no version differences that we need to keep context of.

  • Worker task caches are currently only worker specific (there is no task_cache db table). We want to model them similarly to resource caches, which are managed by Core and have an artifact associated with them. This way, each worker-specific volume will be owned by the task cache.
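
As a rough illustration of the storage model described in this comment, the sketch below has core refer to a worker-independent artifact while the runtime resolves it to a volume on a particular worker. The `Volume` and `VolumeIndex` types and their methods are hypothetical names made up for this example; the real schema may look quite different.

```go
package main

import (
	"fmt"
	"sort"
)

// Volume is a worker-specific piece of storage that backs an artifact.
// Field names are illustrative only.
type Volume struct {
	Handle   string
	Worker   string
	Artifact string // worker-independent artifact identifier
}

// VolumeIndex maps artifact -> worker -> volume, so that core can talk about
// artifacts without ever naming a worker.
type VolumeIndex struct {
	byArtifact map[string]map[string]Volume
}

func NewVolumeIndex() *VolumeIndex {
	return &VolumeIndex{byArtifact: map[string]map[string]Volume{}}
}

// Add records that the volume backs its artifact on its worker.
func (i *VolumeIndex) Add(v Volume) {
	workers, ok := i.byArtifact[v.Artifact]
	if !ok {
		workers = map[string]Volume{}
		i.byArtifact[v.Artifact] = workers
	}
	workers[v.Worker] = v
}

// Find returns the volume backing the artifact on the given worker, if any.
func (i *VolumeIndex) Find(artifact, worker string) (Volume, bool) {
	v, ok := i.byArtifact[artifact][worker]
	return v, ok
}

// Workers lists, in sorted order, the workers that hold a copy of the artifact.
func (i *VolumeIndex) Workers(artifact string) []string {
	var names []string
	for w := range i.byArtifact[artifact] {
		names = append(names, w)
	}
	sort.Strings(names)
	return names
}

func main() {
	idx := NewVolumeIndex()
	idx.Add(Volume{Handle: "vol-1", Worker: "worker-b", Artifact: "task-cache-artifact"})
	idx.Add(Volume{Handle: "vol-2", Worker: "worker-a", Artifact: "task-cache-artifact"})

	fmt.Println("held on:", idx.Workers("task-cache-artifact"))
	if v, ok := idx.Find("task-cache-artifact", "worker-a"); ok {
		fmt.Println("volume on worker-a:", v.Handle)
	}
}
```

With an index like this, the worker-specific lookup lives entirely on the runtime side, so nothing resembling worker_resource_cache or worker_task_cache needs to appear in core.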

@ddadlani

We are currently working on #3810 which will enable using artifacts.
