Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recover image pulls by trying to run the container anyways. #3813

Merged
merged 3 commits into from
Jul 24, 2018

Conversation

markusthoemmes
Copy link
Contributor

@markusthoemmes markusthoemmes commented Jun 27, 2018

A docker pull can fail due to various reasons. One of them is network throttling by the image registry. Since we try to pull on each blackbox invocation, high volume load can cause lots of errors due to failing pulls unnecessarily.

Instead, we try to run the container even if the pull failed in the first place. If the image is available locally, the container will start just fine and recover the error gracefully. If the image is not available locally, the run will fail as well and return the same error as the docker pull would've returned.

This behavior will only be enabled for blackbox actions that specify a tag. Blackbox actions not using a tag or using "latest" as a tag will exhibit the very same behavior as today. That is: There will always be a pull before each container start and a failing pull will result in an error reported to the user. This is to enable rapid prototyping on images and enable determinism in the workflow. Updating the action will then force a pull and will fail early if that pull fails. With the new behavior, the developer might be surprised by the image not being updated because the pull is swallowed.

For production workload it is considered best-practice to version images through labels, thus we can "safely" assume that we can fall back to a local image in case the pull fails.

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

@markusthoemmes markusthoemmes added docker invoker review Review for this PR has been requested and yet needs to be done. labels Jun 27, 2018
Copy link
Member

@rabbah rabbah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is semantic changing - updating the action to force a docker pull will no longer work.

}
} else Future.successful(())
// Always recover the Future so docker run is tried below in any case, in case we have an image available locally
docker.pull(image).map(_ => true).recover { case _ => false }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/apache/incubator-openwhisk/pull/2852/files

not sure if this is better but this was an old attempt to skip docker pull if image is present locally using docker inspect.

@codecov-io
Copy link

codecov-io commented Jun 27, 2018

Codecov Report

Merging #3813 into master will decrease coverage by 4.76%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3813      +/-   ##
==========================================
- Coverage   75.82%   71.05%   -4.77%     
==========================================
  Files         145      145              
  Lines        6919     6921       +2     
  Branches      409      410       +1     
==========================================
- Hits         5246     4918     -328     
- Misses       1673     2003     +330
Impacted Files Coverage Δ
.../containerpool/docker/DockerContainerFactory.scala 27.58% <100%> (+0.91%) ⬆️
...sk/core/containerpool/docker/DockerContainer.scala 77.92% <100%> (+0.89%) ⬆️
...core/database/cosmosdb/RxObservableImplicits.scala 0% <0%> (-100%) ⬇️
...core/database/cosmosdb/CosmosDBArtifactStore.scala 0% <0%> (-95.1%) ⬇️
...sk/core/database/cosmosdb/CosmosDBViewMapper.scala 0% <0%> (-92.6%) ⬇️
...whisk/core/database/cosmosdb/CosmosDBSupport.scala 0% <0%> (-81.82%) ⬇️
...abase/cosmosdb/CosmosDBArtifactStoreProvider.scala 0% <0%> (-58.83%) ⬇️
...la/whisk/core/database/cosmosdb/CosmosDBUtil.scala 92% <0%> (-4%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5694f38...518c87e. Read the comment docs.

@markusthoemmes
Copy link
Contributor Author

@rabbah was right and I added some behavioral changes. I'll email the dev-list to discuss the behavior change contained here.

Copy link
Contributor

@cbickel cbickel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGMT

@@ -105,6 +105,10 @@ The instructions that follow show you how to use the OpenWhisk Docker skeleton.
Notice the use of `--docker` when creating an action. Currently all Docker images are assumed
to be hosted on Docker Hub.

*Note:* It is considered best-practice for production images to be versioned via docker image tags. The tag "latest" or no tag will guarantee to always be on the latest state when creating new containers, but that contains the possibility of failing because pulling the image fails. For tagged images however, the system will allow a failing pull and gracefully recover it by using the image that is already locally available, making it much more resilient against Dockerhub outages etc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested rewrite for the first sentence:

The absence of a tag will be treated the same as using the tag "latest" which will pull an image from Dockerhub. Please note that "latest" doesn't mean newest tag, but rather "latest" is an alias for any image built without an explicit tag.

case Left(userProvided) if userProvided.tag.map(_ == "latest").getOrElse(true) =>
// Iff the image tag is "latest" explicitly (or implicitly because no tag is given at all), failing to pull will
// fail the whole container bringup process, because it is expected to pick up the very latest version every
// time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the very latest "untagged" version every time.

A `docker pull` can fail due to various reasons. One of them is network throttling by the image registry. Since we try to pull on each blackbox invocation, high volume load can cause lots of errors due to failing pulls unnecessarily.

Instead, we try to run the container even if the pull failed in the first place. If the image is available locally, the container will start just fine and recover the error gracefully. If the image is not available locally, the run will fail as well and return the same error as the docker pull would've returned.

This behavior will only be enabled for blackbox actions that specify a tag. Blackbox actions not using a tag *or* using "latest" as a tag will exhibit the very same behavior as today. That is: There will always be a pull before each container start and a failing pull will result in an error reported to the user. This is to enable rapid prototyping on images and enable determinism in the workflow. Updating the action will then force a pull and will fail early if that pull fails. With the new behavior, the developer might be surprised by the image not being updated because the pull is swallowed.

For production workload it is considered best-practice to version images through labels, thus we can "safely" assume that we can fall back to a local image in case the pull fails.
@markusthoemmes
Copy link
Contributor Author

PG3 2559 🔵

@cbickel cbickel merged commit 8b5abe7 into apache:master Jul 24, 2018
BillZong pushed a commit to BillZong/openwhisk that referenced this pull request Nov 18, 2019
)

A `docker pull` can fail due to various reasons. One of them is network throttling by the image registry. Since we try to pull on each blackbox invocation, high volume load can cause lots of errors due to failing pulls unnecessarily.

Instead, we try to run the container even if the pull failed in the first place. If the image is available locally, the container will start just fine and recover the error gracefully. If the image is not available locally, the run will fail as well and return the same error as the docker pull would've returned.

This behavior will only be enabled for blackbox actions that specify a tag. Blackbox actions not using a tag *or* using "latest" as a tag will exhibit the very same behavior as today. That is: There will always be a pull before each container start and a failing pull will result in an error reported to the user. This is to enable rapid prototyping on images and enable determinism in the workflow. Updating the action will then force a pull and will fail early if that pull fails. With the new behavior, the developer might be surprised by the image not being updated because the pull is swallowed.

For production workload it is considered best-practice to version images through labels, thus we can "safely" assume that we can fall back to a local image in case the pull fails.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docker invoker review Review for this PR has been requested and yet needs to be done.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants