Support Alpine-based docker base image #822

Merged (9 commits) on Jun 22, 2020

Conversation

jackyzha0
Contributor

@jackyzha0 jackyzha0 commented Jun 19, 2020

Description

Use of venv and multi-stage builds to shave cost of building dependencies.

Motivation and Context

  • Should fix Add support for Alpine based docker image #760
  • Reduce already thicc Docker images to a more reasonable size (on the Iris example, image size dropped from ~1.5GB to ~0.5GB, i.e. about 3x smaller or a ~67% reduction)
  • Add more documentation around using alternative base images
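
For reference, the arithmetic behind the size figures quoted above (~1.5GB before, ~0.5GB after) can be checked with a few lines of Python. This is only a sketch; `size_reduction` is a hypothetical helper name and the inputs are the approximate sizes from the description, not fresh measurements.

```python
from typing import Tuple


def size_reduction(before_gb: float, after_gb: float) -> Tuple[float, float]:
    """Return (percent reduction, shrink factor) for an image size change."""
    percent = (before_gb - after_gb) / before_gb * 100
    factor = before_gb / after_gb
    return percent, factor


# Approximate sizes quoted in the PR description (assumed, not re-measured)
percent, factor = size_reduction(1.5, 0.5)
print(f"{percent:.1f}% smaller, {factor:.1f}x shrink")
```

A reduction can never exceed 100% of the original size, so "3x smaller" and "about two thirds smaller" are the two consistent ways to state this result.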

How Has This Been Tested?

  • Tested using basic Iris Classifier example with @env(auto_pip_dependencies=True, docker_base_image="bentoml/model-server:0.8.12-alpine")
  • Ran both full-size and alpine images to ensure functionality remained consistent

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Documentation
  • Test, CI, or build
  • None

Components (if applicable)

  • BentoService (model packaging, dependency management, handler definition)
  • Model Artifact (model serialization, multi-framework support)
  • Model Server (micro-batching, logging, metrics, tracing, benchmark, OpenAPI)
  • YataiService (model management, deployment automation)
  • Docker related
  • Documentation

Checklist:

  • My code follows the bentoml code style, both ./dev/format.sh and
    ./dev/lint.sh script have passed
    (instructions).
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • My change requires a change in bentoml/gallery example notebooks
  • I have sent a pull request to bentoml/gallery to make that change

parano and others added 3 commits June 9, 2020 21:20
* make sagemaker docker image have same file structure

* use consistent file names

* move operator code out of __init__ to avoid loading unused code in model server startup

* refactor deployment validator

* reorganize bento repository code

* deployment validator test & linting error fix

* more repository code cleanup

* renaming and adding inline comments

* move out lambda operator code to separate file
@parano
Member

parano commented Jun 19, 2020

Hi @jackyzha0, just adding some context here:

In the default docker base image, we use conda to ensure the Python version matches the Python version the saved bundle was created with. That version can be found in the bentoml.yml file under the saved bundle directory, as well as in environment.yml (the conda environment config file). Besides installing the right Python version, conda also makes installing some packages a lot easier (H2O, for example), and some companies may already be using conda for dependency management.

With bentoml it is possible to use a docker base image without conda installed; it will just ignore the environment.yml. In that case, if the user has specified conda dependencies in their BentoService class, those dependencies will not be installed.
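
To make the consequence concrete, here is a rough sketch of the behaviour described above: pip dependencies are always installed, while conda dependencies are silently skipped when no `conda` executable is on the PATH. This is not BentoML's actual implementation; `plan_install` and its return shape are made up for illustration, and the dependency names are borrowed from the docs example in this PR.

```python
import shutil
from typing import Callable, Dict, List, Optional


def plan_install(
    pip_deps: List[str],
    conda_deps: List[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, List[str]]:
    """Split dependencies into those that get installed and those skipped.

    Hypothetical helper: conda dependencies are only installable when a
    `conda` executable can be found on PATH.
    """
    has_conda = which("conda") is not None
    return {
        "pip": list(pip_deps),
        "conda": list(conda_deps) if has_conda else [],
        "skipped": [] if has_conda else list(conda_deps),
    }


# Simulate a base image without conda by injecting a lookup that finds nothing
plan = plan_install(["pandas"], ["h2o==3.24.0.2"], which=lambda name: None)
print(plan)
```

Injecting `which` keeps the sketch testable on any machine, regardless of whether conda happens to be installed locally.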

With Alpine based docker image, I think there are two approaches here:

  1. Install conda in the alpine base image and use conda to install the right Python version as well as the conda dependencies. The downside is that an alpine image with conda installed pretty much loses the size advantage; IIRC it is not that different from the default docker base image.

  2. Ignore conda dependencies (we will just tell users that conda is not available when using an alpine based image) and publish one alpine base image for every major Python version that BentoML supports, e.g. bentoml/model-server:0.8.1-alpine-py36, bentoml/model-server:0.8.1-alpine-py37, bentoml/model-server:0.8.1-alpine-py38

There might be other ways to do it, let me know if you'd like to discuss more.

@jackyzha0
Contributor Author

Heyo @parano,

Thanks for the context! That does make a lot more sense. I actually did try the first approach, but ran into a lot of issues installing wheels as they are built for glibc but Alpine uses musl. If we did go down this path, this would require having a multi-stage build that rebuilds every requirement for musl which would completely kill image build times. Even after ignoring some of those build steps, the final image is actually ~1.2GB which is still way too large given that the Alpine image is supposed to be slim right 😅

I think option 2 will be the best way to go about doing it and just build Alpine versions for every major Python release. I was planning on writing up some more docs for this today. Anything major you'd like me to include?
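
As a small illustration of the glibc vs musl problem mentioned above, the sketch below classifies a wheel filename by the libc its platform tag targets. The function names are hypothetical, and the `musllinux` tag is an assumption relative to this thread: it was standardized (PEP 656) after this discussion, so at the time pip on Alpine had to fall back to building from source for anything that shipped only `manylinux` wheels.

```python
def wheel_libc(filename: str) -> str:
    """Classify a wheel filename by the libc its platform tag targets."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    # Wheel names end with {python tag}-{abi tag}-{platform tag}.whl
    platform_tag = filename[: -len(".whl")].split("-")[-1]
    if platform_tag == "any":
        return "any"  # pure Python, no libc dependency
    if "musllinux" in platform_tag:
        return "musl"
    if "manylinux" in platform_tag:
        return "glibc"
    return "other"


def usable_on_alpine(filename: str) -> bool:
    """True if the prebuilt wheel can be installed on a musl-based image."""
    return wheel_libc(filename) in ("any", "musl")


for name in (
    "numpy-1.18.5-cp37-cp37m-manylinux1_x86_64.whl",
    "six-1.15.0-py2.py3-none-any.whl",
):
    print(name, "->", wheel_libc(name), usable_on_alpine(name))
```

Anything classified as `glibc` here is what forces the rebuild step in a multi-stage Alpine build.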

@jackyzha0 jackyzha0 marked this pull request as ready for review June 20, 2020 16:42
pip_dependencies=['pandas'],
conda_channels=['h2oai'],
conda_dependencies=['h2o==3.24.0.2'],
docker_base_image="bentoml/model-server:0.8.12-alpine3.7"
Member

Could you make sure the base image in this example (bentoml/model-server:0.8.12-alpine3.7) matches the name we have in the docker release script?

I think you may also want to rename the docker file here to Dockerfile-alpine-py36, Dockerfile-alpine-py37, and so on, and add them to the docker/model-server/release.sh script.

We currently run that release script after every PyPI release, which pushes the new images to Docker Hub under tags such as bentoml/model-server:0.8.2-alpine-py36.

@jackyzha0 jackyzha0 requested a review from parano June 21, 2020 05:14
@codecov

codecov bot commented Jun 21, 2020

Codecov Report

Merging #822 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #822   +/-   ##
=======================================
  Coverage   56.41%   56.41%           
=======================================
  Files         116      116           
  Lines        8608     8609    +1     
=======================================
+ Hits         4856     4857    +1     
  Misses       3752     3752           
Impacted Files Coverage Δ
bentoml/saved_bundle/bundler.py 93.90% <100.00%> (+0.07%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8741efd...f099142.

@jackyzha0
Contributor Author

@parano Just added a section to release.sh that iterates over a list of Python versions and passes each one into the Docker-alpine image as a build arg to avoid code duplication! Should do the trick. I tested locally without the push step and it works.

-t bentoml/model-server:"$BENTOML_VERSION"-alpine \
PYTHON_MAJOR_VERSIONS=(3.6 3.7 3.8)
echo "Building Alpine based docker base images for ${PYTHON_MAJOR_VERSIONS[*]}"
for version in "${PYTHON_MAJOR_VERSIONS[@]}"
Member

👍

@parano
Member

parano commented Jun 22, 2020

Could you rename the image tag from bentoml/model-server:0.8.2-alpine36 to bentoml/model-server:0.8.2-alpine-py36 instead? It seems py36 is the convention that many other Python projects in this space follow, e.g. pytorch and tensorflow.

Otherwise, this looks great @jackyzha0!
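
For readers following along, the release loop and the `-alpine-py36` naming convention discussed in this thread can be sketched together as below. This is a Python approximation, not the contents of release.sh: the build arg name `PYTHON_VERSION` and the file name `Dockerfile-alpine` are assumptions, and the code only assembles the `docker build` command lines without running them.

```python
from typing import List


def alpine_tag(bentoml_version: str, python_version: str) -> str:
    """Build a tag such as bentoml/model-server:0.8.2-alpine-py36."""
    py = "py" + python_version.replace(".", "")
    return f"bentoml/model-server:{bentoml_version}-alpine-{py}"


def build_commands(bentoml_version: str, python_versions: List[str]) -> List[List[str]]:
    """Assemble one `docker build` invocation per Python version."""
    commands = []
    for version in python_versions:
        commands.append([
            "docker", "build",
            "--build-arg", f"PYTHON_VERSION={version}",  # hypothetical arg name
            "-f", "Dockerfile-alpine",  # hypothetical Dockerfile name
            "-t", alpine_tag(bentoml_version, version),
            ".",
        ])
    return commands


commands = build_commands("0.8.2", ["3.6", "3.7", "3.8"])
for cmd in commands:
    print(" ".join(cmd))
```

Keeping the tag format in one function means a later naming change only has to be made in one place.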

@parano parano changed the title Fully implement Docker "Alpine" Support Alpine-based docker base image Jun 22, 2020
@parano
Member

parano commented Jun 22, 2020

Thanks for updating the PR @jackyzha0! I will go ahead and run the release of these docker images for 0.8.2 and do some more testing on my end!

@parano parano merged commit 04f8011 into bentoml:master Jun 22, 2020
@parano
Member

parano commented Jun 22, 2020

Just published alpine based images for 0.8.1 on docker-hub https://hub.docker.com/repository/docker/bentoml/model-server

cc @bojiang - could you add a test case in the dockerized API server integration tests you are working on, to use alpine based docker image?

aarnphm pushed a commit to aarnphm/BentoML that referenced this pull request Jul 29, 2022
* Repository and Deployment refactor and cleanup (bentoml#771)

* make sagemaker docker image have same file structure

* use consistent file names

* move operator code out of __init__ to avoid loading unused code in model server startup

* refactor deployment validator

* reorganize bento repository code

* deployment validator test & linting error fix

* more repository code cleanup

* renaming and adding inline comments

* move out lambda operator code to separate file

* fixed dockerfile to use multi-stage builds

* add some docs for alternative docker images

* add bash script support for building multiple python versions

* add more docs

* update name of docker files to reflect naming conventions and fixed docs

Co-authored-by: Chaoyu <paranoyang@gmail.com>
Co-authored-by: cory <cory.massaro@gmail.com>
Successfully merging this pull request may close these issues.

Add support for Alpine based docker image
3 participants