
cloud-helpers/cloud-python-images


Container images focusing on Python tasks in the cloud


Overview

This project produces OCI (Docker-compatible) images that provide ready-to-use Python environments, meant to be deployed on private or public clouds (e.g., AWS, Azure, GCP). These images are based on the latest Python-ready Linux distributions.

As explained in a PythonSpeed article from June 2023, the best lightweight base images for running Python in production are based on Debian (as of end-2023, the latest stable release is Debian 12, also known as Bookworm).

These Python OCI images are aimed at deploying Data Science applications on operational environments, such as cloud-based Kubernetes clusters or services (e.g., AWS EKS, Azure AKS, IBM/Red Hat OpenShift v4 or Google GKE). Typical Python deployments are API applications, for instance built with Flask or FastAPI and served by a WSGI or ASGI server.

The author of this repository also maintains Data Science Python OCI images for everyday development purposes, in a dedicated GitHub repository and Docker Hub space. Thanks to Docker multi-stage builds, a single Docker specification file can define two images: one for everyday data science work, and another one to deploy the corresponding applications onto production environments.
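With a multi-stage Dockerfile, the `--target` option of `docker build` selects which stage (and therefore which of the two images) gets built. The sketch below assumes a hypothetical Dockerfile in the current directory declaring two stages named `dev` and `prod`; the stage names, the `my-app` image name, the `build_stage` helper and the `DOCKER` variable (set it to `echo` to print the commands instead of running them) are all illustrative, not part of this repository.

```shell
# Build a single stage of a multi-stage Dockerfile.
# Assumption: the Dockerfile in the current directory defines stages
# named "dev" and "prod" (hypothetical names, for illustration only).
# DOCKER is a hypothetical override: DOCKER=echo prints the command.
build_stage() {
  stage="$1"
  ${DOCKER:-docker} build --target "$stage" -t "my-app:$stage" .
}
```

For instance, `build_stage dev` would produce the everyday development image and `build_stage prod` the deployment image, both from the same file.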

The Docker images of this repository simply add, on top of the native images maintained by the Docker Python project, various utilities that make them work out of the box with databases (e.g., Oracle, MySQL/MariaDB, PostgreSQL), cloud vendors (e.g., the Azure and AWS command-line utilities) and cloud-native tools (e.g., Pachyderm).

In the OCI images, Python packages are installed with pip. For testing purposes, outside of the container, Python virtual environments may be set up with pyenv and pipenv, as detailed in the dedicated procedure of the Python induction notebook sub-project.

Any additional Python module may be installed either:

  • With pip and a requirements.txt dependency specification file:
$ python3 -mpip install -r requirements.txt
  • In a dedicated virtual environment, controlled by pipenv through a local Pipfile (and, potentially, a Pipfile.lock file), which should be kept under version control:
$ pipenv --rm; pipenv install; pipenv install --dev

The OCI images, on the other hand, install those modules globally (system-wide).
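Because the images install modules globally, it can be useful to check that a project's requirements.txt installs cleanly against a given image's Python version before relying on it. A minimal sketch, assuming requirements.txt sits in the current directory; the `check_requirements` helper and the `DOCKER` override variable are hypothetical. Thanks to `--rm`, the container and whatever pip installed in it are discarded afterwards, so this only checks that the dependencies resolve and install.

```shell
# Install ./requirements.txt inside a throwaway container of the image,
# to check that the dependencies resolve against its Python version.
# The current directory is mounted on /work, used as working directory.
# DOCKER is a hypothetical override: DOCKER=echo prints the command.
check_requirements() {
  image="${1:-infrahelpers/cloud-python}"
  ${DOCKER:-docker} run --rm -v "$PWD":/work -w /work "$image" \
    python3 -mpip install -r requirements.txt
}
```

For example, `check_requirements infrahelpers/cloud-python:py312-bookworm` targets the Python 3.12 image; without an argument, the default image is used.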

The Docker images of this repository are intended to run any kind of API application.

See also

Simple use

  • Download the Docker image:
$ docker pull infrahelpers/cloud-python
  • Launch Dash or Flask within the Docker image (where <port> corresponds to the local port through which Dash or Flask is reached; the default is 8050):
$ docker run -it -p <port>:8050 infrahelpers/cloud-python
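To keep the application running in the background while reaching it from the host, container port 8050 has to be published on a host port with `-p`. A minimal sketch, assuming the image's default command starts the Dash or Flask application on port 8050 (if it only opens a shell, the launch command has to be appended); the `cloud-python` container name, the `run_cloud_python` helper and the `DOCKER` override variable are hypothetical.

```shell
# Run the image detached (-d), publishing container port 8050 on the
# given host port (8050 by default). --rm removes the container once it
# stops; the fixed name makes it easy to stop later.
# DOCKER is a hypothetical override: DOCKER=echo prints the command.
run_cloud_python() {
  host_port="${1:-8050}"
  ${DOCKER:-docker} run -d --rm --name cloud-python \
    -p "${host_port}:8050" infrahelpers/cloud-python
}
```

With `run_cloud_python 9000`, the application would be reachable on http://localhost:9000, and `docker stop cloud-python` would stop (and, because of `--rm`, remove) the container.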

Build your own container image

$ mkdir -p ~/dev/infra && cd ~/dev/infra
$ git clone https://github.com/cloud-helpers/cloud-python.git
$ cd cloud-python
  • Build the OCI images (here with Docker, but any other tool may be used):
    • Python 3.12
      • py312-bookworm tag: Debian Bookworm / Python 3.12:
$ docker build -t infrahelpers/cloud-python:py312-bookworm python-3.12-bookworm
    • Python 3.11
      • py311-bookworm tag: Debian Bookworm / Python 3.11:
$ docker build -t infrahelpers/cloud-python:py311-bookworm python-3.11-bookworm
    • Python 3.10
      • py310-bookworm tag: Debian Bookworm / Python 3.10:
$ docker build -t infrahelpers/cloud-python:py310-bookworm python-3.10-bookworm
    • Python 3.9
      • py39-bookworm tag: Debian Bookworm / Python 3.9:
$ docker build -t infrahelpers/cloud-python:py39-bookworm python-3.9-bookworm
    • Python 3.8
      • py38-bookworm tag: Debian Bookworm / Python 3.8:
$ docker build -t infrahelpers/cloud-python:py38-bookworm python-3.8-bookworm
    • Deprecated (use the Data Processing Pipelines (DPP) images instead, available on Docker Hub and on GitHub):
      • Amazon Linux 2 for Elastic MapReduce (EMR) 6 (system Python 3.7.15) and Databricks (pyenv-based Python 3.8), with JDK 8:
$ docker build -t infrahelpers/cloud-python:pyspark-emr-dbs pyspark-corretto-8-emr-dbs
      • Amazon Linux 2 for Elastic MapReduce (EMR) 6, Python 3.7.15, with JDK 8:
$ docker build -t infrahelpers/cloud-python:pyspark-emr6 pyspark-emr-6-corretto-8
      • Amazon Linux 2 for Elastic MapReduce (EMR) 6, Python 3.7.10, lighter image, with JDK 8:
$ docker build -t infrahelpers/cloud-python:pyspark-emr6-light pyspark-emr-6-corretto-8-light
      • [WIP] Amazon Linux 2 for Elastic MapReduce (EMR), usually with Python 3.7.15 (as of end-2022), with JDK 11:
$ docker build -t infrahelpers/cloud-python:pyspark-emr-jdk11 pyspark-emr-corretto-11
    • In addition to the Docker Hub builds, the CI/CD pipeline (GitHub Actions) also builds the infrahelpers/cloud-python:pyspark-emr-6-light-multi-platform image, from the pyspark-emr-6-corretto-11-light/ directory, for two CPU architectures: the classical AMD64 and the newer ARM64.
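The Debian Bookworm build commands above all follow one pattern: the pyXY-bookworm tag is built from the python-X.Y-bookworm directory. A minimal sketch that loops over them, to be run from the root of the cloned repository; the `build_bookworm_images` helper and the `DOCKER` override variable are hypothetical.

```shell
# Build the Debian Bookworm images, one per Python version, from the
# python-<version>-bookworm directories of the repository.
# DOCKER is a hypothetical override: DOCKER=echo prints the commands.
build_bookworm_images() {
  for pyver in 3.12 3.11 3.10 3.9 3.8; do
    # Derive the tag from the version: 3.12 -> py312, 3.9 -> py39
    tag="py$(printf '%s' "$pyver" | tr -d .)-bookworm"
    ${DOCKER:-docker} build -t "infrahelpers/cloud-python:${tag}" \
      "python-${pyver}-bookworm" || return 1  # stop at the first failure
  done
}
```

Stopping at the first failed build keeps a broken Dockerfile from being hidden by the output of the following builds.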

  • (Optional) Push the newly built images to Docker Hub. That step is usually not needed, as the images are automatically built every time there is a change on GitHub.

$ docker login
$ docker push infrahelpers/cloud-python:pyspark-emr-dbs
$ docker push infrahelpers/cloud-python:pyspark-emr-jdk11
$ docker push infrahelpers/cloud-python:pyspark-emr6
$ docker push infrahelpers/cloud-python:pyspark-emr6-light
$ docker push infrahelpers/cloud-python:py312-bookworm
$ docker push infrahelpers/cloud-python:py311-bookworm
$ docker push infrahelpers/cloud-python:py310-bookworm
$ docker push infrahelpers/cloud-python:py39-bookworm
$ docker push infrahelpers/cloud-python:py38-bookworm
  • Choose which image should be the latest, tag it and upload it to Docker Hub:
$ docker tag infrahelpers/cloud-python:py311-bookworm infrahelpers/cloud-python:latest
$ docker push infrahelpers/cloud-python:latest
  • (Optional) Push the newly built images to Quay.io. That step is usually not needed, as the images are automatically built every time there is a change on GitHub.
    • Log in to Quay:
$ docker login quay.io
    • Python 3.12:
$ docker tag infrahelpers/cloud-python:py312-bookworm quay.io/infrahelpers/cloud-python:py312-bookworm
$ docker push quay.io/infrahelpers/cloud-python:py312-bookworm
    • Python 3.11:
$ docker tag infrahelpers/cloud-python:py311-bookworm quay.io/infrahelpers/cloud-python:py311-bookworm
$ docker push quay.io/infrahelpers/cloud-python:py311-bookworm
    • Python 3.10:
$ docker tag infrahelpers/cloud-python:py310-bookworm quay.io/infrahelpers/cloud-python:py310-bookworm
$ docker push quay.io/infrahelpers/cloud-python:py310-bookworm
    • Python 3.9:
$ docker tag infrahelpers/cloud-python:py39-bookworm quay.io/infrahelpers/cloud-python:py39-bookworm
$ docker push quay.io/infrahelpers/cloud-python:py39-bookworm
    • Python 3.8:
$ docker tag infrahelpers/cloud-python:py38-bookworm quay.io/infrahelpers/cloud-python:py38-bookworm
$ docker push quay.io/infrahelpers/cloud-python:py38-bookworm
  • Shut down the Docker container:
$ docker ps
CONTAINER ID IMAGE                    COMMAND                   CREATED        STATUS        PORTS                  NAMES
7b69efc9dc9a ai/cloud-python        "/bin/sh -c 'python …"    48 seconds ago Up 47 seconds 0.0.0.0:9000->8050/tcp vigilant_merkle
$ docker kill vigilant_merkle
vigilant_merkle
$ docker ps
CONTAINER ID IMAGE                    COMMAND                   CREATED        STATUS        PORTS                  NAMES
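The shutdown above requires looking up the generated container name (vigilant_merkle here). As a sketch, the `ancestor` filter of `docker ps` can instead select the running containers created from a given image; pass the same image reference, including any tag, as the one used with `docker run`. The `stop_cloud_python` helper and the `DOCKER` override variable are hypothetical. It uses `docker stop`, which sends SIGTERM and only sends SIGKILL after a grace period, whereas `docker kill` sends SIGKILL right away.

```shell
# Stop every running container created from the given image.
# DOCKER is a hypothetical override, e.g. a stub function for testing.
stop_cloud_python() {
  image="${1:-infrahelpers/cloud-python}"
  ids=$(${DOCKER:-docker} ps -q --filter "ancestor=${image}")
  if [ -n "$ids" ]; then
    # $ids is deliberately unquoted: one argument per container ID
    ${DOCKER:-docker} stop $ids
  fi
}
```

For instance, `stop_cloud_python infrahelpers/cloud-python:py312-bookworm` stops only the containers running the Python 3.12 image; when none is running, nothing is done.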
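The optional Quay.io steps above repeat the same tag-and-push pair for every Python version. A sketch of the same sequence as a loop, assuming the five Bookworm images have already been built locally and `docker login quay.io` has been run; the `push_bookworm_images_to_quay` helper and the `DOCKER` override variable are hypothetical.

```shell
# Tag each locally built Bookworm image for Quay.io, then push it.
# Assumes "docker login quay.io" has already been run.
# DOCKER is a hypothetical override: DOCKER=echo prints the commands.
push_bookworm_images_to_quay() {
  for tag in py312-bookworm py311-bookworm py310-bookworm py39-bookworm py38-bookworm; do
    ${DOCKER:-docker} tag "infrahelpers/cloud-python:${tag}" \
      "quay.io/infrahelpers/cloud-python:${tag}" || return 1
    ${DOCKER:-docker} push "quay.io/infrahelpers/cloud-python:${tag}" || return 1
  done
}
```

Returning at the first failure (for instance an expired login) avoids a cascade of identical push errors.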
