Added a few more tags
da115115 committed Nov 29, 2023
1 parent 2033aba commit f37699f
Showing 2 changed files with 32 additions and 9 deletions.
8 changes: 7 additions & 1 deletion .github/workflows/docker.yml
@@ -44,7 +44,9 @@ jobs:
build-args: |
JDK_VERSION=${{ matrix.jdk_version }}
push: true
tags: infrahelpers/dpp:jdk${{ matrix.jdk_version }}
tags: |
infrahelpers/dpp:jdk${{ matrix.jdk_version }}
infrahelpers/dpp:latest
cache-from: type=gha
cache-to: type=gha,mode=max
platforms: linux/amd64,linux/arm64/v8
@@ -91,6 +93,7 @@ jobs:
tags: |
infrahelpers/dpp:jdk8-sbt${{ matrix.sbt_version }}
infrahelpers/dpp:jdk8-sbt${{ matrix.sbt_version }}-${{ env.SHA }}
infrahelpers/dpp:jdk8-sbt
cache-from: type=gha
cache-to: type=gha,mode=max
platforms: linux/amd64,linux/arm64/v8
@@ -144,8 +147,11 @@ jobs:
PYTHON_MICRO_VERSION=${{ matrix.python_micro_version }}
push: true
tags: |
infrahelpers/dpp:jdk${{ matrix.jdk_version }}-python${{ steps.extract_minor_version.outputs.minor_version }}
infrahelpers/dpp:jdk${{ matrix.jdk_version }}-python${{ steps.extract_minor_version.outputs.minor_version }}-${{ env.SHA }}
infrahelpers/dpp:jdk${{ matrix.jdk_version }}-python${{ matrix.python_micro_version }}
infrahelpers/dpp:jdk${{ matrix.jdk_version }}-python${{ matrix.python_micro_version }}-${{ env.SHA }}
infrahelpers/dpp:jdk${{ matrix.jdk_version }}-python
cache-from: type=gha
cache-to: type=gha,mode=max
platforms: linux/amd64,linux/arm64/v8
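To see what the `tags:` list of the PySpark job expands to for one matrix entry, the shell sketch below reproduces the five tags locally. It assumes that the `extract_minor_version` step simply drops the last dot-separated component of `PYTHON_MICRO_VERSION` (the step itself is not shown in this diff). The function name `dpp_python_tags` and the sample values `11`, `3.9.18` and `abc1234` are illustrative only.

```shell
# Hypothetical helper (not part of the workflow): prints the tags that the
# PySpark job would push for one matrix entry.
# Arguments: JDK version, Python micro version, short commit SHA.
dpp_python_tags() {
  jdk="$1"
  micro="$2"
  sha="$3"
  # Assumption: the minor version is the micro version without its last
  # component, e.g. 3.9.18 -> 3.9
  minor="${micro%.*}"
  repo="infrahelpers/dpp"
  printf '%s\n' \
    "${repo}:jdk${jdk}-python${minor}" \
    "${repo}:jdk${jdk}-python${minor}-${sha}" \
    "${repo}:jdk${jdk}-python${micro}" \
    "${repo}:jdk${jdk}-python${micro}-${sha}" \
    "${repo}:jdk${jdk}-python"
}

dpp_python_tags 11 3.9.18 abc1234
```

If the matrix contains several Python versions for the same JDK version, each of those jobs pushes the same `jdk<version>-python` tag, so that floating tag ends up pointing at whichever job finished last.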
33 changes: 25 additions & 8 deletions README.md
@@ -1,6 +1,14 @@
Container images focusing on Data Processing Pipelines (DPP)
============================================================

# Table of Contents (ToC)
* [Overview](#overview)
* [See also](#see-also)
* [Simple use](#simple-use)
* [Build your own container image](#build-your-own-container-image)

Created by [gh-md-toc](https://github.com/ekalinin/github-markdown-toc.go)

# Overview
[That project](https://github.com/data-engineering-helpers/dpp-images)
produces [OCI](https://opencontainers.org/)
@@ -19,30 +27,30 @@ the OCI images are built and published on

These OCI images are aimed at deploying Data Engineering applications,
typically Data Processing Pipelines (DPP), on
[Modern Data Stack (MDS)](https://www.montecarlodata.com/blog-what-is-a-data-platform-and-how-to-build-one/)
[Modern Data Stack (MDS)](https://www.montecarlodata.com/blog-what-is-a-data-platform-and-how-to-build-one/).

The author of this repository also maintains general purpose cloud
The authors of this repository also maintain general purpose cloud
Python OCI images in a
[dedicated GitHub repository](https://github.com/cloud-helpers/cloud-python-images/)
and
[Docker Hub space](https://hub.docker.com/repository/docker/infrahelpers/cloud-python).

Thanks to
[Docker multi-stage builds](https://docs.docker.com/develop/develop-images/multistage-build/),
one can easily have in the same Docker specification files two images, one for
one can easily have in the same Docker specification file two images, one for
every day data engineering work, and the other one to deploy the corresponding
applications onto production environments.

The Docker images of this repository just add various utilities to make it
work out of the box with cloud vendors (_e.g._, Azure and AWS command-line
utilities) and cloud-native tools (_e.g._, Pachyderm), on top of the native
utilities) and cloud-native tools (_e.g._, S3-Mountpoint), on top of the native
images maintained by the
[AWS-supported Corretto](https://docs.aws.amazon.com/corretto/latest/corretto-8-ug/what-is-corretto-8.html).
They also add specific Python versions.

In the OCI image, Python packages are installed by the `pip` utility.
For testing purposes, outside of the container, Python virtual environments
may be installed thanks to Pyenv and `pipenv`, as detailed in the
may be installed thanks to PyEnv and `pipenv`, as detailed in the
[dedicated procedure](http://github.com/machine-learning-helpers/induction-python/tree/master/installation/virtual-env)
on the
[Python induction notebook sub-project](http://github.com/machine-learning-helpers/induction-python).
@@ -91,16 +99,21 @@ applications / Data Processing Pipeline (DPP).
- https://github.com/docker-library/python/tree/master/3.10/buster
+ [Python 3.9](https://github.com/docker-library/python/tree/master/3.9)
- https://github.com/docker-library/python/tree/master/3.9/buster
* AWS cloud:
[GitHub - Data Engineering Helpers - Knowledge Sharing - AWS](https://github.com/data-engineering-helpers/ks-cheat-sheets/blob/main/clouds/aws/)
* Kubernetes:
[GitHub - Data Engineering Helpers - Knowledge Sharing - Kubernetes (k8s)](https://github.com/data-engineering-helpers/ks-cheat-sheets/blob/main/frameworks/k8s/)

# Simple use
* Download the Docker image:
* Download the Docker images:
```bash
$ docker pull infrahelpers/dpp:py311
$ docker pull infrahelpers/dpp:jdk11-python3.9
docker pull infrahelpers/dpp:jdk11-sbt1.9.7
```

* Launch a Spark application:
```bash
$ docker run -it infrahelpers/dpp:311
$ docker run -it --rm infrahelpers/dpp:jdk11-python3.9
```
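
The image references used in this section follow the tag scheme set in the workflow. As a minimal sketch, the helpers below (hypothetical names, not part of the repository) compose the reference from a JDK version and a Python minor version, then print the corresponding `docker` commands rather than running them, so that they can be reviewed first.

```shell
# Hypothetical helper: build the image reference for a JDK / Python pair.
dpp_image() {
  printf 'infrahelpers/dpp:jdk%s-python%s\n' "$1" "$2"
}

# Print (not run) the pull and run commands for that image.
dpp_simple_use() {
  image="$(dpp_image "$1" "$2")"
  printf 'docker pull %s\n' "$image"
  printf 'docker run -it --rm %s\n' "$image"
}

dpp_simple_use 11 3.9
```

Piping the output to `sh` would execute the two commands; that step requires a working Docker installation and access to Docker Hub.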

# Build your own container image
@@ -129,11 +142,13 @@ $ docker build -t infrahelpers/dpp:jdk$JDK_VERSION --build-arg JDK_VERSION=$JDK_
```bash
$ docker build -t infrahelpers/dpp:jdk$JDK_VERSION-python$PYTHON_MINOR_VERSION --build-arg JDK_VERSION=$JDK_VERSION --build-arg PYTHON_MINOR_VERSION=$PYTHON_MINOR_VERSION --build-arg PYTHON_MICRO_VERSION=$PYTHON_MICRO_VERSION corretto-emr-dbs-universal-pyspark
docker tag infrahelpers/dpp:jdk$JDK_VERSION-python$PYTHON_MINOR_VERSION infrahelpers/dpp:jdk$JDK_VERSION-python$PYTHON_MICRO_VERSION
docker tag infrahelpers/dpp:jdk$JDK_VERSION-python$PYTHON_MINOR_VERSION infrahelpers/dpp:jdk$JDK_VERSION-python
```
+ Amazon Linux 2 for Elastic MapReduce (EMR) 6 and Databricks
with SBT and Scala, with more freedom on its version:
```bash
$ docker build -t infrahelpers/dpp:jdk$JDK_VERSION-sbt$SBT_VERSION --build-arg JDK_VERSION=$JDK_VERSION --build-arg SBT_VERSION=$SBT_VERSION corretto-emr-dbs-universal-spark-scala
docker tag infrahelpers/dpp:jdk$JDK_VERSION-sbt$SBT_VERSION infrahelpers/dpp:jdk$JDK_VERSION-sbt
```

* In addition to what the Docker Hub builds, the CI/CD (GitHub Actions)
@@ -153,7 +168,9 @@ $ docker login
docker push infrahelpers/dpp:jdk$JDK_VERSION
docker push infrahelpers/dpp:jdk$JDK_VERSION-python$PYTHON_MINOR_VERSION
docker push infrahelpers/dpp:jdk$JDK_VERSION-python$PYTHON_MICRO_VERSION
docker push infrahelpers/dpp:jdk$JDK_VERSION-python
docker push infrahelpers/dpp:jdk$JDK_VERSION-sbt$SBT_VERSION
docker push infrahelpers/dpp:jdk$JDK_VERSION-sbt
```

* Choose which image should be the latest, tag it and upload it to Docker Hub: