Added support for JDK17
da115115 committed Jan 17, 2024
1 parent ed87a9e commit a2607a4
Showing 3 changed files with 56 additions and 33 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/docker.yml
@@ -10,7 +10,7 @@ jobs:
build_base_images:
strategy:
matrix:
jdk_version: [8, 11]
jdk_version: [8, 11, 17]
environment: docker-hub
runs-on: ubuntu-latest
steps:
@@ -102,8 +102,8 @@ jobs:
needs: build_base_images
strategy:
matrix:
jdk_version: [8, 11]
python_micro_version: [3.8.16, 3.9.16, 3.10.11, 3.11.3 ] # Use the latest micro versions of each minor version
jdk_version: [8, 11, 17]
python_micro_version: [3.8.18, 3.9.18, 3.10.13, 3.11.7, 3.12.1] # Use the latest micro versions of each minor version

environment: docker-hub
runs-on: ubuntu-latest
32 changes: 20 additions & 12 deletions README.md
@@ -91,24 +91,32 @@ applications / Data Processing Pipeline (DPP).
+ Docker Cloud:
https://cloud.docker.com/u/infrahelpers/repository/docker/artificialintelligence/python-light
* [Native Python OCI images](https://github.com/docker-library/python):
+ [Python 3.12](https://github.com/docker-library/python/tree/master/3.12-rc)
- https://github.com/docker-library/python/tree/master/3.12-rc/buster
+ [Python 3.13-rc](https://github.com/docker-library/python/tree/master/3.13-rc)
- https://github.com/docker-library/python/tree/master/3.13-rc/bookworm
+ [Python 3.12](https://github.com/docker-library/python/tree/master/3.12)
- https://github.com/docker-library/python/tree/master/3.12/bookworm
+ [Python 3.11](https://github.com/docker-library/python/tree/master/3.11)
- https://github.com/docker-library/python/tree/master/3.11/buster
- https://github.com/docker-library/python/tree/master/3.11/bookworm
+ [Python 3.10](https://github.com/docker-library/python/tree/master/3.10)
- https://github.com/docker-library/python/tree/master/3.10/buster
- https://github.com/docker-library/python/tree/master/3.10/bookworm
+ [Python 3.9](https://github.com/docker-library/python/tree/master/3.9)
- https://github.com/docker-library/python/tree/master/3.9/buster
- https://github.com/docker-library/python/tree/master/3.9/bookworm
* AWS cloud:
[GitHub - Data Engineering Helpers - Knowledge Sharing - AWS](https://github.com/data-engineering-helpers/ks-cheat-sheets/blob/main/clouds/aws/)
* Kubernetes:
[GitHub - Data Engineering Helpers - Knowledge Sharing - Kubernetes (k8s)](https://github.com/data-engineering-helpers/ks-cheat-sheets/blob/main/frameworks/k8s/)

# Simple use
* Download the Docker images:
* Download the Docker images
+ JDK17:
```bash
$ docker pull infrahelpers/dpp:jdk17-python3.9
docker pull infrahelpers/dpp:jdk17-sbt1.9.8
```
+ JDK11:
```bash
$ docker pull infrahelpers/dpp:jdk11-python3.9
docker pull infrahelpers/dpp:jdk11-sbt1.9.7
docker pull infrahelpers/dpp:jdk11-sbt1.9.8
```

* Launch a Spark application:
@@ -128,23 +136,23 @@ $ cd dpp
* Build the OCI images (here with Docker, but any other tool may be used):
+ Set up the requested versions for the various stacks:
```bash
$ export JDK_VERSION="11" # or "8"
$ export JDK_VERSION="17" # or "11" or "8"
export PYTHON_MINOR_VERSION="3.9"
export PYTHON_MICRO_VERSION="3.9.18"
export SBT_VERSION="1.9.7"
export SBT_VERSION="1.9.8"
```
+ Amazon Linux 2 for Elastic Map Reduce (EMR) 6 and DataBricks base image:
+ Amazon Linux 2023 (AL2023) for Elastic Map Reduce (EMR) 7.x and DataBricks base image:
```bash
$ docker build -t infrahelpers/dpp:jdk$JDK_VERSION --build-arg JDK_VERSION=$JDK_VERSION corretto-emr-dbs-universal-base
```
+ Amazon Linux 2 for Elastic Map Reduce (EMR) 6 and DataBricks
+ Amazon Linux 2023 (AL2023) for Elastic Map Reduce (EMR) 7.x and DataBricks
with a single Python installation, with more freedom on its version:
```bash
$ docker build -t infrahelpers/dpp:jdk$JDK_VERSION-python$PYTHON_MINOR_VERSION --build-arg JDK_VERSION=$JDK_VERSION --build-arg PYTHON_MINOR_VERSION=$PYTHON_MINOR_VERSION --build-arg PYTHON_MICRO_VERSION=$PYTHON_MICRO_VERSION corretto-emr-dbs-universal-pyspark
docker tag infrahelpers/dpp:jdk$JDK_VERSION-python$PYTHON_MINOR_VERSION infrahelpers/dpp:jdk$JDK_VERSION-python$PYTHON_MICRO_VERSION
docker tag infrahelpers/dpp:jdk$JDK_VERSION-python$PYTHON_MINOR_VERSION infrahelpers/dpp:jdk$JDK_VERSION-python
```
+ Amazon Linux 2 for Elastic Map Reduce (EMR) 6 and DataBricks
+ Amazon Linux 2023 (AL2023) for Elastic Map Reduce (EMR) 7.x and DataBricks
with SBT and Scala, with more freedom on its version:
```bash
$ docker build -t infrahelpers/dpp:jdk$JDK_VERSION-sbt$SBT_VERSION --build-arg JDK_VERSION=$JDK_VERSION --build-arg SBT_VERSION=$SBT_VERSION corretto-emr-dbs-universal-spark-scala
51 changes: 33 additions & 18 deletions corretto-emr-dbs-universal-base/Dockerfile
@@ -3,8 +3,9 @@
# On Docker Hub: https://hub.docker.com/repository/docker/infrahelpers/dpp/general
# Convention for the tags of the generated images:
# * infrahelpers/dpp:jdk{JDK_VERSION} e.g.:
# * infrahelpers/dpp:jdk8
# * infrahelpers/dpp:jdk17
# * infrahelpers/dpp:jdk11
# * infrahelpers/dpp:jdk8
#
# Base image for Data Processing Pipelines (DPP), with images
# for specific Python versions
@@ -16,29 +17,33 @@
# A pristine Python installation is performed by the downstream images
# (see the pyspark-py3X/ directories), with specific versions.
# Note that:
# * DataBricks uses Python 3.8 internally by default
# * AWS EMR uses Python 3.7.16 by default
# * DataBricks uses Python 3.10 internally by default
# * AWS EMR uses Python 3.9 by default
#
# AWS Corretto / EMR
# ==================
# + https://docs.aws.amazon.com/corretto/latest/corretto-8-ug/what-is-corretto-8.html
# + https://docs.aws.amazon.com/corretto/latest/corretto-11-ug/docker-install.html
# + https://docs.aws.amazon.com/corretto/latest/corretto-17-ug/docker-install.html
# The underlying operating system (OS) is Amazon Linux 2, i.e., based on a
# RedHat Linux 7 with some Amazon specific additions.
# The Python version is 3.7.15 by default, if installed with the Linux
# + https://docs.aws.amazon.com/corretto/latest/corretto-11-ug/docker-install.html
# + https://docs.aws.amazon.com/corretto/latest/corretto-8-ug/what-is-corretto-8.html
# The underlying operating system (OS) is:
# * Amazon Linux 2023 (AL2023) for Corretto 17, based on a
# RedHat Linux 8 with some Amazon specific additions.
# * Amazon Linux 2 for Corretto 11 and Corretto 8, based on a
# RedHat Linux 7 with some Amazon specific additions.
# The Python version is 3.9 by default (on Corretto 17), if installed with the Linux
# distribution.
# Note that, up to at least version 6.9.0 of EMR, only Java 8 is supported.
# With Java 11+, it generates errors like
# Note that, for EMR versions lower than 6.x, only Java 8 was supported.
# With Java 11+, it generated errors like
# https://confluence.atlassian.com/confkb/unrecognized-jvm-gc-options-when-using-java-11-1002472841.html
# From EMR 7.x, Java 17 is now fully supported.
#
# DataBricks
# ==========
# + Base image Dockerfile: https://github.com/databricks/containers/tree/master/ubuntu/standard
# + Base image on Docker Hub: https://hub.docker.com/r/databricksruntime/standard
# - Usual Docker tag: latest
#
# The underlying operating system (OS) is Ubuntu 18.04 LTS (Bionic Beaver).
# The underlying operating system (OS) is Ubuntu 22.04 LTS (Jammy Jellyfish).
# The Python installation has to be a virtual environment in
# /databricks/python3, and Python is the main one (pristine, installed manually
# by that container image)
@@ -68,26 +73,36 @@ RUN yum -y install procps net-tools hostname iproute coreutils \
zlib-devel bzip2-devel gzip \
openssl11-libs openssl11-devel \
autoconf automake libtool m4 gcc gcc-c++ cmake cmake3 libffi-devel \
readline-devel sqlite-devel jq fuse fuse-libs && \
readline-devel sqlite-devel fuse fuse-libs && \
yum clean all

#
WORKDIR $HOME

# Install a newer jq version (as jq v1.5 seems to have some issues)
RUN JQ_VER=$(curl -Ls https://api.github.com/repos/jqlang/jq/releases/latest|grep 'tag_name' | cut -d'-' -f2,2 | cut -d'"' -f1,1) && \
curl -Ls \
https://github.com/jqlang/jq/releases/download/jq-${JQ_VER}/jq-linux-amd64 -o /usr/local/bin/jq && \
chmod +x /usr/local/bin/jq

# yq, the YAML CLI utility like jq, for YAML (https://github.com/mikefarah/yq)
RUN YQ_VER=$(curl -Ls https://api.github.com/repos/mikefarah/yq/releases/latest | grep 'tag_name' | cut -d'v' -f2 | cut -d'"' -f1) && \
curl -Ls \
https://github.com/mikefarah/yq/releases/download/v${YQ_VER}/yq_linux_amd64 -o /usr/local/bin/yq && \
chmod +x /usr/local/bin/yq

# Cloud helpers Shell scripts (https://github.com/cloud-helpers/k8s-job-wrappers)
RUN KJW_VER=$(curl -Ls https://api.github.com/repos/cloud-helpers/k8s-job-wrappers/tags|jq -r '.[].name'|grep "^v"|sort -r|head -1|cut -d'v' -f2,2) && \
echo "KJW_VER=${KJW_VER}" && \
curl -Ls \
https://github.com/cloud-helpers/k8s-job-wrappers/archive/refs/tags/v${KJW_VER}.tar.gz \
-o k8s-job-wrappers.tar.gz && \
https://github.com/cloud-helpers/k8s-job-wrappers/archive/refs/tags/v${KJW_VER}.tar.gz -o k8s-job-wrappers.tar.gz && \
tar zxf k8s-job-wrappers.tar.gz && rm -f k8s-job-wrappers.tar.gz && \
mv -f k8s-job-wrappers-${KJW_VER} /usr/local/ && \
ln -s /usr/local/k8s-job-wrappers-${KJW_VER} /usr/local/k8s-job-wrappers

# AWS: https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2-linux.html
RUN curl -Ls https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip \
-o awscliv2.zip && \
unzip -q awscliv2.zip && rm -f awscliv2.zip && ./aws/install && \
RUN curl -Ls https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip -o awscliv2.zip && \
unzip -q awscliv2.zip && rm -f awscliv2.zip && ./aws/install && \
rm -rf ./aws

# SAML-to-AWS (saml2aws)
@@ -97,7 +112,7 @@ RUN SAML2AWS_VER=$(curl -Ls https://api.github.com/repos/Versent/saml2aws/releas
https://github.com/Versent/saml2aws/releases/download/v${SAML2AWS_VER}/saml2aws_${SAML2AWS_VER}_linux_amd64.tar.gz -o saml2aws.tar.gz && \
tar zxf saml2aws.tar.gz && rm -f saml2aws.tar.gz README.md LICENSE.md && \
mv -f saml2aws /usr/local/bin/ && \
chmod 775 /usr/local/bin/saml2aws
chmod +x /usr/local/bin/saml2aws

# Copy configuration in the user home, for the root user
ADD bashrc $HOME/.bashrc

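Note on the version lookups in the Dockerfile above: the jq and yq install steps derive the release version by filtering the `tag_name` line of the GitHub releases API response with `grep` and `cut`. The sketch below replays that same pipeline offline, as an illustration only. The function names and the sample JSON lines are assumptions made for this note (they are not part of the commit), and the version numbers are placeholders rather than the versions resolved at build time.

```shell
#!/usr/bin/env bash
# Sketch only: the JSON lines below are hand-written stand-ins for the
# "tag_name" line of a GitHub releases API response. The version numbers
# are placeholders, not the versions resolved when the image is built.

# jq tags look like "jq-X.Y.Z": keep what follows the first '-',
# then drop everything from the closing double quote onwards.
extract_jq_version() {
  grep 'tag_name' | cut -d'-' -f2 | cut -d'"' -f1
}

# yq tags look like "vX.Y.Z": keep what follows the first 'v',
# then drop everything from the closing double quote onwards.
extract_yq_version() {
  grep 'tag_name' | cut -d'v' -f2 | cut -d'"' -f1
}

# Hypothetical sample input lines (assumed shape of the API output)
JQ_SAMPLE='  "tag_name": "jq-1.7.1",'
YQ_SAMPLE='  "tag_name": "v4.40.5",'

JQ_VER=$(printf '%s\n' "$JQ_SAMPLE" | extract_jq_version)
YQ_VER=$(printf '%s\n' "$YQ_SAMPLE" | extract_yq_version)

echo "JQ_VER=${JQ_VER}"
echo "YQ_VER=${YQ_VER}"
```

Because the filters split on a single character, a tag containing a second `-` (for jq) or a second `v` (for yq) would likely be cut short, so the approach depends on the upstream tag naming staying as it is.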