To build Data Cooker Dist, you need Java 11 and Apache Maven. The exact Maven version is enforced in the project file; see the enforcer plugin section there. For Java, Amazon's Corretto is the preferred distribution.
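As a quick pre-flight check, the following sketch extracts the major Java version from the text printed by `java -version`. The function name `java_major` is hypothetical and not part of the project. It assumes the usual `version "x.y.z"` format of that output.

```shell
# Hypothetical helper: print the major Java version found in `java -version` output.
# Assumes the output contains a fragment like: version "11.0.20"
java_major() {
  # Take the first quoted version string from the given text.
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;  # legacy scheme, e.g. 1.8.0 means 8
    *)   printf '%s\n' "$ver" | cut -d. -f1 ;;  # modern scheme, e.g. 11.0.20 means 11
  esac
}

# Usage (java prints its version to stderr, hence 2>&1):
#   [ "$(java_major "$(java -version 2>&1)")" = "11" ] || echo "Java 11 is required"
```

The Maven version needs no such check, since the enforcer plugin fails the build when it does not match.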
As a prerequisite, you need the artifact io.github.pastorgl.datacooker:config from Data Cooker ETL, of the same version, available in your local Maven repo. Refer to that project for build instructions.
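To confirm the prerequisite is in place before building, a minimal sketch like the one below can compute where Maven would store the artifact. The function name `config_artifact_dir` is hypothetical. It assumes the standard local repository layout (group ID dots become directories) and the default repository location `~/.m2/repository`, which may differ if your Maven settings override it.

```shell
# Hypothetical helper: print the local-repo directory for
# io.github.pastorgl.datacooker:config at the given version.
#   $1 = artifact version (must match the Data Cooker Dist version)
#   $2 = local repository root (assumed default: ~/.m2/repository)
config_artifact_dir() {
  repo="${2:-$HOME/.m2/repository}"
  printf '%s\n' "$repo/io/github/pastorgl/datacooker/config/$1"
}

# Usage, with VERSION set to the version you are building:
#   [ -d "$(config_artifact_dir "$VERSION")" ] || echo "config artifact not installed"
```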
There are two profiles: one targeting the AWS EMR production environment (EMR, selected by default) and one for local testing of ETL processes (local), so you have to call
mvn clean package
or
mvn -Plocal clean package
to build a shaded executable 'Fat JAR' artifact, datacooker-dist.jar.
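If the build is scripted, the choice between the two profiles can be wrapped in a small function. This is only a sketch: `build_cmd` is a hypothetical name, and it does nothing more than map a profile name to one of the two commands given above.

```shell
# Hypothetical helper: print the Maven command for a given profile.
# With no argument it falls back to EMR, the profile selected by default.
build_cmd() {
  case "${1:-EMR}" in
    EMR)   echo "mvn clean package" ;;
    local) echo "mvn -Plocal clean package" ;;
    *)     echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

# Usage (run from the project root):
#   $(build_cmd local)
```

Rejecting unknown names keeps a typo in the profile from silently producing an EMR build.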
The currently supported EMR version is 6.9. For local testing, Ubuntu 22.04 is recommended (either native or inside WSL).
In addition to the executable artifact, modular documentation is automatically built from the modules' metadata into the docs directory, in both HTML (single-file and linked-file) and PDF formats.