We call it Airflow Breeze as It's a Breeze to contribute to Airflow.
I have worked with Apache Airflow, the Breeze Docker environment and the pre-commit / prek hooks system.
https://github.com/apache/airflow/blob/main/dev/breeze/doc/README.rst
refs #2729
refs #2202
What is Breeze
Apache Airflow Breeze is a Python-based CLI development environment designed to streamline workflows for project contributors and maintainers. It eliminates the need to manually manage complex Docker topologies and conflicting Python dependencies by encapsulating them into a single platform.
By using the same Docker images and configuration matrices locally as the GitHub Actions CI pipelines, it keeps the local environment close to what CI runs. This consistency limits environment drift and reduces the CI compute wasted on failures that could have been caught locally before a pull request review.
Developers can instantly swap runtime environments, including various Python versions and backend databases like PostgreSQL or MySQL, using simple CLI flags. The environment isolates all external provider dependencies inside containers, keeping the host machine clean and automatically updating layers when files like pyproject.toml change.
Breeze also live-mounts the local host repository into the workspace, meaning any code edits made in an IDE reflect immediately inside the running container without requiring a rebuild. For testing frontend components, it features built-in background asset compilation utilities to dynamically handle UI modifications.
Code quality is strictly maintained through a unified static analysis framework powered by pre-commit for linting and security checks. Finally, it includes direct tooling to compile documentation locally and spin up Kubernetes test environments using Kind commands.
Blueprint for Porting Apache Airflow Breeze to Apache Sedona
Porting Apache Airflow Breeze (Airflow's containerized development environment) over to Apache Sedona creates a standardized, reproducible contributor environment.
Since Breeze is essentially a complex orchestration wrapper built on top of Docker, Docker Compose, and Python's click library, you aren't porting APIs; you are adapting its container architecture and CLI harness to handle Spark, Java/Scala dependencies, and geospatial libraries instead of Airflow backends.
1. Deconstruct the Architecture
Airflow Breeze works by building a heavyweight CI Docker image, mounting your local git repository into the container, and providing a unified CLI (./breeze) to handle testing, linting, and image building.
To port this concept to Sedona, you need to swap the dependencies:
- CLI Framework: Python click -> Python click (retaining the framework)
- Containers: Webserver, Scheduler, Celery, Postgres/MySQL -> Spark Master, Spark Worker, Jupyter Notebook/Lab
- Base Image: Debian/Ubuntu + Python + Airflow system deps -> Ubuntu + Java (17/21) + Spark + Python (uv/pip)
- Core Matrix: Python versions x Backend Databases -> Python versions x Spark versions x Scala versions
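The new core matrix can be enumerated in a few lines. The sketch below is only an illustration: the version lists are hypothetical placeholders, and the exclusion rule (Spark 4.x paired only with Scala 2.13) is an assumption that should be checked against the versions Sedona actually supports.

```python
from itertools import product

# Hypothetical version lists; real values would come from Sedona's CI configuration.
PYTHON_VERSIONS = ["3.10", "3.11"]
SPARK_VERSIONS = ["3.5", "4.0"]
SCALA_VERSIONS = ["2.12", "2.13"]


def is_supported(spark: str, scala: str) -> bool:
    """Assumed rule: Spark 4.x is only paired with Scala 2.13."""
    if spark.startswith("4.") and scala != "2.13":
        return False
    return True


def build_matrix() -> list[dict[str, str]]:
    """Return every supported Python x Spark x Scala combination."""
    return [
        {"python": py, "spark": spark, "scala": scala}
        for py, spark, scala in product(PYTHON_VERSIONS, SPARK_VERSIONS, SCALA_VERSIONS)
        if is_supported(spark, scala)
    ]


if __name__ == "__main__":
    for combo in build_matrix():
        print(combo)
```

Keeping the exclusion rule in one function means the CLI and the CI workflow can share the same definition of a valid combination.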
2. Step-by-Step Porting Guide
Step 1: Isolate the CLI Skeleton
Airflow's Breeze CLI code lives in dev/breeze/, so you don't need the entire Airflow repository. You can copy the structure of dev/breeze/src/airflow_breeze/ into Sedona's repository (e.g., dev/sedona_breeze/). The Dockerfile and Compose files that the CLI drives are replaced in the later steps rather than copied.
Keep the foundational click structure, but strip out Airflow-specific commands like start-airflow, setup-kvm, or k8s. Focus on defining these core commands instead:
- breeze shell: Drops the developer into an interactive shell inside a container pre-configured with Spark and Sedona.
- breeze test: Runs pytest or the JVM test suite (mvn test, since Sedona's JVM modules build with Maven) inside the containerized environment.
- breeze build-image: Builds the local developer Docker image.
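A minimal sketch of what the trimmed click skeleton for these three commands might look like. The group name, option names and defaults are hypothetical, and each command body is a stub that only echoes what it would do; nothing here is copied from Airflow's code.

```python
import click
from click.testing import CliRunner


@click.group()
def main() -> None:
    """Hypothetical root group for a Sedona port of Breeze."""


@main.command(name="shell")
@click.option("--python", "python_version", default="3.11", show_default=True)
@click.option("--spark", "spark_version", default="3.5", show_default=True)
def shell(python_version: str, spark_version: str) -> None:
    """Open an interactive shell in the dev container (stubbed)."""
    click.echo(f"would start shell: python={python_version} spark={spark_version}")


@main.command(name="test")
@click.argument("test_args", nargs=-1)
def run_tests(test_args: tuple[str, ...]) -> None:
    """Run the test suite in the dev container (stubbed)."""
    click.echo(f"would run tests: {' '.join(test_args) or '<all>'}")


@main.command(name="build-image")
def build_image() -> None:
    """Build the developer image (stubbed)."""
    click.echo("would build the CI image")


def run_cli(args: list[str]) -> str:
    """Invoke the CLI in-process and return what it printed."""
    return CliRunner().invoke(main, args).output


if __name__ == "__main__":
    main()
```

Calling run_cli(["shell", "--spark", "4.0"]) exercises the command without Docker, which is a convenient way to unit-test the CLI layer separately from the container logic.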
Step 2: Redesign the Dockerfile (Dockerfile.ci)
Airflow's Breeze Dockerfile is highly optimized for multi-stage caching. You will need to write a new Dockerfile.ci tailored for Sedona's unique dual-language setup (Java/Scala + Python).
Your base image layer needs to manage:
- Java Runtime: Install your required JDK (e.g., Temurin or Zulu OpenJDK 17).
- Spark Binaries: Download and unpack the targeted Apache Spark version matching your matrix.
- Geospatial Libraries: Install underlying native system dependencies if required (like libgeos-dev or proj-bin for certain Python bindings), though Sedona handles most geometry primitives natively via JTS.
- Python Tooling: Install uv or pip alongside the targeted Python version so the apache-sedona Python package can be installed in editable mode (pip install -e ".[spark]").
Step 3: Map the Volume Mounts and Environments
The magic of Breeze is that code changes on your host machine instantly update inside the container. In your ported Docker Compose configuration (docker-compose.yaml generated dynamically by your script), make sure to map:
- The Sedona root repository to /opt/sedona
- Local Maven/SBT caches (~/.m2 and ~/.sbt) to container paths to prevent downloading JARs (like the geotools-wrapper) on every restart.
- Local uv or pip cache directories.
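The mounts listed above could be assembled roughly as follows. This is a sketch under assumptions: only /opt/sedona comes from the text, while the other container-side paths (a root user's home) and the Linux default cache locations for uv and pip are illustrative choices, and the function names are hypothetical.

```python
from pathlib import Path


def volume_mounts(repo_root: Path, home: Path) -> list[str]:
    """Return docker `-v host:container` specs for the source tree and caches."""
    candidates = [
        (repo_root, "/opt/sedona"),                      # live-mounted source tree
        (home / ".m2", "/root/.m2"),                     # Maven cache (assumed target)
        (home / ".sbt", "/root/.sbt"),                   # sbt cache (assumed target)
        (home / ".cache" / "uv", "/root/.cache/uv"),     # uv cache (assumed Linux default)
        (home / ".cache" / "pip", "/root/.cache/pip"),   # pip cache (assumed Linux default)
    ]
    specs = []
    for host, container in candidates:
        # Skip caches that do not exist yet, so Docker does not create
        # root-owned directories on the host as a side effect.
        if host == repo_root or host.exists():
            specs.append(f"{host}:{container}")
    return specs


def docker_run_args(repo_root: Path, home: Path, image: str) -> list[str]:
    """Build an argv list for an interactive shell in the dev container."""
    args = ["docker", "run", "--rm", "-it"]
    for spec in volume_mounts(repo_root, home):
        args += ["-v", spec]
    return args + ["-w", "/opt/sedona", image, "bash"]
```

Returning an argv list rather than a shell string lets the caller pass it straight to subprocess.run without quoting concerns. The same pairs could instead be written into a generated Compose file.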
Step 4: Adapt the Matrix Generation
Airflow Breeze uses environment variables to dynamically construct container tags and combinations (e.g., Python 3.9 + Postgres 15).
For Sedona, rewrite the shell and build parameters to accept a different matrix:
# Example of what your ported CLI options should look like
./breeze shell --python 3.11 --spark 3.5 --scala 2.12
Your Python script will catch these parameters and feed them as --build-arg strings to Docker:
SPARK_VERSION=${SPARK_VERSION}
SCALA_VERSION=${SCALA_VERSION}
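As a rough illustration of that hand-off, the sketch below turns the three CLI parameters into a docker build argv. The tag scheme, the PYTHON_VERSION argument name and the function names are assumptions for the example; only SPARK_VERSION and SCALA_VERSION appear in the text above.

```python
def image_tag(python: str, spark: str, scala: str) -> str:
    """Hypothetical tag scheme that encodes the matrix cell."""
    return f"sedona-breeze:py{python}-spark{spark}-scala{scala}"


def docker_build_command(
    python: str, spark: str, scala: str, dockerfile: str = "Dockerfile.ci"
) -> list[str]:
    """Build the argv for `docker build`, one --build-arg per matrix axis."""
    build_args = {
        "PYTHON_VERSION": python,  # assumed ARG name
        "SPARK_VERSION": spark,
        "SCALA_VERSION": scala,
    }
    cmd = ["docker", "build", "-f", dockerfile, "-t", image_tag(python, spark, scala)]
    for name, value in build_args.items():
        cmd += ["--build-arg", f"{name}={value}"]
    return cmd + ["."]


if __name__ == "__main__":
    print(" ".join(docker_build_command("3.11", "3.5", "2.12")))
```

Encoding the matrix cell in the tag keeps images for different combinations from overwriting each other in the local Docker cache.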
3. Recommended Directory Blueprint
When restructuring the copied Breeze code inside Sedona, aim for this simplified file structure to keep your build system maintainable:
dev/sedona_breeze/
├── BREEZE.rst                          # Developer documentation
├── breeze                              # Main executable entrypoint script
├── pyproject.toml                      # Python dependencies for the CLI tool (Click, Rich)
├── setup.cfg
└── src/
    └── sedona_breeze/
        ├── __init__.py
        ├── main.py                     # Root Click group configuration
        ├── commands/
        │   ├── developer_commands.py   # 'shell', 'test', 'jupyter' logic
        │   └── ci_commands.py          # 'build-image', 'pull-image' logic
        └── utils/
            ├── docker_command_utils.py # Wrapper logic for spinning up docker/docker-compose
            └── path_utils.py           # Finds Sedona repository root paths
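For path_utils.py, one plausible approach is to walk up from the current directory until a marker is found. The function name and the choice of .git as the marker are assumptions for this sketch; a port might prefer a Sedona-specific marker file instead.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path, marker: str = ".git") -> Optional[Path]:
    """Return the nearest ancestor of `start` (inclusive) containing `marker`.

    Returns None when no ancestor contains the marker.
    """
    current = start.resolve()
    for candidate in (current, *current.parents):
        if (candidate / marker).exists():
            return candidate
    return None


if __name__ == "__main__":
    root = find_repo_root(Path.cwd())
    print(root if root is not None else "not inside a repository")
```

Resolving the root this way lets the CLI be invoked from any subdirectory while still mounting the whole repository into the container.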
4. Key Pitfalls to Avoid
- Over-complicating the Backend: Airflow Breeze spins up multiple companion services (Postgres, MySQL, Celery queues). Sedona does not need this. Your Docker Compose file should be lean: usually a single container running Spark in local mode, or a 3-node layout (1 Master, 2 Workers) if testing distributed spatial partitioning behavior.
- Ignoring the Fat JARs: Sedona relies on compiling Scala/Java code into shaded JARs (sedona-spark-shaded). Ensure your breeze test or breeze shell commands run an automated pre-hook that executes mvn clean package (or sbt assembly, if that is the build tool in use) before starting the Python environment. Point spark.jars at the freshly built JAR. spark.jars.packages takes Maven coordinates rather than file paths, so it only picks up a local build once that build has been installed into the local Maven repository (mvn install).
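The pre-hook for the shaded JAR could be sketched as a staleness check followed by a rebuild. The glob pattern for locating the JAR, the mtime comparison and the function names are assumptions for illustration. The Maven invocation is the one named above with -DskipTests added to keep the hook fast.

```python
import subprocess
from pathlib import Path
from typing import Optional


def find_shaded_jar(repo_root: Path) -> Optional[Path]:
    """Return the most recently built shaded JAR, or None (assumed layout)."""
    jars = sorted(
        repo_root.glob("**/target/sedona-spark-shaded*.jar"),
        key=lambda p: p.stat().st_mtime,
    )
    return jars[-1] if jars else None


def jar_is_stale(repo_root: Path) -> bool:
    """True when no JAR exists or any Java/Scala source is newer than it."""
    jar = find_shaded_jar(repo_root)
    if jar is None:
        return True
    sources = [p for ext in ("*.java", "*.scala") for p in repo_root.rglob(ext)]
    newest_source = max((p.stat().st_mtime for p in sources), default=0.0)
    return newest_source > jar.stat().st_mtime


def ensure_shaded_jar(repo_root: Path) -> Path:
    """Rebuild with Maven if needed and return the JAR path."""
    if jar_is_stale(repo_root):
        subprocess.run(
            ["mvn", "clean", "package", "-DskipTests"], cwd=repo_root, check=True
        )
    jar = find_shaded_jar(repo_root)
    if jar is None:
        raise FileNotFoundError("no shaded JAR found after the build")
    return jar
```

The returned path can then be passed to Spark as a local file, for example through the spark.jars setting, so the Python session loads the build that matches the checked-out sources.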