AP-660: Adds static_files plugin to serve DAG run files #62
Pull request overview
Introduces an Airflow FastAPI plugin for serving per-run “static” artifacts from a shared storage mount, and migrates the batch-image summary workflow from the legacy public/ directory approach to the new plugin-backed storage/ layout.
Changes:
- Added `static_files` Airflow plugin (config/helpers/routes + HTML template) with unit and e2e coverage.
- Replaced `./public` → `./storage` host mount and removed `public_dir()`/`public_path_to_url()` utilities (and their unit tests).
- Updated `summarise_job` DAG to write outputs under the plugin’s per-run directory and generate URLs via `static_path_to_url`.
Reviewed changes
Copilot reviewed 13 out of 15 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `test/unit/test_storage.py` | Removes tests for deleted legacy public storage helpers. |
| `test/unit/test_static_files.py` | Adds unit tests for path safety, URL building, and route behavior. |
| `test/e2e/test_static_files.py` | Adds thin e2e checks for plugin discovery and auth gating in Airflow. |
| `storage/.keep` | Ensures storage/ exists in-repo for bind-mounting and tests. |
| `pyproject.toml` | Adds explicit fastapi/httpx test deps for plugin unit tests. |
| `plugins/static_files/templates/listing.html` | Adds directory-listing template. |
| `plugins/static_files/routes.py` | Implements file/directory serving routes with embed-mode MIME handling. |
| `plugins/static_files/helpers.py` | Adds safe path resolution, per-run dir creation, and path→URL mapping helpers. |
| `plugins/static_files/config.py` | Introduces plugin configuration via env vars + Airflow config fallback. |
| `plugins/static_files/__init__.py` | Registers the plugin’s FastAPI app, external view, and macros. |
| `mokelumne/util/storage.py` | Removes legacy public storage directory helpers. |
| `mokelumne/dags/summarise_job.py` | Migrates summary output to static-files storage + URL generation. |
| `mokelumne/dags/notify_user.py` | Removes unused legacy URL helper import after migration. |
| `docker-compose.yml` | Switches bind mount from ./public to ./storage. |
| `.gitignore` | Ignores storage/* while keeping storage/.keep; adds uv.lock. |
     <dl>
    -<dt><a href="{public_path_to_url(proc_path)}">{proc_count} images processed</a></dt>
    +<dt><a href="{static_path_to_url(proc_path, embed=1)}">{proc_count} images processed</a></dt>
     <dd>{proc_count - proc_failures} succeeded, {proc_failures} failed</dd>
    -<dt><a href="{public_path_to_url(fetched_path)}">{fetch_count} images fetched</a></dt>
    +<dt><a href="{static_path_to_url(fetched_path, embed=1)}">{fetch_count} images fetched</a></dt>
     <dd>{fetch_success} succeeded, {fetch_failures} failed</dd>
    -<dt><a href="{public_path_to_url(skipped_path)}">{skip_count} records skipped</a></dt>
    +<dt><a href="{static_path_to_url(skipped_path, embed=1)}">{skip_count} records skipped</a></dt>
     <dd>Records did not match filter criteria</dd>

agreed with copilot here. would these work as a download in the embedded context if the links had `target="_blank"` as part of their attributes?
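For context on the `embed=1` argument in the links above: per the PR summary, the `embed` query parameter makes the file server return `Content-Type: text/plain` for files the browser would otherwise force to download. A minimal sketch of that decision follows, using only the standard library. The function name `media_type_for` and the list of "renderable" types are assumptions for illustration; the plugin's actual rules live in `plugins/static_files/routes.py` and may differ.

```python
import mimetypes

# Assumption: media types a browser usually renders inline. The plugin's
# actual rules are not shown in this PR excerpt and may differ.
INLINE_PREFIXES = ("text/", "image/")
INLINE_TYPES = frozenset({"application/pdf"})


def media_type_for(filename: str, embed: bool) -> str:
    """Pick a Content-Type, downgrading to text/plain in embed mode."""
    guessed, _encoding = mimetypes.guess_type(filename)
    media_type = guessed or "application/octet-stream"
    if not embed:
        # Normal mode: report the guessed type and let the browser decide.
        return media_type
    if media_type.startswith(INLINE_PREFIXES) or media_type in INLINE_TYPES:
        # Already displayable inside an embedded view; leave it alone.
        return media_type
    # Anything else would normally trigger a download, so serve it as text.
    return "text/plain"
```

In a FastAPI route, the returned string would typically be passed as the `media_type` of the file response.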
        raise HTTPException(status_code=403, detail="Path resolves outside the storage root")

    if path.is_dir():
        index = path / "index.html"
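The 403 above guards against requests that escape the storage root. A minimal sketch of one common way to perform that check is shown here; it is not the plugin's actual helper. The names `resolve_under_root` and `PathOutsideRootError` are hypothetical, and in the route the exception would presumably be translated into the `HTTPException(status_code=403, ...)` shown in the diff.

```python
from pathlib import Path


class PathOutsideRootError(ValueError):
    """Raised when a requested path escapes the storage root (hypothetical name)."""


def resolve_under_root(root: Path, relative: str) -> Path:
    """Resolve `relative` beneath `root`, rejecting anything that escapes it."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check,
    # so the comparison is made on the final location rather than the raw input.
    candidate = (root / relative).resolve()
    if not candidate.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise PathOutsideRootError("Path resolves outside the storage root")
    return candidate
```

Note that joining an absolute `relative` value (for example `/etc/passwd`) replaces the root entirely, which is why the check runs on the resolved result rather than on the input string.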
    from airflow.plugins_manager import AirflowPlugin

    from static_files.config import (
        STATIC_FILES_DAG_RUN_TAB_LABEL,
        STATIC_FILES_EMBED_PARAM,
        STATIC_FILES_PLUGIN_NAME,
        STATIC_FILES_ROOT,
        STATIC_FILES_ROUTE,
        STATIC_FILES_URL_PREFIX,
    )
    from static_files.helpers import static_file_path, static_files_root, static_path_to_url
    from static_files.routes import files_router

This isn't an issue and the tests all pass, ignore this.

this doesn't have to happen now, but i'm curious if we could wrap the plugin into part of the mokelumne package... 🤔
    build: !reset
    develop: !reset
    image: ${DOCKER_APP_IMAGE}
    volumes: !reset

Is there a way we can do &airflow-common-volume-overrides or such so that we don't duplicate this in every container? Just thinking for future maintainability, especially if we either change this volume, or add more in the future.
I attempted a new override but that doesn't work: it doesn't propagate the !reset, so build is retained.

So you can't do !reset + &common-overrides? A shame.

@awilfox It seems like the best you can do is what I just refactored it to do:
- `image` and `volumes` can be set in the override anchor.
- Resetting `build` and `develop` still needs to be done per-service.

Compose seems to push you towards "add-only". E.g. compose.yml would contain ONLY elements common to all environments, and we'd layer on compose.dev.yml to add build and develop and compose.ci.yml to add CI-specific volumes. Bigger change, though, not for this PR IMO (or maybe any).
    condition: service_healthy
    entrypoint: /bin/bash
    restart: on-failure:3
    # Runs at root to chown the storage directory. Necessary in GitHub Actions

Suggested change:

    -# Runs at root to chown the storage directory. Necessary in GitHub Actions
    +# Runs as root to chown the storage directory. Necessary in GitHub Actions
jason-raitz left a comment
r+ assuming @awilfox's comments are handled. I don't see anything else glaring at me.
It will be great to have this implemented. Thank you for plumbing the depths of the Airflow plugin docs.
anarchivist left a comment
r+; all my comments are clarificatory and non-blocking.
    STATIC_FILES_ROUTE = os.getenv('STATIC_FILES_ROUTE', 'files')
    """Name of the route added to the DAG Run view linked to the file server."""

    STATIC_FILES_URL_PREFIX = os.getenv('STATIC_FILES_URL_PREFIX', '/public')

nit: do we still want this fallback to be /public or should it be /storage?

Or even ./files? I left it as public so older files would work, but come to think of it they wouldn't anyway because of the directory restructuring, so I agree that a change of some kind is in order.

    # URL prefix the plugin is registered under in Airflow (see
    # STATIC_FILES_URL_PREFIX in plugins/static_files/config.py).
    PLUGIN_URL_PREFIX = "/public"
awilfox left a comment
r+, though I'm pretty sure this will break the DAG until it is fixed. I could write the small patch and push it to this branch if desired/needed.
- Adds a static_files plugin that serves files from ./storage (configurable).
- Adds a "Files" tab to the DagRun page that shows files created for that run. Refactors existing URLs/storage dirs to use that new location.
- Extracts Jinja2 templating for the main app into mokelumne.templates. Templates need to be rendered slightly differently for emails vs. on-disk indexes, and this cleans up that code.
Summary
- `embed` query parameter that tells the file server to return `Content-Type: text/plain` for files that the browser would otherwise force to download (even with `Content-Disposition: inline`).

Discussion: Moves file dir/URL helpers into the plugin
The methods used to locate served files on disk and by URL were previously part of the mokelumne.utils.helpers package, resulting in some duplication of configuration—you would have to set both MOKELUMNE_PUBLIC_* and the new STATIC_FILES_* variables correctly. I chose to consolidate that into the plugin so there'd be a single source of truth.
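To illustrate why a single root and a single URL prefix are enough to derive links, here is a minimal sketch of a path-to-URL mapping. It is not the plugin's actual `static_path_to_url`: the function name `path_to_url`, its explicit `root` and `url_prefix` parameters, the `EMBED_PARAM` constant, and the example paths are all assumptions made for illustration.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import quote, urlencode

EMBED_PARAM = "embed"  # assumption: name of the embed query parameter


def path_to_url(path: str, root: str, url_prefix: str, embed: Optional[int] = None) -> str:
    """Map a file path under `root` to a URL under `url_prefix`."""
    # relative_to() is a purely lexical check: it raises ValueError when
    # `path` does not start with `root`, but it does not resolve "..".
    relative = PurePosixPath(path).relative_to(PurePosixPath(root))
    url = f"{url_prefix.rstrip('/')}/{quote(str(relative))}"
    if embed is not None:
        url += "?" + urlencode({EMBED_PARAM: embed})
    return url
```

For example, `path_to_url("/srv/storage/my_dag/run_1/index.html", "/srv/storage", "/files", embed=1)` returns `/files/my_dag/run_1/index.html?embed=1`. Because both values come from one configuration, writers (the DAG) and readers (the file server) cannot drift apart the way two separately configured sets of variables could.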