add WasbDagBundle to load Dags from Azure Blob Storage #67016
I added a note in the PR about enhancing the docs. There are a couple of things I'd want to know if I were a user setting this up from scratch.
These changes offer a nice starting point: https://github.com/apache/airflow/pull/66993/changes
Thanks for the review. I'll add provider-level bundle docs for WASB following the pattern in #66993 (providers/microsoft/azure/docs/bundles/index.rst, Guides entry, cross-link to core Dag bundles), and cover auth, managed-identity permissions, container/prefix setup, networking, and reusing the same wasb Connection as in Dags. I'll push an update shortly.
yuseok89 left a comment
Left a couple of small comments inline.
Please take a look when you have a moment.
@Nishieee: There are 4 unresolved review threads on this PR from @dominikhei, @jroachgolf84, @yuseok89. Could you either push a fix or reply in each thread explaining why the feedback doesn't apply? Once you believe the feedback is addressed, mark the thread as resolved so the reviewer isn't re-pinged needlessly. Thanks! Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer (a real person) will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.
…e/bundles/wasb.py Co-authored-by: Yuseok Jo <yuseok89@gmail.com>
Force-pushed from 0144712 to a16419b
            delete_stale=True,
        )

    def view_url(self, version: str | None = None) -> str | None:
Can you include a screenshot of this in your PR?
Or a video of this working.
        """
        container = self._get_container_client(container_name)
        self.check_for_variable_type("container", container, ContainerClient)
        return cast("ContainerClient", container).exists()
I don't love this pattern, but it seems to be used in quite a few other places :)
        # TODO: rework the interface as it might also return Awaitable
        return blob_client.download_blob(offset=offset, length=length, **kwargs)  # type: ignore[return-value]

    def _sync_to_local_dir_delete_stale_local_files(
Seems like there are missing tests for this logic in .../hooks/wasb.py...
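For illustration, a test for this kind of logic could look roughly like the sketch below. It does not use the PR's code: `delete_stale_local_files` is a hypothetical standalone stand-in for the hook's private helper, and it assumes that "stale" means "present locally but no longer in the container listing" and that directories left empty are pruned as well.

```python
import tempfile
from pathlib import Path


def delete_stale_local_files(local_dir: Path, remote_keys: set[str]) -> list[str]:
    """Hypothetical stand-in for the hook helper (not the PR's implementation).

    Deletes local files whose path relative to ``local_dir`` is not in
    ``remote_keys`` and returns the relative paths that were removed.
    """
    removed = []
    # sorted() materialises the listing first, so unlinking while looping is safe.
    for path in sorted(local_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(local_dir).as_posix()
            if rel not in remote_keys:
                path.unlink()
                removed.append(rel)
    # Prune directories left empty, deepest first (an assumption of this sketch).
    for path in sorted(local_dir.rglob("*"), reverse=True):
        if path.is_dir() and not any(path.iterdir()):
            path.rmdir()
    return removed


def test_delete_stale_local_files():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "keep.py").write_text("# still in the container")
        (root / "old").mkdir()
        (root / "old" / "stale.py").write_text("# removed from the container")

        removed = delete_stale_local_files(root, {"keep.py"})

        assert removed == ["old/stale.py"]
        assert (root / "keep.py").exists()
        assert not (root / "old").exists()
```

A test against the real hook would presumably mock the container listing instead of passing a plain set, but the assertions would stay much the same: stale files removed, current files untouched.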
Summary
Adds a Dag bundle for Azure Blob Storage so Dags can be loaded from a container (with optional prefix), similar to S3DagBundle and GCSDagBundle. Introduces WasbDagBundle, extends WasbHook with container checks and sync_to_local_dir, registers the bundle in provider metadata, documents it in dag-bundles.rst, and adds unit tests.

Manual verification: Tested with Breeze against a real Azure storage account: wasb connection, dag_processor.dag_bundle_config_list pointing at WasbDagBundle, Dag parsed from blob and visible in the UI. PR includes screenshots (Azure container + Airflow Dags list).

closes: #66987
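As a rough sketch of the manual-verification setup, the bundle entry for dag_processor.dag_bundle_config_list could be built as below. The name/classpath/kwargs shape and the AIRFLOW__SECTION__KEY environment-variable convention are standard Airflow; the classpath, the kwarg names (wasb_conn_id, container_name, prefix), and all values are assumptions made for illustration and are not confirmed by this page.

```python
import json

# ASSUMPTION: the classpath and kwarg names are illustrative guesses, not taken
# from this PR; check the provider's bundle documentation for the real ones.
bundle_config = [
    {
        "name": "azure-dags",
        "classpath": "airflow.providers.microsoft.azure.bundles.wasb.WasbDagBundle",
        "kwargs": {
            "wasb_conn_id": "wasb_default",  # hypothetical connection id
            "container_name": "my-dags",     # hypothetical container
            "prefix": "dags/",               # optional prefix inside the container
        },
    }
]

# The option lives in [dag_processor], so the matching environment variable is
# AIRFLOW__DAG_PROCESSOR__DAG_BUNDLE_CONFIG_LIST, holding the JSON as a string.
env_value = json.dumps(bundle_config)
print(f"AIRFLOW__DAG_PROCESSOR__DAG_BUNDLE_CONFIG_LIST='{env_value}'")
```

Building the value with json.dumps rather than writing it by hand avoids quoting mistakes when the list is exported into a shell or a Breeze session.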
Was generative AI tooling used to co-author this PR?
Generated-by: Cursor following the guidelines