add WasbDagBundle to load Dags from Azure Blob Storage #67016

Open
Nishieee wants to merge 7 commits into apache:main from Nishieee:feature/azure-blob-dag-bundle

@Nishieee
Contributor

Summary

Adds a Dag bundle for Azure Blob Storage so Dags can be loaded from a container (with optional prefix), similar to S3DagBundle and GCSDagBundle. Introduces WasbDagBundle, extends WasbHook with container checks and sync_to_local_dir, registers the bundle in provider metadata, documents it in dag-bundles.rst, and adds unit tests.

Manual verification: Tested with Breeze against a real Azure storage account: wasb connection, dag_processor.dag_bundle_config_list pointing at WasbDagBundle, Dag parsed from blob and visible in the UI. PR includes screenshots (Azure container + Airflow Dags list).
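
For readers reproducing the manual verification, a bundle entry for `dag_processor.dag_bundle_config_list` might look roughly like the sketch below. The classpath is inferred from the module path touched in this PR (`.../microsoft/azure/bundles/wasb.py`), and the `kwargs` names (`wasb_conn_id`, `container_name`, `prefix`) are assumptions modelled on the S3 and GCS bundles, not confirmed against the final code. The bundle name and container name are placeholders.

```python
import json
import os

# Hypothetical bundle definition. The classpath is inferred from the file path
# in this PR; the kwargs names below are assumptions, not the confirmed API.
bundle_config = [
    {
        "name": "azure-dags",  # placeholder bundle name
        "classpath": "airflow.providers.microsoft.azure.bundles.wasb.WasbDagBundle",
        "kwargs": {
            "wasb_conn_id": "wasb_default",  # assumed parameter name
            "container_name": "my-dags",     # assumed parameter name, placeholder value
            "prefix": "dags/",               # optional prefix mentioned in the summary
        },
    }
]

# Airflow reads AIRFLOW__{SECTION}__{KEY} environment variables as config options,
# so this sets [dag_processor] dag_bundle_config_list to the JSON list above.
os.environ["AIRFLOW__DAG_PROCESSOR__DAG_BUNDLE_CONFIG_LIST"] = json.dumps(bundle_config)
```

The same JSON list could instead go under `[dag_processor]` in `airflow.cfg`; the environment variable form is just convenient inside Breeze.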

closes: #66987


Was generative AI tooling used to co-author this PR?
  • Yes — Cursor (agent-assisted editing)
[Screenshots: three images captured 2026-05-15 at 7:12:30 PM, 6:52:48 PM, and 6:52:39 PM]

Generated-by: Cursor following the guidelines

Comment thread airflow-core/docs/administration-and-deployment/dag-bundles.rst
@jroachgolf84
Collaborator

I added a note in the PR about enhancing the docs. There are a few things I'd want to know if I were a user setting this up from scratch:

  • What authentication method do I use between Airflow and Microsoft?
  • What permissions does my managed identity need?
  • Is there any custom storage bucket/container configuration I need to keep in mind?
  • What about networking?
  • Can I use the same type of Connection that I'd use in a DAG (I know the answer is "yes", but it might be worth calling out)?

@jroachgolf84
Collaborator

These changes offer a nice starting point: https://github.com/apache/airflow/pull/66993/changes

@Nishieee
Contributor Author

> These changes offer a nice starting point: https://github.com/apache/airflow/pull/66993/changes

Thanks for the review. I'll add provider-level bundle docs for WASB following the pattern in #66993 (providers/microsoft/azure/docs/bundles/index.rst, Guides entry, cross-link to core Dag bundles), and cover auth, managed-identity permissions, container/prefix setup, networking, and reusing the same wasb Connection as in Dags. I'll push an update shortly.

@yuseok89 left a comment
Contributor

Left a couple of small comments inline.
Please take a look when you have a moment.

Comment thread providers/microsoft/azure/src/airflow/providers/microsoft/azure/bundles/wasb.py Outdated
Comment thread providers/microsoft/azure/src/airflow/providers/microsoft/azure/hooks/wasb.py Outdated
@potiuk
Member

potiuk commented May 18, 2026

@Nishieee — There are 4 unresolved review thread(s) on this PR from @dominikhei, @jroachgolf84, @yuseok89. Could you either push a fix or reply in each thread explaining why the feedback doesn't apply? Once you believe the feedback is addressed, mark the thread as resolved so the reviewer isn't re-pinged needlessly. Thanks!


Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.

@Nishieee force-pushed the feature/azure-blob-dag-bundle branch from 0144712 to a16419b on May 19, 2026 17:28
delete_stale=True,
)

def view_url(self, version: str | None = None) -> str | None:
Collaborator

Can you include a screenshot of this in your PR?

Collaborator

Or a video of this working.

"""
container = self._get_container_client(container_name)
self.check_for_variable_type("container", container, ContainerClient)
return cast("ContainerClient", container).exists()
Collaborator

I don't love this pattern, but it seems it's used in quite a few other places :)

# TODO: rework the interface as it might also return Awaitable
return blob_client.download_blob(offset=offset, length=length, **kwargs) # type: ignore[return-value]

def _sync_to_local_dir_delete_stale_local_files(
Collaborator

Seems like there are missing tests for this logic in .../hooks/wasb.py...
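
As a rough illustration of the behaviour such tests would need to pin down, here is a minimal, self-contained sketch of stale-file cleanup. It is not the PR's implementation: `delete_stale_local_files` and `demo` are hypothetical names, and the blob listing is replaced by a plain set of relative keys so the example runs without Azure.

```python
import tempfile
from pathlib import Path


def delete_stale_local_files(local_dir: Path, remote_keys: set[str]) -> list[str]:
    """Delete local files whose relative path is not in remote_keys.

    Hypothetical helper for illustration only; returns the deleted relative paths.
    """
    deleted = []
    # Snapshot the tree, then visit the deepest paths first so that directories
    # emptied by the deletions can be removed in the same pass.
    paths = sorted(local_dir.rglob("*"), key=lambda p: len(p.parts), reverse=True)
    for path in paths:
        rel = path.relative_to(local_dir).as_posix()
        if path.is_file() and rel not in remote_keys:
            path.unlink()
            deleted.append(rel)
        elif path.is_dir() and not any(path.iterdir()):
            path.rmdir()
    return sorted(deleted)


def demo() -> tuple[list[str], list[str]]:
    """Build a small local tree, sync against a fake remote listing, report results."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "old").mkdir()
        (root / "keep.py").write_text("# still present in the container\n")
        (root / "old" / "gone.py").write_text("# no longer in the container\n")
        deleted = delete_stale_local_files(root, remote_keys={"keep.py"})
        remaining = sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
        return deleted, remaining
```

A unit test along these lines would assert both that stale files are removed and that files still present remotely are left untouched, including the case where a whole subdirectory becomes empty.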

Development

Successfully merging this pull request may close these issues.

Add AzureBlobStorageDagBundle

5 participants