Openlineage: read API key auth from Airflow connection #66342

Draft
VladaZakharova wants to merge 1 commit into apache:main from VladaZakharova:op-transport-fix

@VladaZakharova
Contributor

Adds OpenLineage HTTP API key authentication from Airflow connection.

Users can now configure the connection like this:
{"type":"http","url":"https://openlineage.example.com","auth":{"type":"airflow_connection_api_key","conn_id":"openlineage_default"}}

The token is now read from the connection password, or from connection extra keys such as apiKey, api_key, token, or access_token. This token is then used to construct a new, valid connection under the hood.

This option helps create connections without exposing tokens in Airflow connection configs.
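
A minimal sketch of the token lookup described above, not the code from this PR. The function name `resolve_token` and the exact order in which the extra keys are tried are assumptions made for illustration; only the key names come from the description.

```python
from __future__ import annotations

from typing import Any, Mapping

# Extra keys named in the PR description; the order they are tried in is an assumption.
TOKEN_EXTRA_KEYS = ("apiKey", "api_key", "token", "access_token")


def resolve_token(password: str | None, extra: Mapping[str, Any]) -> str | None:
    """Return the connection password if set, else the first non-empty extra key."""
    if password:
        return password
    for key in TOKEN_EXTRA_KEYS:
        value = extra.get(key)
        if value:
            return str(value)
    # Nothing usable was found in the connection.
    return None
```

With this shape, a connection that has both a password and a `token` extra would resolve to the password.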


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

@kacpermuda
Collaborator

I see the value in using Airflow connections for OpenLineage configuration, but I'd suggest expanding the scope beyond just auth tokens for HTTP transport, which solves only a very specific use case.

Instead of a narrow solution, what if we allow storing the entire OpenLineage config dict (e.g. whatever can be put in the YAML file, or even just the transport config) in an Airflow connection, and add it to get_openlineage_config as one more config source to check? (It should probably take precedence over the YAML file or env vars, but that's something we can decide later.)

This approach would:

  • Work for any transport type, not just HTTP
  • Support composite transports (e.g., two HTTP transports with different auth)
  • Handle any OL config field, not just auth tokens
  • Be more flexible for future use cases

The current auth-token-only solution would miss users with composite transports or other config needs. Could you expand the scope to cover the full OL config dict?
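
To make the idea concrete, here is a rough sketch of an ordered config-source lookup. Everything in it is hypothetical: the helper `first_available_config`, the source names, and the precedence order (connection first) are assumptions for discussion, not the current behavior of get_openlineage_config.

```python
from __future__ import annotations

from typing import Any, Callable, Dict, List, Optional, Tuple

# A loader returns a config dict, or None when that source is not configured.
ConfigLoader = Callable[[], Optional[Dict[str, Any]]]


def first_available_config(
    loaders: List[Tuple[str, ConfigLoader]],
) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Return (source_name, config) from the first source that yields a config."""
    for name, load in loaders:
        config = load()
        if config is not None:
            return name, config
    return None


# Hypothetical precedence: Airflow connection, then YAML file, then env vars.
example_loaders: List[Tuple[str, ConfigLoader]] = [
    ("airflow_connection", lambda: None),  # connection not defined
    ("yaml_file", lambda: {"transport": {"type": "console"}}),
    ("env_vars", lambda: None),
]
```

Users who never define the connection simply fall through to the sources they already use.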

cc @mobuchowski

@VladaZakharova
Contributor Author

hi there!
okay, this sounds reasonable, i think it is worth trying
should i wait for @mobuchowski response? or i can go with implementation?

@kacpermuda
Collaborator

should i wait for @mobuchowski response? or i can go with implementation?

Go ahead, I'll be happy to review the PR, then we can ask Maciej for merge as he has the power to do it.

@mobuchowski
Contributor

@VladaZakharova Sorry I was going to reply but totally lost track somewhere. Yeah, IMO it would be great if we could cover the whole transport config.

An alternative would be to cover the subset - for example HTTP transport - in a way that would not conflict later, so use the same JSON structure.

@potiuk added the "ready for maintainer review" label May 11, 2026
@VladaZakharova
Contributor Author

hi there!
I tried to address your ideas, please check them when possible :)

Comment thread providers/openlineage/src/airflow/providers/openlineage/plugins/adapter.py Outdated
Comment thread providers/openlineage/src/airflow/providers/openlineage/conf.py
Comment thread providers/openlineage/src/airflow/providers/openlineage/plugins/adapter.py Outdated
Collaborator

@kacpermuda left a comment

Added a few comments. The most important decision, I think, is: can we add an "openlineage" connection type, and should we?

yaml_config = self._read_yaml_config(openlineage_config_path)
if yaml_config is None:
    return None
self._resolve_airflow_connection_auth(yaml_config)
Collaborator

Should every resolve be guarded and actually fire only if there is an airflow connection defined? I think it fires every time now. We need to make sure that for users not setting up this conn, nothing changes.

Contributor Author

I think we’re okay here. The resolver is called for each config source, but it only does anything when it finds an auth block with:

{"type": "airflow_connection_api_key"}

For normal OpenLineage config, like regular api_key auth, it does not read any Airflow connection and should behave the same as before. I added a test for that case to make sure we don’t accidentally change it later.
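
Roughly, the guard works like the sketch below. This is a simplified illustration rather than the code in the PR: the function name `resolve_connection_auth`, the injected `get_token` callable, and the shape of the replacement auth block (`{"type": "api_key", "apiKey": ...}`) are assumptions, and composite transports are not handled here.

```python
from __future__ import annotations

from typing import Any, Callable

CONNECTION_AUTH_TYPE = "airflow_connection_api_key"


def resolve_connection_auth(
    config: dict[str, Any], get_token: Callable[[str], str]
) -> dict[str, Any]:
    """Swap connection-backed auth for a plain api_key block; leave other configs alone."""
    transport = config.get("transport")
    if not isinstance(transport, dict):
        return config
    auth = transport.get("auth")
    if isinstance(auth, dict) and auth.get("type") == CONNECTION_AUTH_TYPE:
        # Only this branch calls get_token, so only this branch reads a connection.
        transport["auth"] = {"type": "api_key", "apiKey": get_token(auth["conn_id"])}
    return config
```

A test for the unchanged path can pass a `get_token` that records its calls and assert that it was never invoked for a regular api_key config.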

Comment thread providers/openlineage/src/airflow/providers/openlineage/plugins/adapter.py Outdated
Comment thread providers/openlineage/src/airflow/providers/openlineage/token_provider.py Outdated
Comment thread providers/openlineage/src/airflow/providers/openlineage/token_provider.py Outdated

return None

def _validate_config(self, config: Any) -> dict[str, Any]:
Collaborator

Hmm, wondering if there is any way we can use OL client initialization as validation here to avoid duplicate check logic. If not it's fine, we'll extend this validation in the future.

Contributor Author

I thought about using OpenLineageClient(config=...) for this, but I think it would be a bit too heavy for validation here. It would create the client/transport once just to check the config, and then we would create it again later in the adapter.

So for now I kept this check very small: the Airflow connection extra must be a JSON object with a transport object. The OpenLineage client still does the real transport/auth validation when it is created. If the OpenLineage client gets a dedicated validation method later, we can switch to that.
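
As a standalone sketch of that small check (not the exact method from the PR): the function below accepts either a JSON string or an already-parsed value. Raising `ValueError` and the wording of the messages are assumptions made for illustration.

```python
from __future__ import annotations

import json
from typing import Any


def validate_config(config: Any) -> dict[str, Any]:
    """Require a JSON object that contains a 'transport' object; nothing more."""
    if isinstance(config, str):
        try:
            config = json.loads(config)
        except json.JSONDecodeError as err:
            raise ValueError("OpenLineage connection extra is not valid JSON") from err
    if not isinstance(config, dict):
        raise ValueError("OpenLineage connection extra must be a JSON object")
    if not isinstance(config.get("transport"), dict):
        raise ValueError("OpenLineage connection extra must contain a 'transport' object")
    # Transport and auth details are left for the OpenLineage client to validate.
    return config
```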

Collaborator

Makes sense!

@kacpermuda
Collaborator

Hey @VladaZakharova, I see some comments being resolved or marked as addressed, but I see no recent commits. Is all the relevant code pushed as intended?

@VladaZakharova
Contributor Author

not really, i forgot to push my changes :D

Collaborator

@kacpermuda left a comment

Thanks, this looks good now! Left two more nit comments about docstring improvements, plus one test we should add.

Comment thread providers/openlineage/src/airflow/providers/openlineage/token_provider.py Outdated

Comment thread providers/openlineage/tests/unit/openlineage/test_token_provider.py
@potiuk removed the "ready for maintainer review" label May 18, 2026
@potiuk marked this pull request as draft May 18, 2026 10:48
@potiuk
Member

potiuk commented May 18, 2026

@VladaZakharova — Removing the ready for maintainer review label and converting back to draft. CI has failed since the label was added (4 failing checks).

The label's contract is that the PR is ready for maintainer review — a regression like this means the PR temporarily isn't. Check the failing checks, fix the regression, push, then mark "Ready for review" again to re-enter the queue.

See the Pull Request quality criteria.

No rush.


Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.

@kacpermuda
Copy link
Copy Markdown
Collaborator

I think this is ready to go out of draft: tests are passing and the code looks good. One final review and merge from Maciej is needed.
