Openlineage: read API key auth from Airflow connection#66342
Openlineage: read API key auth from Airflow connection#66342VladaZakharova wants to merge 1 commit into
Conversation
87e982c to
ba31bf3
Compare
|
I see the value in using Airflow connections for OpenLineage configuration, but I'd suggest expanding the scope beyond just auth tokens for HTTP transport (that are solving a very specific use case). Instead of a narrow solution, what if we allow storing the entire OpenLineage config dict (f.e. what can be put in the yaml file, or even just transport config) in an Airflow connection, that we can add to This approach would:
The current auth-token-only solution would miss users with composite transports or other config needs. Could you expand the scope to cover the full OL config dict? cc @mobuchowski |
hi there! |
Go ahead, I'll be happy to review the PR, then we can ask Maciej for merge as he has the power to do it. |
|
@VladaZakharova Sorry I was going to reply but totally lost track somewhere. Yeah, IMO it would be great if we could cover the whole transport config. An alternative would be to cover the subset - for example HTTP transport - in a way that would not conflict later, so use the same JSON structure. |
ba31bf3 to
e1453ed
Compare
|
hi there! |
e1453ed to
3544c9d
Compare
kacpermuda
left a comment
There was a problem hiding this comment.
Added few comments, the most important decision I think is "can we do openlineage conn type" and: should we?
| yaml_config = self._read_yaml_config(openlineage_config_path) | ||
| if yaml_config is None: | ||
| return None | ||
| self._resolve_airflow_connection_auth(yaml_config) |
There was a problem hiding this comment.
Should every resolve be guarded and actually fire only if there is an airflow connection defined? I think it fires every time now. We need to make sure that for users not setting up this conn, nothing changes.
There was a problem hiding this comment.
I think we’re okay here. The resolver is called for each config source, but it only does anything when it finds an auth block with:
{"type": "airflow_connection_api_key"}
For normal OpenLineage config, like regular api_key auth, it does not read any Airflow connection and should behave the same as before. I added a test for that case to make sure we don’t accidentally change it later.
|
|
||
| return None | ||
|
|
||
| def _validate_config(self, config: Any) -> dict[str, Any]: |
There was a problem hiding this comment.
Hmm, wondering if there is any way we can use OL client initialization as validation here to avoid duplicate check logic. If not it's fine, we'll extend this validation in the future.
There was a problem hiding this comment.
I thought about using OpenLineageClient(config=...) for this, but I think it would be a bit too heavy for validation here. It would create the client/transport once just to check the config, and then we would create it again later in the adapter.
So for now I kept this check very small: the Airflow connection extra must be a JSON object with a transport object. The OpenLineage client still does the real transport/auth validation when it is created. If the OpenLineage client gets a dedicated validation method later, we can switch to that.
|
Hey @VladaZakharova , I see some comments being resolved or marked as addressed, but see no recent commits. Is all the relevant code pushed as intended? |
not really, i forgot to push my changes :D |
3544c9d to
c51b459
Compare
kacpermuda
left a comment
There was a problem hiding this comment.
Thanks, this looks good now ! Left two more nit comments about docstring improvement and one test we should add.
|
|
||
| return None | ||
|
|
||
| def _validate_config(self, config: Any) -> dict[str, Any]: |
|
@VladaZakharova — Removing the The label's contract is that the PR is ready for maintainer review — a regression like this means the PR temporarily isn't. Check the failing checks, fix the regression, push, then mark "Ready for review" again to re-enter the queue. See the Pull Request quality criteria. No rush. Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you. |
c51b459 to
ab537d5
Compare
ab537d5 to
b306dbb
Compare
|
I think this is ready to go out of draft, tests are passing, code looks good, one final review and merge from Maciej is needed. |
Adds OpenLineage HTTP API key authentication from Airflow connection.
Users now can configure connection like this:
{"type":"http","url":"https://openlineage.example.com","auth":{"type":"airflow_connection_api_key","conn_id":"openlineage_default"}}
The token now is read from the connection password, or from connection extra keys such as apiKey, api_key, token, or access_token. This token is used in constructing new valid connection under the hood.
This option will help creating connections without exposing tokens in Airflow connection configs.
Was generative AI tooling used to co-author this PR?
{pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.