Skip to content

Implement pluggable connection policy#36291

Closed
BasPH wants to merge 4 commits intoapache:mainfrom
BasPH:get_connection_cluster_policy
Closed

Implement pluggable connection policy#36291
BasPH wants to merge 4 commits intoapache:mainfrom
BasPH:get_connection_cluster_policy

Conversation

@BasPH
Copy link
Contributor

@BasPH BasPH commented Dec 18, 2023

This PR implements a pluggable connection policy. This can apply validation rules e.g. only access to a pre-defined list of connection ids in a secrets backend, or apply some business logic such as prefixing the given connection id based on some property such as DAG owner.

Example:

def connection_policy(conn_id: str) -> str:
    """
    This connection policy prefixes the connection id with "bi/" in case the owner is set to "bi", and raises
    an AirflowClusterPolicyViolation in case an unknown owner is found.
    """
    dagowner_connid_prefixes = {"bi": "bi/"}

    from airflow.operators.python import get_current_context

    task_context = get_current_context()
    dag_owner = task_context["dag"].owner

    try:
        return dagowner_connid_prefixes[dag_owner] + conn_id
    except KeyError:
        raise AirflowClusterPolicyViolation(
            f"DAG owner '{dag_owner}' not found in list of owners to route a connection. "
            "Consult your cluster admin."
        )

^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.



@local_settings_hookspec
def connection_policy(conn_id: str) -> str: # type: ignore[empty-body]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought for a second that maybe we should call this connection_id_policy or something because... e.g. with task policy or dag policy, you get and can mutate the actual object -- but here we only touch the conn_id. So the naming is perhaps slightly inconsistent but that's nit picking.

I suppose we could call it connection_retrieval_policy? Or leave it 🤷


To support connection id alterations at runtime (typically prefixes are added), a connection id must be
returned, even when not altering the connection id.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure, but I also wonder whether we should add a disclaimer that this could provide a false sense of security? To me it makes sense to have this because people want something, and this is about as good as we can do right now. But, it's pretty easy to circumvent e.g. with secrets_backend_list[0].get_connection.

Copy link
Member

@potiuk potiuk Dec 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah.. I just see that we have very same comment.

:param conn_id: connection id
:return: connection
"""
conn_id = settings.connection_policy(conn_id=conn_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if Connection.get_connection_from_secrets would be a better location

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, but I don't think it matters that much?

The policy would have to be applied before the first request to any secrets system. We could apply it inside Connection.get_connection_from_secrets, but it would have to be the first statement in that method, because after that we start requesting secrets. That applies the policy one level deeper in the stack but logically it wouldn't make a difference.

@potiuk
Copy link
Member

potiuk commented Dec 18, 2023

This is an interesting idea, but I think we should be very clear in the documentation and description that it's not a security feature. Any DAG Author can simply instantiate the connection via creating the model and retrieving it. There is a risk that this will give the users a false sense of security if we are not explicit about it.

@potiuk
Copy link
Member

potiuk commented Dec 18, 2023

Just to add - I can very easily imagine - that if we don't do it, that someone will raise a security report that you can bypass it. And if somoene does it and it's not clearly explained in our security model, then we will have to well, fix it.

@BasPH
Copy link
Contributor Author

BasPH commented Dec 19, 2023

@dstandish @potiuk I didn't intend this to be a security feature. I gave your comments some thought and decided to close the PR. Security isn't something I want to be fuzzy about and stating we put a secure-looking policy in Airflow with a disclaimer that it isn't a security feature somewhere in the docs just doesn't work for me.

@BasPH BasPH closed this Dec 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants