Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import errors thrown by *new* DAGs are hidden from RBAC users' web ui #24742

Open
2 tasks done
jordanbecketmoore opened this issue Jun 30, 2022 · 9 comments
Open
2 tasks done
Labels
area:core kind:bug This is a clearly a bug multi-tenancy Issues related to multi-tenancy Stale Bug Report

Comments

@jordanbecketmoore
Copy link

jordanbecketmoore commented Jun 30, 2022

Apache Airflow version

2.3.2 (latest released)

What happened

I wrote a custom security manager that created custom roles for Airflow users based on LDAP mapping. These users would only be given permissions to view, edit, etc. DAGs whose access_control parameters explicitly permitted their role to view. However, it was discovered that if a malformed DAG file was added to the DAG_FOLDER, an import error would be thrown, but none of the users in custom roles would be able to view it in their web ui. A user with a default Viewer or greater role could see the error, however.

Upon further investigation, I found that this occurs due to a filter that is applied here. If a DAG file is scanned for the first time and causes an import error to be thrown, two things happen that prevent the error from passing this filter and showing on the web ui to RBAC users:

  1. The DAG is never added to the database.
  2. The custom role is never given permissions to view the DAG.

What you think should happen instead

It is important for developers to be able to see the traceback of the errors thrown by their DAGs. One of two things needs to be changed - either the way in which import errors are handled is changed to allow these malformed DAGs to pass the filter (i.e. the DAG is added to the database and the appropriate roles are given the appropriate permissions), or the filter itself is changed to allow these import errors to be shown. The former option is, in my opinion, preferable, since it would also allow the web ui to list the DAG, allowing the users to view the code from there. Also, I can't think of a way to pull off the latter.

How to reproduce

  • Create a custom role (custom_role) that does not have permission to view all DAGs and assign a user that role. In my case, this was done through a custom security manager, but it can also be accomplished through the web ui of an Admin user.
  • Write a DAG whose access_control dictionary specifies that custom_role may at least view it. Ensure that this DAG file is malformed in such a way that will cause Airflow to throw an import error when it attempt to parse it.
  • Add the DAG to the DAG_FOLDER and wait for Airflow to attempt to parse it.
  • A user with Viewer role or higher will be able to see the import error at the top of the web ui, but the user with custom_role only will not.

Operating System

Amazon Linux 2

Versions of Apache Airflow Providers

No response

Deployment

MWAA

Deployment details

Airflow is deployed on an AWS cluster, with its webserver and scheduler on EC2 instances using celery worker nodes on separate EC2s.

Anything else

I have been able to get around this with a custom security manager that overwrites the create_dag_specific_permissions function. The DagBag is loaded from the DAG_FOLDER instead of the database. Then the security manager iterates over the DagBag's import errors and checks to see if the DagBag contains a DAG whose fileloc is the same as that of the import error. If not, a new DAG object is created with the appropriate fileloc. Regex is used to parse the file to find the dag_id and access_control parameters to add to the new DAG. If the DAG is so malformed that it cannot find these parameters, the security manager generates a random string to serve as the dag_id and sets access_control to explicitly give all custom roles can_view permissions. This new "pseudoDAG" is then added to the DagBag. After all import errors are dealt with, the function is allowed to continue. This function is triggered when the cli command airflow sync-perm --include-dags is executed.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@jordanbecketmoore jordanbecketmoore added area:core kind:bug This is a clearly a bug labels Jun 30, 2022
@boring-cyborg
Copy link

boring-cyborg bot commented Jun 30, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

@uranusjr
Copy link
Member

uranusjr commented Jun 30, 2022

Hmm this is difficult. Since the file cannot be parsed, a DAG can never be produced, and thus the associated ImportError cannot have a valid dag_id. But we can’t just show all ImportError entries without a valid dag_id since that might include code that the user is not supposed to see.

Perhaps we should add a new interface on the security manager, say is_import_error_accessible, or (more generally) is_dag_file_accessible, for this purpose specifically? And we could add some permissions that are file location based, instead of DAGs, so things not associated to a DAG, but a file, can be filtered against them.

@potiuk potiuk added the multi-tenancy Issues related to multi-tenancy label Jun 30, 2022
@potiuk
Copy link
Member

potiuk commented Jun 30, 2022

I think, personally, this problem is simply part of multi-tenancy in the future. I think for now custom security manager doing stuff in custom way is enough, because we have no concept of "team" and per-team acces. This is all implemented by a custom security manager if somoene wishes to do it (and yeah, it is complex and workaround).

However, I think when/if we implement a "fully fledged" team managemenet and ownership concept in Airfow, we should definitely make it a case to handle.

@krishna4ldw
Copy link

Its been a while any idea when this feature will be released ? I need this functionality for multi tenancy use cases in GCP composer.
@jordanbecketmoore @potiuk

@potiuk
Copy link
Member

potiuk commented Nov 14, 2022

The answer is pretty much always the same - It will be implemented, when someone implements it. This is how open-source software works.

If somone badly needs a feature in OSS project, and no-one is actually doing it, then they can do various things:

  • implement them
  • find somene who implements it
  • them or they company can sponsor somoene who migh take an interest in it
  • wait until someone implements it
  • implement it in they own fork and possibly contribute it back
  • you (or someone from your company - or even paid by your company) can start a discussion in the devlist and explain why it is importan to have it. And in case of this kind of problem that goes really deep into the core of Airflow it likely will require someone to come up with idaes how to do what you want, discuss and get it to conclusion (and eventually likely voting on AIP - Airflow Improvement Proposal).

I think following those are some of the ways you can make those things progress.

@jordanbecketmoore
Copy link
Author

I plan to implement a fix for this issue and submit a pr soon, as my team needs it.

@jordanbecketmoore
Copy link
Author

I have written a fix. It will appear in a PR soon.

@brata96
Copy link

brata96 commented Mar 2, 2023

hi @jordanbecketmoore ! currently in my organization we have came across with this issue, any progress on this issue yet?

Copy link

github-actions bot commented Apr 4, 2024

This issue has been automatically marked as stale because it has been open for 365 days without any activity. There has been several Airflow releases since last activity on this issue. Kindly asking to recheck the report against latest Airflow version and let us know if the issue is reproducible. The issue will be closed in next 30 days if no further activity occurs from the issue author.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:core kind:bug This is a clearly a bug multi-tenancy Issues related to multi-tenancy Stale Bug Report
Projects
None yet
Development

No branches or pull requests

5 participants