Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Airflow does not mask secrets in Database URI's when secret has a special character. #36688

Closed
1 of 2 tasks
Rurushu13 opened this issue Jan 9, 2024 · 10 comments · Fixed by #36692
Closed
1 of 2 tasks
Assignees
Labels
area:core area:webserver Webserver related Issues good first issue kind:bug This is a clearly a bug

Comments

@Rurushu13
Copy link

Rurushu13 commented Jan 9, 2024

Apache Airflow version

Other Airflow 2 version (please specify below)

If "Other Airflow 2 version" selected, which one?

2.5.1

What happened?

While we were trying to use a BashOperator that would take our Database URI as an Env variable, we noticed that one of our connection URI's password aren't correctly being masked. The difference we found out between two passwords are, one of them have no special character whereas the other one has exclamation points. The original unmasked passphrase is !Weird!Password^123$ and the masked passphrase is TotallyAlphanumericPassword123
image

In order to confirm that this was indeed the issue, we tried removing any special characters from the password of the errenous one, and it was indeed the issue. Password ended up being masked.
image

What you think should happen instead?

The password should be masked even if it has a special character in it.

How to reproduce

  1. Create a BashOperator (honestly I think it would work with any operator)
  2. Pass a Connection URI of a database that has special characters (we had issue with !) in env parameter using BaseHook.get_connection("Database").get_uri()
  3. Check rendered template of the operator in Airflow UI

Operating System

Ventura 13.5

Versions of Apache Airflow Providers

No response

Deployment

Virtualenv installation

Deployment details

No response

Anything else?

It occurs every-time.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@Rurushu13 Rurushu13 added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Jan 9, 2024
@dirrao
Copy link
Collaborator

dirrao commented Jan 9, 2024

@Rurushu13
Good finding. If possible, can you share the sample DAG?

@dirrao dirrao added the area:webserver Webserver related Issues label Jan 9, 2024
@Rurushu13
Copy link
Author

Rurushu13 commented Jan 9, 2024

@Rurushu13
Good finding. If possible, can you share the sample DAG?

@dirrao There it is


from datetime import timedelta
import pendulum
import os

from airflow.decorators import dag
from airflow.operators.bash import BashOperator
from airflow.hooks.base import BaseHook


@dag(
    dag_id="shell_test",
    description='testing the shell script',
    start_date=pendulum.datetime(2023, 1, 1, tz="Europe/Istanbul"),
    schedule='@daily',
    catchup=False,
    tags=['test'],
    default_args={
        "retries": 0,
        "retry_delay": timedelta(minutes=5),
    },
    template_searchpath=['include/sql/datawarehouse/source',
                         'include/sql/datawarehouse/stage',
                         'include/sql/datawarehouse/final'],)
def shell_test():
    airflow_home = os.environ.get('AIRFLOW_HOME')
    shell_test_task = BashOperator(
        task_id="shell_test_task",
        bash_command=f"/bin/bash {airflow_home}/include/scripts/datawarehouse/transfer_script.sh ",
        env={
             **{'TARGET_CONNECTION_STRING': BaseHook.get_connection("DATAWAREHOUSE").get_uri(),
                'SOURCE_CONNECTION_STRING': BaseHook.get_connection("EFABRIKA").get_uri()},
             }
    )


shell_test()

@aritra24
Copy link
Collaborator

aritra24 commented Jan 9, 2024

I was able to replicate this, seems like a valid bug. Looking into it. 🤔

@aritra24 aritra24 removed the needs-triage label for new issues that we didn't triage yet label Jan 9, 2024
aritra24 added a commit to aritra24/airflow that referenced this issue Jan 9, 2024
Connection uri's get connection uses quote to change the
password and certain other fields to escape special chars
due to this, when the connection object is passed through
the masker this changed string is skipped.
aritra24 added a commit to aritra24/airflow that referenced this issue Jan 9, 2024
Connection uri's get connection uses quote to change the
password and certain other fields to escape special chars
due to this, when the connection object is passed through
the masker this changed string is skipped.
@dirrao
Copy link
Collaborator

dirrao commented Jan 10, 2024

I was able to replicate this, seems like a valid bug. Looking into it. 🤔

is this happening on the airflow main branch?

@aritra24
Copy link
Collaborator

@dirrao yes, the main branch also has this issue.

aritra24 added a commit to aritra24/airflow that referenced this issue Jan 17, 2024
Connection uri's get connection uses quote to change the
password and certain other fields to escape special chars
due to this, when the connection object is passed through
the masker this changed string is skipped.
@amoghrajesh
Copy link
Contributor

amoghrajesh commented Jan 19, 2024

I think it is by design that we have to percent encode the passwords containing special characters. The postgres documentation also has a mention of it: https://www.postgresql.org/docs/11/libpq-connect.html#id-1.7.3.8.3.6

Also check: https://dba.stackexchange.com/questions/243219/in-postgresql-url-i-cant-use-a-password-containing-special-characters

I think what you need to do here is define the env variables post encoding, maybe that should solve the issue.

cc @uranusjr, I think you can have some valuable suggestions here

@aritra24
Copy link
Collaborator

@amoghrajesh while that might be, but the issue is that the password when encoded is not getting redacted (Because our redactor today explicitly redacts exact matches only). I do not believe they're asking that it not be encoded, but that on encoding it continue to get redacted like a alphanumeric password would and that's what the PR I've raised does.

@potiuk
Copy link
Member

potiuk commented Jan 20, 2024

I think @aritra24 is right - this is not about URL passwords (where the password should - indeed - be URL, encoded) - but if there is a password in connection, that does not get masked when we print a URL form of the connnection, then the quoted version should be masked.

While we cannot prevent all the ways how password can be mangled and transformed (say you add space and Base64 such password for example), some common usages (like URL encoding) that are likely to appear in our logs should be masked as welll. If I read this one properly, the connection password is stored "as is" while we are quoting it i the output when printing it as "url" so we should mask it there.

aritra24 added a commit to aritra24/airflow that referenced this issue Jan 20, 2024
Connection uri's get connection uses quote to change the
password and certain other fields to escape special chars
due to this, when the connection object is passed through
the masker this changed string is skipped.
@amoghrajesh
Copy link
Contributor

Thank you for the explanations, understood, @potiuk and @aritra24. Looking at the PR now

potiuk pushed a commit that referenced this issue Jan 29, 2024
* Secret masker ignores passwords with special chars #36688

Connection uri's get connection uses quote to change the
password and certain other fields to escape special chars
due to this, when the connection object is passed through
the masker this changed string is skipped.

* Added a test for the logging change
@Rurushu13
Copy link
Author

Thank you for your hard work everyone 🤗

jedcunningham pushed a commit that referenced this issue Feb 9, 2024
* Secret masker ignores passwords with special chars #36688

Connection uri's get connection uses quote to change the
password and certain other fields to escape special chars
due to this, when the connection object is passed through
the masker this changed string is skipped.

* Added a test for the logging change

(cherry picked from commit e853849)
potiuk pushed a commit that referenced this issue Feb 13, 2024
* Secret masker ignores passwords with special chars #36688

Connection uri's get connection uses quote to change the
password and certain other fields to escape special chars
due to this, when the connection object is passed through
the masker this changed string is skipped.

* Added a test for the logging change

(cherry picked from commit e853849)
ephraimbuddy pushed a commit that referenced this issue Feb 22, 2024
* Secret masker ignores passwords with special chars #36688

Connection uri's get connection uses quote to change the
password and certain other fields to escape special chars
due to this, when the connection object is passed through
the masker this changed string is skipped.

* Added a test for the logging change

(cherry picked from commit e853849)
abhishekbhakat pushed a commit to abhishekbhakat/my_airflow that referenced this issue Mar 5, 2024
* Secret masker ignores passwords with special chars apache#36688

Connection uri's get connection uses quote to change the
password and certain other fields to escape special chars
due to this, when the connection object is passed through
the masker this changed string is skipped.

* Added a test for the logging change
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:core area:webserver Webserver related Issues good first issue kind:bug This is a clearly a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants