Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mask secrets in stdout for airflow tasks test ... CLI command (#17476) #21281

Conversation

alex-astronomer
Copy link
Contributor

@alex-astronomer alex-astronomer commented Feb 2, 2022

The problem with airflow tasks test ... is that any stdout that comes from a task inside an operator or other callable does not go through a logger. It runs in the main process. This results in secrets that are printed to stdout to not be masked according to the airflow.task logger.

I wanted a way to capture and filter stdout without having to route everything through a secondary logger. I decided a context manager that filters stdout would be the best way to go about this. The context manager redacts all secrets according to the filters on the airflow.task logger.

Closes #17476

…he#17476)

Add a context manager to secrets_masker inside
of which stdout is captured, redacted according
to the filters on 'airflow.task' logger and then
spit back into stdout.  Filters stdout that doesn't
go through logger according to airflow.task logger's
filters.
Comment on lines +288 to +296
def __exit__(self, *args):
"""
On exit from the context, close the `contextlib.redirect_stdout` context to stop capturing lines
from stdout, then route all lines one by one through self.redirect_to.

"""
self._redirector.__exit__(*args)
for log in self.stdout_lines:
sys.stdout.write(log)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this means all of the task’s output will be buffered, and only spit out in one go when the task is finished, which seems wrong. I would call stdout.write in write directly after redaction instead.

This can be significantly simplified to

class StdoutRedactor:
    @contextlib.contextmanager
    @classmethod
    def enable(cls):
        with contextlib.redirect_stdout(cls()):
            yield

    def write(self, content):
        sys.stdout.write(redact(content))

    def flush(self):
        sys.stdout.flush()

Copy link
Contributor Author

@alex-astronomer alex-astronomer Feb 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem here is that every time it writes to stdout you’ll need to exit the internal redirect context manager because otherwise it’ll get stuck in an infinite recursive loop. I thought about doing that but since __exit__ expects a few arguments that only exist during the external context manager wrapper’s exit function. Do you have any thoughts about being able to disable and reenable the internal redirect context manager within the ‘write’ function?

Copy link
Member

@uranusjr uranusjr Feb 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it’ll get stuck in an infinite recursive loop

Ah right. sys.__stdout__ then?

https://docs.python.org/3/library/sys.html#sys.__stdout__

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great find! I'll implement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realised a problem, sys.stdout may have already been patched before we try to patch it (can happen if the user runs Airflow with custom interpreter startup code), in which case sys.__stdout__ would point to the wrong, unpatched stream.

Something like this is needed:

@dataclasses.dataclass()
class StdoutRedactor:
    stdout: TextIO

    @contextlib.contextmanager
    @classmethod
    def enable(cls):
        with contextlib.redirect_stdout(cls(sys.stdout)):
            yield

    def write(self, content):
        self.stdout.write(redact(content))

    def flush(self):
        self.stdout.flush()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just to make sure that I'm understanding all of this right, we're basically saving a copy of sys.stdout from before it gets redirected, and then writing to that which bypasses the redirect context that we've already defined before that write occurs? And the reason that we use the dataclass decorator is just to simplify the initialization of self.stdout and save a few lines since we don't need to use __init__?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that’s my understanding.

@github-actions
Copy link

github-actions bot commented Apr 8, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Apr 8, 2022
@github-actions github-actions bot closed this Apr 13, 2022
@uranusjr
Copy link
Member

We should finish this.

@potiuk potiuk reopened this Apr 14, 2022
@github-actions github-actions bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Apr 15, 2022
@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Jun 10, 2022
@uranusjr uranusjr removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Jun 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sensitive variables don't get masked when rendered with airflow tasks test
3 participants