Mask secrets in stdout for `airflow tasks test ...` CLI command (#17476) #21281

alex-astronomer · 2022-02-02T23:22:54Z

The problem with `airflow tasks test ...` is that anything a task prints to stdout from inside an operator or other callable does not go through a logger; it runs in the main process. As a result, secrets printed to stdout are not masked by the filters on the `airflow.task` logger.

I wanted a way to capture and filter stdout without having to route everything through a secondary logger. I decided a context manager that filters stdout would be the best way to go about this. The context manager redacts all secrets according to the filters on the airflow.task logger.

Closes #17476

…he#17476) Add a context manager to secrets_masker inside of which stdout is captured, redacted according to the filters on 'airflow.task' logger and then spit back into stdout. Filters stdout that doesn't go through logger according to airflow.task logger's filters.

uranusjr · 2022-02-03T04:57:23Z

airflow/utils/log/secrets_masker.py

+    def __exit__(self, *args):
+        """
+        On exit from the context, close the `contextlib.redirect_stdout` context to stop capturing lines
+        from stdout, then route all lines one by one through self.redirect_to.
+
+        """
+        self._redirector.__exit__(*args)
+        for log in self.stdout_lines:
+            sys.stdout.write(log)


Hmm, this means all of the task’s output will be buffered, and only spit out in one go when the task is finished, which seems wrong. I would call stdout.write in write directly after redaction instead.

This can be significantly simplified to

    class StdoutRedactor:
        # classmethod goes outermost so that contextmanager wraps a plain function
        @classmethod
        @contextlib.contextmanager
        def enable(cls):
            with contextlib.redirect_stdout(cls()):
                yield

        def write(self, content):
            sys.stdout.write(redact(content))

        def flush(self):
            sys.stdout.flush()

I think the problem here is that every time it writes to stdout you’ll need to exit the internal redirect context manager, because otherwise it’ll get stuck in an infinite recursive loop. I thought about doing that, but `__exit__` expects a few arguments that only exist in the external context manager wrapper’s exit function. Do you have any thoughts about being able to disable and re-enable the internal redirect context manager within the `write` function?
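The recursion described above can be reproduced with a minimal sketch. `NaiveRedactor` is a hypothetical stand-in, not code from this PR, and it skips redaction entirely to isolate the problem: while `contextlib.redirect_stdout` is active, `sys.stdout` is the redactor itself, so `write` keeps calling itself.

```python
import contextlib
import sys


class NaiveRedactor:
    """Hypothetical stand-in that writes back to sys.stdout from write()."""

    def write(self, content):
        # While redirect_stdout is active, sys.stdout *is* this object,
        # so this call re-enters write() until the recursion limit is hit.
        return sys.stdout.write(content)

    def flush(self):
        pass


def naive_redactor_recurses():
    try:
        with contextlib.redirect_stdout(NaiveRedactor()):
            print("hello")
    except RecursionError:
        # redirect_stdout has already restored the previous sys.stdout here.
        return True
    return False


recursion_hit = naive_redactor_recurses()
```

Writing to a stream other than the redirected `sys.stdout` avoids the loop.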

> it’ll get stuck in an infinite recursive loop

Ah right. sys.__stdout__ then?

https://docs.python.org/3/library/sys.html#sys.__stdout__
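As a small illustration of why `sys.__stdout__` sidesteps the redirect: `contextlib.redirect_stdout` only rebinds `sys.stdout`, while `sys.__stdout__` keeps referring to the stream the interpreter started with. The function name below is made up for this sketch.

```python
import contextlib
import io
import sys


def streams_during_redirect():
    capture = io.StringIO()
    with contextlib.redirect_stdout(capture):
        # sys.stdout now points at the capture buffer...
        redirected = sys.stdout is capture
        # ...while sys.__stdout__ is left alone by the redirect.
        original_untouched = sys.__stdout__ is not capture
    return redirected, original_untouched


redirected, original_untouched = streams_during_redirect()
```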

Great find! I'll implement.

I just realised a problem: sys.stdout may have already been patched before we try to patch it (this can happen if the user runs Airflow with custom interpreter startup code), in which case sys.__stdout__ would point to the wrong, unpatched stream.

Something like this is needed:

    @dataclasses.dataclass()
    class StdoutRedactor:
        stdout: TextIO

        # classmethod goes outermost so that contextmanager wraps a plain function
        @classmethod
        @contextlib.contextmanager
        def enable(cls):
            with contextlib.redirect_stdout(cls(sys.stdout)):
                yield

        def write(self, content):
            self.stdout.write(redact(content))

        def flush(self):
            self.stdout.flush()

So just to make sure that I'm understanding all of this right, we're basically saving a copy of sys.stdout from before it gets redirected, and then writing to that which bypasses the redirect context that we've already defined before that write occurs? And the reason that we use the dataclass decorator is just to simplify the initialization of self.stdout and save a few lines since we don't need to use __init__?

Yes, that’s my understanding.
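To make the approach agreed on in this thread concrete, here is a self-contained sketch. It is not the code that was eventually merged. The `redact` function and the `SECRETS` set are hypothetical stand-ins for Airflow's secrets masker, and `@classmethod` is applied outermost so that `contextmanager` wraps a plain function. The outer `redirect_stdout` in `demo` simulates a `sys.stdout` that was already patched before the redactor was enabled.

```python
import contextlib
import dataclasses
import io
import sys
from typing import TextIO

SECRETS = {"hunter2"}  # hypothetical secret values, for illustration only


def redact(content: str) -> str:
    """Hypothetical stand-in for the secrets masker's redaction."""
    for secret in SECRETS:
        content = content.replace(secret, "***")
    return content


@dataclasses.dataclass
class StdoutRedactor:
    stdout: TextIO  # the stream that was current before redirection

    @classmethod
    @contextlib.contextmanager
    def enable(cls):
        # Capture whatever sys.stdout is right now (possibly already patched),
        # then redirect sys.stdout to the redactor for the duration of the block.
        with contextlib.redirect_stdout(cls(sys.stdout)):
            yield

    def write(self, content: str) -> int:
        # Writes go to the saved stream, not sys.stdout, so there is no recursion.
        return self.stdout.write(redact(content))

    def flush(self) -> None:
        self.stdout.flush()


def demo() -> str:
    already_patched = io.StringIO()
    with contextlib.redirect_stdout(already_patched):
        with StdoutRedactor.enable():
            print("password is hunter2")
    return already_patched.getvalue()


output = demo()
```

Because each `write` call is redacted on its own, output appears as the task produces it rather than being buffered until exit. A secret split across two separate `write` calls would not be caught by this simple stand-in.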

github-actions · 2022-04-08T00:11:00Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

uranusjr · 2022-04-13T07:50:23Z

We should finish this.

github-actions · 2022-06-10T00:12:29Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

boring-cyborg bot added area:CLI area:logging labels Feb 2, 2022

uranusjr reviewed Feb 3, 2022

github-actions bot added the stale label (Stale PRs per the .github/workflows/stale.yml policy file) Apr 8, 2022

github-actions bot closed this Apr 13, 2022

potiuk reopened this Apr 14, 2022

github-actions bot removed the stale label (Stale PRs per the .github/workflows/stale.yml policy file) Apr 15, 2022

github-actions bot added the stale label (Stale PRs per the .github/workflows/stale.yml policy file) Jun 10, 2022

uranusjr removed the stale label (Stale PRs per the .github/workflows/stale.yml policy file) Jun 10, 2022

uranusjr mentioned this pull request Jun 10, 2022

Mask secrets in stdout for CLI command #24362

Merged

potiuk closed this in #24362 Jun 12, 2022

ephraimbuddy mentioned this pull request Jul 6, 2022

Status of testing of Apache Airflow 2.3.3rc3 #24863

Closed

74 tasks


Review comments, in order: uranusjr (Feb 3, 2022), alex-astronomer (Feb 3, 2022), uranusjr (Feb 4, 2022), alex-astronomer (Feb 5, 2022), uranusjr (Feb 6, 2022), alex-astronomer (Feb 14, 2022), uranusjr (Feb 21, 2022).
