Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to built-in data structures in SecretsMasker #16424

Merged
merged 1 commit into from
Jun 16, 2021

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Jun 13, 2021

Using Iterable in SecretsMasker might cause undesireable
side effect in case the object passed as log parameter
is an iterable object and actually iterating it is not idempotent.

For example in case of botocore, it passes StreamingBody
object to log and this object is Iterable. However it can be
iterated only once. Masking causes the object to be iterated
during logging and results in empty body when actual results
are retrieved later.

This change only iterates list type of objects and recurrently
redacts only dicts/strs/tuples/sets/lists which should never
produce any side effects as all those objects do not have side
effects when they are accessed.

Fixes: #16148


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@potiuk potiuk requested a review from ashb June 13, 2021 17:55
@potiuk
Copy link
Member Author

potiuk commented Jun 13, 2021

cc: @uranusjr @mik-laj

@potiuk potiuk force-pushed the just_redact_primitive_values branch from 2bb0b92 to ceab5ac Compare June 13, 2021 17:57
@potiuk potiuk changed the title Switch to primitive types in SecretsMasker Switch to built-in data structures in SecretsMasker Jun 13, 2021
Using Iterable in SecretsMasker might cause undesireable
side effect in case the object passed as log parameter
is an iterable object and actually iterating it is not idempotent.

For example in case of botocore, it passes StreamingBody
object to log and this object is Iterable. However it can be
iterated only once. Masking causes the object to be iterated
during logging and results in empty body when actual results
are retrieved later.

This change only iterates list type of objects and recurrently
redacts only dicts/strs/tuples/sets/lists which should never
produce any side effects as all those objects do not have side
effects when they are accessed.

Fixes: apache#16148
@potiuk potiuk force-pushed the just_redact_primitive_values branch from ceab5ac to af39d40 Compare June 13, 2021 18:36
return list(self.redact(subval) for subval in item)
else:
return item
# I think this should never happen, but it does not hurt to leave it just in case
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not find an easy way to raise an exception any more after removing Iterable. I believe we cannot get the Exception if we do what we do now - i.e. walking the built-in structures in the way we do. But maybe there is some way an exception can be raised here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it’s technically possible if the user passes in some kind of custom subclass (e.g. class MyList(list): that overrides some weird stuff), so yeah let’s keep the exception handler there, but I don’t think it’s worthwhile to have a test for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreet

@@ -196,12 +195,11 @@ def redact(self, item: "RedactableItem", name: str = None) -> "RedactableItem":
elif isinstance(item, (tuple, set)):
# Turn set in to tuple!
return tuple(self.redact(subval) for subval in item)
elif isinstance(item, io.IOBase):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only reason IOBase was added here specifically was that it was Iterable. This else is not needed now as this one will fall through the last return item anyway.

@@ -72,22 +72,6 @@ def test_args(self, logger, caplog):

assert caplog.text == "INFO Cannot connect to user:***\n"

def test_non_redactable(self, logger, caplog):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could not find a way to trigger Exception :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inherit from list or dict should trigger it, but don't mind deleting this test

@potiuk
Copy link
Member Author

potiuk commented Jun 16, 2021

@ashb , does it look good for you? I think switching to list solves the problem entirely.

@@ -165,7 +164,7 @@ def _redact_all(self, item: "RedactableItem") -> "RedactableItem":
elif isinstance(item, (tuple, set)):
# Turn set in to tuple!
return tuple(self._redact_all(subval) for subval in item)
elif isinstance(item, Iterable):
elif isinstance(item, list):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tuple too please

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's a few lines up

return list(self.redact(subval) for subval in item)
else:
return item
# I think this should never happen, but it does not hurt to leave it just in case
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreet

@@ -72,22 +72,6 @@ def test_args(self, logger, caplog):

assert caplog.text == "INFO Cannot connect to user:***\n"

def test_non_redactable(self, logger, caplog):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inherit from list or dict should trigger it, but don't mind deleting this test

@ashb ashb added this to the Airflow 2.1.1 milestone Jun 16, 2021
@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Jun 16, 2021
@potiuk potiuk merged commit d1d02b6 into apache:main Jun 16, 2021
@potiuk potiuk deleted the just_redact_primitive_values branch June 16, 2021 09:29
ashb pushed a commit that referenced this pull request Jun 22, 2021
Using Iterable in SecretsMasker might cause undesireable
side effect in case the object passed as log parameter
is an iterable object and actually iterating it is not idempotent.

For example in case of botocore, it passes StreamingBody
object to log and this object is Iterable. However it can be
iterated only once. Masking causes the object to be iterated
during logging and results in empty body when actual results
are retrieved later.

This change only iterates list type of objects and recurrently
redacts only dicts/strs/tuples/sets/lists which should never
produce any side effects as all those objects do not have side
effects when they are accessed.

Fixes: #16148
(cherry picked from commit d1d02b6)
kaxil pushed a commit to astronomer/airflow that referenced this pull request Jun 22, 2021
Using Iterable in SecretsMasker might cause undesireable
side effect in case the object passed as log parameter
is an iterable object and actually iterating it is not idempotent.

For example in case of botocore, it passes StreamingBody
object to log and this object is Iterable. However it can be
iterated only once. Masking causes the object to be iterated
during logging and results in empty body when actual results
are retrieved later.

This change only iterates list type of objects and recurrently
redacts only dicts/strs/tuples/sets/lists which should never
produce any side effects as all those objects do not have side
effects when they are accessed.

Fixes: apache#16148
(cherry picked from commit d1d02b6)
(cherry picked from commit 4c37aea)
kaxil pushed a commit to astronomer/airflow that referenced this pull request Jun 23, 2021
Using Iterable in SecretsMasker might cause undesireable
side effect in case the object passed as log parameter
is an iterable object and actually iterating it is not idempotent.

For example in case of botocore, it passes StreamingBody
object to log and this object is Iterable. However it can be
iterated only once. Masking causes the object to be iterated
during logging and results in empty body when actual results
are retrieved later.

This change only iterates list type of objects and recurrently
redacts only dicts/strs/tuples/sets/lists which should never
produce any side effects as all those objects do not have side
effects when they are accessed.

Fixes: apache#16148
(cherry picked from commit d1d02b6)
(cherry picked from commit 4c37aea)
(cherry picked from commit 7e5968a)
kaxil pushed a commit to astronomer/airflow that referenced this pull request Jun 23, 2021
Using Iterable in SecretsMasker might cause undesireable
side effect in case the object passed as log parameter
is an iterable object and actually iterating it is not idempotent.

For example in case of botocore, it passes StreamingBody
object to log and this object is Iterable. However it can be
iterated only once. Masking causes the object to be iterated
during logging and results in empty body when actual results
are retrieved later.

This change only iterates list type of objects and recurrently
redacts only dicts/strs/tuples/sets/lists which should never
produce any side effects as all those objects do not have side
effects when they are accessed.

Fixes: apache#16148
(cherry picked from commit d1d02b6)
(cherry picked from commit 4c37aea)
(cherry picked from commit 7e5968a)
(cherry picked from commit 523bba0)
@potiuk potiuk restored the just_redact_primitive_values branch April 26, 2022 20:47
@potiuk potiuk deleted the just_redact_primitive_values branch July 29, 2022 19:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:logging full tests needed We need to run full set of tests for this PR to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Downloading files from S3 broken in 2.1.0
3 participants