Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bag #5081

Merged
merged 2 commits into from
Nov 19, 2019
Merged

Fix bag #5081

merged 2 commits into from
Nov 19, 2019

Conversation

twang817
Copy link
Contributor

The Bag class retrieves values using it's parent dict class, but setattr modifies values in its local __dict__ (default object behavior).

This causes problems like below:

b = Bag({'a': 1, 'b': 2})
b.a = 3
assert str(b) == "{'a': 1, 'b': 2}"
assert json.loads(json.dumps(b))['a'] == 1
assert b.a == 3

This causes the config properties in metadata.json in output files to always contain default/empty values:

cloud-custodian/c7n/ctx.py

Lines 127 to 138 in b73eccc

def get_metadata(self, include=('sys-stats', 'api-stats', 'metrics')):
t = time.time()
md = {
'policy': self.policy.data,
'version': version,
'execution': {
'id': self.execution_id,
'start': self.start_time,
'end_time': t,
'duration': t - self.start_time},
'config': dict(self.options)
}

I believe this also means that the copy() method on the Config class must be broken.

This change adds __setattr__ such that new attributes are always set within the parent dict's internal state.

The Bag class retrieves values using it's parent dict class, but setattr modifies values in its local __dict__.

This causes problems like below:

```
b = Bag({'a': 1, 'b': 2})
b.a = 3
assert str(b) == "{'a': 1, 'b': 2}"
assert json.loads(json.dumps(b))['a'] == 1
assert b.a == 3
```

This causes the metadata.json found in output files to *always* contain default/empty values (c7n/ctx.py#L137)
@kapilt
Copy link
Collaborator

kapilt commented Nov 16, 2019

Thanks for the pull request. If you don't mind could you sign the CLA referenced per the contributing docs? Direct link here: https://docs.google.com/forms/d/e/1FAIpQLSfwtl1s6KmpLhCY6CjiY8nFZshDwf_wrmNYx1ahpsNFXXmHKw/viewform

wrt to config/bag usage, I'm not aware of any mutations that use attribute access on bags in custodian. I just tried changing setattr to raise exceptions, and I'm not seeing any errors in the test suite. Could you clarify the scenario in which your seeing empty config in metadata.json? Running for example on the cli I'm not seeing that behavior.

[update] i did find one use of setattr when retrieving/setting the active account's account id, which was masked by a try/except account the retrieval. i'm still curious for understanding what other differential values your seeing.

[update 2] found a few more setattrs setting region using __slots__ = (), i had a typo in my setattr checking, i'm good with your pr as is, i think it addressing a legit issue/ though i'd like to normalize our usage of config mutation as well.

Copy link
Collaborator

@kapilt kapilt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@twang817
Copy link
Contributor Author

twang817 commented Nov 18, 2019

CLA signed.

The null account_id is exactly what lead me down this path.

We are using c7n across multiple accounts without c7n_org. In effort to centralize reporting, I have added a firehose blob_output and noticed that account id was null.

It struck me as really strange, since the notify action was able to accurately grab the account_id to send to c7n_mailer. It took a while for me able to track the defect to Bag/Config.

If there is any interest in a firehose blob_output, or a firehose target for c7n_mailer, I would be more than happy to submit a PR.

I also added a multi blob_output that takes a URL in the form of:

multi://scheme1+scheme2/?scheme1=scheme1_url&scheme2=scheme2__url

For example:

multi://s3+firehose/?s3=my_bucket&firehose=my_stream

This would effectively break down into 2 blob_outputs:

s3://my-bucket and firehose://my-stream

This was done this way out of desire to not have to fork or monkey patch the code (I use C7N_EXTPLUGINS). If there is interest in multiple blob_output support, I would be more than happy to set up something more elegant (possibly modifying how blob_output.select works and handling multiple -s options on the command line).

@kapilt
Copy link
Collaborator

kapilt commented Nov 19, 2019

Thanks for tracking this down and the pull request.

For the log output it’s best to file an issue for discussion. I’m interested. Though it brings to mind questions on how your handling the blobs output.

WRT to the plugins mechanism, it’s sadly something that’s probably going away soon. Time has shown that it’s getting misused in ways that makes installations fragile as internals get refactored. As well as discouraging contributions. It’s original intent was scoped to enabling private beta cloud APIs that are under NDA. At the moment we don’t have a public api on the codebase (there's an open issue on it for discussion), and for integration it’s better to think microservices instead of extending custodian as a monolith.

Ie. for something like you logs you could have sent a notify to sns and a custom lambda to send to kinesis/firehouse is or other workflow.

That said definitely would be interested in enabling additional blob store support in core.

@kapilt kapilt merged commit faadcfc into cloud-custodian:master Nov 19, 2019
ajmsra pushed a commit to ajmsra/cloud-custodian that referenced this pull request Jan 14, 2020
fidelito pushed a commit to fidelito/cloud-custodian that referenced this pull request May 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants