Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize log names #1188

Merged
merged 3 commits into from
Mar 13, 2020
Merged

Sanitize log names #1188

merged 3 commits into from
Mar 13, 2020

Conversation

chunyong-lin
Copy link
Contributor

@chunyong-lin chunyong-lin commented Mar 12, 2020

to: @ryandeivert @Ryxias @blakemotl
cc: @airbnb/streamalert-maintainers
related to:
resolves: #1186

Background

See the issue #1186 for more details.

Changes

  • Replace all . (dot) in the log names in conf/schemas/*.json or conf/logs.json with _ (underscore). The change is made to Config class where all the conf files are loaded.
  • Also update rule test method to replace . with _ when inspect log source in the testing events. Eventually there are two carbonblack testing events with log set to carbonblack:ingress.event.procstart and carbonblack:ingress_event_procstart.

Testing

  • Add unit tests to reproduce the issue and test the code change.
  • python manage.py athena create-table --bucket nobody2020030420-streamalert-data --table-name carbonblack_alert_status_updated works in staging environment with the fix.

@coveralls
Copy link

coveralls commented Mar 12, 2020

Coverage Status

Coverage increased (+0.008%) to 95.424% when pulling a3ea5c5 on sanitize_log_names into 1b704c6 on release-3-1-0.

if config.get('logs'):
config['logs'] = _sanitize_logs_name(config['logs'])

if (config.get('global')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eeek conditional blocks like this make me cringe so hard. readability is so bad. can be break this up into multiple statements?

like:

infra_config = config.get('global', {}).get('infrastructure')
if not infra_config:
    return config

if 'firehose' not in infra_config:
    return config

# ... and so on ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative way is to lump it into a function>

if self._requires_sanitized_log_names(etc):
  return config['global']....

return config

@chunyong-lin
Copy link
Contributor Author

Hi @ryandeivert , about your concern

does theis _sanitize_logs_name function actually replace the value within a person's config to be written back out to disk??

if so, I don't think we should do this... this could mess up rules, etc that use specific logs. I think and "transformation" for log name --> table name + firehose name should be done by us under the hood

I have checked the code and tested the new changes in the staging environment with a weird log name cloudwatch:events.dots-test, the person's config on the disk is not changed, classifier, rules engine, alert processor and athena partition lambda functions are working as expected. Can you PTAL my latest two commits which address the comments. Thank you thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants