Normalization reboot - Refactor normalization code #1238
Architecture | ||
============ | ||
|
||
.. figure:: ../images/normalization-arch.png |
this image could use some love (like attention to detail with capitalized words vs non-capitalized words, spacing between labels + images, etc)... can you share this with me and I can tweak it?
I'd prefer if the Athena screenshots didn't show your "cylin2020..."."artifacts" prefix and only showed `select * from artifacts`. The cylin2020.. bit is actually not required (unless you are searching a database that is NOT selected in the dropdown in the Athena console, which is rarely the case).
docs/source/normalization.rst
Outdated
Coming soon. | ||
In Normalization v1, the normalized types are based on log source (e.g. osquery, cloudwatch, etc) and defined in ``conf/normalized_types.json`` file. | ||
|
||
In Normalization v2, the normalized types will be based on log type (e.g. osquery:differential, cloudwatch:cloudtrail, cloudwatch:events, etc) and defined in ``conf/schemas/*.json``. Although, it is recommended to configure normalization in ``conf/schemas/*.json``, the v1 configuration will be still valid and merged to v2. |
In Normalization v2, the normalized types will be based on log type (e.g. osquery:differential, cloudwatch:cloudtrail, cloudwatch:events, etc) and defined in ``conf/schemas/*.json``. Although, it is recommended to configure normalization in ``conf/schemas/*.json``, the v1 configuration will be still valid and merged to v2. | |
In Normalization v2, the normalized types will be based on log type (e.g. ``osquery:differential``, ``cloudwatch:cloudtrail``, ``cloudwatch:events``, etc) and defined in ``conf/schemas/*.json``. However, we recommend configuring normalization in ``conf/schemas/*.json``. The v1 configuration will be still valid and merged to v2. |
docs/source/normalization.rst
Outdated
Configuration | ||
============= | ||
|
||
Coming soon. | ||
In Normalization v1, the normalized types are based on log source (e.g. osquery, cloudwatch, etc) and defined in ``conf/normalized_types.json`` file. |
In Normalization v1, the normalized types are based on log source (e.g. osquery, cloudwatch, etc) and defined in ``conf/normalized_types.json`` file. | |
In Normalization v1, the normalized types are based on log source (e.g. ``osquery``, ``cloudwatch``, etc) and defined in ``conf/normalized_types.json`` file. |
docs/source/normalization.rst
Outdated
|
||
In Normalization v2, the normalized types will be based on log type (e.g. osquery:differential, cloudwatch:cloudtrail, cloudwatch:events, etc) and defined in ``conf/schemas/*.json``. Although, it is recommended to configure normalization in ``conf/schemas/*.json``, the v1 configuration will be still valid and merged to v2. | ||
|
||
Giving some examples to configure normalization v2. All normalized types are arbitrary, but we recommend to use all lower cases and underscores to name the normalized types to have better compatibility with Athena. |
Giving some examples to configure normalization v2. All normalized types are arbitrary, but we recommend to use all lower cases and underscores to name the normalized types to have better compatibility with Athena. | |
Below are some example configurations for normalization v2. All normalized types are arbitrary, but only lower case alphabetic characters and underscores should be used for names in order to be compatible with Athena. |
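The naming rule in the suggestion above could be checked mechanically. This is a minimal sketch, assuming the rule is exactly "lower case alphabetic characters and underscores"; the helper name `is_athena_safe` is hypothetical and not part of StreamAlert:

```python
import re

# Assumption: a valid normalized type name consists only of a-z and underscores.
ATHENA_SAFE_NAME = re.compile(r'[a-z_]+')


def is_athena_safe(name):
    """Return True if the whole name uses only lower case letters and underscores."""
    return ATHENA_SAFE_NAME.fullmatch(name) is not None


for candidate in ('ip_address', 'user_identity', 'IP-Address'):
    print(candidate, is_athena_safe(candidate))
```

Using `fullmatch` rather than `match` avoids accepting names that merely start with valid characters.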
docs/source/normalization.rst
Outdated
|
||
Giving some examples to configure normalization v2. All normalized types are arbitrary, but we recommend to use all lower cases and underscores to name the normalized types to have better compatibility with Athena. | ||
|
||
* Normalized all ip addresses (``ip_address``) and user identities (``user_identity``) for ``cloudwatch:events`` events |
* Normalized all ip addresses (``ip_address``) and user identities (``user_identity``) for ``cloudwatch:events`` events | |
* Normalize all ip addresses (``ip_address``) and user identities (``user_identity``) for ``cloudwatch:events`` logs |
docs/source/normalization.rst
Outdated
} | ||
} | ||
|
||
* Normalized all commands (``command``) and user identities (``user_identity``) for ``osquery:differential`` events |
* Normalized all commands (``command``) and user identities (``user_identity``) for ``osquery:differential`` events | |
* Normalize all commands (``command``) and user identities (``user_identity``) for ``osquery:differential`` logs |
docs/source/normalization.rst
Outdated
* A new Lambda function | ||
* A new Glue catalog table ``artifacts`` for Historical Search via Athena | ||
* A new Firehose to deliver artifacts to S3 bucket | ||
* Update existing Firehoses to allow to invoke Artifact Extractor lambda if it is enabled on the Firehoses |
* Update existing Firehoses to allow to invoke Artifact Extractor lambda if it is enabled on the Firehoses | |
* Update existing Firehoses to allow to invoke Artifact Extractor Lambda if it is enabled on the Firehose resources |
docs/source/normalization.rst
Outdated
|
||
python manage.py deploy --function artifact_extractor | ||
|
||
* If normalization configuration changed in ``conf/schemas/*.json``, make sure deploy classifier as well |
* If normalization configuration changed in ``conf/schemas/*.json``, make sure deploy classifier as well | |
* If the normalization configuration has changed in ``conf/schemas/*.json``, make sure to deploy the classifier Lambda function as well |
docs/source/normalization.rst
Outdated
Artifacts | ||
========= | ||
|
||
Artifacts will be searching via Athena ``artifacts`` table. During the test in staging environment, two fake ``cloudwatch:events`` were sent to a Kinesis data stream. |
Artifacts will be searching via Athena ``artifacts`` table. During the test in staging environment, two fake ``cloudwatch:events`` were sent to a Kinesis data stream. | |
Artifacts will be searchable within the Athena ``artifacts`` table |
docs/source/normalization.rst
Outdated
|
||
Artifacts will be searching via Athena ``artifacts`` table. During the test in staging environment, two fake ``cloudwatch:events`` were sent to a Kinesis data stream. | ||
|
||
Those two fake events were searchable in ``cloudwatch_events`` table. |
what are these references to "fake" events and "staging" environment in the docs??? please remove or update as necessary
""" | ||
# Enforce all fields are strings in a Artifact to prevent type corruption in Parquet format | ||
self._function = str(kwargs.get('function', 'not_specified')) | ||
self._record_id = str(kwargs.get('record_id', self.RESERVED)) | ||
self._function = str(kwargs.get('function')) |
if these values ('function' and 'record_id') are no longer optional, please remove the usage of kwargs and add these as required arguments to __init__
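A minimal sketch of what that change might look like, assuming the class is named `Artifact` and keeps the string coercion shown in the diff. Only `function` and `record_id` come from the diff; the `normalized_type` and `value` parameters and the `record` property are hypothetical, added to make the example complete:

```python
class Artifact:
    """Hypothetical sketch: required arguments instead of kwargs lookups."""

    def __init__(self, function, record_id, normalized_type, value):
        # Coerce every field to str to prevent type corruption in Parquet format
        self._function = str(function)
        self._record_id = str(record_id)
        self._type = str(normalized_type)  # hypothetical field
        self._value = str(value)           # hypothetical field

    @property
    def record(self):
        """Return the artifact as a flat dict of strings."""
        return {
            'function': self._function,
            'record_id': self._record_id,
            'type': self._type,
            'value': self._value,
        }


artifact = Artifact('dns_lookup', 1234, 'ip_address', '2.2.2.2')
print(artifact.record)
```

With required positional arguments, omitting `function` or `record_id` raises a `TypeError` at construction time instead of silently storing the string `'None'`.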
# 'awsRegion': 'us-west-2' | ||
# } | ||
# }, | ||
# 'normalization': { |
I think this key would be `streamalert:normalization`?
# 'normalization': { | ||
# 'region': { | ||
# 'values': ['us-east-1', 'us-west-2'] | ||
# 'function': 'AWS region' |
The structure of this object will not work. If you have 2 `region` type values with different `function` values, it will conflict.
"streamalert:normalization": {
  "ip_address": [
    {
      "values": ["4.3.2.1"],
      "function": "outbound_connection_destination"
    },
    {
      "values": ["2.2.2.2", "4.4.4.4"],
      "function": "dns_lookup"
    }
  ]
}
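To illustrate the conflict, here is a small sketch comparing the two shapes. It assumes nothing about StreamAlert internals; the variable names `flat` and `grouped` are made up for the example, and the values are the ones from the comment above:

```python
# Shape 1: one object per normalized type. A second entry for the same type
# overwrites the first, so one `function` label is lost.
flat = {}
flat['ip_address'] = {
    'values': ['4.3.2.1'],
    'function': 'outbound_connection_destination',
}
flat['ip_address'] = {
    'values': ['2.2.2.2', '4.4.4.4'],
    'function': 'dns_lookup',
}

# Shape 2: a list per normalized type. Each function keeps its own entry.
grouped = {}
grouped.setdefault('ip_address', []).append({
    'values': ['4.3.2.1'],
    'function': 'outbound_connection_destination',
})
grouped.setdefault('ip_address', []).append({
    'values': ['2.2.2.2', '4.4.4.4'],
    'function': 'dns_lookup',
})

record = {'streamalert:normalization': grouped}
functions = [entry['function'] for entry in grouped['ip_address']]
print(len(flat), functions)
```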
streamalert/shared/normalize.py
Outdated
having following format, otherwise it will raise ConfigError. | ||
[ | ||
{ | ||
'fields': ['source', 'sourceIPAddress'], |
This is not how I intended it to work. In this documentation you're implying that each normalizer can map to multiple fields. The way I designed it is each normalizer maps to exactly one field. The field array is a JSON path.
[
{
'fields': ['path', 'to', 'the', 'field'],
'function': 'same_function'
},
{
'fields': ['other', 'path', 'same', 'function'],
'function': 'same_function'
},
]
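The one-path-per-normalizer design described above can be sketched as follows. This is an illustration only, not the code in `streamalert/shared/normalize.py`; the helper `get_by_path`, the sample record, and the function labels are all hypothetical:

```python
def get_by_path(record, path):
    """Follow a list of keys into nested dicts; return None if any key is missing."""
    current = record
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Each normalizer maps to exactly one field; `fields` is the path to that field.
normalizers = [
    {'fields': ['detail', 'sourceIPAddress'], 'function': 'source_ip'},
    {'fields': ['detail', 'userIdentity', 'userName'], 'function': 'user_name'},
    {'fields': ['detail', 'missing'], 'function': 'not_present'},
]

record = {
    'detail': {
        'sourceIPAddress': '1.1.1.2',
        'userIdentity': {'userName': 'alice'},
    }
}

results = [(n['function'], get_by_path(record, n['fields'])) for n in normalizers]
print(results)
```

Two normalizers may share the same `function` label while pointing at different paths, which is why the results are kept as a list of pairs rather than a dict keyed by function.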
Yeah, misunderstood the configuration part in the design doc. The new change will be up soon.
…o find original key
@ryandeivert @Ryxias PTAL, I have addressed your comments. I also updated the PR description and docs.
streamalert/shared/normalize.py
Outdated
continue | ||
|
||
yield value | ||
# for key, value in record.items(): |
Unnecessary comment block?
ff0d98e to d19d23e
to: @airbnb/streamalert-maintainers
related to: #1237 #1230
resolves:
Background
This is the 3rd PR to implement the Normalization v2 feature. It focuses on refactoring the normalization code to use both Normalization v1 and v2 configuration and on adding an additional piece of information, `function`, to normalized values.

New Support
The biggest change in this PR is the normalization configuration change in `conf/schemas/*.json`. StreamAlert schemas provide a new configuration option:

Architecture
Deprecate Normalization v1
We deprecate Normalization v1, and `conf/normalized_types.json` will have no effect. Make sure to migrate the normalization configuration to `conf/schemas/*.json` along with the log schemas.

Changes
`conf/schemas/carbonblack.json`, `conf/schemas/cloudwatch` and `conf/schemas/osquery.json`. Also update rule `right_to_left_character` to use normalization v2.

Testing
Deployed the changes to a staging environment and created a Kinesis stream for testing. Sent two fake CloudWatch events to the Kinesis stream.
`cloudwatch:events` table.
`artifacts` table.

Next Step
One more PR is still needed to complete this feature: adding advanced features to apply filters in normalization.