-
Notifications
You must be signed in to change notification settings - Fork 187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CarbonBlack downloader #52
Conversation
self.failed_queue = failed_queue | ||
|
||
# Each process needs its own logger to avoid race conditions. | ||
self.logger = logging.getLogger(self.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an aside, the logging package actually handles race conditions pretty well through the use of locks, etc (as seen here. That said, this is dependent on the threading
package being available, and I don't think you'll have that in lambda(?). So, in normal use cases, having a separate logger for each is probably unnecessary, but is probably the safer/better solution in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually a one-off script that is not
designed to run in Lambda. It's probably confusing that it's part of the lambda_functions
hierarchy, but I'm not sure where would be a better place to put it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right; Python logging is thread-safe. I'll remove this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!! This is super exciting to open source. A few comments/questions throughout. Also excellent job on all the unit tests & mocks. You're a pro!
if failed_md5s: | ||
logger.error( | ||
'%d %s failed to copy: \n%s', len(failed_md5s), | ||
'binary' if len(failed_md5s) == 1 else 'binaries', '\n'.join(sorted(failed_md5s))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
lambda_functions/downloader/main.py
Outdated
LOGGER = logging.getLogger() | ||
LOGGER.setLevel(logging.INFO) | ||
|
||
ENCRYPTED_TOKEN = os.environ['ENCRYPTED_CARBON_BLACK_API_TOKEN'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these are required env variables, would you consider putting this into a try/except KeyError
block that could log an error informing the user they must export said env variables (and then raising the exception as well). Currently the KeyError will be raised but without much context as to how to fix. For instance:
try:
CARBON_BLACK_URL = os.environ['CARBON_BLACK_URL']
ENCRYPTED_TOKEN = os.environ['ENCRYPTED_CARBON_BLACK_API_TOKEN']
TARGET_S3_BUCKET = os.environ['TARGET_S3_BUCKET']
except KeyError as err:
LOGGER.error('Please export the environment variable \'%s\' using...blah', err.message)
raise
I notice the 'TARGET_S3_BUCKET' env var is accessed within the _upload_to_s3
function and could presumably result in bigger problems if it doesn't exist. If there is not a chance that these would ever be missing, then you can probably safely ignore this :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea. I'm going to leave it for now because I want to standardize this across all of the Lambda functions in a subsequent PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, I will definitely add this check in copy_all.py
, since that's the only code which is designed to be invoked by a local user
lambda_functions/downloader/main.py
Outdated
"""Upload a binary to S3, keyed by a UUID. | ||
|
||
Args: | ||
local_file_path: [string] Path to the file to upload. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consideration - switching to parenths style arguments (similar to what we started doing in SA)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you must have been looking at an older commit; all types in new code should be replaced by explicit annotations (which obviates the need for types in the docstrings)
lambda_functions/downloader/main.py
Outdated
|
||
with open(local_file_path, 'rb') as target_file: | ||
S3_CLIENT.put_object( | ||
Bucket=os.environ['TARGET_S3_BUCKET'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See other comment about env variables (and my concern with accessing this within a function/loop in the instance that it doesn't exist).
tests/lambda_functions/build_test.py
Outdated
def test_build_all(self, mock_print): | ||
"""Verify that the top-level build function executes without error.""" | ||
build.build(self._tempdir) | ||
self.assertEqual(3, mock_print.call_count) | ||
self.assertEqual(4, mock_print.call_count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ for call count. easy tests are the best tests
manage.py
Outdated
except ManagerError as error: | ||
# Print error type and message, not full stack trace. | ||
sys.exit('{}: {}'.format(type(error).__name__, error)) | ||
self._parse_config(allow_empty=(command == 'test')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify (to me) what the allow_empty
param is used for here? It looks like you mention it's only used in unit tests, but then it's used here (not in a unit test).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a confusing flag to help with validation; it has since been removed and it is hopefully clearer now
lambda_functions/downloader/main.py
Outdated
|
||
# Exponential backoff: try up to 4 times, waiting longer each time. | ||
RETRY_SLEEP_SECS = [0, 30, 60, 120] | ||
|
||
|
||
def _download_from_carbon_black(md5): | ||
@backoff.on_exception(backoff.expo, ObjectNotFoundError, max_tries=5) | ||
def _download_from_carbon_black(binary: Binary) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ for typing
lambda_functions/downloader/main.py
Outdated
|
||
def _upload_to_s3(local_file_path, md5, observed_path): | ||
"""Upload a binary to S3, keyed by a UUID. | ||
download_path = '/tmp/carbonblack_{}'.format(binary.md5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using a hardcoded tmp dir here, I'd suggest using the tempdir
package that offers interoperability across OSes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, that's a good point. Lambda explicitly allocates /tmp
(which may not be the value returned by gettempdir
). I'll try it and see!
@@ -30,7 +30,7 @@ class MockMain(object): | |||
"""Mock out the downloader Lambda main.py.""" | |||
CARBON_BLACK = MockCarbonBlack() | |||
|
|||
def __init__(self, inject_errors: bool=False): | |||
def __init__(self, inject_errors: bool = False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the space around the equals something Typing requires? Typically this causes a pylint error so just wondering
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pylint actually complains if there is not the extra space:
************* Module tests.lambda_functions.downloader.copy_all_test
C: 39, 0: Exactly one space required around keyword argument assignment
def __init__(self, inject_errors: bool=False):
^ (bad-whitespace)
However, this is not keyword argument assignment
, so I'm guessing pylint doesn't exactly know how to handle type annotations?
I prefer the no-space version, but I don't want to have to disable the whitespace rule, so I guess I'll live with it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another note:
def __init__(self, inject_errors=False: bool):
is invalid syntax
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying!
@mock.patch.object(manage.Manager, 'build') | ||
@mock.patch.object(manage.Manager, 'apply') | ||
@mock.patch.object(manage.Manager, 'analyze_all') | ||
def test_deploy(self, mock_analyze: mock.MagicMock, mock_apply: mock.MagicMock, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stacks on stacks on stacks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boots and cats and stacks on stacks
b4b8ae4
to
5551964
Compare
5551964
to
ef0dfc6
Compare
@ryandeivert PTAL I've squashed the commits to simplify things and addressed all of the feedback. Be sure to look at the downloader code again because your last review was an an older commit for some reason |
@austinbyers your'e right - I had been stepping through commits since this PR was so large :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@ryandeivert Thanks for your review! I should have done a better job of keeping track of and squashing the commits. I did another deploy just to test everything one last time and once the Travis tests pass I'll go ahead and merge |
Overview
Size: Extra Large
CarbonBlack automatically uploads new binaries that it finds on endpoints; users who have CarbonBlack can now optionally enable a CarbonBlack downloader Lambda function to copy binaries from CarbonBlack into BinaryAlert.
The downloader can be enabled by running the new
python3 manage.py configure
command to allow the user to set the CarbonBlack URL and encrypt its API key.Additionally,
python3 manage.py cb_copy_all
allows users to copy the entire CarbonBlack binary corpus into BinaryAlert in one go.Full documentation will be added in the next PR.
Change Summary
lambda_functions/downloader
enable_carbon_black_downloader
Terraform variable. All downloader resources are created only if the downloader is explicitly enabled.manage.py
:configure
andcb_copy_all
commandstest
command tounit_test
(to distinguish it fromlive_test
)Resolves: #29 (add downloader)
Resolves: #48 (additional name prefix validation)
Contributes to: #34 (type annotations)
Tested
CI
Added unit tests for downloader code as well as most of
manage.py
. As you can see from the commit history, mocking everything correctly was a huge pain. In the future, I think we should removemoto
entirely (we already have to do our own Dynamo and S3 mocks due to 2 separate moto issues)Test Deploy: Downloader Disabled (Default)
Test Deploy: Enable Downloader
After the previous deploy, we can easily re-configure and re-deploy:
Reviewers
to: @ryandeivert
cc: @mime-frame @airbnb/binaryalert-maintainers