
Commit

Update documentation
Austin Byers committed Sep 5, 2018
1 parent a1fd645 commit 89f99e9
Showing 14 changed files with 185 additions and 168 deletions.
2 changes: 1 addition & 1 deletion cli/__init__.py
@@ -1,2 +1,2 @@
"""BinaryAlert release version"""
__version__ = '1.1.0'
__version__ = '1.2.0'
Binary file modified docs/images/architecture.png
17 changes: 6 additions & 11 deletions docs/source/adding-yara-rules.rst
@@ -79,15 +79,15 @@ In summary, BinaryAlert will copy a file from a remote repository if and only if

Write Your Own Rules
--------------------
You can add your own ``.yar`` or ``.yara`` files anywhere in the ``rules/`` directory tree. Refer to the `writing YARA rules <http://yara.readthedocs.io/en/latest/writingrules.html>`_ documentation for guidance and examples. Note that when BinaryAlert finds a file which matches a YARA rule, the rule name, `metadata <http://yara.readthedocs.io/en/latest/writingrules.html#metadata>`_, `tags <http://yara.readthedocs.io/en/latest/writingrules.html#rule-tags>`_, and matched `string <http://yara.readthedocs.io/en/latest/writingrules.html#strings>`_ names will be included in the alert for your convenience.
You can add your own ``.yar`` or ``.yara`` files anywhere in the ``rules/`` directory tree. Refer to the `writing YARA rules <http://yara.readthedocs.io/en/latest/writingrules.html>`_ documentation for guidance and examples. Note that when BinaryAlert finds a file which matches a YARA rule, the rule name, `metadata <http://yara.readthedocs.io/en/latest/writingrules.html#metadata>`_, `tags <http://yara.readthedocs.io/en/latest/writingrules.html#rule-tags>`_, and matched `string <http://yara.readthedocs.io/en/latest/writingrules.html#strings>`_ names and string data will be included in the alert for your convenience.

.. note:: Because the folders for each remote source will be overwritten during rule cloning, we recommend keeping your own YARA rules in ``rules/private`` or similar.

.. _external-variables:

External Variables
------------------
In order to support the rule repositories listed above, BinaryAlert provides the following `external variables <http://yara.readthedocs.io/en/latest/writingrules.html#external-variables>`_:
In order to support the rule repositories listed above, BinaryAlert provides the following `external variables <http://yara.readthedocs.io/en/latest/writingrules.html#external-variables>`_ to YARA:

* ``extension`` - File extension (".docx", ".exe", ".pdf", etc)
* ``filename`` - File basename ("file.exe")
@@ -120,7 +120,7 @@ Disabling Rules
---------------
There may be times you want to disable certain YARA rules, but not delete them (e.g. rules with high false-positive rates). Since only ``.yar`` and ``.yara`` files in the ``rules/`` directory tree are bundled in a BinaryAlert deploy, you can simply rename ``rules.yar`` to any other extension, e.g. ``rules.yar.DISABLED``, to skip it during rules compilation.

If you want to disable an individual rule (not the entire file), you can either comment it out or prefix the rule with the ``private`` modifier to elide it from reported YARA match results. Unfortunately, there is no easy way to automatically *remove* individual rules from a file.
If you want to disable an individual rule (not the entire file), you can either comment it out or prefix the rule with the ``private`` modifier to elide it from reported YARA match results.


.. _testing_yara_rules:
@@ -141,13 +141,8 @@ To test *all* of your YARA rules, you first need to compile them into a single b
$ ./manage.py compile_rules # Saves "compiled_yara_rules.bin"
This compiled rules file is what gets bundled with the BinaryAlert analyzers. Now, from a Python interpreter:
This compiled rules file is what gets bundled with the BinaryAlert analyzers, and you can use it with YARA just like any other rules file:

.. code-block:: python
import yara
rules = yara.load('compiled_yara_rules.bin')
matches = rules.match('file_to_text.exe')
print(matches)
.. code-block:: bash
See the `yara-python <http://yara.readthedocs.io/en/latest/yarapython.html>`_ docs for more information about using YARA from Python.
$ yara compiled_yara_rules.bin file_to_test
230 changes: 127 additions & 103 deletions docs/source/analyzing-files.rst
@@ -1,27 +1,144 @@
Analyzing Files
===============
Files uploaded to the BinaryAlert S3 bucket will be automatically queued for analysis. You can also
use the analyzer to scan files from other buckets directly or in response to event notifications.
invoke the analyzer directly, scan files in other buckets, or download files from CarbonBlack.

Uploading Files
---------------

To upload files for analysis, you need only upload them to the BinaryAlert S3 bucket. The S3 bucket name is of the form
All files uploaded to the BinaryAlert S3 bucket will be immediately queued for analysis. The S3 bucket name is of the form

.. code-block:: none
YOUR.NAME.PREFIX.binaryalert-binaries.REGION
When uploading to S3, any object metadata you set will be included in all match alerts. In addition, if there is a ``filepath`` metadata key, BinaryAlert will make the filepath :ref:`external variables <external-variables>` available to the YARA rules.
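As a rough illustration of the metadata behavior described above, the following sketch uploads a file with a ``filepath`` metadata key using boto3. The helper names (``build_upload_args``, ``upload_for_analysis``) and the choice of object key are assumptions made for this example and are not part of BinaryAlert.

```python
import os


def build_upload_args(original_path):
    """Build boto3 ExtraArgs that attach the original file path as S3 metadata."""
    # 'filepath' is the metadata key the documentation says BinaryAlert looks for.
    return {'Metadata': {'filepath': original_path}}


def upload_for_analysis(local_path, bucket_name, original_path=None):
    """Upload a local file to the given bucket (hypothetical helper)."""
    # boto3 is imported lazily so build_upload_args() works without the AWS SDK.
    import boto3

    key = os.path.basename(local_path)  # assumption: use the basename as the key
    boto3.client('s3').upload_file(
        local_path, bucket_name, key,
        ExtraArgs=build_upload_args(original_path or local_path))
    return key
```

Calling ``upload_for_analysis('sample.exe', 'YOUR.NAME.PREFIX.binaryalert-binaries.REGION', '/home/user/sample.exe')`` would then store the original path alongside the object.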

Uploaded files are persisted indefinitely so that BinaryAlert can retroactively analyze all files with every rule update. The S3 bucket has both `access logging <http://docs.aws.amazon.com/AmazonS3/latest/dev/ServerLogs.html>`_ and `object versioning <http://docs.aws.amazon.com/AmazonS3/latest/dev/ObjectVersioning.html>`_ enabled.
Uploaded files are persisted indefinitely so that BinaryAlert can retroactively analyze all files.
The S3 bucket has `access logging <http://docs.aws.amazon.com/AmazonS3/latest/dev/ServerLogs.html>`_, `object versioning <http://docs.aws.amazon.com/AmazonS3/latest/dev/ObjectVersioning.html>`_, `inventory <https://docs.aws.amazon.com/AmazonS3/latest/dev/storage-inventory.html>`_, and `server-side encryption <https://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html>`_ enabled.


Analyzing Existing Buckets
--------------------------
To scan files in other S3 buckets, you first need to grant BinaryAlert permission to access them. Modify the S3 section of your `terraform.tfvars <https://github.com/airbnb/binaryalert/blob/master/terraform/terraform.tfvars>`_ file and `deploy <deploying.html>`_ the changes:

.. code-block:: terraform
# ##### S3 #####
# If using BinaryAlert to scan existing S3 buckets, add the S3 and KMS resource ARNs here
# (KMS if the objects are server-side encrypted)
external_s3_bucket_resources = ["arn:aws:s3:::bucket-name/*"]
external_kms_key_resources = ["arn:aws:kms:REGION:ACCOUNT:key/KEY-UUID"]
Direct Invocation
.................
You can directly invoke the BinaryAlert analyzer to scan any S3 object it has access to. The match
results will always be saved to Dynamo, but you can configure whether each request should also trigger
the normal SNS alerts:

.. code-block:: python

    import boto3, json

    response = boto3.client('lambda').invoke(
        FunctionName='your_prefix_binaryalert_analyzer',
        InvocationType='RequestResponse',
        Payload=json.dumps({
            'BucketName': 'your-bucket-name',  # S3 bucket name
            'EnableSNSAlerts': False,          # Toggle SNS alerts
            'ObjectKeys': ['key1', 'key2']     # List of S3 object keys
        }),
        Qualifier='Production'
    )

    results = json.loads(response['Payload'].read().decode('utf-8'))
    print(json.dumps(results, sort_keys=True, indent=4))

    # Example output:
    {
        'S3:BUCKET-NAME:KEY1': {
            'FileInfo': {
                'MD5': '...',
                'S3LastModified': '...',
                'S3Metadata': {},
                'SHA256': '...'
            },
            'MatchedRules': {
                'Rule1': {
                    'MatchedData': ['abc'],
                    'MatchedStrings': ['$a'],
                    'Meta': {
                        'description': 'Test YARA rule'
                    },
                    'RuleFile': 'rules.yara',
                    'RuleName': 'test_dummy_true'
                }
            },
            'NumMatchedRules': 1
        },
        'S3:BUCKET-NAME:KEY2': {
            'FileInfo': {
                'MD5': '...',
                'S3LastModified': '...',
                'S3Metadata': {},
                'SHA256': '...'
            },
            'MatchedRules': {},
            'NumMatchedRules': 0
        }
    }

Configuring Event Notifications
...............................
You can configure other buckets to send S3 event notifications to the BinaryAlert SQS queue.
To do so, create an `event notification <http://docs.aws.amazon.com/AmazonS3/latest/user-guide/enable-event-notifications.html>`_ on your existing bucket
and then modify the `BinaryAlert SQS permissions <https://github.com/airbnb/binaryalert/blob/ea5c31ee55a483e5216296e3e0598e3318b7eb24/terraform/sqs.tf#L28-L33>`_ accordingly.
Once configured, BinaryAlert will automatically analyze new objects in your existing buckets in addition to its own.
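For readers who prefer to script the bucket side of this setup, here is a minimal sketch using boto3. It assumes you already know the ARN of the BinaryAlert SQS queue and that the queue policy has been updated to accept messages from the bucket. The function names are hypothetical. Note that ``put_bucket_notification_configuration`` replaces the bucket's existing notification configuration, so merge in any notifications you already rely on.

```python
def build_notification_config(queue_arn):
    """Notification configuration that sends object-created events to an SQS queue."""
    return {
        'QueueConfigurations': [
            {
                'QueueArn': queue_arn,
                'Events': ['s3:ObjectCreated:*'],
            }
        ]
    }


def enable_notifications(bucket_name, queue_arn):
    """Apply the configuration to an existing bucket (hypothetical helper)."""
    # Imported lazily so the config builder can be used without the AWS SDK.
    import boto3

    boto3.client('s3').put_bucket_notification_configuration(
        Bucket=bucket_name,
        NotificationConfiguration=build_notification_config(queue_arn))
```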


.. _retro_scan:

Retroactive Analysis
--------------------
When adding new YARA rules to your collection, you can easily re-scan all of your files in the BinaryAlert bucket to see if any of them match the new rules:

.. code-block:: bash
$ ./manage.py retro_fast
This will enumerate the most recent `S3 inventory manifest <https://docs.aws.amazon.com/AmazonS3/latest/dev/storage-inventory.html>`_, adding all object keys to the analysis SQS queue.
However, if your bucket is less than 48 hours old, it may not yet have an inventory manifest. In that case, you can list the objects yourself:

.. code-block:: bash
$ ./manage.py retro_slow
As its name suggests, enumerating the bucket directly will generally be much slower than reading the inventory, particularly for buckets with thousands of objects or more.

.. note:: Because the inventory may be up to 24 hours old, a ``retro_fast`` scan may miss the newest objects in the bucket. If you need to scan *all* files immediately, use ``retro_slow``.

In either case, once all of the objects are in the analyzer SQS queue, it will take some time for BinaryAlert to finish scanning all of them (depending on how many objects you have).
`YARA matches <yara-matches.html>`_ found during a retroactive scan are treated like any other: the matches are saved to Dynamo and reported via SNS.

Stopping a Retro Scan
.....................
Sometimes, a new YARA rule you thought would be great turns out to be super noisy, flooding you with false positive alerts.
Unfortunately, if you have millions of objects in your BinaryAlert bucket, a retro scan can take hours to finish.
To stop a retro scan dead in its tracks, you can drop all messages from the analysis queue:

.. code-block:: bash
$ ./manage.py purge_queue
.. warning:: This will also drop any event notifications from newly added objects that arrived after the retro scan started. These objects won't be scanned again until either (a) the next ``retro_slow`` scan or (b) the next ``retro_fast`` after 24 hours when the new object is in the inventory.


.. _cb_downloader:

CarbonBlack Downloader
----------------------
If you use CarbonBlack Enterprise Response, you can enable BinaryAlert's optional downloader Lambda function. The downloader copies files (and some metadata) from CarbonBlack into BinaryAlert's S3 bucket. To enable it:
If you use CarbonBlack Enterprise Response, you can enable BinaryAlert's optional downloader SQS queue and Lambda function.
The downloader copies files (and some metadata) from CarbonBlack into BinaryAlert's S3 bucket. To enable it:

.. code-block:: none
@@ -32,6 +149,8 @@ If you use CarbonBlack Enterprise Response, you can enable BinaryAlert's optiona
CarbonBlack URL: https://your.carbonblack.url
CarbonBlack API token (only needs binary read access):
$ ./manage.py deploy
.. warning:: The API token only needs access to read binaries. Do not use a token with admin privileges, do not allow other users to share the same token, and be sure to regularly rotate the token.

.. note:: The API token will not be shown on screen and BinaryAlert will create a new KMS key to encrypt the credentials before saving them to the ``terraform.tfvars`` configuration file. The downloader (and no other component) is authorized to decrypt the credentials with the generated key.
@@ -51,8 +170,6 @@ Binaries downloaded from CarbonBlack are saved to the BinaryAlert S3 bucket with
'filepath' # from the "observed_filenames" CarbonBlack metadata
]
Once the downloader is enabled, you can either copy everything from CarbonBlack in one go, or you can `deploy <deploying.html>`_ the downloader components and setup real-time invocations for every new binary.

Copy All Files
..............
@@ -62,113 +179,20 @@ If you want to run a one-time job to copy every file from CarbonBlack into Binar
$ ./manage.py cb_copy_all
This runs *locally*, using multiple threads to enumerate the files in CarbonBlack and copy them over to BinaryAlert. The downloader *code* is used, but there are no Lambda invocations. This means you can copy all of the files from CarbonBlack without actually deploying the downloader components.
This runs locally, using multiple threads to enumerate the files in CarbonBlack and add them to the BinaryAlert downloader SQS queue.


Real-Time Invocations
.....................
To ensure real-time file analysis, we recommend invoking the downloader every time CarbonBlack logs a ``binarystore.file.added`` event. If you use `StreamAlert <https://streamalert.io/>`_ to process CarbonBlack logs, the following `rule <https://streamalert.io/rules.html>`_ will invoke the BinaryAlert downloader for every new binary (assuming BinaryAlert is a properly configured Lambda `output <https://streamalert.io/outputs.html>`_):
For real-time file analysis, we recommend publishing to the downloader SQS queue every time CarbonBlack logs a ``binarystore.file.added`` event. If you use `StreamAlert <https://streamalert.io/>`_ to process CarbonBlack logs, the following `rule <https://streamalert.io/rules.html>`_ will publish a message for every new binary (assuming the SQS queue is a properly configured StreamAlert `output <https://streamalert.io/outputs.html>`_):

.. code-block:: python
@rule(logs=['carbonblack:binarystore.file.added'],
matchers=[],
outputs=['aws-lambda:binaryalert'])
@rule(logs=['carbonblack:binarystore.file.added'], outputs=['aws-sqs:binaryalert'])
def cb_binarystore_file_added(rec):
"""
description: CarbonBlack found a new binary: forward to BinaryAlert for YARA analysis.
"""
return True
If you don't use StreamAlert, you can invoke the downloader yourself:

.. code-block:: python
import boto3, json
boto3.client('lambda').invoke(
FunctionName='your_prefix_binaryalert_downloader',
InvocationType='Event', # Asynchronous invocation
Qualifier='Production', # Invoke production alias
Payload=json.dumps({'md5': 'FILE_MD5'}).encode('utf-8')
)
Analyzing Existing Buckets
--------------------------
As of v1.1, the BinaryAlert YARA analyzer is no longer restricted to just its own S3 bucket - it can
read other existing buckets as well. To grant access to other buckets, modify the analyzer's
IAM policy in `lambda_iam.tf <https://github.com/airbnb/binaryalert/blob/master/terraform/lambda_iam.tf>`_.

Direct Invocation
.................
You can directly invoke the BinaryAlert analyzer to scan any S3 object it has access to:

.. code-block:: python
import boto3, json
response = boto3.client('lambda').invoke(
FunctionName='your_prefix_binaryalert_analyzer',
InvocationType='RequestResponse',
Qualifier='Production',
Payload=json.dumps({
'Records': [
{
's3': {
'bucket': {'name': 'BUCKET-NAME'},
'object': {'key': 'KEY1'}
}
},
{
's3': {
'bucket': {'name': 'BUCKET-NAME'},
'object': {'key': 'KEY2'}
}
}
]
})
)
decoded = json.loads(response['Payload'].read().decode('utf-8'))
print(decoded)
{
'S3:BUCKET-NAME:KEY1': {
'FileInfo': {
'MD5': '...',
'S3LastModified': '...',
'S3Metadata': {},
'SHA256': '...'
},
'MatchedRules': {
'Rule1':
'MatchedStrings': ['$a'],
'Meta': {
'description': 'Test YARA rule'
},
'RuleFile': 'rules.yara',
'RuleName': 'test_dummy_true'
},
'NumMatchedRules': 1
}
'S3:BUCKET-NAME:KEY2': {
'FileInfo': {
'MD5': '...',
'S3LastModified': '...',
'S3Metadata': {},
'SHA256': '...'
},
'MatchedRules': {},
'NumMatchedRules': 0
}
}
.. note:: The analyzer will always save YARA matches to Dynamo and send alerts to the SNS topic, even when invoked directly or when analyzing other buckets.

Configuring Event Notifications
...............................
You can configure other buckets to send S3 event notifications to the BinaryAlert SQS queue
(recommended) or to the analyzer directly. In either case, once configured, BinaryAlert will be
automatically analyzing your existing buckets in addition to its own.
See `AWS: Enable Event Notifications <http://docs.aws.amazon.com/AmazonS3/latest/user-guide/enable-event-notifications.html>`_
and a `terraform example <https://www.terraform.io/docs/providers/aws/r/s3_bucket_notification.html#add-notification-configuration-to-sqs-queue>`_ to get started.
You can also publish messages directly to the downloader SQS queue. Messages are expected to be in the very simple format ``{'md5': 'ABCDE....'}``.
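A minimal sketch of publishing such a message with boto3 might look like the following. The queue URL is something you would look up in your own deployment, and the helper names are invented for this example. The message body is serialized as JSON here, which is an assumption based on the format shown above.

```python
import json


def build_downloader_message(md5):
    """Serialize the message body in the {'md5': ...} format described above."""
    return json.dumps({'md5': md5})


def enqueue_binary(queue_url, md5):
    """Send one MD5 to the downloader queue (hypothetical helper)."""
    # Imported lazily so the message builder can be used without the AWS SDK.
    import boto3

    return boto3.client('sqs').send_message(
        QueueUrl=queue_url,
        MessageBody=build_downloader_message(md5))
```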
4 changes: 2 additions & 2 deletions docs/source/architecture.rst
@@ -4,7 +4,7 @@ BinaryAlert utilizes a `serverless <https://aws.amazon.com/serverless/>`_ archit

.. image:: ../images/architecture.png
:align: center
:scale: 30%
:scale: 80%
:alt: BinaryAlert Architecture


@@ -16,5 +16,5 @@ Analysis Lifecycle
3. The SQS queue automatically batches files and invokes many analyzers in parallel.
4. Each analyzer scans its files using a list of pre-compiled `YARA rules <adding-yara-rules.html>`_.
5. `YARA matches <yara-matches.html>`_ are saved to DynamoDB and an alert is sent to an SNS topic. You can subscribe to these alerts via `StreamAlert <https://streamalert.io>`_, email, or any other supported `SNS subscription <http://docs.aws.amazon.com/sns/latest/api/API_Subscribe.html>`_.
6. For retroactive analysis, a batching Lambda function enqueues the entire S3 bucket to be re-analyzed.
6. For :ref:`retroactive analysis <retro_scan>`, the CLI will enqueue the entire S3 bucket to be re-analyzed.
7. Configurable :ref:`CloudWatch alarms <metric_alarms>` will trigger if any BinaryAlert component is behaving abnormally. This will notify a different SNS topic than the one used for YARA match alerts.
13 changes: 7 additions & 6 deletions docs/source/conf.py
@@ -22,6 +22,7 @@
# sys.path.insert(0, os.path.abspath('.'))

import os
import re
on_rtd = os.environ.get('READTHEDOCS', None) == 'True'

if not on_rtd: # only import and set the theme if we're building docs locally
@@ -55,17 +56,17 @@

# General information about the project.
project = 'BinaryAlert'
copyright = '2017, Airbnb'
author = 'Austin Byers'
copyright = '2018, Airbnb'
author = 'Airbnb'

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = '1.1'
# The full version, including alpha/beta/rc tags.
release = '1.1.0'
with open('../../cli/__init__.py', 'r') as version_file:
release = re.search(
r"^__version__ = ['\"]([^'\"]+)['\"]", version_file.read(), re.MULTILINE).group(1)
version = release

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
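To make the new version lookup in ``conf.py`` easier to follow, here is a small standalone sketch that applies the same regular expression to an in-memory string instead of opening ``cli/__init__.py``. The ``parse_version`` wrapper and the explicit error are additions for illustration only.

```python
import re

# Same pattern as in conf.py: capture the quoted value assigned to __version__.
VERSION_PATTERN = r"^__version__ = ['\"]([^'\"]+)['\"]"


def parse_version(source):
    """Return the version string found in the given module source text."""
    match = re.search(VERSION_PATTERN, source, re.MULTILINE)
    if match is None:
        raise ValueError('no __version__ assignment found')
    return match.group(1)


sample = '"""BinaryAlert release version"""\n__version__ = \'1.2.0\'\n'
print(parse_version(sample))  # prints 1.2.0
```

``re.MULTILINE`` matters because the assignment is not on the first line of the file; without it, ``^`` would only match at the very start of the text.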