Commit

Change rule source configuration to JSON with include and exclude support (#99)
austinbyers committed Jan 10, 2018
1 parent f8141d4 commit 460d2cd
Showing 7 changed files with 291 additions and 88 deletions.
67 changes: 61 additions & 6 deletions docs/source/adding-yara-rules.rst
@@ -12,22 +12,76 @@ BinaryAlert includes a number of `custom YARA rules <https://github.com/airbnb/b

 Clone Rules From Other Projects
 -------------------------------
-BinaryAlert makes it easy to clone YARA rules from other open-source projects:
+BinaryAlert makes it easy to clone YARA rules from other projects:

 .. code-block:: bash

   $ ./manage.py clone_rules

-This will copy a subset of YARA rules from several :ref:`open-source collections <yara-credits>`.
-You can add more rule sources in `rules/clone_rules.py <https://github.com/airbnb/binaryalert/blob/master/rules/clone_rules.py>`_
+This will copy a subset of YARA rules from several default :ref:`open-source collections <yara-credits>` into the ``rules/`` folder.
+The cloned folder structure will mirror that of the remote repository.

-.. note:: We are working on a more expressive configuration for cloning subsets of rule repositories.
+.. note:: To ensure all upstream changes are copied (including deletions), the local folder for each repo is deleted before the freshly cloned rules are copied into it. For example, ``rules/github.com/Yara-Rules/rules.git`` will be deleted from your local filesystem before the new copy from ``Yara-Rules`` is written.

+Configuring Rule Sources
+........................
+
+You can configure the remote rule sources in `rules/rule_sources.json <https://github.com/airbnb/binaryalert/blob/master/rules/rule_sources.json>`_. Each rule source is defined by a git-cloneable ``url``, an optional list of file paths to ``include``, and an optional list of file paths to ``exclude``.
+
+Some examples using the `Yara-Rules <https://github.com/Yara-Rules/rules>`_ repository:
+
+**1. URL only**
+
+.. code-block:: json
+
+  {
+    "repos": [
+      {
+        "url": "https://github.com/Yara-Rules/rules.git"
+      }
+    ]
+  }
+
+If you specify just the ``git`` URL, BinaryAlert will traverse the entire repo and copy every ``.yar`` and ``.yara`` file (case insensitive).
+SSH URLs (e.g. ``git@github.com:Yara-Rules/rules.git``) are also supported, since BinaryAlert just runs a ``git clone`` on the specified URL.
+
+**2. Filter with Include and Exclude**
+
+The ``Yara-Rules`` repo is very large, and you may only be interested in a specific subset of rules:
+
+.. code-block:: json
+
+  {
+    "repos": [
+      {
+        "url": "https://github.com/Yara-Rules/rules.git",
+        "include": [
+          "CVE_Rules/*",
+          "Malware/*"
+        ],
+        "exclude": [
+          "Malware/POS*",
+          "*_index.yar"
+        ]
+      }
+    ]
+  }
+
+.. note:: This example is for demonstrative purposes only and is not necessarily recommended.
+
+This will copy rules from the ``CVE_Rules`` and ``Malware`` folders, excluding POS and index files. BinaryAlert matches each file path, relative to the repository root, against these Unix shell-style wildcard patterns using `fnmatch <https://docs.python.org/3.6/library/fnmatch.html>`_. Note that ``*`` also matches ``/``, so ``CVE_Rules/*`` covers files in nested subfolders too.
+
+In summary, BinaryAlert will copy a file from a remote repository if and only if all of the following conditions hold:
+
+1. The file name ends in ``.yar`` or ``.yara`` (case insensitive), AND
+2. The file path matches a pattern in the ``include`` list (OR the ``include`` list is empty), AND
+3. The file path *does not* match a pattern in the ``exclude`` list.

 Write Your Own Rules
 --------------------
 You can add your own ``.yar`` or ``.yara`` files anywhere in the ``rules/`` directory tree. Refer to the `writing YARA rules <http://yara.readthedocs.io/en/latest/writingrules.html>`_ documentation for guidance and examples. Note that when BinaryAlert finds a file which matches a YARA rule, the rule name, `metadata <http://yara.readthedocs.io/en/latest/writingrules.html#metadata>`_, `tags <http://yara.readthedocs.io/en/latest/writingrules.html#rule-tags>`_, and matched `string <http://yara.readthedocs.io/en/latest/writingrules.html#strings>`_ names will be included in the alert for your convenience.

+.. note:: Because the folders for each remote source will be overwritten during rule cloning, we recommend keeping your own YARA rules in ``rules/private`` or similar.

 .. _external-variables:

@@ -40,7 +94,7 @@ In order to support the rule repositories listed above, BinaryAlert provides the
 * ``filepath`` - Full file path ("/path/to/file.exe")
 * ``filetype`` - Uppercase ``extension`` without leading period ("DOCX", "EXE", "PDF", etc.)

-You can use these variables in your own rules to match or exclude certain filepaths. (Note that the variables will default to empty strings if they are not available.) For example, this is a YARA rule which matches only files containing the string "evil" in the ``/home/`` directory:
+You can use these variables in your own rules to match or exclude certain file paths. (Note that the variables will default to empty strings if they are not available.) For example, this is a YARA rule which matches only files containing the string "evil" in the ``/home/`` directory:

 .. code-block:: none

@@ -53,12 +107,13 @@ You can use these variables in your own rules to match or exclude certain filepa
           $evil and filepath matches /\/home\/*/
   }

 .. warning:: YARA analysis of archives `does not yet support external variables <https://github.com/BayshoreNetworks/yextend/issues/17>`_.

 .. _supported_yara_modules:

 Supported Modules
 -----------------
-BinaryAlert supports all of the default `YARA modules <http://yara.readthedocs.io/en/latest/modules.html>`_, including ELF, Math, Hash, and PE.
+BinaryAlert supports all of the default `YARA modules <http://yara.readthedocs.io/en/latest/modules.html>`_, including ELF, Math, Hash, and PE. Support for other modules is not planned at this time, but please `let us know <https://github.com/airbnb/binaryalert/issues>`_ if you need a special module.
+

 Disabling Rules
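The three copy conditions summarized in adding-yara-rules.rst can be exercised on their own. The sketch below is a standalone re-implementation for illustration, not the commit's `_copy_required`; it assumes the example include/exclude lists from the "Filter with Include and Exclude" section, and the file paths are made up. It uses `fnmatch.fnmatchcase` instead of `fnmatch.fnmatch` so results do not depend on the operating system (`fnmatch.fnmatch` case-normalizes both arguments via `os.path.normcase`, which is case-insensitive on Windows). It also shows that `*` matches `/`, so `Malware/*` reaches nested folders.

```python
from fnmatch import fnmatchcase
from typing import List, Optional

# Example configuration taken from the docs' include/exclude example.
INCLUDE = ['CVE_Rules/*', 'Malware/*']
EXCLUDE = ['Malware/POS*', '*_index.yar']


def copy_required(path: str, include: Optional[List[str]], exclude: Optional[List[str]]) -> bool:
    """Sketch of the documented copy conditions for a repo-relative path."""
    # 1) Only .yar / .yara files; the extension check is case-insensitive.
    if not path.lower().endswith(('.yar', '.yara')):
        return False
    # 2) A missing or empty include list means "include everything".
    if include and not any(fnmatchcase(path, pattern) for pattern in include):
        return False
    # 3) Any exclude match rejects the file.
    if exclude and any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    return True


if __name__ == '__main__':
    for candidate in ['CVE_Rules/CVE_2017_example.yar',  # included
                      'Malware/nested/deep_rule.yar',     # '*' also matches '/'
                      'Malware/POS_example.yar',          # excluded by Malware/POS*
                      'Malware/Malware_index.yar',        # excluded by *_index.yar
                      'Malware/README.md',                # not a YARA file
                      'Exploit-Kits/EK_example.yar']:     # not in the include list
        print(candidate, copy_required(candidate, INCLUDE, EXCLUDE))
```

The checks run in a different order than in the commit, which does not change the result because all three conditions must hold.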
2 changes: 1 addition & 1 deletion manage.py
@@ -413,7 +413,7 @@ def cb_copy_all(self) -> None:
     @staticmethod
     def clone_rules() -> None:
         """Clone YARA rules from other open-source projects."""
-        clone_rules.clone_rules_from_github()
+        clone_rules.clone_remote_rules()

     @staticmethod
     def compile_rules() -> None:
112 changes: 85 additions & 27 deletions rules/clone_rules.py
@@ -1,34 +1,92 @@
 """Update YARA rules cloned from remote sources."""
+from fnmatch import fnmatch
+import json
 import os
 import shutil
 import subprocess
 import tempfile
+from typing import Generator, List, Optional

 RULES_DIR = os.path.dirname(os.path.realpath(__file__))  # Directory containing this file.
-REMOTE_RULE_SOURCES = {
-    'https://github.com/Neo23x0/signature-base.git': ['yara'],
-    'https://github.com/YARA-Rules/rules.git': ['CVE_Rules']
-}
-
-
-def clone_rules_from_github() -> None:
-    """Update YARA rules cloned from GitHub."""
-    for url, folders in REMOTE_RULE_SOURCES.items():
-        # Clone repo into a temporary directory.
-        print('Cloning YARA rules from {}/{}...'.format(url, folders))
-        cloned_repo_root = os.path.join(tempfile.gettempdir(), os.path.basename(url))
-        if os.path.exists(cloned_repo_root):
-            shutil.rmtree(cloned_repo_root)
-        subprocess.check_call(['git', 'clone', '--depth', '1', url, cloned_repo_root])
-
-        # Copy each specified folder into the target rules directory.
-        for folder in folders:
-            source = os.path.join(cloned_repo_root, folder)
-            destination = os.path.join(RULES_DIR, url.split('//')[1], folder)
-            if os.path.exists(destination):
-                # Remove existing rules in this folder before copying
-                # (in case upstream rules were deleted).
-                shutil.rmtree(destination)
-            shutil.copytree(source, destination)
-
-        shutil.rmtree(cloned_repo_root)  # Remove temporary cloned repo.
+REMOTE_RULE_SOURCES = os.path.join(RULES_DIR, 'rule_sources.json')
+
+
+def _copy_required(path: str, include: Optional[List[str]], exclude: Optional[List[str]]) -> bool:
+    """Return True if the given filepath should be copied, given the include/exclude directives."""
+    # 1) If the path is not in the "include" list (which defaults to everything), skip it.
+    if include and not any(fnmatch(path, pattern) for pattern in include):
+        return False
+
+    # 2) If the path is specifically excluded, skip it.
+    if exclude and any(fnmatch(path, pattern) for pattern in exclude):
+        return False
+
+    # 3) If the path is not a .yar or .yara file, skip it.
+    lower_filename = path.lower()
+    if not lower_filename.endswith('.yar') and not lower_filename.endswith('.yara'):
+        return False
+
+    return True
+
+
+def _files_to_copy(
+        cloned_repo_root: str, include: Optional[List[str]],
+        exclude: Optional[List[str]]) -> Generator[str, None, None]:
+    """Yields string paths to copy, each relative to the root of the repo."""
+    for root, _, files in os.walk(cloned_repo_root):
+        for filename in files:
+            # Compute path *relative to the root of its repository*
+            relative_path = os.path.relpath(os.path.join(root, filename), start=cloned_repo_root)
+            if _copy_required(relative_path, include, exclude):
+                yield relative_path
+
+
+def _clone_repo(url: str, include: Optional[List[str]], exclude: Optional[List[str]]) -> int:
+    """Clone the given repo and copy only the YARA files from the specified paths.
+
+    Returns:
+        Number of files copied.
+    """
+    # Shallow clone entire repo into a temp directory.
+    cloned_repo_root = os.path.join(tempfile.gettempdir(), os.path.basename(url))
+    if os.path.exists(cloned_repo_root):
+        shutil.rmtree(cloned_repo_root)
+    subprocess.check_call(['git', 'clone', '--quiet', '--depth', '1', url, cloned_repo_root])
+
+    # Remove existing rules in target folder before copying (in case upstream rules were deleted).
+    target_repo_root = os.path.join(RULES_DIR, url.split('//')[1])
+    if os.path.exists(target_repo_root):
+        shutil.rmtree(target_repo_root)
+
+    # Copy each applicable file into the target folder in the rules/ directory.
+    files_copied = 0
+    for relative_path in _files_to_copy(cloned_repo_root, include, exclude):
+        # Create all of the intermediate directories, if they don't already exist.
+        os.makedirs(os.path.join(target_repo_root, os.path.dirname(relative_path)), exist_ok=True)
+        src = os.path.join(cloned_repo_root, relative_path)
+        dst = os.path.join(target_repo_root, relative_path)
+        shutil.copy(src, dst)
+        files_copied += 1
+
+    # Remove temporary cloned repo.
+    shutil.rmtree(cloned_repo_root)
+
+    return files_copied
+
+
+def clone_remote_rules() -> None:
+    """Clone YARA rules from all remote sources into the rules/ directory."""
+    with open(REMOTE_RULE_SOURCES) as f:
+        rule_sources = json.load(f)
+
+    num_repos = len(rule_sources['repos'])
+    total_files_copied = 0
+    for count, source in enumerate(rule_sources['repos'], start=1):
+        print('[{}/{}] Cloning {}... '.format(count, num_repos, source['url']), end='', flush=True)
+        files_copied = _clone_repo(source['url'], source.get('include'), source.get('exclude'))
+        print('{} YARA {} copied'.format(files_copied, 'file' if files_copied == 1 else 'files'))
+        total_files_copied += files_copied
+
+    print('Done! {} YARA {} cloned from {} {}.'.format(
+        total_files_copied, 'file' if total_files_copied == 1 else 'files',
+        num_repos, 'repository' if num_repos == 1 else 'repositories'))
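One detail worth checking in `_clone_repo` above: the destination folder is computed with `url.split('//')[1]`. That works for HTTPS URLs, but an scp-style SSH URL such as `git@github.com:Yara-Rules/rules.git` contains no `//`, so `split` returns a one-element list and indexing `[1]` raises `IndexError`, even though the docs describe SSH URLs as supported. The sketch below shows one possible fix; `local_repo_path` and its regular expressions are hypothetical and not part of this commit. It maps HTTPS, `ssh://` and scp-style URLs for the same repository to the same `host/path` folder.

```python
import re

# Hypothetical helper (not part of the commit): map a git URL to the folder
# under rules/ that mirrors it, e.g. "github.com/Yara-Rules/rules.git".

# scheme://[user@]host[:port]/path, e.g. https:// or ssh:// URLs.
_SCHEME_URL = re.compile(
    r'^[A-Za-z][A-Za-z0-9+.-]*://(?:[^@/]+@)?(?P<host>[^/:]+)(?::\d+)?/(?P<path>.+)$')
# scp-like SSH syntax: [user@]host:path, e.g. git@github.com:Yara-Rules/rules.git
_SCP_URL = re.compile(r'^(?:[^@/]+@)?(?P<host>[^/:]+):(?P<path>[^/].*)$')


def local_repo_path(url: str) -> str:
    """Return 'host/path' for HTTPS, ssh:// and scp-style git URLs."""
    for pattern in (_SCHEME_URL, _SCP_URL):
        match = pattern.match(url)
        if match:
            return '{}/{}'.format(match.group('host'), match.group('path'))
    raise ValueError('Unrecognized git URL: {}'.format(url))
```

For plain HTTPS URLs like the ones in rule_sources.json, the result equals the current `url.split('//')[1]` value, so `os.path.join(RULES_DIR, local_repo_path(url))` would leave existing rule folders where they are.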
16 changes: 16 additions & 0 deletions rules/rule_sources.json
@@ -0,0 +1,16 @@
+{
+  "repos": [
+    {
+      "url": "https://github.com/Neo23x0/signature-base.git",
+      "include": [
+        "yara/*"
+      ]
+    },
+    {
+      "url": "https://github.com/YARA-Rules/rules.git",
+      "include": [
+        "CVE_Rules/*"
+      ]
+    }
+  ]
+}
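Because `clone_remote_rules` reads the optional keys with `source.get('include')` and `source.get('exclude')`, a misspelled key (for example `exlude`) is silently ignored and the whole repository is copied. A small validation pass could catch that before any cloning starts. The sketch below is hypothetical (`validate_rule_sources` is not in the commit) and assumes the schema described in the docs: a top-level `repos` list whose entries have a string `url` and optional lists of string patterns under `include` and `exclude`.

```python
import json
from typing import Any, Dict, List

ALLOWED_KEYS = {'url', 'include', 'exclude'}


def validate_rule_sources(config: Any) -> List[Dict[str, Any]]:
    """Check a parsed rule_sources.json document and return its list of repos."""
    repos = config.get('repos') if isinstance(config, dict) else None
    if not isinstance(repos, list):
        raise ValueError('"repos" must be a list')

    for index, repo in enumerate(repos):
        # Every source needs a git-cloneable URL string.
        if not isinstance(repo, dict) or not isinstance(repo.get('url'), str):
            raise ValueError('repos[{}] must be an object with a string "url"'.format(index))
        # Reject unknown keys so typos such as "exlude" are not silently ignored.
        unknown = set(repo) - ALLOWED_KEYS
        if unknown:
            raise ValueError('repos[{}] has unknown keys: {}'.format(index, sorted(unknown)))
        # include/exclude are optional, but must be lists of pattern strings.
        for key in ('include', 'exclude'):
            patterns = repo.get(key, [])
            if not isinstance(patterns, list) or not all(isinstance(p, str) for p in patterns):
                raise ValueError('repos[{}].{} must be a list of strings'.format(index, key))
    return repos


# The configuration committed in rules/rule_sources.json.
RULE_SOURCES = '''
{
  "repos": [
    {"url": "https://github.com/Neo23x0/signature-base.git", "include": ["yara/*"]},
    {"url": "https://github.com/YARA-Rules/rules.git", "include": ["CVE_Rules/*"]}
  ]
}
'''

if __name__ == '__main__':
    print(len(validate_rule_sources(json.loads(RULE_SOURCES))), 'rule sources OK')
```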
4 changes: 2 additions & 2 deletions tests/manage_test.py
@@ -321,9 +321,9 @@ def test_cb_copy_all_not_enabled(self):
         with self.assertRaises(manage.InvalidConfigError):
             self.manager.cb_copy_all()

-    @mock.patch.object(manage.clone_rules, 'clone_rules_from_github')
+    @mock.patch.object(manage.clone_rules, 'clone_remote_rules')
     def test_clone_rules(self, mock_clone: mock.MagicMock):
-        """Calls clone_rules_from_github (tested elsewhere)."""
+        """Calls clone_remote_rules (tested elsewhere)."""
         self.manager.clone_rules()
         mock_clone.assert_called_once()

126 changes: 126 additions & 0 deletions tests/rules/clone_rules_test.py
@@ -0,0 +1,126 @@
+"""Tests for rule update/clone logic."""
+# pylint: disable=protected-access
+import json
+import os
+from typing import List
+import unittest
+from unittest import mock
+
+from pyfakefs import fake_filesystem_unittest
+
+from rules import compile_rules, clone_rules
+
+
+class CopyRequiredTest(unittest.TestCase):
+    """Test the _copy_required private method."""
+
+    def test_copy_required_no_lists(self):
+        """If neither an exclude nor an include list is specified, YARA files should be copied."""
+        self.assertTrue(clone_rules._copy_required('path/to/file.yar', None, None))
+        self.assertTrue(clone_rules._copy_required('path/to/file.YARA', [], []))
+        self.assertFalse(clone_rules._copy_required('.git/HEAD', None, None))
+        self.assertFalse(clone_rules._copy_required('path/to/file.txt', None, None))
+
+    def test_copy_required_include_list(self):
+        """Only files matching the include list should be copied."""
+        include_list = ['path/to/*', '[abc]?/*/file*']
+
+        self.assertTrue(clone_rules._copy_required('path/to/rules.yara', include_list, []))
+        self.assertTrue(clone_rules._copy_required(
+            'a1/some/long/path/file_apt.yara', include_list, []))
+        self.assertTrue(clone_rules._copy_required('b2/malware/file ROOTKIT.YAR', include_list, []))
+
+        self.assertFalse(clone_rules._copy_required('base.yara', include_list, []))
+        self.assertFalse(clone_rules._copy_required('path/to/file.txt', include_list, []))
+        self.assertFalse(clone_rules._copy_required('a1/file.yara', include_list, []))
+
+    def test_copy_required_exclude_list(self):
+        """Skip any file matching the exclude list."""
+        exclude_list = ['*.yar', 'skip/these/file*']
+        self.assertTrue(clone_rules._copy_required('base.yara', [], exclude_list))
+        self.assertTrue(clone_rules._copy_required('path/to/file.yara', [], exclude_list))
+        self.assertFalse(clone_rules._copy_required('file.yar', [], exclude_list))
+        self.assertFalse(clone_rules._copy_required('skip/these/file.yara', [], exclude_list))
+
+    def test_copy_required_include_and_exclude(self):
+        """Test copy required with both an include and exclude list specified."""
+        include = ['yara/*', '*_malware_*']
+        exclude = ['*mobile*', 'yara/?.yara']
+
+        self.assertTrue(clone_rules._copy_required('yara/packed.yara', include, exclude))
+        self.assertTrue(clone_rules._copy_required('base_malware_index.yara', include, exclude))
+        self.assertTrue(clone_rules._copy_required('yara/mac_malware.yar', include, exclude))
+
+        self.assertFalse(clone_rules._copy_required('not_included.yara', include, exclude))
+        self.assertFalse(clone_rules._copy_required('yara/mobile_malware.yara', include, exclude))
+        self.assertFalse(clone_rules._copy_required('yara/A.yara', include, exclude))
+
+
+class CloneRulesTest(fake_filesystem_unittest.TestCase):
+    """Tests for the rule-cloning logic."""
+
+    def setUp(self):
+        """Set up the fake filesystem with the expected rules folder structure."""
+        self.setUpPyfakefs()
+        os.makedirs(clone_rules.RULES_DIR)
+
+        # Add fake rule sources.
+        self.fs.CreateFile(clone_rules.REMOTE_RULE_SOURCES, contents=json.dumps(
+            {
+                "repos": [
+                    {
+                        "url": "https://github.com/test-user1/test-repo1",
+                        "include": ["yara/*"]
+                    },
+                    {
+                        "url": "https://github.com/test-user2/test-repo2",
+                        "exclude": ["windows/*", "*_mobile.yara"]
+                    }
+                ]
+            }
+        ))
+
+        # Add extra rules (which should be deleted).
+        self.fs.CreateFile(os.path.join(
+            clone_rules.RULES_DIR,
+            'github.com', 'test-user1', 'test-repo1', 'CVE_Rules', 'delete-me.yara'
+        ))
+
+        # Add some other rules (which should be preserved).
+        self.fs.CreateFile(os.path.join(clone_rules.RULES_DIR, 'private', 'private.yara'))
+
+    def _mock_git_clone(self, args: List[str]) -> None:
+        """Mock out git clone by creating the "cloned" directory."""
+        cloned_repo_root = args[-1]
+
+        # Create "cloned" directory and subfolders.
+        if cloned_repo_root.endswith('test-repo1'):
+            self.fs.CreateFile(os.path.join(cloned_repo_root, 'yara', 'cloned.yara'))
+            self.fs.CreateFile(os.path.join(cloned_repo_root, 'not_included.yara'))
+        else:
+            self.fs.CreateFile(os.path.join(cloned_repo_root, 'yara', 'cloned.yara'))
+            self.fs.CreateFile(os.path.join(cloned_repo_root, 'yara', 'excluded_mobile.yara'))
+            self.fs.CreateFile(os.path.join(cloned_repo_root, 'windows', 'excluded.yara'))
+
+    @mock.patch.object(clone_rules, 'print')
+    def test_clone_remote_rules(self, mock_print: mock.MagicMock):
+        """Mock out the clone process and verify which rules files were saved/deleted."""
+        with mock.patch('subprocess.check_call', side_effect=self._mock_git_clone):
+            clone_rules.clone_remote_rules()
+
+        mock_print.assert_has_calls([
+            mock.call('[1/2] Cloning https://github.com/test-user1/test-repo1... ',
+                      end='', flush=True),
+            mock.call('1 YARA file copied'),
+            mock.call('[2/2] Cloning https://github.com/test-user2/test-repo2... ',
+                      end='', flush=True),
+            mock.call('1 YARA file copied'),
+            mock.call('Done! 2 YARA files cloned from 2 repositories.')
+        ])
+
+        expected_files = {
+            'github.com/test-user1/test-repo1/yara/cloned.yara',
+            'github.com/test-user2/test-repo2/yara/cloned.yara',
+            'private/private.yara'
+        }
+        self.assertEqual(expected_files, set(compile_rules._find_yara_files()))