New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Artifact writer #178

Merged
merged 3 commits into from Aug 8, 2016

Conversation

Projects
None yet
3 participants
@pidydx
Contributor

pidydx commented Jul 9, 2016

These are changes I have made while working on building a ForensicArtifact server (more details soon).

I have heard there may be a lot of changes to this library coming soon so I wanted to get this up for initial review. I still need to work out testing for the ArtifactWriter classes.

This PR fixes: #171

This PR is related to: #175
I was actually going to implement the exact solution in the referenced PR and then found it was already waiting in a PR so I just left it at my minimal necessary changes for reference. Being able to add additional types would also be extremely useful, but at a minimum I need a list of valid types for data storage/searching in the artifact server.

I also made changes to ArtifactReader that allow overriding the definitions for similar reasons of being able to easily extend labels.


This change is Reviewable

@joachimmetz

This comment has been minimized.

Member

joachimmetz commented Jul 9, 2016

I've split this PR in 4

  • fixes - merged
  • changes to source type and definitions - merged
  • changes to reader - please move to a separate CL, it is unclear to me what the changes try to accomplish
  • changes to writer - please update this CL
@@ -12,6 +12,18 @@
TYPE_INDICATOR_WINDOWS_REGISTRY_VALUE = 'REGISTRY_VALUE'
TYPE_INDICATOR_WMI_QUERY = 'WMI'
INDICATOR_TYPES = frozenset([

This comment has been minimized.

@joachimmetz

joachimmetz Jul 9, 2016

Member

=> TYPE_INDICATORS

class ArtifactsReader(BaseArtifactsReader):
def __init__(self):
self.supported_os = definitions.SUPPORTED_OS

This comment has been minimized.

@joachimmetz

joachimmetz Jul 9, 2016

Member
  • call to super()
class ArtifactsReader(BaseArtifactsReader):
def __init__(self):

This comment has been minimized.

@joachimmetz

joachimmetz Jul 9, 2016

Member
  • docstring
"""
class ArtifactsReader(BaseArtifactsReader):

This comment has been minimized.

@joachimmetz

joachimmetz Jul 9, 2016

Member

docstring and while line

@@ -27,7 +27,7 @@ def testArtifactDefinitionsRegistry(self):
self.assertEqual(len(artifact_registry.GetDefinitions()), 7)
artifact_definition = artifact_registry.GetDefinitionByName('EventLogs')
self.assertNotEquals(artifact_definition, None)
self.assertNotEqual(artifact_definition, None)

This comment has been minimized.

@joachimmetz

joachimmetz Jul 9, 2016

Member

=> assertIsNotNone()

@property
def attributes_definition(self):
"""Dictionary representation of the source attributes

This comment has been minimized.

@joachimmetz

joachimmetz Jul 9, 2016

Member

no need for this white space.

@@ -40,6 +40,13 @@ def type_indicator(self):
u'Invalid source type missing type indicator.')
return self.TYPE_INDICATOR
@property
def attributes_definition(self):

This comment has been minimized.

@joachimmetz

joachimmetz Jul 9, 2016

Member

Why is this a property?

This comment has been minimized.

@pidydx

pidydx Jul 9, 2016

Contributor

Also no reason. I was actually going to implement it as a method at first and don't remember why I didn't.

"""
return {'names': self.names}

This comment has been minimized.

@joachimmetz

joachimmetz Jul 9, 2016

Member

=> u'names'

class ArtifactsReader(BaseArtifactsReader):
def __init__(self):
self.supported_os = definitions.SUPPORTED_OS
self.top_level_keys = definitions.TOP_LEVEL_KEYS

This comment has been minimized.

@joachimmetz

joachimmetz Jul 9, 2016

Member

why aren't these properties?

This comment has been minimized.

@pidydx

pidydx Jul 9, 2016

Contributor

No reason. I will change them to properties.

@joachimmetz

This comment has been minimized.

Member

joachimmetz commented Jul 9, 2016

Being able to add additional types would also be extremely useful, but at a minimum I need a list of valid types for data storage/searching in the artifact server.

Know that this will break portability of the artifact type. The reason for adding it will be that tools can implement their own for testing purposes. It is highly recommended that the artifact type is added to the main project.

@pidydx

This comment has been minimized.

Contributor

pidydx commented Jul 9, 2016

ArtifactReader split out and ArtifactWriter updated

import abc
import json
import yaml

This comment has been minimized.

@joachimmetz

joachimmetz Jul 10, 2016

Member

style nit: + white line

"""Class that implements the artifacts reader interface."""
@abc.abstractmethod
def _ConvertArtifact(self, artifact):

This comment has been minimized.

@joachimmetz

joachimmetz Jul 10, 2016

Member

make this method name more explict e.g. CopyArtifactDefitionToDict also why is this method needed? Wouldn't it make more sense to add a CopyToDist() method to ArtifactDefinition ?

This comment has been minimized.

@pidydx

pidydx Jul 10, 2016

Contributor

Moving it to ArtifactDefinition as CopyToDict() is probably the better plan. I will do that.

@abc.abstractmethod
def WriteArtifacts(self, artifacts, filename):
"""Reads artifact definitions from a file.

This comment has been minimized.

@joachimmetz

joachimmetz Jul 10, 2016

Member

Reads => Writes

"""
@abc.abstractmethod
def WriteArtifacts(self, artifacts, filename):

This comment has been minimized.

@joachimmetz

joachimmetz Jul 10, 2016

Member

make method name more explicit that you're writing to file (see reader)

@abc.abstractmethod
def FormatArtifacts(self, artifacts):
"""Formats artifact desired output format

This comment has been minimized.

@joachimmetz

joachimmetz Jul 10, 2016

Member

Unclear to me what this method is supposed to do, it also seems to be only used by file-based writers.

This comment has been minimized.

@pidydx

pidydx Jul 10, 2016

Contributor

Internally yes. The Artificer server only uses FormatArtifacts to prepare YAML to serve rather than writing a file anywhere.

"""
class ArtifactWriter(BaseArtifactsWriter):

This comment has been minimized.

@joachimmetz

joachimmetz Jul 10, 2016

Member

+ docstring (repeat everywhere else)

artifact_definition['urls'] = artifact.urls
return artifact_definition
def WriteArtifacts(self, artifacts, filename):

This comment has been minimized.

@joachimmetz

joachimmetz Jul 10, 2016

Member
  • docsring
def WriteArtifacts(self, artifacts, filename):
with open(filename, 'w') as file_object:
file_object.write(self.FormatArtifacts(artifacts))

This comment has been minimized.

@joachimmetz

joachimmetz Jul 10, 2016

Member

FormatArtifacts is not implemented?

return artifact_definition
def WriteArtifacts(self, artifacts, filename):
with open(filename, 'w') as file_object:

This comment has been minimized.

@joachimmetz

joachimmetz Jul 10, 2016

Member

what about binary formats?

"""
if isinstance(artifacts, list):
artifact_definitions = [self._ConvertArtifact(artifact) for artifact in artifacts]

This comment has been minimized.

@joachimmetz

joachimmetz Jul 10, 2016

Member

why not have _ConvertArtifact correctly deal with a list?

This comment has been minimized.

@pidydx

pidydx Jul 10, 2016

Contributor

I will have to look into this again with moving CopyToDict() to ArtifactDefinition. The main headache here is actually the YAML side with dump vs dump_all and this was mostly for parity. For JSON I could just as easily handle it returning a list of 1 dict instead of just the dict itself.

Returns:
formatted string of artifact_definition
"""

This comment has been minimized.

@joachimmetz

joachimmetz Jul 10, 2016

Member

- white line (repeat everywhere else)

sources = []
for source in artifact.sources:
source_definition = {
'type': source.type_indicator,

This comment has been minimized.

@joachimmetz

joachimmetz Jul 10, 2016

Member

please use explicit unicode string u''

@joachimmetz

This comment has been minimized.

Member

joachimmetz commented Jul 10, 2016

Can you elaborate what goal are you trying to accomplish with the writer?

@pidydx

This comment has been minimized.

Contributor

pidydx commented Jul 10, 2016

Currently the main purpose I use the writer for is exporting artifacts from the Artificer server in YAML format, but the general idea is being able to take an ArtifactDefinition that has been read from some arbitrary source and validated and write it back in an arbitrary format (typically YAML for file and JSON for API)

@pidydx

This comment has been minimized.

Contributor

pidydx commented Jul 10, 2016

Also using the Label/OS/etc tag filtering mentioned on Reader, but that is all still work in progress.

@joachimmetz

This comment has been minimized.

Member

joachimmetz commented Jul 10, 2016

Let me give some thought on how to address your use case.

@pidydx

This comment has been minimized.

Contributor

pidydx commented Jul 10, 2016

As an additional note, from one of the other discussions, having an ArtifactWriter could make it easier to deal sorting artifacts into their appropriate files, particularly as the collection grows.

@pidydx

This comment has been minimized.

Contributor

pidydx commented Jul 10, 2016

I think I got everything corrected now.

@@ -86,3 +86,40 @@ def AppendSource(self, type_indicator, attributes):
self.sources.append(source_object)
return source_object
def AsDict(self):
"""Converts artifact to dict from ArtifactDefinition

This comment has been minimized.

@joachimmetz

joachimmetz Jul 24, 2016

Member

It does not convert since the original is preserved.

"""Represents an artifacts as a dictionary.

Returns:
A dictionary containing the artifact attributes.
"""

@@ -42,7 +42,7 @@ def type_indicator(self):
return self.TYPE_INDICATOR
@abc.abstractmethod
def CopyToDict(self):
def AsDict(self):

This comment has been minimized.

@joachimmetz

joachimmetz Jul 24, 2016

Member

Please make the docstring consistent across the different AsDict methods.

import json
import yaml
import sys
from artifacts.artifact import ArtifactDefinition

This comment has been minimized.

@joachimmetz

joachimmetz Jul 24, 2016

Member

+ file line before local imports

class BaseArtifactsWriter(object):
"""Class that implements the artifacts reader interface."""

This comment has been minimized.

@joachimmetz

joachimmetz Jul 24, 2016

Member

reader => writer

Args:
artifacts: list of ArtifactDefinition objects to be written

This comment has been minimized.

@joachimmetz

joachimmetz Jul 24, 2016

Member

- white line

artifacts: an instance or list of ArtifactDefinition
Returns:
formatted string of artifact_definition

This comment has been minimized.

@joachimmetz

joachimmetz Jul 24, 2016

Member

artifact_definition => artifact definition

"""Class that implements the artifacts reader interface."""
@abc.abstractmethod
def WriteArtifactsFile(self, artifacts, filename):

This comment has been minimized.

@joachimmetz

joachimmetz Jul 24, 2016

Member

Treat docstrings as regular text and add punctuation (repeat everywhere else)

Args:
artifacts: list of ArtifactDefinition objects to be written

This comment has been minimized.

@joachimmetz

joachimmetz Jul 24, 2016

Member

- white line

Returns:
formatted string of artifact_definition
"""
if isinstance(artifacts, ArtifactDefinition):

This comment has been minimized.

@joachimmetz

joachimmetz Jul 24, 2016

Member

What is this check supposed to do?
Docstring does not says anything about accepting ArtifactDefinition.
Why does this deviate from the interface you've defined?

This comment has been minimized.

@pidydx

pidydx Jul 25, 2016

Contributor

That is what I mean by an instance or list of ArtifactDefinition. If you pass an instance it turns it into a list similar to how GRR handles similar cases. I can change the docstring to be more clear or just force it to only accept lists.

This comment has been minimized.

@joachimmetz

joachimmetz Jul 25, 2016

Member

Please don't, stick with list if that is the required input and add FormatArtifact if you want it for a single artifact definition. Let's make a clear interface, and don't take GRR too much as an example.

from artifacts import writer
class ArtifactAsDictTest(unittest.TestCase):

This comment has been minimized.

@joachimmetz

joachimmetz Jul 24, 2016

Member

Why test the AsDict function here separately? Why not make it a test part of a Writer test case ?

This comment has been minimized.

@pidydx

pidydx Jul 25, 2016

Contributor

I set this up to test that AsDict gets you the same dict as reading the artifact using yaml alone, then I test to make sure all the defined artifacts can be converted without raising.

The Writer tests are ensuring that, while using AsDict, they can convert to string output and read that string back in without losing anything.

This comment has been minimized.

@joachimmetz

joachimmetz Jul 25, 2016

Member

TestCase is a collection of tests. at the moment these tests seem out of place. I opt to move them to the reader tests.

sources = []
for source in self.sources:
source_definition = {
'type': source.type_indicator,

This comment has been minimized.

@joachimmetz

joachimmetz Jul 24, 2016

Member

Make sure all your text string are unicode safe either with u'' or with importing the right future

artifact_definitions = [artifact.AsDict() for artifact in artifacts]
json_data = json.dumps(artifact_definitions)
if sys.version_info <= (3, 0):

This comment has been minimized.

@joachimmetz

joachimmetz Jul 24, 2016

Member

Solve this more elegant what if the json data is encoded?

artifact_definitions = [artifact.AsDict() for artifact in artifacts]
json_data = json.dumps(artifact_definitions)
if sys.version_info <= (3, 0):
json_data = unicode(json_data)

This comment has been minimized.

@joachimmetz

joachimmetz Jul 24, 2016

Member

unicode() is not portable and does not handle encodings, please don't use

add something like: https://github.com/log2timeline/plaso/blob/master/plaso/lib/py2to3.py

if isinstance(json_data, py2to3.BYTES_TYPE):
  try:
    return json_data.decode('ascii')
  except UnicodeDecodeError:
     ...
error_location = u'At start'
if last_artifact_definition:
error_location = u'After: {0}'.format(last_artifact_definition.name)
self.fail(u'{0} failed to convert to dict'.format(error_location))

This comment has been minimized.

@joachimmetz

joachimmetz Jul 24, 2016

Member

what are you testing here? Can you rewrite this test for it to become more clear from the code?

This comment has been minimized.

@pidydx

pidydx Jul 25, 2016

Contributor

This tests that all artifacts in definitions can be converted to dict w/o raising. All the other tests use the test definitions.

This comment has been minimized.

@pidydx

pidydx Jul 25, 2016

Contributor

Related, should this just be except: instead of except errors.FormatError since the test is that it should be able to run AsDict on everything in definitions and not fail for any reason.

This comment has been minimized.

@joachimmetz

joachimmetz Jul 25, 2016

Member

If a test raises it will fail no need to add the self.fail()

This comment has been minimized.

@pidydx

pidydx Jul 26, 2016

Contributor

I used self.fail as part of tracking the last successful artifact so the message points at exactly which artifact caused things to go wrong.

file_object = io.StringIO(initial_value=artifacts_yaml)
converted_artifact_definitions = list(artifact_reader.ReadFileObject(
file_object))
if sys.version_info >= (3, 0):

This comment has been minimized.

@joachimmetz

joachimmetz Jul 24, 2016

Member

Find an alternative solution for all the sys.version_info >= (3, 0) checks.

What are you trying to test here that is not cross version portable?

@joachimmetz

This comment has been minimized.

Member

joachimmetz commented Jul 24, 2016

@pidydx left some comment, all the check for python 2 versus 3 looks very fragile at the moment. These can be solved in different ways by checking the data at hand not the python version.

try:
return json_data.decode('ascii')
except UnicodeDecodeError:
pass

This comment has been minimized.

@pidydx

pidydx Jul 25, 2016

Contributor

@joachimmetz I just put pass here as a place holder for the moment. What is the best thing to do in this case? It seems like letting it just raise the UnicodeDecodeError rather than catching it makes the most sense.

This comment has been minimized.

@destijl

destijl Jul 28, 2016

Member

Add a comment about why this is necessary. I think you just need to make the error message as informative as possible (probably include the json data?) and raise as formaterror.

@pidydx

This comment has been minimized.

Contributor

pidydx commented Jul 26, 2016

This should be mostly cleaned up now. I need some feedback on the UnicodeDecodeError issue, but the rest of it should be correct now.

@pidydx

This comment has been minimized.

Contributor

pidydx commented Jul 29, 2016

Everything here should be ready now.

test_file = os.path.join('test_data', 'definitions.yaml')
artifact_definitions = list(artifact_reader.ReadFile(test_file))
artifacts_yaml = artifact_writer.FormatArtifacts(artifact_definitions)

This comment has been minimized.

@destijl

destijl Aug 2, 2016

Member

your ascii conversion problem is with stringIO right?

Maybe just use an actual temp file and that way you can also exercise the actual file writer code

with tempfile.TemporaryFile() as converted:
  write yaml to converted
  do checks

This comment has been minimized.

@joachimmetz

joachimmetz Aug 2, 2016

Member

FYI io.stringIO is unicode and io.bytesIO is binary.

This comment has been minimized.

@pidydx

pidydx Aug 2, 2016

Contributor

So I pushed a change to remove the py2to3 stuff in favor of StringIO/BytesIO if you would prefer to merge it that way, but I will probably switch this to actually doing the temp file if that is acceptable.

I only did all of this juggling because I assumed it would be better to not be writing/deleting temp files as part of unit tests, but if that isn't a concern then I can do it that way.

@pidydx

This comment has been minimized.

Contributor

pidydx commented Aug 3, 2016

@destijl Good call on exercising the file write. The juggling on writing changes introduced an error in py3. Latest commit fixes that and uses tempfile for testing the Yamlwriter. The JSON stuff doesn't need the io/file stuff and there is no JsonArtifactReader to really properly test reading from a json file anyways.

This should be ready now. Let me know if there are any other necessary changes.

artifact_writer = writer.JsonArtifactsWriter()
test_file = os.path.join('test_data', 'definitions.yaml')
artifact_definitions = list(artifact_reader.ReadFile(test_file))

This comment has been minimized.

@destijl

destijl Aug 3, 2016

Member

Are you doing this to just avoid creating a temp file? It's confusing to see almost identical code tested quite differently. I don't think creating a tempfile is a problem given it's standard python, and it cleans up for you automatically.

This comment has been minimized.

@pidydx

pidydx Aug 8, 2016

Contributor

The reason for this is a lack of JsonArtifactReader. I can write a json file to disk, but then the only code in the project that actually reads an artifact definition from a json file would be in the test so I just skipped the writing piece.

Because the yaml.safe_load_all() operates differently than json.loads() reading with 'rb' for json.loads() causes more py2/3 issues in writing a new JsonArtifactReader for a format that isn't supposed to really be on disk as JSON anyways.

This comment has been minimized.

@destijl

destijl Aug 8, 2016

Member

Half-implementing JSON support is not ideal. I assume you do this because your GUI code will consume it? I think if you write a JSON writer, others users of the library will expect to find a reader.

This comment has been minimized.

@pidydx

pidydx Aug 8, 2016

Contributor

Modified to add a JsonArtifactReader

This comment has been minimized.

@destijl

destijl Aug 8, 2016

Member

I think those two test methods are now identical: combine into one class with a helper method and just pass in the relevant reader/writer object.

This comment has been minimized.

@pidydx

pidydx Aug 8, 2016

Contributor

Updated to combined class.

@destijl

This comment has been minimized.

Member

destijl commented Aug 8, 2016

LGTM. I don't see anything outstanding from @joachimmetz so I'm going to merge.

@destijl destijl merged commit fdf5c51 into ForensicArtifacts:master Aug 8, 2016

1 of 2 checks passed

code-review/reviewable 7 files, 40 discussions left (destijl, joachimmetz, pidydx)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

joachimmetz added a commit that referenced this pull request May 13, 2017

Create ArtifactWriter classes for writing ArtifactDefinitions #178
* Added JSON support for reading/writing artifacts.

joachimmetz added a commit that referenced this pull request May 14, 2017

Create ArtifactWriter classes for writing ArtifactDefinitions #178
* Added JSON support for reading/writing artifacts.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment