Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Artifact writer #178

Merged
merged 3 commits into from Aug 8, 2016
Merged

Artifact writer #178

merged 3 commits into from Aug 8, 2016

Conversation

@pidydx
Copy link
Contributor

@pidydx pidydx commented Jul 9, 2016

These are changes I have made while working on building a ForensicArtifact server (more details soon).

I have heard there may be a lot of changes to this library coming soon so I wanted to get this up for initial review. I still need to work out testing for the ArtifactWriter classes.

This PR fixes: #171

This PR is related to: #175
I was actually going to implement the exact solution in the referenced PR and then found it was already waiting in a PR so I just left it at my minimal necessary changes for reference. Being able to add additional types would also be extremely useful, but at a minimum I need a list of valid types for data storage/searching in the artifact server.

I also made changes to ArtifactReader that allow overriding the definitions for similar reasons of being able to easily extend labels.


This change is Reviewable

@joachimmetz
Copy link
Member

@joachimmetz joachimmetz commented Jul 9, 2016

I've split this PR in 4

  • fixes - merged
  • changes to source type and definitions - merged
  • changes to reader - please move to a separate CL, it is unclear to me what the changes try to accomplish
  • changes to writer - please update this CL
@@ -12,6 +12,18 @@
TYPE_INDICATOR_WINDOWS_REGISTRY_VALUE = 'REGISTRY_VALUE'
TYPE_INDICATOR_WMI_QUERY = 'WMI'

INDICATOR_TYPES = frozenset([

This comment has been minimized.

@joachimmetz

joachimmetz Jul 9, 2016
Member

=> TYPE_INDICATORS


class ArtifactsReader(BaseArtifactsReader):
def __init__(self):
self.supported_os = definitions.SUPPORTED_OS

This comment has been minimized.

@joachimmetz

joachimmetz Jul 9, 2016
Member

  • call to super()


class ArtifactsReader(BaseArtifactsReader):
def __init__(self):

This comment has been minimized.

@joachimmetz

joachimmetz Jul 9, 2016
Member

  • docstring
"""


class ArtifactsReader(BaseArtifactsReader):

This comment has been minimized.

@joachimmetz

joachimmetz Jul 9, 2016
Member

docstring and while line

@@ -27,7 +27,7 @@ def testArtifactDefinitionsRegistry(self):
self.assertEqual(len(artifact_registry.GetDefinitions()), 7)

artifact_definition = artifact_registry.GetDefinitionByName('EventLogs')
self.assertNotEquals(artifact_definition, None)
self.assertNotEqual(artifact_definition, None)

This comment has been minimized.

@joachimmetz

joachimmetz Jul 9, 2016
Member

=> assertIsNotNone()

@property
def attributes_definition(self):
"""Dictionary representation of the source attributes

This comment has been minimized.

@joachimmetz

joachimmetz Jul 9, 2016
Member

no need for this white space.

@@ -40,6 +40,13 @@ def type_indicator(self):
u'Invalid source type missing type indicator.')
return self.TYPE_INDICATOR

@property
def attributes_definition(self):

This comment has been minimized.

@joachimmetz

joachimmetz Jul 9, 2016
Member

Why is this a property?

This comment has been minimized.

@pidydx

pidydx Jul 9, 2016
Author Contributor

Also no reason. I was actually going to implement it as a method at first and don't remember why I didn't.

"""

return {'names': self.names}

This comment has been minimized.

@joachimmetz

joachimmetz Jul 9, 2016
Member

=> u'names'

class ArtifactsReader(BaseArtifactsReader):
def __init__(self):
self.supported_os = definitions.SUPPORTED_OS
self.top_level_keys = definitions.TOP_LEVEL_KEYS

This comment has been minimized.

@joachimmetz

joachimmetz Jul 9, 2016
Member

why aren't these properties?

This comment has been minimized.

@pidydx

pidydx Jul 9, 2016
Author Contributor

No reason. I will change them to properties.

@joachimmetz
Copy link
Member

@joachimmetz joachimmetz commented Jul 9, 2016

Being able to add additional types would also be extremely useful, but at a minimum I need a list of valid types for data storage/searching in the artifact server.

Know that this will break portability of the artifact type. The reason for adding it will be that tools can implement their own for testing purposes. It is highly recommended that the artifact type is added to the main project.

@pidydx pidydx force-pushed the pidydx:ArtifactWriter branch from 6d7bff0 to f78208f Jul 9, 2016
@pidydx
Copy link
Contributor Author

@pidydx pidydx commented Jul 9, 2016

ArtifactReader split out and ArtifactWriter updated

import abc
import json
import yaml

This comment has been minimized.

@joachimmetz

joachimmetz Jul 10, 2016
Member

style nit: + white line

"""Class that implements the artifacts reader interface."""

@abc.abstractmethod
def _ConvertArtifact(self, artifact):

This comment has been minimized.

@joachimmetz

joachimmetz Jul 10, 2016
Member

make this method name more explict e.g. CopyArtifactDefitionToDict also why is this method needed? Wouldn't it make more sense to add a CopyToDist() method to ArtifactDefinition ?

This comment has been minimized.

@pidydx

pidydx Jul 10, 2016
Author Contributor

Moving it to ArtifactDefinition as CopyToDict() is probably the better plan. I will do that.


@abc.abstractmethod
def WriteArtifacts(self, artifacts, filename):
"""Reads artifact definitions from a file.

This comment has been minimized.

@joachimmetz

joachimmetz Jul 10, 2016
Member

Reads => Writes

"""

@abc.abstractmethod
def WriteArtifacts(self, artifacts, filename):

This comment has been minimized.

@joachimmetz

joachimmetz Jul 10, 2016
Member

make method name more explicit that you're writing to file (see reader)


@abc.abstractmethod
def FormatArtifacts(self, artifacts):
"""Formats artifact desired output format

This comment has been minimized.

@joachimmetz

joachimmetz Jul 10, 2016
Member

Unclear to me what this method is supposed to do, it also seems to be only used by file-based writers.

This comment has been minimized.

@pidydx

pidydx Jul 10, 2016
Author Contributor

Internally yes. The Artificer server only uses FormatArtifacts to prepare YAML to serve rather than writing a file anywhere.

"""


class ArtifactWriter(BaseArtifactsWriter):

This comment has been minimized.

@joachimmetz

joachimmetz Jul 10, 2016
Member

+ docstring (repeat everywhere else)

artifact_definition['urls'] = artifact.urls
return artifact_definition

def WriteArtifacts(self, artifacts, filename):

This comment has been minimized.

@joachimmetz

joachimmetz Jul 10, 2016
Member

  • docsring

def WriteArtifacts(self, artifacts, filename):
with open(filename, 'w') as file_object:
file_object.write(self.FormatArtifacts(artifacts))

This comment has been minimized.

@joachimmetz

joachimmetz Jul 10, 2016
Member

FormatArtifacts is not implemented?

return artifact_definition

def WriteArtifacts(self, artifacts, filename):
with open(filename, 'w') as file_object:

This comment has been minimized.

@joachimmetz

joachimmetz Jul 10, 2016
Member

what about binary formats?

"""

if isinstance(artifacts, list):
artifact_definitions = [self._ConvertArtifact(artifact) for artifact in artifacts]

This comment has been minimized.

@joachimmetz

joachimmetz Jul 10, 2016
Member

why not have _ConvertArtifact correctly deal with a list?

This comment has been minimized.

@pidydx

pidydx Jul 10, 2016
Author Contributor

I will have to look into this again with moving CopyToDict() to ArtifactDefinition. The main headache here is actually the YAML side with dump vs dump_all and this was mostly for parity. For JSON I could just as easily handle it returning a list of 1 dict instead of just the dict itself.

Returns:
formatted string of artifact_definition
"""

This comment has been minimized.

@joachimmetz

joachimmetz Jul 10, 2016
Member

- white line (repeat everywhere else)

sources = []
for source in artifact.sources:
source_definition = {
'type': source.type_indicator,

This comment has been minimized.

@joachimmetz

joachimmetz Jul 10, 2016
Member

please use explicit unicode string u''

@joachimmetz
Copy link
Member

@joachimmetz joachimmetz commented Jul 10, 2016

Can you elaborate what goal are you trying to accomplish with the writer?

@pidydx
Copy link
Contributor Author

@pidydx pidydx commented Jul 10, 2016

Currently the main purpose I use the writer for is exporting artifacts from the Artificer server in YAML format, but the general idea is being able to take an ArtifactDefinition that has been read from some arbitrary source and validated and write it back in an arbitrary format (typically YAML for file and JSON for API)

@pidydx
Copy link
Contributor Author

@pidydx pidydx commented Jul 10, 2016

Also using the Label/OS/etc tag filtering mentioned on Reader, but that is all still work in progress.

@joachimmetz
Copy link
Member

@joachimmetz joachimmetz commented Jul 10, 2016

Let me give some thought on how to address your use case.

@pidydx
Copy link
Contributor Author

@pidydx pidydx commented Jul 10, 2016

As an additional note, from one of the other discussions, having an ArtifactWriter could make it easier to deal sorting artifacts into their appropriate files, particularly as the collection grows.

sources = []
for source in self.sources:
source_definition = {
'type': source.type_indicator,

This comment has been minimized.

@joachimmetz

joachimmetz Jul 24, 2016
Member

Make sure all your text string are unicode safe either with u'' or with importing the right future


artifact_definitions = [artifact.AsDict() for artifact in artifacts]
json_data = json.dumps(artifact_definitions)
if sys.version_info <= (3, 0):

This comment has been minimized.

@joachimmetz

joachimmetz Jul 24, 2016
Member

Solve this more elegant what if the json data is encoded?

artifact_definitions = [artifact.AsDict() for artifact in artifacts]
json_data = json.dumps(artifact_definitions)
if sys.version_info <= (3, 0):
json_data = unicode(json_data)

This comment has been minimized.

@joachimmetz

joachimmetz Jul 24, 2016
Member

unicode() is not portable and does not handle encodings, please don't use

add something like: https://github.com/log2timeline/plaso/blob/master/plaso/lib/py2to3.py

if isinstance(json_data, py2to3.BYTES_TYPE):
  try:
    return json_data.decode('ascii')
  except UnicodeDecodeError:
     ...
error_location = u'At start'
if last_artifact_definition:
error_location = u'After: {0}'.format(last_artifact_definition.name)
self.fail(u'{0} failed to convert to dict'.format(error_location))

This comment has been minimized.

@joachimmetz

joachimmetz Jul 24, 2016
Member

what are you testing here? Can you rewrite this test for it to become more clear from the code?

This comment has been minimized.

@pidydx

pidydx Jul 25, 2016
Author Contributor

This tests that all artifacts in definitions can be converted to dict w/o raising. All the other tests use the test definitions.

This comment has been minimized.

@pidydx

pidydx Jul 25, 2016
Author Contributor

Related, should this just be except: instead of except errors.FormatError since the test is that it should be able to run AsDict on everything in definitions and not fail for any reason.

This comment has been minimized.

@joachimmetz

joachimmetz Jul 25, 2016
Member

If a test raises it will fail no need to add the self.fail()

This comment has been minimized.

@pidydx

pidydx Jul 26, 2016
Author Contributor

I used self.fail as part of tracking the last successful artifact so the message points at exactly which artifact caused things to go wrong.

file_object = io.StringIO(initial_value=artifacts_yaml)
converted_artifact_definitions = list(artifact_reader.ReadFileObject(
file_object))
if sys.version_info >= (3, 0):

This comment has been minimized.

@joachimmetz

joachimmetz Jul 24, 2016
Member

Find an alternative solution for all the sys.version_info >= (3, 0) checks.

What are you trying to test here that is not cross version portable?

@joachimmetz
Copy link
Member

@joachimmetz joachimmetz commented Jul 24, 2016

@pidydx left some comment, all the check for python 2 versus 3 looks very fragile at the moment. These can be solved in different ways by checking the data at hand not the python version.

try:
return json_data.decode('ascii')
except UnicodeDecodeError:
pass

This comment has been minimized.

@pidydx

pidydx Jul 25, 2016
Author Contributor

@joachimmetz I just put pass here as a place holder for the moment. What is the best thing to do in this case? It seems like letting it just raise the UnicodeDecodeError rather than catching it makes the most sense.

This comment has been minimized.

@destijl

destijl Jul 28, 2016
Member

Add a comment about why this is necessary. I think you just need to make the error message as informative as possible (probably include the json data?) and raise as formaterror.

@pidydx pidydx force-pushed the pidydx:ArtifactWriter branch 3 times, most recently from 1a82b26 to 15ad168 Jul 26, 2016
@pidydx
Copy link
Contributor Author

@pidydx pidydx commented Jul 26, 2016

This should be mostly cleaned up now. I need some feedback on the UnicodeDecodeError issue, but the rest of it should be correct now.

@pidydx pidydx force-pushed the pidydx:ArtifactWriter branch 3 times, most recently from 7bcab9f to 177c044 Jul 27, 2016
@pidydx
Copy link
Contributor Author

@pidydx pidydx commented Jul 29, 2016

Everything here should be ready now.

test_file = os.path.join('test_data', 'definitions.yaml')

artifact_definitions = list(artifact_reader.ReadFile(test_file))
artifacts_yaml = artifact_writer.FormatArtifacts(artifact_definitions)

This comment has been minimized.

@destijl

destijl Aug 2, 2016
Member

your ascii conversion problem is with stringIO right?

Maybe just use an actual temp file and that way you can also exercise the actual file writer code

with tempfile.TemporaryFile() as converted:
  write yaml to converted
  do checks

This comment has been minimized.

@joachimmetz

joachimmetz Aug 2, 2016
Member

FYI io.stringIO is unicode and io.bytesIO is binary.

This comment has been minimized.

@pidydx

pidydx Aug 2, 2016
Author Contributor

So I pushed a change to remove the py2to3 stuff in favor of StringIO/BytesIO if you would prefer to merge it that way, but I will probably switch this to actually doing the temp file if that is acceptable.

I only did all of this juggling because I assumed it would be better to not be writing/deleting temp files as part of unit tests, but if that isn't a concern then I can do it that way.

@pidydx pidydx force-pushed the pidydx:ArtifactWriter branch 2 times, most recently from d8beb1e to 43743ec Aug 2, 2016
@pidydx
Copy link
Contributor Author

@pidydx pidydx commented Aug 3, 2016

@destijl Good call on exercising the file write. The juggling on writing changes introduced an error in py3. Latest commit fixes that and uses tempfile for testing the Yamlwriter. The JSON stuff doesn't need the io/file stuff and there is no JsonArtifactReader to really properly test reading from a json file anyways.

This should be ready now. Let me know if there are any other necessary changes.

artifact_writer = writer.JsonArtifactsWriter()
test_file = os.path.join('test_data', 'definitions.yaml')

artifact_definitions = list(artifact_reader.ReadFile(test_file))

This comment has been minimized.

@destijl

destijl Aug 3, 2016
Member

Are you doing this to just avoid creating a temp file? It's confusing to see almost identical code tested quite differently. I don't think creating a tempfile is a problem given it's standard python, and it cleans up for you automatically.

This comment has been minimized.

@pidydx

pidydx Aug 8, 2016
Author Contributor

The reason for this is a lack of JsonArtifactReader. I can write a json file to disk, but then the only code in the project that actually reads an artifact definition from a json file would be in the test so I just skipped the writing piece.

Because the yaml.safe_load_all() operates differently than json.loads() reading with 'rb' for json.loads() causes more py2/3 issues in writing a new JsonArtifactReader for a format that isn't supposed to really be on disk as JSON anyways.

This comment has been minimized.

@destijl

destijl Aug 8, 2016
Member

Half-implementing JSON support is not ideal. I assume you do this because your GUI code will consume it? I think if you write a JSON writer, others users of the library will expect to find a reader.

This comment has been minimized.

@pidydx

pidydx Aug 8, 2016
Author Contributor

Modified to add a JsonArtifactReader

This comment has been minimized.

@destijl

destijl Aug 8, 2016
Member

I think those two test methods are now identical: combine into one class with a helper method and just pass in the relevant reader/writer object.

This comment has been minimized.

@pidydx

pidydx Aug 8, 2016
Author Contributor

Updated to combined class.

@pidydx pidydx force-pushed the pidydx:ArtifactWriter branch from cc19078 to 1c80af7 Aug 8, 2016
@pidydx pidydx force-pushed the pidydx:ArtifactWriter branch from 7a35330 to 2d6bfb3 Aug 8, 2016
@destijl
Copy link
Member

@destijl destijl commented Aug 8, 2016

LGTM. I don't see anything outstanding from @joachimmetz so I'm going to merge.

@destijl destijl merged commit fdf5c51 into ForensicArtifacts:master Aug 8, 2016
1 of 2 checks passed
1 of 2 checks passed
code-review/reviewable 7 files, 40 discussions left (destijl, joachimmetz, pidydx)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
joachimmetz added a commit that referenced this pull request May 13, 2017
* Added JSON support for reading/writing artifacts.
joachimmetz added a commit that referenced this pull request May 14, 2017
* Added JSON support for reading/writing artifacts.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

3 participants