# Windows Registry Plugin Code Lab

## Intended Audience

This lab is for people who want to learn how to write and execute a Plaso plugin in Python.  This tutorial assumes:

+  You have a functional development environment
+  You have used Plaso
+  You are familiar with the Python programming language
+  You are looking to write a plugin (as opposed to a parser, which is covered in a [separate codelab](http://plaso.kiddaland.net/developer/code-lab/text-parser))

## Objective

This lab will teach you how to write a Windows Registry plugin with tests for the Plaso framework.  By the end you will be able to:

+ Write a Windows Registry plugin for plaso
+ Write unit tests for the plugin
+ Run the plugin as part of plaso/log2timeline

## Expectations

This lab should take you a couple of hours to complete.  Some of this depends on any build issues you run into.  We are not attempting to get you to check in code yet; the goal is to demonstrate how a plugin is written. For this to become a checked-in plugin you need to write a plugin against a registry key/value that is not already parsed and split the code here into several files (layout explained below).

## Introduction

Welcome to writing a Plaso plugin!  From the outside, writing a plugin can be daunting, but once you get your dev environment going, you've fought half the battle.  This code lab features a simple Windows Registry plugin, but the formula can be used for any type of plugins (another codelab will demonstrate a SQLite database plugin and the hope is that with these two codelabs we should have the plugin writing more or less covered). You may also be interested in the text parser codelab.

## Before Starting

Get familiar with the [developers guide](https://github.com/log2timeline/plaso/wiki/Developers-Guide) and more specifically the [style guide](https://github.com/log2timeline/plaso/wiki/Style-guide). To make the code easier to maintain we follow a style guide, partially based on the [Google Python Style Guide](http://google-styleguide.googlecode.com/svn/trunk/pyguide.html) but slightly modified to fit our needs.

We also follow a code review process that is discussed on the [code review](https://github.com/log2timeline/plaso/wiki/Codereview) site.

This is an iPython notebook, and if you are not familiar with it then here is the brief introduction. This is basically an iPython shell wrapped up in a pretty GUI (browser window). You can execute any Python code you wish, and quickly go back, edit and re-run code. To run the code, click the window with the code segment and press "Shift+Enter", that way you will see that the bracket on the left will change to indicate it has been executed and you may see some output below (if the code segment produced any output).

One thing to make note of is that some of the code segments depend on previous code segments having been executed. So in order for this codelab to work properly you need to execute **EVERY** code segment that is presented here, especially all class declarations and import statements, but to be sure just execute them all [except those explicitly stated as **optional**].

The first thing we need to do is to make sure your development environment is up to date. Run the following code snippet (below) by clicking the tab and pressing "**SHIFT-ENTER**". If you see a printed warning, start by upgrading the tool before continuing.

In [None]:
import inspect

# Let's put this in a method so we can easily call it from other parts of the codelab.
def PrintClassHelp(class_object, filter_string=''):
  """Prints a help string for a given class object.

  Args:
    class_object: The class that we are about to inspect.
    filter_string: Filter class members that start with a particular string.
  """
  # Print the docstring of the class.
  print u''
  print class_object.__doc__
   
  # Print information for every member function.
  additional_members = []
  for member_name, member_value in inspect.getmembers(class_object):
    # Check to see if we are filtering out members starting with
    # a particular string.
    if filter_string and not member_name.startswith(filter_string):
      continue
    if inspect.ismethod(member_value):
      args = inspect.getargspec(member_value)
      doc_string = member_value.__doc__
        
      print u'{0}{1:>20s}({2}){0:>10}\n\n{3}\n{4}\n\n'.format(
          '*'*5, member_name, u','.join(args.args), doc_string, '-'*80)
    else:
      if member_name.startswith('_'):
        continue
      if member_name in ['classes', 'parent_class', 'plugin_feature', 'top_level_class']:
        continue
        
      additional_members.append(u'{} = {}'.format(member_name, repr(member_value)))

  if additional_members:
    print '\n'
    print '*** Additional Members of Class ***\n\n ',
    print u'\n  '.join(additional_members)


## Disclaimer

During this codelab we will be using the iPython notebook interface for everything, which means we have all the classes and code in a single file. Once we deploy the code to the actual codebase we would need to save the code in several places, typically something like:

+ plaso/parsers/winreg_plugins/myplugin.py
+ tests/parsers/winreg_plugins/myplugin.py
+ plaso/formatters/myplugin.py [this is likely not needed in the case of a Windows Registry plugin]
+ tests/formatters/myplugin.py [only if you created a new formatter]

And make necessary changes to:
+ plaso/parsers/\_\_init\_\_.py
+ plaso/formatters/\_\_init\_\_.py

We may also want to change the **plaso/frontend/presets.py** to include the plugin in a preset [this depends on the plugin itself, sometimes you want to include the parser and all its plugins, sometimes a specific plugin -- which is the case here, you don't want to specifically load one registry plugin; you want to load them all typically].

We are however omitting all these details to make the codelab easier to follow along. This can also be used for people to test their plugins and play with them without the need to mess with the codebase and once the plugin is fully functional then create the necessary files and start the code review process.

There are also a lot of comments in the code in this codelab that would typically be omitted from a released plugin. To see the actual code that is used as an example here click on one of the below links:

+ [plaso/parsers/winreg_plugins/lfu.py](https://github.com/log2timeline/plaso/tree/master/plaso/parsers/winreg_plugins/lfu.py)
+ [plaso/parsers/winreg_plugins/lfu_test.py](https://github.com/log2timeline/plaso/tree/master/plaso/parsers/winreg_plugins/lfu_test.py)

## Writing the Plugin

We are going to write the plugin completely in this iPython notebook, and test it there too. You need nothing other than this notebook, a sample registry file and the plaso libs.

Before writing a plugin, and assuming we are attempting to parse a particular registry key, ask yourself these questions:

+ Examine the registry key itself. What values does it have?
+ Are there any subkeys that need to be included?
+ How are the timestamps formatted? Is the only timestamp the last time the registry key was modified or are there other timestamps embedded in the content of one of the keys/values?
+ Are there any binary values that need to be interpreted?
+ Are there any keys/values that need some decoding?
+ What does the registry key's last written time mean? What does it signify?

Remember that we are not about to submit this plugin for review, since it is already checked in; this is only for demonstration purposes. Please refer to the [plaso roadmap](https://docs.google.com/a/kiddaland.net/spreadsheet/ccc?key=0An0H7z4S52FldFFuQ2tHRDRsaEdzeDRBVXN5SXNfRnc#gid=0) for open parser/plugin assignments (or add your own).

Before we start looking at the code we need to download the registry hive to a temporary location so that we can use it for the remainder of this codelab. For this you need an Internet connection.

In [None]:
# Import a library to make the HTTP connection.
import urllib2

# Import a library so that we can create a temporary file.
import tempfile

# The URL to the SYSTEM hive we are about to use for our testing.
url = u'https://github.com/log2timeline/plaso/raw/master/test_data/SYSTEM'

# Download the file.
response = urllib2.urlopen(url)
data = response.read()

# Save it in a temporary file (we don't want it to be deleted).
test_file = tempfile.NamedTemporaryFile(delete=False)

# Save the name since that is what we will refer to later in the code.
test_registry_file_name = test_file.name

# Write data to it.
test_file.write(data)

# Close the file.
test_file.close()

# Print a confirmation.
print u'File downloaded and saved'

### The header

First things first, every file checked into the project needs a header. That header contains, among other things, a docstring as well as import statements.

The first line should be an encoding definition. After that comes a docstring, which should not be wider than 80 characters. If you need more than a single line to describe the parser, still keep the first line to at most 80 characters, ending with a dot. Then you can add a more detailed description two lines down (an example of that can be seen below).

The import order is defined in the [style guide](http://google-styleguide.googlecode.com/svn/trunk/pyguide.html?showone=Imports_formatting#Imports_formatting):

    Imports are always put at the top of the file, just after any module comments and doc strings and before module globals and constants. 
    Imports should be grouped with the order being most generic to least generic:

    + standard library imports
    + third-party imports
    + application-specific imports

    Within each grouping, imports should be sorted lexicographically, ignoring case, according to each module's full package path.


In [None]:
# -*- coding: utf-8 -*-
"""Plug-in to collect the Less Frequently Used Keys.

This demonstrates how a longer line should be broken up."""

import logging

# We need to be able to create new event objects, specifically
# for a Windows registry event we would like to get access to the 
# WindowsRegistryEvent EventObject.
from plaso.events import windows_events
# Import the registry plugin interface.
from plaso.parsers.winreg_plugins import interface

  
### The Plugin Class

We now need to know what kind of plugin you are trying to implement. For now we know we are trying to parse a specific Windows Registry key using a registry plugin. If we look at the [registry plugin interface](https://plaso-api.readthedocs.org/en/latest/plaso.parsers.winreg_plugins.html#module-plaso.parsers.winreg_plugins.interface) we notice that there are several different sorts of registry filters, and a plugin interface that we need to implement. 

For LFU, we know the name of the key, so we can use the [WindowsRegistryKeyPathFilter](https://plaso-api.readthedocs.org/en/latest/plaso.parsers.winreg_plugins.html#plaso.parsers.winreg_plugins.interface.WindowsRegistryKeyPathFilter). The specific key we're looking for is in the current control set of the SYSTEM hive, inside the key: \CURRENT_CONTROL\Control\Session Manager. And since we do have a fixed key it makes sense to use the key based plugin to parse it. Let's look at some of the other things we need to consider.

We need to examine the registry key itself and its values. We need the file and some way of examining the content. Let's open up the registry file for future use.

In [None]:
# Import necessary libraries from dfVFS so we can open up the file.
from dfvfs.lib import definitions
from dfvfs.path import factory as path_spec_factory
from dfvfs.resolver import resolver as path_spec_resolver

# Import the Windows Registry library so we can calculate
# the current control set among other things.
from plaso.dfwinreg import registry as winregistry

# Find the file and get a handle to it.
path_spec = path_spec_factory.Factory.NewPathSpec(
    definitions.TYPE_INDICATOR_OS, location=test_registry_file_name)

# Open the registry file.
registry_reader = winregistry.PathSpecWinRegistryFileReader(path_spec)
registry = winregistry.WinRegistry(registry_file_reader=registry_reader)

lfu_key = registry.GetKeyByPath('HKEY_LOCAL_MACHINE\\System\\CurrentControlSet\\Control\\Session Manager')
print lfu_key.path

Plaso's registry library (dfwinreg) tries to create a uniform way to access the registry, doing various fixups to enable bulk processing. There are a few things going on here that are a little non-obvious:

+ Even though we fed the Plaso registry library a SYSTEM registry hive file, we still specified the full path to the registry key as it would appear on a running Windows system.
+ We specified CurrentControlSet, which doesn't actually exist in the SYSTEM registry hive file, but does when Windows is running. Plaso helpfully worked out what the CurrentControlSet would be, and returned the appropriate key.
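
The CurrentControlSet fixup can be pictured with a small stand-alone sketch. On a Windows system the `Current` value under the `Select` key of the SYSTEM hive holds the number of the active control set. The helper below, `ResolveCurrentControlSet`, is a hypothetical name used for illustration only; it shows the mapping and is not how dfwinreg actually implements it.

```python
def ResolveCurrentControlSet(key_path, current_control_set_number):
  """Rewrites a CurrentControlSet key path to a numbered control set.

  Hypothetical helper for illustration only, not part of dfwinreg.

  Args:
    key_path: the key path as it would appear on a running system.
    current_control_set_number: the integer stored in the Current value
        of the Select key.

  Returns:
    The key path with CurrentControlSet replaced by ControlSetNNN.
  """
  control_set = 'ControlSet{0:03d}'.format(current_control_set_number)
  segments = key_path.split('\\')
  # Registry key names are case-insensitive, so compare in lower case.
  segments = [
      control_set if segment.lower() == 'currentcontrolset' else segment
      for segment in segments]
  return '\\'.join(segments)


resolved_path = ResolveCurrentControlSet(
    'HKEY_LOCAL_MACHINE\\System\\CurrentControlSet\\Control\\Session Manager',
    1)
print(resolved_path)
```

With a `Current` value of 1 the path resolves to the `ControlSet001` key, which is the key path used in the next code cell.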

Now that we have the key, let's take a look at what information is stored in it.

In [None]:
lfu_key = registry.GetKeyByPath('HKEY_LOCAL_MACHINE\\System\\ControlSet001\\Control\\Session Manager')

print u'Key has {} values.'.format(lfu_key.number_of_values)
print u'Key has {} subkeys.'.format(lfu_key.number_of_subkeys)

print u'Values:'
for value in lfu_key.GetValues():
    print u'  {} = {}'.format(value.name, value.data)

print u'*'*80
print u'Sub keys:'
for sub_key in lfu_key.GetSubkeys():
    print u'  {} [{} keys, {} values]'.format(sub_key.path, sub_key.number_of_subkeys, sub_key.number_of_values)

There is a value there called **BootExecute**, and that is the value that we are interested in, see [this Windows documentation](http://technet.microsoft.com/en-us/library/cc963230.aspx).

We don't need any subkeys for this particular registry plugin. There is no timestamp stored in the data set; the only timestamp we are interested in is the last written time of the registry key itself.

We should be ready to start writing the plugin itself, so we can start going over the actual Windows Registry plugin interface and what needs to be implemented.

*One important disclaimer: since we are using an already checked-in plugin as an example, we append the word "Foo" or "foo" to many of the class names and other fields to avoid namespace collisions.*

#### Important Class Constants

In plaso terms, as soon as we have more than a single "parser" that attempts to parse a particular file format, we convert that to a plugin system. A very generic parser then takes care of all file format parsing, leaving the plugins to do minimal work: defining a few class constants that are used to match the particular file or file segments to what the plugin is designed to parse, and a function to process the data collected.

For Windows Registry KeyPlugin we need to define the following class attributes:

+ **NAME**: Name of the plugin, this should be short, concise and unambiguous. 
+ **DESCRIPTION**: The longer form description of what the plugin is supposed to do.
+ **FILTERS**: This is a list (technically a frozenset) of filters which the parser will attempt to match against every registry key. When the filter matches, the registry parser will call the plugin to examine the key. The plugin does not need to define software redirects (like Wow6432, that's done automatically).
+ **URLS**: This is a list of URLs that can be used to read additional information about this particular registry key. This could be a link to Technet or some blogs discussing how to interpret the values (or both).

For our plugin's purposes we have these values:

    NAME = 'windows_boot_execute'
    DESCRIPTION = u'Parser for Boot Execution Registry data.'

    FILTERS = frozenset([
      interface.WindowsRegistryKeyPathFilter(
          u'HKEY_LOCAL_MACHINE\\System\\CurrentControlSet\\Control\\'
          u'Session Manager')])
    URLS = ['http://technet.microsoft.com/en-us/library/cc963230.aspx']

One thing to note here is the creation of the filter object. It's pretty simple in this case, as we just need to tell it the path to the key we want to process, but there are also other filters for keys that start or end with a certain string, or that contain a certain value. If some registry data you're trying to parse can't be readily selected with these filters, adding one is pretty simple; just take a look at the [interface](https://github.com/log2timeline/plaso/blob/master/plaso/parsers/winreg_plugins/interface.py#L9).
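
To make the idea of a filter concrete, here is a minimal stand-alone sketch of an exact-path filter and a suffix filter. The class names `KeyPathFilterSketch` and `KeyPathSuffixFilterSketch` are hypothetical, and the `Match` method is an assumption about the general shape of such an object, not a copy of Plaso's interface.

```python
class KeyPathFilterSketch(object):
  """Hypothetical stand-in for a key path filter (not the Plaso class)."""

  def __init__(self, key_path):
    # Registry paths are case-insensitive, so normalize once up front.
    self._key_path = key_path.upper()

  def Match(self, key_path):
    """Returns True if the key path equals the path given to the filter."""
    return key_path.upper() == self._key_path


class KeyPathSuffixFilterSketch(object):
  """Hypothetical filter that matches keys ending with a given suffix."""

  def __init__(self, suffix):
    self._suffix = suffix.upper()

  def Match(self, key_path):
    """Returns True if the key path ends with the suffix."""
    return key_path.upper().endswith(self._suffix)


exact_filter = KeyPathFilterSketch(
    'HKEY_LOCAL_MACHINE\\System\\CurrentControlSet\\Control\\Session Manager')
suffix_filter = KeyPathSuffixFilterSketch('\\Session Manager')

test_path = (
    'HKEY_LOCAL_MACHINE\\System\\CurrentControlSet\\Control\\session manager')
print(exact_filter.Match(test_path))
print(suffix_filter.Match(test_path))
```

Both filters match the test path even though its case differs, which is the behaviour you would want from any new filter you add.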

####  GetEntries

The underlying parser takes care of opening up the registry hive, reading in every registry key there is and then trying to match that key with each plugin's registry filters, as defined in the plugins' FILTERS constants. 

When a matching key is found the GetEntries() function of the appropriate plugin is called. The purpose of this function is to extract all potential event objects available from that registry key.
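
The matching step described above can be sketched as a simple loop. Everything in this block is a hypothetical stand-in (`PathFilterSketch`, `PluginSketch`, `DispatchKeys`), and the simplified `GetEntries` takes only a key path; the real parser passes richer objects, as shown in the rest of this section.

```python
class PathFilterSketch(object):
  """Hypothetical filter that matches one exact key path."""

  def __init__(self, key_path):
    self._key_path = key_path.lower()

  def Match(self, key_path):
    return key_path.lower() == self._key_path


class PluginSketch(object):
  """Hypothetical plugin that records the key paths it was asked to parse."""

  FILTERS = frozenset([PathFilterSketch('HKLM\\System\\Select')])

  def __init__(self):
    self.parsed_key_paths = []

  def GetEntries(self, key_path):
    self.parsed_key_paths.append(key_path)


def DispatchKeys(key_paths, plugins):
  """Calls GetEntries on every plugin that has a filter matching a key."""
  for key_path in key_paths:
    for plugin in plugins:
      if any(key_filter.Match(key_path) for key_filter in plugin.FILTERS):
        plugin.GetEntries(key_path)


plugin = PluginSketch()
DispatchKeys(['HKLM\\System\\Setup', 'HKLM\\System\\Select'], [plugin])
print(plugin.parsed_key_paths)
```

Only the key that matches one of the plugin's filters reaches `GetEntries`; the other key is skipped without the plugin ever seeing it.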

There are a couple of objects passed to the GetEntries function, let's take a look at them now.

In [None]:
PrintClassHelp(interface.WindowsRegistryPlugin, 'GetEntries')

By convention, each plugin implementation (Registry, SQLite, Bencode etc.) defines a method named "GetEntries", which is the method called to invoke a plugin against a particular file. To explain the arguments in a little more detail:

The parser_mediator object is an object passed to each plugin (and parser) that provides access to shared Plaso functionality, like the event output queue, system knowledge base and error signalling functionality.

The registry_key object is the registry key the plugin should attempt to parse. We also need to catch other keyword arguments that may be sent to the plugin but that we are not using, thus the **\*\*unused_kwargs**.

The proper definition of the GetEntries function for a Windows registry plugin is therefore:

    def GetEntries(self, parser_mediator, registry_key, **unused_kwargs):
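
A short sketch of why the catch-all keyword argument matters. `PluginSignatureSketch` is a hypothetical class that only mirrors the signature, and the extra keyword names passed in the call (`codepage`, `registry_file_type`) are made up for the example.

```python
class PluginSignatureSketch(object):
  """Hypothetical class that only mirrors the GetEntries signature."""

  def GetEntries(self, parser_mediator, registry_key, **unused_kwargs):
    # Only the two named arguments are used; anything else the caller
    # passes by keyword lands in unused_kwargs and is ignored.
    return (parser_mediator, registry_key)


sketch = PluginSignatureSketch()
result = sketch.GetEntries(
    'mediator', 'key', codepage='cp1252', registry_file_type='SYSTEM')
print(result)
```

Without `**unused_kwargs` the same call would raise a `TypeError` about unexpected keyword arguments.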




In [None]:
from plaso.parsers import plugins

PrintClassHelp(plugins.BasePlugin, 'GetEntries')

#### Timestamp

When dealing with the Windows Registry the most common timestamp is the Windows FILETIME timestamp. To translate that into the internal microseconds since Epoch that Plaso uses we use the timelib library, and more specifically the **timelib.Timestamp.FromFiletime(file_time)**.
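
The arithmetic behind such a conversion is short enough to sketch by hand. A FILETIME counts 100-nanosecond intervals since 1601-01-01 UTC, so the conversion subtracts the offset to 1970-01-01 and divides by 10. The function name `FiletimeToPosixMicroseconds` is a hypothetical stand-in and is not the timelib implementation.

```python
# Number of 100-nanosecond intervals between 1601-01-01 and 1970-01-01.
FILETIME_TO_POSIX_DELTA = 116444736000000000


def FiletimeToPosixMicroseconds(filetime):
  """Converts a FILETIME value to microseconds since 1970-01-01 UTC."""
  # FILETIME counts 100-nanosecond intervals; 10 of them make a microsecond.
  return (filetime - FILETIME_TO_POSIX_DELTA) // 10


posix_epoch = FiletimeToPosixMicroseconds(116444736000000000)
one_second_later = FiletimeToPosixMicroseconds(116444736000000000 + 10000000)
print(posix_epoch)
print(one_second_later)
```

The first call yields 0 (the POSIX epoch itself) and the second yields 1000000, one second expressed in microseconds.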

However, there may be other timestamps embedded in a key value. Let's examine what other options we've got.

In [None]:
# Import the library we are about to inspect.
from plaso.lib import timelib

# You can easily change the name of the class here if you want to explore a different
# class and its members.
PrintClassHelp(timelib.Timestamp)

### Event Object - WindowsRegistryEvent

Each timestamped event is described as an EventObject. It's almost always easier to create a convenience class to make it easier to create an EventObject specific to the events that your plugin or parser is producing. However since we are writing a Windows Registry plugin we already have a convenience EventObject that we can take advantage of, called WindowsRegistryEvent.

Let's examine the WindowsRegistryEvent:

In [None]:
from plaso.events import windows_events

PrintClassHelp(windows_events.WindowsRegistryEvent, '__init__')

The event object is pretty simple really. We need to set the following arguments:

+ **filetime**: The timestamp, either of the registry key itself or extracted from the value itself.
+ **key_path**: The full path to the registry key, stored in the registry_key.path value.
+ **values_dict**: This is a dict object that contains all the text that we would like to be contained in the message string. By default the formatter prints this out as "key_1 = value_1, key_2 = value_2, etc" however this can be easily changed so the registry event is presented differently.
+ **usage**: This is the description of the meaning of the timestamp, by default this is LAST_WRITTEN.
+ **offset**: An offset in bytes into the registry file where this value is, typically this value does not need to be filled in.
+ **urls**: A list of URLs that provide more information or context about the event.
+ **source_append**: By default the source attribute is "REGISTRY_TYPE key", eg: "SOFTWARE key". However some plugins may want to add additional information to the source, further describing it, eg: "NTUSER key UserAssist" or something like that. This value is designed for that.

Typically a plugin only needs to worry about filling in these values:
+ filetime
+ key_path
+ values_dict

And that's it really, we only need to worry about filling in the values for the class constants and then fill in the GetEntries function.

Here is the entire code for the BootExecutePlugin:

In [None]:
class BootExecutePluginFoo(interface.WindowsRegistryPlugin):
  """Plug-in to collect the BootExecute Value from the Session Manager key."""

  NAME = u'windows_boot_execute_foo'
  DESCRIPTION = u'Parser for Boot Execution Registry data.'

  FILTERS = frozenset([
      interface.WindowsRegistryKeyPathFilter(
          u'HKEY_LOCAL_MACHINE\\System\\CurrentControlSet\\Control\\'
          u'Session Manager')])

  URLS = [u'http://technet.microsoft.com/en-us/library/cc963230.aspx']

  def GetEntries(self, parser_mediator, registry_key, **kwargs):
    """Gather the BootExecute Value, compare to default, return event.

    The rest of the values in the Session Manager key are in a separate event.

    Args:
      parser_mediator: A parser mediator object (instance of ParserMediator).
      registry_key: A Windows Registry key (instance of
                    dfwinreg.WinRegistryKey).
    """
    values_dict = {}
    for value in registry_key.GetValues():
      if value.name == u'BootExecute':
        # MSDN: claims that the data type of this value is REG_BINARY
        # although REG_MULTI_SZ is known to be used as well.
        if value.DataIsString():
          value_string = value.data
        elif value.DataIsMultiString():
          value_string = u''.join(value.data)
        elif value.DataIsBinaryData():
          value_string = value.data
        else:
          value_string = u''
          error_string = (
              u'Key: {0:s}, value: {1:s}: unsupported value data type: '
              u'{2:s}.').format(
                  registry_key.path, value.name, value.data_type_string)
          parser_mediator.ProduceParseError(error_string)

        value_dict = {u'BootExecute': value_string}
        event_object = windows_events.WindowsRegistryEvent(
            registry_key.last_written_time, registry_key.path, value_dict,
            offset=registry_key.offset, urls=self.URLS)
        parser_mediator.ProduceEvent(event_object)

      else:
        values_dict[value.name] = value.data

    event_object = windows_events.WindowsRegistryEvent(
        registry_key.last_written_time, registry_key.path, values_dict,
        offset=registry_key.offset, urls=self.URLS)
    parser_mediator.ProduceEvent(event_object)
  
from plaso.parsers import winreg
winreg.WinRegistryParser.RegisterPlugin(BootExecutePluginFoo)

[**optional code segment**] If you make some modifications to the class above you need to first de-register it before you can register it again. To be able to de-register it use the code block below:

In [None]:
# OPTIONAL CODE BLOCK !! DON'T EXECUTE UNLESS YOU HAVE MADE SOME 
# CHANGES TO THE PARSER CODE AND WANT TO REGISTER IT AGAIN!!!
winreg.WinRegistryParser.DeregisterPlugin(BootExecutePluginFoo)

### Writing the Formatter

Have you ever noticed the message string when you print out an event?


In [None]:
import datetime

from plaso.formatters import manager as formatters_manager
from plaso.formatters import mediator as formatter_mediator

# Import the Windows Registry formatter.
from plaso.formatters import winreg

datetime_now = datetime.datetime.utcnow()

# Let's create a dummy event.
demo_event = windows_events.WindowsRegistryEvent(
    timelib.Timestamp.FromPythonDatetime(datetime_now),
    '\\dummy\\path\\key',
    {'foo': 'bar', 'stuff': 'more stuff'})

# And print the string.
print demo_event.GetString()

# And to re-iterate, let's print the message string.
formatter_mediator_object = formatter_mediator.FormatterMediator()
message_string, _ = formatters_manager.FormattersManager.GetMessageStrings(formatter_mediator_object, demo_event)

print u'MESSAGE STRING: {}'.format(message_string)

Did you notice that in the event above you never really told it how to construct this message string? How does the tool then know how to print it out?

That is the purpose of the formatter. The formatter is a simple class that defines what attributes to use and how they are put together to form this message string.

You're going to need one for any parser or plugin you create (or more importantly any data type that you might use).

The way the formatter works is that it looks at the data_type attribute in the EventObject and matches that to the formatter's DATA_TYPE attribute. If they are the same, the formatter proceeds to process that EventObject and construct the message string.

Formatters go in separate files under plaso/formatters.  Don't forget the copyright at the top!  

For the most part, you're just setting some values with formats.  You'll want to set up structures that you want to see in your timeline.

Most importantly (to re-iterate), the **DATA_TYPE** must match the data_type attribute from the EventObject from the last section.  Watch out for typos here -- there is no warning.
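
The matching on DATA_TYPE, and the silent failure on a typo, can be illustrated with a small sketch. All names here (`FormatterSketch`, `FORMATTERS`, `GetMessageString`) and the data type string `sketch:registry:key` are hypothetical, and events are plain dicts to keep the example self-contained.

```python
class FormatterSketch(object):
  """Hypothetical formatter; DATA_TYPE is what ties it to an event."""

  DATA_TYPE = 'sketch:registry:key'

  def GetMessage(self, event):
    return 'Key: {0:s}'.format(event['key_path'])


# Maps each DATA_TYPE to its formatter, filled in at registration time.
FORMATTERS = {FormatterSketch.DATA_TYPE: FormatterSketch()}


def GetMessageString(event):
  """Returns the message string, or None if no formatter matches."""
  formatter = FORMATTERS.get(event['data_type'])
  if formatter is None:
    return None
  return formatter.GetMessage(event)


matched = GetMessageString(
    {'data_type': 'sketch:registry:key', 'key_path': '\\dummy\\path\\key'})
typo = GetMessageString(
    {'data_type': 'sketch:registy:key', 'key_path': '\\dummy\\path\\key'})
print(matched)
print(typo)
```

The second event misspells the data type, so no formatter is found and no message is produced; nothing raises an error, which is why typos are easy to miss.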

There are two formatters that you can use, the simple **EventFormatter** and the **ConditionalEventFormatter**. The former should only be used if you are absolutely sure all the attributes mentioned there are going to be set for each and every event object created. That means that for the vast majority of the formatters the ConditionalEventFormatter should be the formatter of choice.

There are two class constants that should always be set, irrespective of the choice of formatter:

+ **SOURCE_SHORT**: This should match one of the common sources, eg. LOG, WEBHIST, etc. This should closely correspond to the TLN format by H. Carvey as a short description of the source, almost like a short name for the category of the source.
+ **SOURCE_LONG**: Since the category itself is not sufficient to describe the source we have an extra field called SOURCE_LONG that further defines that, for instance a browser history extracted from Chrome browser will have the source short set to WEBHIST, indicating that this comes from a web history, but the SOURCE_LONG contains the text "Chrome History", setting that apart from other browsers.

For the simple EventFormatter two class constants have to be set (or at least one):

+ **FORMAT_STRING**: A Unicode string that contains formatting information; place all attribute names in {}. This is just a typical Python formatting string, so all typical [rules](http://docs.python.org/2/library/string.html#formatspec) apply. Timestamp, filename/path, username, hostname, etc. are presented in other fields and should not be a part of the message string.
+ **FORMAT_STRING_SHORT**: This is only needed when the resulting message string may exceed 80 characters in width and you don't want the short message string to simply contain the first 77 characters of the longer version; in that case you can construct your own condensed one.
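
The shortening behaviour can be sketched as follows. The helper `GetShortMessage` is hypothetical, and appending `...` after the first 77 characters is an assumption made here to reach the 80 character limit; check the formatter code for the exact behaviour.

```python
MAXIMUM_SHORT_MESSAGE_LENGTH = 80


def GetShortMessage(message, short_message=None):
  """Returns the condensed message if given, else a truncated long one."""
  if short_message is None:
    short_message = message
  if len(short_message) > MAXIMUM_SHORT_MESSAGE_LENGTH:
    # Keep the first 77 characters and mark the cut with an ellipsis.
    return short_message[:MAXIMUM_SHORT_MESSAGE_LENGTH - 3] + '...'
  return short_message


long_message = 'BootExecute: ' + 'autocheck autochk * ' * 10
truncated = GetShortMessage(long_message)
condensed = GetShortMessage(
    long_message, short_message='BootExecute: (10 entries)')
print(len(truncated))
print(condensed)
```

Supplying your own condensed string keeps the short message meaningful instead of cutting it off mid-value.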

If you use the conditional formatter you need to define the following class constants:

+ **FORMAT_STRING_PIECES**: The same as the FORMAT_STRING, except that this is a list and only one attribute name should be defined per entry. If an attribute is not set in the event object then that particular entry in the list will be omitted.
+ **FORMAT_STRING_SHORT_PIECES**: Same as the FORMAT_STRING_SHORT except in the same format as FORMAT_STRING_PIECES, that is as a list.
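
A minimal sketch of the conditional behaviour: each piece names one attribute, and a piece is dropped when the event does not carry that attribute. `FormatConditionally` is a hypothetical helper, the attribute names are made up, and joining the pieces with a single space is an assumption for the example.

```python
import re

FORMAT_STRING_PIECES = [
    'Key: {key_path}',
    'Value: {value_name}',
    'Data: {value_data}']

# Finds the attribute name inside the first pair of braces of a piece.
_ATTRIBUTE_NAME = re.compile(r'\{([a-z_]+)')


def FormatConditionally(pieces, attributes):
  """Formats only the pieces whose attribute is set."""
  rendered = []
  for piece in pieces:
    match = _ATTRIBUTE_NAME.search(piece)
    # Skip the piece when its attribute is missing from the event.
    if match and attributes.get(match.group(1)) is None:
      continue
    rendered.append(piece.format(**attributes))
  return ' '.join(rendered)


message = FormatConditionally(
    FORMAT_STRING_PIECES,
    {'key_path': '\\Session Manager', 'value_data': 'autocheck'})
print(message)
```

Because `value_name` is not set, the middle piece is omitted and the message contains only the key and data pieces. A plain format string would have raised a `KeyError` in the same situation.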


**HOWEVER** for our purposes we don't need to write a formatter since we are using an already existing event object, the WindowsRegistryEvent, and for that event object there is an already existing event formatter.

In [None]:
from plaso.formatters import winreg as winreg_formatter

PrintClassHelp(winreg_formatter.WinRegistryGenericFormatter)

## Test The Plugin

It is very important to test the plugin, to see if it can at least handle our sample dataset.


### Parse the Key Using The Plugin

We can use the code below to test our parsing, to see if the plugin is capable of parsing the registry key we provided it with.

In [None]:
from plaso.engine import knowledge_base
from plaso.engine import queue
from plaso.engine import single_process

from plaso.lib import errors
from dfvfs.lib import definitions

from plaso.parsers import mediator as parsers_mediator

from dfvfs.path import factory as path_spec_factory
from dfvfs.resolver import resolver as path_spec_resolver
# A library for testing parsers
from tests.parsers import test_lib as parsers_test_lib

# Create the plugin object
test_plugin = BootExecutePluginFoo()


# Create a mock knowledgebase object.
knowledge_base_object = knowledge_base.KnowledgeBase()

# Create a parser mediator and the necessary queues.
event_queue = single_process.SingleProcessQueue()
event_queue_producer = single_process.SingleProcessItemQueueProducer(event_queue)

parser_error_queue = single_process.SingleProcessQueue()
parser_error_queue_producer = single_process.SingleProcessItemQueueProducer(
      parser_error_queue)

this_parser_mediator = parsers_mediator.ParserMediator(
    event_queue_producer, parser_error_queue_producer, knowledge_base_object)

print u'Parsing key using: {}'.format(test_plugin.plugin_name)

# We already have the registry key extracted from a previous step.
# We can just pass that to the plugin.
print lfu_key.path
test_plugin.Process(this_parser_mediator, lfu_key)

# Set up a consumer to read events emitted by our plugin.
event_queue_consumer = parsers_test_lib.TestItemQueueConsumer(event_queue)
# Read all the events.
event_queue_consumer.ConsumeItems()
event_objects = event_queue_consumer.event_objects

print u'Processing of registry key is done.'
print u'Able to extract: {} events from the key.'.format(len(event_objects))

Let's print out the event objects that we managed to extract from this plugin.

In [None]:
# Print out the content of the extracted events.
for index, event_object in enumerate(event_objects):
  print u'*' * 80
  print u'    EVENT NUMBER: {}'.format(index)
  print u'-'*80
  print u'Event:'
  print event_object.GetString()
  print u''


### Writing the Tests

Unit tests are designed to make sure your code is doing what you intended it to do, as well as to let other people know when their refactor broke your code.  This will also assist you when you are writing your code by doing a sanity check on your parser to make sure it works the way you expect it to.

The tests go in their own file, in this case the file plaso/winreg_plugins/lfu_test.py.

For the tests to work the formatter needs to be imported. However, since the formatter is typically named the same as the actual parser (not in the case of a Windows Registry plugin), we may need to import the formatter under a separate name. And since you don't actually use the formatter directly in the file, you end up needing a pylint statement to suppress error messages during linting.  The other imports you'll see throughout this code lab. Since everything is in the same namespace here we don't really need to import the formatter, but this is typically added:

    # pylint: disable-msg=unused-import
    from plaso.formatters import winreg as winreg_formatter

The pylint statement needs to be there to make sure that pylint does not complain about an unused import: we are not directly using the formatter, we are just importing it so that it gets registered (otherwise it will not work).


#### TestCase and setUp()

For a plugin test we will use the appropriate plugin test library, in this case tests/parsers/winreg_plugins/test_lib (or test_lib.RegistryPluginTestCase).  This is a simple class that inherits from the plaso parser test lib (which in turn inherits from the unittest.TestCase class) and adds a few functions to make it easier to test Windows Registry plugins. You may want to add a setUp() function to open the sample file and set any other variables you expect in the background, like preprocessor results.  Preprocessing discovers system variables such as Timezone and CurrentControlSet.  You can create a local preprocess object to mimic what you expect in a global one.

For a Windows Registry plugin you typically need to open the registry hive using self._GetTestFilePath, define the registry key and use the self._GetKeyFromFile to get the key.

Let's first look at what functions are available to us in the registry test library.

In [None]:
from tests.parsers.winreg_plugins import test_lib

PrintClassHelp(test_lib.RegistryPluginTestCase)

Write the setUp() function for this class definition:

    class TestBootExecutePlugin(test_lib.RegistryPluginTestCase):
      """Tests for the LFU BootExecute Windows Registry plugin."""

      def setUp(self):

While we're setting up the boilerplate of the test, let's add the main function to the bottom of the file.  Then we can run the test on its own.

    if __name__ == '__main__':
      unittest.main()
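
Put together, the boilerplate might look like the sketch below. To keep the snippet runnable on its own it inherits from plain `unittest.TestCase` and uses a stand-in `FakePlugin` class; both are assumptions made for this example. In the real test file you would inherit from `test_lib.RegistryPluginTestCase` and instantiate your own plugin in setUp().

```python
import unittest


class FakePlugin(object):
  """Hypothetical stand-in for the real plugin class."""
  NAME = u'fake_plugin'


class TestFakePlugin(unittest.TestCase):
  """Skeleton test case showing where setUp() fits in."""

  def setUp(self):
    """Sets up the needed objects used throughout the test."""
    # setUp() runs before every test method, so each test gets a fresh plugin.
    self._plugin = FakePlugin()

  def testPluginIsAvailable(self):
    """Tests that setUp() created the plugin."""
    self.assertEqual(self._plugin.NAME, u'fake_plugin')


# In a notebook we load and run the test case explicitly instead of
# calling unittest.main(), which would try to exit the interpreter.
suite = unittest.TestLoader().loadTestsFromTestCase(TestFakePlugin)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because setUp() is called once per test method, anything it creates can be modified freely inside a test without affecting the other tests.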

### Writing the Test

The outline of the main test is to create and run the plugin, then check that the plugin results are correct.  You should check a variety of attributes in one row and something about the extracted events in general.

The name of the test method needs to start with the word "test".  Let's use testProcess() (since that is what we are testing, the Process function of the plugin).  The assertions should include:

+ How many entries were created?
+ For entry[1], is the timestamp, username, and full_path correct?
+ For entry[1], are the message strings formatted correctly?
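
The test that follows builds fake Registry values from raw bytes, so it helps to know how string data is laid out. As a rough sketch: `REG_SZ` data is a UTF-16 little-endian string, and `REG_MULTI_SZ` data is a sequence of such strings, each terminated by a NUL character. The helper `DecodeMultiString` below is hypothetical and written only for illustration; it is not how dfwinreg implements decoding.

```python
def DecodeMultiString(value_data):
  """Decodes REG_MULTI_SZ style data into a list of strings.

  Hypothetical helper for illustration, not part of plaso or dfwinreg.
  """
  decoded = value_data.decode(u'utf_16_le')
  # Strings are separated by NUL characters; drop the empty leftovers.
  return [string for string in decoded.split(u'\x00') if string]


boot_execute_data = u'autocheck autochk *\x00'.encode(u'utf_16_le')
empty_data = u'\x00'.encode(u'utf_16_le')

print(DecodeMultiString(boot_execute_data))
print(DecodeMultiString(empty_data))
```

The second call returns an empty list, which is consistent with the `ExcludeFromKnownDlls: []` text in the expected message of the test.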


In [None]:
# These are needed to create the fake registry keys for testing.
from plaso.dfwinreg import definitions as dfwinreg_definitions
from plaso.dfwinreg import fake as dfwinreg_fake
# Needed to compute the expected timestamp in testProcess().
from plaso.lib import timelib


class TestBootExecutePluginFoo(test_lib.RegistryPluginTestCase):
  """Tests for the LFU BootExecute Windows Registry plugin."""

  def setUp(self):
    """Sets up the needed objects used throughout the test."""
    # This would have to be changed to reflect the actual plugin, but since we
    # are in the same namespace we can do this.
    self._plugin = BootExecutePluginFoo()
    
  def _CreateTestKey(self, key_path, time_string):
    """Creates Registry keys and values for testing.

    Args:
      key_path: the Windows Registry key path.
      time_string: string containing the key last written date and time.

    Returns:
      A Windows Registry key (instance of dfwinreg.WinRegistryKey).
    """
    filetime = dfwinreg_fake.Filetime()
    filetime.CopyFromString(time_string)
    registry_key = dfwinreg_fake.FakeWinRegistryKey(
        u'Session Manager', key_path=key_path,
        last_written_time=filetime.timestamp, offset=153)

    value_data = u'autocheck autochk *\x00'.encode(u'utf_16_le')
    registry_value = dfwinreg_fake.FakeWinRegistryValue(
        u'BootExecute', data=value_data,
        data_type=dfwinreg_definitions.REG_MULTI_SZ, offset=123)
    registry_key.AddValue(registry_value)

    value_data = u'2592000'.encode(u'utf_16_le')
    registry_value = dfwinreg_fake.FakeWinRegistryValue(
        u'CriticalSectionTimeout', data=value_data,
        data_type=dfwinreg_definitions.REG_SZ, offset=153)
    registry_key.AddValue(registry_value)

    value_data = u'\x00'.encode(u'utf_16_le')
    registry_value = dfwinreg_fake.FakeWinRegistryValue(
        u'ExcludeFromKnownDlls', data=value_data,
        data_type=dfwinreg_definitions.REG_MULTI_SZ, offset=163)
    registry_key.AddValue(registry_value)

    value_data = u'0'.encode(u'utf_16_le')
    registry_value = dfwinreg_fake.FakeWinRegistryValue(
        u'GlobalFlag', data=value_data, data_type=dfwinreg_definitions.REG_SZ,
        offset=173)
    registry_key.AddValue(registry_value)

    value_data = u'0'.encode(u'utf_16_le')
    registry_value = dfwinreg_fake.FakeWinRegistryValue(
        u'HeapDeCommitFreeBlockThreshold', data=value_data,
        data_type=dfwinreg_definitions.REG_SZ, offset=183)
    registry_key.AddValue(registry_value)

    value_data = u'0'.encode(u'utf_16_le')
    registry_value = dfwinreg_fake.FakeWinRegistryValue(
        u'HeapDeCommitTotalFreeThreshold', data=value_data,
        data_type=dfwinreg_definitions.REG_SZ, offset=203)
    registry_key.AddValue(registry_value)

    value_data = u'0'.encode(u'utf_16_le')
    registry_value = dfwinreg_fake.FakeWinRegistryValue(
        u'HeapSegmentCommit', data=value_data,
        data_type=dfwinreg_definitions.REG_SZ, offset=213)
    registry_key.AddValue(registry_value)

    value_data = u'0'.encode(u'utf_16_le')
    registry_value = dfwinreg_fake.FakeWinRegistryValue(
        u'HeapSegmentReserve', data=value_data,
        data_type=dfwinreg_definitions.REG_SZ, offset=223)
    registry_key.AddValue(registry_value)

    value_data = u'2'.encode(u'utf_16_le')
    registry_value = dfwinreg_fake.FakeWinRegistryValue(
        u'NumberOfInitialSessions', data=value_data,
        data_type=dfwinreg_definitions.REG_SZ, offset=243)
    registry_key.AddValue(registry_value)
    
    return registry_key


  def testProcess(self):
    """Tests the Process function."""
    # We don't use the full registry path here (with HKLM, etc), because
    # we're creating a key, not reading it using dfwinreg.
    key_path = u'\\ControlSet001\\Control\\Session Manager'
    time_string = u'2012-08-31 20:45:29'
    registry_key = self._CreateTestKey(key_path, time_string)

    event_queue_consumer = self._ParseKeyWithPlugin(self._plugin, registry_key)
    event_objects = self._GetEventObjectsFromQueue(event_queue_consumer)

    self.assertEqual(len(event_objects), 2)

    event_object = event_objects[0]

    # This should just be the plugin name, as we're invoking it directly,
    # and not through the parser.
    self.assertEqual(event_object.parser, self._plugin.plugin_name)

    expected_timestamp = timelib.Timestamp.CopyFromString(time_string)
    self.assertEqual(event_object.timestamp, expected_timestamp)

    expected_message = (
        u'[{0:s}] BootExecute: autocheck autochk *').format(key_path)

    self._TestGetMessageStrings(
        event_object, expected_message, expected_message)

    event_object = event_objects[1]

    expected_message = (
        u'[{0:s}] '
        u'CriticalSectionTimeout: 2592000 '
        u'ExcludeFromKnownDlls: [] '
        u'GlobalFlag: 0 '
        u'HeapDeCommitFreeBlockThreshold: 0 '
        u'HeapDeCommitTotalFreeThreshold: 0 '
        u'HeapSegmentCommit: 0 '
        u'HeapSegmentReserve: 0 '
        u'NumberOfInitialSessions: 2').format(key_path)
    expected_short_message = u'{0:s}...'.format(expected_message[0:77])

    self._TestGetMessageStrings(
        event_object, expected_message, expected_short_message)
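
The short message check above builds the expected value from the first 77 characters of the full message followed by `...`, for 80 characters in total. A minimal sketch of that truncation, assuming an 80 character limit (which is what the slice in the test implies); `MakeShortMessage` is a hypothetical helper and not a plaso function:

```python
def MakeShortMessage(message, maximum_length=80):
  """Truncates a message the way the test above expects.

  Hypothetical helper for illustration only.
  """
  if len(message) <= maximum_length:
    return message
  # Leave room for the three dots that mark the truncation.
  return u'{0:s}...'.format(message[0:maximum_length - 3])


long_message = u'[\\ControlSet001\\Control\\Session Manager] ' + u'x' * 100
short_message = MakeShortMessage(long_message)

print(len(short_message))
print(MakeShortMessage(u'short enough'))
```

This also shows why the first event in the test passes the same string as both the long and the short message: its message is shorter than the limit, so nothing is cut off.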


### Running the Test

How will you know what the format string should look like?  Well, it's time to run the code we have.  Typically the plugin needs to be "compiled" before the test will be able to import it, so we need to make sure the plugin gets picked up for compilation.

Normally you would run the full test suite with:

    python run_tests.py

Or you can run this test directly using:

    python tests/parsers/winreg_plugins/lfu_test.py

Rinse and repeat as you write the tests.  If you change the parser, you need to recompile.  If you just change the test, you don't.

**However**, since we are writing this in our notebook, we only need to make sure all the previous code segments have been run; if you make changes, just re-run them.

To run the test itself, execute the below code:

In [None]:
import unittest
my_suite = unittest.TestSuite()
my_suite.addTest(TestBootExecutePluginFoo('testProcess'))

results = unittest.TextTestRunner(verbosity=3).run(my_suite)

if results.errors:
  print u'Errors came up while trying to run test.'
  for error in results.errors:
    if isinstance(error, basestring):
      print error
    else:
      for sub_error in error:
        print sub_error
elif results.failures:
  print u'Failures came up while trying to run test.'
  for failure in results.failures:
    if isinstance(failure, basestring):
      print failure
    else:
      for sub_failure in failure:
        print sub_failure
else:
  print u'All came out clean.'
  print results

If all went well you should have a fully functioning plugin by now, ready to parse every BootExecute registry key you may encounter.

You can start playing around and making changes to the plugin, to see what happens when changes are introduced, or continue and create a new plugin.


## The Assignment

We have now gone step by step through how a simple Windows Registry plugin is created. It is time to take what we've learned so far and create a new plugin.

The remainder of the codelab revolves around writing a registry plugin that parses a simple registry key. Write the plugin and the unit test to successfully parse it.

Let's start with a simple registry key from our test registry hive.

In [None]:
mystery_key_path = 'HKEY_LOCAL_MACHINE\\System\\CurrentControlSet\\Control\\ProductOptions'
mystery_key = registry.GetKeyByPath(mystery_key_path)

print u'Key has {0:d} values.'.format(mystery_key.number_of_values)
print u'Key has {0:d} subkeys.'.format(mystery_key.number_of_subkeys)

print u'Values:'
for value in mystery_key.GetValues():
    print u'  {0:s} = {1!r}'.format(value.name, value.data)

print u'*'*80
print u'Sub keys:'
for sub_key in mystery_key.GetSubkeys():
    print u'  {} [{} keys, {} values]'.format(
        sub_key.path, sub_key.number_of_subkeys, sub_key.number_of_values)

First things first, what does this key mean? Let's read up:

 + http://support.microsoft.com/kb/152078
 + http://support.microsoft.com/kb/181412
 + http://technet.microsoft.com/en-us/library/cc782360%28WS.10%29.aspx
 

Fill in all the details in the plugin so that it successfully parses this key.

In [None]:
class MysteryPlugin(interface.WindowsRegistryPlugin):
  """Plug-in to collect something from something."""

  # Change the name to something more descriptive.
  NAME = 'mystery_plugin'

  # Change the description field.
  DESCRIPTION = 'describe me'

  # We only need one filter to match the key we want
  FILTERS = frozenset([
      interface.WindowsRegistryKeyPathFilter(
          u'HKEY_LOCAL_MACHINE\\System\\CurrentControlSet\\Control\\'
          u'ProductOptions')])
    
  # Fill in here.
  URLS = []

  def GetEntries(
      self, parser_mediator, registry_key, **unused_kwargs):
    """Gather data from our plugin, please rewrite me."""
    
from plaso.parsers import winreg
winreg.WinRegistryParser.RegisterPlugin(MysteryPlugin)

[**optional code segment**] Remember, if you make changes to the plugin you need to deregister it before you run the code segment above again (to register it):

In [None]:
# OPTIONAL DO NOT EXECUTE UNLESS YOU'VE MADE CHANGES TO THE PARSER CODE ABOVE AND
# NEED TO REGISTER THOSE CHANGES!

winreg.WinRegistryParser.DeregisterPlugin(MysteryPlugin)
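
Why the deregistration step matters can be seen in a toy version of the registration logic. The `ToyParser` and `ToyPlugin` classes below are assumptions made for illustration; this is not plaso's implementation, it only mimics a registry that refuses to register the same plugin name twice.

```python
class ToyParser(object):
  """Toy parser that keeps track of plugin classes by name."""

  _plugin_classes = {}

  @classmethod
  def RegisterPlugin(cls, plugin_class):
    """Registers a plugin class, refusing duplicates."""
    plugin_name = plugin_class.NAME.lower()
    if plugin_name in cls._plugin_classes:
      raise KeyError(u'Plugin already registered: {0:s}'.format(plugin_name))
    cls._plugin_classes[plugin_name] = plugin_class

  @classmethod
  def DeregisterPlugin(cls, plugin_class):
    """Removes a previously registered plugin class."""
    plugin_name = plugin_class.NAME.lower()
    if plugin_name not in cls._plugin_classes:
      raise KeyError(u'Plugin not registered: {0:s}'.format(plugin_name))
    del cls._plugin_classes[plugin_name]


class ToyPlugin(object):
  """Toy plugin used to exercise the registry."""
  NAME = 'toy_plugin'


ToyParser.RegisterPlugin(ToyPlugin)
try:
  # Re-running the registration cell without deregistering first fails.
  ToyParser.RegisterPlugin(ToyPlugin)
except KeyError as exception:
  print(exception)

# Deregistering first makes the second registration succeed.
ToyParser.DeregisterPlugin(ToyPlugin)
ToyParser.RegisterPlugin(ToyPlugin)
```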

And finally we need to create a unit test to make sure we are parsing the key properly.

In [None]:
class MysteryPluginTest(test_lib.RegistryPluginTestCase):
  """Tests for the mysterious registry key."""

  def setUp(self):
    """Sets up the needed objects used throughout the test."""
    # Once the code is actually committed we would use a real file that is
    # inside our test directory. But for now we use the temporary file that
    # we defined earlier.
    #test_file = self._GetTestFilePath(['SYSTEM'])
    self.test_file = test_registry_file_name
    
    self._plugin = MysteryPlugin()

  def testProcess(self):
    """Tests the Process function and change me, pretty please."""
    # Put in the actual path, dfwinreg will sort out the control set.
    file_entry = self._GetTestFileEntryFromPath([self.test_file])
    key_path = '\\ControlSet001\\Control\\ProductOptions'
    registry_key = self._GetKeyFromFileEntry(file_entry, key_path)

    event_queue_consumer = self._ParseKeyWithPlugin(self._plugin, registry_key)
    event_objects = self._GetEventObjectsFromQueue(event_queue_consumer)

    # At bare minimum we need to test that the plugin extracted the
    # expected number of events.
    self.assertEqual(len(event_objects), 1)

    # Let's read in this event object and verify some of its fields.
    event_object = event_objects[0]

    # Make sure the timestamp is the correct one.
    # date -u -d "2011-09-17T13:43:39.129177" +"%s%N" / 1000
    self.assertEqual(event_object.timestamp, 1316267019129177)
    # Add here some tests to make sure we are parsing/extracting attribute names.

    # And now we need to test our formatter, create a message string and test it.
    expected_msg = u'This is a message string.'
    expected_msg_short = u'The short version.'

    self._TestGetMessageStrings(event_object, expected_msg, expected_msg_short)
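
The timestamp in the test is the number of microseconds since January 1, 1970 00:00:00 UTC. If the `date` command from the comment is not available, the same value can be worked out in Python with the standard library alone. The variable names below are only illustrative:

```python
import calendar
import datetime

last_written = datetime.datetime(2011, 9, 17, 13, 43, 39, 129177)

# timegm() treats the time tuple as UTC and returns whole seconds.
seconds = calendar.timegm(last_written.utctimetuple())
timestamp = seconds * 1000000 + last_written.microsecond

print(timestamp)
```

This prints 1316267019129177, the value used in the assertion above.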

And run these tests to make sure we have everything covered.

In [None]:
my_suite = unittest.TestSuite()
my_suite.addTest(MysteryPluginTest('testProcess'))

results = unittest.TextTestRunner(verbosity=3).run(my_suite)

if results.errors:
  print u'Errors came up while trying to run test.'
  for error in results.errors:
    if isinstance(error, basestring):
      print error
    else:
      for sub_error in error:
        print sub_error
elif results.failures:
  print u'Failures came up while trying to run test.'
  for failure in results.failures:
    if isinstance(failure, basestring):
      print failure
    else:
      for sub_failure in failure:
        print sub_failure
else:
  print u'All came out clean.'
  print results

And remember there may be some interesting code segments that were introduced in the overview of the boot execute plugin that might help you during this assignment (just change the appropriate calls to the boot execute plugin to the newly created one and execute them again).

And you can also start playing a bit with this registry hive to find other interesting keys (since the key in the assignment is not really that interesting).

You can do this using the Plaso 'preg.py' tool, outside of this ipython notebook. Run it like so:

    preg.py -c test_data/SOFTWARE


And now we can start exploring this registry file.

Try commands like:
 + cd KEY
 + ls [-v]
 + parse [-v]
 + plugin PLUGIN_NAME
 
Tab completion for key names should work (please try it out and play with it). Which means you can type something like:

    cd C<TAB>
    
And the tab completion will try to complete that registry key.

Try to begin with something like the code segment below, and then change it at will. The % in front of the commands marks them as "magic" commands. It is not strictly necessary, but it is kept just in case, since some setups of ipython/notebook don't seem to like it being left out.

In [None]:
%ls
%cd \ControlSet001\Enum\USBSTOR
%ls
%cd Disk&Ven_HP&Prod_v100w&Rev_1024\AA951D0000007252&0
%ls -v
%parse
%pwd

## Clean Up

During our test code we created a temporary file that we may want to delete. To delete it, use the code below:


In [None]:
import os

if test_registry_file_name and os.path.exists(test_registry_file_name):
  os.remove(test_registry_file_name)
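
If you create scratch files of your own while experimenting, one way to make sure they are always removed is a `try`/`finally` block. A minimal sketch with made-up file contents and variable names, using only the standard library:

```python
import os
import tempfile

# Create a scratch file; delete=False keeps it around after closing.
with tempfile.NamedTemporaryFile(delete=False) as file_object:
  file_object.write(b'regf')
  scratch_file_name = file_object.name

try:
  print(os.path.getsize(scratch_file_name))
finally:
  # Clean up even if the code above raised an exception.
  if os.path.exists(scratch_file_name):
    os.remove(scratch_file_name)
```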