Documentation on writing your own transformation functions
cmutel committed Sep 19, 2016
1 parent ca8bf23 commit 9e11f1c
Showing 3 changed files with 125 additions and 36 deletions.
44 changes: 8 additions & 36 deletions docs/foundation.rst
@@ -42,57 +42,29 @@ The ``system_model`` function is actually quite simple:
Transformation functions
========================

Transform functions are the heart of Ocelot - each one performs one distinct change to the collection of datasets. Transform functions can be any callable, but are most commonly functions.
.. note:: See also :ref:`writing`.

The report generator will use information about each transform function when creating the report. Specifically, the report generator will look at the function name, its `docstring <https://www.python.org/dev/peps/pep-0257/>`__ (a text description of what the function does, included in the function code), and a new object attribute that you have to specify: ``__table__``.
Transform functions are the heart of Ocelot - each one performs one distinct change to the collection of datasets. Transform functions can be any callable, but most are simple functions.

Most of the time you will want to provide logging data that will be turned into tables in the run report. To do this, define the transformation function attribute ``__table__`` as follows:

.. code-block:: python

    def some_transformation(data):
        return data

    some_transformation.__table__ = {
        'title': 'Name of title to put in report',
        'columns': ["names", "of", "columns"]
    }

Functions take two input arguments: the input ``data``, and the ``logger``. Functions should return the transformed data. Log messages are written using ``logger.log(message)``. Log messages should be a dictionary, with at least the key ``type`` defined. Log messages that provide table data should look like this:

.. code-block:: python

    logger.log({
        'type': 'table element',
        'data': [...]  # Data in the same order as the columns
    })

Log messages that provide data in a list format look like this:

.. code-block:: python

    logger.log({
        'type': 'list element',
        'data': "..."  # An HTML string
    })

If you need to initialize functions using `functools.partial <https://docs.python.org/3.5/library/functools.html#functools.partial>`__, the report generator will still get the correct function metadata. Other forms of currying are not supported.
The report generator will use information about each transformation function when creating the report. Specifically, the report generator will look at the function name, its `docstring <https://www.python.org/dev/peps/pep-0257/>`__ (a text description of what the function does, included in the function code), and any additional tabular data you provide during the function call.

.. _logger:

Logging
=======

Ocelot uses standard `python logging <https://docs.python.org/3/library/logging.html>`_, with a custom formatter that encodes log messages to JSON dictionaries. Therefore, log messages must be **dictionaries**:
Ocelot uses standard `python logging <https://docs.python.org/3/library/logging.html>`__, with a custom formatter that encodes log messages to JSON dictionaries. Due to this custom formatter, the ``ocelot`` logger must be retrieved in each file which uses logging:

.. code-block:: python

    import logging
    logger = logging.getLogger('ocelot')

    def my_transformation(data):
        logging.info({"message": "something", "count": len(data)})
        logger.info({"message": "something", "count": len(data)})

Log messages are written when a run is started or finished, when transformation functions are started or finished, and whenever the transformation function wants to log something. The log message format is documented in :ref:`logging-format`.
Log messages are written when a run is started or finished, when transformation functions are started or finished, and whenever the transformation function wants to log something. The message format for the log written to disk (i.e. with each line JSON encoded) is documented in :ref:`logging-format`.

.. note:: ``time`` is added automatically to each log message.

1 change: 1 addition & 0 deletions docs/index.rst
@@ -22,6 +22,7 @@ Contents:
space
data_format
foundation
writing-transformations
filesystem
uncertainty
tests
116 changes: 116 additions & 0 deletions docs/writing-transformations.rst
@@ -0,0 +1,116 @@
.. _writing:

Writing your own transformation functions
*****************************************

A transformation function takes the current list of datasets as an input, does some transformation to these datasets, and then returns the changed data. For example, here is a transformation function that doubles the amount of each exchange:

.. code-block:: python

    def doubler(data):
        for ds in data:
            for exc in ds['exchanges']:
                exc['amount'] *= 2
        return data

To make reporting meaningful, transformation functions should follow several conventions, explained in the following sections.

Input arguments: all data or a single dataset
=============================================

This choice is entirely up to you. Both types of input arguments are used in the Ocelot codebase, and some functions seem to fit more naturally into one input type or the other.

In general, if your function takes all data as its input, this argument should be called ``data``. If your function takes a single dataset, then the argument should be labeled ``dataset``. This is not a hard rule, and is not enforced, but will make it easier for others to understand your code.

The ``single_input`` decorator
------------------------------

If you are writing a transformation function that takes a single dataset as its input, wrap it with the ``single_input`` decorator:

.. code-block:: python

    from ocelot.transformations.utils import single_input

    @single_input
    def doubler_variation(dataset):
        for exc in dataset['exchanges']:
            exc['amount'] *= 2
        return [dataset]  # Note that this is a list

If you use the ``single_input`` decorator, be aware of the following rules:

* The input will be a single dataset
* The function should return a list of datasets, as transformation functions can split datasets
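
To illustrate the second rule, here is a minimal sketch of a function that splits one dataset into several. The name ``split_by_exchange`` and the sample dataset are hypothetical and not part of Ocelot. The decorator is left out so that the snippet runs without Ocelot installed; in real code the function would be wrapped with ``single_input`` as shown above.

```python
from copy import deepcopy


def split_by_exchange(dataset):
    """Return one copy of ``dataset`` per exchange (hypothetical example)."""
    new_datasets = []
    for exc in dataset['exchanges']:
        new_ds = deepcopy(dataset)
        # Each copy keeps only a single exchange
        new_ds['exchanges'] = [deepcopy(exc)]
        new_datasets.append(new_ds)
    # The return value is always a list of datasets
    return new_datasets


sample = {
    'name': 'example dataset',
    'exchanges': [{'amount': 1.0}, {'amount': 2.5}],
}
result = split_by_exchange(sample)
```

Copying the dataset before modifying it keeps the input untouched, which makes the function easier to test in isolation.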

Logging what your function does
===============================

Report logging
--------------

Report logging is data that goes into the HTML report produced at the end of each Ocelot model run. Most of the time you will want to provide logging data that will be turned into tables in this report. To tell the report generator how to build these tables, set the function attribute ``__table__`` as follows:

.. code-block:: python

    def some_transformation(data):
        return data

    some_transformation.__table__ = {
        'title': 'Name of title to put in report',
        'columns': ["names", "of", "columns"]
    }

``__table__`` should define:

* ``title``: The title to put in the report for the data this function provides
* ``columns``: A list of column headings

There should also be logging inside the transformation function. You need to retrieve the ``ocelot`` logger, and log using the log level ``info``. So, a more complete example would actually look something like this:

.. code-block:: python

    import logging

    logger = logging.getLogger('ocelot')  # Very important

    def count_exchanges(data):
        """Function that counts things.

        Does not change any data."""
        for ds in data:
            logger.info({
                'type': 'table element',
                'data': [ds['name'], len(ds['exchanges'])]
            })
        return data

    count_exchanges.__table__ = {
        'title': 'Count the number of exchanges in each dataset',
        'columns': ["Name", "# of exchanges"]
    }

Log messages should be a dictionary, with the key ``type`` (and value ``table element`` for tabular data). The key ``data`` should give a list of data in the same order as ``columns``.
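
One way to check that each ``data`` list lines up with ``columns`` is to capture the messages with a standard ``logging.Handler``. The sketch below uses only the standard library. The ``ListHandler`` class and the sample dataset are hypothetical helpers, not part of Ocelot, and Ocelot's own JSON formatter is not involved.

```python
import logging

logger = logging.getLogger('ocelot')
logger.setLevel(logging.INFO)


class ListHandler(logging.Handler):
    """Hypothetical helper that stores raw log messages in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        # ``record.msg`` is the dictionary passed to ``logger.info``
        self.messages.append(record.msg)


def count_exchanges(data):
    """Count the exchanges in each dataset without changing the data."""
    for ds in data:
        logger.info({
            'type': 'table element',
            'data': [ds['name'], len(ds['exchanges'])]
        })
    return data


count_exchanges.__table__ = {
    'title': 'Count the number of exchanges in each dataset',
    'columns': ["Name", "# of exchanges"]
}

handler = ListHandler()
logger.addHandler(handler)
count_exchanges([{'name': 'a', 'exchanges': [{}, {}]}])
logger.removeHandler(handler)

# Every table row should have one value per column
n_columns = len(count_exchanges.__table__['columns'])
rows_ok = all(len(msg['data']) == n_columns for msg in handler.messages)
```

A check like this fits naturally into a unit test for the transformation function.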

If tables don't work for your transformation function, you can skip the ``__table__`` attribute, and just log ``list element`` log messages:

.. code-block:: python

    import logging

    logger = logging.getLogger('ocelot')

    def count_exchanges(data):
        """Function that counts things.

        Does not change any data."""
        for ds in data:
            logger.info({
                'type': 'list element',
                'data': "Dataset <b>{}</b> has <i>{}</i> exchanges".format(
                    ds['name'], len(ds['exchanges'])
                )
            })
        return data

Messages that have the type ``list element`` can be HTML.
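
Because ``list element`` messages can contain HTML, values taken from the data may need escaping before they are placed in the message. The following is a sketch using the standard library's ``html.escape``. The helper name ``exchange_summary`` and the sample dataset are hypothetical, and whether the report generator escapes messages itself is not stated here, so treat this as a precaution rather than a requirement.

```python
from html import escape


def exchange_summary(ds):
    """Build an HTML message for one dataset (hypothetical helper)."""
    return "Dataset <b>{}</b> has <i>{}</i> exchanges".format(
        escape(ds['name']),  # Escape characters such as < and &
        len(ds['exchanges'])
    )


message = {
    'type': 'list element',
    'data': exchange_summary({'name': 'heat & power <CHP>', 'exchanges': [{}]})
}
```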

Currying transformation functions
=================================

If you need to initialize functions using `functools.partial <https://docs.python.org/3.5/library/functools.html#functools.partial>`__, the report generator will still get the correct function metadata. Other forms of currying are not supported.
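
As an illustration, the sketch below initializes a transformation function with ``functools.partial``. The function ``scale_exchanges`` and its ``factor`` argument are hypothetical. A ``partial`` object keeps a reference to the wrapped function in its ``func`` attribute, which is one way the original name and docstring stay reachable; how the report generator actually reads them is not shown here.

```python
from functools import partial


def scale_exchanges(data, factor):
    """Multiply the amount of each exchange by ``factor``."""
    for ds in data:
        for exc in ds['exchanges']:
            exc['amount'] *= factor
    return data


# Fix ``factor`` so that the result only needs the ``data`` argument
tripler = partial(scale_exchanges, factor=3)

data = [{'name': 'a', 'exchanges': [{'amount': 2}]}]
tripled = tripler(data)

# The original function is still available on the partial object
original_name = tripler.func.__name__
original_doc = tripler.func.__doc__
```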
