Documentation on writing your own transformation functions

OcelotProject · Sep 19, 2016 · 9e11f1c
1 parent ca8bf23
commit 9e11f1c

Showing 3 changed files with 125 additions and 36 deletions.
diff --git a/docs/foundation.rst b/docs/foundation.rst
@@ -42,57 +42,29 @@ The ``system_model`` function is actually quite simple:
 Transformation functions
 ========================
 
-Transform functions are the heart of Ocelot - each one performs one distinct change to the collection of datasets. Transform functions can be any callable, bit are mostly commonly functions.
+.. note:: See also :ref:`writing`.
 
-The report generator will use information about each transform function when creating the report. Specifically, the report generator will look at the function name, its `docstring <https://www.python.org/dev/peps/pep-0257/>`__ (a text description of what the function does, included in the function code), and a new object attribute that you have to specify: ``__table__``.
+Transform functions are the heart of Ocelot - each one performs one distinct change to the collection of datasets. Transform functions can be any callable, but most are simple functions.
 
-Most of the time you will want to provide logging data that will be turned into tables in the run report. To do this, define the transformation function attribute ``__table__`` as follows:
-
-.. code-block:: python
-
-    def some_transformation(data):
-        return data
-
-    some_transformation.__table__ = {
-        'title': 'Name of title to put in report',
-        'columns': ["names", "of", "columns"]
-    }
-
-Functions take two input arguments: The input ``data``, and the ``logger``. Functions should return the transformed data. Log messages are written using ``logger.log(message)``. Log messages should be a dictionary, with at least the key ``type`` defined. Log messages that provide table data should look like this:
-
-.. code-block:: python
-
-    logger.log({
-        'type': 'table element',
-        'data': [data in same order as columns]
-    })
-
-Log messages that provide data in a list format look like this:
-
-.. code-block:: python
-
-    logger.log({
-        'type': 'list element',
-        'data': HTML string
-    })
-
-If you need to initialize functions using `functools.partial <https://docs.python.org/3.5/library/functools.html#functools.partial>`__, the report generator will still get the correct function metadata. Other forms of currying are not supported.
+The report generator will use information about each transformation function when creating the report. Specifically, the report generator will look at the function name, its `docstring <https://www.python.org/dev/peps/pep-0257/>`__ (a text description of what the function does, included in the function code), and any additional tabular data you provide during the function call.
 
 .. _logger:
 
 Logging
 =======
 
-Ocelot uses standard `python logging <https://docs.python.org/3/library/logging.html>`_, with a custom formatter that encodes log messages to JSON dictionaries. Therefore, log messages must be **dictionaries**:
+Ocelot uses standard `python logging <https://docs.python.org/3/library/logging.html>`__, with a custom formatter that encodes log messages to JSON dictionaries. Due to this custom formatter, the ``ocelot`` logger must be retrieved in each file which uses logging:
 
 .. code-block:: python
 
     import logging
 
+    logger = logging.getLogger('ocelot')
+
     def my_transformation(data):
-        logging.info({"message": "something", "count": len(data)})
+        logger.info({"message": "something", "count": len(data)})
 
-Log messages are written when a run is started or finished, when transformation functions are started or finished, and whenever the transformation function wants to log something. The log message format is documented in :ref:`logging-format`.
+Log messages are written when a run is started or finished, when transformation functions are started or finished, and whenever the transformation function wants to log something. The message format for the log written to disk (i.e. with each line JSON encoded) is documented in :ref:`logging-format`.
 
 .. note:: ``time`` is added automatically to each log message.
 

diff --git a/docs/index.rst b/docs/index.rst
@@ -22,6 +22,7 @@ Contents:
    space
    data_format
    foundation
+   writing-transformations
    filesystem
    uncertainty
    tests

diff --git a/docs/writing-transformations.rst b/docs/writing-transformations.rst
@@ -0,0 +1,116 @@
+.. _writing:
+
+Writing your own transformation functions
+*****************************************
+
+A transformation function takes the current list of datasets as an input, does some transformation to these datasets, and then returns the changed data. For example, here is a transformation function that doubles the amount of each exchange:
+
+.. code-block:: python
+
+    def doubler(data):
+        for ds in data:
+            for exc in ds['exchanges']:
+                exc['amount'] *= 2
+        return data
+
+To make reporting meaningful, transformation functions should follow several conventions, explained in the following sections.
+
+Input arguments: all data or a single dataset
+=============================================
+
+This choice is entirely up to you. Both types of input arguments are used in the Ocelot codebase, and some functions seem to fit more naturally into one input type or the other.
+
+In general, if your function takes all data as its input, this argument should be called ``data``. If your function takes a single dataset, then the argument should be labeled ``dataset``. This is not a hard rule, and is not enforced, but will make it easier for others to understand your code.
+
+The ``single_input`` decorator
+------------------------------
+
+If you are writing a transformation function that takes a single dataset as its input, wrap it with the ``single_input`` decorator:
+
+.. code-block:: python
+
+    from ocelot.transformations.utils import single_input
+
+    @single_input
+    def doubler_variation(dataset):
+        for exc in dataset['exchanges']:
+            exc['amount'] *= 2
+        return [dataset]  # Note that this is a list
+
+If you use the ``single_input`` decorator, be aware of the following rules:
+
+* The input will be a single dataset
+* The function should return a list of datasets, as transformation functions can split datasets
+
+Logging what your function does
+===============================
+
+Report logging
+--------------
+
+Report logging is data that goes into the HTML report produced at the end of each Ocelot model run. Most of the time you will want to provide logging data that will be turned into tables in this report. To tell the report generator how to build these tables, define the function attribute ``__table__`` as follows:
+
+.. code-block:: python
+
+    def some_transformation(data):
+        return data
+
+    some_transformation.__table__ = {
+        'title': 'Name of title to put in report',
+        'columns': ["names", "of", "columns"]
+    }
+
+``__table__`` should define:
+
+    * ``title``: The title of the table shown in the report
+    * ``columns``: A list of column headings
+
+There should also be logging inside the transformation function. You need to retrieve the ``ocelot`` logger, and log using the log level ``info``. So, a more complete example would actually look something like this:
+
+.. code-block:: python
+
+    import logging
+
+    logger = logging.getLogger('ocelot')  # Very important
+
+    def count_exchanges(data):
+        """Function that counts things.
+
+        Does not change any data."""
+        for ds in data:
+            logger.info({
+                'type': 'table element',
+                'data': [ds['name'], len(ds['exchanges'])]
+            })
+        return data
+
+    count_exchanges.__table__ = {
+        'title': 'Count the number of exchanges in each dataset',
+        'columns': ["Name", "# of exchanges"]
+    }
+
+Each log message should be a dictionary with the key ``type`` (set to ``table element`` for tabular data). The key ``data`` should hold a list of values in the same order as ``columns``.
+
+If tables don't work for your transformation function, you can skip the ``__table__`` attribute, and just log ``list element`` log messages:
+
+.. code-block:: python
+
+    import logging
+
+    logger = logging.getLogger('ocelot')
+
+    def count_exchanges(data):
+        """Function that counts things.
+
+        Does not change any data."""
+        for ds in data:
+            logger.info({
+                'type': 'list element',
+                'data': "Dataset <b>{}</b> has <i>{}</i> exchanges".format(
+                    ds['name'], len(ds['exchanges'])
+                )
+            })
+        return data
+
+The ``data`` value of a ``list element`` message can be an HTML string.
+
+Currying transformation functions
+=================================
+
+If you need to initialize functions using `functools.partial <https://docs.python.org/3.5/library/functools.html#functools.partial>`__, the report generator will still get the correct function metadata. Other forms of currying are not supported.
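
The statement about ``functools.partial`` in the last section can be illustrated with a minimal sketch. It assumes the metadata is recovered by following the ``func`` attribute that ``functools.partial`` objects expose. That is an assumption about the mechanism, not Ocelot's actual report generator code, and the names ``describe_transformation`` and ``scale_exchanges`` are hypothetical.

```python
import functools


def describe_transformation(func):
    """Hypothetical helper: collect the metadata a report might need."""
    # functools.partial objects keep the wrapped callable in ``.func``;
    # unwrap until we reach the original function.
    while isinstance(func, functools.partial):
        func = func.func
    return {
        'name': func.__name__,
        'description': func.__doc__,
        # ``__table__`` is optional, so fall back to None when it is absent
        'table': getattr(func, '__table__', None),
    }


def scale_exchanges(data, factor=1):
    """Multiply each exchange amount by ``factor``."""
    for ds in data:
        for exc in ds['exchanges']:
            exc['amount'] *= factor
    return data


scale_exchanges.__table__ = {
    'title': 'Scaled exchanges',
    'columns': ["Name", "Factor"],
}

# Curry the transformation with functools.partial
tripler = functools.partial(scale_exchanges, factor=3)
metadata = describe_transformation(tripler)
```

A ``partial`` object does not carry the wrapped function's ``__name__``, which is why the sketch unwraps it first. A lambda or closure used for currying exposes no comparable ``func`` attribute, so this approach would not find the original function there.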