Commit
Documentation on writing your own transformation functions
Showing 3 changed files with 125 additions and 36 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -22,6 +22,7 @@ Contents: | |
space | ||
data_format | ||
foundation | ||
writing-transformations | ||
filesystem | ||
uncertainty | ||
tests | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,116 @@ | ||
.. _writing: | ||
|
||
Writing your own transformation functions | ||
***************************************** | ||
|
||
A transformation function takes the current list of datasets as its input, modifies these datasets in some way, and returns the changed data. For example, here is a transformation function that doubles the amount of each exchange:
|
||
.. code-block:: python

    def doubler(data):
        # Each dataset holds a list of exchanges, each with an 'amount'
        for ds in data:
            for exc in ds['exchanges']:
                exc['amount'] *= 2
        return data

To make reporting meaningful, transformation functions should follow several conventions, explained in the following sections.
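As a quick illustration, ``doubler`` can be applied to a small list of datasets. The dataset contents below are made up for this sketch; only the ``exchanges`` and ``amount`` keys come from the example above, and real Ocelot datasets carry many more fields.

```python
def doubler(data):
    for ds in data:
        for exc in ds['exchanges']:
            exc['amount'] *= 2
    return data


# Hypothetical toy data, invented for this illustration
toy_data = [
    {'name': 'steel production', 'exchanges': [{'amount': 1.0}, {'amount': 2.5}]},
    {'name': 'transport', 'exchanges': [{'amount': 4.0}]},
]

result = doubler(toy_data)
amounts = [exc['amount'] for ds in result for exc in ds['exchanges']]
```

Note that this function modifies the datasets in place and returns the same list object it was given.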
|
||
Input arguments: all data or a single dataset | ||
============================================= | ||
|
||
This choice is entirely up to you. Both types of input arguments are used in the Ocelot codebase, and some functions seem to fit more naturally into one input type or the other. | ||
|
||
In general, if your function takes all data as its input, this argument should be called ``data``. If your function takes a single dataset, then the argument should be labeled ``dataset``. This is not a hard rule, and is not enforced, but will make it easier for others to understand your code. | ||
|
||
The ``single_input`` decorator
------------------------------
|
||
If you are writing a transformation function that takes a single dataset as its input, wrap it with the ``single_input`` decorator: | ||
|
||
.. code-block:: python

    from ocelot.transformations.utils import single_input

    @single_input
    def doubler_variation(dataset):
        for exc in dataset['exchanges']:
            exc['amount'] *= 2
        return [dataset]  # Note that this is a list

If you use the ``single_input`` decorator, be aware of the following rules:
|
||
* The input will be a single dataset | ||
* The function should return a list of datasets, as transformation functions can split datasets | ||
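The second rule can be sketched with a function that splits one dataset into several. To keep the example runnable without Ocelot installed, it uses ``single_input_sketch``, a hypothetical stand-in for ``single_input``. It assumes the decorator calls the function once per dataset and concatenates the returned lists; the real implementation may differ.

```python
from functools import wraps


def single_input_sketch(func):
    """Hypothetical stand-in for Ocelot's ``single_input`` decorator.

    Assumption: the wrapped function is called once per dataset, and the
    returned lists are concatenated into one flat list.
    """
    @wraps(func)
    def wrapper(data):
        result = []
        for dataset in data:
            result.extend(func(dataset))
        return result
    return wrapper


@single_input_sketch
def split_by_exchange(dataset):
    """Return one new dataset per exchange (illustrative only)."""
    return [
        {'name': dataset['name'], 'exchanges': [exc]}
        for exc in dataset['exchanges']
    ]


# Hypothetical toy data, invented for this illustration
toy_data = [
    {'name': 'a', 'exchanges': [{'amount': 1}, {'amount': 2}]},
    {'name': 'b', 'exchanges': [{'amount': 3}]},
]
split = split_by_exchange(toy_data)
```

Because every call returns a list, a function that leaves the dataset untouched and one that splits it into several can be handled in the same way.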
|
||
Logging what your function does | ||
=============================== | ||
|
||
Report logging | ||
-------------- | ||
|
||
Report logging is data that goes into the HTML report produced at the end of each Ocelot model run. Most of the time you will want to provide logging data that will be turned into tables in this report. To tell the report generator how to build these tables, set the function attribute ``__table__`` as follows:
|
||
.. code-block:: python

    def some_transformation(data):
        return data

    some_transformation.__table__ = {
        'title': 'Name of title to put in report',
        'columns': ["names", "of", "columns"]
    }

``__table__`` should define:
|
||
* ``title``: The title to put above this function's table in the report
* ``columns``: A list of column headings | ||
|
||
The transformation function itself must also log the data for the table. Retrieve the ``ocelot`` logger, and log each message at the ``info`` level. A more complete example looks something like this:
|
||
.. code-block:: python

    import logging

    logger = logging.getLogger('ocelot')  # Very important

    def count_exchanges(data):
        """Function that counts things.

        Does not change any data."""
        for ds in data:
            logger.info({
                'type': 'table element',
                'data': [ds['name'], len(ds['exchanges'])]
            })
        return data

    count_exchanges.__table__ = {
        'title': 'Count the number of exchanges in each dataset',
        'columns': ["Name", "# of exchanges"]
    }

Each log message should be a dictionary with the key ``type`` (with the value ``table element`` for tabular data). The key ``data`` should give a list of values in the same order as ``columns``.
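To see how ``data`` and ``columns`` line up, the following sketch attaches a small logging handler to the ``ocelot`` logger and pairs each logged row with the column headings. The ``TableCollector`` class is hypothetical and written only for this illustration; it is not how Ocelot's report generator is implemented.

```python
import logging


class TableCollector(logging.Handler):
    """Hypothetical handler that keeps the rows of 'table element' messages."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def emit(self, record):
        message = record.msg  # the dictionary passed to logger.info
        if isinstance(message, dict) and message.get('type') == 'table element':
            self.rows.append(message['data'])


logger = logging.getLogger('ocelot')
logger.setLevel(logging.INFO)
collector = TableCollector()
logger.addHandler(collector)


def count_exchanges(data):
    for ds in data:
        logger.info({
            'type': 'table element',
            'data': [ds['name'], len(ds['exchanges'])]
        })
    return data

count_exchanges.__table__ = {
    'title': 'Count the number of exchanges in each dataset',
    'columns': ["Name", "# of exchanges"]
}

# Hypothetical toy data, invented for this illustration
toy_data = [
    {'name': 'a', 'exchanges': [{}, {}]},
    {'name': 'b', 'exchanges': [{}]},
]
count_exchanges(toy_data)

# Pair each logged row with the column headings declared in __table__
columns = count_exchanges.__table__['columns']
table = [dict(zip(columns, row)) for row in collector.rows]
```

Each row has exactly one value per column heading, which is why the order of ``data`` must match the order of ``columns``.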
|
||
If tables don't work for your transformation function, you can skip the ``__table__`` attribute and instead log messages of type ``list element``:
|
||
.. code-block:: python

    import logging

    logger = logging.getLogger('ocelot')

    def count_exchanges(data):
        """Function that counts things.

        Does not change any data."""
        for ds in data:
            logger.info({
                'type': 'list element',
                'data': "Dataset <b>{}</b> has <i>{}</i> exchanges".format(
                    ds['name'], len(ds['exchanges'])
                )
            })
        return data

Messages that have the type ``list element`` can be HTML.
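Because these messages can contain HTML, a dataset name with characters such as ``<`` or ``&`` could end up being read as markup. The sketch below escapes the name with the standard library's ``html.escape`` before formatting. It assumes the report inserts the message as-is, which the text above does not state, and the helper ``list_element_message`` is hypothetical.

```python
import html


def list_element_message(ds):
    """Build a 'list element' log message with the dataset name escaped."""
    return {
        'type': 'list element',
        'data': "Dataset <b>{}</b> has <i>{}</i> exchanges".format(
            html.escape(ds['name']), len(ds['exchanges'])
        )
    }


# Hypothetical dataset whose name contains HTML-special characters
message = list_element_message(
    {'name': 'heat & power <CHP>', 'exchanges': [{}, {}]}
)
```

The resulting dictionary can be passed to ``logger.info`` in the same way as in the example above.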
|
||
Currying transformation functions | ||
================================= | ||
|
||
If you need to initialize functions using `functools.partial <https://docs.python.org/3.5/library/functools.html#functools.partial>`__, the report generator will still get the correct function metadata. Other forms of currying are not supported. |
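A minimal sketch of such an initialization is shown below. The function ``scale_amounts`` and its table definition are made up for this example. Reading the metadata through the ``func`` attribute of the ``partial`` object is one plausible way for a report generator to find it, not a description of Ocelot's actual code.

```python
from functools import partial


def scale_amounts(data, factor):
    """Multiply every exchange amount by ``factor`` (hypothetical example)."""
    for ds in data:
        for exc in ds['exchanges']:
            exc['amount'] *= factor
    return data

scale_amounts.__table__ = {
    'title': 'Scaled exchange amounts',
    'columns': ["Name", "Factor"]
}

# Fix the ``factor`` argument ahead of time
tripler = partial(scale_amounts, factor=3)

# A partial object keeps a reference to the wrapped function in ``.func``,
# so its name and ``__table__`` attribute remain reachable.
metadata = tripler.func.__table__
name = tripler.func.__name__

result = tripler([{'name': 'a', 'exchanges': [{'amount': 2}]}])
```

A hand-written closure or ``lambda`` has no ``func`` attribute pointing back at the original function, which may be one reason other forms of currying are not supported.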