Running Ocelot usually involves the following steps:
- Get the path of an undefined database in ecospold2 format on your local computer.
- Decide on a system model configuration to transform the undefined datasets to a linked database. This configuration could be a list of Python functions, or could be the default Ocelot system model.
- Call the system_model function, either directly or through the command line application. system_model takes the directory path from step one and the configuration from step two as inputs.
- Look through the HTML report generated by the system model function, and either accept the given linked database, or make changes in your configuration definitions or transformation functions.
An Ocelot system model configuration is essentially just a list of transformation functions which, when applied in order, produce a realization of a linked database. Configurations are currently specified in Python code, but in the future it should also be possible to define them in other formats such as Excel.
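The idea of a configuration as an ordered list of functions can be sketched in a few lines. This is a minimal illustration, not Ocelot's own code: the helper apply_configuration, both transformation functions, and the dataset dictionaries are hypothetical, and it assumes each function takes the list of datasets and returns the changed list.

```python
def drop_zero_amounts(data):
    """Remove datasets whose 'amount' is zero (hypothetical rule)."""
    return [ds for ds in data if ds["amount"] != 0]


def uppercase_names(data):
    """Normalize dataset names to upper case (hypothetical rule)."""
    return [{**ds, "name": ds["name"].upper()} for ds in data]


def apply_configuration(data, configuration):
    """Apply each transformation function in order and return the result."""
    for function in configuration:
        data = function(data)
    return data


# The configuration is just an ordered list of callables.
configuration = [drop_zero_amounts, uppercase_names]

# Made-up datasets, used only to show the flow of data.
datasets = [
    {"name": "steel", "amount": 1.0},
    {"name": "waste", "amount": 0},
]
result = apply_configuration(datasets, configuration)
```

Because the list is ordered, swapping two entries can change the result, which is why the order of a configuration matters as much as its contents.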
We are actively exploring various ways of defining these configurations. The built-in configurations will be provided as a list of transformation functions already in Ocelot, perhaps wrapped in a configuration object. Another simple configuration format would be a text file, where each line was the name of a transformation function that could be imported from ocelot.transformations. However, this doesn't work well for user-defined functions, nor if you need to prepare functions by e.g. currying them. We are also looking at several configuration libraries, but haven't found anything that seems to fit our mental models or use cases well:
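To make the limitation of the text-file format concrete, here is a rough sketch. Everything in it is an assumption for illustration: the loader load_text_configuration, the two transformation functions, and the SimpleNamespace that stands in for a module of importable functions. Currying is done with functools.partial from the standard library.

```python
from functools import partial
from types import SimpleNamespace


def relabel(data, field, value):
    """Set `field` to `value` on every dataset (hypothetical transformation)."""
    return [{**ds, field: value} for ds in data]


def drop_empty(data):
    """Remove datasets without a name (hypothetical transformation)."""
    return [ds for ds in data if ds.get("name")]


# Stand-in for a module of importable transformation functions.
transformations = SimpleNamespace(relabel=relabel, drop_empty=drop_empty)


def load_text_configuration(text, namespace):
    """Resolve one function name per non-blank line against `namespace`."""
    names = [line.strip() for line in text.splitlines() if line.strip()]
    return [getattr(namespace, name) for name in names]


# Name lookup works when a function only needs `data` ...
from_text = load_text_configuration("drop_empty\n", transformations)

# ... but a bare name cannot carry the extra arguments that currying supplies.
curried = partial(relabel, field="system model", value="cutoff")
configuration = from_text + [curried]

data = [{"name": "steel"}, {"name": ""}]
for function in configuration:
    data = function(data)
```

A user-defined function has the same problem from the other side: it is not an attribute of the namespace, so the name lookup fails unless the file format also records where to import it from.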
- ConfigParser (in the Python standard library)
- configure
- PyStaticConfiguration
- pymlconf
So far no final decisions have been made, and things here will evolve along with the Ocelot codebase.
Running Ocelot without specifying a configuration will use the default configuration, which is the cutoff system model.
A typical system model may have many transformation functions, as each function should make exactly one specific change. To make configurations more readable, you can use a Collection object to group transformation functions that are commonly used together, or that form one unit of work.
ocelot.Collection
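The grouping idea can be illustrated with a small stand-in class. ToyCollection below is hypothetical and is not the ocelot.Collection API, whose constructor and methods may differ. It only shows how a named group of functions can behave like a single transformation.

```python
class ToyCollection:
    """Hypothetical stand-in: a named, ordered group of transformation functions."""

    def __init__(self, name, *functions):
        self.name = name
        self.functions = functions

    def __iter__(self):
        return iter(self.functions)

    def __len__(self):
        return len(self.functions)

    def __call__(self, data):
        # Applying the group applies each member in order.
        for function in self.functions:
            data = function(data)
        return data


def strip_names(data):
    """Trim surrounding whitespace from dataset names (hypothetical)."""
    return [{**ds, "name": ds["name"].strip()} for ds in data]


def lowercase_names(data):
    """Convert dataset names to lower case (hypothetical)."""
    return [{**ds, "name": ds["name"].lower()} for ds in data]


clean_names = ToyCollection("clean names", strip_names, lowercase_names)

# The group is itself callable, so it can sit in a configuration like any function.
configuration = [clean_names]
data = [{"name": "  Steel "}]
for function in configuration:
    data = function(data)
```

Keeping the group iterable as well as callable would let a runner either treat it as one unit of work or step through its members individually.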
The system_model function is actually quite simple:
ocelot.model.system_model
Note
See also writing.
Transform functions are the heart of Ocelot - each one performs one distinct change to the collection of datasets. Transform functions can be any callable, but most are simple functions.
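As a sketch of what "any callable" can mean in practice, the example below uses a plain function, a curried function built with functools.partial, and an object with a __call__ method. All three transformations and the dataset dictionaries are hypothetical, and the example assumes that a transform takes the datasets and returns the changed datasets.

```python
from functools import partial


def remove_negative(data):
    """Plain function: drop datasets with a negative amount."""
    return [ds for ds in data if ds["amount"] >= 0]


def scale(data, factor):
    """Needs an extra argument, so it is curried before use."""
    return [{**ds, "amount": ds["amount"] * factor} for ds in data]


class AddTag:
    """Callable object that carries its own settings."""

    def __init__(self, tag):
        self.tag = tag

    def __call__(self, data):
        return [{**ds, "tag": self.tag} for ds in data]


transforms = [remove_negative, partial(scale, factor=2), AddTag("checked")]

data = [{"amount": 1.5}, {"amount": -1}]
for transform in transforms:
    # Every entry only has to be callable with the datasets as its argument.
    assert callable(transform)
    data = transform(data)
```

Note that functools.partial objects and class instances do not have a __name__ attribute of their own, which is one reason simple functions are the easiest to describe in a report.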
The report generator will use information about each transformation function when creating the report. Specifically, the report generator will look at the function name, its docstring (a text description of what the function does, included in the function code), and any additional tabular data you provide during the function call.
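The function name and docstring are both available through standard Python introspection. The sketch below shows one way they could be collected. The describe helper and the drop_unlinked function are hypothetical, and this is not necessarily how the Ocelot report generator reads them.

```python
import inspect


def drop_unlinked(data):
    """Remove datasets that have no exchanges.

    Hypothetical transformation used only for this illustration.
    """
    return [ds for ds in data if ds.get("exchanges")]


def describe(function):
    """Collect the name and cleaned docstring a report could display."""
    return {
        "name": function.__name__,
        # inspect.getdoc strips the indentation that docstrings pick up in code.
        "description": inspect.getdoc(function) or "",
    }


summary = describe(drop_unlinked)
```

A practical consequence is that a short, descriptive function name and a clear first docstring line make the generated report easier to read.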
Ocelot uses standard Python logging, with a custom formatter that encodes log messages to JSON dictionaries. Due to this custom formatter, the ocelot logger must be retrieved in each file which uses logging:
import logging

# Retrieve the shared 'ocelot' logger so the custom JSON formatter is used
logger = logging.getLogger('ocelot')

def my_transformation(data):
    logger.info({"message": "something", "count": len(data)})
Log messages are written when a run is started or finished, when transformation functions are started or finished, and whenever the transformation function wants to log something. The message format for the log written to disk (i.e. with each line JSON encoded) is documented in logging-format.
Note
time is added automatically to each log message.
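A formatter of this kind can be sketched with the standard logging module alone. The JsonFormatter class below is an illustrative assumption, not Ocelot's actual formatter, and the logger name and in-memory stream are chosen only to keep the example self-contained. It encodes dictionary messages as one JSON object per line and adds a time field taken from the log record.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Encode dictionary log messages as one JSON object per line."""

    def format(self, record):
        message = record.msg
        # Wrap plain strings so every line has the same shape.
        if isinstance(message, dict):
            payload = dict(message)
        else:
            payload = {"message": str(message)}
        # Add the timestamp automatically, so callers never pass it themselves.
        payload["time"] = record.created
        return json.dumps(payload)


# Write to an in-memory stream here; a real run would write to a file on disk.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("ocelot_formatter_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

logger.info({"message": "something", "count": 3})
line = json.loads(stream.getvalue())
```

Because each line is valid JSON on its own, a log file written this way can be read back line by line with json.loads, which is convenient for building a report from it.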
In the last step in the workflow, the model run log data is formatted into an HTML report.
ocelot.HTMLReport