Skip to content

Commit

Permalink
added documentation for the FileStorageObserver and refactored it sli…
Browse files Browse the repository at this point in the history
…ghtly
  • Loading branch information
Qwlouse committed Jul 11, 2016
1 parent d88deb2 commit f6727bd
Show file tree
Hide file tree
Showing 4 changed files with 250 additions and 38 deletions.
182 changes: 153 additions & 29 deletions docs/observers.rst
Original file line number Diff line number Diff line change
Expand Up @@ -6,18 +6,32 @@ Sacred helps you doing that by providing an *Observer Interface* for your
experiments. By attaching an Observer you can gather all the information about
the run even while it is still running.

The only observer that is shipped with Sacred at this point is
:ref:`mongo_observer`, so we'll focus on that.
The ``MongoObserver`` collects information about an experiment and stores them
in a `MongoDB <http://www.mongodb.org/>`_.
At the moment there are three observers that are shipped with Sacred:

But if you want your run infos stored some other way, it is easy to write
* The main one is the :ref:`mongo_observer` which stores all information in a
`MongoDB <http://www.mongodb.org/>`_.
* The :ref:`file_observer` stores the run information as files in a given
directory and will therefore only work locally.
* The :ref:`sql_observer` connects to any SQL database and will store the
relevant information there.

But if you want the run information stored some other way, it is easy to write
your own :ref:`custom_observer`.

.. _mongo_observer:

MongoObserver
=============
The MongoObserver is the recommended way of storing the run information from
Sacred.
MongoDB allows very powerful querying of the entries that can deal with
almost any structure of the configuration and the custom info.
Furthermore it is easy to set-up and allows to connect to a central remote DB.
Most tools for further analysing the data collected by Sacred build upon this
observer.

Adding a MongoObserver
======================
----------------------
You can add a MongoObserver from the command-line via the ``-m MY_DB`` flag::

>> ./my_experiment.py -m MY_DB
Expand Down Expand Up @@ -56,7 +70,6 @@ network without authentication.

Authentication
--------------

If you need authentication a little more work might be necessary.
First you have to decide which
`authentication protocol <http://api.mongodb.org/python/current/examples/authentication.html>`_
Expand Down Expand Up @@ -85,7 +98,7 @@ If additional arguments need to be passed to the MongoClient they can just be in
ssl_ca_certs='/path/to/ca.pem'))
Database Entry
==============
--------------
The MongoObserver creates three collections to store information. The first,
``runs`` (that name can be changed), is the main collection that contains one
entry for each run.
Expand All @@ -101,12 +114,12 @@ So here is an example entry in the ``runs`` collection::
> db.runs.find()[0]
{
"_id" : ObjectId("5507248a1239672ae04591e2"),
"format" : "MongoObserver-0.7.0",
"status" : "COMPLETED",
"result" : null,
"start_time" : ISODate("2015-03-16T19:44:26.439Z"),
"heartbeat" : ISODate("2015-03-16T19:44:26.446Z"),
"stop_time" : ISODate("2015-03-16T19:44:26.447Z")

"start_time" : ISODate("2016-07-11T14:50:14.473Z"),
"heartbeat" : ISODate("2015-03-16T19:44:26.530Z"),
"stop_time" : ISODate("2015-03-16T19:44:26.532Z"),
"config" : {
"message" : "Hello world!",
"seed" : 909032414,
Expand All @@ -116,30 +129,27 @@ So here is an example entry in the ``runs`` collection::
"resources" : [ ],
"artifacts" : [ ],
"captured_out" : "Hello world!\n",

"experiment" : {
"name" : "hello_config_scope",
"dependencies" : [
["numpy", "1.9.1"],
["sacred", "0.6"]
],
"name" : "hello_cs",
"base_dir" : "$(HOME)/sacred/examples/"
"dependencies" : ["numpy==1.9.1", "sacred==0.7.0"],
"sources" : [
[
"$(HOME)/sacred/examples/03_hello_config_scope.py",
"da6a2d6e03d122b3abead21b0c621ba9"
"03_hello_config_scope.py",
ObjectId("5507248a1239672ae04591e3")
]
],
"doc" : "A configurable Hello World \"experiment\".\nIn this [...]"
"repositories" : [{
"url" : "git@github.com:IDSIA/sacred.git"
"dirty" : false,
"commit" : "d88deb2555bb311eb779f81f22fe16dd3b703527"}]
},

"host" : {
"os" : "Linux",
"os" : ["Linux",
"Linux-3.13.0-46-generic-x86_64-with-Ubuntu-14.04-trusty"],
"cpu" : "Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz",
"hostname" : "MyAwesomeMachine",
"python_version" : "3.4.0",
"python_compiler" : "GCC 4.8.2",
"os_info" : "Linux-3.13.0-46-generic-x86_64-with-Ubuntu-14.04-trusty",
"cpu_count" : 8
"python_version" : "3.4.0"
},
}

Expand All @@ -154,13 +164,127 @@ it stored the sourcecode of the experiment in the database::
{
"_id" : ObjectId("5507248a1239672ae04591e3"),
"filename" : "$(HOME)/sacred/examples/03_hello_config_scope.py",
"md5" : "da6a2d6e03d122b3abead21b0c621ba9",
"md5" : "897b2144880e2ee8e34775929943f496",
"chunkSize" : 261120,
"length" : 1526,
"uploadDate" : ISODate("2015-03-16T18:44:26.444Z")
"uploadDate" : ISODate("2016-07-11T12:50:14.522Z")
}


.. _file_observer:

FileStorageObserver
===================
The FileStorageObserver is the most basic observer and requires the least
amount of setup.
It is mostly meant for preliminary experiments and cases when setting up a
database is difficult or impossible.
But in combination with the template rendering integration it can be very
helpful.

Adding a FileStorageObserver
----------------------------
The FileStorageObserver can be added from the command-line via the
``-F BASEDIR`` and ``--file_storage=BASEDIR`` flags::

>> ./my_experiment.py -F BASEDIR
>> ./my_experiment.py --file_storage=BASEDIR

Here ``BASEDIR`` is the name of the directory in which all the subdirectories
for individual runs will be created.

You can, of course, also add it from code like this:

.. code-block:: python
from sacred.observers import FileStorageObserver
ex.observers.append(FileStorageObserver.create('my_runs'))
Directory Structure
-------------------
The FileStorageObserver creates a separate sub-directory for each run and stores
several files in there::

my_runs/
run_3mdq4amp/
config.json
cout.txt
info.json
run.json
run_zw82a7xg/
...
...

``config.json`` contains the JSON-serialized version of the configuration
and ``cout.txt`` the captured output.
The main information is stored in ``run.json`` and is very similar to the
database entries from the :ref:`mongo_observer`::

{
"command": "main",
"status": "COMPLETED",
"start_time": "2016-07-11T15:35:14.765152",
"heartbeat": "2016-07-11T15:35:14.766793",
"stop_time": "2016-07-11T15:35:14.768465",
"result": null,
"experiment": {
"base_dir": "/home/greff/Programming/sacred/examples",
"dependencies": [
"numpy==1.11.0",
"sacred==0.6.9"],
"name": "hello_cs",
"repositories": [{
"commit": "d88deb2555bb311eb779f81f22fe16dd3b703527",
"dirty": false,
"url": "git@github.com:IDSIA/sacred.git"}],
"sources": [
["03_hello_config_scope.py",
"_sources/03_hello_config_scope_897b2144880e2ee8e34775929943f496.py"]]
},
"host": {
"cpu": "Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz",
"hostname": "Liz",
"os": ["Linux",
"Linux-3.19.0-58-generic-x86_64-with-Ubuntu-15.04-vivid"],
"python_version": "3.4.3"
},
"artifacts": [],
"resources": [],
"meta": {},
}

In addition to that there is an ``info.json`` file holding :ref:`custom_info`
(if existing) and all the :ref:`artifacts`.

The FileStorageObserver also stores a snapshot of the source-code in a separate
``my_runs/_sources`` directory, and :ref:`resources` in ``my_runs/_resources``
(if present).
Their filenames are stored in the ``run.json`` file such that the corresponding
files can be easily linked to their respective run.

Template Rendering
------------------
In addition to these basic files, the FileStorageObserver can also generate a
report for each run from a given template file.
The prerequisite for this is that the `mako <http://www.makotemplates.org/>`_ package is installed and a
``my_runs/template.html`` file needs to exist.
The file can be located somewhere else, but then the filename must be passed to
the FileStorageObserver like this:

.. code-block:: python
from sacred.observers import FileStorageObserver
ex.observers.append(FileStorageObserver.create('my_runs', template='/custom/template.txt'))
The FileStorageObserver will then render that template into a
``report.html``/``report.txt`` file in the respective run directory.
``mako`` is a very powerful templating engine that can execute
arbitrary python-code, so be careful about the templates you use.
For an example see ``sacred/examples/my_runs/template.html``.

Events
======
A ``started_event`` is fired when a run starts.
Expand Down
70 changes: 70 additions & 0 deletions examples/my_runs/template.html
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>${run['experiment']['name']}</title>
</head>
<body>

<%
from datetime import datetime

import matplotlib.pyplot as plt

import seaborn as sns

def plot(obj, t):
fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(obj)
fig.savefig(t + '.png', bbox_inches='tight', pad_inches=0)
return '<img src="{}.png" width="300px"></img>'.format(t)
%>


<%def name="cfg(obj)">
% if isinstance(obj, dict):
<ul>
% for k, v in sorted(obj.items()):
<li><b>${k}</b>${cfg(v)}</li>
% endfor
</ul>
% else:
= ${obj}
% endif
</%def>


<%def name="nfo(obj, t)">
% if isinstance(obj, dict):
<ul>
% for k, v in sorted(obj.items()):
<li><b>${k}</b>${nfo(v, t + '.' + k)}</li>
% endfor
</ul>
% elif isinstance(obj, list):
${plot(obj, t)}
% else:
= ${obj}
% endif
</%def>



<h1>${run['experiment']['name']}</h1>
<table>
<tr><td>Status</td><td>${run['status'] }</td></tr>
<tr><td>Result</td><td>${run.get('result') }</td></tr>
<tr><td>Elapsed Time</td><td>${run['stop_time'] - run['start_time'] }</td></tr>
<tr><td>Start Time</td><td>${run['start_time']}</td></tr>
<tr><td>Stop Time</td><td>${run['stop_time']}</td></tr>
<tr><td>Heartbeat</td><td>${run['heartbeat']}</td></tr>
</table>

<h2>Config</h2>
${cfg(config)}

<h2>Info</h2>
${nfo(info, 'info')}

</body>
</html>
35 changes: 26 additions & 9 deletions sacred/observers/file_storage.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,12 +25,28 @@ def json_serial(obj):
class FileStorageObserver(RunObserver):
VERSION = 'FileStorageObserver-0.7.0'

def __init__(self, basedir, resource_dir=None, source_dir=None):
@classmethod
def create(cls, basedir, resource_dir=None, source_dir=None,
template=None):
if not os.path.exists(basedir):
os.makedirs(basedir)
resource_dir = resource_dir or os.path.join(basedir, '_resources')
source_dir = source_dir or os.path.join(basedir, '_sources')
if template is not None:
if not os.path.exists(template):
raise FileNotFoundError("Couldn't find template file '{}'"
.format(template))
else:
template = os.path.join(basedir, 'template.html')
if not os.path.exists(template):
template = None
return cls(basedir, resource_dir, source_dir, template)

def __init__(self, basedir, resource_dir, source_dir, template):
self.basedir = basedir
self.resource_dir = resource_dir or os.path.join(basedir, '_resources')
self.source_dir = source_dir or os.path.join(basedir, '_sources')
self.resource_dir = resource_dir
self.source_dir = source_dir
self.template = template
self.dir = None
self.run_entry = None
self.config = None
Expand Down Expand Up @@ -128,24 +144,25 @@ def save_cout(self):
f.write(self.cout)

def render_template(self):
template_name = os.path.join(self.basedir, 'template.html')
if opt.has_mako and os.path.exists(template_name):
if opt.has_mako and self.template:
from mako.template import Template
template = Template(filename=template_name)
template = Template(filename=self.template)
report = template.render(run=self.run_entry,
config=self.config,
info=self.info,
cout=self.cout,
savedir=self.dir)
with open(os.path.join(self.dir, 'report.html'), 'w') as f:
_, ext = os.path.splitext(self.template)
with open(os.path.join(self.dir, 'report' + ext), 'w') as f:
f.write(report)

def heartbeat_event(self, info, cout_filename, beat_time):
self.info = info
self.run_entry['heartbeat'] = beat_time
self.save_file(cout_filename, 'cout.txt')
self.save_json(self.run_entry, 'run.json')
self.save_json(self.info, 'info.json')
if self.info:
self.save_json(self.info, 'info.json')

def completed_event(self, stop_time, result):
self.run_entry['stop_time'] = stop_time
Expand Down Expand Up @@ -196,4 +213,4 @@ class FileStorageOption(CommandLineOption):

@classmethod
def apply(cls, args, run):
run.observers.append(FileStorageObserver(args))
run.observers.append(FileStorageObserver.create(args))
1 change: 1 addition & 0 deletions tox.ini
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,7 @@ deps =
pyyaml
pytest-cov
coveralls
sqlalchemy

commands =
py.test \
Expand Down

0 comments on commit f6727bd

Please sign in to comment.