Alchemie is a tool for performing machine learning experiments with breze. It is also designed to work with cluster environments such as Slurm, using Sebastian Urban's submit.
Alchemie aims to solve the following problems:
- quick generation of many experiments differing in hyper parameters,
- interruption and continuation of experiments,
- generation of rich reports and saving relevant results, e.g., model parameters for future analysis.
- A user will implement a Python module representing an experiment. This module has a certain structure that is explained below.
- By running `alc.py create [--amount=]`, randomized configurations are generated; each of these configurations corresponds to a directory on the file system contained in `<location>`, in which a file `cfg.py` is placed.
- In the next step, the experiment is executed via the command `alc.py run`. The parameters are read from `cfg.py` in `<location>` (note that the `<location>` in the `run` command points one level lower than in the `create` command, namely to one specific configuration). The experiment will store relevant information in files in the configuration directory.
An alchemie experiment module is expected to implement specific functions: `preamble`, `draw_pars`, `load_data`, `new_trainer` and `make_report`. We briefly describe their functionality below. For an example, see the `examples` directory of alchemie.
Signature:

```python
def preamble(job_index):
    # ...
    return some_string
```
The idea is to make it possible to add a string prefix to each of the configuration files `cfg.py`. The function takes an integer argument (which is unique within the set of experiments generated in one call of `create`) and can use it to produce metadata. This is useful for cluster environments, where additional metadata can be stored in such files. The `cfg.py` file is automatically generated from strings and should remain executable after adding the preamble; hence the returned string should be a valid Python comment.
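A minimal sketch; the job name scheme is purely illustrative:

```python
def preamble(job_index):
    # Return a valid Python comment that alchemie prepends to cfg.py.
    # The job name format below is just an example.
    return '# job name: experiment-%d' % job_index
```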
We draw parameters randomly by calling the following function:
```python
def draw_pars(n=1):
    # ...
    return iterable_of_length_n
```
Here, `n` gives the number of random configurations to draw. `draw_pars` will then return an iterable of that length over dictionaries representing different parameter settings. This is compatible with, e.g., `sklearn.grid_search.ParameterSampler`.
Each configuration is represented by a directory, which has a file `cfg.py` in it. This Python module contains a dictionary `pars`, which fully specifies the configuration needed by the experiment module.
Note that, apart from drawing random parameters, this function can specify (the usage of) any high-level properties of the experiment, such as preprocessing. The generated dictionary is passed to every other function involved.
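A minimal sketch using scikit-learn's parameter sampler (named `sklearn.model_selection.ParameterSampler` in recent versions); the hyper parameter names and ranges are made up for illustration:

```python
from sklearn.model_selection import ParameterSampler  # sklearn.grid_search in older versions

def draw_pars(n=1):
    # Illustrative hyper parameter grid; use whatever your experiment needs.
    grid = {
        'n_hidden': [64, 128, 256],
        'step_rate': [0.1, 0.01, 0.001],
        'batch_size': [32, 64],
    }
    # ParameterSampler is an iterable of length n over parameter dictionaries.
    return ParameterSampler(grid, n_iter=n)
```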
Each machine learning experiment works on data. During each startup and resumption of an experiment, we will need to load this data and make it available to the training process. The signature of the function is as follows:
```python
def load_data(pars):
    # ...
    return {'train': some_train_data,     # required
            'val': some_validation_data,  # required
            'test': some_testing_data,    # not required
            }
```
The function takes the parameters from `cfg.py` above. This is particularly of interest if the parameters specify some kind of preprocessing, which is then done here. To work with breze trainers, we require that the return value is a dictionary that works for the `.eval_data` field of a breze `Trainer` object. The `val` entry will be used for validation during training.
It also makes sense to populate the dictionary with testing data, so that we have access to it later.
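A minimal sketch, assuming the data lives in a hypothetical `data.npz` file and that a plain index split suffices; any preprocessing requested in `pars` would go here as well:

```python
import numpy as np

def load_data(pars):
    # 'data.npz' and its array names are hypothetical; replace with your dataset.
    arrays = np.load('data.npz')
    X, Z = arrays['inputs'], arrays['targets']

    # Preprocessing controlled by ``pars`` could be applied here.

    return {'train': (X[:800], Z[:800]),
            'val': (X[800:900], Z[800:900]),
            'test': (X[900:], Z[900:])}
```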
This function creates a new trainer for each configuration, and with it the model under consideration.

```python
def new_trainer(pars, data):
    # ...
    return trainer
```
This is the place to do things like:
- Create a model,
- Initialize its parameters,
- Set stopping, pausing and interruption criteria.
Note that you can make use of the data dictionary to fully specify things here, e.g. to handle a varying input dimensionality.
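A rough sketch, loosely in the style of the MLP example shipped with alchemie; the breze and climin class names, constructor arguments and hyper parameter keys are assumptions here and should be checked against the versions you use:

```python
import climin.initialize
import climin.stops
from breze.learn.mlp import Mlp
from breze.learn.trainer.trainer import Trainer
from breze.learn.trainer.report import KeyPrinter

def new_trainer(pars, data):
    n_inpt = data['train'][0].shape[1]       # input dimensionality taken from the data
    model = Mlp(n_inpt, [pars['n_hidden']], 1,
                hidden_transfers=['tanh'], out_transfer='identity',
                loss='squared', optimizer=pars['optimizer'])
    climin.initialize.randomize_normal(model.parameters.data, 0, pars['par_std'])

    stop = climin.stops.AfterNIterations(pars['max_iter'])   # stop for good
    pause = climin.stops.ModuloNIterations(100)              # pause to evaluate and report
    interrupt = climin.stops.OnSignal()                      # allow clean interruption

    trainer = Trainer(model, stop=stop, pause=pause,
                      report=KeyPrinter(['n_iter', 'val_loss']),
                      interrupt=interrupt)
    trainer.val_key = 'val'        # validate on the 'val' entry of eval_data
    trainer.eval_data = data
    return trainer
```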
After each job has run, alchemie will call `make_report` and save the result to a json file. If the job was interrupted, the file is numbered consecutively, e.g. `report-1.json`. Otherwise it will be called `report-last.json`. The signature is as follows:
```python
def make_report(pars, trainer, data):
    # ...
    return {'test_loss': 0}  # You want more useful info here.
```
The code is very short, and there is an example in the `examples` directory.
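A minimal sketch, assuming the model behind the trainer exposes a `score(X, Z)` method, as breze's supervised models typically do; which keys you put into the report is up to you:

```python
def make_report(pars, trainer, data):
    test_x, test_z = data['test']
    return {'test_loss': float(trainer.model.score(test_x, test_z)),
            'pars': pars}   # keep the hyper parameters next to the result
```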