Example of exporting a model to Kipoi.
Selene enables users to experiment with modifying existing architectures or creating entirely new ones. Selene can support research projects across different stages (development, evaluation, and application) and is most useful when a researcher wants to develop and validate a model for a new publication.
After publication, we encourage users to archive and share their models through the Kipoi model zoo so that other researchers can access, use, and build on these models.
Here we demonstrate the steps needed to contribute a model to Kipoi. This is based on the Contributing models tutorial provided in the Kipoi documentation, and some of the content here is adapted from that tutorial.
We do plan to automate much of the work needed to export a model to Kipoi so that users can run a command in the Selene CLI to generate the same output we have here.
To export a model to Kipoi, install kipoi and kipoiseq with pip:
pip install -U kipoi
pip install -U kipoiseq
This should also install jinja2, which the export script kipoi_export.py uses. Please also install docopt if you have not done so already:
conda install -c anaconda docopt
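Before you run the export script, you can confirm that the packages above are importable. The following is a small sketch that uses only the standard library. The helper name check_packages is our own. It only checks that each top-level module can be found, not which version is installed.

```python
import importlib.util

# Import names of the packages installed above (docopt via conda, the rest via pip).
REQUIRED_PACKAGES = ("kipoi", "kipoiseq", "jinja2", "docopt")


def check_packages(names=REQUIRED_PACKAGES):
    """Return a dict mapping each module name to whether it can be found."""
    # find_spec returns None for a top-level module that is not installed,
    # without actually importing it.
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in check_packages().items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```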
In this example, we run the script kipoi_export.py with the following command:
python kipoi_export.py <path>/best_model.pth.tar \
    <path>/class_names.txt \
    <path>/config.yaml \
    <path-to-output-dir>
- best_model.pth.tar: serialized dictionary containing the trained model state and other parameters, from Selene training
- class_names.txt: the list of distinct classes that the model predicts
- config.yaml: a configuration file used to fill out the values in model-template.yaml. The filled-out template is written to model.yaml, a file that Kipoi requires.
- <path-to-output-dir>: the output directory
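The four positional arguments can also be summarized as an argparse parser. This is a hypothetical equivalent written only for illustration. The actual kipoi_export.py parses its arguments with docopt, and the destination names below (checkpoint, class_names, config, output_dir) are our own.

```python
import argparse


def build_parser():
    """Hypothetical argparse version of the kipoi_export.py command line."""
    parser = argparse.ArgumentParser(
        prog="kipoi_export.py",
        description="Export a Selene-trained model to a Kipoi model directory.",
    )
    # Positional arguments, in the same order as in the command above.
    parser.add_argument("checkpoint", help="best_model.pth.tar saved by Selene training")
    parser.add_argument("class_names", help="class_names.txt listing the predicted classes")
    parser.add_argument("config", help="config.yaml used to fill out model-template.yaml")
    parser.add_argument("output_dir", help="directory where the Kipoi model files are written")
    return parser


# Parse an explicit list (rather than sys.argv) to show how arguments map to names.
args = build_parser().parse_args(
    ["best_model.pth.tar", "class_names.txt", "config.yaml", "ExampleDeeperDeepSEA"]
)
print(args.checkpoint, args.output_dir)
```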
The steps taken in the script:
- Saves only the model state dictionary from best_model.pth.tar and writes the resulting file to the output directory.
- Copies the file of class names to the output directory.
- Uses the config YAML to populate the values in model-template.yaml and writes a model.yaml file to the output directory.
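The three steps above can be sketched with the standard library as follows. This is not the code in kipoi_export.py, and it rests on several assumptions:

- The Selene checkpoint has already been loaded into a dictionary whose weights are stored under the "state_dict" key.
- The torch.load/torch.save calls you would need in practice are left out.
- Python's string.Template ($name placeholders) stands in for the jinja2 templating that the real script uses.

All function names here are hypothetical.

```python
import shutil
from pathlib import Path
from string import Template


def extract_state_dict(checkpoint):
    """Step 1: keep only the weights from an already-loaded checkpoint dict.

    Assumes the weights are stored under "state_dict". In practice the dict
    would come from torch.load(...) and the result would be written with
    torch.save(...); PyTorch is omitted to keep this sketch stdlib-only.
    """
    return checkpoint["state_dict"]


def copy_class_names(class_names_path, output_dir):
    """Step 2: copy the class names file into the output directory."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    # shutil.copy returns the path of the newly created file.
    return Path(shutil.copy(class_names_path, output_dir))


def write_model_yaml(template_text, config, output_dir):
    """Step 3: fill the template with config values and write model.yaml.

    string.Template stands in for jinja2 here; substitute() raises KeyError
    if a placeholder has no matching config value.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    model_yaml = output_dir / "model.yaml"
    model_yaml.write_text(Template(template_text).substitute(config))
    return model_yaml
```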
Once you have installed kipoi, you should be able to view your Kipoi model folder (default: ~/.kipoi/models). For the model you want to submit, called ModelName, you should specify the output directory as
Running the example
Before running the command below, please download the model weights file
After moving the file to
python kipoi_export.py ./example_inputs/best_model.pth.tar \
    ./example_inputs/class_names.txt \
    ./example_inputs/config.yaml \
    ./ExampleDeeperDeepSEA
The output directory ExampleDeeperDeepSEA should ultimately be moved to ~/.kipoi/models/.
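You can also script the move of the exported directory into the Kipoi model folder. This is a minimal sketch that assumes the default model folder ~/.kipoi/models. The function name install_model_dir is hypothetical. It refuses to overwrite an existing model directory of the same name.

```python
import shutil
from pathlib import Path


def install_model_dir(model_dir, models_root="~/.kipoi/models"):
    """Move an exported model directory into the Kipoi model folder.

    Returns the new location; raises FileExistsError rather than overwrite.
    """
    model_dir = Path(model_dir)
    root = Path(models_root).expanduser()
    destination = root / model_dir.name
    if destination.exists():
        raise FileExistsError(f"{destination} already exists")
    root.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(model_dir), str(destination)))
```

For the example, install_model_dir("./ExampleDeeperDeepSEA") would move the directory to ~/.kipoi/models/ExampleDeeperDeepSEA.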
If you don't want to run kipoi_export.py but still want the complete ExampleDeeperDeepSEA directory, please download the file
and move it to
(The difference between
DeeperDeepSEA.state.pth.tar is that the latter contains only the model state dictionary and weights, whereas the former contains some extra parameters that are useful for continuing model training in Selene.)
The config YAML file
This is used to generate
module_class: the module class name (see Formatting your model architecture file(s))
module_kwargs: optional, specify any arguments needed to initialize the model architecture class
- For example:
    module_kwargs:
      arg1: val1
      arg2: val2
authors: list of authors (each item in the list is a dictionary with author and github keys)
- For example:
    authors:
      - author: a1
        github: g1
      - author: a2
        github: g2
license: the model license (e.g. MIT, BSD 3-Clause). Only contribute models that you have the right to share and whose license permits redistribution.
model_name: the model name
trained_on_description: describe the data on which the model was trained, what the validation and testing holdouts were, etc.
selene_version: the version of Selene you used to train the model
tags: optional, specify relevant tags in a list (e.g. histone modification)
- For example:
    tags:
      - Histone modification
      - DNA accessibility
seq_len: the length of the sequences the model accepts as input
pytorch_version: the version of PyTorch used to train the model
n_tasks: the number of tasks (classes/labels) the model predicts
(List is ordered in the way the parameters appear in
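Before you run the export script, it can help to check that your config has every field listed above. The sketch below works on the dictionary you get after loading config.yaml with a YAML parser, for example PyYAML's yaml.safe_load. check_config is a hypothetical helper. It treats every field not marked optional above as required, which is our reading of the list rather than a rule enforced by kipoi_export.py.

```python
# Fields described above; module_kwargs and tags are marked optional.
REQUIRED_FIELDS = (
    "module_class", "authors", "license", "model_name",
    "trained_on_description", "selene_version", "seq_len",
    "pytorch_version", "n_tasks",
)


def check_config(config):
    """Return a list of problems found in a loaded config dict (empty if none)."""
    problems = [f"missing required field: {field}"
                for field in REQUIRED_FIELDS if field not in config]
    # Each author entry is expected to be a dict such as {"author": ..., "github": ...}.
    for entry in config.get("authors", []):
        if not isinstance(entry, dict) or "author" not in entry:
            problems.append(f"author entry without an 'author' key: {entry!r}")
    # seq_len and n_tasks are counts, so they should be integers.
    for key in ("seq_len", "n_tasks"):
        if key in config and not isinstance(config[key], int):
            problems.append(f"{key} should be an integer")
    return problems
```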
We recommend that you run kipoi_export.py with your filled-out config.yaml and then manually make adjustments to the generated model.yaml file in the output directory. There will be comments in the file to highlight where you might need to change something.
The weights parameter in args should be updated after you upload your model file to Zenodo or Figshare. See model-template.yaml or the generated model.yaml for details. You can also see an example of the final model.yaml file here.
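Your model.yaml may ask for a checksum alongside the download URL in the weights entry; Kipoi's remote file entries commonly pair a url with an md5. In that case you can compute the checksum for the uploaded file with the standard library. This is a hypothetical helper. Check the generated model.yaml for the exact fields it expects.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For the example, md5sum("DeeperDeepSEA.state.pth.tar") gives the checksum of the state-dictionary file you upload.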
You should also update the cite_as parameter with the DOI URL of your publication.
Formatting your model architecture file(s)
Move your model architecture file or module into ~/.kipoi/models/ModelName. The next sections explain the expected format, including how to organize your model architecture as a module when that is needed.
If you did NOT use Selene's NonStrandSpecific module (i.e. the
If your model architecture is implemented in a single file called model_name_arch.py, you can specify module_class to be
Otherwise, you can move all your model architecture files into a directory called
model_arch and import the model
model_arch is then a Python module that Kipoi can use to import your model architecture class. You can view our example directory model_arch to see how this is structured.
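Here is a minimal sketch of why a model_arch directory works. It builds such a package and re-exports the architecture class from the package's __init__.py, so the class can be imported as model_arch.ModelName. ModelName and model_name_arch.py are the placeholder names used in this document. The class body is a hypothetical stand-in for a real PyTorch module.

```python
from pathlib import Path

# Hypothetical stand-in for model_name_arch.py; a real file would define a
# torch.nn.Module subclass.
ARCH_SOURCE = '''\
class ModelName:
    def __init__(self, sequence_length, n_targets):
        self.sequence_length = sequence_length
        self.n_targets = n_targets
'''


def build_model_arch(parent_dir):
    """Create parent_dir/model_arch as a package that re-exports ModelName."""
    package = Path(parent_dir) / "model_arch"
    package.mkdir(parents=True, exist_ok=True)
    (package / "model_name_arch.py").write_text(ARCH_SOURCE)
    # __init__.py makes the directory a package and exposes the class
    # as model_arch.ModelName.
    (package / "__init__.py").write_text("from .model_name_arch import ModelName\n")
    return package
```

After build_model_arch(parent) runs and parent is on sys.path, import model_arch followed by model_arch.ModelName(1000, 919) resolves the class through the package.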
If you used Selene's NonStrandSpecific module:
We are working to automate this, but currently a few manual steps are needed to format the architecture for export to Kipoi. These steps apply to our example, so you can refer to its files to see how we did this.
- Create a directory called model_arch. This will be the Python module that Kipoi uses to import your model architecture class.
- Copy the file non_strand_specific_module.py from the Selene repository to your directory.
- For an architecture where the main class ModelName is in the file model_name_arch.py, you should add the line from .model_name_arch import ModelName to
- Next, remove the constructor input model from __init__(self, model, mode="mean") and set
- Optional, but helpful: rename non_strand_specific_module.py to something more representative of your model architecture (e.g. wrapped_model_name.py). You can also rename the class from NonStrandSpecific to WrappedModelName or something else. Note that you will need to update super(NonStrandSpecific, self).__init__() with the new class name too.
- Finally, import your class in the file model_arch/__init__.py (e.g. from .wrapped_model_name import WrappedModelName).
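The effect of these edits can be illustrated without PyTorch. The sketch below is a pure-Python analogue, not Selene's code:

- ModelName is a hypothetical toy that reports the fraction of A bases.
- WrappedModelName follows the pattern described above. It builds its own model instead of taking one as a constructor argument.
- It then combines predictions on a sequence and its reverse complement, using either a mean or a max.

Selene's NonStrandSpecific module does the equivalent on one-hot encoded tensors, so treat the details here as assumptions.

```python
# Pure-Python illustration; Selene's NonStrandSpecific is a PyTorch module that
# flips one-hot encoded tensors rather than reversing strings.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    return sequence.translate(COMPLEMENT)[::-1]


class ModelName:
    """Hypothetical toy architecture: predicts the fraction of A bases for every target."""

    def __init__(self, sequence_length, n_targets):
        self.sequence_length = sequence_length
        self.n_targets = n_targets

    def __call__(self, sequence):
        fraction_a = sequence.count("A") / len(sequence)
        return [fraction_a] * self.n_targets


class WrappedModelName:
    """After the edits: no `model` argument; the wrapper builds ModelName itself."""

    def __init__(self, mode="mean"):
        if mode not in ("mean", "max"):
            raise ValueError(f"mode must be 'mean' or 'max', got {mode!r}")
        self.mode = mode
        self.model = ModelName(1000, 919)  # previously: self.model = model

    def __call__(self, sequence):
        forward = self.model(sequence)
        reverse = self.model(reverse_complement(sequence))
        if self.mode == "mean":
            return [(f + r) / 2 for f, r in zip(forward, reverse)]
        return [max(f, r) for f, r in zip(forward, reverse)]
```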
How we applied these steps in our example:
- Create the directory
- The architecture file is deeper_deepsea_arch.py, which contains the architecture class
wrapped_deeper_deepsea.py and update the class name in the file to WrappedDeeperDeepSEA (see lines 27 and 55).
DeeperDeepSEA with the line from .deeper_deepsea_arch import DeeperDeepSEA.
self.model = DeeperDeepSEA(1000, 919) (see lines 54 and 57 in the file).
- Create the file
model_arch with the line from .wrapped_deeper_deepsea import WrappedDeeperDeepSEA.
The following commands assume that you are in
Run kipoi test . in your model folder to test whether the general setup is correct.
If this is successful, run kipoi test-source dir --all to test whether all the software dependencies of the model are set up correctly and the automated tests pass.
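If you prefer to run the two checks from a script, a small wrapper around subprocess might look like the sketch below. It uses exactly the commands given above and assumes both are run from the model folder. It stops at the first failing command, because check=True raises CalledProcessError. run_kipoi_checks is a hypothetical helper. The runner parameter exists only so the function can be exercised without kipoi installed.

```python
import shutil
import subprocess

# The two checks described above, in order.
KIPOI_CHECKS = (
    ("kipoi", "test", "."),
    ("kipoi", "test-source", "dir", "--all"),
)


def run_kipoi_checks(model_folder, runner=subprocess.run):
    """Run each check from model_folder, stopping at the first failure.

    Returns the list of commands that completed successfully.
    """
    if runner is subprocess.run and shutil.which("kipoi") is None:
        raise RuntimeError("kipoi is not on PATH; install it with pip first")
    passed = []
    for command in KIPOI_CHECKS:
        # check=True makes subprocess.run raise CalledProcessError on a nonzero exit.
        runner(list(command), cwd=model_folder, check=True)
        passed.append(list(command))
    return passed
```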
Forking and submitting to Kipoi
Fork the https://github.com/kipoi/models repo on GitHub.
Add your fork as a git remote to
git remote add fork https://github.com/<username>/models.git
Push to your fork
git push fork master
Submit a pull request to https://github.com/kipoi/models!