Example of exporting a model to Kipoi.

Selene enables users to experiment with modifying existing architectures or creating entirely new ones. It supports research projects across different stages (development, evaluation, and application) and is most useful when a researcher wants to develop and validate a model for a new publication.

After publication, we encourage users to archive and share their models through the Kipoi model zoo so that other researchers can access, use, and build on these models.

Here we demonstrate the steps needed to contribute a model to Kipoi. This is based on the Contributing models tutorial provided in the Kipoi documentation, and some content in this document is drawn from that tutorial.

We plan to automate much of the work needed to export a model to Kipoi so that users can run a single command in the Selene CLI to generate the same output shown here.

Requirements

To export a model to Kipoi, you should pip install kipoi and kipoiseq:

pip install -U kipoi
pip install -U kipoiseq

This should also install jinja2 for you, which we use in a script called kipoi_export.py.

Please also install docopt if you have not done so already:

conda install -c anaconda docopt

Running kipoi_export.py

In this example, we run a script kipoi_export.py with the following command:

python kipoi_export.py <path>/best_model.pth.tar \
                       <path>/class_names.txt \
                       <path>/config.yaml \
                       <path-to-output-dir>

Parameters

  • best_model.pth.tar: the serialized dictionary saved by Selene during training, containing the trained model state and other parameters
  • class_names.txt: the list of distinct classes that the model predicts
  • config.yaml: a configuration file that is used to fill out the values in model-template.yaml. The filled-out template is written to model.yaml, which is a file required by Kipoi.
  • <path-to-output-dir>: the output directory (e.g. ~/.kipoi/models/ModelName)

The steps taken in the script:

  1. Saves only the model state dictionary (model.state_dict()) from best_model.pth.tar and writes the resulting file to the output directory.
  2. Copies the file of class names to the output directory.
  3. Uses the config YAML to populate the values in model-template.yaml and writes a model.yaml file to the output directory.
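
The first step above can be sketched as follows. This is a minimal illustration, not the actual contents of kipoi_export.py: the function names are hypothetical, and the sketch assumes the Selene checkpoint is a dictionary that stores the weights under a "state_dict" key.

```python
def extract_state_dict(checkpoint):
    """Return only the model weights from a loaded checkpoint dictionary.

    Assumption: the weights are stored under the key "state_dict".
    A dictionary without that key is treated as a bare state dictionary.
    """
    if "state_dict" in checkpoint:
        return checkpoint["state_dict"]
    return checkpoint


def export_state(checkpoint_path, output_path):
    """Load a checkpoint, drop the training-only entries, and save the weights."""
    import torch  # imported here so extract_state_dict has no dependencies

    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    torch.save(extract_state_dict(checkpoint), output_path)
```

Loading with map_location="cpu" lets the export run on a machine without a GPU, even if the model was trained on one.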

After installing kipoi, you should be able to view your Kipoi model folder (default: ~/.kipoi/models). For the model you want to submit, called ModelName, you should specify the output directory as ~/.kipoi/models/ModelName.
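
If you script the export, the output directory can be derived from the model name. The helper below is a small sketch; kipoi_model_dir is a hypothetical name, and the default root is simply the ~/.kipoi/models folder mentioned above.

```python
from pathlib import Path


def kipoi_model_dir(model_name, models_root=None):
    """Return the output directory for a model called model_name."""
    if models_root is None:
        # Default Kipoi model folder: ~/.kipoi/models
        models_root = Path.home() / ".kipoi" / "models"
    return Path(models_root) / model_name


# The resulting path can be passed to kipoi_export.py as <path-to-output-dir>
print(kipoi_model_dir("ModelName"))
```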

Running the example

Before running the command below, please download the model weights file best_model.pth.tar using

wget https://zenodo.org/record/2254804/files/best_model.pth.tar

After moving the file to example_inputs, run

python kipoi_export.py ./example_inputs/best_model.pth.tar \
                       ./example_inputs/class_names.txt \
                       ./example_inputs/config.yaml \
                       ./ExampleDeeperDeepSEA

(ExampleDeeperDeepSEA should be moved to ~/.kipoi/models/ in the end.)

If you don't want to run kipoi_export.py and still want the complete ExampleDeeperDeepSEA directory, please download the file DeeperDeepSEA.state.pth.tar using

wget https://zenodo.org/record/2254804/files/DeeperDeepSEA.state.pth.tar

and move it to ExampleDeeperDeepSEA.

(The difference between best_model.pth.tar and DeeperDeepSEA.state.pth.tar is that the latter contains only the model state dictionary, i.e. the weights, whereas the former also contains extra parameters that are useful for continuing model training in Selene.)

The config YAML file

This is used to generate model.yaml from model-template.yaml.

Parameters

  • module_class: the module class name (see Formatting your model architecture file(s))
  • module_kwargs: optional, specify any arguments needed to initialize the model architecture class
    • For example:
      module_kwargs:
        arg1: val1
        arg2: val2
  • authors: list of authors (each item in the list is a dictionary with author and github)
    • For example:
      authors:
        - author: a1
          github: g1
        - author: a2
          github: g2
  • license: the model license (e.g. MIT, BSD 3-Clause). Only contribute models for which you have the rights to do so and only contribute models that permit redistribution.
  • model_name: the model name
  • trained_on_description: describe the data on which the model was trained, what the validation and testing holdouts were, etc.
  • selene_version: the version of Selene you used to train the model
  • tags: optional, specify relevant tags in a list (e.g. histone modification)
    • For example:
      tags:
        - Histone modification
        - DNA accessibility
  • seq_len: the length of the sequences the model accepts as input
  • pytorch_version: the version of PyTorch used to train the model
  • n_tasks: the number of tasks (classes/labels) the model predicts

(List is ordered in the way the parameters appear in model-template.yaml)
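
Before running the export, it can help to check that the config contains every parameter listed above. The following is a sketch only: check_config is a hypothetical helper that is not part of Selene or Kipoi, the split into required and optional keys follows the list above, and the example values are placeholders. In practice the dictionary would come from parsing config.yaml.

```python
REQUIRED_KEYS = [
    "module_class", "authors", "license", "model_name",
    "trained_on_description", "selene_version", "seq_len",
    "pytorch_version", "n_tasks",
]
OPTIONAL_KEYS = ["module_kwargs", "tags"]


def check_config(config):
    """Return a list of problems found in a config dictionary (empty if none)."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append("missing required key: " + key)
    for key in config:
        if key not in REQUIRED_KEYS and key not in OPTIONAL_KEYS:
            problems.append("unknown key: " + key)
    # Each author entry should be a dictionary with "author" and "github"
    for entry in config.get("authors") or []:
        if not isinstance(entry, dict) or not {"author", "github"} <= set(entry):
            problems.append("author entry needs 'author' and 'github': %r" % (entry,))
    return problems


# Placeholder values for illustration only
example = {
    "module_class": "model_arch.WrappedModelName",
    "authors": [{"author": "a1", "github": "g1"}],
    "license": "MIT",
    "model_name": "ModelName",
    "trained_on_description": "placeholder description",
    "selene_version": "x.y.z",
    "seq_len": 1000,
    "pytorch_version": "x.y.z",
    "n_tasks": 10,
}
print(check_config(example))
```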

We recommend that you run kipoi_export.py with your filled-out config.yaml and then manually make adjustments to the generated model.yaml file in the output directory. There will be comments in the file to highlight where you might need to change something.

Specifically, the weights parameter in args should be updated after you upload your model file to Zenodo or Figshare. See model-template.yaml or the generated model.yaml for details. You can also see an example of the final model.yaml file here.

You should also update the cite_as parameter with the DOI URL of your publication.

Formatting your model architecture file(s)

Move your model architecture file or module into ~/.kipoi/models/ModelName. The next sections will explain what format is expected if your model architecture should be organized as a module.

If you did NOT use Selene's NonStrandSpecific module (i.e. the non_strand_specific parameter):

If your model architecture is implemented in a single file called model_name_arch.py, you can specify module_class to be model_name_arch.ModelName.

Otherwise, you can move all your model architecture files into a directory called model_arch and import the model ModelName in __init__.py. model_arch is then a Python module that Kipoi can use to import your model architecture class. You can view our example directory model_arch to see how this is structured.
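
A dotted value such as model_name_arch.ModelName reads as a module path followed by a class name. The sketch below shows how such a string can be resolved with the Python standard library. It only illustrates the naming convention and is not Kipoi's actual loading code; resolve_class is a hypothetical helper.

```python
import importlib


def resolve_class(dotted_path):
    """Import the module part of dotted_path and return the named class."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError("expected 'module.ClassName', got %r" % dotted_path)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Demonstrated with a standard-library class; for your own model the string
# would be e.g. "model_name_arch.ModelName" or "model_arch.ModelName"
print(resolve_class("json.JSONDecoder"))
```

This is also why model_arch/__init__.py needs to import ModelName: the class must be reachable as an attribute of the module named in module_class.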

If you used Selene's NonStrandSpecific module:

We are working to automate this, but currently, we have to do a few manual steps to get our architecture formatted for export to Kipoi. This is applicable to our example, so you can refer to the files in there to see how we did this.

  1. Create a directory called model_arch. This will be the Python module that Kipoi uses to import your model architecture class.
  2. Copy the file non_strand_specific_module.py from the Selene repository to your directory.
  3. For an architecture where the main class ModelName is in the file model_name_arch.py, you should add the line from .model_name_arch import ModelName to non_strand_specific_module.py.
  4. Next, remove the constructor input model from __init__(self, model, mode="mean") and set self.model to ModelName(**kwargs).
  5. Optional, but helpful: rename non_strand_specific_module.py to something more representative of your model architecture (e.g. wrapped_model_name.py). You can rename the class from NonStrandSpecific to WrappedModelName or something else as well. Please note that you'll need to update super(NonStrandSpecific, self).__init__() with the new class name too.
  6. Finally, import your class in the file model_arch/__init__.py (e.g. from .wrapped_model_name import WrappedModelName).
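
The constructor changes in steps 3 to 5 can be sketched as below. This is a hedged illustration rather than the contents of our example files: ModelName is a stand-in defined inline so the sketch runs on its own (in the real file it would be imported with from .model_name_arch import ModelName), its arguments are invented, the __init__(self, mode="mean", **kwargs) signature is one possible reading of step 4, and the check on mode assumes that only "mean" and "max" are valid.

```python
try:
    from torch.nn import Module
except ImportError:
    Module = object  # fallback so this sketch also runs without PyTorch


class ModelName(Module):
    """Stand-in for the architecture class defined in model_name_arch.py."""

    def __init__(self, sequence_length, n_targets):
        super(ModelName, self).__init__()
        self.sequence_length = sequence_length
        self.n_targets = n_targets


class WrappedModelName(Module):
    """Renamed NonStrandSpecific wrapper that builds its own inner model."""

    def __init__(self, mode="mean", **kwargs):
        # Step 5: super() must name the new class after the rename
        super(WrappedModelName, self).__init__()
        # Assumption: only "mean" and "max" are valid modes
        if mode not in ("mean", "max"):
            raise ValueError("mode must be 'mean' or 'max'")
        self.mode = mode
        # Step 4: the model is no longer passed in; it is constructed here
        self.model = ModelName(**kwargs)

    # forward() and the helper functions stay as in the copied file


wrapped = WrappedModelName(sequence_length=1000, n_targets=10)
```

Building the inner model inside the constructor means the wrapper can be initialized from keyword arguments alone, which is what module_kwargs in the config provides.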

To see how we applied these steps, refer to the model_arch directory in our example.

Testing

The following commands assume that you are in ~/.kipoi/models/ModelName.

Run kipoi test . in your model folder to test whether the general setup is correct.

If this is successful, run kipoi test-source dir --all to test whether all the software dependencies of the model are set up correctly and the automated tests pass.

Forking and submitting to Kipoi

Fork the https://github.com/kipoi/models repo on GitHub.

Add your fork as a git remote to ~/.kipoi/models:

git remote add fork https://github.com/<username>/models.git

Push to your fork:

git push fork master

Submit a pull request to https://github.com/kipoi/models!
