GitHub - rrwick/sloika: Sloika is Oxford Nanopore Technologies' software for training neural network models for base calling

Sloika is ONT research software for training RNN models for basecalling Oxford Nanopore reads. Sloika is built on top of Theano and is compatible with Python 3.4+.

Fork details

I (Ryan) made this fork of Sloika to add the following features:

- Reduce RAM requirements with large training sets:
  - Multiple HDF5 training files can be provided, and Sloika will load a random subset at a time (configured by --input_load).
  - Sloika will reload a fresh selection of training data after every N batches (configured by --reload_after_batches).
  - Previously, the total amount of training data you could use was limited by your RAM (because Sloika loaded all training data into memory). Now it only loads a subset at a time, so there's no limit on the amount of training data you can use.
- Don't start decaying the learning rate until the accuracy has exceeded 70%. This is to handle the fact that the training seems to fumble around for a while before having any success. I wanted to keep the learning rate high during this period.
- Added some custom models that build upon the successful rgrgr model.
- Changed the chunkify command to write strands to file as it goes:
  - Helps when chunkify processes hang or run out of memory: you still have the results up to that point.
  - Also allows multiple chunkify processes to write to the same strands file, making parallelisation easier.
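
The subset reloading and the delayed learning-rate decay described above can be illustrated with a rough sketch. This is not Sloika's actual code: `reload_schedule` and `DelayedDecay` are hypothetical names, treating --input_load as a number of files is an assumption, and the inverse-time decay formula is an assumption chosen only for illustration.

```python
import random


def reload_schedule(all_files, n_load, reload_after_batches, total_batches, seed=0):
    """Yield (batch_index, files_in_memory) pairs.

    A fresh random subset of `n_load` files is drawn every
    `reload_after_batches` batches, so memory use depends on the subset
    size and not on the total number of training files.
    (Hypothetical helper; `n_load` as a file count is an assumption.)
    """
    rng = random.Random(seed)
    loaded = []
    for i in range(total_batches):
        if i % reload_after_batches == 0:
            loaded = rng.sample(all_files, min(n_load, len(all_files)))
        yield i, loaded


class DelayedDecay:
    """Hold the learning rate constant until accuracy first exceeds a threshold.

    Once the threshold has been passed, the rate decays on every later step.
    The inverse-time decay used here is an assumption, not Sloika's formula.
    """

    def __init__(self, base_rate, threshold=0.70, decay_batches=5000.0):
        self.base_rate = base_rate
        self.threshold = threshold
        self.decay_batches = decay_batches
        self.decay_steps = 0
        self.started = False

    def step(self, accuracy):
        if accuracy > self.threshold:
            self.started = True
        if self.started:
            self.decay_steps += 1
        return self.base_rate / (1.0 + self.decay_steps / self.decay_batches)
```

In a training loop, `step` would be called once per batch with the current accuracy. The rate stays at `base_rate` while training "fumbles around", and starts shrinking only after the first batch that exceeds the threshold.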

Installation of system prerequisites

sudo make deps

This will install required system packages on Debian-based Linux distros.

Setting up clean development environment

make cleanDevEnv
source build/env/bin/activate

This will create and activate a Python virtual environment in build/env.

Running unit tests in development mode

make

For this step to work, the development environment must be set up and the system prerequisites must have been installed with make deps.

Note on `THEANO_FLAGS`

Theano reads its configuration from the THEANO_FLAGS environment variable. A typical set of Theano flags might look like:

export THEANO_FLAGS=openmp=True,floatX=float32,warn_float64=warn,optimizer=fast_run,device=gpu0,scan.allow_gc=False,lib.cnmem=0.4

The Theano flags used for the tests are defined in the environment file; you can edit these to test your configuration.

| Flag | Description |
| --- | --- |
| openmp=True | Use OpenMP for calculations. |
| floatX=float32 | Internal floats are single (32-bit) precision. This is required for most GPUs. |
| warn_float64=warn | Warn if double (64-bit) precision floats are accidentally used, but continue. warn_float64=raise might be given instead to stop the calculation if a double precision float is encountered. |
| optimizer=fast_run | Spend more time optimising the expression graph to make the code run faster. For testing, optimizer=fast_compile might be used instead. |
| device=gpu0 | Which device to run the calculation on. Common options are cpu and gpuX, where X is the id of the GPU to be used (commonly gpu0). |
| scan.allow_gc=False | Don't allow garbage collection (freeing of memory) during 'scan' operations. This makes recurrent layers quicker at the expense of higher memory usage. |
| lib.cnmem=0.4 | Use the CUDA CNMEM library for memory allocation. This will improve GPU performance but requires all the memory to be allocated at the beginning of the calculation. The argument is the proportion of the GPU memory to initially allocate. As a guide, 0.4 is a good number for training since it allows two runs to both use the same GPU. For programs run on a per-read basis (basecalling and mapping), a smaller proportion like 0.05 is more appropriate. |
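
If the flags are set from Python and not from the shell, one possible approach is to build the string and place it in the environment before Theano is first imported, since Theano parses THEANO_FLAGS at import time. The helper name `theano_flags` and its parameters are hypothetical; the flag values are the ones from the table.

```python
import os


def theano_flags(device="gpu0", cnmem=0.4):
    """Build a THEANO_FLAGS string (hypothetical helper, values from the table).

    cnmem is the proportion of GPU memory to allocate up front: about 0.4
    for training, or a smaller value such as 0.05 for per-read programs.
    """
    # A list of pairs keeps the flag order stable on every Python 3 version.
    flags = [
        ("openmp", "True"),
        ("floatX", "float32"),
        ("warn_float64", "warn"),
        ("optimizer", "fast_run"),
        ("device", device),
        ("scan.allow_gc", "False"),
        ("lib.cnmem", cnmem),
    ]
    return ",".join("{}={}".format(name, value) for name, value in flags)


# Must run before the first `import theano` in the process.
os.environ["THEANO_FLAGS"] = theano_flags()
```

For a per-read program such as basecalling, `theano_flags(cnmem=0.05)` would give the smaller allocation suggested in the table.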
