How to install DeepMicrobes

A step-by-step tutorial on installation.

Note: We are working on making DeepMicrobes available on Bioconda, so this tutorial may be updated frequently.

1. Clone this repository

git clone https://github.com/MicrobeLab/DeepMicrobes.git

2. Create a conda environment for DeepMicrobes

If you do not already have conda, please install it first.

Note: DeepMicrobes was built and tested on TensorFlow 1.9.0; compatibility with other versions is still being tested.

conda env create -f DeepMicrobes/install.yml

# activate it
source activate DeepMicrobes
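
As a quick sanity check, you can confirm that the environment provides the expected TensorFlow version after activation (it should report 1.9.0 if the version pinned in install.yml was installed):

python -c "import tensorflow as tf; print(tf.__version__)"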

3. Install other dependencies

(Optional) Install seq-shuf (only for shuffling sequences in a training set)
seq-shuf is available at https://github.com/thackl/seq-scripts
To simplify installation, we have included the script in bin:

export PATH=/path/to/DeepMicrobes/bin:$PATH
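
As a usage sketch (assuming seq-shuf behaves as a stdin/stdout filter, as described in the seq-scripts repository; train.fa and train_shuffled.fa are placeholder filenames), shuffling a training set would look like:

seq-shuf < train.fa > train_shuffled.fa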

(Optional) Install GNU Parallel (for accelerating TFRecord conversion)
GNU Parallel is also included in bin. Alternatively, you can install it with:

(wget -O - pi.dk/3 || curl pi.dk/3/) | bash
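
Either way, you can verify that GNU Parallel is on your PATH with:

parallel --version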

4. Download k-mer vocabulary files

A vocabulary file of k-mers is required for TFRecord conversion. To download the 12-mer vocabulary:

wget https://github.com/MicrobeLab/DeepMicrobes-data/raw/master/vocabulary/tokens_merged_12mers.txt.gz
gunzip tokens_merged_12mers.txt.gz

The vocabularies for other k-mer sizes are not used by DeepMicrobes itself (except for the k-mer variant models), but they can be useful when training a custom model.

wget https://github.com/MicrobeLab/DeepMicrobes-data/raw/master/vocabulary/tokens_merged_11mers.txt.gz
wget https://github.com/MicrobeLab/DeepMicrobes-data/raw/master/vocabulary/tokens_merged_10mers.txt.gz
wget https://github.com/MicrobeLab/DeepMicrobes-data/raw/master/vocabulary/tokens_merged_9mers.txt.gz
wget https://github.com/MicrobeLab/DeepMicrobes-data/raw/master/vocabulary/tokens_merged_8mers.txt.gz

gunzip tokens_merged_11mers.txt.gz
gunzip tokens_merged_10mers.txt.gz
gunzip tokens_merged_9mers.txt.gz
gunzip tokens_merged_8mers.txt.gz

The vocabulary file can be stored in any directory (hereafter referred to as /path/to/vocab/).
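
For example (a minimal sketch; /path/to/vocab/ stands for whatever directory you choose, and the file comes from the download step above), you can move the 12-mer vocabulary there and take a quick look at it:

mkdir -p /path/to/vocab
mv tokens_merged_12mers.txt /path/to/vocab/
# expected to be a plain-text list of k-mer tokens (an assumption based on how the file is described above)
head -n 3 /path/to/vocab/tokens_merged_12mers.txt
wc -l /path/to/vocab/tokens_merged_12mers.txt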


5. Add scripts to path

The pipelines directory contains wrapper shell scripts for TFRecord conversion, model training, and classification.

export PATH=/path/to/DeepMicrobes/pipelines:$PATH
export PATH=/path/to/DeepMicrobes/scripts:$PATH
export PATH=/path/to/DeepMicrobes:$PATH
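
These exports last only for the current shell session. To make them permanent, you can append them to your shell startup file (the lines below assume bash; replace /path/to/DeepMicrobes with the actual location of the repository):

echo 'export PATH=/path/to/DeepMicrobes/pipelines:$PATH' >> ~/.bashrc
echo 'export PATH=/path/to/DeepMicrobes/scripts:$PATH' >> ~/.bashrc
echo 'export PATH=/path/to/DeepMicrobes:$PATH' >> ~/.bashrc
source ~/.bashrc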