DeepSelectNet

DeepSelectNet is an improved 1D ResNet-based model that classifies Oxford Nanopore raw electrical signals as target or non-target for Read-Until sequence enrichment or depletion. It is designed to improve classification accuracy over existing selective sequencing methods.

Abstract

Nanopore sequencing allows selective sequencing, the ability to programmatically reject unwanted reads in a sample. Selective sequencing has many present and future applications in genomics research; classifying species from a pool of species is one example. Existing methods for selective sequencing for species classification are still immature, and their accuracy varies widely across datasets. For the five datasets we tested, the accuracy of existing methods ranged from ∼77% to 97% (average accuracy <89%). Here we present DeepSelectNet, an accurate deep-learning-based method that can directly classify nanopore current signals belonging to a particular species. DeepSelectNet utilizes novel data preprocessing techniques and an improved neural network architecture for regularization.

Pre-print: https://www.biorxiv.org/content/10.1101/2022.10.24.513498v1

Installation

Prerequisites

Python version >= 3.5 and <= 3.8
Python venv
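
The supported version range can be checked before creating the virtual environment. The following is a minimal sketch; the function name is hypothetical and not part of the repository.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True if the interpreter version is within 3.5 to 3.8 inclusive."""
    major_minor = tuple(version_info[:2])
    return (3, 5) <= major_minor <= (3, 8)


if __name__ == "__main__":
    if is_supported_python():
        print("Python version is within the supported range.")
    else:
        print("Unsupported Python version: {}.{}".format(*sys.version_info[:2]))
```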

Steps (on Linux)

Open a terminal in the root directory of the code repository.
Create a python3 virtual environment named deepselectenv.

python3 -m venv deepselectenv

Use the following command to activate the virtual environment created.

source deepselectenv/bin/activate

Install the required packages in the virtual environment.

pip install -r requirements.txt

[Optional] Leave the environment when not in use.

deactivate

Scripts

1. Preprocessor

Preprocess the slow5 files into numpy dumps so that they can be used for training.

Args
- pos_s5 - Path to positive slow5 file
- neg_s5 - Path to negative slow5 file
- c - Read signal cutoff value (default=1500)
- sz - Read signal sample size (default=3000)
- sco - Subsampling coefficient/Number of random samplings from a single read (default=4)
- b - Number of slow5 read samples that should be in a numpy dump (default=20000)
- pico - Whether pico conversion is enabled (default=True)
- mad - Median absolute deviation value for data normalization (default=3)
- rep - Whether normalization is applied repeatedly (default=False)
- o - Numpy dump output path

python scripts/preprocessor.py -pos_s5 <pos_slow5> -neg_s5 <neg_slow5> -b 20000 -c 1500 -sco 4 -mad 3 -o <output_dir>
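
To illustrate how the c, sz, sco and mad arguments might interact, here is a minimal sketch. It is not the code in scripts/preprocessor.py. It assumes that the first c samples of a read are discarded, that sco windows of sz samples are drawn at random from the remainder, and that mad acts as a clipping threshold in units of the median absolute deviation. The function names are hypothetical.

```python
import numpy as np


def mad_normalize(signal, mad_threshold=3.0):
    """Center on the median, scale by the MAD, and clip outliers (assumed behaviour)."""
    signal = np.asarray(signal, dtype=np.float64)
    median = np.median(signal)
    mad = np.median(np.abs(signal - median))
    if mad == 0:
        # A constant signal has no spread; return zeros instead of dividing by zero.
        return np.zeros_like(signal)
    normalized = (signal - median) / mad
    # Assumption: values further than mad_threshold MADs from the median are clipped.
    return np.clip(normalized, -mad_threshold, mad_threshold)


def sample_windows(signal, cutoff=1500, sample_size=3000, n_samples=4, rng=None):
    """Skip the first `cutoff` samples, then draw `n_samples` random windows."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal)
    if len(signal) <= cutoff + sample_size:
        # Read is too short to be used; return an empty batch.
        return np.empty((0, sample_size))
    usable = signal[cutoff:]
    max_start = len(usable) - sample_size
    # Window start positions are drawn uniformly, with replacement.
    starts = rng.integers(0, max_start + 1, size=n_samples)
    return np.stack([usable[s:s + sample_size] for s in starts])
```

With the default values, a read of 6000 samples would yield an array of shape (4, 3000), and each row could then be passed through mad_normalize.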

Note:

The number of reads in each source (pos_s5, neg_s5) should be large enough to generate a balanced dataset whose number of reads equals the batch size (b).
Source reads should be longer than the cutoff value (c) plus the sample size (sz).
The support scripts may be useful for these manipulations.
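
The two conditions above can be checked ahead of time if the read lengths are known. The sketch below assumes the lengths have already been extracted into plain Python lists and that "balanced" means an equal number of usable reads per class. Both function names are hypothetical and not part of the repository.

```python
def count_usable_reads(read_lengths, cutoff=1500, sample_size=3000):
    """Count reads that are strictly longer than cutoff + sample_size."""
    minimum = cutoff + sample_size
    return sum(1 for length in read_lengths if length > minimum)


def largest_balanced_size(pos_lengths, neg_lengths, cutoff=1500, sample_size=3000):
    """Largest total read count that keeps both classes equally represented."""
    pos = count_usable_reads(pos_lengths, cutoff, sample_size)
    neg = count_usable_reads(neg_lengths, cutoff, sample_size)
    # The smaller class limits how many reads each class can contribute.
    return 2 * min(pos, neg)
```

If the returned value is smaller than the intended batch size (b), the sources likely need more reads or a smaller b.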

2. Trainer

Train the model on a given dataset using the dumped numpy arrays.

Args
- d - Path to numpy dump directory
- c - Classifier architecture -> FCN|ResNet|InceptionNet|TransformerNet (default=ResNet)
- lf - Loss function of the model -> bc|cc|scc (default=bc)
- s - The split ratio between train and validation (default=0.75)
- occ - Whether to perform one-class classification (default=False)
- oh - Whether labels are one-hot encoded (default=False)
- k - Number of cross-validation folds (default=10)
- e - Number of epochs (default=10)
- b - Batch size (default=1000)
- o - Trained model output path

python scripts/trainer.py -d <npy_dump_dir> -s 0.7 -k 5 -e 200 -o <output_dir>
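
The s and k arguments describe two ways of partitioning the data. The sketch below shows one common way to build such partitions with NumPy. It is an illustration only, does not reflect how scripts/trainer.py is implemented, and uses hypothetical function names.

```python
import numpy as np


def train_val_split(x, y, split_ratio=0.75, seed=0):
    """Shuffle the samples and split them into train and validation parts."""
    x, y = np.asarray(x), np.asarray(y)
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of samples")
    order = np.random.default_rng(seed).permutation(len(x))
    n_train = int(len(x) * split_ratio)
    train_idx, val_idx = order[:n_train], order[n_train:]
    return (x[train_idx], y[train_idx]), (x[val_idx], y[val_idx])


def kfold_indices(n_items, k=10, seed=0):
    """Yield (train_idx, val_idx) index pairs for k-fold cross validation."""
    order = np.random.default_rng(seed).permutation(n_items)
    folds = np.array_split(order, k)
    for i in range(k):
        # Fold i is held out for validation; the rest is used for training.
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]
```

For example, `train_val_split(x, y, split_ratio=0.7)` corresponds to `-s 0.7`, and `kfold_indices(len(x), k=5)` corresponds to `-k 5` in the command above.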

3. Inference

Predict the class of unseen slow5 reads with a trained model.

Args
- model - Path to trained model directory
- s5 - Path to the slow5 directory to run predictions on
- b - The batch size (default=1)
- c - The read signal cutoff value (default=1500)
- sz - Read signal sample size (default=3000)
- sco - Subsampling coefficient/Number of random samplings from a single read (default=1)
- pico - Whether pico conversion is enabled (default=True)
- lb - Class label of the preprocessing dataset (default=1) | 1 -> positive class | 0 -> negative class
- mad - Median absolute deviation value for data normalization (default=3)
- rep - Whether normalization is applied repeatedly (default=False)
- o - Predictions output path

python scripts/inference.py -model <saved_model_dir> -s5 <slow5_dir> -lb 1 -mad 3 -o <output_dir>
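
Because lb gives the known class of the input dataset, predictions can be scored against it. The sketch below assumes the predictions are available as positive-class probabilities and that 0.5 is used as the decision threshold. The output format of scripts/inference.py is not described here, so both are assumptions, and the function name is hypothetical.

```python
import numpy as np


def score_against_label(probabilities, label=1, threshold=0.5):
    """Fraction of reads whose thresholded prediction matches the known label."""
    probabilities = np.asarray(probabilities, dtype=np.float64)
    if probabilities.size == 0:
        raise ValueError("no predictions to score")
    # Assumption: each value is the predicted probability of the positive class.
    predicted = (probabilities >= threshold).astype(int)
    return float(np.mean(predicted == label))
```

For a dataset of positive reads (lb=1), the returned value is the fraction of reads classified as positive; for a negative dataset (lb=0), it is the fraction classified as negative.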
