4. Detection of non‐adenosines
`check_tails()` is the main function. It classifies sequencing reads based on the presence or absence of non-adenosine residues within their poly(A) tails, subject to additional conditions such as the minimal read length and the qc_tag assigned by the Nanopolish polya function.
Below is an example of how to use the `check_tails()` function:
```r
results <- ninetails::check_tails(
  # Nanopolish polya output table
  nanopolish = system.file('extdata',
                           'test_data',
                           'nanopolish_output.tsv',
                           package = 'ninetails'),
  # sequencing summary file
  sequencing_summary = system.file('extdata',
                                   'test_data',
                                   'sequencing_summary.txt',
                                   package = 'ninetails'),
  # directory with basecalled fast5 files
  workspace = system.file('extdata',
                          'test_data',
                          'basecalled_fast5',
                          package = 'ninetails'),
  num_cores = 2,
  basecall_group = 'Basecall_1D_000',
  pass_only = TRUE,
  save_dir = '~/Downloads')
```
This function returns a list consisting of two tables: read_classes and nonadenosine_residues. In addition, it saves the results as text files and creates a log file in the user-specified directory.
The Ninetails pipeline can also be run without the wrapper, one function at a time. This is useful when the input files are large or when you would like to plot some of the intermediate matrices.
The first function in the processing pipeline is `create_tail_feature_list()`. It extracts the read data from the provided outputs and merges them based on read identifiers (readnames). It is called as follows:
```r
tfl <- ninetails::create_tail_feature_list(
  nanopolish = system.file('extdata',
                           'test_data',
                           'nanopolish_output.tsv',
                           package = 'ninetails'),
  sequencing_summary = system.file('extdata',
                                   'test_data',
                                   'sequencing_summary.txt',
                                   package = 'ninetails'),
  workspace = system.file('extdata',
                          'test_data',
                          'basecalled_fast5',
                          package = 'ninetails'),
  num_cores = 2,
  basecall_group = 'Basecall_1D_000',
  pass_only = TRUE)
```
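The merge-by-readname idea can be illustrated with a small base-R sketch. This is not the internal code of `create_tail_feature_list()`; the two toy tables, their values, the shortened read identifiers and the column subsets shown here are assumptions made purely for illustration.

```r
# Toy stand-ins for the two inputs (invented values, reduced column sets)
nanopolish <- data.frame(
  readname = c("read-a", "read-b", "read-c"),
  polya_length = c(85.2, 31.7, 120.4),
  qc_tag = c("PASS", "ADAPTER", "PASS"),
  stringsAsFactors = FALSE
)
sequencing_summary <- data.frame(
  read_id = c("read-c", "read-a", "read-d"),
  filename = c("batch_1.fast5", "batch_0.fast5", "batch_1.fast5"),
  stringsAsFactors = FALSE
)

# Inner join on the read identifier: only reads present in both tables are kept
merged <- merge(nanopolish, sequencing_summary,
                by.x = "readname", by.y = "read_id")
print(merged)
```

Reads that appear in only one of the two tables (here `read-b` and `read-d`) drop out of the merged result.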
The second function, `create_tail_chunk_list()`, segments the reads and produces a list of segments in which a change of state (move = 1) has been recorded together with a significant local signal anomaly (a so-called "pseudomove"), possibly indicating the presence of a non-adenosine residue.
```r
tcl <- ninetails::create_tail_chunk_list(tail_feature_list = tfl,
                                         num_cores = 2)
```
The list of fragments should then be passed to `create_gaf_list()`, which transforms the signals into Gramian angular fields (GAFs). The function outputs a list of arrays with dimensions (100, 100, 2). The first channel of each array holds the Gramian angular summation field (GASF), and the second channel holds the Gramian angular difference field (GADF).
```r
gl <- ninetails::create_gaf_list(tail_chunk_list = tcl,
                                 num_cores = 2)
```
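To make the two channels concrete, the following base-R sketch builds a GASF/GADF pair from a toy signal using the textbook construction: rescale to [-1, 1], take the arccosine, then compute the cosine of angle sums and the sine of angle differences. It is not the code used inside `create_gaf_list()`. The helper name `gaf_sketch`, the random toy signal and the 100-point length (chosen to match the array dimensions quoted above) are assumptions for illustration.

```r
# Hypothetical helper: two-channel Gramian angular field of a numeric vector
gaf_sketch <- function(signal) {
  stopifnot(is.numeric(signal), length(signal) > 1, diff(range(signal)) > 0)
  # Rescale to [-1, 1] so that every value is a valid cosine
  scaled <- 2 * (signal - min(signal)) / (max(signal) - min(signal)) - 1
  # Guard against floating-point drift just outside [-1, 1]
  scaled <- pmin(pmax(scaled, -1), 1)
  phi <- acos(scaled)
  gasf <- cos(outer(phi, phi, "+"))  # summation field: cos(phi_i + phi_j)
  gadf <- sin(outer(phi, phi, "-"))  # difference field: sin(phi_i - phi_j)
  # Stack both fields as channels 1 and 2 of a single array
  array(c(gasf, gadf), dim = c(length(signal), length(signal), 2))
}

set.seed(1)
toy_signal <- cumsum(rnorm(100))  # invented stand-in for a signal chunk
gaf <- gaf_sketch(toy_signal)
print(dim(gaf))
```

In this construction the summation field is symmetric and the difference field is antisymmetric with a zero diagonal, which is a quick sanity check when plotting such matrices.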
The penultimate function, `predict_gaf_classes()`, launches the neural network to classify the input data. This function uses the TensorFlow backend.
```r
pl <- ninetails::predict_gaf_classes(gl)
```
The last function, `create_outputs()`, produces the final output: a list composed of two data frames, read_classes (in which reads are labelled as "decorated", "blank" or "unclassified" according to the applied criteria) and nonadenosine_residues (detailed positional information about the detected non-adenosine residues). Note that in this form the function does not automatically save data to files.
```r
out <- ninetails::create_outputs(
  tail_feature_list = tfl,
  tail_chunk_list = tcl,
  nanopolish = system.file('extdata',
                           'test_data',
                           'nanopolish_output.tsv',
                           package = 'ninetails'),
  predicted_list = pl,
  num_cores = 2,
  pass_only = TRUE)
```
The read_classes data frame contains the following columns:

| column name | content |
|---|---|
| readname | an identifier of a given read (36 characters) |
| contig | reference to which the given read was mapped (inherited from nanopolish) |
| polya_length | tail length estimate provided by the nanopolish polya function |
| qc_tag | quality tag assigned by the nanopolish polya function |
| class | the crude result of classification |
| comments | a code indicating whether the classification criteria were met or unmet |

The class column indicates whether the given read was recognized as decorated (containing a non-adenosine residue) or not, whereas the comments column gives the details underlying the classification outcome. The contents of these columns are explained below:

| class | comments | explanation |
|---|---|---|
| decorated | YAY | move transition present, nonA residue detected |
| blank | MAU | move transition absent, nonA residue undetected |
| blank | MPU | move transition present, nonA residue undetected |
| unclassified | QCF | nanopolish qc failed |
| unclassified | NIN | not included in the analysis (pass_only = TRUE) |
| unclassified | IRL | insufficient read length |
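
As a quick illustration of how these codes might be used downstream, the sketch below tabulates the class and comments columns of a toy read_classes data frame. The data frame is invented (real read names are 36-character identifiers and the remaining columns are omitted); only the column names and codes are taken from the tables above.

```r
# Toy read_classes table with invented rows
read_classes <- data.frame(
  readname = paste0("read-", 1:6),
  class = c("decorated", "blank", "blank",
            "unclassified", "unclassified", "blank"),
  comments = c("YAY", "MAU", "MPU", "QCF", "IRL", "MAU"),
  stringsAsFactors = FALSE
)

# Number of reads per class
class_counts <- table(read_classes$class)
print(class_counts)

# Cross-tabulation showing which comment codes occur within each class
comment_by_class <- table(read_classes$class, read_classes$comments)
print(comment_by_class)

# Keep only reads in which a non-adenosine residue was detected
decorated_reads <- read_classes[read_classes$class == "decorated", ]
```

With real output, the same calls would be applied to the read_classes element of the list returned by the pipeline.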

The nonadenosine_residues data frame contains the following columns:

| column name | content |
|---|---|
| readname | an identifier of a given read (36 characters) |
| prediction | the result of classification (basic model: C, G, U assignment) |
| est_nonA_pos | the approximate nucleotide position at which the non-adenosine is expected, reported from the 5' end |
| polya_length | tail length estimate provided by the nanopolish polya function |
| qc_tag | quality tag assigned by the nanopolish polya function |
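
A table of this shape lends itself to simple summaries. The sketch below uses an invented nonadenosine_residues data frame with the columns listed above; the values, the shortened read names, the presence of several rows for one read, and the derived `rel_pos` column (position as a fraction of tail length) are assumptions for illustration, not part of the Ninetails output.

```r
# Toy nonadenosine_residues table with invented rows
nonadenosine_residues <- data.frame(
  readname = c("read-1", "read-1", "read-4", "read-7"),
  prediction = c("C", "G", "U", "C"),
  est_nonA_pos = c(12.5, 40.1, 8.3, 66.0),
  polya_length = c(85.2, 85.2, 31.7, 120.4),
  qc_tag = "PASS",
  stringsAsFactors = FALSE
)

# How many residues of each type were predicted
prediction_counts <- table(nonadenosine_residues$prediction)
print(prediction_counts)

# How many rows (predicted residues) each read contributes
residues_per_read <- table(nonadenosine_residues$readname)

# Hypothetical derived column: estimated position relative to tail length
nonadenosine_residues$rel_pos <-
  nonadenosine_residues$est_nonA_pos / nonadenosine_residues$polya_length
```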
Ninetails has been developed in the Laboratory of RNA Biology (Dziembowski Lab) at the International Institute of Molecular and Cell Biology in Warsaw.