4. Detection of non‐adenosines

Classification of reads using wrapper function

check_tails() is the main function. It classifies sequencing reads based on the presence or absence of non-adenosine residues within their poly(A) tails, subject to additional conditions such as minimal read length and the qc_tag assigned by the Nanopolish polya function.

Below is an example of how to use the check_tails() function:

results <- ninetails::check_tails(
  nanopolish = system.file('extdata', 
                           'test_data', 
                           'nanopolish_output.tsv', 
                           package = 'ninetails'),
  sequencing_summary = system.file('extdata', 
                                   'test_data', 
                                   'sequencing_summary.txt', 
                                   package = 'ninetails'),
  workspace = system.file('extdata', 
                          'test_data', 
                          'basecalled_fast5', 
                          package = 'ninetails'),
  num_cores = 2,
  basecall_group = 'Basecall_1D_000',
  pass_only = TRUE,
  save_dir = '~/Downloads')

This function returns a list consisting of two tables: read_classes and nonadenosine_residues. In addition, it saves the results to text files and creates a log file in the user-specified directory (save_dir).

Classification of reads using standalone functions

The Ninetails pipeline may also be run without the wrapper, one function at a time. This can be useful when the input files are large and/or when you would like to plot some of the intermediate matrices.

The first function in the processing pipeline is create_tail_feature_list(). It extracts the read data from the provided outputs and merges them based on read identifiers (readnames). This function works as follows:

tfl <- ninetails::create_tail_feature_list(
  nanopolish = system.file('extdata',
                           'test_data', 
                           'nanopolish_output.tsv', 
                           package = 'ninetails'),
  sequencing_summary = system.file('extdata', 
                                   'test_data', 
                                   'sequencing_summary.txt',
                                   package = 'ninetails'),
  workspace = system.file('extdata', 
                          'test_data', 
                          'basecalled_fast5', 
                          package = 'ninetails'), 
  num_cores = 2,
  basecall_group = 'Basecall_1D_000', 
  pass_only = TRUE)
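
The merging step can be pictured with a small base R sketch. The two toy data frames below only mimic a few columns (readname in the Nanopolish polya output, read_id and filename in the sequencing summary), and the plain merge() call is an assumption made for illustration. It is not the internal code of create_tail_feature_list().

```r
# Toy stand-in for the Nanopolish polya output (real files have more columns;
# real read identifiers are 36-character strings)
nanopolish_toy <- data.frame(
  readname = c("read-a", "read-b", "read-c"),
  polya_length = c(85.2, 12.4, 130.9),
  qc_tag = c("PASS", "PASS", "ADAPTER"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the sequencing summary
seqsum_toy <- data.frame(
  read_id = c("read-b", "read-a", "read-d"),
  filename = c("batch_0.fast5", "batch_0.fast5", "batch_1.fast5"),
  stringsAsFactors = FALSE
)

# Inner join on the read identifier: only reads present in both tables survive
merged <- merge(nanopolish_toy, seqsum_toy,
                by.x = "readname", by.y = "read_id")
merged
```

In this toy case only read-a and read-b are kept, because read-c has no sequencing summary entry and read-d has no Nanopolish record.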

The second function, create_tail_chunk_list(), segments the reads and produces a list of segments in which a change of state (move = 1) has been recorded along with a significant local signal anomaly (a so-called "pseudomove"), possibly indicating the presence of a non-adenosine residue.

tcl <- ninetails::create_tail_chunk_list(tail_feature_list = tfl, 
                                         num_cores = 2)

The list of fragments should then be passed to create_gaf_list(), which transforms the signals into Gramian angular fields (GAFs). The function outputs a list of arrays with dimensions (100, 100, 2). The first channel of each array holds the Gramian angular summation field (GASF), and the second channel holds the Gramian angular difference field (GADF).

gl <- ninetails::create_gaf_list(tail_chunk_list = tcl, 
                                 num_cores = 2)
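
For readers unfamiliar with Gramian angular fields, the transformation can be sketched in a few lines of base R. The helper gaf_sketch() below is hypothetical and not part of ninetails. It assumes a non-constant input signal of 100 samples, min-max rescaling to [-1, 1], and the standard definitions GASF = cos(phi_i + phi_j) and GADF = sin(phi_i - phi_j). The package's own preprocessing may differ.

```r
# Toy signal of 100 samples (a stand-in for one tail chunk)
signal <- sin(seq(0, 2 * pi, length.out = 100))

# Hypothetical helper, NOT a ninetails function
gaf_sketch <- function(x) {
  # Min-max rescale to [-1, 1] so that acos() is defined
  # (assumes x is not constant, otherwise this divides by zero)
  rng <- range(x)
  x_scaled <- 2 * (x - rng[1]) / (rng[2] - rng[1]) - 1
  # Clamp to guard against floating-point overshoot
  x_scaled <- pmin(pmax(x_scaled, -1), 1)

  # Polar encoding: each sample becomes an angle
  phi <- acos(x_scaled)

  gasf <- cos(outer(phi, phi, "+"))  # summation field
  gadf <- sin(outer(phi, phi, "-"))  # difference field

  # Stack both fields as channels of a single array
  array(c(gasf, gadf), dim = c(length(x), length(x), 2))
}

gaf <- gaf_sketch(signal)
dim(gaf)
```

With a 100-sample input the result has the same (100, 100, 2) layout as described above, with the GASF in channel 1 and the GADF in channel 2.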

The penultimate function, predict_gaf_classes(), launches the neural network to classify the input data. This function uses the TensorFlow backend.

pl <- ninetails::predict_gaf_classes(gl)

The last function, create_outputs(), produces the final output: a list composed of two data frames, read_classes (reads labelled as "decorated", "blank" or "unclassified" according to the applied criteria) and nonadenosine_residues (detailed positional information on the detected non-adenosine residues). Note that in this form the function does not automatically save data to files.

out <- ninetails::create_outputs(
  tail_feature_list = tfl,
  tail_chunk_list = tcl,
  nanopolish = system.file('extdata', 
                           'test_data', 
                           'nanopolish_output.tsv', 
                           package = 'ninetails'),
  predicted_list = pl,
  num_cores = 2,
  pass_only = TRUE)

Output explanation

read_classes dataframe:

column name	content
readname	an identifier of a given read (36 characters)
contig	reference to which the given read was mapped (inherited from nanopolish)
polya_length	tail length estimation provided by nanopolish polya function
qc_tag	quality tag assigned by nanopolish polya function
class	the crude result of classification
comments	a code indicating whether the classification criteria were met/unmet

The class column indicates whether the given read was recognized as decorated (containing a non-adenosine residue) or not, whereas the comments column gives the reason behind the classification outcome. The content of these columns is explained below:

class	comments	explanation
decorated	YAY	move transition present, nonA residue detected
blank	MAU	move transition absent, nonA residue undetected
blank	MPU	move transition present, nonA residue undetected
unclassified	QCF	nanopolish qc failed
unclassified	NIN	not included in the analysis (pass_only = TRUE)
unclassified	IRL	insufficient read length
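
The class and comments codes above lend themselves to quick summaries. The sketch below uses a mock read_classes data frame built by hand (only three of the documented columns, with invented read names) rather than real ninetails output, so the counts are purely illustrative.

```r
# Mock read_classes table (hand-made; real readnames are 36-character IDs)
read_classes <- data.frame(
  readname = paste0("read-", 1:6),
  class = c("decorated", "blank", "blank",
            "unclassified", "unclassified", "blank"),
  comments = c("YAY", "MAU", "MPU", "QCF", "IRL", "MAU"),
  stringsAsFactors = FALSE
)

# Cross-tabulate classes against the comment codes
class_counts <- table(read_classes$class, read_classes$comments)
class_counts

# Reads in which a non-adenosine residue was detected
decorated_reads <- read_classes[read_classes$comments == "YAY", ]

# Share of decorated reads among the reads that could be classified
classified <- read_classes[read_classes$class != "unclassified", ]
decorated_fraction <- mean(classified$class == "decorated")
decorated_fraction
```

Excluding unclassified reads before computing the fraction is a design choice of this sketch, since QCF, NIN and IRL reads carry no information about tail composition.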

nonadenosine_residues dataframe:

column name	content
readname	an identifier of a given read (36 characters)
prediction	the result of classification (basic model: C, G, U assignment)
est_nonA_pos	the approximate nucleotide position at which the non-adenosine residue is expected, counted from the 5' end
polya_length	the tail length estimated according to Nanopolish polya function
qc_tag	quality tag assigned by nanopolish polya function
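
As an illustration of working with this table, the sketch below builds a mock nonadenosine_residues data frame by hand and derives two simple summaries. The values and read names are invented, and the dist_from_3prime column is a hypothetical addition that assumes est_nonA_pos and polya_length are expressed on the same scale, so that their difference approximates the distance from the 3' end.

```r
# Mock nonadenosine_residues table (hand-made values, not real output)
nonadenosine_residues <- data.frame(
  readname = c("read-1", "read-1", "read-7"),
  prediction = c("C", "G", "U"),
  est_nonA_pos = c(10.5, 42.0, 77.3),
  polya_length = c(60.0, 60.0, 95.1),
  qc_tag = "PASS",
  stringsAsFactors = FALSE
)

# Hypothetical derived column: approximate distance from the 3' end,
# assuming est_nonA_pos is counted from the 5' end on the same scale
nonadenosine_residues$dist_from_3prime <-
  nonadenosine_residues$polya_length - nonadenosine_residues$est_nonA_pos

# Number of detected residues per read (a read may carry more than one)
hits_per_read <- table(nonadenosine_residues$readname)

# Counts per predicted nucleotide, keeping zero counts visible
prediction_counts <- table(factor(nonadenosine_residues$prediction,
                                  levels = c("C", "G", "U")))
prediction_counts
```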

Ninetails has been developed in the Laboratory of RNA Biology (Dziembowski Lab) at the International Institute of Molecular and Cell Biology in Warsaw.
