Output documentation and examples

QuagmiR is a novel algorithm for isomiR analysis. Each miRNA sequence is divided into three regions: a 5’ part, a 3’ part and a central region. The central region is unique to each miRNA and is termed the “motif”. Reads matching a given motif are considered potential isomiRs of the corresponding miRNA (Step 1. Motif pull).
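The motif pull can be pictured with a minimal sketch. It assumes that "matching" means the read contains the motif as an exact substring, which this page does not spell out; the function name pull_by_motif is hypothetical and not part of QuagmiR.

```python
def pull_by_motif(reads, motif):
    """Return the reads that contain the motif as an exact substring.

    Assumption: an exact substring test stands in for the motif match.
    """
    return [read for read in reads if motif in read]


# Motif and sequences taken from the miR-9 examples on this page.
motif = "GGTTATCTAG"
reads = [
    "TCTTTGGTTATCTAGCTGTATGA",  # miR-9-5p consensus, contains the motif
    "TTTGGTTATCTAGCTGTATGAG",   # miR-9-5p isomiR, contains the motif
    "ATAAAGCTAGATAACCGAAAGT",   # miR-9-3p consensus, does not contain it
]

pulled = pull_by_motif(reads, motif)
for read in pulled:
    print(read)
```

Only the first two reads are kept as candidate isomiRs of miR-9-5p; the filtering step then decides which of them are retained.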

QuagmiR algorithm

The potential isomiR reads are further filtered according to the nucleotides that precede and follow the motif (5’ part and 3’ part, respectively) (Step 2. Filtering). The pairwise sequence similarity between a read and the reference miRNA is calculated using the Levenshtein distance, that is, the number of deletions, insertions, or substitutions required to transform one string into the other.

Example of Levenshtein distances
Reference sequence: TAGC
Seq.1 TAGC dist=0
Seq.2 TACC dist=1
Seq.3 TAGT dist=1
Seq.4 TAGC dist=0
Seq.5  AGC dist=1
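The distances in the example above can be reproduced with a textbook dynamic-programming implementation. This is a generic sketch with a unit cost for every edit, not QuagmiR's own code.

```python
def levenshtein(a, b):
    """Levenshtein distance between strings a and b, unit cost per edit."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ca == cb else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


reference = "TAGC"
for seq in ["TAGC", "TACC", "TAGT", "AGC"]:
    print(seq, levenshtein(reference, seq))
```

The loop prints distances 0, 1, 1 and 1, matching Seq.1, Seq.2, Seq.3 and Seq.5 in the example.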

The penalty or distance of any change may be fine-tuned, although the default value is 1. The filtering parameters for the 5' and 3' regions may be set independently to capture the asymmetrical distribution of the sequence heterogeneity. By this approach, users can customize the mapping process to focus on particular types of isomiRs. For example, 3’ isomiRs can be specifically targeted by setting the 5’ distance to 0 and leaving the 3’ distance open to any value (set to -1).
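The independent 5' and 3' thresholds can be sketched as follows. The sketch assumes that each flank is compared separately with the matching flank of the reference and that -1 disables the check for that end; the function passes_filter and its parameter names are hypothetical, and QuagmiR's actual implementation may differ.

```python
def levenshtein(a, b):
    """Levenshtein distance with unit cost per edit."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j - 1] + cost, prev[j] + 1, curr[j - 1] + 1))
        prev = curr
    return prev[-1]


def passes_filter(read, reference, motif, max_5p, max_3p):
    """Check the flanks of a read against per-end distance limits.

    Assumption: a limit of -1 means "open", so any distance is accepted.
    """
    r = read.find(motif)
    m = reference.find(motif)
    if r < 0 or m < 0:
        return False
    # Distance of the part before the motif and of the part after it.
    d5 = levenshtein(read[:r], reference[:m])
    d3 = levenshtein(read[r + len(motif):], reference[m + len(motif):])
    ok_5p = max_5p == -1 or d5 <= max_5p
    ok_3p = max_3p == -1 or d3 <= max_3p
    return ok_5p and ok_3p


reference = "TCTTTGGTTATCTAGCTGTATGA"  # miR-9-5p consensus from this page
motif = "GGTTATCTAG"

# 3' isomiRs only: exact 5' end, any 3' end.
print(passes_filter("TCTTTGGTTATCTAGCTGTATGAAA", reference, motif, 0, -1))
print(passes_filter("TTTGGTTATCTAGCTGTATGA", reference, motif, 0, -1))
```

With the 5' distance set to 0, the first read (tailed at the 3' end) is kept and the second (two nucleotides shorter at the 5' end) is rejected.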

Description of QuagmiR Outputs

Intermediate data

file.collapsed files are generated in the first step of the analysis and stored in the collapsed folder. Each file lists every distinct sequence found in the fastq file together with its read count.

7608 TCTTTGGTTATCTAGCTGTATG
14861 TCTTTGGTTATCTAGCTGTATGA
 261 TCTTTGGTTATCTAGCTGTATGAA
  74 TCTTTGGTTATCTAGCTGTATGAAA
   2 TCTTTGGTTATCTAGCTGTATGAAAA
   1 TCTTTGGTTATCTAGCTGTATGAAACCA
   1 TCTTTGGTTATCTAGCTGTATGAAACCG
   1 TCTTTGGTTATCTAGCTGTATGAAAGA
   1 TCTTTGGTTATCTAGCTGTATGAAAGGGTTTGA
   1 TCTTTGGTTATCTAGCTGTATGAAATA
   1 TCTTTGGTTATCTAGCTGTATGAAATCTTGATTTTT

In this example we can see some of the reads pulled with a motif against miR-9.
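A collapsed file with this layout can be read with a few lines of Python. The sketch assumes each line holds a count and a sequence separated by whitespace, as in the excerpt above; parse_collapsed is a hypothetical helper name.

```python
def parse_collapsed(lines):
    """Map each sequence to its read count.

    Assumption: every data line is '<count> <sequence>' with optional
    leading spaces; lines that do not fit this shape are skipped.
    """
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue
        count, sequence = fields
        counts[sequence] = int(count)
    return counts


# First three lines of the excerpt shown above.
excerpt = [
    "7608 TCTTTGGTTATCTAGCTGTATG",
    "14861 TCTTTGGTTATCTAGCTGTATGA",
    " 261 TCTTTGGTTATCTAGCTGTATGAA",
]

counts = parse_collapsed(excerpt)
most_common = max(counts, key=counts.get)
print(most_common, counts[most_common])
```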

Isomir summary

The report file.isomir.tsv provides the user with a comprehensive list of the isomiRs detected for each reference miRNA. The descriptors provided are the following:

MIRNA name of the miRNA
MOTIF nucleotide sequence used as motif to pull reads
CONSENSUS miRNA nucleotide reference sequence (based on miRBase)
TOTAL_READS total number of reads for that miRNA after the filtering step
TOTAL_ISOMIRS total number of isomiRs detected for that miRNA after the filtering step
FIDELITY_5P 5' isomiRs cleavage fidelity as defined in Gu et al. Cell (2012)
A_TAILING percentage of reads with A in the tail sequence
C_TAILING percentage of reads with C in the tail sequence
G_TAILING percentage of reads with G in the tail sequence
T_TAILING percentage of reads with T in the tail sequence
SEQUENCE_TRIMMING_ONLY percentage of reads with trimming, no tail added
SEQUENCE_TRIMMING percentage of reads with trimming
SEQUENCE_TAILING_ONLY percentage of reads with a tail added on top of the reference sequence, no trimming
SEQUENCE_TAILING percentage of reads with added tail
SEQUENCE_TRIMMING_AND_TAILING percentage of reads with both trim and tail
TOTAL_READS_IN_SAMPLE total number of reads in the sample (fastq file)

MIRNA	MOTIF	CONSENSUS	TOTAL_READS	TOTAL_ISOMIRS	FIDELITY_5P	A_TAILING	C_TAILING	G_TAILING	T_TAILING	SEQUENCE_TRIMMING_ONLY	SEQUENCE_TRIMMING	SEQUENCE_TAILING_ONLY	SEQUENCE_TAILING	SEQUENCE_TRIMMING_AND_TAILING	TOTAL_READS_IN_SAMPLE
mir-9-5p	GGTTATCTAG	TCTTTGGTTATCTAGCTGTATGA	16401	502	1.8017	8.7	5.28	59.6	26.42	8.13	18.05	57.75	67.67	9.92	744379
mir-9-3p	GCTAGATAAC	ATAAAGCTAGATAACCGAAAGT	1321	85	0.1234	16.26	28.53	35.58	19.63	24.21	41.19	1.63	18.61	16.98	744379
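The summary report is plain tab-separated text, so it can be loaded with Python's csv module. The sketch below embeds the mir-9-5p row from the example and checks a relation that holds in both example rows but is not stated explicitly on this page: the "only" percentage plus the "trimming and tailing" percentage adds up to the overall percentage.

```python
import csv
import io

header = [
    "MIRNA", "MOTIF", "CONSENSUS", "TOTAL_READS", "TOTAL_ISOMIRS",
    "FIDELITY_5P", "A_TAILING", "C_TAILING", "G_TAILING", "T_TAILING",
    "SEQUENCE_TRIMMING_ONLY", "SEQUENCE_TRIMMING", "SEQUENCE_TAILING_ONLY",
    "SEQUENCE_TAILING", "SEQUENCE_TRIMMING_AND_TAILING",
    "TOTAL_READS_IN_SAMPLE",
]
row = [
    "mir-9-5p", "GGTTATCTAG", "TCTTTGGTTATCTAGCTGTATGA", "16401", "502",
    "1.8017", "8.7", "5.28", "59.6", "26.42",
    "8.13", "18.05", "57.75", "67.67", "9.92", "744379",
]
# Rebuild the tab-separated text as it would appear in file.isomir.tsv.
text = "\t".join(header) + "\n" + "\t".join(row) + "\n"

records = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
rec = records[0]

both = float(rec["SEQUENCE_TRIMMING_AND_TAILING"])
trim_gap = float(rec["SEQUENCE_TRIMMING_ONLY"]) + both - float(rec["SEQUENCE_TRIMMING"])
tail_gap = float(rec["SEQUENCE_TAILING_ONLY"]) + both - float(rec["SEQUENCE_TAILING"])

# Observed relation in the example data (an assumption, not a documented rule).
print(rec["MIRNA"], abs(trim_gap) < 0.005, abs(tail_gap) < 0.005)
```

For a real report, replacing io.StringIO(text) with an open file handle is enough.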

Isomir descriptors

The report file.isomir.sequence_info.tsv provides the user with a tabulated description of all the isomiRs detected in a sample and their isomiR features.

MIRNA name of the sequence motif that pulled that read
SEQUENCE sequence of that pulled read
LEN_READ length (nucleotides) of that pulled read
READS number of counts of that pulled read in the sample
RATIO percentage of counts of that particular isomiR sequence relative to the total number of counts considered for that miRNA in the sample
LEN_TRIM length (nucleotides) trimmed of that pulled read compared to the reference miRNA
LEN_TAIL length (nucleotides) tailed of that pulled read compared to the reference miRNA
SEQ_TAIL nucleotide sequence of that pulled read that has been tailed compared to the reference miRNA
VAR_5P length (nucleotides) miscleaved by Drosha or Dicer of that pulled read compared to the reference miRNA
MATCH any other motif where the pulled read is also a match
CPM counts per million of that isomiR relative to the total number of reads in the sample
RPKM reads per kilobase per million of that isomiR relative to the total number of reads in the sample
DISTANCE Levenshtein distance of that particular sequence to the reference miRNA for which it has been selected

MIRNA	SEQUENCE	LEN_READ	READS	RATIO	LEN_TRIM	LEN_TAIL	SEQ_TAIL	VAR_5P	MATCH	CPM	RPKM	DISTANCE		16401
mir-9-5p	TTTGGTTATCTAGCTGTATGAG	22	4774	29.11%	0	1	G	2		269382.6882	291518.1648	3	0	2.664471679
mir-9-5p	TTTGGTTATCTAGCTGTATGAGT	23	2063	12.58%	0	2	GT	2		116408.9832	120497.2899	4	1	17.34650326
mir-9-5p	TTTGGTTATCTAGCTGTATGA	21	2052	12.51%	0	0	-	2		115788.2857	131269.5357	2	2	78.06841046
mir-9-5p	CTTTGGTTATCTAGCTGTATGA	22	1881	11.47%	0	0	-	1		106139.2619	114860.8437	1	3	1.335284434
mir-9-5p	TTTGGTTATCTAGCTGTATG	20	594	3.62%	1	0	-	2		33517.66166	39899.02993	3	4	0.268276325
mir-9-5p	TTTGGTTATCTAGCTGTATGAA	22	484	2.95%	0	1	A	2		27310.68728	29554.83698	3	5	0.310956649
mir-9-5p	TTTGGTTATCTAGCTGTATGAGC	23	282	1.72%	0	2	GC	2		15912.42523	16471.27279	4		
mir-9-5p	TTTGGTTATCTAGCTGTATGAGA	23	282	1.72%	0	2	GA	2		15912.42523	16471.27279	4		
mir-9-5p	TTTGGTTATCTAGCTGTATGG	21	231	1.41%	1	1	G	2		13034.6462	14777.41849	3		
mir-9-5p	TTTGGTTATCTAGCTGTATGAGG	23	217	1.32%	0	2	GG	2		12244.66764	12674.70282	4		
mir-9-5p	CTTTGGTTATCTAGCTGTATG	21	217	1.32%	1	0	-	1		12244.66764	13881.81737	2		
mir-9-5p	CTTTGGTTATCTAGCTGTATGG	22	176	1.07%	1	1	G	1		9931.159011	10747.21345	2		
mir-9-5p	TTTGGTTATCTAGCTGTA	18	148	0.90%	3	0	-	2		8351.201896	11045.74716	5		
mir-9-5p	TCTTTGGTTATCTAGCTGTATG	22	146	0.89%	1	0	-	0		8238.347816	8915.302066	1
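As a worked example of the RATIO descriptor, the sketch below recomputes it from READS and from the TOTAL_READS value of mir-9-5p in the summary report (16401). The helper name ratio_percent is hypothetical.

```python
def ratio_percent(reads, total_reads):
    """Share of one isomiR among all reads kept for its miRNA, in percent."""
    return 100.0 * reads / total_reads


total_reads = 16401  # TOTAL_READS for mir-9-5p in file.isomir.tsv

# (SEQUENCE, READS) pairs copied from the first rows of the table above.
isomirs = [
    ("TTTGGTTATCTAGCTGTATGAG", 4774),
    ("TTTGGTTATCTAGCTGTATGAGT", 2063),
    ("TTTGGTTATCTAGCTGTATGA", 2052),
]

for sequence, reads in isomirs:
    print(f"{sequence}\t{reads}\t{ratio_percent(reads, total_reads):.2f}%")
```

The formatted values, 29.11%, 12.58% and 12.51%, agree with the RATIO column of the example.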

Nucleotide composition at base-pair level

The report file.isomir.nucleotide_dist.tsv provides, for each position on the mature miRNA sequence, the number of reads covering it and its nucleotide composition. This allows the user to easily spot any position where an editing event takes place, or the preferred tailing nucleotides by position.

MIRNA	NT_POSITION	A	C	G	T	N	READS
mir-9-5p	-4	0.0	0.0	0.0	1.0	0.0	1
mir-9-5p	-3	0.0	0.0	1.0	0.0	0.0	1
mir-9-5p	-2	0.0	0.0	1.0	0.0	0.0	2
mir-9-5p	-1	0.75	0.0	0.0	0.25	0.0	4
mir-9-5p	0	0.0022	0.0022	0.0	0.9955	0.0	16395
mir-9-5p	1	0.0003	0.9976	0.0006	0.0015	0.0	16395
mir-9-5p	2	0.0006	0.0036	0.0004	0.9954	0.0	16395
mir-9-5p	3	0.0002	0.0021	0.0003	0.9974	0.0	16395
mir-9-5p	4	0.0004	0.0021	0.0002	0.9973	0.0	16395
mir-9-5p	5	0.0	0.0	1.0	0.0	0.0	16401
mir-9-5p	6	0.0	0.0	1.0	0.0	0.0	16401
mir-9-5p	7	0.0	0.0	0.0	1.0	0.0	16401
mir-9-5p	8	0.0	0.0	0.0	1.0	0.0	16401
mir-9-5p	9	1.0	0.0	0.0	0.0	0.0	16401
mir-9-5p	10	0.0	0.0	0.0	1.0	0.0	16401
mir-9-5p	11	0.0	1.0	0.0	0.0	0.0	16401
mir-9-5p	12	0.0	0.0	0.0	1.0	0.0	16401
mir-9-5p	13	1.0	0.0	0.0	0.0	0.0	16401
mir-9-5p	14	0.0	0.0	1.0	0.0	0.0	16401
mir-9-5p	15	0.0002	0.9988	0.0	0.0009	0.0	16401
mir-9-5p	16	0.0005	0.0017	0.0002	0.9975	0.0	16393
mir-9-5p	17	0.0004	0.0002	0.9988	0.0006	0.0	16390
mir-9-5p	18	0.0006	0.003	0.0007	0.9957	0.0	16389
mir-9-5p	19	0.9935	0.0008	0.0043	0.0015	0.0	16342
mir-9-5p	20	0.0004	0.0055	0.0063	0.9878	0.0	16040
mir-9-5p	21	0.0079	0.0008	0.9878	0.0036	0.0	15975
mir-9-5p	22	0.9316	0.0064	0.0407	0.0212	0.0	14790
mir-9-5p	23	0.0585	0.0097	0.8991	0.0327	0.0	9834
mir-9-5p	24	0.0848	0.1026	0.0735	0.7392	0.0	3880
mir-9-5p	25	0.0519	0.1398	0.0652	0.743	0.0	751
mir-9-5p	26	0.1684	0.1158	0.1368	0.5789	0.0	190
mir-9-5p	27	0.0577	0.1731	0.1731	0.5962	0.0	52
mir-9-5p	28	0.16	0.04	0.28	0.52	0.0	25
mir-9-5p	29	0.1667	0.0	0.1667	0.6667	0.0	12

Figure: example of how the nucleotide composition at base-pair level can be represented (miR-9-5p).
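One way to scan this report for mixed positions is sketched below. The thresholds (a dominant nucleotide below 0.9 and at least 100 covering reads) and the function name mixed_positions are illustrative assumptions, not QuagmiR defaults; the rows are copied from the mir-9-5p table above.

```python
# (NT_POSITION, A, C, G, T, N, READS) for a few mir-9-5p positions.
rows = [
    (5, 0.0, 0.0, 1.0, 0.0, 0.0, 16401),
    (19, 0.9935, 0.0008, 0.0043, 0.0015, 0.0, 16342),
    (22, 0.9316, 0.0064, 0.0407, 0.0212, 0.0, 14790),
    (23, 0.0585, 0.0097, 0.8991, 0.0327, 0.0, 9834),
    (24, 0.0848, 0.1026, 0.0735, 0.7392, 0.0, 3880),
]


def mixed_positions(rows, max_major=0.9, min_reads=100):
    """List positions whose most frequent nucleotide falls below max_major.

    Positions covered by fewer than min_reads reads are ignored.
    """
    flagged = []
    for pos, a, c, g, t, n, reads in rows:
        fractions = {"A": a, "C": c, "G": g, "T": t, "N": n}
        major = max(fractions, key=fractions.get)
        if reads >= min_reads and fractions[major] < max_major:
            flagged.append((pos, major, fractions[major]))
    return flagged


flagged = mixed_positions(rows)
for pos, major, fraction in flagged:
    print(pos, major, fraction)
```

With these thresholds, positions 23 and 24 are reported, which lie past the end of the 23-nucleotide consensus and therefore reflect tailing rather than editing.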

Group results

group_results allows the user to aggregate results from multiple samples into a single report file. By default, the name of this aggregated file is set to cohort1.

An example of this aggregate format looks as follows:

SAMPLE	MIRNA	MOTIF	CONSENSUS	TOTAL_READS	TOTAL_ISOMIRS	FIDELITY_5P	A_TAILING	C_TAILING	G_TAILING	T_TAILING	SEQUENCE_TRIMMING_ONLY	SEQUENCE_TRIMMING	SEQUENCE_TAILING_ONLY	SEQUENCE_TAILING	SEQUENCE_TRIMMING_AND_TAILING	TOTAL_READS_IN_SAMPLE
sample1.fastq	mir-9-5p	GGTTATCTAG	TCTTTGGTTATCTAGCTGTATGA	16401	502	1.8017	8.7	5.28	59.6	26.42	8.13	18.05	57.75	67.67	9.92	744379
sample1.fastq	mir-9-3p	GCTAGATAAC	ATAAAGCTAGATAACCGAAAGT	1321	85	0.1234	16.26	28.53	35.58	19.63	24.21	41.19	1.63	18.61	16.98	744379
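The aggregate layout amounts to stacking the per-sample reports and prepending a SAMPLE column. The sketch below illustrates that layout only; the aggregate function is hypothetical, the column set is shortened for readability, and the second sample name reuses the same rows purely as a placeholder.

```python
import csv
import io


def aggregate(per_sample_reports):
    """Stack per-sample TSV reports into one TSV with a leading SAMPLE column.

    per_sample_reports maps a sample name to the text of its report.
    """
    out = io.StringIO()
    writer = None
    for sample, text in per_sample_reports.items():
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for row in reader:
            if writer is None:
                # Header is taken from the first report encountered.
                fields = ["SAMPLE"] + list(reader.fieldnames)
                writer = csv.DictWriter(
                    out, fieldnames=fields, delimiter="\t", lineterminator="\n"
                )
                writer.writeheader()
            writer.writerow({"SAMPLE": sample, **row})
    return out.getvalue()


# Shortened report using values from the example above.
report = "MIRNA\tTOTAL_READS\tTOTAL_ISOMIRS\nmir-9-5p\t16401\t502\nmir-9-3p\t1321\t85\n"

# "sample2.fastq" is a placeholder name that reuses the same rows.
combined = aggregate({"sample1.fastq": report, "sample2.fastq": report})
print(combined)
```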
