Xavier edited this page Aug 7, 2018 · 28 revisions

To align QuagmiR outputs with the guidelines of the miRTop community, the reports now include an output in the mirGFF3 format.

Generating the GFF report requires miRBase21-master.tsv as a reference file.

Columns

  • Column 1: seqID: precursor name

  • Column 2: source: databases used (miRBase21, miRBase22)

  • Column 3: type: ref_miRNA or isomiR, following the Sequence Ontology guidelines (SO:0002166 for ref_miRNA and SO:0002167 for isomiR)

  • Column 4: start: start position based on the precursor sequence

  • Column 5: end: end position based on the precursor sequence

  • Column 6: score: distance score calculated by QuagmiR, based on the Levenshtein edit distance

  • Column 7: strand: Reads are mapped against the precursor sequence, so this is always +.

  • Column 8: phase: Not currently relevant; reported as a single dot (.).

  • Column 9: attributes:

    • UID: unique ID based on MINTplates sequences, with the structure isomiRNA-length-unique_sequence_code (e.g. isomiRNA-22-RKREZNPN1)
    • Name: mature miRNA name
    • Parent: primary miRNA name
    • Variant: Categorical types describing the gain and loss of nucleotides (adapted from isomiR-SEA)
      • iso_5p: gain/loss of nucleotides on the 5' end
      • iso_3p: gain/loss of nucleotides on the 3' end
    • Hits: Number of matching locations in the database
    • Genomic: chr:start-end genomic position as indicated by QuagmiR (reference genome: GRCh38/hg38)
    • Expression: number of counts
    • Filter: PASS/REJECT (current version only reports PASS sequences)
    • sequence: sequence of the read
    • number_of_paralogs: number of matching paralogs
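
The column layout above can be read with a few lines of Python. The following is a minimal sketch, not part of QuagmiR: the names MirGff3Record and parse_mirgff3_line are hypothetical, and it assumes that columns are tab-separated and that attributes are key=value pairs separated by semicolons, as in the example rows on this page.

```python
from typing import Dict, NamedTuple


class MirGff3Record(NamedTuple):
    """One data line of the report (hypothetical container, not a QuagmiR type)."""
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: float
    strand: str
    phase: str
    attributes: Dict[str, str]


def parse_mirgff3_line(line: str) -> MirGff3Record:
    """Split one tab-separated data line into its nine columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    attributes: Dict[str, str] = {}
    # Assumption: attributes look like "key=value; key=value; ...".
    for pair in cols[8].split(";"):
        pair = pair.strip()
        if not pair:
            continue
        key, _, value = pair.partition("=")
        attributes[key.strip()] = value.strip()
    return MirGff3Record(
        cols[0], cols[1], cols[2],
        int(cols[3]), int(cols[4]), float(cols[5]),
        cols[6], cols[7], attributes,
    )


# First data row of the example on this page, rebuilt with explicit tabs.
example_line = "\t".join([
    "hsa-mir-20a", "miRBase21", "SO:0002167", "7", "29", "1.0", "+", ".",
    "UID=isomiRNA-22-U0XZH3PKO; Name=hsa-miR-20a-5p; Parent=hsa-mir-20a; "
    "Variant=iso_3p:-1; Hits=1; Genomic=chr13:91350972-91350994; "
    "Expression=27; Filter=Pass; sequence=TAAAGTGCTTATAGTGCAGGTA; "
    "number_of_paralogs=1",
])

record = parse_mirgff3_line(example_line)
print(record.seqid, record.type, record.attributes["Variant"])
```

Header lines starting with ## are not data rows and would need to be skipped before calling the parser.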

Example

## VERSION: 1.0
## source-ontology: miRBase v21 doi:10.25504/fairsharing.hmgte8
## COLDATA: data/sample.fastq
hsa-mir-20a	miRBase21	SO:0002167	7	29	1.0	+	.	UID=isomiRNA-22-U0XZH3PKO; Name=hsa-miR-20a-5p; Parent=hsa-mir-20a; Variant=iso_3p:-1; Hits=1; Genomic=chr13:91350972-91350994; Expression=27; Filter=Pass; sequence=TAAAGTGCTTATAGTGCAGGTA; number_of_paralogs=1
hsa-mir-20a	miRBase21	SO:0002166	7	30	0.0	+	.	UID=isomiRNA-23-U0XZH3PK0H; Name=hsa-miR-20a-5p; Parent=hsa-mir-20a; Variant=NA; Hits=1; Genomic=chr13:91350972-91350995; Expression=23; Filter=Pass; sequence=TAAAGTGCTTATAGTGCAGGTAG; number_of_paralogs=1
hsa-mir-20a	miRBase21	SO:0002167	7	28	2.0	+	.	UID=isomiRNA-21-U0XZH3PKE; Name=hsa-miR-20a-5p; Parent=hsa-mir-20a; Variant=iso_3p:-2; Hits=1; Genomic=chr13:91350972-91350993; Expression=3; Filter=Pass; sequence=TAAAGTGCTTATAGTGCAGGT; number_of_paralogs=1
hsa-mir-20a	miRBase21	SO:0002167	7	31	1.0	+	.	UID=isomiRNA-24-U0XZH3PK2X; Name=hsa-miR-20a-5p; Parent=hsa-mir-20a; Variant=iso_3p:+1; Hits=1; Genomic=chr13:91350972-91350996; Expression=3; Filter=Pass; sequence=TAAAGTGCTTATAGTGCAGGTAGA; number_of_paralogs=1
hsa-mir-20a	miRBase21	SO:0002167	7	30	1.0	+	.	UID=isomiRNA-23-U0XZH3PK0I; Name=hsa-miR-20a-5p; Parent=hsa-mir-20a; Variant=iso_snp; Hits=1; Genomic=chr13:91350972-91350995; Expression=3; Filter=Pass; sequence=TAAAGTGCTTATAGTGCAGGTAT; number_of_paralogs=1
hsa-mir-20a	miRBase21	SO:0002167	7	27	3.0	+	.	UID=isomiRNA-20-U0XZH3PK; Name=hsa-miR-20a-5p; Parent=hsa-mir-20a; Variant=iso_3p:-3; Hits=1; Genomic=chr13:91350972-91350992; Expression=1; Filter=Pass; sequence=TAAAGTGCTTATAGTGCAGG; number_of_paralogs=1
hsa-mir-20a	miRBase21	SO:0002167	7	30	1.0	+	.	UID=isomiRNA-23-U0XZH3PK0F; Name=hsa-miR-20a-5p; Parent=hsa-mir-20a; Variant=iso_snp; Hits=1; Genomic=chr13:91350972-91350995; Expression=1; Filter=Pass; sequence=TAAAGTGCTTATAGTGCAGGTAA; number_of_paralogs=1
hsa-mir-20a	miRBase21	SO:0002167	7	31	1.0	+	.	UID=isomiRNA-24-U0XZH3PK29; Name=hsa-miR-20a-5p; Parent=hsa-mir-20a; Variant=iso_3p:+1; Hits=1; Genomic=chr13:91350972-91350996; Expression=1; Filter=Pass; sequence=TAAAGTGCTTATAGTGCAGGTAGC; number_of_paralogs=1
hsa-mir-20a	miRBase21	SO:0002167	7	31	1.0	+	.	UID=isomiRNA-24-U0XZH3PK2Z; Name=hsa-miR-20a-5p; Parent=hsa-mir-20a; Variant=iso_3p:+1; Hits=1; Genomic=chr13:91350972-91350996; Expression=1; Filter=Pass; sequence=TAAAGTGCTTATAGTGCAGGTAGT; number_of_paralogs=1
hsa-mir-20a	miRBase21	SO:0002167	7	29	2.0	+	.	UID=isomiRNA-22-UKXZH3PKO; Name=hsa-miR-20a-5p; Parent=hsa-mir-20a; Variant=iso_3p:-1,iso_snp_seed; Hits=1; Genomic=chr13:91350972-91350994; Expression=1; Filter=Pass; sequence=TAAGGTGCTTATAGTGCAGGTA; number_of_paralogs=1
hsa-mir-20a	miRBase21	SO:0002167	8	31	2.0	+	.	UID=isomiRNA-23-B3QXV4J3Z; Name=hsa-miR-20a-5p; Parent=hsa-mir-20a; Variant=iso_5p:-1,iso_3p:+1; Hits=1; Genomic=chr13:91350973-91350996; Expression=1; Filter=Pass; sequence=AAAGTGCTTATAGTGCAGGTAGT; number_of_paralogs=1
hsa-mir-19b-1	miRBase21	SO:0002166	53	76	0.0	+	.	UID=isomiRNA-23-9VBMJVBD0L; Name=hsa-miR-19b-3p-1-2; Parent=hsa-mir-19b-1; Variant=NA; Hits=2; Genomic=chr13:91351145-91351168; Expression=4; Filter=Pass; sequence=TGTGCAAATCCATGCAAAACTGA; number_of_paralogs=2
hsa-mir-19b-1	miRBase21	SO:0002167	53	75	1.0	+	.	UID=isomiRNA-22-9VBMJVBDP; Name=hsa-miR-19b-3p-1-2; Parent=hsa-mir-19b-1; Variant=iso_3p:-1; Hits=2; Genomic=chr13:91351145-91351167; Expression=2; Filter=Pass; sequence=TGTGCAAATCCATGCAAAACTG; number_of_paralogs=2
hsa-mir-92a-1	miRBase21	SO:0002166	47	69	0.0	+	.	UID=isomiRNA-22-VY2ZSR67N; Name=hsa-miR-92a-3p-1-2; Parent=hsa-mir-92a-1; Variant=NA; Hits=2; Genomic=chr13:91351261-91351283; Expression=7; Filter=Pass; sequence=TATTGCACTTGTCCCGGCCTGT; number_of_paralogs=2
hsa-mir-92a-1	miRBase21	SO:0002167	47	68	1.0	+	.	UID=isomiRNA-21-VY2ZSR670; Name=hsa-miR-92a-3p-1-2; Parent=hsa-mir-92a-1; Variant=iso_3p:-1; Hits=2; Genomic=chr13:91351261-91351282; Expression=2; Filter=Pass; sequence=TATTGCACTTGTCCCGGCCTG; number_of_paralogs=2
hsa-mir-92a-1	miRBase21	SO:0002167	47	70	1.0	+	.	UID=isomiRNA-23-VY2ZSR670B; Name=hsa-miR-92a-3p-1-2; Parent=hsa-mir-92a-1; Variant=iso_3p:+1; Hits=2; Genomic=chr13:91351261-91351284; Expression=1; Filter=Pass; sequence=TATTGCACTTGTCCCGGCCTGTA; number_of_paralogs=2
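
The score column in the rows above appears consistent with the plain Levenshtein edit distance between each read and the reference (SO:0002166) sequence of the same miRNA. The sketch below checks this for a few hsa-miR-20a-5p rows. It is an illustration only: the levenshtein function is a textbook implementation, and whether QuagmiR computes the score exactly this way is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


# Reference sequence: the SO:0002166 row for hsa-miR-20a-5p in the example.
reference = "TAAAGTGCTTATAGTGCAGGTAG"

# Read sequence -> score reported in column 6 of the example rows.
reported_scores = {
    "TAAAGTGCTTATAGTGCAGGTA": 1.0,   # Variant=iso_3p:-1
    "TAAAGTGCTTATAGTGCAGGT": 2.0,    # Variant=iso_3p:-2
    "TAAAGTGCTTATAGTGCAGGTAT": 1.0,  # Variant=iso_snp
    "TAAGGTGCTTATAGTGCAGGTA": 2.0,   # Variant=iso_3p:-1,iso_snp_seed
    "AAAGTGCTTATAGTGCAGGTAGT": 2.0,  # Variant=iso_5p:-1,iso_3p:+1
}

computed_scores = {
    read: float(levenshtein(reference, read)) for read in reported_scores
}
print(computed_scores == reported_scores)
```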

GFF Reference File

The GFF report requires miRBase21-master.tsv, which has the following structure:

MIRNA	PRI.ACCESSION	PRIMIRNA	PRI.SEQUENCE	ACCESSION	SEQUENCE	PARALOGS	MOTIF.13	N.MOTIF	NON.N	DUPLIMOTIF	UNIQUE.MOTIF	MOTIF.LEN	SEED	FAMILY	STRAND	CHROMOSOME	X.COORDINATE	Y.COORDINATE	DIRECTION	EXTENDED.SEQUENCE	DUPLI.ID	SECONDARY.STRUCTURE	ENERGY
hsa-let-7a-2-3p	MI0000061	hsa-let-7a-2	AGGTTGAGGTAGTAGGTTGTATAGTTTAGAATTACATCAAGGGAGATAACTGTACAGCCTCCTAGCTTTCCT	MIMAT0010195	CTGTACAGCCTCCTAGCTTTCC	0	ACAGCCTCCTAGC	NNANCCTCCTNNN	7	FALSE	AGCCTCCTA	9	TGTACAG	TRUE	3P	chr11	122146422	122146693	-	GCCCAAATAGGTGACAGCACGATGAATCATTATAAGACTAACTTGTAATTTCCCTGCTTAAGAAATGGTAGTTTTCCAGCCATTGTGACTGCATGCTCCCAGGTTGAGGTAGTAGGTTGTATAGTTTAGAATTACATCAAGGGAGATAACTGTACAGCCTCCTAGCTTTCCTTGGGTCTTGCACTAAACAACATGGTGAGAACGATCATGATTCCTCCAGGCCTTTTCTCCCTATGAAAGGTAAGATTGGGTACGATTATTTTATGGTATTT		(((..(((.(((.(((((((((((((.........(((......)))))))))))))))).))).))).)))	-25.2
hsa-let-7a-3p-1-2	MI0000060	hsa-let-7a-1	TGGGATGAGGTAGTAGGTTGTATAGTTTTAGGGTCACACCCACCACTGGGAGATAACTATACAATCTACTGTCTTTCCTA	MIMAT0004481	CTATACAATCTACTGTCTTTC	2	TACAATCTACTGT	NNNAATCNACNNN	6	FALSE	AATCTAC	7	TATACAA	TRUE	3P	chr9	94175857	94176136	+	TCACACAGGAAACCAGGATTACCGAGGAGGAAAAAAAGCCTTCCTGTGGTGCTCAACTGTGATTCCTTTTCACCATTCACCCTGGATGTTCTCTTCACTGTGGGATGAGGTAGTAGGTTGTATAGTTTTAGGGTCACACCCACCACTGGGAGATAACTATACAATCTACTGTCTTTCCTAACGTGATAGAAAAGTCTGCATCCAGGCGGTCTGATAGAAAGTCAGTTAACTAATTGTACAATATTTAAGATTAACTTGTCTTAAAGAGATGTAGTGCAGC		(((((.(((((((((((((((((((((.....(((...((((....)))).)))))))))))))))))))))))))))))	-34.2
hsa-let-7a-3p-1-2	MI0000062	hsa-let-7a-3	GGGTGAGGTAGTAGGTTGTATAGTTTGGGGCTCTGCCCTGCTATGGGATAACTATACAATCTACTGTCTTTCCT	MIMAT0004481	CTATACAATCTACTGTCTTTC	2	TACAATCTACTGT	NNNAATCNACNNN	6	FALSE	AATCTAC	7	TATACAA	TRUE	3P	chr22	46112649	46112922	+	TCGAGCCCCTGTTCTCCTCAGCCCTCTTTCCTCCCGCGTCCCCAGGAGGTGCCTCTGGAAGCCACGGAGTCCCATCGGCACCAAGACCGACTGCCCTTTGGGGTGAGGTAGTAGGTTGTATAGTTTGGGGCTCTGCCCTGCTATGGGATAACTATACAATCTACTGTCTTTCCTGAAGTGGCTGTAATATCTGCGGTGGACAGAGCGTCTGGAACCCTGGCTGGGAGCGGGCAGGGCCAGGTTTGGGGGCAGCCTTGGCAGCAGTCGGGGGCAG		(((((.(((((((((((((((((((((.....(((...((((....)))).)))))))))))))))))))))))))))))	-34.2
hsa-let-7b-3p	MI0000063	hsa-let-7b	CGGGGTGAGGTAGTAGGTTGTGTGGTTTCAGGGCAGTGATGTTGCCCCTCGGAAGATAACTATACAACCTACTGCCTTCCCTG	MIMAT0004482	CTATACAACCTACTGCCTTCCC	0	ACAACCTACTGCC	NNNACCTACTNNN	7	FALSE	ACCTACT	7	TATACAA	TRUE	3P	chr22	46113586	46113868	+	CCTGCCCAGCCCTCCTGCTCTGGTGACTGAGGACCGCCAGGCAGGGGCTGGTGCTGGGCGGGGGGCGGCGGGCCCTCCCGCAGTGCAAGGCCGGGCCTGGCGGGGTGAGGTAGTAGGTTGTGTGGTTTCAGGGCAGTGATGTTGCCCCTCGGAAGATAACTATACAACCTACTGCCTTCCCTGAGGAGCCCAGTGACACGACCCCATGGGAGGGCCGCCCCCTACCTCAGTGACACGACCCCACGGGAGGGCTGCCCCCCACCTCAGTGACCTGCAGGGGGCC		(((((.(((((((((((((((((((((((.((((((.....))))))...))).....)))))))))))))))))))))))))	-46.7
hsa-let-7c-3p	MI0000064	hsa-let-7c	GCATCCGGGTTGAGGTAGTAGGTTGTATGGTTTAGAGTTACACCCTGGGAGTTAACTGTACAACCTTCTAGCTTTCCTTGGAGC	MIMAT0026472	CTGTACAACCTTCTAGCTTTCC	0	ACAACCTTCTAGC	NNNANCTTNTANN	6	FALSE	ACCTTCTA	8	TGTACAA	FALSE	3P	chr21	16539728	16540011	+	TATCTATATCCTTGCCAAGCCCTTAGGTGTATGGCTGCCATATTTGGAGGAGCTGACTGAAGATATGATAAGGAGTTTGAAGCAACATTGGAAGCTGTGTGCATCCGGGTTGAGGTAGTAGGTTGTATGGTTTAGAGTTACACCCTGGGAGTTAACTGTACAACCTTCTAGCTTTCCTTGGAGCACACTTGAGCCGTCGAGGAATTCTTCATCACTTTAACCTGATTGAGCCAATTTGTGTGCAAGAAGGTAATGTGTCATGAGTATCTTGGATCATTGATTTG		((.((((((..(((.(((.(((((((((((((..((.(..((...))..).))))))))))))))).))).)))..))))))))	-31.6
hsa-let-7d-3p	MI0000065	hsa-let-7d	CCTAGGAAGAGGTAGTAGGTTGCATAGTTTTAGGGCAGGGATTTTGCCCACAAGGAGGTAACTATACGACCTGCTGCCTTTCTTAGG	MIMAT0004484	CTATACGACCTGCTGCCTTTCT	0	ACGACCTGCTGCC	NNGANCTGCTNNN	7	FALSE	GACCTGCTG	9	TATACGA	FALSE	3P	chr9	94178734	94179020	+	TTGAATTAGAAACAAAACTCAAAGAACATGACCTAATTTAACAGGTTAATTTGAAGTGCATCTGCCAAGTAGAAGACCAGCAAGAAAAAAAAAATGGGTTCCTAGGAAGAGGTAGTAGGTTGCATAGTTTTAGGGCAGGGATTTTGCCCACAAGGAGGTAACTATACGACCTGCTGCCTTTCTTAGGGCCTTATTATTCACCGATAACCTGTTTCCTTGCTACTTTGCTTTGGTGTAAGCAGAGTTCTTTCTGTAGGTTTTTTCAAATGAAAACATTGCAAGAATAT		(((((((.((((((((((((((.((((((...((((((.....))))))..........)))))).)))))))))))))))))))))	-42.7
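
A reference file with this layout can be loaded with Python's standard csv module. The sketch below is not QuagmiR code: index_by_mirna is a hypothetical helper, and the in-memory excerpt keeps only six of the columns (with values copied from the first three rows above) so that the example stays short. It groups rows by mature miRNA name, which makes the relationship between one mature miRNA and its precursors visible.

```python
import csv
import io
from collections import defaultdict

# Shortened excerpt for illustration: a subset of the columns listed above,
# with values taken from the first three rows of the table.
columns = ["MIRNA", "PRIMIRNA", "ACCESSION", "SEQUENCE", "PARALOGS", "CHROMOSOME"]
rows = [
    ["hsa-let-7a-2-3p", "hsa-let-7a-2", "MIMAT0010195",
     "CTGTACAGCCTCCTAGCTTTCC", "0", "chr11"],
    ["hsa-let-7a-3p-1-2", "hsa-let-7a-1", "MIMAT0004481",
     "CTATACAATCTACTGTCTTTC", "2", "chr9"],
    ["hsa-let-7a-3p-1-2", "hsa-let-7a-3", "MIMAT0004481",
     "CTATACAATCTACTGTCTTTC", "2", "chr22"],
]
excerpt = "\n".join("\t".join(fields) for fields in [columns] + rows) + "\n"


def index_by_mirna(handle):
    """Group the rows of a tab-separated reference table by the MIRNA column."""
    reader = csv.DictReader(handle, delimiter="\t")
    index = defaultdict(list)
    for row in reader:
        index[row["MIRNA"]].append(row)
    return dict(index)


index = index_by_mirna(io.StringIO(excerpt))
precursors = [row["PRIMIRNA"] for row in index["hsa-let-7a-3p-1-2"]]
print(precursors)
```

For the real file, the same function could be called on a handle opened with open("miRBase21-master.tsv", newline=""), assuming the file sits in the working directory.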