add doc
Juke34 committed Apr 14, 2021
1 parent bf48b0d commit 6af524d
Showing 78 changed files with 4,620 additions and 0 deletions.
106 changes: 106 additions & 0 deletions docs/Home.md
@@ -0,0 +1,106 @@
# AGAT - **A**nother **G**tf/gff **A**nalysis **T**oolkit
## Suite of tools to handle gene annotations in any GTF/GFF format.
---------------------------------------------

# Table of Contents

* [Foreword](#foreword)
* [List of AGAT tools (v0.6.1)](#list-of-agat-tools-v061)
* [Topological sorting of gff features](https://github.com/NBISweden/AGAT/wiki/Topological-sorting-of-gff-features)

## Foreword
While providing genome annotation support within [NBIS](https://nbis.se), GTF/GFF is the main format I handle. I receive GTF/GFF files from users that come from a broad range of sources, sometimes even from mixed sources (concatenated into the same file) or manually edited.
The problem is that these files often do not follow the official specifications, and even when they do, there is no guarantee that they are compatible with the input expected by downstream tools.

* The **first** idea was to be able to **parse all possible cases** that can be encountered (I listed more than 30). To my knowledge, AGAT is the only tool able to handle all of them.

* The **second** idea was to be able to **create a fully standardised GFF3** file that could fit into any tool.
  Once again, AGAT is the only tool that fully recreates the missing information:
  * missing features (gene, mRNA, tRNA, exon, UTRs, etc.)
  * missing attributes (ID, Parent)

  and fixes wrong information:
  * making identifiers unique
  * fixing feature locations (e.g. an mRNA is stretched if it is shorter than its exons)
  * removing duplicated features
  * merging overlapping loci (only when the option is activated, since this is not desirable for prokaryotes)

* The **third** idea was to have a **correct topological sorting output**. To my knowledge, AGAT is the only tool that handles this task properly. More information is available [here](https://github.com/NBISweden/AGAT/wiki/Topological-sorting-of-gff-features).

* **Finally**, based on the abilities described above, I have developed a **toolkit to perform different tasks**. Some tools are original, others are similar to what other tools offer, but within AGAT they always benefit from the strengths of the first three points.
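
The second point above mentions one location fix: an mRNA is stretched when it is shorter than its exons. The following minimal Python sketch illustrates that fix. It is not AGAT's Perl implementation. The `Feature` record and the `stretch_parent` helper are hypothetical names chosen for this example, and coordinates are assumed to be 1-based and inclusive, as in GFF3.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Feature:
    # Hypothetical minimal record: only the fields needed for this example.
    id: str
    start: int  # 1-based, inclusive (GFF3 convention)
    end: int    # 1-based, inclusive


def stretch_parent(parent: Feature, children: list[Feature]) -> Feature:
    """Return a copy of `parent` whose span covers all of its children."""
    if not children:
        return parent
    new_start = min(parent.start, min(c.start for c in children))
    new_end = max(parent.end, max(c.end for c in children))
    return Feature(parent.id, new_start, new_end)


mrna = Feature("mRNA1", 150, 800)
exons = [Feature("exon1", 100, 300), Feature("exon2", 600, 900)]
print(stretch_parent(mrna, exons))
```

Here the mRNA (150..800) is widened to 100..900 so that both exons fall inside it. A parent that already covers its children is returned unchanged.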


**A final word**
AGAT can solve a lot of complicated cases and save you headaches.
Enjoy!!

## List of AGAT tools (v0.6.1)
* [agat_convert_bed2gff.pl](https://github.com/NBISweden/AGAT/wiki/agat_convert_bed2gff)
* [agat_convert_embl2gff.pl](https://github.com/NBISweden/AGAT/wiki/agat_convert_embl2gff)
* [agat_convert_genscan2gff.pl](https://github.com/NBISweden/AGAT/wiki/agat_convert_genscan2gff)
* [agat_convert_mfannot2gff.pl](https://github.com/NBISweden/AGAT/wiki/agat_convert_mfannot2gff)
* [agat_convert_minimap2_bam2gff.pl](https://github.com/NBISweden/AGAT/wiki/agat_convert_minimap2_bam2gff)
* [agat_convert_sp_gff2bed.pl](https://github.com/NBISweden/AGAT/wiki/agat_convert_sp_gff2bed)
* [agat_convert_sp_gff2gtf.pl](https://github.com/NBISweden/AGAT/wiki/agat_convert_sp_gff2gtf)
* [agat_convert_sp_gff2tsv.pl](https://github.com/NBISweden/AGAT/wiki/agat_convert_sp_gff2tsv)
* [agat_convert_sp_gff2zff.pl](https://github.com/NBISweden/AGAT/wiki/agat_convert_sp_gff2zff)
* [agat_convert_sp_gxf2gxf.pl](https://github.com/NBISweden/AGAT/wiki/agat_convert_sp_gxf2gxf)
* [agat_sp_Prokka_inferNameFromAttributes.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_Prokka_inferNameFromAttributes)
* [agat_sp_add_introns.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_add_introns)
* [agat_sp_add_start_and_stop.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_add_start_and_stop)
* [agat_sp_alignment_output_style.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_alignment_output_style)
* [agat_sp_clipN_seqExtremities_and_fixCoordinates.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_clipN_seqExtremities_and_fixCoordinates)
* [agat_sp_compare_two_BUSCOs.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_compare_two_BUSCOs)
* [agat_sp_compare_two_annotations.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_compare_two_annotations)
* [agat_sp_complement_annotations.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_complement_annotations)
* [agat_sp_ensembl_output_style.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_ensembl_output_style)
* [agat_sp_extract_attributes.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_extract_attributes)
* [agat_sp_extract_sequences.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_extract_sequences)
* [agat_sp_filter_by_ORF_size.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_filter_by_ORF_size)
* [agat_sp_filter_by_locus_distance.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_filter_by_locus_distance)
* [agat_sp_filter_by_mrnaBlastValue.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_filter_by_mrnaBlastValue)
* [agat_sp_filter_feature_by_attribute_presence.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_filter_feature_by_attribute_presence)
* [agat_sp_filter_feature_by_attribute_value.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_filter_feature_by_attribute_value)
* [agat_sp_filter_feature_from_keep_list.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_filter_feature_from_keep_list)
* [agat_sp_filter_feature_from_kill_list.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_filter_feature_from_kill_list)
* [agat_sp_filter_gene_by_intron_numbers.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_filter_gene_by_intron_numbers)
* [agat_sp_filter_gene_by_length.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_filter_gene_by_length)
* [agat_sp_filter_incomplete_gene_coding_models.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_filter_incomplete_gene_coding_models)
* [agat_sp_filter_record_by_coordinates.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_filter_record_by_coordinates)
* [agat_sp_fix_cds_phases.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_fix_cds_phases)
* [agat_sp_fix_features_locations_duplicated.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_fix_features_locations_duplicated)
* [agat_sp_fix_fusion.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_fix_fusion)
* [agat_sp_fix_longest_ORF.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_fix_longest_ORF)
* [agat_sp_fix_overlaping_genes.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_fix_overlaping_genes)
* [agat_sp_fix_small_exon_from_extremities.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_fix_small_exon_from_extremities)
* [agat_sp_flag_premature_stop_codons.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_flag_premature_stop_codons)
* [agat_sp_flag_short_introns.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_flag_short_introns)
* [agat_sp_functional_statistics.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_functional_statistics)
* [agat_sp_keep_longest_isoform.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_keep_longest_isoform)
* [agat_sp_kraken_assess_liftover.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_kraken_assess_liftover)
* [agat_sp_list_short_introns.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_list_short_introns)
* [agat_sp_load_function_from_protein_align.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_load_function_from_protein_align)
* [agat_sp_manage_IDs.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_manage_IDs)
* [agat_sp_manage_UTRs.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_manage_UTRs)
* [agat_sp_manage_attributes.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_manage_attributes)
* [agat_sp_manage_functional_annotation.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_manage_functional_annotation)
* [agat_sp_manage_introns.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_manage_introns)
* [agat_sp_merge_annotations.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_merge_annotations)
* [agat_sp_prokka_fix_fragmented_gene_annotations.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_prokka_fix_fragmented_gene_annotations)
* [agat_sp_sensitivity_specificity.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_sensitivity_specificity)
* [agat_sp_separate_by_record_type.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_separate_by_record_type)
* [agat_sp_statistics.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_statistics)
* [agat_sp_webApollo_compliant.pl](https://github.com/NBISweden/AGAT/wiki/agat_sp_webApollo_compliant)
* [agat_sq_add_attributes_from_tsv.pl](https://github.com/NBISweden/AGAT/wiki/agat_sq_add_attributes_from_tsv)
* [agat_sq_add_hash_tag.pl](https://github.com/NBISweden/AGAT/wiki/agat_sq_add_hash_tag)
* [agat_sq_add_locus_tag.pl](https://github.com/NBISweden/AGAT/wiki/agat_sq_add_locus_tag)
* [agat_sq_keep_annotation_from_fastaSeq.pl](https://github.com/NBISweden/AGAT/wiki/agat_sq_keep_annotation_from_fastaSeq)
* [agat_sq_list_attributes.pl](https://github.com/NBISweden/AGAT/wiki/agat_sq_list_attributes)
* [agat_sq_manage_IDs.pl](https://github.com/NBISweden/AGAT/wiki/agat_sq_manage_IDs)
* [agat_sq_manage_attributes.pl](https://github.com/NBISweden/AGAT/wiki/agat_sq_manage_attributes)
* [agat_sq_mask.pl](https://github.com/NBISweden/AGAT/wiki/agat_sq_mask)
* [agat_sq_remove_redundant_entries.pl](https://github.com/NBISweden/AGAT/wiki/agat_sq_remove_redundant_entries)
* [agat_sq_repeats_analyzer.pl](https://github.com/NBISweden/AGAT/wiki/agat_sq_repeats_analyzer)
* [agat_sq_rfam_analyzer.pl](https://github.com/NBISweden/AGAT/wiki/agat_sq_rfam_analyzer)
* [agat_sq_split.pl](https://github.com/NBISweden/AGAT/wiki/agat_sq_split)
* [agat_sq_stat_basic.pl](https://github.com/NBISweden/AGAT/wiki/agat_sq_stat_basic)

58 changes: 58 additions & 0 deletions docs/agat_convert_bed2gff.md
@@ -0,0 +1,58 @@
# NAME

agat\_convert\_bed2gff.pl

# DESCRIPTION

The script takes a BED file as input and translates it into GFF format.
The BED format is described here: https://genome.ucsc.edu/FAQ/FAQformat.html#format1
The script converts a 0-based, half-open \[start-1, end) BED file to a
1-based, closed \[start, end\] General Feature Format v3 (GFF3) file.
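
The coordinate shift described above amounts to adding 1 to the BED start and keeping the BED end. Below is a minimal Python sketch of that arithmetic. It is not the script's own code, and the helper name `bed_to_gff_interval` is hypothetical.

```python
def bed_to_gff_interval(bed_start: int, bed_end: int) -> tuple:
    """Convert a 0-based, half-open BED interval [start, end) into a
    1-based, closed GFF3 interval [start, end]."""
    # A zero-length BED interval (start == end) has no closed-interval
    # equivalent, so reject it along with inverted intervals.
    if bed_end <= bed_start:
        raise ValueError(f"empty or inverted BED interval: {bed_start}-{bed_end}")
    # The first base shifts from index 0 to position 1; the half-open end
    # already equals the last covered base in 1-based numbering.
    return bed_start + 1, bed_end


print(bed_to_gff_interval(0, 100))
```

For example, the BED interval `chr1 0 100` covers the first 100 bases of `chr1`, which GFF3 writes as start 1, end 100.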

# SYNOPSIS

```
agat_convert_bed2gff.pl --bed infile.bed [ -o outfile ]
agat_convert_bed2gff.pl -h
```

# OPTIONS

- **--bed**

Input bed file that will be converted.

- **--source**

The source indicates the tool used to produce the data and is stored in the 2nd field of a GFF file.
Example: Stringtie,Maker,Augustus,etc. \[default: data\]

- **--primary\_tag**

The primary\_tag corresponds to the data type and is stored in the 3rd field of a GFF file.
Example: gene,mRNA,CDS,etc. \[default: gene\]

- **--inflate\_off**

By default we inflate the block fields (blockCount, blockSizes, blockStarts) to create subfeatures
of the main feature (primary\_tag). The type of subfeature created is based on the
inflate\_type parameter. If you do not want this inflating behaviour you can deactivate it
by using the --inflate\_off option.

- **--inflate\_type**

Feature type (3rd column in GFF) created when inflating is activated \[default: exon\].

- **--verbose**

Add verbosity.

- **-o** , **--output** , **--out** , **--outfile** or **--gff**

Output GFF file. If no output file is specified, the output will be
written to STDOUT.

- **-h** or **--help**

Display this helpful text.
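
The `--inflate_off` and `--inflate_type` options describe how BED12 block fields become subfeatures. The following minimal Python sketch shows one way to do that expansion. It is not the script's Perl code. The function name and the `ID`/`Parent` naming scheme (`name.exon1`, ...) are hypothetical, and the defaults simply mirror the documented ones (`data`, `gene`, `exon`).

```python
def inflate_bed12(line, source="data", primary_tag="gene", inflate_type="exon"):
    """Convert one tab-separated BED12 line into GFF3 lines: a parent feature
    followed by one `inflate_type` child per block."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 12:
        raise ValueError("expected a BED12 record")
    chrom, start, end, name = cols[0], int(cols[1]), int(cols[2]), cols[3]
    score, strand = cols[4], cols[5]
    block_count = int(cols[9])
    sizes = [int(x) for x in cols[10].rstrip(",").split(",")]
    starts = [int(x) for x in cols[11].rstrip(",").split(",")]
    if not (block_count == len(sizes) == len(starts)):
        raise ValueError("blockCount does not match blockSizes/blockStarts")

    # Parent feature: shift the 0-based BED start by +1; the end is unchanged.
    out = ["\t".join([chrom, source, primary_tag, str(start + 1), str(end),
                      score, strand, ".", f"ID={name}"])]
    for i, (size, rel_start) in enumerate(zip(sizes, starts), start=1):
        # blockStarts are 0-based offsets relative to chromStart.
        b_start = start + rel_start
        b_end = b_start + size
        out.append("\t".join([chrom, source, inflate_type, str(b_start + 1),
                              str(b_end), ".", strand, ".",
                              f"ID={name}.{inflate_type}{i};Parent={name}"]))
    return out


record = "\t".join(["chr1", "100", "500", "tx1", "0", "+", "100", "500",
                    "0", "2", "100,150,", "0,250,"])
for gff_line in inflate_bed12(record):
    print(gff_line)
```

With `--inflate_off`, only the parent line would be kept. In this sketch, that corresponds to using `out[0]` alone.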

38 changes: 38 additions & 0 deletions docs/agat_convert_embl2gff.md
@@ -0,0 +1,38 @@
# NAME

agat\_convert\_embl2gff.pl

# DESCRIPTION

The script takes an EMBL file as input and translates it into GFF format.

# SYNOPSIS

```
agat_convert_embl2gff.pl --embl infile.embl [ -o outfile ]
```

# OPTIONS

- **--embl**

Input EMBL file that will be read.

- **--primary\_tag**, **--pt**, **-t**

List of primary tags, useful to discard or keep specific features.
Multiple tags must be comma-separated.

- **-d**

Discard the features whose primary tags are provided with the "primary\_tag" option.

- **-o**, **--output**, **--out**, **--outfile** or **--gff**

Output GFF file. If no output file is specified, the output will be
written to STDOUT.

- **-h** or **--help**

Display this helpful text.
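
The `--primary_tag` / `-d` pair describes a keep-or-discard filter on feature types. The following minimal Python sketch shows that logic. It assumes that without `-d` only the listed tags are kept, which is a reading of the option text above and is not confirmed from the script's code. Features are modelled as hypothetical `(type, start, end)` tuples.

```python
from __future__ import annotations


def parse_primary_tags(value: str) -> set[str]:
    """Split a comma-separated --primary_tag value into a set of tags."""
    return {tag.strip() for tag in value.split(",") if tag.strip()}


def select_features(features, tags: set[str], discard: bool = False):
    """Keep features whose type is in `tags`, or drop them when `discard`
    is True (the behaviour the -d flag describes). No tags: keep everything."""
    if not tags:
        return list(features)
    return [f for f in features if (f[0] in tags) != discard]


feats = [("source", 1, 100), ("gene", 10, 50), ("CDS", 10, 50)]
tags = parse_primary_tags("gene, CDS")
print(select_features(feats, tags))                # keep gene and CDS
print(select_features(feats, tags, discard=True))  # keep only source
```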

65 changes: 65 additions & 0 deletions docs/agat_convert_genscan2gff.md
@@ -0,0 +1,65 @@
# NAME

agat\_convert\_genscan2gff.pl

# DESCRIPTION

The script takes a genscan file as input and translates it into GFF format.
The genscan format is described here: http://genome.crg.es/courses/Bioinformatics2003\_genefinding/results/genscan.html
/!\\ vvv Known problem vvv /!\\
You must submit only the DNA sequence, without any header!
The tool expects DNA sequence only and does not crash or warn if a header
is submitted along with the sequence.
For example, if you have a header ">seq", the letters s, e and q are read as the first three nucleotides of the sequence,
and all predicted locations are shifted accordingly.
(This was checked only on the online version http://argonaute.mit.edu/GENSCAN.html. I don't
know if the same problem occurs elsewhere.)
/!\\ ^^^ Known problem ^^^ /!\\
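
To avoid the header shift described above, strip FASTA header lines before submitting a sequence to GENSCAN. The following minimal Python sketch shows that step. It assumes a single-sequence FASTA input, and the helper name is hypothetical. It is not part of AGAT.

```python
def fasta_to_bare_sequence(text: str) -> str:
    """Drop FASTA header lines ('>...') and whitespace, returning only the
    sequence letters, so no header characters are counted as bases.
    Assumes a single sequence: multiple records would be concatenated."""
    seq_lines = [line.strip() for line in text.splitlines()
                 if line.strip() and not line.lstrip().startswith(">")]
    return "".join(seq_lines)


print(fasta_to_bare_sequence(">seq\nACGT\nGGCC\n"))
```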

# SYNOPSIS

```
agat_convert_genscan2gff.pl --genscan infile.genscan [ -o outfile ]
agat_convert_genscan2gff.pl -h
```

# OPTIONS

- **--genscan** or **-g**

Input genscan file that will be converted.

- **--source**

The source indicates the tool used to produce the data and is stored in the 2nd field of a GFF file.
Example: Stringtie,Maker,Augustus,etc. \[default: data\]

- **--primary\_tag**

The primary\_tag corresponds to the data type and is stored in the 3rd field of a GFF file.
Example: gene,mRNA,CDS,etc. \[default: gene\]

- **--inflate\_off**

By default we inflate the block fields (blockCount, blockSizes, blockStarts) to create subfeatures
of the main feature (primary\_tag). The type of subfeature created is based on the
inflate\_type parameter. If you do not want this inflating behaviour, you can deactivate it
by using the --inflate\_off option.

- **--inflate\_type**

Feature type (3rd column in GFF) created when inflating is activated \[default: exon\].

- **--verbose**

Add verbosity.

- **-o** , **--output** , **--out** , **--outfile** or **--gff**

Output GFF file. If no output file is specified, the output will be
written to STDOUT.

- **-h** or **--help**

Display this helpful text.

36 changes: 36 additions & 0 deletions docs/agat_convert_mfannot2gff.md
@@ -0,0 +1,36 @@
# NAME

agat\_convert\_mfannot2gff.pl

# DESCRIPTION

Conversion utility for MFannot "masterfile" annotations produced by the MFannot
pipeline (http://megasun.bch.umontreal.ca/RNAweasel/). Outputs GFF3 format.

# SYNOPSIS

```
agat_convert_mfannot2gff.pl -m <mfannot> -o <gff>
agat_convert_mfannot2gff.pl --help
```

# COPYRIGHT AND LICENSE

Copyright (C) 2015, Brandon Seah (kbseah@mpi-bremen.de)
... GPL-3 ...
modified by jacques dainat 2017-11

# OPTIONS

- **-m** or **-i** or **--mfannot**

The MFannot input file.

- **-g** or **-o** or **--gff**

The GFF output file.

- **-h** or **--help**

Display this helpful text.

44 changes: 44 additions & 0 deletions docs/agat_convert_minimap2_bam2gff.md
@@ -0,0 +1,44 @@
# NAME

agat\_convert\_minimap2\_bam2gff.pl

# DESCRIPTION

The script converts minimap2 output (BAM or SAM) into a GFF file.
To get a BAM file from minimap2, use the following command:
`minimap2 -ax splice:hq genome.fa Asecodes_parviclava.nucest.fa | samtools sort -O BAM -o output.bam`
To use a BAM file with this script, samtools must be in your PATH.

# SYNOPSIS

```
agat_convert_minimap2_bam2gff.pl -i infile.bam [ -o outfile ]
agat_convert_minimap2_bam2gff.pl -i infile.sam [ -o outfile ]
agat_convert_minimap2_bam2gff.pl --help
```

# OPTIONS

- **-i** or **--input**

Input file in sam (.sam extension) or bam (.bam extension) format.

- **-b** or **--bam**

Force the input file to be read as a BAM file.

- **-s** or **--sam**

Force the input file to be read as a SAM file.

- **-o**, **--out** or **--output**

Output GFF file. If no output file is specified, the output will be
written to STDOUT.

- **-h** or **--help**

Display this helpful text.
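
The `-i`, `-b` and `-s` options above describe how the input format is chosen: from the file extension, unless a flag forces it. The following minimal Python sketch shows that decision. It is not the script's Perl code, the helper name is hypothetical, and rejecting both flags together is an assumption made for this example.

```python
def detect_alignment_format(path: str, force_bam: bool = False,
                            force_sam: bool = False) -> str:
    """Return "bam" or "sam" for `path`. A forcing flag (like -b / -s)
    overrides the extension; otherwise .bam / .sam decides."""
    if force_bam and force_sam:
        raise ValueError("choose either BAM or SAM, not both")
    if force_bam:
        return "bam"
    if force_sam:
        return "sam"
    lower = path.lower()
    if lower.endswith(".bam"):
        return "bam"
    if lower.endswith(".sam"):
        return "sam"
    raise ValueError(f"cannot infer alignment format from extension: {path}")


print(detect_alignment_format("output.bam"))
print(detect_alignment_format("aln.txt", force_sam=True))
```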
