Merge e81750d into aaf0e4e
Juke34 committed Oct 22, 2022
2 parents aaf0e4e + e81750d commit 4495e8c
Showing 339 changed files with 8,105 additions and 3,517 deletions.
5 changes: 2 additions & 3 deletions .gitignore
@@ -4,8 +4,7 @@ Makefile
pm_to_blib
blib
MANIFEST*
t/scripts_output/*.index
t/scripts_output/agat_sp_manage_functional_annotation/*.index
t/scripts_output/in/*.index
t/scripts_output/in/agat_sp_manage_functional_annotation/*.index
cover_db
*.log

3 changes: 2 additions & 1 deletion MYMETA.yml
@@ -52,12 +52,13 @@ requires:
Time::Seconds: '0'
Try::Tiny: '0'
URI::Escape: '0'
YAML: '0'
perl: '5.006'
strict: '0'
warnings: '0'
resources:
bugtracker: https://github.com/NBISweden/AGAT/issues
homepage: https://nbis.se
repository: https://github.com/NBISweden/AGAT.git
version: v0.9.1
version: v1.0.0
x_serialization_backend: 'CPAN::Meta::YAML version 0.018'
3 changes: 2 additions & 1 deletion Makefile.PL
@@ -49,6 +49,7 @@ my %prereq_hash = ( "Bio::DB::Fasta" => 0,
"IO::File" => 0,
"IPC::Open2" => 0,
"JSON" => 0,
"YAML" => 0,
"LWP::UserAgent" => 0,
"LWP::Protocol::https" => 0,
"List::MoreUtils" => 0,
@@ -72,7 +73,7 @@ my %prereq_hash = ( "Bio::DB::Fasta" => 0,
my %WriteMakefileArgs = (
NAME => 'AGAT',
AUTHOR => 'Jacques Dainat <jacques.dainat@nbis.se>',
VERSION_FROM => 'lib/AGAT/Omniscient.pm',
VERSION_FROM => 'lib/AGAT/AGAT.pm',
ABSTRACT => 'Module to deal comprehensively with GFF and GTF format',
LICENSE => 'gpl_3',
PREREQ_PM => \%prereq_hash, # give a ref to the hash
21 changes: 13 additions & 8 deletions README.md
@@ -191,7 +191,7 @@ You will have to install all prerequisites and AGAT manually.
* using cpan or cpanm

```
cpanm install bioperl Clone Graph::Directed LWP::UserAgent JSON Carp Sort::Naturally File::Share File::ShareDir::Install Moose LWP::Protocol::https
cpanm install bioperl Clone Graph::Directed LWP::UserAgent Carp Sort::Naturally File::Share File::ShareDir::Install Moose YAML LWP::Protocol::https
```

* using conda
@@ -206,13 +206,13 @@ You will have to install all prerequisites and AGAT manually.
* manually

```
conda install perl-bioperl perl-clone perl-graph perl-lwp-simple perl-json perl-carp perl-sort-naturally perl-file-share perl-file-sharedir-install perl-moose perl-lwp-protocol-https
conda install perl-bioperl perl-clone perl-graph perl-lwp-simple perl-carp perl-sort-naturally perl-file-share perl-file-sharedir-install perl-moose perl-yaml perl-lwp-protocol-https
```

* using your package management tool (e.g. apt for Debian, Ubuntu, and related Linux distributions)

```
apt install libbio-perl-perl libclone-perl libgraph-perl liblwp-useragent-determined-perl libstatistics-r-perl libjson-perl libcarp-clan-perl libsort-naturally-perl libfile-share-perl libfile-sharedir libfile-sharedir-install-perl libany-moose-perl liblwp-protocol-https-perl
apt install libbio-perl-perl libclone-perl libgraph-perl liblwp-useragent-determined-perl libstatistics-r-perl libcarp-clan-perl libsort-naturally-perl libfile-share-perl libfile-sharedir libfile-sharedir-install-perl libyaml-perl liblwp-protocol-https-perl
```

* Optional
@@ -279,10 +279,15 @@ From the folder where the repository is located.
```

## List of tools
See [here](https://agat.readthedocs.io/en/latest/?badge=latest) for a list of tools.
As AGAT is a toolkit, it contains a lot of tools. The main one is `agat_convert_sp_gxf2gxf.pl`, which checks, fixes and pads missing information (features/attributes) in any kind of GTF and GFF file to create a complete, sorted and standardised GFF3 file.

As AGAT is a toolkit, it contains a lot of tools. The main one is `agat_convert_sp_gxf2gxf.pl`, which checks, fixes and pads missing information (features/attributes) in any kind of GTF and GFF file to create a complete, sorted and standardised GFF3 file.
All the installed scripts have the `agat_` prefix.
Typing `agat_` in your terminal followed by the <TAB> key to activate autocompletion will display the complete list of installed tools.

To see the available tools, you have several options:
* `agat --tools`
* Typing `agat_` in your terminal followed by the <TAB> key to activate autocompletion will display the complete list of installed tools.
* [The documentation](https://agat.readthedocs.io/en/latest/?badge=latest).


### More about the tools

@@ -308,8 +313,8 @@ and the parsing approach used.
The method creates a hash structure containing all the data in memory. We call it OMNISCIENT. The OMNISCIENT structure has three levels:
```
$omniscient{level1}{tag_l1}{level1_id} = feature <= tag could be gene, match
$omniscient{level2}{tag_l2}{idY} = @featureListL2 <= tag could be mRNA,rRNA,tRNA,etc. idY is a level1_id (known as Parent attribute within the level2 feature). The @featureList is a list so that isoforms can be handled.
$omniscient{level3}{tag_l3}{idZ} = @featureListL3 <= tag could be exon,cds,utr3,utr5,etc. idZ is the ID of a level2 feature (known as Parent attribute within the level3 feature). The @featureList is a list so that all the features of the same tag are kept together.
$omniscient{level2}{tag_l2}{idY} = @featureListL2 <= tag could be mRNA,rRNA,tRNA,etc. idY is a level1_id (known as Parent attribute within the level2 feature). The @featureListL2 is a list so that isoforms can be handled.
$omniscient{level3}{tag_l3}{idZ} = @featureListL3 <= tag could be exon,cds,utr3,utr5,etc. idZ is the ID of a level2 feature (known as Parent attribute within the level3 feature). The @featureListL3 is a list so that all the features of the same tag are kept together.
```

#### How does the Omniscient parser work
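The three-level OMNISCIENT hash described in the README can be illustrated with a small core-Perl sketch. It is an illustration under stated assumptions, not AGAT's own code: plain hashrefs stand in for the feature objects AGAT really stores, and the lower-case tag keys and IDs (`gene1`, `mrna1`, ...) are made up. It fills one gene with two isoforms, then walks level1 to level2 through the level1 ID and level2 to level3 through each level2 ID.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Plain hashrefs stand in for real feature objects (assumption for illustration).
my %omniscient;

# level1: {tag_l1}{level1_id} = one feature
$omniscient{level1}{gene}{gene1} = { type => 'gene', ID => 'gene1' };

# level2: {tag_l2}{level1_id} = list of features sharing that Parent (isoforms)
for my $id (qw(mrna1 mrna2)) {
    push @{ $omniscient{level2}{mrna}{gene1} },
        { type => 'mRNA', ID => $id, Parent => 'gene1' };
}

# level3: {tag_l3}{level2_id} = list of features of that tag under one transcript
my %exons_per_mrna = (mrna1 => 3, mrna2 => 1);
for my $mrna (sort keys %exons_per_mrna) {
    for my $n (1 .. $exons_per_mrna{$mrna}) {
        push @{ $omniscient{level3}{exon}{$mrna} },
            { type => 'exon', ID => "$mrna.exon$n", Parent => $mrna };
    }
}
push @{ $omniscient{level3}{cds}{mrna1} },
    { type => 'CDS', ID => 'mrna1.cds', Parent => 'mrna1' };

# Walk level1 -> level2 (keyed by the level1 ID) -> level3 (keyed by each level2 ID)
my %l3_count;    # level2 ID -> tag_l3 -> number of features
for my $tag_l1 (sort keys %{ $omniscient{level1} }) {
    for my $id_l1 (sort keys %{ $omniscient{level1}{$tag_l1} }) {
        for my $tag_l2 (sort keys %{ $omniscient{level2} }) {
            my $l2_list = $omniscient{level2}{$tag_l2}{$id_l1} or next;
            for my $l2 (@$l2_list) {
                for my $tag_l3 (sort keys %{ $omniscient{level3} }) {
                    my $l3_list = $omniscient{level3}{$tag_l3}{ $l2->{ID} } or next;
                    $l3_count{ $l2->{ID} }{$tag_l3} = scalar @$l3_list;
                }
            }
        }
    }
}

for my $id_l2 (sort keys %l3_count) {
    my $counts = $l3_count{$id_l2};
    print "$id_l2: ",
        join(', ', map { $_ . '=' . $counts->{$_} } sort keys %$counts), "\n";
}
```

Keying level2 and level3 by the parent ID is what lets the walk find all isoforms of a gene, and all exons of a transcript, with direct hash lookups instead of scanning every feature.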
242 changes: 242 additions & 0 deletions bin/agat
@@ -0,0 +1,242 @@
#!/usr/bin/env perl
use v5.24;
use warnings;
use experimental 'signatures';
no warnings 'experimental::signatures';
use File::Basename;

use AGAT::AppEaser qw< run d >;
use AGAT::AGAT;

# set env variable to save location of this script
my $agat_bin;
if( -l __FILE__){
$agat_bin = dirname(readlink(__FILE__));
} else{
$agat_bin = dirname(__FILE__);
}
$ENV{'AGAT_BIN'}=$agat_bin;

my $APPNAME = 'agat';
my $header = get_agat_header();
my $application = {
factory => {prefixes => {'#' => 'AGAT::AGAT#'}},
configuration => {

# the name of the application, set it above in $APPNAME
name => $APPNAME,

# figure out names of environment variables automatically
# 'auto-environment' => 1,

# sub-commands without children are leaves (no sub help/commands)
'auto-leaves' => 0,

# help goes to standard error by default, override to stdout
# 'help-on-stderr' => 0,

# Where to get the specifications for commands
# specfetch => '+SpecFromHash', # default
# specfetch => '+SpecFromHashOrModule', # possible alternative
},
commands => {
MAIN => {
help => $header,
description =>
'AGAT has the power to check, fix and pad missing information (features/
attributes) in any kind of GTF and GFF file to create complete, sorted and
standardised GFF3 files. Over the years it has been enriched by many
tools to perform just about any task related to GTF/GFF
format files (sanitizing, conversions, merging, modifying, filtering, FASTA
sequence extraction, adding information, etc). Compared to other methods,
AGAT is robust to even the most despicable GTF/GFF files.',
children => [qw< levels config >],

# allow for configuration files
sources => '+SourcesWithFiles',
# 'config-files' => ["/etc/$APPNAME.json"],
options => [
{
getopt => 'help|h!',
shortbool => 1, #shortbool added by JDainat to avoid --no-help
help => 'Display the help',
},
{
getopt => 'version|v!',
shortbool => 1,
help => 'Display the AGAT version',
},
{
getopt => 'tools|t!',
shortbool => 1,
help => 'Display the AGAT tools available',
},
],
commit => '#handle_main',
},

# ================================== LEVELS ====================================
levels => {
help => 'Handle feature types and relationships',
description =>
'If you want to see, add or modify the feature relationships, you will have to
use this command. It copies into your working directory the feature_levels.yaml file
used to define the relationships between feature types and their level organisation.
Typical level organisation: Level1 => gene; Level2 => mRNA; Level3 => exon, cds,
utrs. If you get a warning from the Omniscient parser that a feature relationship
is not defined, you can provide information about it within the exposed feature_levels.yaml
file. Indeed, if a feature_levels.yaml file exists in your working directory, it will be
used by default.',
options => [
{
getopt => 'help|h!',
shortbool => 1,
help => 'Display the help',
},
{
getopt => 'expose|e!',
shortbool => 1,
help => 'Expose the feature_levels.yaml file.',
},
],
commit => '#handle_levels',
},
# ================================== CONFIG ====================================
config => {
help => 'Handle agat configuration used by _sp_ scripts',
description =>
'The _sp_ scripts use the AGAT parser, which can be tuned in many ways. The
default parameters are stored within a YAML file. You can see this configuration
file using the expose command, which copies it into your working directory. You
are then free to modify it at your convenience. When a config YAML file is
available within the working directory, AGAT uses it in priority. (For
convenience and automation, the parameters can be modified on the fly when using
the expose command. In that case you will get a modified copy of the config file.)
The _sq_ scripts can be tuned only by a few options of the config file:
force_gff_input_version, gtf_output_version, gff_output_version and output_format.
',
options => [
{
getopt => 'help|h!',
help => 'Display the help',
shortbool => 1,
},
{
getopt => 'expose|e!',
help => 'Expose the config file (a config.yaml file will be written to your working directory). When present, AGAT uses the config.yaml from the current directory over the default one.',
shortbool => 1,
},
{
getopt => 'verbose=i',
help => 'Verbosity during the GFF/GTF parsing. 0 is quiet; 1, 2, 3 or 4 increase verbosity. [Default 1]',
},
{
getopt => 'progress_bar!',
help => 'To activate / deactivate the progress bar. [Default activated]',
},
{
getopt => 'log!',
help => 'To create a log file while parsing the input file to keep track of modifications made by AGAT. [Default activated]',
},
{
getopt => 'debug!',
help => 'Extra verbosity for debugging. [Default deactivated]',
},
{
getopt => 'tabix!',
help => 'To sort the output in tabix format. [Default deactivated]',
},
{
getopt => 'merge_loci!',
help => 'To merge loci that overlap at CDS level in a single locus. [Default deactivated]',
},
{
getopt => 'throw_fasta!',
help => 'To discard the FASTA sequences embedded in the input file. [Default deactivated]',
},
{
getopt => 'force_gff_input_version=f',
help => 'To force AGAT to use a specific version of the bioperl parser. Choice: 0, 1, 2, 2.5 or 3. 0 means let AGAT choose automatically. [Default 0]',
},
{
getopt => 'output_format=s',
help => 'Set the output format. Choice: GFF or GTF. [Default GFF]',
},
{
getopt => 'gff_output_version=f',
help => 'Set the GFF output version. Choice: 1, 2, 2.5 or 3. [Default 3]',
},
{
getopt => 'gtf_output_version=s',
help => 'Set the GTF output version. Choice: 1, 2, 2.1, 2.2, 2.5, 3 or relax. [Default relax]',
},
{
getopt => 'create_l3_for_l2_orphan!',
help => 'To create l3 features for l2 features that have none. [Default activated]',
},
{
getopt => 'locus_tag=s',
help => 'Comma-separated list of attribute tags used to define a locus. Used when neither a Parent/ID GFF relationship nor a gene_id/transcript_id GTF tag exists. [Default locus_tag, gene_id]',
},
{
getopt => 'prefix_new_id=s',
help => 'Prefix used for the IDs of newly created features. [Default nbis]',
},
{
getopt => 'check_sequential!',
help => 'Expert only - To take care of features without any proper relationship. [Default activated]',
},
{
getopt => 'check_l2_linked_to_l3!',
help => 'Expert only - To check that every l3 feature has a parent l2 feature, and create one if missing. [Default activated]',
},
{
getopt => 'check_l1_linked_to_l2!',
help => 'Expert only - To check that every l2 feature has a parent l1 feature, and create one if missing. [Default activated]',
},
{
getopt => 'remove_orphan_l1!',
help => 'Expert only - To remove level1 features without children features (except top and standalone features, which have no children by definition). [Default activated]',
},
{
getopt => 'check_all_level3_locations!',
help => 'To check location of level3 features: merge overlapping and adjacent exons and adjacent CDS. [Default activated]',
},
{
getopt => 'check_cds!',
help => 'To check, when stop codons are defined, that they are part of the CDS. If not, AGAT extends the CDS to include them. [Default activated]',
},
{
getopt => 'check_exons!',
help => 'To check that exons include all other l3 feature types that are included within exon (see feature_levels.yaml file e.g: cds:"exon"). [Default activated]',
},
{
getopt => 'check_utrs!',
help => 'To create UTRs if missing based on CDS and exon features. [Default activated]',
},
{
getopt => 'check_all_level2_locations!',
help => 'To check that l2 feature locations do not extend beyond their exon locations. [Default activated]',
},
{
getopt => 'check_all_level1_locations!',
help => 'To check that l1 feature locations do not extend beyond their l2 locations. [Default activated]',
},
{
getopt => 'check_identical_isoforms!',
help => 'To remove identical isoforms (same exon,cds,locations). [Default activated]',
},
],
commit => '#handle_config',
},
}
};
run($application, [@ARGV]);
exit;


# implementation of sub-command cat (not referenced by any commit entry above)
sub cat ($general, $config, $args) {
say defined($config->{galook}) ? $config->{galook} : '*undef*';
return;
}
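The `config` sub-command above declares its options as Getopt::Long-style specifications: `!` marks a negatable boolean (`--log` / `--nolog`), `=i` an integer, `=f` a real number and `=s` a string. Assuming AGAT::AppEaser ultimately hands these `getopt` strings to Getopt::Long (an assumption, not checked against its source; the AGAT-specific `shortbool` key is not reproduced), the standalone sketch below parses a made-up command line into a hash whose defaults are taken from the `[Default ...]` notes in the help strings.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Defaults copied from the [Default ...] notes of the config help strings.
my %config = (
    verbose                 => 1,
    progress_bar            => 1,
    force_gff_input_version => 0,
    output_format           => 'GFF',
    locus_tag               => 'locus_tag,gene_id',
);

# Hypothetical arguments, as if typed after `agat config`.
my @argv = qw(--expose --verbose 2 --noprogress_bar
              --force_gff_input_version 2.5 --locus_tag gene_id);

# With a hashref as first spec argument, values are stored under each
# option's primary name (e.g. 'expose', not 'e').
GetOptionsFromArray(
    \@argv, \%config,
    'expose|e!',                  # negatable boolean: --expose / --noexpose
    'verbose=i',                  # mandatory integer value
    'progress_bar!',              # --progress_bar / --noprogress_bar
    'force_gff_input_version=f',  # mandatory real number, so 2.5 is accepted
    'output_format=s',            # mandatory string
    'locus_tag=s',                # comma-separated list, split after parsing
) or die "invalid options\n";

my @locus_tags = split /\s*,\s*/, $config{locus_tag};
printf "verbose=%d progress_bar=%d gff_in=%s locus_tags=%s\n",
    $config{verbose}, $config{progress_bar},
    $config{force_gff_input_version}, join('|', @locus_tags);
```

Options not given on the command line, such as `output_format` here, keep their default value, which matches the "modify on the fly when using the expose command" behaviour described in the help text.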
16 changes: 4 additions & 12 deletions bin/agat_convert_bed2gff.pl
@@ -5,10 +5,10 @@
use Clone;
use Pod::Usage;
use Getopt::Long;
use Bio::Tools::GFF;
use AGAT::Omniscient;
use AGAT::AGAT;

my $header = get_agat_header();
my $config = get_agat_config();
my $outfile = undef;
my $bed = undef;
my $source_tag = "data";
@@ -47,14 +47,7 @@
}

## Manage output file
my $gffout;
if ($outfile) {
open(my $fh, '>', $outfile) or die "Could not open file '$outfile' $!";
$gffout= Bio::Tools::GFF->new(-fh => $fh, -gff_version => 3);
}
else{
$gffout = Bio::Tools::GFF->new(-fh => \*STDOUT, -gff_version => 3);
}
my $gffout = prepare_gffout($config, $outfile);

# Ask for specific GFF information
if (!$source_tag or !$primary_tag){
@@ -224,7 +217,7 @@
-end => $end ,
-frame => $frame ,
-strand =>$strand,
tag => {'ID' => $id}
-tag => {'ID' => $id}
) ;

if( exists_keys ( \%bedOmniscent, ($id, 'name') ) ){
@@ -416,7 +409,6 @@
}
}


close $fh;

# check if the line has to be skipped or not
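The block removed from `agat_convert_bed2gff.pl` picks an output filehandle (the requested file, else STDOUT) and wraps it in `Bio::Tools::GFF->new(-fh => ..., -gff_version => 3)`. The commit replaces it with `prepare_gffout($config, $outfile)`, whose internals are not shown in this diff. A minimal core-Perl sketch of the selection pattern follows; `open_output` is a hypothetical helper, not AGAT's API, and BioPerl is left out so the example stays self-contained.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper (not AGAT's prepare_gffout): return a writable handle
# for $outfile, or STDOUT when no output file was requested.
sub open_output {
    my ($outfile) = @_;
    return \*STDOUT unless defined $outfile && length $outfile;
    open(my $fh, '>', $outfile) or die "Could not open file '$outfile': $!\n";
    return $fh;
}

# Write a GFF3 header line to a temporary file, then read it back.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/out.gff";
my $fh   = open_output($path);
print {$fh} "##gff-version 3\n";
close $fh or die "Could not close '$path': $!\n";

open(my $in, '<', $path) or die "Could not read '$path': $!\n";
my $first_line = <$in>;
close $in;
print "file output: $first_line";
print "STDOUT fallback uses fd ", fileno(open_output(undef)), "\n";
```

Centralising this choice in one helper is the point of the refactoring: every script gets the same file-or-STDOUT behaviour instead of repeating the `if ($outfile) { ... } else { ... }` block.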
