Comparing changes
4 contributors
Commits on Mar 13, 2015
|
|
brianjohnhaas |
dont need the optimize tables script
|
521ba80
|
Commits on Mar 27, 2015
|
|
brianjohnhaas |
include gff3-alignment to gtf conversion util
|
6c9f9e3
|
|||
|
|
brianjohnhaas |
ok
Merge branch 'master' of https://github.com/PASApipeline/pasapipeline |
ea6856d
|
Commits on Mar 30, 2015
|
|
brianjohnhaas |
store gene id not trans id as the gene_id att, bugfix
|
8d055a3
|
|||
|
|
brianjohnhaas |
bugfix.. got it this time.
|
f540b86
|
Commits on Apr 04, 2015
|
|
brianjohnhaas |
deprecate
|
d5c40f3
|
|||
|
|
brianjohnhaas |
renamed
|
68bf080
|
Commits on Apr 22, 2015
|
|
brianjohnhaas |
treat transcript features identically to mRNA features in gff3
|
3fbba77
|
Commits on Apr 28, 2015
|
|
brianjohnhaas |
added the process ID into the inx filename to better provide uniqueness when the target gff3 file is being shared among several parallel processes |
8a5ec2d
|
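The reasoning in the commit above can be illustrated with a minimal sketch. The helper name `index_filename_for` and the exact filename layout are assumptions for illustration; the naming scheme PASA actually uses is not shown in this comparison.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the PASA source): derive an index filename
# that is unique per process by embedding the process ID.
sub index_filename_for {
    my ($gff3_file, $pid) = @_;
    $pid = $$ unless defined $pid;    # default to the current process ID
    return "$gff3_file.$pid.inx";
}

# Two parallel processes sharing the same GFF3 file get distinct index files.
print index_filename_for("target.gff3", 1234), "\n";
print index_filename_for("target.gff3", 5678), "\n";
```

Because each process writes its own index file, concurrent runs against the same gff3 file should not overwrite each other's index.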
Commits on May 08, 2015
|
|
brianjohnhaas |
if -r mode, try to remove existing db but dont fail if it does not exist
|
04342a2
|
Commits on Jun 11, 2015
|
|
brianjohnhaas |
update
|
92d87d0
|
|||
|
|
brianjohnhaas |
ok
Merge branch 'master' of https://github.com/PASApipeline/pasapipeline an updated upstream into a topic branch. |
74418d3
|
Commits on Jun 13, 2015
|
|
brianjohnhaas |
the Error, have 3prime partial protein when 3prime partials werent allowed! message was due to annotations or transcript structures being imported with coordinates that extend beyond the length of the scaffold |
7fe6b78
|
|||
|
|
brianjohnhaas |
ok
Merge branch 'master' of https://github.com/PASApipeline/pasapipeline |
3c8e88a
|
Commits on Aug 10, 2015
|
|
mimarsh2 |
Allow mysql connection through socket
The Perl DBI library allows for connections to the mysql server through a specific socket file by using the mysql_socket=<pathToFile> host name. When calling the mysql binaries directly, this socket file needs to be passed using the -S argument rather than the -h for the hostname. |
7d5dc8a
|
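The distinction described in the commit message above can be sketched as follows. This is a minimal illustration, not the code from the commit: the helper name `mysql_cli_connection_args` is hypothetical, and it assumes the configured server value is either a plain hostname or a string of the form `mysql_socket=<pathToFile>`.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the PASA source): turn a configured server
# string into connection arguments for the mysql command-line client.
# A value of the form "mysql_socket=<path>" selects -S; anything else is
# treated as a hostname and passed with -h.
sub mysql_cli_connection_args {
    my ($server) = @_;
    if ($server =~ /^mysql_socket=(.+)$/) {
        return ("-S", $1);
    }
    return ("-h", $server);
}

print join(" ", "mysql", mysql_cli_connection_args("mysql_socket=/tmp/mysql.sock")), "\n";
print join(" ", "mysql", mysql_cli_connection_args("localhost")), "\n";
```

Returning the arguments as a list keeps the socket path as a single argument when the list is later passed to `system` or a list-form `open`.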
|||
|
|
mimarsh2 |
Socket usage documentation
Add documentation to the configuration template file about using the mysql_socket option |
11dff71
|
Commits on Aug 11, 2015
|
|
brianjohnhaas |
Merge pull request #12 from mimarsh2/mysqlSocket
Mysql socket |
7d6e2e6
|
Commits on Dec 07, 2015
|
|
brianjohnhaas |
update to recognize latest Trinity accession values
|
69927de
|
Commits on Jan 20, 2016
|
|
corburn |
docs: fix Launch_PASA_pipeline.pl -x flag description
|
a30894d
|
Commits on Oct 17, 2016
|
|
brianjohnhaas |
Merge pull request #15 from corburn/Launch_PASA_pipeline.pl
docs: fix Launch_PASA_pipeline.pl -x flag description |
ea25011
|
Commits on Dec 27, 2016
|
|
yuragal |
Compatibility with GMAP versions 2016-05+
In gmap version 2016-05 new arguments [were introduced](http://research-pub.gene.com/gmap/archive.html): "Added separate flags in GMAP for controlling intron length of end introns separately from middle introns. Flag names are now --max-intronlength-middle (previously --intronlength) and --max-intronlength-ends." This patch evaluates the gmap version and prepares the arguments of the gmap call accordingly. |
440ec0e
|
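The version switch described in the commit message above might look roughly like this. It is a sketch under stated assumptions, not the code from the patch: the helper name `gmap_intron_length_args` is hypothetical, the version string is assumed to start with `YYYY-MM`, and using the same limit for middle and end introns is a choice made only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not the code from this patch): choose intron-length
# flags from a GMAP version string that starts with YYYY-MM.
sub gmap_intron_length_args {
    my ($version, $max_intron) = @_;
    my ($year, $month) = $version =~ /^(\d{4})-(\d{2})/
        or die "Error, cannot parse GMAP version: $version";
    if ($year > 2016 || ($year == 2016 && $month >= 5)) {
        # Newer releases: separate limits for middle and end introns.
        # Assumption: the same value is used for both.
        return ("--max-intronlength-middle=$max_intron",
                "--max-intronlength-ends=$max_intron");
    }
    # Older releases: a single flag.
    return ("--intronlength=$max_intron");
}

print join(" ", gmap_intron_length_args("2016-05-01", 100000)), "\n";
print join(" ", gmap_intron_length_args("2015-12-31", 100000)), "\n";
```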
Commits on Dec 28, 2016
|
|
brianjohnhaas |
Merge pull request #29 from yuragal/patch-1
Compatibility with GMAP versions 2016-05+ |
6f6f2e9
|
Commits on Feb 15, 2017
|
|
brianjohnhaas |
compatible w/ latest mysql
|
2bf11be
|
Commits on Apr 21, 2017
|
|
brianjohnhaas |
activate debug option
|
752ce7f
|
Commits on Jul 11, 2017
|
|
brianjohnhaas |
more useful error messaging and troubleshooting options
|
8d31504
|
Showing changes with 636 additions and 93 deletions.
- +28 −7 PerlLib/Exons_to_geneobj.pm
- +5 −3 PerlLib/Fastq_reader.pm
- +72 −8 PerlLib/GFF3_alignment_utils.pm
- +2 −2 PerlLib/GFF3_utils.pm
- +187 −0 PerlLib/GTF_alignment_utils.pm
- +21 −13 PerlLib/GTF_utils.pm
- +141 −0 PerlLib/Pipeliner.pm
- +54 −27 PerlLib/SingleLinkageClusterer.pm
- +4 −1 PerlLib/Thread_helper.pm
- +1 −1 SAMPLE_HOOKS/GFF3/GFF3_annot_retriever.pm
- 0 misc_utilities/{gff3_gene_to_transcript_gff3.pl → deprecated/old.gff3_gene_to_transcript_gff3.pl}
- +41 −0 misc_utilities/gff3_alignment_to_gtf_format.pl
- +2 −1 misc_utilities/gff3_file_toString.pl
- +12 −1 misc_utilities/gtf_file_to_proteins.pl
- +1 −0 pasa_conf/pasa.CONFIG.template
- +4 −4 pasa_conf/sample_test.conf
- BIN pasa_cpp/pasa
- +2 −2 schema/cdna_alignment_mysqlschema
- +4 −3 schema/notes
- +1 −13 scripts/Launch_PASA_pipeline.pl
- +3 −0 scripts/build_comprehensive_transcriptome.dbi
- +26 −3 scripts/cDNA_annotation_comparer.dbi
- +17 −3 scripts/create_mysql_cdnaassembly_db.dbi
- +8 −1 scripts/process_GMAP_alignments_gff3_chimeras_ok.pl
PerlLib/Exons_to_geneobj.pm
| @@ -27,9 +27,13 @@ sub create_gene_obj { | ||
| ## exons_ref should be end5's keyed to end3's for all exons. | ||
| my ($gene_struct_mod, $cdna_seq) = &get_cdna_seq ($exons_href, $sequence_ref); | ||
| + | ||
| + | ||
| my $cdna_seq_length = length $cdna_seq; | ||
| my $long_orf_obj = new Longest_orf(); | ||
| + #print STDERR "CDNA_SEQ: [$cdna_seq], length: $cdna_seq_length\n"; | ||
| + | ||
| # establish long orf finding parameters. | ||
| $long_orf_obj->forward_strand_only(); | ||
| if ($partial_info_href->{"5prime"}) { | ||
| @@ -43,24 +47,28 @@ sub create_gene_obj { | ||
| $long_orf_obj->get_longest_orf($cdna_seq); | ||
| my ($end5, $end3) = $long_orf_obj->get_end5_end3(); | ||
| - print "CDS: $end5, $end3\n" if $SEE; | ||
| + #print STDERR "*** CDS: $end5, $end3\n";# if $SEE; | ||
| my $gene_obj = &create_gene ($gene_struct_mod, $end5, $end3); | ||
| + #print STDERR $gene_obj->toString(); | ||
| + | ||
| $gene_obj->create_all_sequence_types($sequence_ref); | ||
| my $protein = $gene_obj->get_protein_sequence(); | ||
| - | ||
| + my $recons_cds = $gene_obj->get_CDS_sequence(); | ||
| + #print STDERR "reconsCDS: $recons_cds\n"; | ||
| + | ||
| ## check partiality | ||
| if ($protein) { # it is possible that we won't have any cds structure | ||
| if ($protein !~ /^M/) { | ||
| # this would require that we allowed for 5prime partials | ||
| unless ($partial_info_href->{"5prime"}) { | ||
| - confess "Error, have 5' partial protein when 5prime partials weren't allowed!\n$protein\n"; | ||
| + confess "Error, have 5' partial protein when 5prime partials weren't allowed!\n$protein\n$cdna_seq\n"; | ||
| } | ||
| } | ||
| if ($protein !~ /\*$/) { | ||
| # this would require that we allowed for 3prime partials | ||
| unless ($partial_info_href->{"3prime"}) { | ||
| - confess "Error, have 3' partial protein when 3prime partials weren't allowed!\n$protein\n"; | ||
| + confess "Error, have 3' partial protein when 3prime partials weren't allowed!\n$protein\n$cdna_seq\n"; | ||
| } | ||
| } | ||
| @@ -80,6 +88,9 @@ sub create_gene_obj { | ||
| #### | ||
| sub get_cdna_seq { | ||
| my ($gene_struct, $assembly_seq_ref) = @_; | ||
| + | ||
| + my $seq_length = length($$assembly_seq_ref); | ||
| + | ||
| my (@end5s) = sort {$a<=>$b} keys %$gene_struct; | ||
| my $strand = "?"; | ||
| foreach my $end5 (@end5s) { | ||
| @@ -90,7 +101,7 @@ sub get_cdna_seq { | ||
| } | ||
| if ($strand eq "?") { | ||
| print Dumper ($gene_struct); | ||
| - die "ERROR: I can't determine what orientation the cDNA is in!\n"; | ||
| + confess "ERROR: I can't determine what orientation the cDNA is in!\n"; | ||
| } | ||
| print NOTES "strand: $strand\n"; | ||
| my $cdna_seq; | ||
| @@ -99,6 +110,11 @@ sub get_cdna_seq { | ||
| foreach my $end5 (@end5s) { | ||
| #print $end5; | ||
| my $end3 = $gene_struct->{$end5}; | ||
| + | ||
| + if ($end5 > $seq_length || $end3 > $seq_length) { | ||
| + confess "Error, coords are out of bounds of sequence length: $seq_length:\n" . Dumper(\$gene_struct); | ||
| + } | ||
| + | ||
| my ($coord1, $coord2) = sort {$a<=>$b} ($end5, $end3); | ||
| my $exon_seq = substr ($$assembly_seq_ref, $coord1 - 1, ($coord2 - $coord1 + 1)); | ||
| $cdna_seq .= $exon_seq; | ||
| @@ -114,21 +130,26 @@ sub get_cdna_seq { | ||
| #### | ||
| sub create_gene { | ||
| my ($gene_struct_mod, $cds_pointer_lend, $cds_pointer_rend) = @_; | ||
| + | ||
| + #use Data::Dumper; | ||
| + #print STDERR Dumper($gene_struct_mod) . "CDS: $cds_pointer_lend, $cds_pointer_rend\n"; | ||
| + | ||
| my $strand = $gene_struct_mod->{strand}; | ||
| - my @exons = @{$gene_struct_mod->{exons}}; | ||
| + my @exons = sort {$a->[0]<=>$b->[0]} @{$gene_struct_mod->{exons}}; | ||
| if ($strand eq '-') { | ||
| @exons = reverse (@exons); | ||
| } | ||
| my $mRNA_pointer_lend = 1; | ||
| my $mRNA_pointer_rend = 0; | ||
| my $gene_obj = new Gene_obj(); | ||
| foreach my $coordset_ref (@exons) { | ||
| - my ($coord1, $coord2) = @$coordset_ref; | ||
| + my ($coord1, $coord2) = sort {$a<=>$b} @$coordset_ref; | ||
| my ($end5, $end3) = ($strand eq '+') ? ($coord1, $coord2) : ($coord2, $coord1); | ||
| my $exon_obj = new mRNA_exon_obj($end5, $end3); | ||
| my $exon_length = ($coord2 - $coord1 + 1); | ||
| $mRNA_pointer_rend = $mRNA_pointer_lend + $exon_length - 1; | ||
| ## see if cds is within current cDNA range. | ||
| + #print STDERR "mRNA coords: $mRNA_pointer_lend-$mRNA_pointer_rend\n"; | ||
| if ( $cds_pointer_rend >= $mRNA_pointer_lend && $cds_pointer_lend <= $mRNA_pointer_rend) { #overlap | ||
| my $diff = $cds_pointer_lend - $mRNA_pointer_lend; | ||
| my $delta_lend = ($diff >0) ? $diff : 0; | ||
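The bounds test added to `get_cdna_seq` in the diff above can be tried in isolation with a sketch like the following. The function name `find_out_of_bounds_exons` is hypothetical, and the lower-bound test (`< 1`) is an addition for this example that does not appear in the diff.

```perl
use strict;
use warnings;

# Illustrative stand-alone check, modeled on the bounds test added in
# get_cdna_seq(). The lower-bound test (< 1) is an addition for this sketch.
# $exons_href maps each exon's end5 to its end3 (1-based coordinates).
sub find_out_of_bounds_exons {
    my ($exons_href, $seq_length) = @_;
    my @bad;
    foreach my $end5 (sort { $a <=> $b } keys %$exons_href) {
        my $end3 = $exons_href->{$end5};
        if ($end5 < 1 || $end3 < 1 || $end5 > $seq_length || $end3 > $seq_length) {
            push @bad, [$end5, $end3];
        }
    }
    return @bad;
}

my $scaffold = "ACGT" x 25;              # 100 bases
my %exons    = (10 => 40, 60 => 120);    # second exon runs past the scaffold end
foreach my $exon (find_out_of_bounds_exons(\%exons, length $scaffold)) {
    print "out of bounds: $exon->[0]-$exon->[1]\n";
}
```

Checking coordinates before calling `substr` matters because `substr` silently returns a shorter string when the requested range runs past the end of the sequence, which is how a truncated cDNA and a misleading partial-protein error can arise.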
PerlLib/Fastq_reader.pm
| @@ -3,8 +3,6 @@ package Fastq_reader; | ||
| use strict; | ||
| use warnings; | ||
| -use PerlIO::gzip; | ||
| - | ||
| sub new { | ||
| my ($packagename, $fastqFile) = @_; | ||
| @@ -24,7 +22,11 @@ sub new { | ||
| } | ||
| else { | ||
| if ( $fastqFile =~ /\.gz$/ ) { | ||
| - open ($filehandle, "<:gzip", $fastqFile) or die "Error: Couldn't open compressed $fastqFile\n"; | ||
| + open ($filehandle, "gunzip -c $fastqFile | ") or die "Error: Couldn't open compressed $fastqFile\n"; | ||
| + } | ||
| + elsif ($fastqFile =~ /\.bz2$/) { | ||
| + open ($filehandle, "bunzip2 -c $fastqFile | ") or die "Error, couldn't open compressed $fastqFile $!"; | ||
| + | ||
| } else { | ||
| open ($filehandle, $fastqFile) or die "Error: Couldn't open $fastqFile\n"; | ||
| } | ||
PerlLib/GFF3_alignment_utils.pm
| @@ -1,14 +1,20 @@ | ||
| +#!/usr/bin/env perl | ||
| + | ||
| package GFF3_alignment_utils; | ||
| use strict; | ||
| use warnings; | ||
| use Carp; | ||
| use Gene_obj; | ||
| +use Gene_obj_indexer; | ||
| use CDNA::Alignment_segment; | ||
| use CDNA::CDNA_alignment; | ||
| +use File::Basename; | ||
| + | ||
| +__run_test() unless caller; | ||
| -sub index_GFF3_alignment_objs { | ||
| +sub index_alignment_objs { | ||
| my ($gff3_alignment_file, $genome_alignment_indexer_href) = @_; | ||
| unless ($gff3_alignment_file && -s $gff3_alignment_file) { | ||
| @@ -17,10 +23,11 @@ sub index_GFF3_alignment_objs { | ||
| unless (ref $genome_alignment_indexer_href) { | ||
| confess "Error, need genome indexer href as param "; | ||
| } | ||
| - | ||
| - | ||
| + | ||
| my %genome_trans_to_alignment_segments; | ||
| + my %trans_to_gene_id; | ||
| + | ||
| open (my $fh, $gff3_alignment_file) or die "Error, cannot open file $gff3_alignment_file"; | ||
| while (<$fh>) { | ||
| @@ -69,17 +76,19 @@ sub index_GFF3_alignment_objs { | ||
| my ($end5, $end3) = ($orient eq '+') ? ($lend, $rend) : ($rend, $lend); | ||
| - $info =~ /Target=\S+ (\d+) (\d+) ([\+\-])/ or die "Error, cannot extract match coordinates from info: $info"; | ||
| + $info =~ /Target=\S+ (\d+) (\d+)/ or die "Error, cannot extract match coordinates from info: $info"; | ||
| my $cdna_seg_lend = $1; | ||
| my $cdna_seg_rend = $2; | ||
| - my $cdna_orient = $3; # always set to + in pasa | ||
| + | ||
| + ($cdna_seg_lend, $cdna_seg_rend) = sort {$a<=>$b} ($cdna_seg_lend, $cdna_seg_rend); # always + orient for transcript coords. | ||
| + | ||
| my $alignment_segment = new CDNA::Alignment_segment($end5, $end3, $cdna_seg_lend, $cdna_seg_rend, $per_id); | ||
| - push (@{$genome_trans_to_alignment_segments{$scaff}->{$gene_id}}, $alignment_segment); | ||
| + push (@{$genome_trans_to_alignment_segments{$scaff}->{$trans_id}}, $alignment_segment); | ||
| - | ||
| + $trans_to_gene_id{$trans_id} = $gene_id; | ||
| } | ||
| @@ -109,14 +118,69 @@ sub index_GFF3_alignment_objs { | ||
| $cdna_alignment_obj->set_acc($alignment_acc); | ||
| $cdna_alignment_obj->{genome_acc} = $scaff; | ||
| - $genome_alignment_indexer_href->{$alignment_acc} = $cdna_alignment_obj; | ||
| + my $gene_id = $trans_to_gene_id{$alignment_acc} or confess "Error no gene_id for acc: $alignment_acc"; | ||
| + | ||
| + $cdna_alignment_obj->{gene_id} = $gene_id; | ||
| + | ||
| + | ||
| + $cdna_alignment_obj->{source} = basename($gff3_alignment_file); | ||
| + if (ref $genome_alignment_indexer_href eq "Gene_obj_indexer") { | ||
| + $genome_alignment_indexer_href->store_gene($alignment_acc, $cdna_alignment_obj); | ||
| + | ||
| + } | ||
| + else { | ||
| + | ||
| + $genome_alignment_indexer_href->{$alignment_acc} = $cdna_alignment_obj; | ||
| + } | ||
| push (@{$scaff_to_align_list{$scaff}}, $alignment_acc); | ||
| } | ||
| } | ||
| return(%scaff_to_align_list); | ||
| } | ||
| + | ||
| + | ||
| +################# | ||
| +## Testing | ||
| +################# | ||
| + | ||
| + | ||
| +sub __run_test { | ||
| + | ||
| + my $usage = "usage: $0 file.alignment.gff3\n\n"; | ||
| + | ||
| + my $gff3_file = $ARGV[0] or die $usage; | ||
| + | ||
| + my $indexer = {}; | ||
| + my %scaff_to_alignments = &index_alignment_objs($gff3_file, $indexer); | ||
| + | ||
| + | ||
| + foreach my $scaffold (keys %scaff_to_alignments) { | ||
| + | ||
| + my @align_ids = @{$scaff_to_alignments{$scaffold}}; | ||
| + | ||
| + foreach my $align_id (@align_ids) { | ||
| + my $cdna_obj = $indexer->{$align_id}; | ||
| + | ||
| + print $cdna_obj->toString(); | ||
| + } | ||
| + } | ||
| + | ||
| + | ||
| + | ||
| + exit(0); | ||
| + | ||
| + | ||
| + | ||
| +} | ||
| + | ||
| + | ||
| + | ||
| + | ||
| + | ||
| + | ||
| + | ||
| 1; #EOM | ||
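The revised `Target=` handling in the diff above no longer requires a trailing strand character and sorts the two match coordinates into ascending order. A stand-alone sketch of that step follows; the function name `parse_target_coords` and the sample attribute string are made up for illustration.

```perl
use strict;
use warnings;

# Illustrative stand-alone version of the Target parsing shown in the diff:
# extract the two match coordinates and return them in ascending order.
sub parse_target_coords {
    my ($info) = @_;
    $info =~ /Target=\S+ (\d+) (\d+)/
        or die "Error, cannot extract match coordinates from info: $info";
    my ($lend, $rend) = sort { $a <=> $b } ($1, $2);
    return ($lend, $rend);
}

# Made-up attribute string with descending coordinates and a strand field.
my ($lend, $rend) = parse_target_coords("ID=align_1;Target=asmbl_7 350 1 -");
print "$lend-$rend\n";
```

Sorting numerically with `<=>` is important here: the default string sort would place `350` before `40`.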
PerlLib/GFF3_utils.pm
| @@ -66,7 +66,7 @@ sub index_GFF3_gene_objs { | ||
| unless ($feat_type) { die "Error, $_, no feat_type: line\[$_\]"; } | ||
| - unless ($feat_type =~ /^(gene|mRNA|CDS|exon)$/) { next;} | ||
| + unless ($feat_type =~ /^(gene|mRNA|transcript|CDS|exon)$/) { next;} | ||
| $gene_info = uri_unescape($gene_info); | ||
| @@ -107,7 +107,7 @@ sub index_GFF3_gene_objs { | ||
| # print "id: $id, parent: $parent\n"; | ||
| - if ($feat_type eq 'mRNA') { | ||
| + if ($feat_type eq 'mRNA' || $feat_type eq 'transcript') { | ||
| ## just get the identifier info | ||
| $transcript_to_gene{$id} = $parent; | ||
| next; | ||
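The effect of the two changed lines above, where `transcript` features are accepted and handled the same way as `mRNA`, can be summarized with a small sketch. The function `classify_feature_type` and its return labels are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Illustrative stand-alone filter: classify a GFF3 feature type the way the
# diff does, with 'transcript' handled identically to 'mRNA'.
sub classify_feature_type {
    my ($feat_type) = @_;
    return "skip" unless $feat_type =~ /^(gene|mRNA|transcript|CDS|exon)$/;
    return "transcript" if $feat_type eq 'mRNA' || $feat_type eq 'transcript';
    return $feat_type;
}

foreach my $type (qw(gene mRNA transcript exon CDS tRNA)) {
    printf "%-10s => %s\n", $type, classify_feature_type($type);
}
```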