
sync with main trunk

svn path=/bioperl-live/branches/branch-1-6/; revision=16147
1 parent 89fe0dd commit 086fcff6fd99ca99c0590e5a6d6702b3e0da4394 cjfields committed Sep 22, 2009
123 Bio/Align/DNAStatistics.pm
@@ -73,21 +73,37 @@ in brackets are the pattern which will match
=over 3
-=item JukesCantor [jc|jukes|jukescantor|jukes-cantor]
+=item *
-=item Uncorrected [jcuncor|uncorrected]
+JukesCantor [jc|jukes|jukescantor|jukes-cantor]
-=item F81 [f81|felsenstein]
+=item *
-=item Kimura [k2|k2p|k80|kimura]
+Uncorrected [jcuncor|uncorrected]
-=item Tamura [t92|tamura|tamura92]
+=item *
-=item F84 [f84|felsenstein84]
+F81 [f81|felsenstein]
-=item TajimaNei [tajimanei|tajima\-nei]
+=item *
-=item JinNei [jinnei|jin\-nei] (not implemented)
+Kimura [k2|k2p|k80|kimura]
+
+=item *
+
+Tamura [t92|tamura|tamura92]
+
+=item *
+
+F84 [f84|felsenstein84]
+
+=item *
+
+TajimaNei [tajimanei|tajima\-nei]
+
+=item *
+
+JinNei [jinnei|jin\-nei] (not implemented)
=back
@@ -104,7 +120,7 @@ several pre-requisites for the alignment.
=item 1
DNA alignment must be based on protein alignment. Use the subroutine
-L<aa_to_dna_aln> in Bio::Align::Utilities to achieve this.
+L<Bio::Align::Utilities/aa_to_dna_aln> to achieve this.
=item 2
@@ -140,49 +156,53 @@ comparisons in an MSA. The statistics returned are:
=over 3
-=item S_d
+=item *
-Number of synonymous mutations between the 2 sequences.
+S_d - Number of synonymous mutations between the 2 sequences.
-=item N_d
+=item *
-Number of non-synonymous mutations between the 2 sequences.
+N_d - Number of non-synonymous mutations between the 2 sequences.
-=item S
+=item *
-Mean number of synonymous sites in both sequences.
+S - Mean number of synonymous sites in both sequences.
-=item N
+=item *
-mean number of synonymous sites in both sequences.
+N - mean number of synonymous sites in both sequences.
-=item P_s
+=item *
-proportion of synonymous differences in both sequences given by P_s = S_d/S.
+P_s - proportion of synonymous differences in both sequences given by
+P_s = S_d/S.
-=item P_n
+=item *
-proportion of non-synonymous differences in both sequences given by P_n = S_n/S.
+P_n - proportion of non-synonymous differences in both sequences given
+by P_n = S_n/S.
-=item D_s
+=item *
-estimation of synonymous mutations per synonymous site (by Jukes-Cantor).
+D_s - estimation of synonymous mutations per synonymous site (by
+Jukes-Cantor).
-=item D_n
+=item *
-estimation of non-synonymous mutations per non-synonymous site (by Jukes-Cantor).
+D_n - estimation of non-synonymous mutations per non-synonymous site (by
+Jukes-Cantor).
-=item D_n_var
+=item *
-estimation of variance of D_n .
+D_n_var - estimation of variance of D_n .
-=item D_s_var
+=item *
-estimation of variance of S_n.
+D_s_var - estimation of variance of S_n.
-=item z_value
+=item *
-calculation of z value.Positive value indicates D_n E<gt> D_s,
+z_value - calculation of z value.Positive value indicates D_n E<gt> D_s,
negative value indicates D_s E<gt> D_n.
=back
@@ -191,25 +211,25 @@ The statistics returned by calc_average_KaKs are:
=over 3
-=item D_s
+=item *
-Average number of synonymous mutations/synonymous site.
+D_s - Average number of synonymous mutations/synonymous site.
-=item D_n
+=item *
-Average number of non-synonymous mutations/non-synonymous site.
+D_n - Average number of non-synonymous mutations/non-synonymous site.
-=item D_s_var
+=item *
-Estimated variance of Ds from bootstrapped alignments.
+D_s_var - Estimated variance of Ds from bootstrapped alignments.
-=item D_n_var
+=item *
-Estimated variance of Dn from bootstrapped alignments.
+D_n_var - Estimated variance of Dn from bootstrapped alignments.
-=item z_score
+=item *
-calculation of z value. Positive value indicates D_n E<gt>D_s,
+z_score - calculation of z value. Positive value indicates D_n E<gt>D_s,
negative values vice versa.
=back
@@ -222,7 +242,6 @@ the book, and reproduce those results. If people like having this sort
of analysis in BioPerl other methods for estimating Ds and Dn can be
provided later.
-
Much of the DNA distance code is based on implementations in EMBOSS
(Rice et al, www.emboss.org) [distmat.c] and PHYLIP (J. Felsenstein et
al) [dnadist.c]. Insight also gained from Eddy, Durbin, Krogh, &
@@ -232,26 +251,36 @@ Mitchison.
=over 3
-=item D_JukesCantor
+=item *
+
+D_JukesCantor
"Phylogenetic Inference", Swoffrod, Olsen, Waddell and Hillis, in
Mol. Systematics, 2nd ed, 1996, Ch 11. Derived from "Evolution of
Protein Molecules", Jukes & Cantor, in Mammalian Prot. Metab., III,
1969, pp. 21-132.
-=item D_Tamura
+=item *
+
+D_Tamura
K Tamura, Mol. Biol. Evol. 1992, 9, 678.
-=item D_Kimura
+=item *
+
+D_Kimura
M Kimura, J. Mol. Evol., 1980, 16, 111.
-=item JinNei
+=item *
+
+JinNei
Jin and Nei, Mol. Biol. Evol. 82, 7, 1990.
-=item D_TajimaNei
+=item *
+
+D_TajimaNei
Tajima and Nei, Mol. Biol. Evol. 1984, 1, 269.
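
Note on the statistics documented in the hunks above: D_s and D_n are described as Jukes-Cantor estimates derived from the proportions P_s and P_n. The following is a minimal Perl sketch of that correction, assuming the standard formula d = -(3/4) ln(1 - 4p/3). The function name `jukes_cantor` and the input value are hypothetical and are not taken from Bio::Align::DNAStatistics. As an aside, in the Nei-Gojobori method P_n is normally N_d/N, so the "S_n/S" in the POD text may be a documentation slip; the commit only reflows that text.

```perl
use strict;
use warnings;

# Hypothetical helper: Jukes-Cantor correction of an observed proportion
# of differences $p (for example P_s or P_n). Returns undef when the
# correction is undefined, i.e. when $p >= 0.75.
sub jukes_cantor {
    my ($p) = @_;
    return undef if $p >= 0.75;
    return -0.75 * log(1 - 4 * $p / 3);
}

my $p_s = 0.3;    # illustrative value, not from the source
printf "D_s = %.4f\n", jukes_cantor($p_s);
```

The undef return for saturated proportions is a design choice of this sketch; the module itself may handle that case differently.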
16 Bio/AnalysisI.pm
@@ -198,8 +198,8 @@ sub describe { shift->throw_not_implemented(); }
The analysis input data are named, and can be also associated with a
default value, with allowed values and with few other attributes. The
names are important for feeding the service with the input data (the
-inputs are given to methods C<create_job>, C<run>, and/or C<wait_for>
-as name/value pairs).
+inputs are given to methods C<create_job>, C<Bio::AnalysisI|run>, and/or
+C<Bio::AnalysisI|wait_for> as name/value pairs).
Here is a (slightly shortened) example of an input specification:
@@ -324,8 +324,8 @@ tool.
Call this method if you wish to "stage the scene" - to create a job
with all input data but without actually running it. This method is
-called automatically from other methods (C<run> and C<wait_for>) so
-usually you do not need to call it directly.
+called automatically from other methods (C<Bio::AnalysisI|run> and
+C<Bio::AnalysisI|wait_for>) so usually you do not need to call it directly.
The input data and prameters for this execution can be specified in
various ways:
@@ -459,24 +459,24 @@ sub id { shift->throw_not_implemented(); }
# -----------------------------------------------------------------------------
-=head2 run
+=head2 Bio::AnalysisI::JobI::run
Usage : $job->run
Returns : itself
Args : none
It starts previously created job. The job already must have all input
data filled-in. This differs from the method of the same name of the
-C<Bio::Tools::Run::Analysis> object where the C<run> method creates
-also a new job allowing to set input data.
+C<Bio::Tools::Run::Analysis> object where the C<Bio::AnalysisI::JobI::run> method
+creates also a new job allowing to set input data.
=cut
sub run { shift->throw_not_implemented(); }
# -----------------------------------------------------------------------------
-=head2 wait_for
+=head2 Bio::AnalysisI::JobI::wait_for
Usage : $job->wait_for
Returns : itself
2 Bio/Assembly/Tools/ContigSpectrum.pm
@@ -785,7 +785,7 @@ sub average {
}
-=head2 average
+=head2 score
Title : score
Usage : my $score = $csp->score();
117 Bio/DB/GFF.pm
@@ -94,7 +94,9 @@ directory under a subdirectory named Bio::DB::GFF:
=over 4
-=item bp_load_gff.pl
+=item *
+
+bp_load_gff.pl
This script will load a Bio::DB::GFF database from a flat GFF file of
sequence annotations. Only the relational database version of
@@ -108,7 +110,9 @@ for most of their functionality.
load_gff.pl also has a --upgrade option, which will perform a
non-destructive upgrade of older schemas to newer ones.
-=item bp_bulk_load_gff.pl
+=item *
+
+bp_bulk_load_gff.pl
This script will populate a Bio::DB::GFF database from a flat GFF file
of sequence annotations. Only the MySQL database version of
@@ -120,7 +124,9 @@ This script takes a --fasta argument to load raw DNA into the database
as well. However, GFF databases do not require access to the raw DNA
for most of their functionality.
-=item bp_fast_load_gff.pl
+=item *
+
+bp_fast_load_gff.pl
This script is as fast as bp_bulk_load_gff.pl but uses Unix pipe
tricks to allow for incremental updates. It only supports the MySQL
@@ -129,13 +135,17 @@ non-Unix platforms.
Arguments are the same as bp_load_gff.pl
-=item gadfly_to_gff.pl
+=item *
+
+gadfly_to_gff.pl
This script will convert the GFF-like format used by the Berkeley
Drosophila Sequencing project into a format suitable for use with this
module.
-=item sgd_to_gff.pl
+=item *
+
+sgd_to_gff.pl
This script will convert the tab-delimited feature files used by the
Saccharomyces Genome Database into a format suitable for use with this
@@ -155,57 +165,75 @@ The 9 columns are as follows:
=over 4
-=item 1. reference sequence
+=item 1.
+
+reference sequence
This is the ID of the sequence that is used to establish the
coordinate system of the annotation. In the example above, the
reference sequence is "Chr1".
-=item 2. source
+=item 2.
+
+source
The source of the annotation. This field describes how the annotation
was derived. In the example above, the source is "curated" to
indicate that the feature is the result of human curation. The names
and versions of software programs are often used for the source field,
as in "tRNAScan-SE/1.2".
-=item 3. method
+=item 3.
+
+method
The annotation method. This field describes the type of the
annotation, such as "CDS". Together the method and source describe
the annotation type.
-=item 4. start position
+=item 4.
+
+start position
The start of the annotation relative to the reference sequence.
-=item 5. stop position
+=item 5.
+
+stop position
The stop of the annotation relative to the reference sequence. Start
is always less than or equal to stop.
-=item 6. score
+=item 6.
+
+score
For annotations that are associated with a numeric score (for example,
a sequence similarity), this field describes the score. The score
units are completely unspecified, but for sequence similarities, it is
typically percent identity. Annotations that don't have a score can
use "."
-=item 7. strand
+=item 7.
+
+strand
For those annotations which are strand-specific, this field is the
strand on which the annotation resides. It is "+" for the forward
strand, "-" for the reverse strand, or "." for annotations that are
not stranded.
-=item 8. phase
+=item 8.
+
+phase
For annotations that are linked to proteins, this field describes the
phase of the annotation on the codons. It is a number from 0 to 2, or
"." for features that have no phase.
-=item 9. group
+=item 9.
+
+group
GFF provides a simple way of generating annotation hierarchies ("is
composed of" relationships) by providing a group field. The group
@@ -315,13 +343,17 @@ specifying which tag to group on:
=over 4
-=item Using -preferred_groups
+=item *
+
+Using -preferred_groups
When you create a Bio::DB::GFF object, pass it a -preferred_groups=E<gt>
argument. This specifies a tag that will be used for grouping. You
can pass an array reference to specify a list of such tags.
-=item In the GFF header
+=item *
+
+In the GFF header
The GFF file itself can specify which tags are to be used for
grouping. Insert a comment like the following:
@@ -409,7 +441,9 @@ it adaptable to use with a variety of databases.
=over 4
-=item Adaptors
+=item *
+
+Adaptors
The core of the module handles the user API, annotation coordinate
arithmetic, and other common issues. The details of fetching
@@ -441,7 +475,9 @@ There are currently five adaptors recommended for general use:
Check the Bio/DB/GFF/Adaptor directory and subdirectories for other,
more specialized adaptors, as well as experimental ones.
-=item Aggregators
+=item *
+
+Aggregators
The GFF format uses a "group" field to indicate aggregation properties
of individual features. For example, a set of exons and introns may
@@ -513,7 +549,7 @@ has some limitations.
=over 4
-=item 1. GFF version string is required
+=item GFF version string is required
The GFF file B<must> contain the version comment:
@@ -523,7 +559,7 @@ Unless this version string is present at the top of the GFF file, the
loader will attempt to parse the file in GFF2 format, with
less-than-desirable results.
-=item 2. Only one level of nesting allowed
+=item Only one level of nesting allowed
A major restriction is that Bio::DB::GFF only allows one level of
nesting of features. For nesting, the Target tag will be used
@@ -1742,27 +1778,37 @@ This method takes a single overloaded argument, which can be any of:
=over 4
-=item 1. a scalar corresponding to a GFF file on the system
+=item *
+
+a scalar corresponding to a GFF file on the system
A pathname to a local GFF file. Any files ending with the .gz, .Z, or
.bz2 suffixes will be transparently decompressed with the appropriate
command-line utility.
-=item 2. an array reference containing a list of GFF files on the system
+=item *
+
+an array reference containing a list of GFF files on the system
For example ['/home/gff/gff1.gz','/home/gff/gff2.gz']
-=item 3. directory path
+=item *
+
+directory path
The indicated directory will be searched for all files ending in the
suffixes .gff, .gff.gz, .gff.Z or .gff.bz2.
-=item 4. filehandle
+=item *
+
+filehandle
An open filehandle from which to read the GFF data. Tied filehandles
now work as well.
-=item 5. a pipe expression
+=item *
+
+a pipe expression
A pipe expression will also work. For example, a GFF file on a remote
web server can be loaded with an expression like this:
@@ -1837,27 +1883,37 @@ This method takes a single overloaded argument, which can be any of:
=over 4
-=item 1. scalar corresponding to a FASTA file on the system
+=item *
+
+scalar corresponding to a FASTA file on the system
A pathname to a local FASTA file. Any files ending with the .gz, .Z, or
.bz2 suffixes will be transparently decompressed with the appropriate
command-line utility.
-=item 2. array reference containing a list of FASTA files on the
+=item *
+
+array reference containing a list of FASTA files on the
system
For example ['/home/fasta/genomic.fa.gz','/home/fasta/genomic.fa.gz']
-=item 3. path to a directory
+=item *
+
+path to a directory
The indicated directory will be searched for all files ending in the
suffixes .fa, .fa.gz, .fa.Z or .fa.bz2.
-a=item 4. filehandle
+=item *
+
+filehandle
An open filehandle from which to read the FASTA data.
-=item 5. pipe expression
+=item *
+
+pipe expression
A pipe expression will also work. For example, a FASTA file on a remote
web server can be loaded with an expression like this:
@@ -3775,7 +3831,6 @@ fixed.
=head1 SEE ALSO
-L<bioperl>,
L<Bio::DB::GFF::RelSegment>,
L<Bio::DB::GFF::Aggregator>,
L<Bio::DB::GFF::Feature>,
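
The nine-column GFF layout documented in the Bio/DB/GFF.pm hunks above (reference sequence, source, method, start, stop, score, strand, phase, group) can be illustrated with a small stand-alone parser. This is only a sketch under stated assumptions: `parse_gff_line`, the hash key names, and the sample coordinates are hypothetical, and the code does not reproduce how Bio::DB::GFF itself loads data.

```perl
use strict;
use warnings;

# Hypothetical column names for the nine GFF fields, in file order.
my @COLUMNS = qw(ref source method start stop score strand phase group);

# Hypothetical helper, not part of Bio::DB::GFF: split one tab-delimited
# GFF line into a hash keyed by column name. Returns undef for comment
# lines, blank lines, and lines without exactly nine fields.
sub parse_gff_line {
    my ($line) = @_;
    chomp $line;
    return undef if $line =~ /^#/ || $line !~ /\S/;
    my @fields = split /\t/, $line, 9;
    return undef unless @fields == 9;
    my %feature;
    @feature{@COLUMNS} = @fields;
    # "." marks an absent score, strand or phase.
    for my $key (qw(score strand phase)) {
        $feature{$key} = undef if $feature{$key} eq '.';
    }
    return \%feature;
}

# Illustrative record; the coordinates and group value are made up.
my $f = parse_gff_line("Chr1\tcurated\tCDS\t365647\t365963\t.\t+\t1\tTranscript \"R119.7\"\n");
print "$f->{ref} $f->{method} $f->{start}-$f->{stop}\n";
```

Mapping "." to undef keeps "no value" distinct from a real score of 0 or a phase of 0.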
32 Bio/DB/GFF/Aggregator.pm
@@ -39,20 +39,26 @@ Instances of Bio::DB::GFF::Aggregator have three attributes:
=over 3
-=item method
+=item *
+
+method
This is the GFF method field of the composite feature as a whole. For
example, "transcript" may be used for a composite feature created by
aggregating individual intron, exon and UTR features.
-=item main method
+=item *
+
+main method
Sometimes GFF groups are organized hierarchically, with one feature
logically containing another. For example, in the C. elegans schema,
methods of type "Sequence:curated" correspond to regions covered by
curated genes. There can be zero or one main methods.
-=item subparts
+=item *
+
+subparts
This is a list of one or more methods that correspond to the component
features of the aggregates. For example, in the C. elegans database,
@@ -65,14 +71,18 @@ subclasses:
=over 4
-=item disaggregate()
+=item *
+
+disaggregate()
This method is called by the Adaptor object prior to fetching a list
of features. The method is passed an associative array containing the
[method,source] pairs that the user has requested, and it returns a
list of raw features that it would like the adaptor to fetch.
-=item aggregate()
+=item *
+
+aggregate()
This method is called by the Adaptor object after it has fetched
features. The method is passed a list of raw features and is expected
@@ -86,15 +96,21 @@ case, it suffices for subclasses to override the following methods:
=over 4
-=item method()
+=item *
+
+method()
Return the default method for the composite feature as a whole.
-=item main_name()
+=item *
+
+main_name()
Return the default main method name.
-=item part_names()
+=item *
+
+part_names()
Return a list of subpart method names.
155 Bio/DB/HIV/HIVQueryHelper.pm
@@ -95,7 +95,7 @@ BEGIN {
=head2 HIVSchema - objects/methods to manipulate a version of the LANL HIV DB schema
-=head3 SYNOPSIS
+=head3 HIVSchema SYNOPSIS
$schema = new HIVSchema( 'lanl-schema.xml' );
@tables = $schema->tables;
@@ -109,7 +109,7 @@ BEGIN {
$table = $schema->tablepart('SEQ_SAMple.SSAM_badseq'); # returns 'SEQ_SAMple'
$column = $schema->columnpart('SEQ_SAMple.SSAM_badseq'); # returns 'SSAM_badseq'
-=head3 DESCRIPTION
+=head3 HIVSchema DESCRIPTION
HIVSchema methods are used in L<Bio::DB::Query::HIVQuery> for table,
column, primary/foreign key manipulations based on the observed Los
@@ -131,9 +131,9 @@ use strict;
### constructor
-=head3 CONSTRUCTOR
+=head3 HIVSchema CONSTRUCTOR
-=head4 new
+=head4 HIVSchema::new
Title : new
Usage : $schema = new HIVSchema( "lanl-schema.xml ");
@@ -157,9 +157,9 @@ sub new {
### object methods
-=head3 INSTANCE METHODS
+=head3 HIVSchema INSTANCE METHODS
-=head4 tables
+=head4 HIVSchema tables
Title : tables
Usage : $schema->tables()
@@ -186,7 +186,7 @@ sub tables {
return @k;
}
-=head4 columns
+=head4 HIVSchema columns
Title : columns
Usage : $schema->columns( [$tablename] );
@@ -218,7 +218,7 @@ sub columns {
return @k;
}
-=head4 fields
+=head4 HIVSchema fields
Title : fields
Usage : $schema->fields();
@@ -238,7 +238,7 @@ sub fields {
return @k;
}
-=head4 options
+=head4 HIVSchema options
Title : options
Usage : $schema->options(@fieldnames)
@@ -259,7 +259,7 @@ sub options {
return $$sref{$sfield}{option} ? @{$$sref{$sfield}{option}} : ();
}
-=head4 aliases
+=head4 HIVSchema aliases
Title : aliases
Usage : $schema->aliases(@fieldnames)
@@ -286,7 +286,7 @@ sub aliases {
}
}
-=head4 ankh
+=head4 HIVSchema ankh
Title : ankh (annotation key hash)
Usage : $schema->ankh(@fieldnames)
@@ -314,7 +314,7 @@ sub ankh {
return %ret;
}
-=head4 tablepart
+=head4 HIVSchema tablepart
Title : tablepart (alias: tbl)
Usage : $schema->tbl(@fieldnames)
@@ -353,7 +353,7 @@ sub tbl {
shift->tablepart(@_);
}
-=head4 columnpart
+=head4 HIVSchema columnpart
Title : columnpart (alias: col)
Usage : $schema->col(@fieldnames)
@@ -382,7 +382,7 @@ sub col {
shift->columnpart(@_);
}
-=head4 primarykey
+=head4 HIVSchema primarykey
Title : primarykey [alias: pk]
Usage : $schema->pk(@tablenames);
@@ -416,7 +416,7 @@ sub pk {
shift->primarykey(@_);
}
-=head4 foreignkey
+=head4 HIVSchema foreignkey
Title : foreignkey [alias: fk]
Usage : $schema->fk($intable [, $totable])
@@ -461,7 +461,7 @@ sub fk {
shift->foreignkey(@_);
}
-=head4 foreigntable
+=head4 HIVSchema foreigntable
Title : foreigntable [alias ftbl]
Usage : $schema->ftbl( @foreign_key_fieldnames );
@@ -495,7 +495,7 @@ sub ftbl {
shift->foreigntable(@_);
}
-=head4 find_join
+=head4 HIVSchema find_join
Title : find_join
Usage : $sch->find_join('Table1', 'Table2')
@@ -527,7 +527,7 @@ sub find_join {
}
}
-=head4 _find_join_guts
+=head4 HIVSchema _find_join_guts
Title : _find_join_guts
Usage : $sch->_find_join_guts($table1, $table2, $stackref, \$found, $reverse)
@@ -610,7 +610,7 @@ sub _find_join_guts {
}
}
-=head4 loadSchema
+=head4 HIVSchema loadSchema
Title : loadHIVSchema [alias: loadSchema]
Usage : $schema->loadSchema( $XMLfilename )
@@ -686,7 +686,7 @@ sub loadSchema {
# below, dangerous
-=head4 _sfieldh
+=head4 HIVSchema _sfieldh
Title : _sfieldh
Usage : $schema->_sfieldh($fieldname)
@@ -708,7 +708,7 @@ sub _sfieldh {
=head2 Class QRY - a query algebra for HIVQuery
-=head3 SYNOPSIS
+=head3 QRY SYNOPSIS
$Q = new QRY(
new R(
@@ -729,7 +729,7 @@ sub _sfieldh {
$Q3 = QRY::Or($Q, $Q2);
print $Q3->A; # prints '(CCR5 CXCR4)[coreceptor] (ZA)[country]'
-=head3 DESCRIPTION
+=head3 QRY DESCRIPTION
The QRY package provides a query parser for
L<Bio::DB::Query::HIVQuery>. Currently, the parser supports AND, OR,
@@ -823,9 +823,7 @@ use overload
# QRY object will be translated into (possibly multiple) hashes
# conforming to HIVQuery parameter requirements.
-=head3 CLASS METHODS
-
-=head4 _make_q
+=head4 QRY _make_q
Title : _make_q
Usage : QRY::_make_q($parsetree)
@@ -862,7 +860,7 @@ sub _make_q {
return @dbq;
}
-=head4 _make_q_guts
+=head4 QRY _make_q_guts
Title : _make_q_guts (Internal class method)
Usage : _make_q_guts($ptree, $q_expr, $qarry, $anarry)
@@ -974,7 +972,7 @@ sub _make_q_guts {
: return 1;
}
-=head4 _parse_q
+=head4 QRY _parse_q
Title : _parse_q
Usage : QRY::_parse_q($query_string)
@@ -1045,7 +1043,7 @@ sub _parse_q {
## QRY constructor
-=head3 CONSTRUCTOR
+=head3 QRY CONSTRUCTOR
=head4 QRY Constructor
@@ -1070,9 +1068,9 @@ sub new {
## QRY instance methods
-=head3 INSTANCE METHODS
+=head3 QRY INSTANCE METHODS
-=head4 requests
+=head4 QRY requests
Title : requests
Usage : $QRY->requests
@@ -1089,7 +1087,7 @@ sub requests {
return @{$self->{'requests'}};
}
-=head4 put_requests
+=head4 QRY put_requests
Title : put_requests
Usage : $QRY->put_request(@R)
@@ -1110,7 +1108,7 @@ sub put_requests {
return @args;
}
-=head4 isnull
+=head4 QRY isnull
Title : isnull
Usage : $QRY->isnull
@@ -1126,7 +1124,7 @@ sub isnull {
return ($self->requests) ? 0 : 1;
}
-=head4 A
+=head4 QRY A
Title : A
Usage : print $QRY->A
@@ -1142,7 +1140,7 @@ sub A {
return join( "\n", map {$_->A} $self->requests );
}
-=head4 len
+=head4 QRY len
Title : len
Usage : $QRY->len
@@ -1158,7 +1156,7 @@ sub len {
return scalar @{$self->{'requests'}};
}
-=head4 clone
+=head4 QRY clone
Title : clone
Usage : $QRY2 = $QRY1->clone;
@@ -1181,9 +1179,9 @@ sub clone {
## QRY class methods
-=head3 CLASS METHODS
+=head3 QRY CLASS METHODS
-=head4 Or
+=head4 QRY Or
Title : Or
Usage : $QRY3 = QRY::Or($QRY1, $QRY2)
@@ -1237,7 +1235,7 @@ sub Or {
return new QRY( @ret_rq );
}
-=head4 And
+=head4 QRY And
Title : And
Usage : $QRY3 = QRY::And($QRY1, $QRY2)
@@ -1268,7 +1266,7 @@ sub And {
return new QRY( @ret_rq );
}
-=head4 Bool
+=head4 QRY Bool
Title : Bool
Usage : QRY::Bool($QRY1)
@@ -1285,7 +1283,7 @@ sub Bool {
return $q->isnull ? 0 : 1;
}
-=head4 Eq
+=head4 QRY Eq
Title : Eq
Usage : QRY::Eq($QRY1, $QRY2)
@@ -1319,7 +1317,7 @@ sub Eq {
=head2 Class R - request objects for QRY algebra
-=head3 SYNOPSIS
+=head3 R SYNOPSIS
$R = new R( $q1, $q2 );
$R->put_atoms($q3);
@@ -1334,7 +1332,7 @@ sub Eq {
QRY::Eq( new QRY(R::Or($R1, $R2)), new QRY($R1, $R2) ); # returns 1
R::In( (R::And($R1, $R2))[0], $R1 ); # returns 1
-=head3 DESCRIPTION
+=head3 R DESCRIPTION
Class R objects contain a list of atomic queries (class Q
objects). Each class R object represents a single HTTP request to the
@@ -1350,7 +1348,7 @@ $R::NULL = new R();
## R constructor
-=head3 CONSTRUCTOR
+=head3 R CONSTRUCTOR
=head4 R constructor
@@ -1375,9 +1373,9 @@ sub new {
## R instance methods
-=head3 INSTANCE METHODS
+=head3 R INSTANCE METHODS
-=head4 len
+=head4 R len
Title : len
Usage : $R->len
@@ -1393,7 +1391,7 @@ sub len {
return scalar @{[keys %{$self->{'atoms'}}]};
}
-=head4 atoms
+=head4 R atoms
Title : atoms
Usage : $R->atoms( [optional $field])
@@ -1415,7 +1413,7 @@ sub atoms {
return wantarray ? map { $self->{'atoms'}->{$_} } @flds : $self->{'atoms'}->{$flds[0]};
}
-=head4 fields
+=head4 R fields
Title : fields
Usage : $R->fields
@@ -1431,7 +1429,7 @@ sub fields {
return keys %{$self->{'atoms'}};
}
-=head4 put_atoms
+=head4 R put_atoms
Title : put_atoms
Usage : $R->put_atoms( @q )
@@ -1465,7 +1463,7 @@ sub put_atoms {
return;
}
-=head4 del_atoms
+=head4 R del_atoms
Title : del_atoms
Usage : $R->del_atoms( @qfields )
@@ -1490,7 +1488,7 @@ sub del_atoms {
return @ret;
}
-=head4 isnull
+=head4 R isnull
Title : isnull
Usage : $R->isnull
@@ -1506,7 +1504,7 @@ sub isnull {
return ($self->len) ? 0 : 1;
}
-=head4 A
+=head4 R A
Title : A
Usage : print $R->A
@@ -1523,7 +1521,7 @@ sub A {
return join(" ", map {$_->A} @a);
}
-=head4 clone
+=head4 R clone
Title : clone
Usage : $R2 = $R1->clone;
@@ -1546,9 +1544,9 @@ sub clone {
## R class methods
-=head3 CLASS METHODS
+=head3 R CLASS METHODS
-=head4 In
+=head4 R In
Title : In
Usage : R::In($R1, $R2)
@@ -1578,7 +1576,7 @@ sub In {
return 1;
}
-=head4 And
+=head4 R And
Title : And
Usage : @Rresult = R::And($R1, $R2)
@@ -1624,7 +1622,7 @@ sub And {
}
-=head4 Or
+=head4 R Or
Title : Or
Usage : @Rresult = R::Or($R1, $R2)
@@ -1672,7 +1670,7 @@ sub Or {
}
-=head4 Eq
+=head4 R Eq
Title : Eq
Usage : R::Eq($R1, $R2)
@@ -1703,7 +1701,7 @@ sub Eq {
=head2 Class Q - atomic query objects for QRY algebra
-=head3 SYNOPSIS
+=head3 Q SYNOPSIS
$q = new Q('coreceptor', 'CXCR4 CCR5');
$u = new Q('coreceptor', 'CXCR4');
@@ -1715,7 +1713,7 @@ sub Eq {
Q::qin($u, $q) # returns 1
Q::qeq(Q::qand($u, $q), $u ); # returns 1
-=head3 DESCRIPTION
+=head3 Q DESCRIPTION
Class Q objects represent atomic queries, that can be described by a
single LANL cgi parameter=value pair. Class R objects (requests) are
@@ -1731,7 +1729,7 @@ $Q::NULL = new Q();
## Q constructor
-=head3 CONSTRUCTOR
+=head3 Q CONSTRUCTOR
=head4 Q constructor
@@ -1758,9 +1756,9 @@ sub new {
## Q instance methods
-=head3 INSTANCE METHODS
+=head3 Q INSTANCE METHODS
-=head4 isnull
+=head4 Q isnull
Title : isnull
Usage : $q->isnull
@@ -1778,7 +1776,7 @@ sub isnull {
return 0;
}
-=head4 fld
+=head4 Q fld
Title : fld
Usage : $q->fld($field)
@@ -1802,7 +1800,7 @@ sub fld {
}
-=head4 dta
+=head4 Q dta
Title : dta
Usage : $q->dta($data)
@@ -1825,7 +1823,7 @@ sub dta {
return $self->{dta};
}
-=head4 A
+=head4 Q A
Title : A
Usage : print $q->A
@@ -1844,7 +1842,7 @@ sub A {
return "(".join(' ', sort {$a cmp $b} @a).")[".$self->fld."]";
}
-=head4 clone
+=head4 Q clone
Title : clone
Usage : $q2 = $q1->clone;
@@ -1864,9 +1862,9 @@ sub clone {
### Q class methods
-=head3 CLASS METHODS
+=head3 Q CLASS METHODS
-=head4 qin
+=head4 Q qin
Title : qin
Usage : Q::qin($q1, $q2)
@@ -1885,7 +1883,7 @@ sub qin {
return Q::qeq( $b, Q::qor($a, $b) );
}
-=head4 qeq
+=head4 Q qeq
Title : qeq
Usage : Q::qeq($q1, $q2)
@@ -1909,7 +1907,7 @@ sub qeq {
return @cd == @bd;
}
-=head4 qor
+=head4 Q qor
Title : qor
Usage : @qresult = Q::qor($q1, $q2)
@@ -1941,7 +1939,7 @@ sub qor {
return @ret;
}
-=head4 qand
+=head4 Q qand
Title : qand
Usage : @qresult = Q::And($q1, $q2)
@@ -1992,9 +1990,9 @@ sub qand {
}
}
-=head3 INTERNALS
+=head3 Q INTERNALS
-=head4 unique
+=head4 Q unique
Title : unique
Usage : @ua = unique(@a)
@@ -2016,7 +2014,7 @@ sub unique {
=head2 Additional tools for Bio::AnnotationCollectionI
-=head3 SYNOPSIS
+=head3 Bio::AnnotationCollectionI SYNOPSIS (additional methods)
$seq->annotation->put_value('patient_id', 1401)
$seq->annotation->get_value('patient_ids') # returns 1401
@@ -2027,9 +2025,11 @@ sub unique {
$blood_readings{$_} = $seq->annonation->get_value(['clinical', $_]);
}
-=head3 DESCRIPTION
+=head3 Bio::AnnotationCollectionI DESCRIPTION (additional methods)
-C<get_value()> and C<put_value> allow easy creation of and access to an annotation collection tree with nodes of L<Bio::Annotation::SimpleValue>. These methods obiviate direct accession of the SimpleValue objects.
+C<get_value()> and C<put_value> allow easy creation of and access to an
+annotation collection tree with nodes of L<Bio::Annotation::SimpleValue>. These
+methods obiviate direct accession of the SimpleValue objects.
=cut
@@ -2082,7 +2082,8 @@ sub get_value {
\@tagnames, $value (or as -KEYS=>\@tagnames, -VALUE=>$value )
Note : If intervening nodes do not exist, put_value creates them, replacing
existing nodes. So if $ac->put_value('x', 10) was done, then later,
- $ac->put_value(['x', 'y'], 20), the original value of 'x' is trashed, and $ac->get_value('x') will now return the annotation collection
+ $ac->put_value(['x', 'y'], 20), the original value of 'x' is trashed,
+ and $ac->get_value('x') will now return the annotation collection
with tagname 'y'.
=cut
17 Bio/DB/SeqFeature/Store.pm
@@ -114,26 +114,34 @@ with the following differences:
=over 4
-=item 1. No limitation on Bio::SeqFeatureI implementations
+=item 1.
+
+No limitation on Bio::SeqFeatureI implementations
Unlike Bio::DB::GFF, Bio::DB::SeqFeature::Store works with
any Bio::SeqFeatureI object.
-=item 2. No limitation on nesting of features & subfeatures
+=item 2.
+
+No limitation on nesting of features & subfeatures
Bio::DB::GFF is limited to features that have at most one
level of subfeature. Bio::DB::SeqFeature::Store can work with features
that have unlimited levels of nesting.
-=item 3. No aggregators
+=item 3.
+
+No aggregators
The aggregator architecture, which was necessary to impose order on
the GFF2 files that Bio::DB::GFF works with, does not apply to
Bio::DB::SeqFeature::Store. It is intended to store features that obey
well-defined ontologies, such as the Sequence Ontology
(http://song.sourceforge.net).
-=item 4. No relative locations
+=item 4.
+
+No relative locations
All locations defined by this module are relative to an absolute
sequence ID, unlike Bio::DB::GFF which allows you to define the
@@ -2506,7 +2514,6 @@ use the BioPerl bug tracking system to report bugs.
=head1 SEE ALSO
-L<bioperl>,
L<Bio::DB::SeqFeature>,
L<Bio::DB::SeqFeature::Store::GFF3Loader>,
L<Bio::DB::SeqFeature::Segment>,
6 Bio/Index/Stockholm.pm
@@ -172,10 +172,10 @@ sub fetch_report{
return $report->next_aln;
}
-=head2 fetch_report
+=head2 fetch_aln
- Title : fetch_report
- Usage : my $align = $idx->fetch_report($id);
+ Title : fetch_aln
+ Usage : my $align = $idx->fetch_aln($id);
Function: Returns a Bio::SimpleAlign object
for a specific alignment
Returns : Bio::SimpleAlign
48 Bio/Root/IO.pm
@@ -98,6 +98,10 @@ web:
Email hlapp@gmx.net
+=head1 CONTRIBUTORS
+
+Mark A. Jensen ( maj -at- fortinbras -dot- us )
+
=head1 APPENDIX
The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _
@@ -144,10 +148,10 @@ BEGIN {
}
eval {
- require LWP::Simple;
+ require LWP::UserAgent;
};
if( $@ ) {
- print STDERR "Cannot load LWP::Simple: $@" if( $VERBOSE > 0 );
+ print STDERR "Cannot load LWP::UserAgent: $@" if( $VERBOSE > 0 );
$HAS_LWP = 0;
} else {
$HAS_LWP = 1;
@@ -251,6 +255,13 @@ sub new {
-flush boolean flag to autoflush after each write
-noclose boolean flag, when set to true will not close a
filehandle (must explictly call close($io->_fh)
+ -retries number of times to try a web fetch before failure
+
+ -ua_parms hashref of key => value parameters to pass
+ to LWP::UserAgent->new()
+ (only meaningful with -url is set)
+ A useful value might be, for example,
+ { timeout => 60 } (ua default is 180 sec)
Returns : TRUE
Args : named parameters
@@ -262,27 +273,34 @@ sub _initialize_io {
$self->_register_for_cleanup(\&_io_cleanup);
- my ($input, $noclose, $file, $fh, $flush, $url) = $self->_rearrange([qw(INPUT
- NOCLOSE
- FILE FH
- FLUSH URL)], @args);
+ my ($input, $noclose, $file, $fh, $flush, $url,
+ $retries, $ua_parms) =
+ $self->_rearrange([qw(INPUT
+ NOCLOSE
+ FILE
+ FH
+ FLUSH
+ URL
+ RETRIES
+ UA_PARMS)], @args);
if($url){
- my $trymax = 5;
+ $retries ||= 5;
- if($HAS_LWP){ #use LWP::Simple::getstore()
- require LWP::Simple;
- #$self->warn("has lwp");
+ if($HAS_LWP){ #use LWP::UserAgent
+ require LWP::UserAgent;
+ my $ua = LWP::UserAgent->new(%$ua_parms);
my $http_result;
my($handle,$tempfile) = $self->tempfile();
CORE::close($handle);
+
- for(my $try = 1 ; $try <= $trymax ; $try++){
- $http_result = LWP::Simple::getstore($url, $tempfile);
- $self->warn("[$try/$trymax] tried to fetch $url, but server threw $http_result. retrying...") if $http_result != 200;
- last if $http_result == 200;
+ for(my $try = 1 ; $try <= $retries ; $try++){
+ $http_result = $ua->get($url, ':content_file' => $tempfile);
+ $self->warn("[$try/$retries] tried to fetch $url, but server threw " . $http_result->code . ". retrying...") if !$http_result->is_success;
+ last if $http_result->is_success;
}
- $self->throw("failed to fetch $url, server threw $http_result") if $http_result != 200;
+ $self->throw("failed to fetch $url, server threw " . $http_result->code) if !$http_result->is_success;
$input = $tempfile;
$file = $tempfile;
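Note on the hunk above: the retry logic can be illustrated with a minimal sketch that runs offline. The helper name fetch_with_retries, the stub fetcher, and the hash-based response are hypothetical stand-ins for $ua->get(...) and the response object it returns; this is an illustration under those assumptions, not the BioPerl implementation.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): call $fetch up to $retries
# times and return the first successful response.
sub fetch_with_retries {
    my ($fetch, $retries) = @_;
    $retries ||= 5;    # same default the patched _initialize_io applies
    my $result;
    for my $try (1 .. $retries) {
        $result = $fetch->();
        last if $result->{success};
        warn "[$try/$retries] fetch failed with code $result->{code}, retrying...\n";
    }
    # After the loop, $result holds the last response, successful or not.
    die "fetch failed, server threw $result->{code}\n" unless $result->{success};
    return $result;
}

# Stub standing in for the real HTTP request: fails twice, then succeeds.
my $calls = 0;
my $stub  = sub {
    $calls++;
    return $calls < 3
        ? { success => 0, code => 503 }
        : { success => 1, code => 200 };
};

my $res = fetch_with_retries($stub, 5);
print "succeeded after $calls attempts with code $res->{code}\n";
```

Going by the POD added in this commit, a caller would supply -retries and -ua_parms together with -url, for example -ua_parms => { timeout => 60 }.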
29 Bio/Search/HSP/ModelHSP.pm
@@ -404,33 +404,6 @@ sub get_aln {
return $aln;
}
-=head2 seq_inds
-
- Title : seq_inds
- Purpose : Get a list of residue positions (indices) for all identical
- : or conserved residues in the query or sbjct sequence.
- Example : @s_ind = $hsp->seq_inds('query', 'identical');
- : @h_ind = $hsp->seq_inds('hit', 'conserved');
- : @h_ind = $hsp->seq_inds('hit', 'conserved', 1);
- Returns : List of integers
- : May include ranges if collapse is true.
- Argument : seq_type = 'query' or 'hit' or 'sbjct' (default = query)
- : ('sbjct' is synonymous with 'hit')
- : class = 'identical' or 'conserved' or 'nomatch' or 'gap'
- : (default = identical)
- : (can be shortened to 'id' or 'cons')
- :
- : collapse = boolean, if true, consecutive positions are merged
- : using a range notation, e.g., "1 2 3 4 5 7 9 10 11"
- : collapses to "1-5 7 9-11". This is useful for
- : consolidating long lists. Default = no collapse.
- Throws : n/a.
- Comments :
-
-See Also : L<Bio::Search::BlastUtils::collapse_nums()|Bio::Search::BlastUtils>, L<Bio::Search::Hit::HitI::seq_inds()|Bio::Search::Hit::HitI>
-
-=cut
-
=head2 Inherited from Bio::SeqFeature::SimilarityPair
These methods come from Bio::SeqFeature::SimilarityPair
@@ -488,7 +461,7 @@ These methods come from Bio::SeqFeature::SimilarityPair
The following methods have been overridden due to their current reliance on
sequence-based queries. They may be implemented in future versions of this class.
-=head2 frac_identical
+=head2 seq_inds
=cut
2,465 Bio/SeqIO/chadoxml.pm
@@ -55,7 +55,7 @@ This is currently a write-only module.
-seq_so_type=>'gene',
-src_feature=>'X',
-src_feat_type=>'chromosome_arm',
- -nounflatten=>1,
+ -nounflatten=>1,
-is_analysis=>'true',
-data_source=>'GenBank');
@@ -80,64 +80,64 @@ containment hierarchy conforming to chado central dogma model: gene
Destination of data in the subject Bio::Seq object $seq is as following:
- *$seq->display_id: name of the top-level feature;
+ *$seq->display_id: name of the top-level feature;
- *$seq->accession_number: if defined, uniquename and
- feature_dbxref of the top-level
- feature if not defined,
- $seq->display_id is used as the
- uniquename of the top-level feature;
+ *$seq->accession_number: if defined, uniquename and
+ feature_dbxref of the top-level
+ feature if not defined,
+ $seq->display_id is used as the
+ uniquename of the top-level feature;
- *$seq->molecule: transformed to SO type, used as the feature
- type of the top-level feature if -seq_so_type
- argument is supplied, use the supplied SO type
- as the feature type of the top-level feature;
+ *$seq->molecule: transformed to SO type, used as the feature
+ type of the top-level feature if -seq_so_type
+ argument is supplied, use the supplied SO type
+ as the feature type of the top-level feature;
- *$seq->species: organism of the top-level feature;
+ *$seq->species: organism of the top-level feature;
- *$seq->seq: residues of the top-level feature;
+ *$seq->seq: residues of the top-level feature;
- *$seq->is_circular, $seq->division: feature_cvterm;
+ *$seq->is_circular, $seq->division: feature_cvterm;
- *$seq->keywords, $seq->desc, comments: featureprop;
+ *$seq->keywords, $seq->desc, comments: featureprop;
- *references: pub and feature_pub;
- medline/pubmed ids: pub_dbxref;
- comments: pubprop;
+ *references: pub and feature_pub;
+ medline/pubmed ids: pub_dbxref;
+ comments: pubprop;
- *feature "source" span: featureloc for top-level feature;
+ *feature "source" span: featureloc for top-level feature;
- *feature "source" db_xref: feature_dbxref for top-level feature;
+ *feature "source" db_xref: feature_dbxref for top-level feature;
- *feature "source" other tags: featureprop for top-level feature;
+ *feature "source" other tags: featureprop for top-level feature;
- *subfeature 'symbol' or 'label' tag: feature uniquename, if
+ *subfeature 'symbol' or 'label' tag: feature uniquename, if
none of these is present, the chadoxml object
generates feature uniquenames as:
<gene>-<feature_type>-<span>
(e.g. foo-mRNA--1000..3000);
- *gene model: feature_relationship built based on the
+ *gene model: feature_relationship built based on the
containment hierarchy;
- *feature span: featureloc;
+ *feature span: featureloc;
- *feature accession numbers: feature_dbxref;
+ *feature accession numbers: feature_dbxref;
- *feature tags (except db_xref, symbol and gene): featureprop;
+ *feature tags (except db_xref, symbol and gene): featureprop;
Things to watch out for:
- *chado schema change: this version works with the chado
+ *chado schema change: this version works with the chado
version tagged chado_1_01 in GMOD CVS.
- *feature uniquenames: especially important if using XORT
+ *feature uniquenames: especially important if using XORT
loader to do incremental load into
chado. may need pre-processing of the
source data to put the correct
uniquenames in place.
- *pub uniquenames: chadoxml->write_seq() has the FlyBase policy
+ *pub uniquenames: chadoxml->write_seq() has the FlyBase policy
on pub uniquenames hard-coded, it assigns
pub uniquenames in the following way: for
journals and books, use ISBN number; for
@@ -147,7 +147,7 @@ Things to watch out for:
implement your policy. look for the comments
in the code.
- *for pubs possibly existing in chado but with no knowledge of
+ *for pubs possibly existing in chado but with no knowledge of
its uniquename:put "op" as "match", then need to run the
output chadoxml through a special filter that
talks to chado database and tries to find the
@@ -160,9 +160,9 @@ Things to watch out for:
case. please modify to work according to your
rules.
- *chado initialization for loading:
+ *chado initialization for loading:
- cv & cvterm: in the output chadoxml, all cv's and
+ cv & cvterm: in the output chadoxml, all cv's and
cvterm's are lookup only. Therefore,
before using XORT loader to load the
output into chado, chado must be
@@ -247,29 +247,29 @@ undef(my %datahash); #data from Bio::Seq object stored in a hash
my $chadotables = 'feature featureprop feature_relationship featureloc feature_cvterm cvterm cv feature_pub pub pub_dbxref pub_author author pub_relationship pubprop feature_dbxref dbxref db synonym feature_synonym';
my %fkey = (
- "cvterm.cv_id" => "cv",
+ "cvterm.cv_id" => "cv",
"cvterm.dbxref_id" => "dbxref",
- "dbxref.db_id" => "db",
- "feature.type_id" => "cvterm",
- "feature.organism_id" => "organism",
- "feature.dbxref_id" => "dbxref",
- "featureprop.type_id" => "cvterm",
- "feature_pub.pub_id" => "pub",
- "feature_cvterm.cvterm_id" => "cvterm",
- "feature_cvterm.pub_id" => "pub",
+ "dbxref.db_id" => "db",
+ "feature.type_id" => "cvterm",
+ "feature.organism_id" => "organism",
+ "feature.dbxref_id" => "dbxref",
+ "featureprop.type_id" => "cvterm",
+ "feature_pub.pub_id" => "pub",
+ "feature_cvterm.cvterm_id" => "cvterm",
+ "feature_cvterm.pub_id" => "pub",
"feature_cvterm.feature_id" => "feature",
- "feature_dbxref.dbxref_id" => "dbxref",
- "feature_relationship.object_id" => "feature",
- "feature_relationship.subject_id" => "feature",
- "feature_relationship.type_id" => "cvterm",
- "featureloc.srcfeature_id" => "feature",
- "pub.type_id" => "cvterm",
- "pub_dbxref.dbxref_id" => "dbxref",
- "pub_author.author_id" => "author",
- "pub_relationship.obj_pub_id" => "pub",
- "pub_relationship.subj_pub_id" => "pub",
- "pub_relationship.type_id" => "cvterm",
- "pubprop.type_id" => "cvterm",
+ "feature_dbxref.dbxref_id" => "dbxref",
+ "feature_relationship.object_id" => "feature",
+ "feature_relationship.subject_id" => "feature",
+ "feature_relationship.type_id" => "cvterm",
+ "featureloc.srcfeature_id" => "feature",
+ "pub.type_id" => "cvterm",
+ "pub_dbxref.dbxref_id" => "dbxref",
+ "pub_author.author_id" => "author",
+ "pub_relationship.obj_pub_id" => "pub",
+ "pub_relationship.subj_pub_id" => "pub",
+ "pub_relationship.type_id" => "cvterm",
+ "pubprop.type_id" => "cvterm",
"feature_synonym.feature_id" => "feature",
"feature_synonym.synonym_id" => "synonym",
"feature_synonym.pub_id" => "pub",
@@ -283,22 +283,22 @@ my %cv_name = (
);
my %feattype_args2so = (
- "aberr" => "aberration_junction",
-# "conflict" => "sequence_difference",
-# "polyA_signal" => "polyA_signal_sequence",
- "variation" => "sequence_variant",
- "mutation1" => "point_mutation", #for single-base mutation
- "mutation2" => "sequence_variant", #for multi-base mutation
- "rescue" => "rescue_fragment",
-# "rfrag" => "restriction_fragment",
- "protein_bind" => "protein_binding_site",
- "misc_feature" => "region",
-# "prim_transcript" => "primary_transcript",
- "CDS" => "polypeptide",
- "reg_element" => "regulatory_region",
- "seq_variant" => "sequence_variant",
- "mat_peptide" => "mature_peptide",
- "sig_peptide" => "signal_peptide",
+ "aberr" => "aberration_junction",
+# "conflict" => "sequence_difference",
+# "polyA_signal" => "polyA_signal_sequence",
+ "variation" => "sequence_variant",
+ "mutation1" => "point_mutation", #for single-base mutation
+ "mutation2" => "sequence_variant", #for multi-base mutation
+ "rescue" => "rescue_fragment",
+# "rfrag" => "restriction_fragment",
+ "protein_bind" => "protein_binding_site",
+ "misc_feature" => "region",
+# "prim_transcript" => "primary_transcript",
+ "CDS" => "polypeptide",
+ "reg_element" => "regulatory_region",
+ "seq_variant" => "sequence_variant",
+ "mat_peptide" => "mature_peptide",
+ "sig_peptide" => "signal_peptide",
);
undef(my %organism);
@@ -328,99 +328,103 @@ sub _initialize {
Title : write_seq
Usage : $stream->write_seq(-seq=>$seq, -seq_so_type=>$seqSOtype,
- -src_feature=>$srcfeature,
- -src_feat_type=>$srcfeattype,
- -nounflatten=>0 or 1,
- -is_analysis=>'true' or 'false',
- -data_source=>$datasource)
+ -src_feature=>$srcfeature,
+ -src_feat_type=>$srcfeattype,
+ -nounflatten=>0 or 1,
+ -is_analysis=>'true' or 'false',
+ -data_source=>$datasource)
Function: writes the $seq object (must be seq) into chadoxml.
- Current implementation:
- 1. for non-mRNA records,
- a top-level feature of type $seq->alphabet is
- generated for the whole GenBank record, features listed
- are unflattened for DNA records to build gene model
- feature graph, and for the other types of records all
- features in $seq are treated as subfeatures of the top-level
- feature.
- 2. for mRNA records,
- if a 'gene' feature is present, it B<must> have a /symbol
- or /label tag to contain the uniquename of the gene. a top-
- level feature of type 'gene' is generated. the mRNA is written
- as a subfeature of the top-level gene feature, and the other
- sequence features listed in $seq are treated as subfeatures
- of the mRNA feature.
Returns : 1 for success and 0 for error
+ Args : A Bio::Seq object $seq, optional $seqSOtype, $srcfeature,
+ $srcfeattype, $nounflatten, $is_analysis and $data_source.
+When $srcfeature (a string, the uniquename of the source feature) is given, the
+location and strand information of the top-level feature against the source
+feature will be derived from the sequence feature called 'source' of the $seq
+object, and a featureloc record is generated for the top-level feature on
+$srcfeature. When $srcfeature is given, $srcfeattype must also be present. All
+feature coordinates in $seq should be against $srcfeature. $seqSOtype is the
+optional SO term to use as the type of the top-level feature. For example, a
+GenBank data file for a Drosophila melanogaster genome scaffold has the molecule
+type of "DNA"; when converting to chadoxml, a $seqSOtype argument of
+"golden_path_region" can be supplied to save the scaffold as a feature of type
+"golden_path_region" in chadoxml, instead of "DNA". A feature with primary tag
+of 'source' must be present in the sequence feature list of $seq, to describe the
+whole sequence record.
- Args : A Bio::Seq object $seq, optional $seqSOtype, $srcfeature,
- $srcfeattype, $nounflatten, $is_analysis and $data_source.
- when $srcfeature (a string, the uniquename of the source
- feature) is given, the location and strand information of
- the top-level feature against the source feature will be
- derived from the sequence feature called 'source' of the
- $seq object, a featureloc record is generated for the top
- -level feature on $srcfeature. when $srcfeature is given,
- $srcfeattype must also be present. All feature coordinates
- in $seq should be against $srcfeature. $seqSOtype is the
- optional SO term to use as the type of the top-level feature.
- For example, a GenBank data file for a Drosophila melanogaster
- genome scaffold has the molecule type of "DNA", when
- converting to chadoxml, a $seqSOtype argument of
- "golden_path_region" can be supplied to save the scaffold
- as a feature of type "golden_path_region" in chadoxml, instead
- of "DNA". a feature with primary tag of 'source' must be
- present in the sequence feature list of $seq, to decribe the
- whole sequence record.
+In the current implementation:
+
+=over 3
+
+=item *
+
+non-mRNA records
+
+A top-level feature of type $seq-E<gt>alphabet is generated for the whole GenBank
+record, features listed are unflattened for DNA records to build gene model
+feature graph, and for the other types of records all features in $seq are
+treated as subfeatures of the top-level feature.
+=item *
+
+mRNA records
+
+If a 'gene' feature is present, it B<must> have a /symbol or /label tag to
+contain the uniquename of the gene. A top-level feature of type 'gene' is
+generated. The mRNA is written as a subfeature of the top-level gene feature,
+and the other sequence features listed in $seq are treated as subfeatures of the
+mRNA feature.
+
+=back
=cut
sub write_seq {
- my $usage = <<EOUSAGE;
+ my $usage = <<EOUSAGE;
Bio::SeqIO::chadoxml->write_seq()
Usage : \$stream->write_seq(-seq=>\$seq,
- -seq_so_type=>\$SOtype,
- -src_feature=>\$srcfeature,
- -src_feat_type=>\$srcfeattype,
- -nounflatten=>0 or 1,
+ -seq_so_type=>\$SOtype,
+ -src_feature=>\$srcfeature,
+ -src_feat_type=>\$srcfeattype,
+ -nounflatten=>0 or 1,
-is_analysis=>'true' or 'false',
-data_source=>\$datasource)
-Args : \$seq : a Bio::Seq object
- \$SOtype : the SO term to use as the feature type of
- the \$seq record, optional
- \$srcfeature : unique name of the source feature, a string
- containing at least one alphabetical letter
- (a-z, A-Z), optional
- \$srcfeattype : feature type of \$srcfeature. one of SO terms.
- optional
- when \$srcfeature is given, \$srcfeattype becomes mandatory,
- \$datasource : source of the sequence annotation data,
- e.g. 'GenBank' or 'GFF'.
+Args : \$seq : a Bio::Seq object
+ \$SOtype : the SO term to use as the feature type of
+ the \$seq record, optional
+ \$srcfeature : unique name of the source feature, a string
+ containing at least one alphabetical letter
+ (a-z, A-Z), optional
+ \$srcfeattype : feature type of \$srcfeature. one of SO terms.
+ optional
+ when \$srcfeature is given, \$srcfeattype becomes mandatory,
+ \$datasource : source of the sequence annotation data,
+ e.g. 'GenBank' or 'GFF'.
EOUSAGE
- my ($self,@args) = @_;
+ my ($self,@args) = @_;
- my ($seq, $seq_so_type, $srcfeature, $srcfeattype, $nounflatten, $isanalysis, $datasource, $genus, $species) =
- $self->_rearrange([qw(SEQ
- SEQ_SO_TYPE
- SRC_FEATURE
- SRC_FEAT_TYPE
- NOUNFLATTEN
- IS_ANALYSIS
- DATA_SOURCE
+ my ($seq, $seq_so_type, $srcfeature, $srcfeattype, $nounflatten, $isanalysis, $datasource, $genus, $species) =
+ $self->_rearrange([qw(SEQ
+ SEQ_SO_TYPE
+ SRC_FEATURE
+ SRC_FEAT_TYPE
+ NOUNFLATTEN
+ IS_ANALYSIS
+ DATA_SOURCE
GENUS
SPECIES
- )],
- @args);
- #print "$seq_so_type, $srcfeature, $srcfeattype\n";
+ )],
+ @args);
+ #print "$seq_so_type, $srcfeature, $srcfeattype\n";
- if( !defined $seq ) {
- $self->throw("Attempting to write with no seq!");
- }
+ if( !defined $seq ) {
+ $self->throw("Attempting to write with no seq!");
+ }
- if( ! ref $seq || ! $seq->isa('Bio::Seq::RichSeqI') ) {
- ## FIXME $self->warn(" $seq is not a RichSeqI compliant module. Attempting to dump, but may fail!");
- }
+ if( ! ref $seq || ! $seq->isa('Bio::Seq::RichSeqI') ) {
+ ## FIXME $self->warn(" $seq is not a RichSeqI compliant module. Attempting to dump, but may fail!");
+ }
# try to get the srcfeature from the seqFeature object
# for this to work, the user has to pass in the srcfeature type
@@ -430,124 +434,124 @@ EOUSAGE
}
}
- #$srcfeature, when provided, should contain at least one alphabetical letter
- if (defined $srcfeature)
- {
- if ($srcfeature =~ /[a-zA-Z]/)
- {
- chomp($srcfeature);
- } else {
- $self->throw( $usage );
- }
-
- #check for mandatory $srcfeattype
- if (! defined $srcfeattype)
- {
- $self->throw( $usage );
- #$srcfeattype must be a string of non-whitespace characters
- } else {
- if ($srcfeattype =~ /\S+/) {
- chomp($srcfeattype);
- } else {
- $self->throw( $usage );
- }
- }
- }
-
- # variables local to write_seq()
+ #$srcfeature, when provided, should contain at least one alphabetical letter
+ if (defined $srcfeature)
+ {
+ if ($srcfeature =~ /[a-zA-Z]/)
+ {
+ chomp($srcfeature);
+ } else {
+ $self->throw( $usage );
+ }
+
+ #check for mandatory $srcfeattype
+ if (! defined $srcfeattype)
+ {
+ $self->throw( $usage );
+ #$srcfeattype must be a string of non-whitespace characters
+ } else {
+ if ($srcfeattype =~ /\S+/) {
+ chomp($srcfeattype);
+ } else {
+ $self->throw( $usage );
+ }
+ }
+ }
+
+ # variables local to write_seq()
my $div = undef;
- my $hkey = undef;
- undef(my @top_featureprops);
+ my $hkey = undef;
+ undef(my @top_featureprops);
undef(my @featuresyns);
undef(my @top_featurecvterms);
- my $name = $seq->display_id if $seq->can('display_id');
+ my $name = $seq->display_id if $seq->can('display_id');
$name = $seq->display_name if $seq->can('display_name');
- undef(my @feature_cvterms);
- undef(my %sthash);
- undef(my %dvhash);
- undef(my %h1);
- undef(my %h2);
- my $temp = undef;
- my $ann = undef;
- undef(my @references);
- undef(my @feature_pubs);
- my $ref = undef;
- my $location = undef;
- my $fbrf = undef;
- my $journal = undef;
- my $issue = undef;
- my $volume = undef;
- my $volumeissue = undef;
- my $pages = undef;
- my $year = undef;
- my $pubtype = undef;
-# my $miniref= undef;
- my $uniquename = undef;
- my $refhash = undef;
- my $feat = undef;
- my $tag = undef;
- my $tag_cv = undef;
- my $ftype = undef;
- my $subfeatcnt = undef;
- undef(my @top_featrels);
- undef (my %srcfhash);
-
- local($^W) = 0; # supressing warnings about uninitialized fields.
+ undef(my @feature_cvterms);
+ undef(my %sthash);
+ undef(my %dvhash);
+ undef(my %h1);
+ undef(my %h2);
+ my $temp = undef;
+ my $ann = undef;
+ undef(my @references);
+ undef(my @feature_pubs);
+ my $ref = undef;
+ my $location = undef;
+ my $fbrf = undef;
+ my $journal = undef;
+ my $issue = undef;
+ my $volume = undef;
+ my $volumeissue = undef;
+ my $pages = undef;
+ my $year = undef;
+ my $pubtype = undef;
+# my $miniref= undef;
+ my $uniquename = undef;
+ my $refhash = undef;
+ my $feat = undef;
+ my $tag = undef;
+ my $tag_cv = undef;
+ my $ftype = undef;
+ my $subfeatcnt = undef;
+ undef(my @top_featrels);
+ undef (my %srcfhash);
+
+ local($^W) = 0; # suppressing warnings about uninitialized fields.
if (!$name && $seq->can('attributes') ) {
($name) = $seq->attributes('Alias');
}
- if ($seq->can('accession_number') && defined $seq->accession_number && $seq->accession_number ne 'unknown') {
- $uniquename = $seq->accession_number;
- } elsif ($seq->can('accession') && defined $seq->accession && $seq->accession ne 'unknown') {
- $uniquename = $seq->accession;
- } elsif ($seq->can('attributes')) {
+ if ($seq->can('accession_number') && defined $seq->accession_number && $seq->accession_number ne 'unknown') {
+ $uniquename = $seq->accession_number;
+ } elsif ($seq->can('accession') && defined $seq->accession && $seq->accession ne 'unknown') {
+ $uniquename = $seq->accession;
+ } elsif ($seq->can('attributes')) {
($uniquename) = $seq->attributes('load_id');
} else {
- $uniquename = $name;
- }
+ $uniquename = $name;
+ }
my $len = $seq->length();
- if ($len == 0) {
- $len = undef;
- }
-
- undef(my $gb_type);
- if (!$seq->can('molecule') || ! defined ($gb_type = $seq->molecule()) ) {
- $gb_type = $seq->can('alphabet') ? $seq->alphabet : 'DNA';
- }
- $gb_type = 'DNA' if $ftype eq 'dna';
- $gb_type = 'RNA' if $ftype eq 'rna';
-
- if(length $seq_so_type > 0) {
- if (defined $seq_so_type) {
- $ftype = $seq_so_type;
- }
- elsif ($seq->type) {
- $ftype = ($seq->type =~ /(.*):/)
-