sync with main trunk

svn path=/bioperl-live/branches/branch-1-6/; revision=16147
commit 086fcff6fd99ca99c0590e5a6d6702b3e0da4394 1 parent 89fe0dd
cjfields authored
123 Bio/Align/DNAStatistics.pm
@@ -73,21 +73,37 @@ in brackets are the pattern which will match
=over 3
-=item JukesCantor [jc|jukes|jukescantor|jukes-cantor]
+=item *
-=item Uncorrected [jcuncor|uncorrected]
+JukesCantor [jc|jukes|jukescantor|jukes-cantor]
-=item F81 [f81|felsenstein]
+=item *
-=item Kimura [k2|k2p|k80|kimura]
+Uncorrected [jcuncor|uncorrected]
-=item Tamura [t92|tamura|tamura92]
+=item *
-=item F84 [f84|felsenstein84]
+F81 [f81|felsenstein]
-=item TajimaNei [tajimanei|tajima\-nei]
+=item *
-=item JinNei [jinnei|jin\-nei] (not implemented)
+Kimura [k2|k2p|k80|kimura]
+
+=item *
+
+Tamura [t92|tamura|tamura92]
+
+=item *
+
+F84 [f84|felsenstein84]
+
+=item *
+
+TajimaNei [tajimanei|tajima\-nei]
+
+=item *
+
+JinNei [jinnei|jin\-nei] (not implemented)
=back
@@ -104,7 +120,7 @@ several pre-requisites for the alignment.
=item 1
DNA alignment must be based on protein alignment. Use the subroutine
-L<aa_to_dna_aln> in Bio::Align::Utilities to achieve this.
+L<Bio::Align::Utilities/aa_to_dna_aln> to achieve this.
=item 2
@@ -140,49 +156,53 @@ comparisons in an MSA. The statistics returned are:
=over 3
-=item S_d
+=item *
-Number of synonymous mutations between the 2 sequences.
+S_d - Number of synonymous mutations between the 2 sequences.
-=item N_d
+=item *
-Number of non-synonymous mutations between the 2 sequences.
+N_d - Number of non-synonymous mutations between the 2 sequences.
-=item S
+=item *
-Mean number of synonymous sites in both sequences.
+S - Mean number of synonymous sites in both sequences.
-=item N
+=item *
-mean number of synonymous sites in both sequences.
+N - mean number of synonymous sites in both sequences.
-=item P_s
+=item *
-proportion of synonymous differences in both sequences given by P_s = S_d/S.
+P_s - proportion of synonymous differences in both sequences given by
+P_s = S_d/S.
-=item P_n
+=item *
-proportion of non-synonymous differences in both sequences given by P_n = S_n/S.
+P_n - proportion of non-synonymous differences in both sequences given
+by P_n = S_n/S.
-=item D_s
+=item *
-estimation of synonymous mutations per synonymous site (by Jukes-Cantor).
+D_s - estimation of synonymous mutations per synonymous site (by
+Jukes-Cantor).
-=item D_n
+=item *
-estimation of non-synonymous mutations per non-synonymous site (by Jukes-Cantor).
+D_n - estimation of non-synonymous mutations per non-synonymous site (by
+Jukes-Cantor).
-=item D_n_var
+=item *
-estimation of variance of D_n .
+D_n_var - estimation of variance of D_n .
-=item D_s_var
+=item *
-estimation of variance of S_n.
+D_s_var - estimation of variance of S_n.
-=item z_value
+=item *
-calculation of z value.Positive value indicates D_n E<gt> D_s,
+z_value - calculation of z value.Positive value indicates D_n E<gt> D_s,
negative value indicates D_s E<gt> D_n.
=back
@@ -191,25 +211,25 @@ The statistics returned by calc_average_KaKs are:
=over 3
-=item D_s
+=item *
-Average number of synonymous mutations/synonymous site.
+D_s - Average number of synonymous mutations/synonymous site.
-=item D_n
+=item *
-Average number of non-synonymous mutations/non-synonymous site.
+D_n - Average number of non-synonymous mutations/non-synonymous site.
-=item D_s_var
+=item *
-Estimated variance of Ds from bootstrapped alignments.
+D_s_var - Estimated variance of Ds from bootstrapped alignments.
-=item D_n_var
+=item *
-Estimated variance of Dn from bootstrapped alignments.
+D_n_var - Estimated variance of Dn from bootstrapped alignments.
-=item z_score
+=item *
-calculation of z value. Positive value indicates D_n E<gt>D_s,
+z_score - calculation of z value. Positive value indicates D_n E<gt>D_s,
negative values vice versa.
=back
@@ -222,7 +242,6 @@ the book, and reproduce those results. If people like having this sort
of analysis in BioPerl other methods for estimating Ds and Dn can be
provided later.
-
Much of the DNA distance code is based on implementations in EMBOSS
(Rice et al, www.emboss.org) [distmat.c] and PHYLIP (J. Felsenstein et
al) [dnadist.c]. Insight also gained from Eddy, Durbin, Krogh, &
@@ -232,26 +251,36 @@ Mitchison.
=over 3
-=item D_JukesCantor
+=item *
+
+D_JukesCantor
"Phylogenetic Inference", Swoffrod, Olsen, Waddell and Hillis, in
Mol. Systematics, 2nd ed, 1996, Ch 11. Derived from "Evolution of
Protein Molecules", Jukes & Cantor, in Mammalian Prot. Metab., III,
1969, pp. 21-132.
-=item D_Tamura
+=item *
+
+D_Tamura
K Tamura, Mol. Biol. Evol. 1992, 9, 678.
-=item D_Kimura
+=item *
+
+D_Kimura
M Kimura, J. Mol. Evol., 1980, 16, 111.
-=item JinNei
+=item *
+
+JinNei
Jin and Nei, Mol. Biol. Evol. 82, 7, 1990.
-=item D_TajimaNei
+=item *
+
+D_TajimaNei
Tajima and Nei, Mol. Biol. Evol. 1984, 1, 269.
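Note on the hunks above: the D_s and D_n entries are documented as Jukes-Cantor estimates. As a rough illustration only, and not the module's actual code, the standard Jukes-Cantor correction d = -(3/4) ln(1 - (4/3) p) could be sketched in plain Perl. The helper name `jukes_cantor` is hypothetical and does not exist in Bio::Align::DNAStatistics.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Align::DNAStatistics.
# $p is an observed proportion of differing sites (for example P_s = S_d/S).
# Returns the Jukes-Cantor corrected distance, or undef where the
# correction is undefined (p >= 0.75 makes the log argument <= 0).
sub jukes_cantor {
    my ($p) = @_;
    return undef if $p < 0 || $p >= 0.75;
    return 0 if $p == 0;
    return -0.75 * log(1 - (4 / 3) * $p);
}

# Print the corrected distance for a few sample proportions.
printf "p = %.2f -> d = %.4f\n", $_, jukes_cantor($_) for (0.05, 0.10, 0.30);
```

The corrected distance is always at least as large as the raw proportion, because it accounts for multiple substitutions at the same site.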
16 Bio/AnalysisI.pm
@@ -198,8 +198,8 @@ sub describe { shift->throw_not_implemented(); }
The analysis input data are named, and can be also associated with a
default value, with allowed values and with few other attributes. The
names are important for feeding the service with the input data (the
-inputs are given to methods C<create_job>, C<run>, and/or C<wait_for>
-as name/value pairs).
+inputs are given to methods C<create_job>, C<Bio::AnalysisI|run>, and/or
+C<Bio::AnalysisI|wait_for> as name/value pairs).
Here is a (slightly shortened) example of an input specification:
@@ -324,8 +324,8 @@ tool.
Call this method if you wish to "stage the scene" - to create a job
with all input data but without actually running it. This method is
-called automatically from other methods (C<run> and C<wait_for>) so
-usually you do not need to call it directly.
+called automatically from other methods (C<Bio::AnalysisI|run> and
+C<Bio::AnalysisI|wait_for>) so usually you do not need to call it directly.
The input data and prameters for this execution can be specified in
various ways:
@@ -459,7 +459,7 @@ sub id { shift->throw_not_implemented(); }
# -----------------------------------------------------------------------------
-=head2 run
+=head2 Bio::AnalysisI::JobI::run
Usage : $job->run
Returns : itself
@@ -467,8 +467,8 @@ sub id { shift->throw_not_implemented(); }
It starts previously created job. The job already must have all input
data filled-in. This differs from the method of the same name of the
-C<Bio::Tools::Run::Analysis> object where the C<run> method creates
-also a new job allowing to set input data.
+C<Bio::Tools::Run::Analysis> object where the C<Bio::AnalysisI::JobI::run> method
+creates also a new job allowing to set input data.
=cut
@@ -476,7 +476,7 @@ sub run { shift->throw_not_implemented(); }
# -----------------------------------------------------------------------------
-=head2 wait_for
+=head2 Bio::AnalysisI::JobI::wait_for
Usage : $job->wait_for
Returns : itself
2  Bio/Assembly/Tools/ContigSpectrum.pm
@@ -785,7 +785,7 @@ sub average {
}
-=head2 average
+=head2 score
Title : score
Usage : my $score = $csp->score();
117 Bio/DB/GFF.pm
@@ -94,7 +94,9 @@ directory under a subdirectory named Bio::DB::GFF:
=over 4
-=item bp_load_gff.pl
+=item *
+
+bp_load_gff.pl
This script will load a Bio::DB::GFF database from a flat GFF file of
sequence annotations. Only the relational database version of
@@ -108,7 +110,9 @@ for most of their functionality.
load_gff.pl also has a --upgrade option, which will perform a
non-destructive upgrade of older schemas to newer ones.
-=item bp_bulk_load_gff.pl
+=item *
+
+bp_bulk_load_gff.pl
This script will populate a Bio::DB::GFF database from a flat GFF file
of sequence annotations. Only the MySQL database version of
@@ -120,7 +124,9 @@ This script takes a --fasta argument to load raw DNA into the database
as well. However, GFF databases do not require access to the raw DNA
for most of their functionality.
-=item bp_fast_load_gff.pl
+=item *
+
+bp_fast_load_gff.pl
This script is as fast as bp_bulk_load_gff.pl but uses Unix pipe
tricks to allow for incremental updates. It only supports the MySQL
@@ -129,13 +135,17 @@ non-Unix platforms.
Arguments are the same as bp_load_gff.pl
-=item gadfly_to_gff.pl
+=item *
+
+gadfly_to_gff.pl
This script will convert the GFF-like format used by the Berkeley
Drosophila Sequencing project into a format suitable for use with this
module.
-=item sgd_to_gff.pl
+=item *
+
+sgd_to_gff.pl
This script will convert the tab-delimited feature files used by the
Saccharomyces Genome Database into a format suitable for use with this
@@ -155,13 +165,17 @@ The 9 columns are as follows:
=over 4
-=item 1. reference sequence
+=item 1.
+
+reference sequence
This is the ID of the sequence that is used to establish the
coordinate system of the annotation. In the example above, the
reference sequence is "Chr1".
-=item 2. source
+=item 2.
+
+source
The source of the annotation. This field describes how the annotation
was derived. In the example above, the source is "curated" to
@@ -169,22 +183,30 @@ indicate that the feature is the result of human curation. The names
and versions of software programs are often used for the source field,
as in "tRNAScan-SE/1.2".
-=item 3. method
+=item 3.
+
+method
The annotation method. This field describes the type of the
annotation, such as "CDS". Together the method and source describe
the annotation type.
-=item 4. start position
+=item 4.
+
+start position
The start of the annotation relative to the reference sequence.
-=item 5. stop position
+=item 5.
+
+stop position
The stop of the annotation relative to the reference sequence. Start
is always less than or equal to stop.
-=item 6. score
+=item 6.
+
+score
For annotations that are associated with a numeric score (for example,
a sequence similarity), this field describes the score. The score
@@ -192,20 +214,26 @@ units are completely unspecified, but for sequence similarities, it is
typically percent identity. Annotations that don't have a score can
use "."
-=item 7. strand
+=item 7.
+
+strand
For those annotations which are strand-specific, this field is the
strand on which the annotation resides. It is "+" for the forward
strand, "-" for the reverse strand, or "." for annotations that are
not stranded.
-=item 8. phase
+=item 8.
+
+phase
For annotations that are linked to proteins, this field describes the
phase of the annotation on the codons. It is a number from 0 to 2, or
"." for features that have no phase.
-=item 9. group
+=item 9.
+
+group
GFF provides a simple way of generating annotation hierarchies ("is
composed of" relationships) by providing a group field. The group
@@ -315,13 +343,17 @@ specifying which tag to group on:
=over 4
-=item Using -preferred_groups
+=item *
+
+Using -preferred_groups
When you create a Bio::DB::GFF object, pass it a -preferred_groups=E<gt>
argument. This specifies a tag that will be used for grouping. You
can pass an array reference to specify a list of such tags.
-=item In the GFF header
+=item *
+
+In the GFF header
The GFF file itself can specify which tags are to be used for
grouping. Insert a comment like the following:
@@ -409,7 +441,9 @@ it adaptable to use with a variety of databases.
=over 4
-=item Adaptors
+=item *
+
+Adaptors
The core of the module handles the user API, annotation coordinate
arithmetic, and other common issues. The details of fetching
@@ -441,7 +475,9 @@ There are currently five adaptors recommended for general use:
Check the Bio/DB/GFF/Adaptor directory and subdirectories for other,
more specialized adaptors, as well as experimental ones.
-=item Aggregators
+=item *
+
+Aggregators
The GFF format uses a "group" field to indicate aggregation properties
of individual features. For example, a set of exons and introns may
@@ -513,7 +549,7 @@ has some limitations.
=over 4
-=item 1. GFF version string is required
+=item GFF version string is required
The GFF file B<must> contain the version comment:
@@ -523,7 +559,7 @@ Unless this version string is present at the top of the GFF file, the
loader will attempt to parse the file in GFF2 format, with
less-than-desirable results.
-=item 2. Only one level of nesting allowed
+=item Only one level of nesting allowed
A major restriction is that Bio::DB::GFF only allows one level of
nesting of features. For nesting, the Target tag will be used
@@ -1742,27 +1778,37 @@ This method takes a single overloaded argument, which can be any of:
=over 4
-=item 1. a scalar corresponding to a GFF file on the system
+=item *
+
+a scalar corresponding to a GFF file on the system
A pathname to a local GFF file. Any files ending with the .gz, .Z, or
.bz2 suffixes will be transparently decompressed with the appropriate
command-line utility.
-=item 2. an array reference containing a list of GFF files on the system
+=item *
+
+an array reference containing a list of GFF files on the system
For example ['/home/gff/gff1.gz','/home/gff/gff2.gz']
-=item 3. directory path
+=item *
+
+directory path
The indicated directory will be searched for all files ending in the
suffixes .gff, .gff.gz, .gff.Z or .gff.bz2.
-=item 4. filehandle
+=item *
+
+filehandle
An open filehandle from which to read the GFF data. Tied filehandles
now work as well.
-=item 5. a pipe expression
+=item *
+
+a pipe expression
A pipe expression will also work. For example, a GFF file on a remote
web server can be loaded with an expression like this:
@@ -1837,27 +1883,37 @@ This method takes a single overloaded argument, which can be any of:
=over 4
-=item 1. scalar corresponding to a FASTA file on the system
+=item *
+
+scalar corresponding to a FASTA file on the system
A pathname to a local FASTA file. Any files ending with the .gz, .Z, or
.bz2 suffixes will be transparently decompressed with the appropriate
command-line utility.
-=item 2. array reference containing a list of FASTA files on the
+=item *
+
+array reference containing a list of FASTA files on the
system
For example ['/home/fasta/genomic.fa.gz','/home/fasta/genomic.fa.gz']
-=item 3. path to a directory
+=item *
+
+path to a directory
The indicated directory will be searched for all files ending in the
suffixes .fa, .fa.gz, .fa.Z or .fa.bz2.
-a=item 4. filehandle
+=item *
+
+filehandle
An open filehandle from which to read the FASTA data.
-=item 5. pipe expression
+=item *
+
+pipe expression
A pipe expression will also work. For example, a FASTA file on a remote
web server can be loaded with an expression like this:
@@ -3775,7 +3831,6 @@ fixed.
=head1 SEE ALSO
-L<bioperl>,
L<Bio::DB::GFF::RelSegment>,
L<Bio::DB::GFF::Aggregator>,
L<Bio::DB::GFF::Feature>,
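Note on the Bio/DB/GFF.pm hunks above: the POD lists nine tab-delimited GFF columns. A minimal sketch of splitting one line into those columns follows. It assumes plain tab-separated input and is not how Bio::DB::GFF itself parses files. The helper name `parse_gff_line`, the hash keys, and the coordinates and group value in the sample line are made up for illustration.

```perl
use strict;
use warnings;

# Key names mirror the nine fields listed in the POD (hypothetical keys).
my @GFF_COLUMNS = qw(ref source method start stop score strand phase group);

# Hypothetical helper, not part of Bio::DB::GFF: split one tab-delimited
# GFF line into a hash reference keyed by column name. A "." placeholder
# in score, strand or phase is turned into undef.
sub parse_gff_line {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\t/, $line, 9;
    return unless @fields == 9;    # not a complete 9-column line
    my %feature;
    @feature{@GFF_COLUMNS} = @fields;
    for my $key (qw(score strand phase)) {
        $feature{$key} = undef if $feature{$key} eq '.';
    }
    return \%feature;
}

# Sample line with invented coordinates and group.
my $f = parse_gff_line("Chr1\tcurated\tCDS\t365647\t365963\t.\t+\t1\tTranscript \"R119.7\"\n");
print "$f->{ref} $f->{method} $f->{start}-$f->{stop} strand $f->{strand}\n";
```

Limiting the split to nine fields keeps any tabs inside the group column intact.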
32 Bio/DB/GFF/Aggregator.pm
@@ -39,20 +39,26 @@ Instances of Bio::DB::GFF::Aggregator have three attributes:
=over 3
-=item method
+=item *
+
+method
This is the GFF method field of the composite feature as a whole. For
example, "transcript" may be used for a composite feature created by
aggregating individual intron, exon and UTR features.
-=item main method
+=item *
+
+main method
Sometimes GFF groups are organized hierarchically, with one feature
logically containing another. For example, in the C. elegans schema,
methods of type "Sequence:curated" correspond to regions covered by
curated genes. There can be zero or one main methods.
-=item subparts
+=item *
+
+subparts
This is a list of one or more methods that correspond to the component
features of the aggregates. For example, in the C. elegans database,
@@ -65,14 +71,18 @@ subclasses:
=over 4
-=item disaggregate()
+=item *
+
+disaggregate()
This method is called by the Adaptor object prior to fetching a list
of features. The method is passed an associative array containing the
[method,source] pairs that the user has requested, and it returns a
list of raw features that it would like the adaptor to fetch.
-=item aggregate()
+=item *
+
+aggregate()
This method is called by the Adaptor object after it has fetched
features. The method is passed a list of raw features and is expected
@@ -86,15 +96,21 @@ case, it suffices for subclasses to override the following methods:
=over 4
-=item method()
+=item *
+
+method()
Return the default method for the composite feature as a whole.
-=item main_name()
+=item *
+
+main_name()
Return the default main method name.
-=item part_names()
+=item *
+
+part_names()
Return a list of subpart method names.
155 Bio/DB/HIV/HIVQueryHelper.pm
@@ -95,7 +95,7 @@ BEGIN {
=head2 HIVSchema - objects/methods to manipulate a version of the LANL HIV DB schema
-=head3 SYNOPSIS
+=head3 HIVSchema SYNOPSIS
$schema = new HIVSchema( 'lanl-schema.xml' );
@tables = $schema->tables;
@@ -109,7 +109,7 @@ BEGIN {
$table = $schema->tablepart('SEQ_SAMple.SSAM_badseq'); # returns 'SEQ_SAMple'
$column = $schema->columnpart('SEQ_SAMple.SSAM_badseq'); # returns 'SSAM_badseq'
-=head3 DESCRIPTION
+=head3 HIVSchema DESCRIPTION
HIVSchema methods are used in L<Bio::DB::Query::HIVQuery> for table,
column, primary/foreign key manipulations based on the observed Los
@@ -131,9 +131,9 @@ use strict;
### constructor
-=head3 CONSTRUCTOR
+=head3 HIVSchema CONSTRUCTOR
-=head4 new
+=head4 HIVSchema::new
Title : new
Usage : $schema = new HIVSchema( "lanl-schema.xml ");
@@ -157,9 +157,9 @@ sub new {
### object methods
-=head3 INSTANCE METHODS
+=head3 HIVSchema INSTANCE METHODS
-=head4 tables
+=head4 HIVSchema tables
Title : tables
Usage : $schema->tables()
@@ -186,7 +186,7 @@ sub tables {
return @k;
}
-=head4 columns
+=head4 HIVSchema columns
Title : columns
Usage : $schema->columns( [$tablename] );
@@ -218,7 +218,7 @@ sub columns {
return @k;
}
-=head4 fields
+=head4 HIVSchema fields
Title : fields
Usage : $schema->fields();
@@ -238,7 +238,7 @@ sub fields {
return @k;
}
-=head4 options
+=head4 HIVSchema options
Title : options
Usage : $schema->options(@fieldnames)
@@ -259,7 +259,7 @@ sub options {
return $$sref{$sfield}{option} ? @{$$sref{$sfield}{option}} : ();
}
-=head4 aliases
+=head4 HIVSchema aliases
Title : aliases
Usage : $schema->aliases(@fieldnames)
@@ -286,7 +286,7 @@ sub aliases {
}
}
-=head4 ankh
+=head4 HIVSchema ankh
Title : ankh (annotation key hash)
Usage : $schema->ankh(@fieldnames)
@@ -314,7 +314,7 @@ sub ankh {
return %ret;
}
-=head4 tablepart
+=head4 HIVSchema tablepart
Title : tablepart (alias: tbl)
Usage : $schema->tbl(@fieldnames)
@@ -353,7 +353,7 @@ sub tbl {
shift->tablepart(@_);
}
-=head4 columnpart
+=head4 HIVSchema columnpart
Title : columnpart (alias: col)
Usage : $schema->col(@fieldnames)
@@ -382,7 +382,7 @@ sub col {
shift->columnpart(@_);
}
-=head4 primarykey
+=head4 HIVSchema primarykey
Title : primarykey [alias: pk]
Usage : $schema->pk(@tablenames);
@@ -416,7 +416,7 @@ sub pk {
shift->primarykey(@_);
}
-=head4 foreignkey
+=head4 HIVSchema foreignkey
Title : foreignkey [alias: fk]
Usage : $schema->fk($intable [, $totable])
@@ -461,7 +461,7 @@ sub fk {
shift->foreignkey(@_);
}
-=head4 foreigntable
+=head4 HIVSchema foreigntable
Title : foreigntable [alias ftbl]
Usage : $schema->ftbl( @foreign_key_fieldnames );
@@ -495,7 +495,7 @@ sub ftbl {
shift->foreigntable(@_);
}
-=head4 find_join
+=head4 HIVSchema find_join
Title : find_join
Usage : $sch->find_join('Table1', 'Table2')
@@ -527,7 +527,7 @@ sub find_join {
}
}
-=head4 _find_join_guts
+=head4 HIVSchema _find_join_guts
Title : _find_join_guts
Usage : $sch->_find_join_guts($table1, $table2, $stackref, \$found, $reverse)
@@ -610,7 +610,7 @@ sub _find_join_guts {
}
}
-=head4 loadSchema
+=head4 HIVSchema loadSchema
Title : loadHIVSchema [alias: loadSchema]
Usage : $schema->loadSchema( $XMLfilename )
@@ -686,7 +686,7 @@ sub loadSchema {
# below, dangerous
-=head4 _sfieldh
+=head4 HIVSchema _sfieldh
Title : _sfieldh
Usage : $schema->_sfieldh($fieldname)
@@ -708,7 +708,7 @@ sub _sfieldh {
=head2 Class QRY - a query algebra for HIVQuery
-=head3 SYNOPSIS
+=head3 QRY SYNOPSIS
$Q = new QRY(
new R(
@@ -729,7 +729,7 @@ sub _sfieldh {
$Q3 = QRY::Or($Q, $Q2);
print $Q3->A; # prints '(CCR5 CXCR4)[coreceptor] (ZA)[country]'
-=head3 DESCRIPTION
+=head3 QRY DESCRIPTION
The QRY package provides a query parser for
L<Bio::DB::Query::HIVQuery>. Currently, the parser supports AND, OR,
@@ -823,9 +823,7 @@ use overload
# QRY object will be translated into (possibly multiple) hashes
# conforming to HIVQuery parameter requirements.
-=head3 CLASS METHODS
-
-=head4 _make_q
+=head4 QRY _make_q
Title : _make_q
Usage : QRY::_make_q($parsetree)
@@ -862,7 +860,7 @@ sub _make_q {
return @dbq;
}
-=head4 _make_q_guts
+=head4 QRY _make_q_guts
Title : _make_q_guts (Internal class method)
Usage : _make_q_guts($ptree, $q_expr, $qarry, $anarry)
@@ -974,7 +972,7 @@ sub _make_q_guts {
: return 1;
}
-=head4 _parse_q
+=head4 QRY _parse_q
Title : _parse_q
Usage : QRY::_parse_q($query_string)
@@ -1045,7 +1043,7 @@ sub _parse_q {
## QRY constructor
-=head3 CONSTRUCTOR
+=head3 QRY CONSTRUCTOR
=head4 QRY Constructor
@@ -1070,9 +1068,9 @@ sub new {
## QRY instance methods
-=head3 INSTANCE METHODS
+=head3 QRY INSTANCE METHODS
-=head4 requests
+=head4 QRY requests
Title : requests
Usage : $QRY->requests
@@ -1089,7 +1087,7 @@ sub requests {
return @{$self->{'requests'}};
}
-=head4 put_requests
+=head4 QRY put_requests
Title : put_requests
Usage : $QRY->put_request(@R)
@@ -1110,7 +1108,7 @@ sub put_requests {
return @args;
}
-=head4 isnull
+=head4 QRY isnull
Title : isnull
Usage : $QRY->isnull
@@ -1126,7 +1124,7 @@ sub isnull {
return ($self->requests) ? 0 : 1;
}
-=head4 A
+=head4 QRY A
Title : A
Usage : print $QRY->A
@@ -1142,7 +1140,7 @@ sub A {
return join( "\n", map {$_->A} $self->requests );
}
-=head4 len
+=head4 QRY len
Title : len
Usage : $QRY->len
@@ -1158,7 +1156,7 @@ sub len {
return scalar @{$self->{'requests'}};
}
-=head4 clone
+=head4 QRY clone
Title : clone
Usage : $QRY2 = $QRY1->clone;
@@ -1181,9 +1179,9 @@ sub clone {
## QRY class methods
-=head3 CLASS METHODS
+=head3 QRY CLASS METHODS
-=head4 Or
+=head4 QRY Or
Title : Or
Usage : $QRY3 = QRY::Or($QRY1, $QRY2)
@@ -1237,7 +1235,7 @@ sub Or {
return new QRY( @ret_rq );
}
-=head4 And
+=head4 QRY And
Title : And
Usage : $QRY3 = QRY::And($QRY1, $QRY2)
@@ -1268,7 +1266,7 @@ sub And {
return new QRY( @ret_rq );
}
-=head4 Bool
+=head4 QRY Bool
Title : Bool
Usage : QRY::Bool($QRY1)
@@ -1285,7 +1283,7 @@ sub Bool {
return $q->isnull ? 0 : 1;
}
-=head4 Eq
+=head4 QRY Eq
Title : Eq
Usage : QRY::Eq($QRY1, $QRY2)
@@ -1319,7 +1317,7 @@ sub Eq {
=head2 Class R - request objects for QRY algebra
-=head3 SYNOPSIS
+=head3 R SYNOPSIS
$R = new R( $q1, $q2 );
$R->put_atoms($q3);
@@ -1334,7 +1332,7 @@ sub Eq {
QRY::Eq( new QRY(R::Or($R1, $R2)), new QRY($R1, $R2) ); # returns 1
R::In( (R::And($R1, $R2))[0], $R1 ); # returns 1
-=head3 DESCRIPTION
+=head3 R DESCRIPTION
Class R objects contain a list of atomic queries (class Q
objects). Each class R object represents a single HTTP request to the
@@ -1350,7 +1348,7 @@ $R::NULL = new R();
## R constructor
-=head3 CONSTRUCTOR
+=head3 R CONSTRUCTOR
=head4 R constructor
@@ -1375,9 +1373,9 @@ sub new {
## R instance methods
-=head3 INSTANCE METHODS
+=head3 R INSTANCE METHODS
-=head4 len
+=head4 R len
Title : len
Usage : $R->len
@@ -1393,7 +1391,7 @@ sub len {
return scalar @{[keys %{$self->{'atoms'}}]};
}
-=head4 atoms
+=head4 R atoms
Title : atoms
Usage : $R->atoms( [optional $field])
@@ -1415,7 +1413,7 @@ sub atoms {
return wantarray ? map { $self->{'atoms'}->{$_} } @flds : $self->{'atoms'}->{$flds[0]};
}
-=head4 fields
+=head4 R fields
Title : fields
Usage : $R->fields
@@ -1431,7 +1429,7 @@ sub fields {
return keys %{$self->{'atoms'}};
}
-=head4 put_atoms
+=head4 R put_atoms
Title : put_atoms
Usage : $R->put_atoms( @q )
@@ -1465,7 +1463,7 @@ sub put_atoms {
return;
}
-=head4 del_atoms
+=head4 R del_atoms
Title : del_atoms
Usage : $R->del_atoms( @qfields )
@@ -1490,7 +1488,7 @@ sub del_atoms {
return @ret;
}
-=head4 isnull
+=head4 R isnull
Title : isnull
Usage : $R->isnull
@@ -1506,7 +1504,7 @@ sub isnull {
return ($self->len) ? 0 : 1;
}
-=head4 A
+=head4 R A
Title : A
Usage : print $R->A
@@ -1523,7 +1521,7 @@ sub A {
return join(" ", map {$_->A} @a);
}
-=head4 clone
+=head4 R clone
Title : clone
Usage : $R2 = $R1->clone;
@@ -1546,9 +1544,9 @@ sub clone {
## R class methods
-=head3 CLASS METHODS
+=head3 R CLASS METHODS
-=head4 In
+=head4 R In
Title : In
Usage : R::In($R1, $R2)
@@ -1578,7 +1576,7 @@ sub In {
return 1;
}
-=head4 And
+=head4 R And
Title : And
Usage : @Rresult = R::And($R1, $R2)
@@ -1624,7 +1622,7 @@ sub And {
}
-=head4 Or
+=head4 R Or
Title : Or
Usage : @Rresult = R::Or($R1, $R2)
@@ -1672,7 +1670,7 @@ sub Or {
}
-=head4 Eq
+=head4 R Eq
Title : Eq
Usage : R::Eq($R1, $R2)
@@ -1703,7 +1701,7 @@ sub Eq {
=head2 Class Q - atomic query objects for QRY algebra
-=head3 SYNOPSIS
+=head3 Q SYNOPSIS
$q = new Q('coreceptor', 'CXCR4 CCR5');
$u = new Q('coreceptor', 'CXCR4');
@@ -1715,7 +1713,7 @@ sub Eq {
Q::qin($u, $q) # returns 1
Q::qeq(Q::qand($u, $q), $u ); # returns 1
-=head3 DESCRIPTION
+=head3 Q DESCRIPTION
Class Q objects represent atomic queries, that can be described by a
single LANL cgi parameter=value pair. Class R objects (requests) are
@@ -1731,7 +1729,7 @@ $Q::NULL = new Q();
## Q constructor
-=head3 CONSTRUCTOR
+=head3 Q CONSTRUCTOR
=head4 Q constructor
@@ -1758,9 +1756,9 @@ sub new {
## Q instance methods
-=head3 INSTANCE METHODS
+=head3 Q INSTANCE METHODS
-=head4 isnull
+=head4 Q isnull
Title : isnull
Usage : $q->isnull
@@ -1778,7 +1776,7 @@ sub isnull {
return 0;
}
-=head4 fld
+=head4 Q fld
Title : fld
Usage : $q->fld($field)
@@ -1802,7 +1800,7 @@ sub fld {
}
-=head4 dta
+=head4 Q dta
Title : dta
Usage : $q->dta($data)
@@ -1825,7 +1823,7 @@ sub dta {
return $self->{dta};
}
-=head4 A
+=head4 Q A
Title : A
Usage : print $q->A
@@ -1844,7 +1842,7 @@ sub A {
return "(".join(' ', sort {$a cmp $b} @a).")[".$self->fld."]";
}
-=head4 clone
+=head4 Q clone
Title : clone
Usage : $q2 = $q1->clone;
@@ -1864,9 +1862,9 @@ sub clone {
### Q class methods
-=head3 CLASS METHODS
+=head3 Q CLASS METHODS
-=head4 qin
+=head4 Q qin
Title : qin
Usage : Q::qin($q1, $q2)
@@ -1885,7 +1883,7 @@ sub qin {
return Q::qeq( $b, Q::qor($a, $b) );
}
-=head4 qeq
+=head4 Q qeq
Title : qeq
Usage : Q::qeq($q1, $q2)
@@ -1909,7 +1907,7 @@ sub qeq {
return @cd == @bd;
}
-=head4 qor
+=head4 Q qor
Title : qor
Usage : @qresult = Q::qor($q1, $q2)
@@ -1941,7 +1939,7 @@ sub qor {
return @ret;
}
-=head4 qand
+=head4 Q qand
Title : qand
Usage : @qresult = Q::And($q1, $q2)
@@ -1992,9 +1990,9 @@ sub qand {
}
}
-=head3 INTERNALS
+=head3 Q INTERNALS
-=head4 unique
+=head4 Q unique
Title : unique
Usage : @ua = unique(@a)
@@ -2016,7 +2014,7 @@ sub unique {
=head2 Additional tools for Bio::AnnotationCollectionI
-=head3 SYNOPSIS
+=head3 Bio::AnnotationCollectionI SYNOPSIS (additional methods)
$seq->annotation->put_value('patient_id', 1401)
$seq->annotation->get_value('patient_ids') # returns 1401
@@ -2027,9 +2025,11 @@ sub unique {
$blood_readings{$_} = $seq->annonation->get_value(['clinical', $_]);
}
-=head3 DESCRIPTION
+=head3 Bio::AnnotationCollectionI DESCRIPTION (additional methods)
-C<get_value()> and C<put_value> allow easy creation of and access to an annotation collection tree with nodes of L<Bio::Annotation::SimpleValue>. These methods obiviate direct accession of the SimpleValue objects.
+C<get_value()> and C<put_value> allow easy creation of and access to an
+annotation collection tree with nodes of L<Bio::Annotation::SimpleValue>. These
+methods obiviate direct accession of the SimpleValue objects.
=cut
@@ -2082,7 +2082,8 @@ sub get_value {
\@tagnames, $value (or as -KEYS=>\@tagnames, -VALUE=>$value )
Note : If intervening nodes do not exist, put_value creates them, replacing
existing nodes. So if $ac->put_value('x', 10) was done, then later,
- $ac->put_value(['x', 'y'], 20), the original value of 'x' is trashed, and $ac->get_value('x') will now return the annotation collection
+ $ac->put_value(['x', 'y'], 20), the original value of 'x' is trashed,
+ and $ac->get_value('x') will now return the annotation collection
with tagname 'y'.
=cut
17 Bio/DB/SeqFeature/Store.pm
@@ -114,18 +114,24 @@ with the following differences:
=over 4
-=item 1. No limitation on Bio::SeqFeatureI implementations
+=item 1.
+
+No limitation on Bio::SeqFeatureI implementations
Unlike Bio::DB::GFF, Bio::DB::SeqFeature::Store works with
any Bio::SeqFeatureI object.
-=item 2. No limitation on nesting of features & subfeatures
+=item 2.
+
+No limitation on nesting of features & subfeatures
Bio::DB::GFF is limited to features that have at most one
level of subfeature. Bio::DB::SeqFeature::Store can work with features
that have unlimited levels of nesting.
-=item 3. No aggregators
+=item 3.
+
+No aggregators
The aggregator architecture, which was necessary to impose order on
the GFF2 files that Bio::DB::GFF works with, does not apply to
@@ -133,7 +139,9 @@ Bio::DB::SeqFeature::Store. It is intended to store features that obey
well-defined ontologies, such as the Sequence Ontology
(http://song.sourceforge.net).
-=item 4. No relative locations
+=item 4.
+
+No relative locations
All locations defined by this module are relative to an absolute
sequence ID, unlike Bio::DB::GFF which allows you to define the
@@ -2506,7 +2514,6 @@ use the BioPerl bug tracking system to report bugs.
=head1 SEE ALSO
-L<bioperl>,
L<Bio::DB::SeqFeature>,
L<Bio::DB::SeqFeature::Store::GFF3Loader>,
L<Bio::DB::SeqFeature::Segment>,
6 Bio/Index/Stockholm.pm
@@ -172,10 +172,10 @@ sub fetch_report{
return $report->next_aln;
}
-=head2 fetch_report
+=head2 fetch_aln
- Title : fetch_report
- Usage : my $align = $idx->fetch_report($id);
+ Title : fetch_aln
+ Usage : my $align = $idx->fetch_aln($id);
Function: Returns a Bio::SimpleAlign object
for a specific alignment
Returns : Bio::SimpleAlign
48 Bio/Root/IO.pm
@@ -98,6 +98,10 @@ web:
Email hlapp@gmx.net
+=head1 CONTRIBUTORS
+
+Mark A. Jensen ( maj -at- fortinbras -dot- us )
+
=head1 APPENDIX
The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _
@@ -144,10 +148,10 @@ BEGIN {
}
eval {
- require LWP::Simple;
+ require LWP::UserAgent;
};
if( $@ ) {
- print STDERR "Cannot load LWP::Simple: $@" if( $VERBOSE > 0 );
+ print STDERR "Cannot load LWP::UserAgent: $@" if( $VERBOSE > 0 );
$HAS_LWP = 0;
} else {
$HAS_LWP = 1;
@@ -251,6 +255,13 @@ sub new {
-flush boolean flag to autoflush after each write
-noclose boolean flag, when set to true will not close a
filehandle (must explictly call close($io->_fh)
+ -retries number of times to try a web fetch before failure
+
+ -ua_parms hashref of key => value parameters to pass
+ to LWP::UserAgent->new()
+ (only meaningful with -url is set)
+ A useful value might be, for example,
+ { timeout => 60 } (ua default is 180 sec)
Returns : TRUE
Args : named parameters
@@ -262,27 +273,34 @@ sub _initialize_io {
$self->_register_for_cleanup(\&_io_cleanup);
- my ($input, $noclose, $file, $fh, $flush, $url) = $self->_rearrange([qw(INPUT
- NOCLOSE
- FILE FH
- FLUSH URL)], @args);
+ my ($input, $noclose, $file, $fh, $flush, $url,
+ $retries, $ua_parms) =
+ $self->_rearrange([qw(INPUT
+ NOCLOSE
+ FILE
+ FH
+ FLUSH
+ URL
+ RETRIES
+ UA_PARMS)], @args);
if($url){
- my $trymax = 5;
+ $retries ||= 5;
- if($HAS_LWP){ #use LWP::Simple::getstore()
- require LWP::Simple;
- #$self->warn("has lwp");
+ if($HAS_LWP){ #use LWP::UserAgent
+ require LWP::UserAgent;
+ my $ua = LWP::UserAgent->new(%$ua_parms);
my $http_result;
my($handle,$tempfile) = $self->tempfile();
CORE::close($handle);
+
- for(my $try = 1 ; $try <= $trymax ; $try++){
- $http_result = LWP::Simple::getstore($url, $tempfile);
- $self->warn("[$try/$trymax] tried to fetch $url, but server threw $http_result. retrying...") if $http_result != 200;
- last if $http_result == 200;
+ for(my $try = 1 ; $try <= $retries ; $try++){
+ $http_result = $ua->get($url, ':content_file' => $tempfile);
+ $self->warn("[$try/$retries] tried to fetch $url, but server threw " . $http_result->code . ". retrying...") if !$http_result->is_success;
+ last if $http_result->is_success;
}
- $self->throw("failed to fetch $url, server threw $http_result") if $http_result != 200;
+ $self->throw("failed to fetch $url, server threw " . $http_result->code) if !$http_result->is_success;
$input = $tempfile;
$file = $tempfile;
29 Bio/Search/HSP/ModelHSP.pm
@@ -404,33 +404,6 @@ sub get_aln {
return $aln;
}
-=head2 seq_inds
-
- Title : seq_inds
- Purpose : Get a list of residue positions (indices) for all identical
- : or conserved residues in the query or sbjct sequence.
- Example : @s_ind = $hsp->seq_inds('query', 'identical');
- : @h_ind = $hsp->seq_inds('hit', 'conserved');
- : @h_ind = $hsp->seq_inds('hit', 'conserved', 1);
- Returns : List of integers
- : May include ranges if collapse is true.
- Argument : seq_type = 'query' or 'hit' or 'sbjct' (default = query)
- : ('sbjct' is synonymous with 'hit')
- : class = 'identical' or 'conserved' or 'nomatch' or 'gap'
- : (default = identical)
- : (can be shortened to 'id' or 'cons')
- :
- : collapse = boolean, if true, consecutive positions are merged
- : using a range notation, e.g., "1 2 3 4 5 7 9 10 11"
- : collapses to "1-5 7 9-11". This is useful for
- : consolidating long lists. Default = no collapse.
- Throws : n/a.
- Comments :
-
-See Also : L<Bio::Search::BlastUtils::collapse_nums()|Bio::Search::BlastUtils>, L<Bio::Search::Hit::HitI::seq_inds()|Bio::Search::Hit::HitI>
-
-=cut
-
=head2 Inherited from Bio::SeqFeature::SimilarityPair
These methods come from Bio::SeqFeature::SimilarityPair
@@ -488,7 +461,7 @@ These methods come from Bio::SeqFeature::SimilarityPair
The following methods have been overridden due to their current reliance on
sequence-based queries. They may be implemented in future versions of this class.
-=head2 frac_identical
+=head2 seq_inds
=cut
2,465 Bio/SeqIO/chadoxml.pm
@@ -55,7 +55,7 @@ This is currently a write-only module.
-seq_so_type=>'gene',
-src_feature=>'X',
-src_feat_type=>'chromosome_arm',
- -nounflatten=>1,
+ -nounflatten=>1,
-is_analysis=>'true',
-data_source=>'GenBank');
@@ -80,64 +80,64 @@ containment hierarchy conforming to chado central dogma model: gene
Destination of data in the subject Bio::Seq object $seq is as following:
- *$seq->display_id: name of the top-level feature;
+ *$seq->display_id: name of the top-level feature;
- *$seq->accession_number: if defined, uniquename and
- feature_dbxref of the top-level
- feature if not defined,
- $seq->display_id is used as the
- uniquename of the top-level feature;
+ *$seq->accession_number: if defined, uniquename and
+ feature_dbxref of the top-level
+ feature if not defined,
+ $seq->display_id is used as the
+ uniquename of the top-level feature;
- *$seq->molecule: transformed to SO type, used as the feature
- type of the top-level feature if -seq_so_type
- argument is supplied, use the supplied SO type
- as the feature type of the top-level feature;
+ *$seq->molecule: transformed to SO type, used as the feature
+ type of the top-level feature if -seq_so_type
+ argument is supplied, use the supplied SO type
+ as the feature type of the top-level feature;
- *$seq->species: organism of the top-level feature;
+ *$seq->species: organism of the top-level feature;
- *$seq->seq: residues of the top-level feature;
+ *$seq->seq: residues of the top-level feature;
- *$seq->is_circular, $seq->division: feature_cvterm;
+ *$seq->is_circular, $seq->division: feature_cvterm;
- *$seq->keywords, $seq->desc, comments: featureprop;
+ *$seq->keywords, $seq->desc, comments: featureprop;
- *references: pub and feature_pub;
- medline/pubmed ids: pub_dbxref;
- comments: pubprop;
+ *references: pub and feature_pub;
+ medline/pubmed ids: pub_dbxref;
+ comments: pubprop;
- *feature "source" span: featureloc for top-level feature;
+ *feature "source" span: featureloc for top-level feature;
- *feature "source" db_xref: feature_dbxref for top-level feature;
+ *feature "source" db_xref: feature_dbxref for top-level feature;
- *feature "source" other tags: featureprop for top-level feature;
+ *feature "source" other tags: featureprop for top-level feature;
- *subfeature 'symbol' or 'label' tag: feature uniquename, if
+ *subfeature 'symbol' or 'label' tag: feature uniquename, if
none of these is present, the chadoxml object
generates feature uniquenames as:
<gene>-<feature_type>-<span>
(e.g. foo-mRNA--1000..3000);
- *gene model: feature_relationship built based on the
+ *gene model: feature_relationship built based on the
containment hierarchy;
- *feature span: featureloc;
+ *feature span: featureloc;
- *feature accession numbers: feature_dbxref;
+ *feature accession numbers: feature_dbxref;
- *feature tags (except db_xref, symbol and gene): featureprop;
+ *feature tags (except db_xref, symbol and gene): featureprop;
Things to watch out for:
- *chado schema change: this version works with the chado
+ *chado schema change: this version works with the chado
version tagged chado_1_01 in GMOD CVS.
- *feature uniquenames: especially important if using XORT
+ *feature uniquenames: especially important if using XORT
loader to do incremental load into
chado. may need pre-processing of the
source data to put the correct
uniquenames in place.
- *pub uniquenames: chadoxml->write_seq() has the FlyBase policy
+ *pub uniquenames: chadoxml->write_seq() has the FlyBase policy
on pub uniquenames hard-coded, it assigns
pub uniquenames in the following way: for
journals and books, use ISBN number; for
@@ -147,7 +147,7 @@ Things to watch out for:
implement your policy. look for the comments
in the code.
- *for pubs possibly existing in chado but with no knowledge of
+ *for pubs possibly existing in chado but with no knowledge of
its uniquename:put "op" as "match", then need to run the
output chadoxml through a special filter that
talks to chado database and tries to find the
@@ -160,9 +160,9 @@ Things to watch out for:
case. please modify to work according to your
rules.
- *chado initialization for loading:
+ *chado initialization for loading:
- cv & cvterm: in the output chadoxml, all cv's and
+ cv & cvterm: in the output chadoxml, all cv's and
cvterm's are lookup only. Therefore,
before using XORT loader to load the
output into chado, chado must be
@@ -247,29 +247,29 @@ undef(my %datahash); #data from Bio::Seq object stored in a hash
my $chadotables = 'feature featureprop feature_relationship featureloc feature_cvterm cvterm cv feature_pub pub pub_dbxref pub_author author pub_relationship pubprop feature_dbxref dbxref db synonym feature_synonym';
my %fkey = (
- "cvterm.cv_id" => "cv",
+ "cvterm.cv_id" => "cv",
"cvterm.dbxref_id" => "dbxref",
- "dbxref.db_id" => "db",
- "feature.type_id" => "cvterm",
- "feature.organism_id" => "organism",
- "feature.dbxref_id" => "dbxref",
- "featureprop.type_id" => "cvterm",
- "feature_pub.pub_id" => "pub",
- "feature_cvterm.cvterm_id" => "cvterm",
- "feature_cvterm.pub_id" => "pub",
+ "dbxref.db_id" => "db",
+ "feature.type_id" => "cvterm",
+ "feature.organism_id" => "organism",
+ "feature.dbxref_id" => "dbxref",
+ "featureprop.type_id" => "cvterm",
+ "feature_pub.pub_id" => "pub",
+ "feature_cvterm.cvterm_id" => "cvterm",
+ "feature_cvterm.pub_id" => "pub",
"feature_cvterm.feature_id" => "feature",
- "feature_dbxref.dbxref_id" => "dbxref",
- "feature_relationship.object_id" => "feature",
- "feature_relationship.subject_id" => "feature",
- "feature_relationship.type_id" => "cvterm",
- "featureloc.srcfeature_id" => "feature",
- "pub.type_id" => "cvterm",
- "pub_dbxref.dbxref_id" => "dbxref",
- "pub_author.author_id" => "author",
- "pub_relationship.obj_pub_id" => "pub",
- "pub_relationship.subj_pub_id" => "pub",
- "pub_relationship.type_id" => "cvterm",
- "pubprop.type_id" => "cvterm",
+ "feature_dbxref.dbxref_id" => "dbxref",
+ "feature_relationship.object_id" => "feature",
+ "feature_relationship.subject_id" => "feature",
+ "feature_relationship.type_id" => "cvterm",
+ "featureloc.srcfeature_id" => "feature",
+ "pub.type_id" => "cvterm",
+ "pub_dbxref.dbxref_id" => "dbxref",
+ "pub_author.author_id" => "author",
+ "pub_relationship.obj_pub_id" => "pub",
+ "pub_relationship.subj_pub_id" => "pub",
+ "pub_relationship.type_id" => "cvterm",
+ "pubprop.type_id" => "cvterm",
"feature_synonym.feature_id" => "feature",
"feature_synonym.synonym_id" => "synonym",
"feature_synonym.pub_id" => "pub",
@@ -283,22 +283,22 @@ my %cv_name = (
);
my %feattype_args2so = (
- "aberr" => "aberration_junction",
-# "conflict" => "sequence_difference",
-# "polyA_signal" => "polyA_signal_sequence",
- "variation" => "sequence_variant",
- "mutation1" => "point_mutation", #for single-base mutation
- "mutation2" => "sequence_variant", #for multi-base mutation
- "rescue" => "rescue_fragment",
-# "rfrag" => "restriction_fragment",
- "protein_bind" => "protein_binding_site",
- "misc_feature" => "region",
-# "prim_transcript" => "primary_transcript",
- "CDS" => "polypeptide",
- "reg_element" => "regulatory_region",
- "seq_variant" => "sequence_variant",
- "mat_peptide" => "mature_peptide",
- "sig_peptide" => "signal_peptide",
+ "aberr" => "aberration_junction",
+# "conflict" => "sequence_difference",
+# "polyA_signal" => "polyA_signal_sequence",
+ "variation" => "sequence_variant",
+ "mutation1" => "point_mutation", #for single-base mutation
+ "mutation2" => "sequence_variant", #for multi-base mutation
+ "rescue" => "rescue_fragment",
+# "rfrag" => "restriction_fragment",
+ "protein_bind" => "protein_binding_site",
+ "misc_feature" => "region",
+# "prim_transcript" => "primary_transcript",
+ "CDS" => "polypeptide",
+ "reg_element" => "regulatory_region",
+ "seq_variant" => "sequence_variant",
+ "mat_peptide" => "mature_peptide",
+ "sig_peptide" => "signal_peptide",
);
undef(my %organism);
@@ -328,99 +328,103 @@ sub _initialize {
Title : write_seq
Usage : $stream->write_seq(-seq=>$seq, -seq_so_type=>$seqSOtype,
- -src_feature=>$srcfeature,
- -src_feat_type=>$srcfeattype,
- -nounflatten=>0 or 1,
- -is_analysis=>'true' or 'false',
- -data_source=>$datasource)
+ -src_feature=>$srcfeature,
+ -src_feat_type=>$srcfeattype,
+ -nounflatten=>0 or 1,
+ -is_analysis=>'true' or 'false',
+ -data_source=>$datasource)
Function: writes the $seq object (must be seq) into chadoxml.
- Current implementation:
- 1. for non-mRNA records,
- a top-level feature of type $seq->alphabet is
- generated for the whole GenBank record, features listed
- are unflattened for DNA records to build gene model
- feature graph, and for the other types of records all
- features in $seq are treated as subfeatures of the top-level
- feature.
- 2. for mRNA records,
- if a 'gene' feature is present, it B<must> have a /symbol
- or /label tag to contain the uniquename of the gene. a top-
- level feature of type 'gene' is generated. the mRNA is written
- as a subfeature of the top-level gene feature, and the other
- sequence features listed in $seq are treated as subfeatures
- of the mRNA feature.
Returns : 1 for success and 0 for error
+ Args : A Bio::Seq object $seq, optional $seqSOtype, $srcfeature,
+ $srcfeattype, $nounflatten, $is_analysis and $data_source.
+When $srcfeature (a string, the uniquename of the source feature) is given, the
+location and strand information of the top-level feature against the source
+feature will be derived from the sequence feature called 'source' of the $seq
+object, and a featureloc record is generated for the top-level feature on
+$srcfeature. When $srcfeature is given, $srcfeattype must also be present. All
+feature coordinates in $seq should be against $srcfeature. $seqSOtype is the
+optional SO term to use as the type of the top-level feature. For example, a
+GenBank data file for a Drosophila melanogaster genome scaffold has the molecule
+type of "DNA"; when converting to chadoxml, a $seqSOtype argument of
+"golden_path_region" can be supplied to save the scaffold as a feature of type
+"golden_path_region" in chadoxml, instead of "DNA". A feature with primary tag
+of 'source' must be present in the sequence feature list of $seq, to describe the
+whole sequence record.
- Args : A Bio::Seq object $seq, optional $seqSOtype, $srcfeature,
- $srcfeattype, $nounflatten, $is_analysis and $data_source.
- when $srcfeature (a string, the uniquename of the source
- feature) is given, the location and strand information of
- the top-level feature against the source feature will be
- derived from the sequence feature called 'source' of the
- $seq object, a featureloc record is generated for the top
- -level feature on $srcfeature. when $srcfeature is given,
- $srcfeattype must also be present. All feature coordinates
- in $seq should be against $srcfeature. $seqSOtype is the
- optional SO term to use as the type of the top-level feature.
- For example, a GenBank data file for a Drosophila melanogaster
- genome scaffold has the molecule type of "DNA", when
- converting to chadoxml, a $seqSOtype argument of
- "golden_path_region" can be supplied to save the scaffold
- as a feature of type "golden_path_region" in chadoxml, instead
- of "DNA". a feature with primary tag of 'source' must be
- present in the sequence feature list of $seq, to decribe the
- whole sequence record.
+In the current implementation:
+
+=over 3
+
+=item *
+
+non-mRNA records
+
+A top-level feature of type $seq-E<gt>alphabet is generated for the whole GenBank
+record, features listed are unflattened for DNA records to build gene model
+feature graph, and for the other types of records all features in $seq are
+treated as subfeatures of the top-level feature.
+=item *
+
+mRNA records
+
+If a 'gene' feature is present, it B<must> have a /symbol or /label tag to
+contain the uniquename of the gene. A top-level feature of type 'gene' is
+generated. The mRNA is written as a subfeature of the top-level gene feature,
+and the other sequence features listed in $seq are treated as subfeatures of the
+mRNA feature.
+
+=back
=cut
sub write_seq {
- my $usage = <<EOUSAGE;
+ my $usage = <<EOUSAGE;
Bio::SeqIO::chadoxml->write_seq()
Usage : \$stream->write_seq(-seq=>\$seq,
- -seq_so_type=>\$SOtype,
- -src_feature=>\$srcfeature,
- -src_feat_type=>\$srcfeattype,
- -nounflatten=>0 or 1,
+ -seq_so_type=>\$SOtype,
+ -src_feature=>\$srcfeature,
+ -src_feat_type=>\$srcfeattype,
+ -nounflatten=>0 or 1,
-is_analysis=>'true' or 'false',
-data_source=>\$datasource)
-Args : \$seq : a Bio::Seq object
- \$SOtype : the SO term to use as the feature type of
- the \$seq record, optional
- \$srcfeature : unique name of the source feature, a string
- containing at least one alphabetical letter
- (a-z, A-Z), optional
- \$srcfeattype : feature type of \$srcfeature. one of SO terms.
- optional
- when \$srcfeature is given, \$srcfeattype becomes mandatory,
- \$datasource : source of the sequence annotation data,
- e.g. 'GenBank' or 'GFF'.
+Args : \$seq : a Bio::Seq object
+ \$SOtype : the SO term to use as the feature type of
+ the \$seq record, optional
+ \$srcfeature : unique name of the source feature, a string
+ containing at least one alphabetical letter
+ (a-z, A-Z), optional
+ \$srcfeattype : feature type of \$srcfeature. one of SO terms.
+ optional
+ when \$srcfeature is given, \$srcfeattype becomes mandatory,
+ \$datasource : source of the sequence annotation data,
+ e.g. 'GenBank' or 'GFF'.
EOUSAGE
- my ($self,@args) = @_;
+ my ($self,@args) = @_;
- my ($seq, $seq_so_type, $srcfeature, $srcfeattype, $nounflatten, $isanalysis, $datasource, $genus, $species) =
- $self->_rearrange([qw(SEQ
- SEQ_SO_TYPE
- SRC_FEATURE
- SRC_FEAT_TYPE
- NOUNFLATTEN
- IS_ANALYSIS
- DATA_SOURCE
+ my ($seq, $seq_so_type, $srcfeature, $srcfeattype, $nounflatten, $isanalysis, $datasource, $genus, $species) =
+ $self->_rearrange([qw(SEQ
+ SEQ_SO_TYPE
+ SRC_FEATURE
+ SRC_FEAT_TYPE
+ NOUNFLATTEN
+ IS_ANALYSIS
+ DATA_SOURCE
GENUS
SPECIES
- )],
- @args);
- #print "$seq_so_type, $srcfeature, $srcfeattype\n";
+ )],
+ @args);
+ #print "$seq_so_type, $srcfeature, $srcfeattype\n";
- if( !defined $seq ) {
- $self->throw("Attempting to write with no seq!");
- }
+ if( !defined $seq ) {
+ $self->throw("Attempting to write with no seq!");
+ }
- if( ! ref $seq || ! $seq->isa('Bio::Seq::RichSeqI') ) {
- ## FIXME $self->warn(" $seq is not a RichSeqI compliant module. Attempting to dump, but may fail!");
- }
+ if( ! ref $seq || ! $seq->isa('Bio::Seq::RichSeqI') ) {
+ ## FIXME $self->warn(" $seq is not a RichSeqI compliant module. Attempting to dump, but may fail!");
+ }
# try to get the srcfeature from the seqFeature object
# for this to work, the user has to pass in the srcfeature type
@@ -430,124 +434,124 @@ EOUSAGE
}
}
- #$srcfeature, when provided, should contain at least one alphabetical letter
- if (defined $srcfeature)
- {
- if ($srcfeature =~ /[a-zA-Z]/)
- {
- chomp($srcfeature);
- } else {
- $self->throw( $usage );
- }
-
- #check for mandatory $srcfeattype
- if (! defined $srcfeattype)
- {
- $self->throw( $usage );
- #$srcfeattype must be a string of non-whitespace characters
- } else {
- if ($srcfeattype =~ /\S+/) {
- chomp($srcfeattype);
- } else {
- $self->throw( $usage );
- }
- }
- }
-
- # variables local to write_seq()
+ #$srcfeature, when provided, should contain at least one alphabetical letter
+ if (defined $srcfeature)
+ {
+ if ($srcfeature =~ /[a-zA-Z]/)
+ {
+ chomp($srcfeature);
+ } else {
+ $self->throw( $usage );
+ }
+
+ #check for mandatory $srcfeattype
+ if (! defined $srcfeattype)
+ {
+ $self->throw( $usage );
+ #$srcfeattype must be a string of non-whitespace characters
+ } else {
+ if ($srcfeattype =~ /\S+/) {
+ chomp($srcfeattype);
+ } else {
+ $self->throw( $usage );
+ }
+ }
+ }
+
+ # variables local to write_seq()
my $div = undef;
- my $hkey = undef;
- undef(my @top_featureprops);
+ my $hkey = undef;
+ undef(my @top_featureprops);
undef(my @featuresyns);
undef(my @top_featurecvterms);
- my $name = $seq->display_id if $seq->can('display_id');
+ my $name = $seq->display_id if $seq->can('display_id');
$name = $seq->display_name if $seq->can('display_name');
- undef(my @feature_cvterms);
- undef(my %sthash);
- undef(my %dvhash);
- undef(my %h1);
- undef(my %h2);
- my $temp = undef;
- my $ann = undef;
- undef(my @references);
- undef(my @feature_pubs);
- my $ref = undef;
- my $location = undef;
- my $fbrf = undef;
- my $journal = undef;
- my $issue = undef;
- my $volume = undef;
- my $volumeissue = undef;
- my $pages = undef;
- my $year = undef;
- my $pubtype = undef;
-# my $miniref= undef;
- my $uniquename = undef;
- my $refhash = undef;
- my $feat = undef;
- my $tag = undef;
- my $tag_cv = undef;
- my $ftype = undef;
- my $subfeatcnt = undef;
- undef(my @top_featrels);
- undef (my %srcfhash);
-
- local($^W) = 0; # supressing warnings about uninitialized fields.
+ undef(my @feature_cvterms);
+ undef(my %sthash);
+ undef(my %dvhash);
+ undef(my %h1);
+ undef(my %h2);
+ my $temp = undef;
+ my $ann = undef;
+ undef(my @references);
+ undef(my @feature_pubs);
+ my $ref = undef;
+ my $location = undef;
+ my $fbrf = undef;
+ my $journal = undef;
+ my $issue = undef;
+ my $volume = undef;
+ my $volumeissue = undef;
+ my $pages = undef;
+ my $year = undef;
+ my $pubtype = undef;
+# my $miniref= undef;
+ my $uniquename = undef;
+ my $refhash = undef;
+ my $feat = undef;
+ my $tag = undef;
+ my $tag_cv = undef;
+ my $ftype = undef;
+ my $subfeatcnt = undef;
+ undef(my @top_featrels);
+ undef (my %srcfhash);
+
+ local($^W) = 0; # suppressing warnings about uninitialized fields.
if (!$name && $seq->can('attributes') ) {
($name) = $seq->attributes('Alias');
}
- if ($seq->can('accession_number') && defined $seq->accession_number && $seq->accession_number ne 'unknown') {
- $uniquename = $seq->accession_number;
- } elsif ($seq->can('accession') && defined $seq->accession && $seq->accession ne 'unknown') {
- $uniquename = $seq->accession;
- } elsif ($seq->can('attributes')) {
+ if ($seq->can('accession_number') && defined $seq->accession_number && $seq->accession_number ne 'unknown') {
+ $uniquename = $seq->accession_number;
+ } elsif ($seq->can('accession') && defined $seq->accession && $seq->accession ne 'unknown') {
+ $uniquename = $seq->accession;
+ } elsif ($seq->can('attributes')) {
($uniquename) = $seq->attributes('load_id');
} else {
- $uniquename = $name;
- }
+ $uniquename = $name;
+ }
my $len = $seq->length();
- if ($len == 0) {
- $len = undef;
- }
-
- undef(my $gb_type);
- if (!$seq->can('molecule') || ! defined ($gb_type = $seq->molecule()) ) {
- $gb_type = $seq->can('alphabet') ? $seq->alphabet : 'DNA';
- }
- $gb_type = 'DNA' if $ftype eq 'dna';
- $gb_type = 'RNA' if $ftype eq 'rna';
-
- if(length $seq_so_type > 0) {
- if (defined $seq_so_type) {
- $ftype = $seq_so_type;
- }
- elsif ($seq->type) {
- $ftype = ($seq->type =~ /(.*):/)
- ? $1
- : $seq->type;
- }
- else {
- $ftype = $gb_type;
- }
- }
- else {
- $ftype = $gb_type;
- }
-
- my %ftype_hash = $self->return_ftype_hash($ftype);
+ if ($len == 0) {
+ $len = undef;
+ }
+
+ undef(my $gb_type);
+ if (!$seq->can('molecule') || ! defined ($gb_type = $seq->molecule()) ) {
+ $gb_type = $seq->can('alphabet') ? $seq->alphabet : 'DNA';
+ }
+ $gb_type = 'DNA' if $ftype eq 'dna';
+ $gb_type = 'RNA' if $ftype eq 'rna';
+
+ if(length $seq_so_type > 0) {
+ if (defined $seq_so_type) {
+ $ftype = $seq_so_type;
+ }
+ elsif ($seq->type) {
+ $ftype = ($seq->type =~ /(.*):/)
+ ? $1
+ : $seq->type;
+ }
+ else {
+ $ftype = $gb_type;
+ }
+ }
+ else {
+ $ftype = $gb_type;
+ }
+
+ my %ftype_hash = $self->return_ftype_hash($ftype);
if ($species) {
%organism = ("genus"=>$genus, "species" => $species);
}
else {
- my $spec = $seq->species();
- if (!defined $spec) {
- $self->throw("$seq does not know what organism it is from, which is required by chado. cannot proceed!\n");
- } else {
- %organism = ("genus"=>$spec->genus(), "species" => $spec->species());
- }
+ my $spec = $seq->species();
+ if (!defined $spec) {
+ $self->throw("$seq does not know what organism it is from, which is required by chado. cannot proceed!\n");
+ } else {
+ %organism = ("genus"=>$spec->genus(), "species" => $spec->species());
+ }
}
my $residues;
@@ -561,22 +565,22 @@ EOUSAGE
$residues = '';
}
- #set is_analysis flag for gene model features
- undef(my $isanal);
- if ($ftype eq 'gene' || $ftype eq 'mRNA' || $ftype eq 'exon' || $ftype eq 'protein' || $ftype eq 'polypeptide') {
- $isanal = $isanalysis;
- $isanal = 'false' if !defined $isanal;
- }
-
- %datahash = (
- "name" => $name,
- "uniquename" => $uniquename,
- "seqlen" => $len,
- "residues" => $residues,
- "type_id" => \%ftype_hash,
- "organism_id" => \%organism,
-