Adding molecular cloning methods to Bio::SeqUtils #31

Merged
merged 10 commits into bioperl:master on Jan 13, 2012

@fschwach

As discussed on the mailing list:

I needed to manipulate Bio::Seq objects with annotations and sequence
features to simulate molecular cloning techniques, e.g. to cut a vector
and insert a fragment into it while preserving all the annotations and
moving the features accordingly.
My main aim was to split features that span deletion/insertion sites in
a meaningful way, which cannot be done with the currently available
methods.
I have modified Bio::SeqUtils so that I have the following new methods:

delete

removes a segment from a sequence object and adjusts positions and types
of locations of sequence features:

  • locations of features that span the deletion sites are turned into
    Splits.
  • locations that extend into the deleted region are turned to Fuzzy to
    indicate that their true start/end was lost.
  • locations contained inside the deleted regions are lost.
  • other features are shifted according to the length of the deletion.
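
The four cases above can be sketched independently of BioPerl as plain coordinate arithmetic. This is only an illustration, not the code in this pull request: the names `Segment` and `adjust_for_deletion` are hypothetical, coordinates are assumed to be 1-based and inclusive, and "Fuzzy" is reduced to a boolean flag on the truncated end.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """One sub-location of a feature after the edit (hypothetical type)."""
    start: int
    end: int
    fuzzy_start: bool = False
    fuzzy_end: bool = False


def adjust_for_deletion(start: int, end: int, left: int, right: int) -> List[Segment]:
    """Map a feature start..end onto a sequence from which left..right was removed.

    All coordinates are assumed to be 1-based and inclusive.
    """
    length = right - left + 1
    if end < left:
        # Entirely upstream of the deletion: unchanged.
        return [Segment(start, end)]
    if start > right:
        # Entirely downstream: shifted by the length of the deletion.
        return [Segment(start - length, end - length)]
    if start >= left and end <= right:
        # Contained inside the deleted region: the location is lost.
        return []
    if start < left and end > right:
        # Spans the deletion: becomes a split of two sub-locations.
        return [Segment(start, left - 1), Segment(left, end - length)]
    if start < left:
        # Extends into the deletion from upstream: true end was lost.
        return [Segment(start, left - 1, fuzzy_end=True)]
    # Extends into the deletion from downstream: true start was lost.
    return [Segment(left, end - length, fuzzy_start=True)]
```

For example, with a deletion of positions 20..30, a feature at 10..50 would come out as the two parts 10..19 and 20..39 under these assumptions.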

insert

adds a Bio::Seq object into another one between specified insertion
sites. This also affects the features on the recipient sequence:

  • locations of features that span the insertion site are split but
    position types are not turned to Fuzzy because no part of the original
    feature is lost.
  • other features are shifted according to the length of the insertion.
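
The same kind of sketch works for insertion. Again this is a library-independent illustration rather than the method added here: `adjust_for_insertion` is a hypothetical name, coordinates are assumed to be 1-based and inclusive, and the fragment is assumed to go between positions `pos` and `pos + 1` of the recipient.

```python
from typing import List, Tuple


def adjust_for_insertion(start: int, end: int, pos: int, frag_len: int) -> List[Tuple[int, int]]:
    """Map a feature start..end onto a recipient that received an insertion.

    The fragment of length frag_len is assumed to be inserted between
    positions pos and pos + 1 (1-based, inclusive coordinates).
    """
    if end <= pos:
        # Entirely upstream of the insertion site: unchanged.
        return [(start, end)]
    if start > pos:
        # Entirely downstream: shifted by the length of the insertion.
        return [(start + frag_len, end + frag_len)]
    # Spans the insertion site: split into two parts. Nothing is lost,
    # so neither part needs to be marked as fuzzy.
    return [(start, pos), (pos + 1 + frag_len, end + frag_len)]
```

With a 5 bp fragment inserted after position 20, a feature at 10..50 would become 10..20 plus 26..55 under these assumptions.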

ligate

just for convenience. Supply a recipient, a fragment and one or two
sites to cut the recipient. Can also flip the fragment if required.
Simply calls delete [, reverse_complement_with_features] and insert in
turn.
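
At the sequence level, the composition described for ligate can be sketched on plain strings. This is a hedged illustration only: the function names and arguments below are hypothetical, not the BioPerl signatures, and feature handling is left out. It assumes `left` and `right` are the retained flanking positions (1-based), so that adjacent sites mean nothing is deleted.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcom(seq: str) -> str:
    """Reverse complement of a plain DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def delete(seq: str, left: int, right: int) -> str:
    """Remove positions left..right (1-based, inclusive)."""
    return seq[:left - 1] + seq[right:]


def insert(seq: str, fragment: str, pos: int) -> str:
    """Insert fragment between positions pos and pos + 1 (1-based)."""
    return seq[:pos] + fragment + seq[pos:]


def ligate(recipient: str, fragment: str, left: int,
           right: Optional[int] = None, flip: bool = False) -> str:
    """Cut the recipient at one or two sites and put the fragment in.

    Assumption: bases strictly between left and right are removed;
    with a single site, right defaults to left + 1 (no deletion).
    """
    if right is None:
        right = left + 1
    if right - left > 1:
        recipient = delete(recipient, left + 1, right - 1)
    if flip:
        fragment = revcom(fragment)
    return insert(recipient, fragment, left)
```

For instance, `ligate("AAAACCCCGGGG", "TT", 4, 9)` removes the four C bases and yields `"AAAATTGGGG"` in this sketch.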

One situation I haven't handled yet is a deletion that spans the origin
of a circular molecule but that should be a rare thing to do anyway. The
code currently throws an error if this is attempted.

Frank Schwach added some commits Jan 10, 2012
Frank Schwach Added methods for in-silico molecular cloning to Bio::SeqUtils.
delete: remove a segment from a sequence object, preserving annotations
and features.
insert: insert a fragment sequence object into a recipient sequence object,
preserving features and annotations
ligate: combine delete and insert to simulate digestion of a recipient
and ligation of a fragment into the recipient.
f489238
Frank Schwach Added to Contributors 6c1db90
Frank Schwach Added POD for new features d31ea3f
Frank Schwach Changed recursive acquisition of sub features for deletion/insertion from using remove_SeqFeatures to get_SeqFeatures. The former has the side-effect of modifying the original sequence object, which should be avoided
54322ef
Frank Schwach added tests for delete/insert/ligate methods 7c9e48d
Frank Schwach modified existing methods _coord_revcom and _coord_adjust to make use of the new methods _single_loc_object_from_collection and _location_objects_from_coordinate_list to reduce code-duplication
e0e063f
@fschwach

One thing I was considering while writing the code was to use Clone::Fast to generate the new objects for the recipient sequence. Currently, the code asks if the sequence object is allowed to call "new" on its class and if not, creates a PrimarySeq object instead. If we could simply clone the object, we would not have to do this.
I'm just wondering if there is any reason (which I have not come across) why Clone::Fast (or any other Clone module) should not be used here - could there be problems in a threaded environment?

Frank Schwach added some commits Jan 10, 2012
Frank Schwach corrected call to revcom_with_features in "ligate" and corrected POD for method "ligate"
8b26e80
Frank Schwach corrected call to 'ligate' method with named parameters 1190cca
Frank Schwach changed behaviour of feature ends in deletions: a deletion no longer
turns truncated feature ends Fuzzy. Instead, as suggested by Roy
Chaudhuri and Chris Fields, they don't change type but a note is added
to the feature, informing about the length and position of the deletion.
Notes are now also added to features that have received an insertion.
The notes refer to the affected feature end as 3'/5' if the feature has
a strand, or start/end if it doesn't.
Also corrected an error in calculating the start position of subfeatures
that are created by insertions (was off by 1).
Added tests for the notes and removed tests for changed location
types
72ac9a8
Frank Schwach Added feature for deletion sites and bugfixes
'delete' method now adds a misc_feature with a note about
the length of the deletion site. The location type of this feature
is IN-BETWEEN.

Features of type IN-BETWEEN must have adjacent start/
end pos, so they are now deleted in the 'insert' method if they
co-localise with the insertion site. This happens when 'delete' is
followed by 'insert' or when using the 'ligate' shortcut method.
'insert' now also handles deleted features like 'delete', which
only applies to features with IN-BETWEEN locations.

Other fixes:

 - 'ligate' now skips the 'delete' step if 'left' and 'right' are
   adjacent because no deletion actually occurs.

 - '_coord_adjust_deletion': fixed test for splitting a feature.
   Cannot use 'contains' because it returns true if one or both
   coordinates of the feature and the deletion co-localise, but a
   split only makes sense when both ends overlap.

 - added and modified tests accordingly
014eda4
@cjfields
Member

BioPerl objects can be cloned (there is a Bio::Root::Root::clone() method). This will use either Clone or Storable (Clone preferentially, Storable as the core fallback). I haven't used Clone::Fast, but it appears to use Clone as well.

@cjfields
Member

Merging this in, btw.

@cjfields cjfields merged commit e2b0616 into bioperl:master Jan 13, 2012