crash on the attempt to store same sequence in a diff. namespace #5

Open
cjfields opened this issue Oct 7, 2015 · 4 comments
@cjfields

cjfields commented Oct 7, 2015


Author Name: Dmitry Samborskiy (Dmitry Samborskiy)
Original Redmine Issue: 2280, https://redmine.open-bio.org/issues/2280
Original Date: 2007-04-27
Original Assignee: Bioperl Guts


Hi All,

I’ve found that a ‘Duplicate entry’ crash occurs if I store the same
sequence a second time (but in a different namespace).

The attached archive contains a complete and (I believe) reproducible
example of this issue.

I use the stable bioperl-1.5.2/bioperl-db-1.5.2 releases against a
mysql-4.1.16 database server.

Thanks in advance,
Dmitry Samborskiy

P.S. I got the following output:

--------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values were ("","Direct Submission","Submitted (10-JUL-2004) National Center for Biotechnology Information, NIH, Bethesda, MD 20894, USA","CRC-7AF85E0508A630AE","1","3429","") FKs ()
Duplicate entry 'CRC-7AF85E0508A630AE' for key 3
---------------------------------------------------
Could not store NC_005982:
------------- EXCEPTION -------------
MSG: create: object (Bio::Annotation::Reference) failed to insert or to be found by unique key
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /tmp/perl/lib/perl5/site_perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /tmp/perl/lib/perl5/site_perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK Bio::DB::Persistent::PersistentObject::store /tmp/perl/lib/perl5/site_perl/5.8.6/Bio/DB/Persistent/PersistentObject.pm:271
STACK Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children /tmp/perl/lib/perl5/site_perl/5.8.6/Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:217
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /tmp/perl/lib/perl5/site_perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /tmp/perl/lib/perl5/site_perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK Bio::DB::Persistent::PersistentObject::store /tmp/perl/lib/perl5/site_perl/5.8.6/Bio/DB/Persistent/PersistentObject.pm:271
STACK Bio::DB::BioSQL::SeqAdaptor::store_children /tmp/perl/lib/perl5/site_perl/5.8.6/Bio/DB/BioSQL/SeqAdaptor.pm:224
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /tmp/perl/lib/perl5/site_perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /tmp/perl/lib/perl5/site_perl/5.8.6/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK Bio::DB::Persistent::PersistentObject::store /tmp/perl/lib/perl5/site_perl/5.8.6/Bio/DB/Persistent/PersistentObject.pm:271
STACK (eval) ./load_seqdatabase.pl:620
STACK toplevel ./load_seqdatabase.pl:602

at ./load_seqdatabase.pl line 633

@cjfields

cjfields commented Oct 7, 2015


Original Redmine Comment
Author Name: Dmitry Samborskiy
Original Date: 2007-04-27T18:10:09Z


Created an attachment (id=639)
Test example

@cjfields

cjfields commented Oct 7, 2015


Original Redmine Comment
Author Name: Chris Fields
Original Date: 2008-03-05T17:13:33Z


I’m not sure how you are using load_seqdatabase.pl here. I think the script by default assumes you are loading new sequences into the database unless you specify options such as ‘remove’, ‘update’, or ‘safe’; otherwise it dies if duplicates might be inserted into the database (‘safe’ just bypasses the errors, and I believe ‘remove’ and ‘update’ do what their names suggest).

The test script you attached also tries to switch the namespace directly by getting the persistent object from the database, assigning it a new namespace, and then storing it. The problem with this approach is that you are attempting to store the object using the same assigned primary_key (so it would indeed move it, as you’re updating the current object, not doing a create()). Notably, using create() with a persistent object that has an assigned primary_key() gets you an error (and a hint):

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: must not change primary_key() once it is set
STACK: Error::throw
STACK: Bio::Root::Root::throw /Users/cjfields/bioperl/bioperl-live/Bio/Root/Root.pm:357
STACK: Bio::DB::Persistent::PersistentObject::primary_key /Users/cjfields/bioperl/db/Bio/DB/Persistent/PersistentObject.pm:321
STACK: Bio::DB::Persistent::Seq::primary_key /Users/cjfields/bioperl/db/Bio/DB/Persistent/Seq.pm:124
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Users/cjfields/bioperl/db/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:211
STACK: Bio::DB::Persistent::PersistentObject::create /Users/cjfields/bioperl/db/Bio/DB/Persistent/PersistentObject.pm:244
STACK: test.pl:33
-----------------------------------------------------------

The way I have worked out to do this is to reset the seq object’s primary_key() by assigning it undef prior to using store() or create() (which assigns a new primary key for the object, even if it is in the same namespace):

    # store the found sequence in the second biodatabase:
    my $pseq = $seqadp->create_persistent($seq);
    $pseq->namespace($ns2);
    $pseq->primary_key(undef);
    $pseq->store(); # assign new primary key
    $seqadp->commit;

This works as long as the sequence namespace doesn’t match an already present one.

It might be worth adding some tests to make sure that remove()-ing one persistent sequence doesn’t cause problems with the other sequences in different namespaces. I would also like Hilmar to comment on whether this is an adequate solution or whether there are potential problems.

@cjfields

cjfields commented Oct 7, 2015


Original Redmine Comment
Author Name: Hilmar Lapp
Original Date: 2008-03-09T19:26:20Z


(In reply to comment #2)

I’m not sure how you are using load_seqdatabase.pl here; I think the script by
default assumes you are loading new sequences in the database unless you
specify options like ‘remove’, ‘update’, ‘safe’, etc.,

Actually, one must specify --lookup to have incoming sequences looked up against the database first. All the other switches (except --remove, which works by itself) specify what to do if the sequence is indeed found already.

Since namespace (if set) is part of the unique key of a sequence, loading the same file (or sequence) under a different namespace should indeed create a duplicate of it. The error that Dmitry reports also isn’t an error from violating the unique key on bioentry or biosequence, so it is a rather odd one and surely indicative of a bug. The intended behavior is to find the reference from the previous insert, since it will have the same unique key; bioentry has no part in the unique key of a reference, only in the association between a reference and a bioentry.
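
To make the intended behavior concrete, here is a minimal sketch in plain Perl. It is not the bioperl-db implementation and it does not touch a database: the three hashes and the helper names (store_bioentry, find_or_create_reference, link_reference) are hypothetical stand-ins, and the keys are simplified to (namespace, accession) for a bioentry and the CRC alone for a reference. It only shows that the same record stored under two namespaces should yield two bioentries, one shared reference, and two association rows.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy in-memory stand-ins for three tables. These are NOT the real BioSQL
# schema or the bioperl-db adaptors; they only model the key relationships.
my (%bioentry, %reference, %bioentry_reference);
my ($next_entry_id, $next_ref_id) = (1, 1);

# Assumption: a bioentry is unique per (namespace, accession), so the same
# accession may be stored once in each namespace.
sub store_bioentry {
    my ($namespace, $accession) = @_;
    my $key = join "\t", $namespace, $accession;
    die "Duplicate bioentry '$accession' in namespace '$namespace'\n"
        if exists $bioentry{$key};
    $bioentry{$key} = $next_entry_id++;
    return $bioentry{$key};
}

# Assumption: a reference is unique by its CRC alone. Look it up first and
# insert only when it is not found, instead of inserting blindly.
sub find_or_create_reference {
    my ($crc, $title) = @_;
    unless (exists $reference{$crc}) {
        $reference{$crc} = { id => $next_ref_id++, title => $title };
    }
    return $reference{$crc}{id};
}

# The association row is the only place where a bioentry and a reference meet.
sub link_reference {
    my ($entry_id, $ref_id) = @_;
    $bioentry_reference{"$entry_id:$ref_id"} = 1;
    return;
}

# Store the same record under two different namespaces.
for my $namespace (qw(ns1 ns2)) {
    my $entry_id = store_bioentry($namespace, 'NC_005982');
    my $ref_id   = find_or_create_reference('CRC-7AF85E0508A630AE',
                                            'Direct Submission');
    link_reference($entry_id, $ref_id);
}

printf "bioentries=%d references=%d links=%d\n",
    scalar(keys %bioentry), scalar(keys %reference),
    scalar(keys %bioentry_reference);
# prints "bioentries=2 references=1 links=2"
```

In this model, the reported "Duplicate entry" would correspond to find_or_create_reference skipping its lookup (or the lookup failing to match) and attempting the insert a second time.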

However, if I recall correctly, there was a bugfix in the ReferenceAdaptor’s implementation of its unique key search, so this might actually have been fixed in the meantime. To check, Dmitry’s test case would have to be run with the svn HEAD. I probably won’t get to this right away, but if anyone has a chance, it’d be helpful to get confirmation from someone who is set up to rerun the test against HEAD.

-hilmar

@cjfields

cjfields commented Oct 7, 2015


Original Redmine Comment
Author Name: Chris Fields
Original Date: 2008-11-29T15:37:57Z


Pushing to 1.6 bioperl-db point release.
