postgres 8.3 - load_seqdatabase.pl / swissprot #7

Open
cjfields opened this issue Oct 7, 2015 · 4 comments

cjfields commented Oct 7, 2015


Author Name: Erikjan empty (Erikjan empty)
Original Redmine Issue: 2474, https://redmine.open-bio.org/issues/2474
Original Date: 2008-03-23
Original Assignee: Bioperl Guts


Latest bioperl-live, bioperl-db, biosql schema.

Using:
PostgreSQL 8.3.1
DBD::Pg 2.3.0
perl 5.8.8

Loading uniprot_sprot.dat with load_seqdatabase.pl

Some entries are rejected. The errors and warnings are:

10 instances of the error "value too long for type character varying(40)" (all are BioCyc IDs that are a bit longer than 40 characters).
If we change the varchar(40) to varchar(128) (or something similar), these entries should be all right.

--------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were ("BioCyc","EcoCyc:ASP-SEMIALDEHYDE-DEHYDROGENASE-MON","0","") FKs ()
ERROR: value too long for type character varying(40)
---------------------------------------------------
Could not store P0A9Q9:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Annotation::DBLink) failed to insert or to be found by unique key
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:217
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: Bio::DB::BioSQL::SeqAdaptor::store_children /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/SeqAdaptor.pm:224
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: scripts/biosql/load_seqdatabase.pl:630
-----------------------------------------------------------
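
To find the affected cross-references before loading, one could scan the flat file for DR lines whose primary identifier exceeds the column width. The following is a minimal sketch in Python, assuming the usual Swiss-Prot flat-file layout (two-letter line code, data from column 6, `;`-separated DR fields). The function name `long_dbxrefs` and its `limit` parameter are hypothetical and not part of bioperl-db.

```python
from typing import Iterable, Iterator, Tuple


def long_dbxrefs(lines: Iterable[str], limit: int = 40) -> Iterator[Tuple[str, str, str]]:
    """Yield (accession, dbname, xref_id) for DR identifiers longer than `limit`.

    Assumption: Swiss-Prot flat-file layout, i.e. a two-letter line code
    followed by data starting at column 6, with DR fields separated by ';'.
    """
    accession = ""
    for line in lines:
        code, data = line[:2], line[5:].strip()
        if code == "ID":
            # A new entry starts; forget the previous entry's accession.
            accession = ""
        elif code == "AC" and not accession:
            # The first accession on the first AC line is the primary one.
            accession = data.split(";")[0].strip()
        elif code == "DR":
            fields = [f.strip() for f in data.rstrip(".").split(";")]
            if len(fields) >= 2 and len(fields[1]) > limit:
                yield accession, fields[0], fields[1]
```

Calling `long_dbxrefs(open("uniprot_sprot.dat"))` and printing each tuple would list the entries that a 40-character column cannot hold; passing `limit=128` checks whether a wider column would be enough.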

150 instances of the error "Could not store". (The offending Swiss-Prot IDs are listed below the error stack.)

Could not store P0C6J8:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Species) failed to insert or to be found by unique key
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK: Bio::DB::Persistent::PersistentObject::create /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: scripts/biosql/load_seqdatabase.pl:630
-----------------------------------------------------------

(This list also includes IDs that failed with other errors, for instance the BioCyc errors above; it is a bit hard to separate those out.)
P0C6J8 P0C695 P0C696 P0C697 P0C698 Q9QAB9 Q67924 Q9QBF2 P0C677 Q9E6S6 Q81102 P0C6H7 Q81164 P0C6H8 Q9QMI2 P0C680 P0C6I2 P0C6I3 Q69608 P0C683
P0C682 P0C6I8 P0C6I9 P0C6I6 O71303 Q37472 P54971 P85028 P0A9Q9 P0C691 Q91C36 O91533 Q4R1S7 Q4R1R9 Q9QAB8 Q9PX62 Q67925 Q9QBF1 P0C676 Q9E6S5
P0C688 Q913A7 Q81165 P0C690 Q9QMI1 P0C679 Q67878 O56655 Q69602 Q80IU7 Q9QAW8 Q80IU4 Q8JMY4 Q99HS4 Q99HR5 Q69605 Q8QZQ2 Q9IBI4 P87744 Q9YPV8
Q8JMY7 Q8JN08 Q8JMZ7 Q9J5S2 O71304 P03398 P03324 P0C6J9 Q91C37 O91532 Q4R1S8 Q4R1S0 P0C6G8 P0C6H0 P0C6H1 P0C6K6 P0C6H2 P0C6H3 Q913A8 P0C6K5
P0C6I0 P0C6I1 Q67876 O92920 P0C6I5 P89951 Q9WJE9 Q8JMZ4 P0C6J0 P0C684 Q91C35 O91534 Q4R1S6 Q4R1R8 Q9QAB7 Q9PWW3 Q67926 Q9QBF0 Q8JXB9 Q9E6S4
P31868 Q913A6 Q81162 Q998L9 Q9QMI0 Q998M2 Q67875 O92921 Q69603 Q80IU6 Q80IU3 Q99HS3 Q99HR4 Q69606 Q9IBI3 P87745 Q9WKC4 Q8JMY6 Q8JN07 Q8JMZ6
Q77NU1 O71305 P21645 P01546 P09348 P0AF06 P0A749 Q000A9 P02147 P31057 P26647 Q65399 P00529 Q9I9M4 P0AGK1 P0A887 P75728 Q91C38 O91531 Q4R1S9
Q4R1S1 P0C685 Q9PXA2 Q67923 Q9PX75 P0C678 Q9E6S8 P0C686 Q913A9 Q81163 P0C687 Q9QMI3 P0C681 Q67877 O93195 Q69604 Q80IU8 Q9QAX0 Q80IU5 Q8JMY3
Q99HR6 Q69607 Q9IBI5 P87743 Q9YJT2 Q8JMY5 Q8JN06 Q8JMZ5 Q9J5S3 O71302
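
The mixed list above could be separated by pairing each "Could not store" line with the exception MSG that follows it. This is a rough sketch in Python, assuming the loader output has the shape shown in this report (a `Could not store <ID>:` line, then an EXCEPTION banner, then a `MSG:` line). The function name `group_failures` is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

COULD_NOT_STORE = re.compile(r"^Could not store (\S+):")


def group_failures(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each exception MSG to the IDs reported as 'Could not store'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    pending: Optional[str] = None  # ID still waiting for its MSG line
    for raw in lines:
        line = raw.strip()
        match = COULD_NOT_STORE.match(line)
        if match:
            pending = match.group(1)
        elif pending is not None and line.startswith("MSG:"):
            # Only the first MSG after 'Could not store' belongs to that ID;
            # WARNING messages seen while no ID is pending are ignored.
            groups[line[len("MSG:"):].strip()].append(pending)
            pending = None
    return dict(groups)
```

Run over the saved loader output, this would give one ID list per exception message, for example one for `Bio::Species` failures and one for `Bio::Annotation::DBLink` failures.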

Then there are around 1000 WARNINGs about taxonomy (I think), of the form:
--------------------- WARNING ---------------------
MSG: The supplied lineage does not start near 'Epstein-Barr virus' (I was supplied 'Human herpesvirus 4 | Lymphocryptovirus | Gammaherpesvirinae | Herpesviridae')
---------------------------------------------------
I will attach a file with the output of
grep "^MSG:" ~/load_seqdatabase.pl.swissprot.output.txt | sort | uniq -c
(Unfortunately, the Swiss-Prot ID is not mentioned in these warnings.)

see also:

http://article.gmane.org/gmane.comp.lang.perl.bio.general/16844

http://bugzilla.open-bio.org/show_bug.cgi?id=2389

cjfields commented Oct 7, 2015


Original Redmine Comment
Author Name: Erikjan empty
Original Date: 2008-03-23T20:22:08Z


Created an attachment (id=882)
MSGs from load_seqdatabase.pl / swissprot

cjfields commented Oct 7, 2015


Original Redmine Comment
Author Name: Bank Beszteri
Original Date: 2008-04-08T03:26:19Z


Created an attachment (id=898)
Another output from load_seqdatabase.pl illustrating taxonomic conflicts between Swissprot flat file (v.13.1) & NCBI taxonomy

cjfields commented Oct 7, 2015


Original Redmine Comment
Author Name: Bank Beszteri
Original Date: 2008-04-08T03:32:32Z


(From update of attachment 898)
Forgot to add: MySQL this time (client v.4.0.18, server v.5.0.45)

cjfields commented Oct 7, 2015


Original Redmine Comment
Author Name: Chris Fields
Original Date: 2008-11-29T15:43:34Z


Pushing to 1.6 bioperl-db point release.
