Added by johnraekwon about 9 years ago. Updated over 7 years ago.
This bug was first posted to the BioSQL-l mailing list, and I was asked to file it here.
The problem is that the InterPro optional IDs (the domain names) are not stored in the BioSQL database when loading data with the load_seqdatabase.pl script.
This information should probably go into the dbxref_qualifier_value table, which is currently not used at all.
Here is the reply to my original post on BioSQL-l:
The problem here is that Bioperl-db (the persistence mapper between BioSQL and BioPerl) loses the optional_id property of Bio::Annotation::DBLink objects.
Moreover, the dbxref table in BioSQL doesn't provide a way to store two identifiers (or accessions) for one db_xref. Storing this bit of information is therefore not as straightforward as one might wish, because it would need to go into the dbxref_qualifier_value table. I would not be surprised if the other Bio* projects with a mapping to BioSQL don't store or retrieve it either (though it'd be good to hear if anyone does).
Here are a few ideas for how this issue might be addressed:
1. Write a Bio::Seq::BaseSeqProcessor-derived object that, for every incoming sequence, massages all InterPro links to either replace the primary_id with the optional_id, or add a second DBLink annotation whose primary_id is the optional_id of the original one. (Pros: relatively easy, entirely under your control. Cons: you either lose the primary_id, or end up with two dbxref annotations for each original one.)
2. Add a column to the dbxref table, plus code in Bioperl-db to store and de/serialize the extra ID. (Pros: no data is lost or duplicated. Cons: a significant change in terms of schema stability; requires a new release; depends on an implementation in Bioperl-db; requires updating all the other Bio* language bindings.)
3. De/serialize the optional_id as an entry in the dbxref_qualifier_value table. (Pros: technically it's the Right Way, since that is what the table was intended for. Cons: more involved to implement in Bioperl-db, because an object property now has to be transformed into a child object and back.)
So I'd say this is a bug in Bioperl-db in that the dbxref_qualifier_value table isn't utilized here. Would you mind filing it? In the meantime, if you just need something that works, you could try the first of the above ideas.
-hilmar
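A minimal sketch of the first idea, assuming BioPerl's Bio::Seq::BaseSeqProcessor, whose process_seq() method receives each incoming sequence and returns the list of sequences to pass on. It takes the "add a second DBLink" route, so the original IPR accession is kept. The package name InterProDomainProcessor and the database label InterPro_domain are hypothetical; rename them to suit your setup.

```perl
# Hypothetical processor (name is illustrative only): for each InterPro DBLink
# on a sequence, add a second DBLink whose primary_id is the original link's
# optional_id (the domain name), so it is persisted as a dbxref of its own.
package InterProDomainProcessor;
use strict;
use warnings;
use base qw(Bio::Seq::BaseSeqProcessor);
use Bio::Annotation::DBLink;

# Hypothetical database label for the added cross-references; choose your own.
our $DOMAIN_DB = 'InterPro_domain';

sub process_seq {
    my ($self, $seq) = @_;
    my $ac = $seq->annotation;
    # get_Annotations returns a list copy, so adding links inside the loop is safe.
    foreach my $link ($ac->get_Annotations('dblink')) {
        next unless $link->isa('Bio::Annotation::DBLink');
        next unless lc($link->database // '') eq 'interpro';
        my $domain = $link->optional_id;
        next unless defined $domain && length $domain;
        # Keep the IPR accession and add a separate link carrying the domain name.
        $ac->add_Annotation('dblink', Bio::Annotation::DBLink->new(
            -database   => $DOMAIN_DB,
            -primary_id => $domain,
        ));
    }
    return ($seq);    # one sequence in, one sequence out
}

1;
```

Because BaseSeqProcessor also provides next_seq() over a source stream, the same object should be able to wrap a Bio::SeqIO stream (assuming its -source_stream constructor argument) so you can inspect the output before loading it. To replace the primary_id instead, call $link->primary_id($domain) in place of add_Annotation(), at the cost of losing the IPR accession.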
On Jul 3, 2009, at 7:17 PM, John LaCava wrote:
Hi all,
I tried this on BioPerl-l, but it seemed to make sense to ask here as well.
I am trying to use the Bioperl-db script load_seqdatabase.pl to parse a SwissProt .dat file (Yeast.dat, the yeast proteome with annotations etc.).
The particular entry I am interested in is the InterPro optional ID, which is the domain name.
I have put up a short stub that displays the four pieces of information I want to load into my database.
It can be found here: http://github.com/johnraekwon/BioPerl---BioSQL---InterPro-Optional-IDs/tree/master
You can see that near the bottom, we get the optional ID:
$protein_ids->{interpro_domain} = $dblink->{optional_id};
I do not think the Bioperl-db script load_seqdatabase.pl stores this information. At least, I cannot find it in the database built from loading a test .dat file.
I would like some help figuring out:
- WHY doesn't it store this information, given that it seems to parse "all" annotations?
- HOW might I edit the script so that it passes this particular annotation to my (BioSQL) database?
I am a bit out of my depth on this, so any help is appreciated.
Cheers,
John