Redmine migration: BioPerl / BioSQL - InterPro Optional IDs not parsed #20

Open
cjfields opened this issue Jul 5, 2018 · 1 comment
Labels
Redmine Old tickets migrated from the OBF redmine server

cjfields commented Jul 5, 2018

Added by johnraekwon about 9 years ago. Updated over 7 years ago.

This bug was first posted on BioSQL-l, and I was asked to file it here.

The problem is that InterPro optional IDs (the optional_id of each InterPro cross-reference) are not entered into the BioSQL database when a file is loaded with the script load_seqdatabase.pl.

This information should probably go into the dbxref_qualifier_value table. As far as I can tell, that table is not used at all.

The reply to my original post on BioSQL-l is shown here:

The problem here is that Bioperl-db (the persistence mapper between BioSQL and BioPerl) loses the optional_id property of Bio::Annotation::DBLink objects.

Moreover, the dbxref table in BioSQL doesn't provide a way to store two identifiers (or accessions) for one db_xref, so storing this bit of information is not as straightforward as one might wish: it would need to go into the dbxref_qualifier_value table. I would not be surprised if the other Bio* projects with a mapping to BioSQL don't store or retrieve this either (though it'd be good to hear if anyone does).

Here are a couple of ideas for how this issue might be addressed.

  • Write a Bio::Seq::BaseSeqProcessor-derived object that, for every incoming sequence, massages all InterPro links to either substitute the primary_id with the optional_id, or add a second DBLink annotation with the optional_id of the original one as its primary_id. (pros: relatively easy, entirely under your control; cons: you either lose the primary_id, or have two dbxref annotations for each of the original ones.)

  • Add a column to the dbxref table, and code to Bioperl-db that stores and de/serializes the extra ID. (pros: not losing or duplicating any data; cons: change is significant in terms of schema stability, requires new release, depends on implementation in Bioperl-db, necessitates update of all other Bio* language bindings)

  • De/serialize the optional_id as an entry in the dbxref_qualifier_value table. (pros: technically it's the Right Way as that's what the table was intended for; cons: implementing in Bioperl-db is more involved as we now need to transform an object property to a child object and back)
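To make the third idea more concrete, here is a minimal sketch in Python with sqlite3. It is not Bioperl-db code. The schema is a deliberately simplified stand-in for the BioSQL tables (the real dbxref and term tables carry more columns), and the qualifier name "optional_id", the helper names store_dblink and load_dblink, and the sample InterPro values are all assumptions made for illustration.

```python
import sqlite3

# Simplified stand-in for the relevant BioSQL tables (assumption: the real
# schema has additional columns, e.g. a version on dbxref and an ontology
# on term, which are left out here).
SCHEMA = """
CREATE TABLE dbxref (
    dbxref_id INTEGER PRIMARY KEY,
    dbname    TEXT NOT NULL,
    accession TEXT NOT NULL,
    UNIQUE (dbname, accession)
);
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE dbxref_qualifier_value (
    dbxref_id INTEGER NOT NULL REFERENCES dbxref (dbxref_id),
    term_id   INTEGER NOT NULL REFERENCES term (term_id),
    rank      INTEGER NOT NULL DEFAULT 0,
    value     TEXT,
    PRIMARY KEY (dbxref_id, term_id, rank)
);
"""

# Hypothetical qualifier term under which the optional ID is filed.
QUALIFIER = "optional_id"


def store_dblink(conn, dbname, accession, optional_id=None):
    """Serialize one cross-reference; the optional ID becomes a child row."""
    conn.execute(
        "INSERT OR IGNORE INTO dbxref (dbname, accession) VALUES (?, ?)",
        (dbname, accession),
    )
    dbxref_id = conn.execute(
        "SELECT dbxref_id FROM dbxref WHERE dbname = ? AND accession = ?",
        (dbname, accession),
    ).fetchone()[0]
    if optional_id is not None:
        conn.execute("INSERT OR IGNORE INTO term (name) VALUES (?)", (QUALIFIER,))
        term_id = conn.execute(
            "SELECT term_id FROM term WHERE name = ?", (QUALIFIER,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR REPLACE INTO dbxref_qualifier_value "
            "(dbxref_id, term_id, rank, value) VALUES (?, ?, 0, ?)",
            (dbxref_id, term_id, optional_id),
        )
    return dbxref_id


def load_dblink(conn, dbxref_id):
    """Deserialize back to (dbname, accession, optional_id or None)."""
    return conn.execute(
        "SELECT d.dbname, d.accession, q.value "
        "FROM dbxref d "
        "LEFT JOIN dbxref_qualifier_value q "
        "  ON q.dbxref_id = d.dbxref_id "
        " AND q.term_id = (SELECT term_id FROM term WHERE name = ?) "
        "WHERE d.dbxref_id = ?",
        (QUALIFIER, dbxref_id),
    ).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Sample values are illustrative only.
    xref = store_dblink(conn, "InterPro", "IPR000719", "Prot_kinase_dom")
    print(load_dblink(conn, xref))
```

The extra step the reply describes is visible here: an object property (optional_id) has to be turned into a child row on the way in and folded back into the object on the way out, which is what makes this route more involved than adding a column.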

So I'd say this is a bug in Bioperl-db in that the dbxref_qualifier_value table isn't utilized here. Would you mind filing it? In the meantime, if you just need something that works, you could try the first of the above ideas.
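The first idea, suggested above as the interim workaround, boils down to a small transformation over each sequence's cross-references. The sketch below models only that logic in plain Python; it does not use BioPerl or the Bio::Seq::BaseSeqProcessor interface. The DBLink dataclass, the function name massage_interpro_links, the mode names, and the sample IDs are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DBLink:
    """Plain stand-in for a cross-reference annotation (not the BioPerl class)."""
    database: str
    primary_id: str
    optional_id: Optional[str] = None


def massage_interpro_links(links: List[DBLink], mode: str = "add") -> List[DBLink]:
    """Rewrite InterPro links so the optional ID survives a loader that
    only persists (database, primary_id).

    mode="substitute": replace primary_id with optional_id (primary is lost).
    mode="add":        keep the original and append a second link whose
                       primary_id is the original's optional_id.
    """
    if mode not in ("add", "substitute"):
        raise ValueError("mode must be 'add' or 'substitute'")
    result = []
    for link in links:
        is_interpro = link.database.lower() == "interpro"
        if not (is_interpro and link.optional_id):
            # Non-InterPro links, and InterPro links without an optional ID,
            # pass through unchanged.
            result.append(link)
            continue
        promoted = DBLink(link.database, link.optional_id)
        if mode == "substitute":
            result.append(promoted)
        else:
            result.extend([link, promoted])
    return result


if __name__ == "__main__":
    # Sample values are illustrative only.
    links = [
        DBLink("InterPro", "IPR000719", "Prot_kinase_dom"),
        DBLink("EMBL", "X00000"),
    ]
    for link in massage_interpro_links(links, mode="add"):
        print(link)
```

Both modes show the trade-off named in the list: "substitute" drops the original accession, while "add" leaves two cross-references for every InterPro entry.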

-hilmar

On Jul 3, 2009, at 7:17 PM, John LaCava wrote:

Hi all,

I tried this on BioPerl-l, but it seemed to make sense to try here as well.

I am trying to use the BioPerl-db script "load_seqdatabase.pl" to parse a SwissProt ".dat" file (Yeast.dat, the yeast proteome with annotations etc.).

The particular entry I am interested in is the InterPro optional ID, which is the domain name.

I have put up a short stub that displays the 4 pieces of info I want to parse into my database.
That can be found here:

http://github.com/johnraekwon/BioPerl---BioSQL---InterPro-Optional-IDs/tree/master

You can see that near the bottom, we get the optional ID:
$protein_ids->{interpro_domain} = $dblink->{optional_id};

I do not think the bioperl script load_seqdatabase.pl retrieves this information. At least, I cannot find it in the db built from parsing a test .dat file.
I would like some help figuring out:

  1. WHY doesn't it retrieve this information, since it seems to be parsing "all" annotations...
  2. HOW might I edit the script to include this particular annotation of interest in the info it passes to my db (biosql)

I am a bit out of my depth on this, and so, any help is appreciated.

Cheers,
John

cjfields commented Jul 5, 2018

Updated by Jason Stajich over 7 years ago

Priority changed from Urgent to Normal

@cjfields cjfields added the Redmine Old tickets migrated from the OBF redmine server label Jul 5, 2018