Universal tables #9

Closed
zakandrewking opened this issue Sep 4, 2014 · 13 comments

@zakandrewking
Contributor

  • Loading universal components
    1. If KEGG_ID is new, then add a new universal component
    2. If KEGG_ID is not new, then connect to the existing universal component
      • New column in model_compartmentalized_component for old_bigg_id
    3. If no KEGG_ID, then add a new universal component, and flag the row
  • Loading universal reactions
    • Compare new reactions by stoichiometry
    • check each metabolite and coefficient
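
The loading rules above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses an in-memory dictionary instead of the real database, and the names `UniversalComponent`, `load_component`, and `same_stoichiometry` are hypothetical, not taken from the actual code base.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class UniversalComponent:
    # Hypothetical record; the real schema may differ.
    bigg_id: str
    kegg_id: Optional[str] = None
    flagged: bool = False  # True when the component arrived without a KEGG ID
    # The proposal stores old_bigg_id as a column on
    # model_compartmentalized_component; it is kept here only for brevity.
    old_bigg_ids: List[str] = field(default_factory=list)


def load_component(by_kegg: Dict[str, UniversalComponent],
                   components: List[UniversalComponent],
                   bigg_id: str,
                   kegg_id: Optional[str]) -> UniversalComponent:
    """Apply the three loading rules to one incoming component."""
    if kegg_id is None:
        # Rule 3: no KEGG ID, so add a new universal component and flag it.
        comp = UniversalComponent(bigg_id, flagged=True)
        components.append(comp)
        return comp
    if kegg_id in by_kegg:
        # Rule 2: known KEGG ID, so connect to the existing component and
        # remember the ID that the incoming model used.
        comp = by_kegg[kegg_id]
        if bigg_id != comp.bigg_id and bigg_id not in comp.old_bigg_ids:
            comp.old_bigg_ids.append(bigg_id)
        return comp
    # Rule 1: new KEGG ID, so add a new universal component.
    comp = UniversalComponent(bigg_id, kegg_id=kegg_id)
    by_kegg[kegg_id] = comp
    components.append(comp)
    return comp


def same_stoichiometry(a: Dict[str, float], b: Dict[str, float],
                       tol: float = 1e-9) -> bool:
    """Compare two reactions metabolite by metabolite and coefficient by coefficient."""
    if a.keys() != b.keys():
        return False
    return all(abs(a[m] - b[m]) <= tol for m in a)
```

Here a reaction is represented as a mapping from metabolite ID to coefficient, which is an assumption made for the sketch; direction and reversibility are not considered.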
@zakandrewking zakandrewking added this to the submission milestone Sep 4, 2014
@steve-federowicz
Member

Hey, so I was just talking with Justin about this and ran into a couple of issues.

The first is that most models don't have associated KEGG IDs. Ideally they all would, and maybe I am wrong, but I'm pretty sure only 10-20% of what we load will have them.

The second is that we probably need our own unique IDs, independent of any external database, for universal components, because there are going to be many more types of components than just metabolites. The above scheme would work, but it would produce a massive number of flagged rows.

@zakandrewking
Contributor Author

I can't check on this right now, but I remember going into Simpheny and seeing KEGG ids for almost every metabolite I checked. Definitely the central metabolic ones. I wonder if they never made it into GRMIT?

It's OK to have a massive number of flagged rows. We have to deal with this someday, and there are many automated approaches to consider. Andreas has done something very similar.

For non-metabolites, we should try to come up with external IDs where possible. For anything that's part of a template reaction, we can get fancy. For instance, a transcription elongation reaction could be linked to the reaction template AND to an external gene ID. But we don't have to solve that immediately.

@pillmill
Contributor

The KEGG and CAS IDs were imported from Simpheny.

@draeger

draeger commented Sep 22, 2014

Hi guys,

IMHO, having our own ID scheme in addition to providing references to
external KEGG IDs would be a great idea. If BiGG IDs were consistent
and unique, other users could refer to us instead of pointing to KEGG
etc. It would be very nice if many models contained references to
BiGG, ultimately increasing our access count when people look those up.

Cheers
Andreas

Dr. Andreas Draeger
University of California, San Diego, La Jolla, CA 92093-0412, USA
Bioengineering Dept., Systems Biology Research Group, Office #2506
Phone: +1-858-534-9717, Fax: +1-858-822-3120, twitter: @dr_drae

@steve-federowicz
Member

OK, so how does this sound as a temporary solution?

  1. We start by loading models that we know came from simpheny and have KEGG ids
    • This ensures that most of the primary metabolic universal components will have metabolite entries that contain a valid KEGG id
  2. Since we already have a column for KEGG ID in the metabolite table, a simple SELECT * FROM metabolite WHERE kegg_id IS NULL; should do the trick for keeping track of metabolites that need curation.
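
As a rough illustration of that query, here is a minimal sketch using Python's built-in `sqlite3` module against an in-memory table. The `metabolite` table and `kegg_id` column come from the comment above; the `name` column, the sample rows, and the helper `metabolites_needing_curation` are assumptions made for the example, and the real database may use a different engine.

```python
import sqlite3

# In-memory stand-in for the real database; only the columns needed here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metabolite (name TEXT, kegg_id TEXT)")
conn.executemany(
    "INSERT INTO metabolite (name, kegg_id) VALUES (?, ?)",
    [("D-Glucose", "C00031"), ("made-up metabolite", None)],
)


def metabolites_needing_curation(connection):
    """Return the names of metabolites that have no KEGG ID yet."""
    rows = connection.execute(
        "SELECT name FROM metabolite WHERE kegg_id IS NULL ORDER BY name"
    )
    return [name for (name,) in rows]


needs_curation = metabolites_needing_curation(conn)
```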

What do you think?

Actually, on re-reading, this is essentially what you originally proposed?

@zakandrewking
Contributor Author

Bingo 🎱 (don't read into that)

@jslu9
Contributor

jslu9 commented Sep 24, 2014

Yeah, right now I have made it so that a universal metabolite with a
missing KEGG ID gets updated when another metabolite with the same
name that does have a KEGG ID is uploaded into the database.
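
A minimal sketch of that update rule, assuming universal metabolites are keyed by name in a plain dictionary; the function name `backfill_kegg_id` is hypothetical, and the actual loading code may differ. Note that an existing KEGG ID is never overwritten in this sketch.

```python
from typing import Dict, Optional


def backfill_kegg_id(universal: Dict[str, Optional[str]],
                     name: str,
                     kegg_id: Optional[str]) -> None:
    """Record an uploaded metabolite, filling in a missing KEGG ID by name match."""
    if name not in universal:
        # First time this name is seen: store whatever KEGG ID came with it.
        universal[name] = kegg_id
    elif universal[name] is None and kegg_id is not None:
        # Same name, previously without a KEGG ID: fill it in now.
        universal[name] = kegg_id
```

Matching by name alone can merge distinct compounds that share a name, so matches made this way may still deserve curation.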

@zakandrewking
Contributor Author

  • universal metabolite table and page
  • universal reaction table and page

@jslu9 jslu9 closed this as completed Oct 17, 2014
@jslu9 jslu9 reopened this Oct 17, 2014
@jslu9
Contributor

jslu9 commented Oct 17, 2014

Just talked to Jon about putting KEGG IDs into his new updated SBML models, and he said that he personally doesn't think KEGG IDs are good universal IDs. He mentioned that MetaNetX is better.

@steve-federowicz
Member

Hmmmm OK, I would be fine with MetaNetX. I was always pretty impressed with their atom mapping work. I'm not sure whether it has been fully implemented within MetaNetX yet, but I think it will be, and at that point I think it will be a very sophisticated, dominant resource.

The downside is that KEGG has a lot of visibility outside of constraint-based sysbio, so if we go with MetaNetX IDs we potentially lose some visibility. The upside is that Costas and MetaNetX aren't going anywhere and will only continue to get better. Costas's lab is also friendly, so if an official collaboration needed to happen or larger things were to move forward, it would likely be a good situation.

@zakandrewking
Contributor Author

We aren't technically using KEGG IDs as universal IDs: we are using KEGG to
generate universal BiGG IDs. So we don't have to limit ourselves to one
type of external reference ID. It's worth thinking about this more.
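
One way to picture this is a universal record that owns its BiGG ID and carries any number of external references grouped by namespace. The sketch below is only an assumption about how that could look; `UniversalMetabolite`, `add_reference`, and `find_by_reference` are hypothetical names, and the MetaNetX value used in the usage note is a placeholder, not a real identifier.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class UniversalMetabolite:
    bigg_id: str
    # namespace -> IDs in that namespace, e.g. "kegg" or "metanetx"
    external_ids: Dict[str, Set[str]] = field(default_factory=dict)

    def add_reference(self, namespace: str, external_id: str) -> None:
        """Attach one external ID without displacing references in other namespaces."""
        self.external_ids.setdefault(namespace, set()).add(external_id)


def find_by_reference(metabolites: Iterable[UniversalMetabolite],
                      namespace: str,
                      external_id: str) -> Optional[UniversalMetabolite]:
    """Return the first metabolite carrying the given external ID, if any."""
    for met in metabolites:
        if external_id in met.external_ids.get(namespace, set()):
            return met
    return None
```

For example, after `met.add_reference("kegg", "C00031")` and `met.add_reference("metanetx", "PLACEHOLDER_ID")`, the same record can be found through either namespace.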

Jon already has metanetx ids for his models?

@jslu9
Contributor

jslu9 commented Oct 18, 2014

I don't think so, actually. He might have some, but they are certainly not in
his SBML files. Today I talked with Jon about pulling out the KEGG IDs and CAS
numbers and putting them into his cobrapy objects. He'll be sending
his new models (with KEGG IDs) once he's done updating his Python script.

@draeger

draeger commented Oct 19, 2014

Let's talk about all this on Tuesday during code talk. I think this is
very important and deserves a few words of direct discussion.

@jslu9 jslu9 closed this as completed Nov 3, 2014

5 participants