seed #11

Closed
ntromas opened this issue Apr 14, 2021 · 11 comments

Comments

@ntromas

ntromas commented Apr 14, 2021

Hi,

Thanks for this pipeline. I am also looking at m2m. I wonder how to generate a "seed" file in .xml format, especially when I have no idea which medium would suit the community I am working on.
Any suggestions? I am not interested in a specific pathway but rather in how "my" bacterial community cooperates.
Thanks for your help!

Nico

@cfrioux
Owner

cfrioux commented Apr 15, 2021

Hi Nico,

It depends on the environmental context. If you work with human gut bacteria, you can take a look at the VMH diets. Otherwise, you can design a set of minimal nutrients from the literature. You could even consider running the program multiple times, each time with a different randomized set of seeds (varying the carbon source or other nutrients), in order to capture the cooperation landscape in multiple environments.
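A minimal sketch of the "several randomized seed sets" idea, assuming a fixed list of base nutrients plus one carbon source drawn at random per run. The function name, the file naming scheme and the `M_ACET_c` identifier in the usage note are hypothetical placeholders; real identifiers have to come from your own models.

```python
import random
from pathlib import Path


def write_seed_variants(base_seeds, carbon_sources, out_dir, n_variants=3, rng_seed=0):
    """Write one metabolite list per variant: base nutrients + one random carbon source.

    Returns a list of (file path, chosen carbon source) tuples.
    """
    rng = random.Random(rng_seed)  # fixed seed so the variants are reproducible
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i in range(n_variants):
        carbon = rng.choice(carbon_sources)
        ids = sorted(set(base_seeds) | {carbon})
        path = out_dir / f"metabolites_{i}.txt"
        # One identifier per line, plain ASCII, no extra whitespace.
        path.write_text("\n".join(ids) + "\n", encoding="ascii")
        written.append((path, carbon))
    return written
```

For example, `write_seed_variants(["M_CO2_c", "M_O2_c"], ["M_GLUCOSE_c", "M_ACET_c"], "seed_sets")` would write three small files, each of which could then be converted to a seed file and used for a separate run.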

On the technical side, if you need help generating the XML file itself, you can use m2m seeds.

Hope this helps a bit,

Clémence

@ntromas
Author

ntromas commented Apr 15, 2021

Hi Clemence,

Thanks for your answer! I wonder if by any chance you would have an example of a metabolites.txt file. In m2m, I can see that the names are specific. I guess they come from a database, right? Or from the database used to generate the model (I used gapseq)... Sorry for the naive question!

Cheers,

Nico

@cfrioux
Owner

cfrioux commented Apr 15, 2021

Hi Nico,

I don't have any examples, but I can explain why the names seem so specific: i) they match compound IDs from the MetaCyc database, and ii) they are encoded (special characters are forbidden in SBML species id fields; compound IDs usually start with M_ and end with a suffix made of the compartment ID).
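To make the encoding concrete, here is a small sketch. It assumes that a forbidden character is written as its decimal character code between double underscores, which is inferred only from the `M_OXYGEN__45__MOLECULE_c` identifier quoted in the error message later in this thread (45 is the code for `-`). Both function names are hypothetical, and the exact rules used by the tool that built your networks may differ.

```python
import re


def encode_species_id(compound_id, compartment):
    """Turn a database compound ID into an SBML-safe species ID (assumed convention)."""
    safe = "".join(
        ch if (ch.isascii() and ch.isalnum()) or ch == "_" else f"__{ord(ch)}__"
        for ch in compound_id
    )
    return f"M_{safe}_{compartment}"


def decode_species_id(species_id):
    """Reverse of encode_species_id: return (compound ID, compartment)."""
    body = species_id[2:] if species_id.startswith("M_") else species_id
    # The compartment is taken to be everything after the last underscore.
    name, _, compartment = body.rpartition("_")
    name = re.sub(r"__(\d+)__", lambda m: chr(int(m.group(1))), name)
    return name, compartment
```

Under that assumption, `encode_species_id("OXYGEN-MOLECULE", "c")` gives `M_OXYGEN__45__MOLECULE_c`, and decoding it gives back the original compound ID and compartment.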

You have to use compounds that match the metabolic networks you wish to study, and thus the database you used to build them. If you look at one of the metabolic networks, you have to provide a list of metabolite identifiers consistent with the ones in its listOfSpecies, using their id attribute:

<listOfSpecies>
        <species id="M_GLUCOSE_c" name="GLUCOSE" compartment="c"/>
        <species id="M_CO2_c" name="CO2" compartment="c"/>
        <species id="M_O2_c" name="O2" compartment="c"/>

In the example above, if I wanted to add cytosolic glucose to the seeds, I'd have to add M_GLUCOSE_c in the metabolites.txt file.
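If it helps, the species identifiers can be listed with a few lines of Python instead of reading the XML by eye. This is only a sketch using the standard library; the function name is hypothetical, and it assumes the file is a regular SBML document whose species elements carry an id attribute, as in the excerpt above.

```python
import xml.etree.ElementTree as ET


def species_ids(sbml_source):
    """Return the id of every <species> element, in document order.

    sbml_source can be a file path or an open binary file object.
    """
    ids = []
    for _, elem in ET.iterparse(sbml_source):
        # Tags look like "{namespace}species"; keep only the local name.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "species" and "id" in elem.attrib:
            ids.append(elem.attrib["id"])
    return ids
```

Printing `species_ids("my_model.sbml")` (a placeholder file name) gives the candidates from which to pick the lines of metabolites.txt.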

Let me know if something is unclear.

Clémence

@ntromas
Author

ntromas commented Apr 15, 2021

Hi Clemence,

Thanks for your answer! I generated a model for each member of a small community (3-4 taxa) from their genomes. As we have never cultivated them, I used gapseq's default parameters (with a default medium) to generate the models. To build metabolic network models, gapseq uses a reaction and metabolite database derived from the ModelSEED database.
My objective is to determine whether one community member could share or cooperate with another.
If I understood correctly, miscoto and m2m need models. I just wonder how to list the metabolites or medium composition that would reflect in natura conditions (e.g. freshwater), both to build my models and to experiment with miscoto or m2m.
Again, sorry for my naive questions, I am learning how to use these tools, which are pretty new to me!

Thanks a lot for your time and suggestions,

Nico

@cfrioux
Owner

cfrioux commented Apr 16, 2021

Hi Nico,

No worries :)

A simple way to generate a first list of compounds is to work with the default medium used by gapseq. It would likely not represent natural conditions, but it is a first step that is quick to implement. That way you would already be able to compare the metabolism of your 4 species with m2m: do they roughly produce the same metabolites or not, how complementary their metabolisms are...

As a second step, my advice would be to look for the macromolecules generally found in freshwater according to the literature and build a list of metabolites out of it. You can run several tests as computation is quite fast, and observing the differences in individual and collective metabolisms when varying the seed compounds is already informative by itself.

Clémence

@ntromas
Author

ntromas commented Apr 16, 2021 via email

@cfrioux
Owner

cfrioux commented Apr 16, 2021

Hi Nico,

Yes, precisely, you would then have to generate a list of metabolites from the BiGG database.
A good practice is to check that every identifier you add to the seed file is a metabolite that occurs in some or all of your models. Just to be sure.

For instance, if histidine is a seed, you would add M_his__L_c to the list to consider the cytosolic version of histidine as available, or M_his__L_e if you would rather choose the extracellular version. In the latter case, be careful: transport reactions are sometimes not well inferred in automatically reconstructed metabolic networks, and without a transport reaction to the cytosol, your histidine molecule would be useless (not usable by the internal metabolism).
If you check for M_his__L_c in your models and find it, you can be confident that this seed can be used by them in some reactions.
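That presence check can be scripted. The sketch below assumes each model is an SBML file read with the Python standard library; both function names are hypothetical. It only reports where an identifier occurs and says nothing about whether a transport reaction exists for an extracellular seed.

```python
import xml.etree.ElementTree as ET


def species_id_set(sbml_source):
    """Collect the ids of all <species> elements of one SBML model."""
    return {
        elem.attrib["id"]
        for _, elem in ET.iterparse(sbml_source)
        if elem.tag.rsplit("}", 1)[-1] == "species" and "id" in elem.attrib
    }


def seed_coverage(seeds, models):
    """Map each seed ID to the sorted names of the models that contain it.

    models is a dict {model name: set of species ids}; a seed mapped to an
    empty list occurs in none of the models.
    """
    return {
        seed: sorted(name for name, ids in models.items() if seed in ids)
        for seed in seeds
    }
```

A typical use would be to build `models = {name: species_id_set(path) for name, path in model_paths.items()}` (with `model_paths` a dict of your own file paths) and then inspect which seeds map to an empty list.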

Clémence

@ntromas
Author

ntromas commented Apr 16, 2021 via email

@ntromas
Author

ntromas commented Apr 16, 2021 via email

@ntromas
Author

ntromas commented Apr 20, 2021

Hi Clemence,

I finally installed m2m and ran m2m seeds. If I am correct, the input is just a TSV/TXT file with a list of metabolites, keeping the BiGG database names, right? I have also verified their presence in the model file.
I got this error:
AssertionError: Seed file is not in the correct format. Error with "M_cobalt2_c". Example of a correct ID is M_OXYGEN__45__MOLECULE_c. Rules = only numbers, letters or underscore in IDs, not starting with a number. One ID per line.
Any idea? Because M_cobalt2_c looks fine, I think?
Thanks!

@ntromas
Author

ntromas commented Apr 20, 2021

Got it! The problem was in my file, which contained unwanted characters.
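For anyone hitting the same error: invisible characters can make an identifier that looks fine fail the check. The thread does not say which characters were involved here; the sketch below simply applies the rule quoted in the error message (letters, digits or underscore, not starting with a digit, one ID per line) and reports the offending characters. The function names are hypothetical, and skipping empty lines is a choice of this sketch, not documented m2m behaviour.

```python
import re
from pathlib import Path

# Rule quoted in the error message: letters, digits, underscore; no leading digit.
ID_RULE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def find_bad_seed_lines(text):
    """Return (line number, raw line, offending characters) for each invalid line."""
    problems = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        if line == "":
            continue  # empty lines are ignored in this sketch
        if not ID_RULE.fullmatch(line):
            bad = sorted({ch for ch in line if not re.fullmatch(r"[A-Za-z0-9_]", ch)})
            problems.append((lineno, line, bad))
    return problems


def check_seed_file(path):
    """Read the file as raw bytes so carriage returns and byte-order marks are kept."""
    return find_bad_seed_lines(Path(path).read_bytes().decode("utf-8"))
```

Printing the result with `repr()` makes hidden characters such as `'\r'` or `'\ufeff'` visible.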

Cheers

@cfrioux cfrioux closed this as completed Oct 26, 2021