feat: addition of `metadata` section to the yaml file specification in RAVEN #311

haowang-bioinfo · 2020-07-27T21:51:00Z

Description of the issue:

This issue proposes including a metadata section in the yaml file specification in RAVEN.
- Previously, a metadata section was introduced to the tailored yaml file in Human-GEM to meet the requirements of Metabolic Atlas, as detailed in issue #71. After continuous development, this section works well in providing relevant information for GEM-type repositories (e.g. Human-GEM), the GEM archive Metabolic Atlas, and the research community.

Expected changes:

Adjust the RAVEN model spec with the following changes:
- Add new field version
- ~~Change field from description to fullName~~
- Modify subfields of the annotation field
  - adding subfield sourceUrl
  - combining givenName and familyName into authors
  - ~~changing subfield from note to description~~

Adapt the writeYaml function to enable exporting of metadata information from the fields id, ~~fullName~~name, version and annotation
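As a rough illustration of the export described above, here is a minimal sketch in Python (RAVEN itself is MATLAB, so this is not the writeYaml implementation). It assumes the model is a plain dictionary with `id`, `name`, `version` and an `annotation` sub-dictionary; the helper names `build_metadata` and `to_yaml_lines` are hypothetical, and the output layout simply mimics the metaData block used in Human-GEM.

```python
def build_metadata(model):
    """Collect top-level and annotation fields into one flat metaData mapping."""
    annotation = model.get("annotation", {})
    meta = {}
    # Top-level fields named in the expected changes.
    for key in ("id", "name", "version"):
        if key in model:
            meta[key] = model[key]
    # Annotation subfields named in the expected changes (all optional here).
    for key in ("authors", "sourceUrl", "note"):
        if key in annotation:
            meta[key] = annotation[key]
    return meta


def to_yaml_lines(meta):
    """Render the mapping as a quoted key/value block under '- metaData:'.

    Assumption: values contain no double quotes; a real writer would escape them.
    """
    lines = ["- metaData:"]
    for key, value in meta.items():
        lines.append(f'    {key}: "{value}"')
    return lines


model = {
    "id": "Human-GEM",
    "name": "Generic genome-scale metabolic model of Homo sapiens",
    "version": "1.4.0",
    "annotation": {"sourceUrl": "https://github.com/SysBioChalmers/Human-GEM"},
}
yaml_lines = to_yaml_lines(build_metadata(model))
print("\n".join(yaml_lines))
```

Fields that are absent from the model are skipped rather than written as empty strings, which is one possible way to treat non-mandatory fields.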

I hereby confirm that I have:

Followed the guidelines to install RAVEN.
Checked that a similar issue does not already exist


BenjaSanchez · 2020-07-28T15:46:21Z

For additional context, below is the current metaData field in Human-GEM:

- metaData:
    short_name: "Human-GEM"
    full_name: "Generic genome-scale metabolic model of Homo sapiens"
    version: "1.4.0"
    date: "2020-06-12"
    authors: "Jonathan Robinson, Hao Wang, Pierre-Etienne Cholley, Pinar Kocabas"
    email: "jonrob@chalmers.se"
    organization: "Chalmers University of Technology"
    taxonomy: "9606"
    github: "https://github.com/SysBioChalmers/Human-GEM"
    description: "Genome-scale metabolic models are valuable tools to study metabolism and provide a scaffold for the integrative analysis of omics data. This is the latest version of Human-GEM, which is a genome-scale metabolic model of a generic human cell. The objective of Human-GEM is to serve as a community model for enabling integrative and mechanistic studies of human metabolism."

The new fields + modifications sound good to me. Additionally, it would be ideal if the field names in the yaml file matched the RAVEN spec names, for clarity. Below are the cases that don't match, based on what is already in RAVEN + the new names @Hao-Chalmers proposed:

| Field | Name in RAVEN | Name in `HumanGEM.yml` |
| --- | --- | --- |
| Model id | `id` | `short_name` |
| Model name | `description` | `full_name` |
| Authors | `annotation.authorList` | `authors` |
| URL where the model lives | `annotation.sourceUrl` | `github` |
| Additional comments | `annotation.note` | `description` |
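To make the mismatch concrete, the table can be read as a renaming map. The sketch below is illustrative Python only; the dictionary `HUMANGEM_TO_RAVEN` and the helper `to_raven_names` are hypothetical names, and dotted targets such as `annotation.note` are assumed to mean a nested subfield.

```python
# Mapping taken from the table: HumanGEM.yml key -> RAVEN field name.
HUMANGEM_TO_RAVEN = {
    "short_name": "id",
    "full_name": "description",
    "authors": "annotation.authorList",
    "github": "annotation.sourceUrl",
    "description": "annotation.note",
}


def to_raven_names(meta):
    """Return a new nested dict using RAVEN names; unmapped keys pass through."""
    raven = {}
    for key, value in meta.items():
        target = HUMANGEM_TO_RAVEN.get(key, key)
        *parents, leaf = target.split(".")
        node = raven
        for parent in parents:
            # Create the intermediate struct (e.g. 'annotation') on demand.
            node = node.setdefault(parent, {})
        node[leaf] = value
    return raven


converted = to_raven_names({
    "short_name": "Human-GEM",
    "full_name": "Generic genome-scale metabolic model of Homo sapiens",
    "description": "Community model of human metabolism",  # shortened placeholder text
    "github": "https://github.com/SysBioChalmers/Human-GEM",
    "version": "1.4.0",
})
print(converted)
```

Note how `description` in the yaml file ends up in `annotation.note` while `full_name` ends up in `description`, which is the source of the confusion discussed in this comment.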

IMO the RAVEN names for id and URL would be preferable, as the former is the main choice in the COBRA community (Matlab and Python), and the latter is more generic, as not all RAVEN models are stored in Github. Could those 2 fields change in HumanGEM.yml to id and source_url? @JonathanRob @mihai-sysbio

On the other hand, the .yml standard seems more adequate for model name, authors and comments (actually it's super confusing that the RAVEN field description is the model name and the field note contains a description). Would it make sense to change those 3 fields in RAVEN to fullName, annotation.authors and annotation.description?

edkerk · 2020-07-28T21:28:27Z

Are there corresponding (or comparable) COBRA fields for fullName, annotation.authors and annotation.description?

mihai-sysbio · 2020-07-29T07:30:44Z

Here are the latest yml fields, as listed on COBRApy's devel branch. IMHO, they don't look like a direct mapping of the RAVEN fields.
Cobratoolbox has some rules for modelVersion, modelName and modelID.

The short-name is something meant to be as human-friendly as possible. For example, this field is what is shown in the navigation bar on Metabolic Atlas. I found this opencobra thread illustrative of the implications of the BiGG model id spec. Also, I would like to point out the distinct fields for short-name and version. To me, it is of little importance what the keyword for the value of short-name is. However, I am an advocate for its role: human-friendliness. Therefore, I would lean towards keeping this field closer to the standard-GEM naming rather than the BiGG id spec. Needless to say, in the case of versioned models, it is expected of this short-name to be the same as the repository name.

I support changing github to something else. A potential drawback of source_url is that, as a new person, I could find it confusing whether it is meant to be the link to the repository, or directly to the file itself on a model hosting platform. But maybe that's just me - and I can't come up with a better suggestion than source_url.

haowang-bioinfo · 2020-08-02T20:28:53Z

@BenjaSanchez the Expected changes of this issue have been updated as you recommended.

haowang-bioinfo · 2020-08-04T19:03:28Z

@edkerk according to the latest model spec in COBRA, the following four fields could be associated between RAVEN and COBRA.

| Field | Name in RAVEN | Name in COBRA |
| --- | --- | --- |
| Model id | `id` | `modelID` |
| Model name | `name` | `modelName` |
| Model version | `version` | `modelVersion` |
| Additional comments | `annotation.note` | `description` |

mihai-sysbio · 2020-08-10T14:54:53Z

@Hao-Chalmers would the Expected changes also include something about the shortName field?

haowang-bioinfo · 2020-08-10T15:08:41Z

@mihai-sysbio I don't think an additional shortName field is needed, since it is equivalent to the existing id field. Or are you suggesting renaming the field from id to shortName?

mihai-sysbio · 2020-08-11T06:23:28Z

I see. To me, an ID does not have to be human friendly, unlike shortName. I think it would be clearer if some examples were provided, maybe even both "good" and "bad". For example, a "bad" id would be h_sap13417__1_3_0, standing for a Homo sapiens model with 13417 reactions and corresponding to version 1.3.0.

haowang-bioinfo · 2020-08-11T19:46:28Z

@mihai-sysbio good point in providing examples, which can be both added to the spec in Wiki once a consensus is reached.

edkerk · 2021-04-07T13:38:54Z

So should HumanGEM's writeHumanYaml be integrated in RAVEN's writeYaml, thereby capturing this metadata?

haowang-bioinfo · 2021-04-07T14:09:42Z

> So should HumanGEM's writeHumanYaml be integrated in RAVEN's writeYaml, thereby capturing this metadata?

@edkerk full support

edkerk · 2021-04-07T21:03:45Z

It is not sufficient to just define fields in the RAVEN model structure, and support export to YML file format. SBML is still the de facto standard for model distribution, so these fields should also be properly stored there.

Related to this there are some unresolved issues:

1. If we introduce version, where is this stored in the SBML file? As far as I can find, this is not covered by the SBML specification. I see two options:
   i. The version number can be appended to the model id, e.g. yeastGEM_v8_4_2. The benefit is that this is also loaded when using cobrapy or COBRA toolbox. However, would we then split the model id from SBML into two parts: (1) model.id and (2) model.version? In that case the model would have a different model id in RAVEN than in cobrapy, COBRA etc. To avoid problems, I would prefer not to run regexprep on any identifier.
   ii. Include the version number in the SBML as model annotation, in a similar way as taxonomy, authors, organization etc. are included. See example below. However, I don't know what tags to use, something related to <rdf>? Does standard-GEM have a role to play in this?
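For reference, the split that the first option would require could look like the sketch below. It is written in Python rather than MATLAB's regexprep, the helper name `split_versioned_id` is hypothetical, and the `_v<digits>_<digits>...` suffix convention is an assumption based only on the yeastGEM_v8_4_2 example.

```python
import re

# Assumed convention: '<id>_v<major>_<minor>_...' at the end of the SBML model id.
VERSIONED_ID = re.compile(r"^(?P<id>.+)_v(?P<version>\d+(?:_\d+)*)$")


def split_versioned_id(sbml_id):
    """Split an SBML model id into (id, version); version is None if absent."""
    match = VERSIONED_ID.match(sbml_id)
    if match is None:
        return sbml_id, None
    return match.group("id"), match.group("version").replace("_", ".")


print(split_versioned_id("yeastGEM_v8_4_2"))  # ('yeastGEM', '8.4.2')
print(split_versioned_id("iYali"))            # ('iYali', None)
```

The drawback raised above still applies: after such a split, the id held in RAVEN (yeastGEM) would no longer equal the id that cobrapy or COBRA toolbox read from the same file (yeastGEM_v8_4_2).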

Example from iYali, model annotation given from line 4.

<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" xmlns:fbc="http://www.sbml.org/sbml/level3/version1/fbc/version2" xmlns:groups="http://www.sbml.org/sbml/level3/version1/groups/version1" level="3" version="1" fbc:required="false" groups:required="false">
  <model metaid="iYali" id="iYali" name="iYali" fbc:strict="true">
    <annotation>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vCard4="http://www.w3.org/2006/vcard/ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
        <rdf:Description rdf:about="#iYali">
          <dcterms:creator>
            <rdf:Bag>
              <rdf:li rdf:parseType="Resource">
                <vCard:N rdf:parseType="Resource">
                  <vCard:Family>Kerkhoven</vCard:Family>
                  <vCard:Given>Eduard</vCard:Given>
                </vCard:N>
                <vCard:EMAIL>eduardk@chalmers.se</vCard:EMAIL>
                <vCard:ORG rdf:parseType="Resource">
                  <vCard:Orgname>Chalmers University of Technology</vCard:Orgname>
                </vCard:ORG>
              </rdf:li>
            </rdf:Bag>
          </dcterms:creator>
          <dcterms:created rdf:parseType="Resource">
            <dcterms:W3CDTF>2021-04-05T10:27:05Z</dcterms:W3CDTF>
          </dcterms:created>
          <dcterms:modified rdf:parseType="Resource">
            <dcterms:W3CDTF>2021-04-05T10:27:05Z</dcterms:W3CDTF>
          </dcterms:modified>
          <bqbiol:is>
            <rdf:Bag>
              <rdf:li rdf:resource="https://identifiers.org/taxonomy/4952"/>
            </rdf:Bag>
          </bqbiol:is>
        </rdf:Description>
      </rdf:RDF>
    </annotation>
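To show how the metadata in such an annotation can be read back, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It parses a shortened, self-contained copy of the annotation (not a full SBML file), and the function name `read_annotation` and the returned keys are hypothetical. A version number stored along the lines of the second option would need an agreed tag before it could be read the same way.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes as declared in the excerpt.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "vCard": "http://www.w3.org/2001/vcard-rdf/3.0#",
    "bqbiol": "http://biomodels.net/biology-qualifiers/",
}

# Shortened copy of the annotation, reduced to creator and taxonomy.
ANNOTATION = """\
<annotation>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dcterms="http://purl.org/dc/terms/"
           xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#"
           xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
    <rdf:Description rdf:about="#iYali">
      <dcterms:creator>
        <rdf:Bag>
          <rdf:li rdf:parseType="Resource">
            <vCard:N rdf:parseType="Resource">
              <vCard:Family>Kerkhoven</vCard:Family>
              <vCard:Given>Eduard</vCard:Given>
            </vCard:N>
          </rdf:li>
        </rdf:Bag>
      </dcterms:creator>
      <bqbiol:is>
        <rdf:Bag>
          <rdf:li rdf:resource="https://identifiers.org/taxonomy/4952"/>
        </rdf:Bag>
      </bqbiol:is>
    </rdf:Description>
  </rdf:RDF>
</annotation>
"""


def read_annotation(xml_text):
    """Extract creator names and taxonomy ids from a model annotation."""
    root = ET.fromstring(xml_text)
    authors = []
    for name in root.iterfind(".//vCard:N", NS):
        given = name.findtext("vCard:Given", default="", namespaces=NS)
        family = name.findtext("vCard:Family", default="", namespaces=NS)
        authors.append(f"{given} {family}".strip())
    # Attributes are looked up by their fully qualified name.
    resource_attr = "{%s}resource" % NS["rdf"]
    taxonomy = [
        li.get(resource_attr).rsplit("/", 1)[-1]
        for li in root.iterfind(".//bqbiol:is/rdf:Bag/rdf:li", NS)
    ]
    return {"authors": authors, "taxonomy": taxonomy}


result = read_annotation(ANNOTATION)
print(result)
```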

2. If id is used instead of shortName [and I would argue we should, as id is similar to modelID, model.id and <model id=""> as used in COBRA, cobrapy and SBML], then why use fullName and not just name? The latter is also more in line with other software and the SBML specification.
3. In humanGEM.yml, date is also specified. Should this be part of the RAVEN model structure? And what does this date reflect: when a new version was released? RAVEN-generated SBML already includes the date that the file was created, but that's probably not what is meant here. Instead, the date should be set when the new version number is set, and be absent if no version number is present?
4. Where should sourceUrl be stored in the SBML? Also in annotation, as in the second suggestion for version?
5. Note that description is not problematic to store in the SBML, it is actually stored under <notes>. With that in mind, why change note to description? cobrapy has model.notes, and it is closer to the SBML specification.

haowang-bioinfo · 2021-04-08T09:21:51Z

@edkerk good arguments indeed.

@mihai-sysbio what do you think: can standard-GEM help in adopting some fields?

edkerk · 2021-04-08T11:45:45Z

On second thought, perhaps it is better to move the discussion about incorporation in SBML into a separate issue, as the current issue is just about the MATLAB structure and the yaml file.
The points that remain relevant are:
2. Have a model.name field instead of model.fullName.
5. Have a model.annotation.note field instead of model.annotation.description.

mihai-sysbio · 2021-04-09T06:45:21Z

@Hao-Chalmers it would make a lot of sense to standardize (and validate) that the yml file has these fields. However, as @edkerk pointed out, maintaining compatibility with existing formats is tricky (1.ii), especially if the newly added fields are to be parsed by other tools as well.

To me, the easiest path forward is what @edkerk suggested above:

> current issue is just about the MATLAB structure and the yaml file

I would like to emphasize the different use cases for model.short_name and model.full_name. Here is how Metabolic Atlas uses these fields:

    "short_name": "Yeast-GEM",
    "full_name": "Consensus genome-scale metabolic model of Saccharomyces cerevisiae",
    "description": "Consensus genome-scale metabolic model of Saccharomyces cerevisiae. It is the continuation of the legacy project yeastnet",
    "version": "8.4.2",

Luckily, this GEM has a nice model.id, but it's just a coincidence that it is readable. The model.id could well have been yeastGEM_v8_4_2. Since it is an identifier, it will not be parsed into anything readable or worth presenting on a website.

haowang-bioinfo · 2021-04-09T10:48:24Z

@edkerk @mihai-sysbio I adjusted the Expected changes of this issue according to your input.

edkerk · 2021-04-09T20:31:54Z

writeYaml (5418e88) and the model fields definition (Wiki) are changed according to the discussion here, with the following exceptions:

- givenName and familyName remain as (non-mandatory) fields, while authors is an additional (non-mandatory) field. This is to ensure backwards compatibility, as givenName and familyName are actually coded in the SBML and authors is not, and their meaning is not identical (givenName and familyName would match organization and email, while for authors this is ambiguous).
- Other subfields of model.annotation (defaultLB, defaultUB) are also included as metaData in the yaml file.
- By default, writeYaml no longer sorts the identifiers (it used to do this, while writeHumanYaml doesn't; it is probably best to keep the identifier order by default).

Renaming model.description to model.name additionally required small refactoring of 23 files (fe7d417). As this breaks backwards compatibility with models that would already have been loaded in MATLAB, I suggest these changes result in release 2.5.0 instead of 2.4.4.

haowang-bioinfo added the discussion, enhancement and feature labels Jul 27, 2020

mihai-sysbio mentioned this issue Apr 7, 2021

feat: yaml worflow SysBioChalmers/Human-GEM#173

Merged

2 tasks

edkerk mentioned this issue Apr 7, 2021

support for export to cobrapy-compatible yaml format #77

Open

4 tasks

edkerk added the wip label and removed the discussion label Apr 7, 2021

edkerk added this to the 2.4.4 milestone Apr 7, 2021

This was referenced Apr 9, 2021

fix: use rev field in constructEquation, metaData in writeYaml, model.description to model.name and correct parse taxonomy URL #338

Merged

RAVEN 2.5.0 #340

Merged

edkerk removed the wip label May 16, 2021

edkerk closed this as completed May 16, 2021
