KP Source provenance standards -- July 1 #48

Open
vgardner-renci opened this issue Jun 22, 2021 · 3 comments

Comments

@vgardner-renci

No description provided.

@cbizon
Collaborator

cbizon commented Jun 22, 2021

KPs will attach provenance to the edges they return. This will include information both about the upstream source (e.g. HMDB) and about the KP itself (recording that the KP passed the data to the ARA).

These must be provided as TRAPI attributes on edges (following the TRAPI spec for attributes).

The particular edge properties are defined in the biolink model:

  primary knowledge source:
    is_a: knowledge source
    description: >-
      The most upstream source of the knowledge expressed in an Association that an
      implementer can identify (may or may not be the 'original' source).
    range: information resource
    multivalued: false

  original knowledge source:
    is_a: primary knowledge source
    description: >-
      The Information Resource that created the original record of the knowledge expressed
      in an Association (e.g. via curation of the knowledge from the literature, or
      generation of the knowledge de novo through computation, reasoning, inference over
      data).
    range: information resource
    multivalued: false

  aggregator knowledge source:
    is_a: knowledge source
    description: >-
      An intermediate aggregator resource from which knowledge expressed in an Association was
      retrieved downstream of the original source, on its path to its current serialized form.
    range: information resource
    multivalued: true
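
The `multivalued` flags above imply a simple check a KP could run before returning an edge: the primary and original knowledge source slots may each appear at most once, while aggregators may repeat. A minimal sketch, assuming attributes are plain dicts following the TRAPI Attribute shape and that slot names become CURIEs by replacing spaces with underscores (e.g. `biolink:primary_knowledge_source`); the helper `check_source_cardinality` is hypothetical:

```python
from collections import Counter

# Provenance slots declared "multivalued: false" in the definitions above,
# written as CURIEs (assumed convention: spaces become underscores).
SINGLE_VALUED_SLOTS = {
    "biolink:primary_knowledge_source",
    "biolink:original_knowledge_source",
}


def check_source_cardinality(attributes):
    """Return a list of cardinality problems in an edge's TRAPI attributes.

    Hypothetical helper: each attribute is a dict carrying at least
    'attribute_type_id' and 'value'. aggregator_knowledge_source is
    multivalued, so it may repeat and is not checked here.
    """
    problems = []
    # Count how often each attribute type occurs on the edge.
    counts = Counter(attr["attribute_type_id"] for attr in attributes)
    for slot in sorted(SINGLE_VALUED_SLOTS):
        if counts[slot] > 1:
            problems.append(f"{slot} appears {counts[slot]} times")
    # A single-valued slot should also not smuggle several values in a list.
    for attr in attributes:
        slot = attr["attribute_type_id"]
        if slot in SINGLE_VALUED_SLOTS and isinstance(attr["value"], list):
            problems.append(f"{slot} has a list value")
    return problems


# Placeholder CURIEs; the real infores values are still being defined.
edge_attributes = [
    {"attribute_type_id": "biolink:original_knowledge_source", "value": "infores:hmdb"},
    {"attribute_type_id": "biolink:aggregator_knowledge_source", "value": "infores:kp-a"},
    {"attribute_type_id": "biolink:aggregator_knowledge_source", "value": "infores:kp-b"},
]
print(check_source_cardinality(edge_attributes))  # → []
```

Repeated aggregator entries pass, because each KP along the path can add itself without replacing the others.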

The values of these attributes (IRIs for the sources and KPs) are being defined by the EPC group in a spreadsheet to be released shortly.
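
As a rough illustration of the pattern described at the top of this issue, a KP could record the upstream source under `original_knowledge_source` (or the broader `primary_knowledge_source` when it cannot confirm the upstream created the original record) and itself under `aggregator_knowledge_source`. This is a sketch, not the agreed implementation: the helper `provenance_attributes` and the CURIEs `infores:hmdb` and `infores:example-kp` are placeholders until the EPC values are published, and `value_type_id`/`attribute_source` simply mirror the COHD example later in this thread.

```python
def provenance_attributes(upstream, kp, upstream_is_original=True):
    """Build TRAPI attribute dicts describing where an edge's knowledge came from.

    Hypothetical helper.
    upstream: CURIE of the upstream source (e.g. HMDB).
    kp: CURIE of the KP passing the edge to the ARA.
    upstream_is_original: when False, fall back to the broader
        primary_knowledge_source slot (original is_a primary).
    """
    upstream_slot = (
        "biolink:original_knowledge_source"
        if upstream_is_original
        else "biolink:primary_knowledge_source"
    )
    return [
        {
            "attribute_type_id": upstream_slot,
            "value": upstream,
            "value_type_id": "biolink:InformationResource",
            "attribute_source": kp,
        },
        {
            # The KP records itself as an aggregator on the path to the ARA.
            "attribute_type_id": "biolink:aggregator_knowledge_source",
            "value": kp,
            "value_type_id": "biolink:InformationResource",
            "attribute_source": kp,
        },
    ]


# Placeholder CURIEs only.
for attr in provenance_attributes("infores:hmdb", "infores:example-kp"):
    print(attr["attribute_type_id"], attr["value"])
```

The returned list would go into the `attributes` array of each TRAPI edge alongside any other edge attributes.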

@mbrush, please add documentation to help KPs implement this correctly once it is available.

@cbizon
Collaborator

cbizon commented Jul 2, 2021

@CaseyTa
Collaborator

CaseyTa commented Jul 6, 2021

We're still looking for guidance on how to implement this for COHD and OpenPredict. It looks like @mbrush is intending to fill in a few more examples in the implementation guide.

For now, we've implemented COHD as follows, using attributes for both biolink:original_knowledge_source and biolink:supporting_data_source, since we consider COHD to contain both the underlying co-occurrence data (the supporting data source) and the analysis on that data that creates the edge (the original knowledge source). @mbrush, does this fit your intention?

            {
                'attribute_type_id': 'biolink:original_knowledge_source',
                'value': 'infores:cohd',
                'value_type_id': 'biolink:InformationResource',
                'attribute_source': 'infores:cohd',
                'value_url': 'http://cohd.io/api/query'
            },
            {
                'attribute_type_id': 'biolink:supporting_data_source',
                'value': 'infores:cohd',
                'value_type_id': 'biolink:InformationResource',
                'attribute_source': 'infores:cohd',
                'value_url': 'http://cohd.io/api/'
            }
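
On the receiving side, an ARA might group these attributes by slot to see which resources contributed to an edge. A minimal sketch using the two COHD attributes above (trimmed to the fields the sketch reads); the helper `summarize_provenance` is hypothetical, and it includes `biolink:supporting_data_source` only because COHD uses it, since that slot is not among the definitions quoted earlier:

```python
def summarize_provenance(attributes):
    """Group an edge's knowledge-source attributes by attribute_type_id.

    Hypothetical helper: non-provenance attributes are ignored, and list
    values (e.g. several aggregators in one attribute) are flattened.
    """
    provenance_slots = {
        "biolink:primary_knowledge_source",
        "biolink:original_knowledge_source",
        "biolink:aggregator_knowledge_source",
        "biolink:supporting_data_source",
    }
    summary = {}
    for attr in attributes:
        slot = attr.get("attribute_type_id")
        if slot in provenance_slots:
            value = attr["value"]
            values = value if isinstance(value, list) else [value]
            summary.setdefault(slot, []).extend(values)
    return summary


cohd_attributes = [
    {"attribute_type_id": "biolink:original_knowledge_source", "value": "infores:cohd"},
    {"attribute_type_id": "biolink:supporting_data_source", "value": "infores:cohd"},
]
print(summarize_provenance(cohd_attributes))
# → {'biolink:original_knowledge_source': ['infores:cohd'], 'biolink:supporting_data_source': ['infores:cohd']}
```

Seeing the same infores CURIE under both slots is how a consumer would recognize the pattern described above, where one resource both holds the data and performs the analysis.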

Update: Discussed with Matt at the July 8 TRAPI call, and the above pattern fits.


3 participants