
Distribute iMicrobe sample descriptions as RDF #318

Open
cmungall opened this Issue May 9, 2016 · 8 comments

3 participants

cmungall commented May 9, 2016 edited by ramonawalls

Source:

https://raw.githubusercontent.com/hurwitzlab/imicrobe-lib/master/docs/mapping_files/CameraMetadata_ENVO_working_copy.csv

  • check on license issue: can we redistribute, and under what license? @rlwalls2008
  • what P(?)URLs to use for SAMPLEACC? @rlwalls2008
  • decide on RO props for biome, material and feature @pbuttigieg @cmungall @rlwalls2008
  • write quick python script
  • decide where to distribute: this repo, hurwitzlab/imicrobe-lib, or a new repo. I think a new repo may be cleanest
  • decide on header/VOID description for dataset

cmungall commented May 9, 2016

For the distribution, I'll follow the globi model whereby we create a new repo for each dataset, see also https://github.com/cmungall/biocaddie-gym/

"decide on RO props for biome, material and feature"

Note that MIxS (now available as RDF) has properties for biome, material, and feature. However, they are typed only as http://www.w3.org/1999/02/22-rdf-syntax-ns#Property, so we would probably still want to make RO properties. This is part of a bigger issue we have identified for BCO: translating Darwin Core properties into OWL (BiodiversityOntologies/bco#10)

See:
https://github.com/pyilmaz/mixs
http://terms.tdwg.org/wiki/MIxS (not quite ready for prime time)

Note that mixs hasn't quite got the mappings right, as it says:

<rdf:Description rdf:about="http://gensc.org/ns/mixs/env_biome">
  <owl:sameAs rdf:resource="http://purl.obolibrary.org/obo/ENVO_00000428"/>
</rdf:Description>
<rdf:Description rdf:about="http://gensc.org/ns/mixs/env_feature">
  <owl:sameAs rdf:resource="http://purl.obolibrary.org/obo/ENVO_00002297"/>
</rdf:Description>
<rdf:Description rdf:about="http://gensc.org/ns/mixs/env_material">
  <owl:sameAs rdf:resource="http://purl.obolibrary.org/obo/ENVO_00010483"/>
</rdf:Description>

which can't be true, because the MIxS terms are properties while the ENVO terms are classes (@pyilmaz @tucotuco - I will file an issue).
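To make the category error concrete, here is a minimal sketch of a consistency check, assuming the triples are available as plain (subject, predicate, object) IRI tuples (no RDF library assumed). owl:sameAs is meant for individuals; OWL has owl:equivalentClass and owl:equivalentProperty for classes and properties, and neither relates a property to a class. The helper `category_errors` is hypothetical, and the rdf:Property / owl:Class type assertions are supplied by hand in the usage example rather than read from the MIxS file.

```python
# Hypothetical check: flag owl:sameAs links between a term typed as
# rdf:Property and a term typed as owl:Class (in either direction).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_PROPERTY = "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def category_errors(triples):
    """Return (subject, object) pairs where owl:sameAs links a property to a class."""
    # Collect every declared type per term.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)

    errors = []
    for s, p, o in triples:
        if p != OWL_SAME_AS:
            continue
        s_kinds = types.get(s, set())
        o_kinds = types.get(o, set())
        if (RDF_PROPERTY in s_kinds and OWL_CLASS in o_kinds) or (
            OWL_CLASS in s_kinds and RDF_PROPERTY in o_kinds
        ):
            errors.append((s, o))
    return errors
```

Run over the env_biome mapping above, with the two type assertions added by hand, it reports the MIxS/ENVO pair as an error.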

GitHub doesn't like to display RDF as code. See pyilmaz/mixs#1


cmungall commented May 10, 2016 edited

Understood.

Are the domains always samples or some kind of material entity?

The RO object properties would be labeled more like verb phrases, e.g.

  • has biome (subproperty of located in?)
  • in environment determined by feature (property chain of located-in o determined-by)

But strictly speaking the sample isn't located in the biome, so there would need to be an intermediate entity that the sample is derived from.

Might be easier to work back from the triples, something like this?

S derived-from E
E part-of some $biome
E determined-by some $feature
E has-part some $material

where E would be either a blank node or a skolemized term.
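A minimal sketch of emitting this pattern as Turtle with a skolemized E, rendering each "some" statement as an anonymous owl:Restriction that E is typed by. All relation IRIs below are hypothetical example.org placeholders, since the RO properties were still undecided; the sample IRI and the `/.well-known/genid/` base on example.org are likewise assumptions (RDF 1.1 suggests that well-known path for recognizable skolem IRIs). Hashing the sample IRI keeps E stable across re-runs.

```python
import hashlib

# Hypothetical placeholder relation IRIs; swap in the chosen RO properties.
REL = {
    "derived_from": "http://example.org/rel/derived_from",
    "part_of": "http://example.org/rel/part_of",
    "determined_by": "http://example.org/rel/determined_by",
    "has_part": "http://example.org/rel/has_part",
}

# Hypothetical base for skolem IRIs, using the RDF 1.1 well-known genid path.
GENID_BASE = "http://example.org/.well-known/genid/"


def skolem_iri(sample_iri):
    """Mint a deterministic IRI for the environment E a sample was derived from."""
    digest = hashlib.sha1(sample_iri.encode("utf-8")).hexdigest()
    return GENID_BASE + "env-" + digest


def some(prop, cls):
    """Turtle for an anonymous OWL restriction meaning 'prop some cls'."""
    return f"[ a owl:Restriction ; owl:onProperty <{prop}> ; owl:someValuesFrom <{cls}> ]"


def environment_turtle(sample_iri, biome, feature, material):
    """Render S derived-from E plus the three existential statements about E."""
    env = skolem_iri(sample_iri)
    return "\n".join([
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        f"<{sample_iri}> <{REL['derived_from']}> <{env}> .",
        f"<{env}> a {some(REL['part_of'], biome)} .",
        f"<{env}> a {some(REL['determined_by'], feature)} .",
        f"<{env}> a {some(REL['has_part'], material)} .",
    ])
```

A blank node would avoid minting IRIs, but a skolem IRI lets other datasets point at the same environment.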

The imicrobe-lib archive is now licensed under GPLv3 (see its LICENSE file).


cmungall commented May 11, 2016

I added a statement to the license doc that the code on the repo (which is most of the content) is GPLv3 but the data in the docs folder is CC0. See https://github.com/hurwitzlab/imicrobe-lib/blob/master/LICENSE
@kyclark - tell me if this is a problem.


pbuttigieg commented May 18, 2016

@cmungall

decide on RO props for biome, material and feature

biome

As we're dealing with a system, it would be ideal to represent the degree to which a sample/organism (a component or set of components) is integrated into the biome (the system) it was observed or collected in. Any entity in the biome is, to some degree, causally integrated into it; perhaps this makes it part of that biome by default (although the strength of this parthood would be variable).

I wouldn't use a relation like has biome, as this is a bit too committed (think has habitat) - the sample may just be in a biome by chance or be found near some feature that makes the biome less relevant (e.g. a deep cave, an active lava seep, etc).

At this stage, the MIxS community probably just want to state what the broad ecosystemic context is (right @pyilmaz?). If we could have a material entity to system relation like embedded in (quite a lot like part of, but more about causal integration of components) that may work. If part of can work for components and systems I'm fine with that too.

This system/component thinking will be very useful for SDGIO too.

feature

In this case, I think the triple approach works well. The sample isn't determined by the feature, but exists in an environment that a sampler or observer has asserted to be determined by said feature. If we were to use a direct relation between the sample and the feature, it would be something like: S part-of-environment-determined-by F.

Note that F can be any material entity, as ENVO's environmental feature is more like a disposition.

material

S (surrounded by or partially surrounded by) M would work here. I'm not sure whether MIxS would want to specify which of those relations is best. Annotation tool developers could automate the choice: if there is more than one M, partially surrounded by is used; if there is only one M, the union (surrounded by or partially surrounded by) is used, since the submitter may not have been exhaustive and partial surrounding must still be allowed.
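The automation rule above could be sketched as follows. The helper `material_relations` and the plain-string relation labels are hypothetical stand-ins; a real annotation tool would emit whichever RO relation IRIs end up being chosen.

```python
# Hypothetical relation labels standing in for the eventual RO IRIs.
SURROUNDED_BY = "surrounded by"
PARTIALLY_SURROUNDED_BY = "partially surrounded by"


def material_relations(materials):
    """Pair each material M with the allowed relation(s) from the sample S."""
    if not materials:
        return []
    if len(materials) > 1:
        # More than one M: use partially surrounded by for each of them.
        return [((PARTIALLY_SURROUNDED_BY,), m) for m in materials]
    # Exactly one M: keep the union, because the submitter may not have
    # listed every material present.
    return [((SURROUNDED_BY, PARTIALLY_SURROUNDED_BY), materials[0])]
```

For example, a sample annotated only with "sea water" gets the union, while one annotated with "sand" and "sea water" gets partially surrounded by for both.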

S derived-from E
E part-of some $biome
E determined-by some $feature
E has-part some $material

This looks reasonable.
I wonder whether E would always have to be 'part of' (~causally integrated, in a system sense) a biome. An E is always part of a bigger E*, but is it always part of a B? Is the inside of my fridge part of a temperate urban biome? I see arguments both ways but am inclined to say yes (thus using part-of or embedded-in) to avoid Platonic forms.

Also, more than one feature may determine the E. We could also link the S more closely to the $material in question. I spoke to @pyilmaz about this some time back, and there's nothing preventing more than one class being used in any field.

S derived-from E
E part-of some $biome
E determined-by min 1 $feature
E has-part min 1 $material
S (surrounded-by or partially-surrounded-by) some $material

I've mixed some cardinality into the RDF for brevity; these would just expand into their own distinct triples.
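Expanding the cardinality shorthand could look like the sketch below, which emits one statement per listed class, in the same shorthand as the pattern above ("some" marks an existential). `expand_pattern` is a hypothetical helper, and reading "min 1" as "reject a record with no feature or no material" is an assumption.

```python
def expand_pattern(sample, env, biome, features, materials):
    """Expand the shorthand pattern into one (subject, relation, filler) per class."""
    # Interpret "min 1" as: at least one feature and one material must be given.
    if not features or not materials:
        raise ValueError("need at least one feature and one material")
    stmts = [
        (sample, "derived-from", env),
        (env, "part-of some", biome),
    ]
    # Each listed feature or material becomes its own distinct statement.
    stmts += [(env, "determined-by some", f) for f in features]
    stmts += [(env, "has-part some", m) for m in materials]
    stmts += [
        (sample, "(surrounded-by or partially-surrounded-by) some", m)
        for m in materials
    ]
    return stmts
```

With two features and one material this yields six statements: derived-from, part-of, two determined-by, one has-part, and one surrounded-by union.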

Are the domains always samples or some kind of material entity?

I haven't come across a non-material one yet.

cmungall referenced this issue in hurwitzlab/imicrobe-lib Jul 14, 2017

Open

CameraMetadata_ENVO_working_copy and iMicrobe site #1
