
Decide on format for MIxS URIs - namespace #233

Closed
ramonawalls opened this issue Jan 13, 2020 · 27 comments

@ramonawalls
Collaborator

ramonawalls commented Jan 13, 2020

Go to this comment for solution: #233 (comment)

Although JSON does not strictly require term urls, much of what people need to do with mixs does (e.g., use mixs in linked data, use mixs terms in ontologies).

We discussed this topic at the CIG hackathon in Vienna in May.
Options include:

Use obo foundry purls
Make gensc purls
Make gensc URLs that are not purls (e.g., gensc.org/ns)
Keep namespace for terms in TDWG

Comment from @cmungall: Also https://w3id.org/

Comment from @lschriml: The last time we discussed this at the board level, there was a lot of support for:
Make gensc purls
--> Has this been discussed further on the CIG calls?

Copy of GenomicsStandardsConsortium/mixs-ng#3

@ramonawalls ramonawalls self-assigned this Jan 13, 2020
@ramonawalls
Collaborator Author

We have discussed this multiple times on the MIxS as RDF working group calls. Below is a summary:

  • Setting up our own PURL server is currently too much work.
  • GSC does not have the resources at the moment to maintain its own namespace server, so we are not in favor of hosting the terms at https://gensc.org
  • w3id requires the least setup effort and gives us the most stability in the long term, which is very important.

Therefore, we recommend using URIs of the format https://w3id/gensc.org for MIxS terms and checklists.

I will run this by the full CIG group at our next call.

@ramonawalls
Collaborator Author

We still need to decide on the specific namespaces for terms, checklists, packages, and cvs. We discussed this at the hackathon in Vienna, but I can't find the notes. @jdeck88 or @pbuttigieg did you write it down?

@ramonawalls
Collaborator Author

I don't think we want a separate namespace for each checklist. In the interest of simplicity, I suggest

https://w3id/gensc.org/terms/ for individual terms
and
https://w3id/gensc.org/mixs/ for all checklist and packages

Individual terms would then have IDs like: https://w3id/gensc.org/terms/MIXS_000001 .

Packages would have URIs like https://w3id/gensc.org/mixs/migo.ttl or https://w3id/gensc.org/mixs/migo.xlsx.

Checklists would be similar: https://w3id/gensc.org/mixs/human_gut.ttl.

@ramonawalls ramonawalls changed the title Decide on format for MIxS PURLs Decide on format for MIxS URIs - namespace Feb 10, 2020
@lschriml
Member

lschriml commented Feb 10, 2020 via email

@ramonawalls
Collaborator Author

We also need to consider controlled vocabularies (as values for attribute terms). Since the CV terms are still just terms, I suggest using the https://w3id/gensc.org/terms/ namespace. If people are concerned that it would cause confusion, we could use https://w3id/gensc.org/terms/cvterms/ instead, but I think one namespace for all terms is sufficient. For example, in an ontology, we don't distinguish between classes, properties, and individuals by their namespace, but instead include a type designation.

As a namespace for the lists of CVs, I suggest https://w3id/gensc.org/mixs/cv/.

@ramonawalls
Collaborator Author

ramonawalls commented Feb 28, 2020

@pbuttigieg, does it make sense to have a namespace for MIGO that is separate from MIxS?

In that case, MIxS packages and checklists would be in
https://w3id.org/gensc.org/mixs/

MIGO packages and checklists (is there more than one?) would be in
https://w3id.org/gensc.org/migo

and all terms (regardless of where they are used) would be in
https://w3id.org/gensc.org/terms/

@jdeck88

jdeck88 commented Feb 28, 2020

I'm not sure MIGO needs a separate namespace.

@lschriml
Member

lschriml commented Feb 28, 2020 via email

@ramonawalls
Collaborator Author

ramonawalls commented Jul 13, 2020

Important! Use this one for terms, packages, and checklists!

Also, see below for packages and checklist version IRIs.

The actual format should not have .org in the IRI, so:

MIxS packages and checklists would be in
https://w3id.org/gensc/mixs/

Decision on 10 May 2022: We will use numerical IDs for each package and checklist. See the numbers at LINK. LinkML also generates the combinations (products) of all packages and checklists. Let's make new IDs for those by concatenating the checklist number first, then the package number, separated by an underscore.

(Old comment, no longer valid: MIGO packages and checklists (is there more than one?) would be in https://w3id.org/gensc/migo)

Note: We may need to reconstruct some of the names to be sure there is no whitespace. They should all use dashes or underscores instead.
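As a minimal sketch of the note above, assuming underscores (rather than dashes) are the chosen separator, names could be normalized like this. `normalize_name` is a hypothetical helper for illustration, not part of any existing MIxS tooling.

```python
import re


def normalize_name(name: str) -> str:
    """Hypothetical helper: replace each run of whitespace with one underscore.

    Assumes underscores are the chosen separator; existing dashes and
    underscores are left untouched.
    """
    return re.sub(r"\s+", "_", name.strip())


print(normalize_name("human gut"))        # human_gut
print(normalize_name("host-associated"))  # host-associated
```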

and all terms (regardless of where they are used) would be in
https://w3id.org/gensc/terms/

Term URIs should use the unique seven-digit string and follow this format:
https://w3id.org/gensc/terms/MIXS_0000001
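Zero-padding is the detail most likely to go wrong here, since an earlier comment in this thread used six digits (MIXS_000001). A minimal sketch of the July 2020 format, with `term_uri` as a hypothetical helper; note that the Feb. 2023 proposal further down this thread later drops both `terms/` and the `MIXS_` part.

```python
TERMS_NAMESPACE = "https://w3id.org/gensc/terms/"


def term_uri(number: int) -> str:
    """Hypothetical helper: build a term URI in the July 2020 format.

    The number is zero-padded to the seven-digit unique string,
    e.g. 1 -> MIXS_0000001.
    """
    if not 0 <= number <= 9_999_999:
        raise ValueError(f"term number must fit in seven digits, got {number}")
    return f"{TERMS_NAMESPACE}MIXS_{number:07d}"


print(term_uri(1))   # https://w3id.org/gensc/terms/MIXS_0000001
print(term_uri(10))  # https://w3id.org/gensc/terms/MIXS_0000010
```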

@sujaypatil96 - for your reference

@ramonawalls
Collaborator Author

> I'm not sure MIGO needs a separate namespace.

I expect that genomic observatory packages would be quite different than MIxS packages. I think that is why we proposed this at GSC in Vienna. For now, we will start with registering MIxS, and we can add MIGO namespace later if the need arises.

@ramonawalls
Collaborator Author

ramonawalls commented Aug 10, 2020

Important! Use this one for packages and checklists with versions.

Package and checklist URI namespace

CORE:

https://w3id.org/gensc/mixs/ - resolves to current version

https://w3id.org/gensc/mixs/vX/ - resolves to specific versions

Continue to use numerical versions. Can add minor versions if needed.

Checklists:

https://w3id.org/gensc/mixs/checklist_name - resolves to current version

https://w3id.org/gensc/mixs/vX/checklist_name - resolves to specific versions

Use acronyms.

Packages:

https://w3id.org/gensc/mixs/package/package_name - resolves to current version

https://w3id.org/gensc/mixs/package/vX/package_name/ - resolves to specific versions

Use opaque IDs for these
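The current-versus-versioned split above can be sketched as two small URI builders. This is an illustration only: the function names are hypothetical, the version is passed as a string so minor versions such as "6.1" fit, trailing slashes are omitted, and "MIMS" and "0016012" are just example values.

```python
from typing import Optional

MIXS_BASE = "https://w3id.org/gensc/mixs/"


def checklist_uri(acronym: str, version: Optional[str] = None) -> str:
    """Checklist URI; without a version it points at the current release."""
    if version is None:
        return f"{MIXS_BASE}{acronym}"
    return f"{MIXS_BASE}v{version}/{acronym}"


def package_uri(package_id: str, version: Optional[str] = None) -> str:
    """Package URI built from an opaque ID, with the same current/versioned split."""
    if version is None:
        return f"{MIXS_BASE}package/{package_id}"
    return f"{MIXS_BASE}package/v{version}/{package_id}"


print(checklist_uri("MIMS"))          # https://w3id.org/gensc/mixs/MIMS
print(checklist_uri("MIMS", "5"))     # https://w3id.org/gensc/mixs/v5/MIMS
print(package_uri("0016012", "5"))    # https://w3id.org/gensc/mixs/package/v5/0016012
```

Keeping the unversioned form as the "current" URI means citations that do not care about a release stay stable, while versioned URIs pin an exact release.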

@pbuttigieg
Collaborator

pbuttigieg commented Aug 10, 2020

> I expect that genomic observatory packages would be quite different than MIxS packages. I think that is why we proposed this at GSC in Vienna. For now, we will start with registering MIxS, and we can add MIGO namespace later if the need arises.

Yes, that's metadata about an observatory, rather than a sequence, so it will be quite different

> Packages:
>
> https://w3id.org/gensc/package/package_name - resolves to current version
>
> https://w3id.org/gensc/package/vX/package_name/ - resolves specific versions
>
> Use opaque IDs for these

These look good: in the packages, we can mix MIxS and MIGO terms, or terms from other dedicated namespaces.

@ramonawalls
Collaborator Author

On Sep. 14, we agreed that package/checklist IDs should be numerical, seven-digit.

@cmungall
Contributor

I just discovered this today: http://rs.gbif.org/sandbox/extension/mixs_sample.xml

It seems at one point there was a vocabulary in use? E.g

These don't resolve. Whenever we set up the new system, we should figure out a way to make these resolve.

@lschriml
Member

lschriml commented Nov 20, 2020 via email

@ramonawalls
Collaborator Author

ramonawalls commented Nov 20, 2020 via email

@ramonawalls
Collaborator Author

I just filed gbif/rs.gbif.org#51 and GenomicsStandardsConsortium/mixs-rdf#31.

@mdoering

Should this not be https://w3id.org/gensc/ ?
I cannot find any recommendations on using w3id as a top level domain name anywhere on: https://w3id.org/ and it would not be a valid IRI, would it? Instead w3id.org says:

There is not yet an official policy on identifier names. The current practice is to claim a top-level directory name and add project specific second level identifiers. For instance, https://w3id.org/PROJECT-ID/SUB-ID.... Shared top-levels are also available such as https://w3id.org/people/PERSON-ID. There is not yet an official list or policy for reserved identifiers. However, the administrators may deny requests for identifiers that are too generic, could cause confusion, are inappropriate or offensive, or otherwise may be needed for future service expansion.

@ramonawalls
Collaborator Author

ramonawalls commented Nov 23, 2020

Thanks, @mdoering -- good catch! They should indeed be https with w3id.org as the top level. I will edit the comment above.

@ramonawalls
Collaborator Author

> I just discovered this today: http://rs.gbif.org/sandbox/extension/mixs_sample.xml
>
> It seems at one point there was a vocabulary in use? E.g
>
> these don't resolve. Whenever we set up the new system we should figure a way to make these resolve

This problem is covered in issue GenomicsStandardsConsortium/mixs-rdf#31

@wdduncan

@ramonawalls just checking in ... have you put more thought into when you can register the MIXS namespace?

@ramonawalls
Collaborator Author

@wdduncan I forgot that I actually registered with w3id.org four months ago. However, I think the issue here is when will the IDs resolve. We have been waiting for the release of MIxS6, but more packages keep coming, and it keeps getting delayed. I'm going to start a new issue to discuss this topic.

@kmexter

This comment was marked as off-topic.

@ramonawalls
Collaborator Author

@kmexter please see issue #390 for resolving links to terms. I am hiding your comment here, since it belongs on #390.

@ramonawalls
Collaborator Author

ramonawalls commented Feb 21, 2023

Background

This comment is about optimizing a user’s experience of accessing information about MIxS terms via the LinkML auto-generated documentation pages.

Users shouldn’t encounter any broken links, and GSC should use namespaces that are clear and authoritative, but also provide flexibility for the future.

Management of these namespaces will involve
• changing the name of our GitHub Organization
• selecting a w3id namespace and editing our w3id redirection rules
• editing the prefixes and expansions in the LinkML YAML files

Decisions regarding the URLs for MIxS resources were made in 2020, before we started using LinkML seriously. Our knowledge and understanding have changed since then, and the documentation of the decisions is not completely clear. @turbomam and @ramonawalls met to try to clarify what resources a given URL should redirect to (what content should be associated with a given URL), and we make a new proposal below.

We have two layers of URLs - the verbose ones assigned by GitHub Pages and the terser ones available from w3id. Additionally, the LinkML files declare prefixes and their expansions. The GitHub Pages URLs are generated automatically based on the name of our GitHub organization and MIxS repo. Ideally, the LinkML prefixes would expand to w3id URLs which would be the official identifiers for elements of MIxS.

We had included a namespace for Minimum Information about a Genomic Observatory (MIGO). The MIGO prefix is not yet assigned to any checklists or terms and is not currently required. If we wish to use the https://w3id.org/gensc/mixs namespace, then those mixs.vocab and MIXS prefixes were wrong. They were just corrected in a PR based on branch #531. In that PR, @turbomam (with the blessing of @ramonawalls) changed it so we only have a single prefix for all MIxS resources, whether they are standards, terms, checklists, etc.

Feb. 2023 Proposal

I propose a solution that supports having future namespaces for standards or projects outside MIxS (e.g., for MIGO for global observatories) but is also easy to maintain and understand and works well with GitHub Pages and LinkML:

gensc/mixs namespace

The root namespace for MIxS would be

https://w3id.org/gensc/mixs/

Per issue #533, we will change the name of our GitHub organization to genomicstandardsconsortium (or GenomicStandardsConsortium). Therefore, GitHub Pages will by necessity have the prefix https://genomicstandardsconsortium.github.io/mixs/, which means that

https://w3id.org/gensc/mixs/

should redirect to

https://genomicstandardsconsortium.github.io/mixs/, which is the home page for the MIxS standard.

The key difference from the current state (after PR #531) is that “gensc” would be added to the w3id redirect. This follows the w3id recommended practice of using “https://w3id.org/$org/$project/” and allows for the future prospect of including other GSC standards or projects, such as MIGO (minimum information about a genomic observatory).

MIxS checklists and extensions would use the same prefix. On 10 May 2022 we decided that checklists and extensions and all of their combinations would have numerical IDs. These numerical IDs have been assigned in the incomplete schemasheets branch but not in the current release yet. In the schemasheets branch, IDs for combinations of checklists and extensions are constructed by concatenating the checklist number first then the extension number, separated by an underscore.
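The combination rule (checklist number first, then extension number, joined by an underscore) can be sketched as below. `combination_id` is a hypothetical helper; IDs are handled as strings so leading zeros survive, and the seven-digit check follows the Sep. 14 agreement earlier in this thread.

```python
import re


def combination_id(checklist_id: str, extension_id: str) -> str:
    """Hypothetical helper: join a checklist ID and an extension ID.

    The checklist ID comes first, then an underscore, then the extension ID.
    Each ID must be exactly seven ASCII digits.
    """
    for value in (checklist_id, extension_id):
        if not re.fullmatch(r"[0-9]{7}", value):
            raise ValueError(f"expected a seven-digit ID, got {value!r}")
    return f"{checklist_id}_{extension_id}"


print(combination_id("0010003", "0016012"))  # 0010003_0016012
```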

Following this proposal, our LinkML YAML file should be edited to generate documentation pages for checklists, extensions, and combinations of the following format:

MIGS bacteria checklist: https://genomicstandardsconsortium.github.io/mixs/0010003 (curie is MIXS:0010003)

Soil extension (aka environmental package): https://genomicstandardsconsortium.github.io/mixs/0016012 (curie is MIXS:0016012)

Combination of MIGS bacteria and soil extension: https://genomicstandardsconsortium.github.io/mixs/0010003_0016012

The shorter URLs generated from w3id for those three pages would be, respectively:

`https://w3id.org/gensc/mixs/0010003`
`https://w3id.org/gensc/mixs/0016012`
`https://w3id.org/gensc/mixs/0010003_0016012`

Our original proposal was that all terms, regardless of whether they are used in MIxS or some other (putative) standard, should come within the w3id namespace https://w3id.org/gensc/terms/, so that each individual term ID would be of the form https://w3id.org/gensc/terms/$term_unique_string. However, GitHub Pages automatically generates URLs with the name of the organization and the repository. Although we could indeed use w3id URLs that contained gensc/terms, in order to have them redirect to corresponding documentation URLs generated by LinkML and GitHub Pages of the same structure (i.e. https://genomicstandardsconsortium.github.io/terms/xxxxxxx), we would need to move the term pages to their own repository (genomicstandardsconsortium/terms). I don’t think it is worth the effort.

Per comments above, we expect the term sets for other standards that may arise to be quite different from MIxS (e.g., they would describe observatories, not samples or sequences), and they could still reuse/import terms from the MIxS namespace. Therefore, I think it is a more practical solution to simply use the MIxS namespace for MIxS terms. Under this proposal, URLs for terms (i.e. slots) would follow the same format as URLs for packages and extensions (i.e. classes).

For example, the w3id URL for geo_loc_name (curie MIXS:0000010) would be:

https://w3id.org/gensc/mixs/0000010

And the documentation generated by GitHub pages would be:

https://genomicstandardsconsortium.github.io/mixs/0000010.

Note a change from an earlier decision, where terms used URLs of the format https://w3id.org/gensc/terms/MIXS_0000001. Under the new plan, PURLs would follow the format https://w3id.org/gensc/mixs/0000001. In other words, we would change ‘terms’ to ‘mixs’ and we would not include MIXS_ in the prefix. I am open to whether or not we include MIXS_ in the URL for terms, but it seems unnecessary to me. (NOTE: during the TWG call on 2023-02-21, we decided not to include MIXS_)

Example of an actual term page for geolocation name (geo_loc_name):

https://genomicsstandardsconsortium.github.io/mixs/0000010/

or

http://w3id.org/gensc/mixs/0000010

@ramonawalls
Collaborator Author

ramonawalls commented Feb 21, 2023

What needs to happen:

In LinkML, we should use the following prefix expansion rule:

MIXS: https://w3id.org/gensc/mixs/

Autogenerated documentation will make URLs for all MIxS elements of the form:

https://genomicstandardsconsortium.github.io/mixs/0000010

for any element of the standard. These are the URLs that are embedded into the GitHub Pages site.
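LinkML performs this prefix expansion itself from the prefixes declared in the schema; the sketch below only illustrates the intended mapping in plain Python and is not LinkML's API. `expand_curie` and `docs_url` are hypothetical helpers, and the lowercase GitHub Pages host is an assumption pending the organization rename in #533.

```python
PREFIXES = {"MIXS": "https://w3id.org/gensc/mixs/"}
# Assumed host name; depends on the final organization name (see #533).
DOCS_BASE = "https://genomicstandardsconsortium.github.io/mixs/"


def expand_curie(curie: str) -> str:
    """Expand a CURIE such as MIXS:0000010 using the prefix map."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or malformed CURIE: {curie!r}")
    return PREFIXES[prefix] + local_id


def docs_url(curie: str) -> str:
    """GitHub Pages documentation URL for the same MIxS element."""
    expand_curie(curie)  # reuse the validation above
    return DOCS_BASE + curie.partition(":")[2]


print(expand_curie("MIXS:0000010"))  # https://w3id.org/gensc/mixs/0000010
print(docs_url("MIXS:0000010"))      # https://genomicstandardsconsortium.github.io/mixs/0000010
```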

In w3id.org, we need to register the prefix

https://w3id.org/gensc/mixs/ redirects to https://genomicstandardsconsortium.github.io/mixs/ (or GenomicStandardsConsortium, see #533)

Example:

For geo_loc_name

https://w3id.org/gensc/mixs/0000010 should redirect to https://genomicstandardsconsortium.github.io/mixs/0000010

This is a one way redirect, in that the short URLs are not directly in the github documentation. The URLs are listed as curies instead. However, since w3ids are shorter and redirectable, we can use these as the "official" URLs for MIxS elements in publications.
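w3id.org itself configures redirects with Apache rewrite rules (`.htaccess` files) in its GitHub repository, so the Python below is not the registration itself. It is only a sketch of the intended prefix-level, one-way mapping, where everything after the registered prefix is carried over unchanged. `redirect_target` is hypothetical, and the lowercase host name is again an assumption pending #533.

```python
W3ID_BASE = "https://w3id.org/gensc/mixs/"
# Assumed host name; depends on the final organization name (see #533).
PAGES_BASE = "https://genomicstandardsconsortium.github.io/mixs/"


def redirect_target(w3id_url: str) -> str:
    """Map a w3id URL under gensc/mixs/ to its GitHub Pages page.

    Only this direction is modeled: the documentation pages show CURIEs,
    not w3id URLs, so no reverse mapping is needed.
    """
    if not w3id_url.startswith(W3ID_BASE):
        raise ValueError(f"not under {W3ID_BASE}: {w3id_url}")
    return PAGES_BASE + w3id_url[len(W3ID_BASE):]


print(redirect_target("https://w3id.org/gensc/mixs/0000010"))
# https://genomicstandardsconsortium.github.io/mixs/0000010
```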

@ramonawalls
Collaborator Author

We still need to decide how to implement this. I think it should be in our schemasheets, rather than continuing to hand edit https://github.com/GenomicsStandardsConsortium/mixs/blob/main/gsctools/mixs_converter.py. We will document that process in a different issue.


9 participants