
add descriptions and other annotations to extensions #769


turbomam
Member

No description provided.

@turbomam changed the title "extension descriptions etc." to "add descriptions and other annotations to extensions" on Mar 5, 2024
aliases:
- MIxS-Food (farm environment)
annotations:
use_cases: Microbiome of farm and field crops as well as environmental samples including irrigation, soil amendments, and farm equipment.
FoodFoodProductionFacility:
Contributor


I think we can make a separate issue for this, but is it really FoodFood? Why does "food" appear twice?

Member Author

@turbomam, Mar 5, 2024


Good question. That is not a typo or bug. In this case, it is helpful to look at the submitter-provided title of this class: 'food-food production facility'. Pre-LinkML, an attempt was made to approximate hierarchy in the Extensions, as if there were a category of "food" Extensions of which 'food production facility' is a child Extension.

Following standard naming for LinkML classes, the title 'food-food production facility' has been transformed into a term name of 'FoodFoodProductionFacility'.

I don't have any objection to changing that, but it should be done systematically.
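A minimal Python sketch of that title-to-name transformation, assuming the rule is simply "split on anything that isn't a letter or digit, then capitalize each word". `title_to_class_name` is a hypothetical helper for illustration, not the function the LinkML or MIxS tooling actually calls.

```python
import re


def title_to_class_name(title: str) -> str:
    """Hypothetical helper: turn a human-readable title into an UpperCamelCase name.

    Splits on any run of non-alphanumeric characters (spaces, hyphens, ...)
    and upper-cases the first letter of each resulting word.
    """
    words = [w for w in re.split(r"[^0-9A-Za-z]+", title) if w]
    return "".join(w[0].upper() + w[1:] for w in words)


print(title_to_class_name("food-food production facility"))
```

Because the hyphen in 'food-food' is treated like any other separator, the repeated "Food" survives into the class name, which is why any renaming would have to change the titles (or the rule) systematically.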

Member


@turbomam is correct, it was an attempt at hierarchy. The FDA were involved in designing a set of 4 extensions, so they grouped them all under a "Food" heading:

  • MIxS-food-animal and animal feed
  • MIxS-food-farm environment
  • MIxS-food-food production facility
  • MIxS-food-human foods

I'm not against renaming them, but there may be parties who would oppose it, so I have opened a discussion ticket for it: #779

Member


I'm not sure what this file is, or how it relates to the updating of extension descriptions, but I trust it's meant to be here?

Member Author

@turbomam, Mar 19, 2024


Member


I'm not sure what this file is, or how it relates to the updating of extension descriptions, but I trust it's meant to be here?

Member Author

@turbomam, Mar 19, 2024


This file, and the following status logs, are generated by the assets/mixs-schemasheets-concise.tsv target in project.Makefile:

  • mixs-schemasheets-concise-log.txt
  • mixs-schemasheets-concise-report.txt

It is a tabular representation of almost everything in the MIxS schema (it doesn't include enumerations of permissible values):

  • definitions of classes
  • definitions of terms/slots
  • assertions of how terms should be used within classes
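As a rough illustration of those three kinds of rows, the sketch below counts them in a schemasheets-style TSV. It assumes columns named `class` and `slot` (the real file's headers may differ) and treats any row whose first cell starts with `>` as a schemasheet header row; `classify_rows` and `SAMPLE` are hypothetical.

```python
import csv
import io
from collections import Counter


def classify_rows(tsv_text: str) -> Counter:
    """Count row kinds in a schemasheets-style TSV.

    Assumes hypothetical 'class' and 'slot' columns:
      class only     -> definition of a class
      slot only      -> global definition of a term/slot
      class and slot -> how a term is used within a class
    """
    counts: Counter = Counter()
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    first_col = reader.fieldnames[0] if reader.fieldnames else None
    for row in reader:
        if first_col and (row.get(first_col) or "").startswith(">"):
            continue  # schemasheet header/directive row, not data
        cls = (row.get("class") or "").strip()
        slot = (row.get("slot") or "").strip()
        if cls and slot:
            counts["slot usage in class"] += 1
        elif cls:
            counts["class definition"] += 1
        elif slot:
            counts["global slot definition"] += 1
    return counts


# Tiny hypothetical sample: one row of each kind, plus a '>' header row
SAMPLE = (
    "class\tslot\tdescription\n"
    ">class\t>slot\t>description\n"
    "Soil\t\tSoil extension\n"
    "\tsamp_name\tSample name\n"
    "Soil\tsamp_name\t\n"
)
print(classify_rows(SAMPLE))
```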

Member


I'm not sure what this file is, or how it relates to the updating of extension descriptions, but I trust it's meant to be here?

Member Author


See comments about assets/isolate_slots.py.

Member


Unable to view this, as GitHub says the diff is too large!

Member Author


Member


Unable to view this, as GitHub says the diff is too large.

Member Author


I have added a project.Makefile target for regenerating it, which illustrates how this file is generated.

LinkML provides two mechanisms for applying a regular expression constraint to the values of a term/slot.

  • the pattern meta-slot, in which you have to provide the exact regular expression that you consider valid for that slot
  • the structured_pattern meta-slot, which allows you to compose the regular expression from previously-defined building blocks in the schema's settings section.

To see how the structured_patterns are converted to regular expressions, you need to run the schema through some kind of materializing generator.
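For illustration only, here is a simplified Python sketch of what "materializing" a structured pattern means: each `{name}` placeholder in the pattern's syntax is replaced with the regular expression stored under that name in the schema's settings. The setting names and regexes below are hypothetical, and this is not LinkML's own implementation, which also honors options such as `interpolated` and `partial_match`.

```python
import re

# Placeholders look like {setting_name}; regex quantifiers such as {2,3}
# don't match this and are left alone.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def materialize(syntax: str, settings: dict[str, str]) -> str:
    """Expand {name} placeholders in a structured-pattern syntax string."""

    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in settings:
            raise KeyError(f"no setting named {name!r}")
        # A function's return value is inserted literally by re.sub,
        # so backslashes in the setting's regex are not reinterpreted.
        return settings[name]

    return PLACEHOLDER.sub(lookup, syntax)


# Hypothetical building blocks, as they might appear in a schema's settings
SETTINGS = {
    "float": r"[-+]?[0-9]*\.?[0-9]+",
    "unit": r"[a-zA-Z]+",
}

regex = materialize(r"^{float} {unit}$", SETTINGS)
print(regex)
print(bool(re.fullmatch(regex, "12.5 mg")))
```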

Member


I'm not sure what this file is, or how it relates to the updating of extension descriptions, but I trust it's meant to be here?

Member Author

@turbomam, Mar 19, 2024


This script extracts the global definitions of MIxS terms/slots from assets/mixs-schemasheets-concise.tsv. "Global" means that the output does not indicate the cases in which one Extension class may. It also does a little more convenience filtering:

  • removes the schemasheet header rows that start with a > character
  • removes rows about classes
  • removes rows about slots used by the purely infrastructural class MixsCompliantData, which is used for validating tables of data believed to follow MIxS guidelines
  • removes any columns that don't contain any data values

It creates assets/mixs-schemasheets-concise-global-slots.tsv
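Here is a rough Python sketch of those filtering steps using only the standard library. It is not the actual contents of assets/isolate_slots.py: it assumes hypothetical `class` and `slot` column names, and it takes the set of slots to drop for MixsCompliantData as a parameter instead of working that set out from the schema.

```python
import csv
import io


def isolate_global_slots(tsv_text: str, excluded_slots: set[str]) -> list[dict[str, str]]:
    """Simplified, hypothetical version of the filtering described above."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = reader.fieldnames or []
    kept: list[dict[str, str]] = []
    for row in reader:
        first = (row.get(fieldnames[0]) or "") if fieldnames else ""
        if first.startswith(">"):
            continue  # schemasheet header rows
        if (row.get("class") or "").strip():
            continue  # rows about classes (or class-specific usage)
        slot = (row.get("slot") or "").strip()
        if not slot or slot in excluded_slots:
            continue  # blank rows, or slots for the infrastructural class
        kept.append({f: row.get(f) or "" for f in fieldnames})
    # Drop columns that have no values in any kept row
    non_empty = [f for f in fieldnames if any(r[f].strip() for r in kept)]
    return [{f: r[f] for f in non_empty} for r in kept]


SAMPLE = (
    "class\tslot\tdescription\tunused\n"
    ">class\t>slot\t>description\t>unused\n"
    "Soil\t\tSoil extension\t\n"
    "\tsamp_name\tSample name\t\n"
    "\tcompliance_flag\tHypothetical infrastructural slot\t\n"
    "Soil\tsamp_name\t\t\n"
)
rows = isolate_global_slots(SAMPLE, excluded_slots={"compliance_flag"})
print(rows)
```

The filtered rows could then be written out as TSV with `csv.DictWriter(f, fieldnames=..., delimiter="\t")`.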

I have commented out some lines that can insert the filtered table back into a relational database. This is especially useful for NMDC but I am glad to teach other people how to use it.
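A database-loading step could look roughly like the sketch below. This is an assumption for illustration: it uses Python's built-in `sqlite3` module with an in-memory database rather than whatever database the commented-out lines actually target, and the helper `load_rows` and table name `mixs_global_slots` are hypothetical.

```python
import sqlite3


def load_rows(conn: sqlite3.Connection, table: str, rows: list[dict[str, str]]) -> int:
    """Create `table` (TEXT columns named after the row keys) and insert rows.

    Identifiers are double-quoted but not otherwise escaped, so this assumes
    trusted table and column names.
    """
    if not rows:
        return 0
    columns = list(rows[0].keys())
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})',
        [tuple(row.get(c, "") for c in columns) for row in rows],
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
inserted = load_rows(
    conn,
    "mixs_global_slots",
    [{"slot": "samp_name", "description": "Sample name"}],
)
print(inserted, conn.execute("SELECT slot, description FROM mixs_global_slots").fetchall())
```

Using `?` placeholders for the values keeps the data itself safely parameterized; only the identifiers are interpolated into the SQL text.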

Member


I'm not sure what this file is, or how it relates to the updating of extension descriptions, but I trust it's meant to be here?

Member Author


I have updated project.Makefile to demonstrate how this file is built.

The result is a file that we discussed a few weeks ago: a tabular representation of textual annotations on the MIxS classes (Checklists, Extensions and Combinations).

Member Author


Member Author


That can be regenerated as a convenient overview any time we make changes to the YAML source of truth. It also demonstrates a format we could use to add more textual annotations to the classes or terms, as an alternative to free-form GH issues or word processing documents.
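One way to picture that tabular overview is the sketch below, which flattens per-class aliases and annotations into one TSV row per class. The input dictionary is a hand-written stand-in for the YAML source of truth: the `FoodFarmEnvironment` entry reuses the alias and `use_cases` text quoted earlier in this PR (its class name is assumed), while `ExampleExtension` and the helper `annotations_to_tsv` are purely hypothetical.

```python
import csv
import io

# Hypothetical stand-in for class metadata loaded from the YAML schema
CLASSES = {
    "FoodFarmEnvironment": {
        "aliases": ["MIxS-Food (farm environment)"],
        "annotations": {
            "use_cases": (
                "Microbiome of farm and field crops as well as environmental "
                "samples including irrigation, soil amendments, and farm equipment."
            ),
        },
    },
    "ExampleExtension": {},  # hypothetical class with no aliases or annotations
}


def annotations_to_tsv(classes: dict[str, dict]) -> str:
    """Return a TSV with one row per class and one column per annotation key."""
    keys = sorted({k for info in classes.values() for k in info.get("annotations", {})})
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["class", "aliases", *keys])
    for name in sorted(classes):
        info = classes[name]
        annotations = info.get("annotations", {})
        writer.writerow(
            [name, "|".join(info.get("aliases", [])), *(annotations.get(k, "") for k in keys)]
        )
    return buf.getvalue()


print(annotations_to_tsv(CLASSES))
```

Giving each annotation key its own column means new kinds of annotations show up automatically; joining multiple aliases with `|` is just one choice for keeping them in a single cell.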

Member

@only1chunts left a comment


A couple of minor typos to correct, and a couple of files with changes that I'm not able to review, as GitHub says there are too many diffs to show. Not sure how to review them any other way.

…descriptions' of github.com:GenomicsStandardsConsortium/mixs into 646-classes-checklists-and-extensions-require-improved-descriptions
@turbomam
Member Author

turbomam commented Mar 19, 2024

Good questions @only1chunts

I have replied, so you could re-review and choose to approve or request changes as you see fit.

@turbomam
Member Author

Discussed in CIG meeting 2024-03-26

Thanks for the feedback. I will stick to smaller PRs next time!

Followup actions:

  • integrate contributor and reference knowledge from
  • work on separate release artifact directory (just a renaming of project/?) What files are required, in terms of structure and content?
  • come to an agreement on structured formats for sharing attributes of entities (like Extensions). Something other than Google (word processing) Docs

@turbomam turbomam merged commit 83b4dc1 into main Mar 26, 2024
2 checks passed
@turbomam turbomam deleted the 646-classes-checklists-and-extensions-require-improved-descriptions branch March 26, 2024 16:32
Development

Successfully merging this pull request may close these issues.

classes (Checklists and Extensions) require improved descriptions
4 participants