add descriptions and other annotations to extensions #769
aliases:
- MIxS-Food (farm environment)
annotations:
use_cases: Microbiome of farm and field crops as well as environmental samples including irrigation, soil amendments, and farm equipment.
FoodFoodProductionFacility:
I think we can make a separate issue for this, but is it really FoodFood? Why is "Food" there twice?
Good question. That is not a typo or bug. In this case, it is helpful to look at the submitter-provided title of this class: 'food-food production facility'. Pre-LinkML, an attempt was made to approximate hierarchy in the Extensions, as if there is a category of "food" Extensions, of which 'food production facility' is a child Extension.
Following standard naming for LinkML classes, the title 'food-food production facility' has been transformed into a term name of 'FoodFoodProductionFacility'.
I don't have any objection to changing that, but it should be done systematically.
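The title-to-name transformation described above can be illustrated with a minimal Python sketch. This is not the code used in the MIxS repository; the function name title_to_class_name is hypothetical, and the rule (split on non-alphanumeric characters, capitalize each word, join) is an assumption that happens to reproduce the example.

```python
import re


def title_to_class_name(title: str) -> str:
    """Turn a free-text title into an UpperCamelCase class name.

    Hypothetical helper: splits on any run of non-alphanumeric
    characters and capitalizes the first letter of each word.
    """
    words = [w for w in re.split(r"[^A-Za-z0-9]+", title) if w]
    return "".join(w[0].upper() + w[1:] for w in words)


print(title_to_class_name("food-food production facility"))
# prints: FoodFoodProductionFacility
```

Because both the "food" grouping prefix and the "food" in "food production facility" survive the split, the doubled "FoodFood" falls out of the rule rather than being a typo.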
@turbomam is correct, it was an attempt at hierarchy. The FDA were involved in designing a set of 4 extensions, so they grouped them all under a "Food" heading:
- MIxS-food-animal and animal feed
- MIxS-food-farm environment
- MIxS-food-food production facility
- MIxS-food-human foods
I'm not against renaming them, but there may be parties that are opposed to it, so I have opened a discussion ticket for it: #779
I'm not sure what this file is, or how it relates to the updating of extension descriptions, but I trust it's meant to be here?
See notes on assets/mixs-schemasheets-concise.tsv
assets/mixs-schemasheets-concise.tsv
I'm not sure what this file is, or how it relates to the updating of extension descriptions, but I trust it's meant to be here?
This file and the following status logs are generated by the assets/mixs-schemasheets-concise.tsv target in project.Makefile:
- mixs-schemasheets-concise-log.txt
- mixs-schemasheets-concise-report.txt
It is a tabular representation of almost everything in the MIxS schema (it doesn't include enumerations of permissible values):
- definitions of classes
- definitions of terms/slots
- assertions of how terms should be used within classes
I'm not sure what this file is, or how it relates to the updating of extension descriptions, but I trust it's meant to be here?
See comments about assets/isolate_slots.py
Unable to view this, as GitHub says the diff is too large!
See notes on assets/mixs-schemasheets-concise.tsv
…e-improved-descriptions
Unable to view this, as GitHub says the diff is too large.
I have added a project.Makefile target that regenerates this file and illustrates how it is generated.
LinkML provides two mechanisms for applying a regular expression constraint to the values of a term/slot.
- the pattern meta-slot, in which you have to provide the exact regular expression that you consider valid for that slot
- the structured_pattern meta-slot, which allows you to compose the regular expression from previously-defined building blocks in the schema's settings section.
To see how the structured_patterns are converted to regular expressions, you need to run the schema through some kind of materializing generator.
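As a rough illustration of what "materializing" a structured_pattern means, here is a minimal Python sketch that substitutes named building blocks into a pattern template. It is not LinkML's implementation: the materialize function, the SETTINGS dictionary, and the example building blocks are all hypothetical, and the {name} placeholder syntax is assumed for the purpose of the sketch.

```python
import re

# Hypothetical building blocks, standing in for a schema's settings section.
SETTINGS = {
    "float": r"[-+]?\d+(\.\d+)?",
    "unit": r"[a-zA-Z%/]+",
}


def materialize(syntax: str, settings: dict) -> str:
    """Replace each {name} placeholder with the matching building block.

    Placeholders with no matching entry (for example a regex quantifier
    such as {2}) are left untouched.
    """
    def lookup(match):
        return settings.get(match.group(1), match.group(0))

    return re.sub(r"\{(\w+)\}", lookup, syntax)


# A structured pattern composed from the building blocks above.
pattern = materialize("^{float} {unit}$", SETTINGS)
print(pattern)
print(bool(re.match(pattern, "12.5 mg")))     # True
print(bool(re.match(pattern, "twelve mg")))   # False
```

The point of the sketch is only that the expanded regular expression does not appear in the hand-written YAML; it exists once the placeholders have been resolved, which is why a materialized copy of the schema shows a large diff.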
assets/isolate_slots.py
I'm not sure what this file is, or how it relates to the updating of extension descriptions, but I trust it's meant to be here?
This script extracts the global definitions of MIxS terms/slots from assets/mixs-schemasheets-concise.tsv. "Global" means that the output does not indicate the cases in which one Extension class may. It also does a little more convenience filtering:
- removes the schemasheet header rows that start with a > character
- removes rows about classes
- removes rows about slots used by the purely infrastructural class MixsCompliantData, which is used for validating tables of data that are believed to follow MIxS guidelines
- removes any columns if they don't contain any data values
It creates assets/mixs-schemasheets-concise-global-slots.tsv
I have commented out some lines that can insert the filtered table back into a relational database. This is especially useful for NMDC but I am glad to teach other people how to use it.
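The filtering steps listed above can be sketched in a few lines of standard-library Python. This is not the contents of assets/isolate_slots.py: the column names (class, slot, description, notes), the sample rows, and the assumption that a "row about a class" is one with an empty slot column are all hypothetical and chosen only to make the sketch self-contained.

```python
import csv
import io

# Toy stand-in for the schemasheet TSV; column names and rows are hypothetical.
SAMPLE_TSV = (
    "class\tslot\tdescription\tnotes\n"
    "> class\tslot\tdescription\tnotes\n"
    "Soil\t\tSoil extension\t\n"
    "\tdepth\tDepth of sample\t\n"
    "MixsCompliantData\tsoil_data\tContainer slot\t\n"
)


def isolate_global_slots(tsv_text: str) -> list:
    """Apply the four filtering steps to a schemasheet-style TSV string."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        if (row["class"] or "").startswith(">"):
            continue  # schemasheet header row starting with ">"
        if not row["slot"]:
            continue  # row about a class (assumed: no slot value)
        if row["class"] == "MixsCompliantData":
            continue  # slot used by the infrastructural class
        rows.append(row)
    # Keep only the columns that hold a value in at least one remaining row.
    keep = [c for c in reader.fieldnames if any(r[c] for r in rows)]
    return [{c: r[c] for c in keep} for r in rows]


print(isolate_global_slots(SAMPLE_TSV))
# prints: [{'slot': 'depth', 'description': 'Depth of sample'}]
```

In this toy input the class and notes columns end up empty after filtering, so they are dropped along with the excluded rows.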
assets/class_summary_results.tsv
I'm not sure what this file is, or how it relates to the updating of extension descriptions, but I trust it's meant to be here?
I have updated project.Makefile to demonstrate how this file is built.
The result is a file that we discussed a few weeks ago: a tabular representation of textual annotations on the MIxS classes (Checklists, Extensions and Combinations).
That can be regenerated as a convenient overview any time we make changes to the YAML source of truth. It also demonstrates a format we could use to add more textual annotations to the classes or terms, as an alternative to free-form GH issues or word processing documents.
A couple of minor typos to correct, and a couple of files with changes that I'm not able to review, as GitHub says there are too many diffs to show. I'm not sure how to review them any other way.
…descriptions' of github.com:GenomicsStandardsConsortium/mixs into 646-classes-checklists-and-extensions-require-improved-descriptions
when attributes not materialized
Good questions, @only1chunts. I have replied, so you could re-review and choose accept or request changes as you see fit.
Discussed in CIG meeting 2024-03-26. Thanks for the feedback. I will stick to smaller PRs next time! Followup actions: