Add oral caries and dysbiosis community models #32
Merged
Pull request overview
Adds a new ORAL community category and introduces five curated oral microbiome/dental biofilm community exemplars (caries/dysbiosis), along with cached PubMed abstracts to support reference validation.
Changes:
- Extend `CommunityCategoryEnum` with `ORAL` and regenerate the LinkML Python datamodel.
- Add 5 new oral community YAML records under `kb/communities/`.
- Add cached PubMed abstract text files for the 5 supporting PMIDs under `references_cache/`.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/communitymech/schema/communitymech.yaml | Adds ORAL to CommunityCategoryEnum in the LinkML schema. |
| src/communitymech/datamodel/communitymech.py | Regenerated datamodel to include the new enum permissible value. |
| kb/communities/Early_Dental_Biofilm_FiveSpecies.yaml | New engineered five-species early dental biofilm model exemplar. |
| kb/communities/Defined_Multispecies_Enamel_Caries_Model.yaml | New defined multispecies enamel caries model exemplar. |
| kb/communities/SMutans_CAlbicans_ECC_Biofilm.yaml | New S. mutans–C. albicans ECC dual-species biofilm exemplar. |
| kb/communities/SMutans_SSputigena_ECC_Pathobiont.yaml | New S. mutans–S. sputigena ECC pathobiont exemplar. |
| kb/communities/SMutans_VParvula_ASC_Biofilm.yaml | New S. mutans–V. parvula adult severe caries exemplar. |
| references_cache/pmid_21966490.txt | Cached abstract used by reference validation for the 5-species model. |
| references_cache/pmid_23446436.txt | Cached abstract used by reference validation for the enamel caries model. |
| references_cache/pmid_24566629.txt | Cached abstract used by reference validation for the S. mutans–C. albicans model. |
| references_cache/pmid_37217495.txt | Cached abstract used by reference validation for the S. mutans–S. sputigena model. |
| references_cache/pmid_39345197.txt | Cached abstract used by reference validation for the S. mutans–V. parvula model. |
> for metagenomic analysis. The acidification, aciduricity, oxidative stress
> tolerance, and gtf (glucosyltransferase) gene expression of S. mutans cocultured
> with V. parvula which was identified as ASC-related dominant bacterium. The
> biofilm formation and extracellular exopolysaccharide (EPS) synthesis of
Force-pushed from 808700c to 59cad73.
realmarcin added a commit that referenced this pull request on May 25, 2026
…rict + write_validated_community + record_curation_event + audit_writers) (#84)

* Port audit machinery from CultureMech: schema extension + validate_strict + write_validated_community + record_curation_event + audit_writers

Brings CommunityMech to parity with the audit-machinery ports recently landed in CultureMech (source), MediaIngredientMech (#32), and TraitMech (#76). CommunityMech is the last sibling; the lift is larger than for MIM / TraitMech because the schema did not yet define CurationEvent or curation_history.

Schema additions (additive, no migration needed):
- New CurationEvent class with timestamp / curator / action / changes / llm_assisted attributes, mirroring the shape used by sibling Mech repos so cross-repo tooling reads curation events uniformly.
- New curation_history slot on MicrobialCommunity (multivalued, inlined, optional). Existing community YAMLs continue to validate without modification.
- src/communitymech/datamodel/communitymech.py regenerated (just gen-python).

New helpers:
- src/communitymech/validation/write_validated.py: write_validated_community() refuses to dump a MicrobialCommunity that fails closed-schema LinkML validation and raises ValidationFailedError instead. The schema has a single root class, so no target_class routing is needed. Default YAML options match the repo's existing emission convention (default_flow_style=False, sort_keys=False, allow_unicode=True, width=120, indent=2), so existing files roundtrip byte-identically.
- src/communitymech/curate/curation_event.py: record_curation_event() is the standard helper for appending a CurationEvent to doc['curation_history']. It has a schema-aligned signature, emits whole-second timestamps with a Z suffix, and supports skip_if_recent for idempotent re-runs.

New scripts:
- scripts/validate_strict.py: strict closed-schema parallel walk of kb/communities/ (excluding backups/ and snapshots/). Emits reports/instance_validation_failures.tsv categorized by error class and exits non-zero on ERROR. Strictly stronger than the per-file linkml-validate loop in just validate-all, which runs in open mode and swallows exit codes.
- scripts/audit_writers.py: inventory of every YAML-writing module under scripts/ and src/communitymech/, flagging whether each one validates before writing and appends a curation_history event.

Writer conversions (5 of ~15):
- scripts/add_community_ids.py (action=ASSIGN_COMMUNITY_ID; also gained a --dry-run safeguard it previously lacked)
- scripts/apply_pmc_conversions.py (action=CONVERT_PMC_TO_PMID)
- scripts/fix_network_integrity.py (action=FIX_NETWORK_INTEGRITY)
- scripts/link_growth_media.py (action=LINK_GROWTH_MEDIA)
- src/communitymech/network/llm_repair.py (action=LLM_REPAIR_APPLIED, llm_assisted=True)

Each conversion wraps its write call in try/except ValidationFailedError so one bad record can't kill a batch run. Existing CLI surfaces are preserved.

Justfile:
- New validate-strict and audit-writers recipes.
- qc composite extended to include validate-strict.

Baseline:
- just validate-strict: 265 files, 0 ERROR rows (clean).
- just audit-writers: 15 writers; 5 now validate before writing and append curation_history. The other 10 are flagged in the TSV as future-work conversions (apply_strain_designations, apply_taxonomy_corrections, apply_suggested_fixes / suggested_snippets, backfill_metals, batch_snippet_fixer, clean_metals_inplace, curate_evidence_with_pdfs, enhance_strain_data, fix_invalid_snippets, fix_reference_formats, intelligent_snippet_fixer, etc.); converting them follows the same pattern as the 5 above.
- pytest tests/: 136 passed, 9 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address Copilot review on PR #84

5 findings, all real and addressed:
- scripts/apply_pmc_conversions.py + scripts/link_growth_media.py (both the process_single_community and process_all_communities paths): all three write paths rename the source file to a `.bak` backup before writing the validated result. Previously, if write_validated_community raised ValidationFailedError, the handler only logged and continued, leaving the original path missing on disk (only the .bak existed). The backup is now restored on validation failure before logging.
- scripts/audit_writers.py: replace the substring check for `wired_into_just` with a per-line check that ignores comments and requires a word-boundary match on the full filename. The previous check produced false positives when a justfile comment merely mentioned the filename; e.g. write_validated.py matched the justfile comment referencing write_validated_community(). This drops the wired-into-just count from 3 (with false positives) to 1 (genuine: link_growth_media).
- scripts/add_community_ids.py: guard against running on YAMLs that already have an id. The previous flow started from `{"id": community_id}` and then applied `.update(data)`, which silently retained the source file's existing id while the curation event still recorded "Assigned id=<new>", a misleading audit entry. Such files are now skipped with an explanatory log line.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
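The first review finding in the commit message above hinges on ordering: put the original file back before logging the validation failure. Below is a minimal sketch of that backup-and-restore pattern under explicitly hypothetical assumptions. `write_with_backup`, `toy_write_validated`, and the local `ValidationFailedError` are illustrative stand-ins, not the repo's code. The toy validator only checks for an `id` key, and it writes JSON rather than YAML purely so the sketch stays within the Python standard library.

```python
import json
from pathlib import Path
from typing import Callable


class ValidationFailedError(Exception):
    """Hypothetical stand-in for the repo's ValidationFailedError."""


def toy_write_validated(doc: dict, path: Path) -> None:
    """Hypothetical validator/writer: refuses to write docs without an 'id'.

    The real write_validated_community() runs closed-schema LinkML validation
    and dumps YAML; JSON is used here only to keep the sketch stdlib-only.
    """
    if "id" not in doc:
        raise ValidationFailedError("missing required slot: id")
    path.write_text(json.dumps(doc, indent=2), encoding="utf-8")


def write_with_backup(
    path: Path,
    doc: dict,
    write_validated: Callable[[dict, Path], None],
) -> bool:
    """Move `path` to `<name>.bak`, then write `doc` through `write_validated`.

    On ValidationFailedError the backup is moved back to `path` before
    logging, so the original file is never left missing on disk.
    Returns True if the new content was written, False otherwise.
    """
    backup = path.with_name(path.name + ".bak")
    path.replace(backup)  # the original now exists only as the .bak file
    try:
        write_validated(doc, path)
    except ValidationFailedError as exc:
        backup.replace(path)  # restore the original first, then log
        print(f"Skipped {path.name}: {exc}")
        return False
    return True
```

Using `Path.replace` rather than `Path.rename` lets the sketch overwrite a stale `.bak` left by an earlier run on both POSIX and Windows. Keeping the `.bak` after a successful write is an assumption about how the real scripts behave.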
Summary
This PR adds oral microbiome exemplars focused on dysbiosis and caries, plus a small schema extension to classify them cleanly.
Included communities:
Schema/datamodel changes:
- Added `ORAL` to `CommunityCategoryEnum`

Evidence support:
Validation
Passed:
- `just validate` for all 5 new community YAMLs
- `just validate-terms` for all 5 new community YAMLs
- `just validate-references` for all 5 new community YAMLs

Tests
`just test` is not clean on this branch, but the failures appear pre-existing and unrelated to this change set:
- `tests/test_llm_client.py`
- `anthropic` package / tests attempting to patch `anthropic.Anthropic`