Resolve canonical examples for the 30 resolvable remaining traits (corpus 312/312 microbial+cited) #138

Merged
realmarcin merged 1 commit into main from claude/canonical-examples-remaining on Jun 29, 2026
@realmarcin

Contributor

What

A deep-research pass over the traits that still lacked canonical examples, answering "what about the remaining traits?"

Triage of the 173 CLASS traits without examples

~143 genuinely cannot take a single-organism example and are intentionally left empty:

  • numerical/value bins: ph_optimum_mid1, temperature_range_*, *_delta_*, gc_*, genome_size, all *_phenotype_with_numerical_limits
  • the 20 observation/ measurement classes + quantitative_property/
  • abstract parents: pigmentation, gram_stain, cell_shape, metabolism, *_preference, habitat_association
  • biosafety_level_4/5 (no standard bacterial exemplar)

(The 94 OBJECT_PROPERTY + 7 DATATYPE_PROPERTY relational predicates — uses_as_carbon_source, has_observation, does_not_* — likewise have no organism exemplar.)
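
The triage above reads like a name-pattern filter, so it can be sketched as one. This is a minimal illustration only: treating the listed names as shell-style globs, reading `biosafety_level_4/5` as two separate trait names, and the helper names `can_take_exemplar` and `triage` are all assumptions, not the method actually used in this PR. The observation/measurement classes and relational predicates are not covered, since they are described by type rather than by name.

```python
from fnmatch import fnmatchcase

# Patterns copied from the triage list; interpreting them as shell-style
# globs over trait names is an assumption of this sketch.
NO_EXEMPLAR_PATTERNS = [
    # numerical/value bins
    "ph_optimum_mid1",
    "temperature_range_*",
    "*_delta_*",
    "gc_*",
    "genome_size",
    "*_phenotype_with_numerical_limits",
    # abstract parents
    "pigmentation",
    "gram_stain",
    "cell_shape",
    "metabolism",
    "*_preference",
    "habitat_association",
    # "biosafety_level_4/5" read as two names (assumption)
    "biosafety_level_4",
    "biosafety_level_5",
]


def can_take_exemplar(trait: str) -> bool:
    """Return False when the trait name matches any no-exemplar pattern."""
    return not any(fnmatchcase(trait, p) for p in NO_EXEMPLAR_PATTERNS)


def triage(traits):
    """Split trait names into (resolvable, intentionally left empty)."""
    resolvable = [t for t in traits if can_take_exemplar(t)]
    left_empty = [t for t in traits if not can_take_exemplar(t)]
    return resolvable, left_empty
```

Exact names such as `pigmentation` only exclude the abstract parent, so a concrete child trait with a different name still passes through as resolvable.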

The ~30 resolvable phenotype classes → 1–2 representative microbes each

  • fine-grained trophic types: chemo/litho/photo-litho/photo-organo/phototrophic, chemoheterotrophic, organotrophic, copiotrophic
  • distinctive shapes/pigments: curved, spindle, black-/green-pigmented
  • electron_transfer, lignin_degradation, substrate_level_phosphorylation
  • categorical tolerances: alkaphilic, obligately-acidophilic, facultatively-alkaphilic, piezotolerant, non-halophilic, euryhaline
  • genomics features: genomic island, prophage, transposable element, mobile genetic element, plasmid carriage, rRNA-operon copy number

Method & integrity

  • Deep-research confirmation workflow (59 trait–microbe pairs): each agent verified that the organism fits the trait and that the citation identifier resolves when fetched. 57 of the 59 pairs were cited. The 2 that could not be cited (Clostridium botulinum for spindle_shaped: content-filtered; Enterococcus faecium for mobile_genetic_element: none found) were dropped in favour of their cited co-representatives.
  • Microbe gate: every taxon_id verified under Bacteria/Archaea via the ncbitaxon.db entailed-edge closure.
  • Edison/falcon was also run as a cross-check (it produces causal-graph research reports rather than example organisms).
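
As an illustration of the microbe gate, the check can be expressed as a lookup in a precomputed closure table. The sketch below assumes ncbitaxon.db is a SQLite file with an `entailed_edge(subject, predicate, object)` table and an `rdfs:subClassOf` predicate, as in SemanticSQL-style builds; that schema, and the helpers `is_microbe` and `build_demo_db`, are assumptions rather than this repository's actual code. A tiny in-memory table stands in for the real database.

```python
import sqlite3

BACTERIA = "NCBITaxon:2"
ARCHAEA = "NCBITaxon:2157"


def is_microbe(conn: sqlite3.Connection, taxon_id: str) -> bool:
    """True if the closure table links taxon_id to Bacteria or Archaea."""
    row = conn.execute(
        "SELECT 1 FROM entailed_edge "
        "WHERE subject = ? AND predicate = 'rdfs:subClassOf' "
        "AND object IN (?, ?) LIMIT 1",
        (taxon_id, BACTERIA, ARCHAEA),
    ).fetchone()
    return row is not None


def build_demo_db() -> sqlite3.Connection:
    """Build a toy stand-in for ncbitaxon.db (assumed schema, demo rows only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entailed_edge (subject TEXT, predicate TEXT, object TEXT)"
    )
    conn.executemany(
        "INSERT INTO entailed_edge VALUES (?, ?, ?)",
        [
            # Escherichia coli, entailed under Bacteria
            ("NCBITaxon:562", "rdfs:subClassOf", BACTERIA),
            # Methanocaldococcus jannaschii, entailed under Archaea
            ("NCBITaxon:2190", "rdfs:subClassOf", ARCHAEA),
            # Homo sapiens, entailed under Eukaryota only
            ("NCBITaxon:9606", "rdfs:subClassOf", "NCBITaxon:2759"),
        ],
    )
    return conn
```

Because the closure table already stores every ancestor, the gate is a single indexed lookup per taxon rather than a walk up the taxonomy.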

Final corpus

traits with examples: 225 | entries: 312 | microbial: 312/312 | cited: 312/312

just validate-strict: 0 errors.
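
The corpus summary line in this section can be reproduced with a small tally. This is a sketch under stated assumptions: the record fields `trait`, `is_microbial`, and `citation`, and the function `summarize`, are hypothetical and do not reflect the repository's real data model.

```python
def summarize(entries):
    """Format corpus counts in the same shape as the summary line."""
    traits = {e["trait"] for e in entries}                    # distinct traits with examples
    n = len(entries)                                          # total example entries
    microbial = sum(1 for e in entries if e["is_microbial"])  # entries that passed the microbe gate
    cited = sum(1 for e in entries if e.get("citation"))      # entries with a non-empty citation
    return (
        f"traits with examples: {len(traits)} | entries: {n} | "
        f"microbial: {microbial}/{n} | cited: {cited}/{n}"
    )


# Placeholder records for illustration only; not real corpus data.
demo = [
    {"trait": "curved", "is_microbial": True, "citation": "placeholder-ref-1"},
    {"trait": "curved", "is_microbial": True, "citation": "placeholder-ref-2"},
    {"trait": "piezotolerant", "is_microbial": True, "citation": "placeholder-ref-3"},
]
print(summarize(demo))
```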

🤖 Generated with Claude Code

Deep-research pass over the traits that still lacked canonical examples. Of the
173 CLASS traits without examples, ~143 genuinely cannot take a single-organism
example (numerical bins like ph_optimum_mid1 / temperature_range_*, gc_* and
genome-size value bins, the 20 observation/ measurement classes,
quantitative_property/, abstract parents such as pigmentation / gram_stain /
cell_shape / metabolism / *_preference / habitat_association, and
biosafety_level_4/5) and are intentionally left empty; the 94 OBJECT_PROPERTY
and 7 DATATYPE_PROPERTY relational predicates likewise have no organism exemplar.

The ~30 genuinely-resolvable phenotype classes get 1-2 representative microbes
each (225 traits / 312 entries total now):
- fine-grained trophic types (chemo/litho/photo-litho/photo-organo/phototrophic,
  chemoheterotrophic, organotrophic, copiotrophic, ...)
- distinctive shapes/pigments (curved, spindle, black-pigmented, green-pigmented)
- electron_transfer, lignin_degradation, substrate_level_phosphorylation
- categorical tolerances (alkaphilic, obligately_acidophilic,
  facultatively_alkaphilic, piezotolerant, non_halophilic, euryhaline)
- genomics features (genomic_island, prophage, transposable_element,
  mobile_genetic_element, plasmid_carriage, rrna_operon_copy_number)

- Microbe gate: every taxon_id verified under Bacteria/Archaea via ncbitaxon.db
  entailed-edge closure. Full corpus now 312 entries, 312 microbial, 312 cited.
- Citations from a deep-research confirmation workflow (each agent verified the
  organism fits the trait and the identifier resolves). 2 uncited entries
  (Clostridium botulinum for spindle_shaped, Enterococcus faecium for
  mobile_genetic_element) were dropped in favour of their cited co-representatives.
- just validate-strict: 0 errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@realmarcin realmarcin merged commit db3e279 into main Jun 29, 2026
2 checks passed
@realmarcin realmarcin deleted the claude/canonical-examples-remaining branch June 29, 2026 20:53