Becky Sharp edited this page Aug 13, 2021 · 6 revisions

Grounders

In Eidos, all grounding is done against an ontology. Internally, each ontology is represented as a DomainOntology, which retains the original YAML ontology content, such as the String examples. This format isn't particularly computationally efficient, so instead of interacting with it directly, we use an OntologyGrounder (in practice, all grounders extend the EidosOntologyGrounder subclass). The EidosOntologyGrounder is initialized with a DomainOntology and derives more efficient representations of the ontology node examples and patterns:

  • conceptEmbeddings: these are the internal representations of the ontology nodes in terms of their examples. Each node has a ConceptEmbedding, which essentially has a name and a vector representation that is more or less the average of the example/definition terms. See class definition for more detail.
  • conceptPatterns: similarly, each ontology node is represented here in terms of its (optional) regexes. Each ConceptPattern essentially has a name and an optional array of Regex.
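
To make the two derived representations concrete, here is a minimal Python sketch. It is not the Eidos Scala code: the field names, the `embed_node` helper, and the choice to skip example terms that have no word vector are all assumptions made for illustration. It only shows the idea of a node being reduced to a name plus an averaged vector, and a name plus optional regexes.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

Vector = List[float]


@dataclass
class ConceptEmbedding:
    # Ontology node name plus one vector summarizing its examples.
    name: str
    embedding: Vector


@dataclass
class ConceptPattern:
    # Ontology node name plus its optional regexes (None if the node has none).
    name: str
    patterns: Optional[List[re.Pattern]] = None


def average(vectors: List[Vector]) -> Vector:
    """Component-wise mean of equal-length vectors."""
    if not vectors:
        raise ValueError("cannot average an empty list of vectors")
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def embed_node(name: str, examples: List[str],
               word_vectors: Dict[str, Vector]) -> ConceptEmbedding:
    """Hypothetical helper: average the vectors of the example terms.

    Terms without a vector are skipped (an assumption of this sketch).
    """
    vectors = [word_vectors[term] for term in examples if term in word_vectors]
    return ConceptEmbedding(name, average(vectors))


# Toy two-dimensional vectors, invented for this example.
toy_vectors = {"rain": [1.0, 0.0], "flood": [0.0, 1.0]}
water = embed_node("water", ["rain", "flood", "unknown"], toy_vectors)
```

The source describes the real vector as "more or less" an average, so the actual class may weight or filter terms differently from this plain mean.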

There are essentially two different grounders in active use: the flat grounder and the compositional grounder.

Flat grounding

Flat grounding is much simpler than compositional grounding, but it pushes the complexity to the ontology. That is, each more complex concept must have its own node in the ontology, and a given mention is aligned to one node at a time. (In practice we return the top k, as there's always some uncertainty involved in automated grounding.)

The WM flat ontology has two main "branches" -- the interventions and the main ontology. Since the interventions were so numerous and at such a different level of granularity, the grounding algorithm first considers the main branch of the ontology, and only looks at the intervention branch if certain conditions are met.

The grounding algorithm works roughly as follows:

  1. First, the grounder tries to match a main branch ontology node regex. If one matches, it is considered a perfect match and the node is selected.
  2. If there are no main branch regex matches, then the word embedding representation of the mention (the average of the embeddings for the tokens in the canonical name) is compared to all main branch nodes (their embedding representation is similarly the average of all non-stop words in the examples and definition).
  3. Then, the grounder checks whether the mention should also be grounded against the intervention branch. This is currently done through pattern matching, using a predefined set of patterns plus any included in that branch of the ontology.
  4. If the mention is allowed to match against the intervention branch, that branch is likewise checked first for node regex matches and then for embedding matches.
  5. If the algorithm has not yet returned a grounding, then finally all groundings are combined, sorted, and the top k are returned.
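
The steps above can be sketched in Python as follows. This is an illustrative outline and not the Eidos implementation: the function names, the use of cosine similarity for "compared", the score of 1.0 for a regex match, and the early return on an intervention-branch regex match are all assumptions of this sketch.

```python
import math
import re
from typing import Dict, List, Optional, Tuple

Vector = List[float]
Grounding = Tuple[str, float]  # (node name, score)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def regex_match(text: str, patterns: Dict[str, List[re.Pattern]]) -> Optional[str]:
    """Return the first node whose regexes match the text, if any."""
    for name, regexes in patterns.items():
        if any(r.search(text) for r in regexes):
            return name
    return None


def embedding_matches(vec: Vector, nodes: Dict[str, Vector]) -> List[Grounding]:
    """Score the mention vector against every node vector."""
    return [(name, cosine(vec, node_vec)) for name, node_vec in nodes.items()]


def ground_flat(text: str, vec: Vector,
                main_patterns: Dict[str, List[re.Pattern]],
                main_embeddings: Dict[str, Vector],
                intervention_triggers: List[re.Pattern],
                intervention_patterns: Dict[str, List[re.Pattern]],
                intervention_embeddings: Dict[str, Vector],
                k: int = 5) -> List[Grounding]:
    # Step 1: a main-branch regex match is treated as a perfect match.
    hit = regex_match(text, main_patterns)
    if hit is not None:
        return [(hit, 1.0)]
    # Step 2: otherwise compare embeddings against all main-branch nodes.
    candidates = embedding_matches(vec, main_embeddings)
    # Step 3: decide by pattern matching whether to try the intervention branch.
    if any(t.search(text) for t in intervention_triggers):
        # Step 4: regex matches first, then embedding matches.
        hit = regex_match(text, intervention_patterns)
        if hit is not None:
            return [(hit, 1.0)]  # assumed early return, mirroring step 1
        candidates += embedding_matches(vec, intervention_embeddings)
    # Step 5: combine, sort by score (highest first), and keep the top k.
    candidates.sort(key=lambda grounding: grounding[1], reverse=True)
    return candidates[:k]
```

In this sketch the mention vector `vec` is expected to be precomputed, for example as the average of the token embeddings of the canonical name, as described in step 2.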

Compositional grounding

Compositional grounding is a much more complicated algorithm, as the complexity burden has shifted from the ontology to the grounding process. To understand the approach, you should first understand the compositional grounding representation. Each compositional grounding is a 4-tuple, where the slots represent (in order):

  1. the theme of the concept
  2. any property of that theme
  3. the process that applies to or acts on the theme, if any
  4. any property of that process

The case class that stores this representation is a PredicateTuple. This is the primary target of the compositional grounder: given an EidosMention, return one or more PredicateTuples (which are wrapped into a PredicateGrounding for consistency; note that you can simplify this if desired).
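
As a rough illustration of the 4-tuple, here is a Python sketch of the representation. The real PredicateTuple is a Scala case class whose fields hold groundings and not plain strings; the field names, the `slots` helper, the `score` field on the wrapper, and the example node names below are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PredicateTuple:
    # The four slots, in the order listed above. Only the theme is
    # required in this sketch; the other slots may be empty.
    theme: str
    theme_property: Optional[str] = None
    process: Optional[str] = None
    process_property: Optional[str] = None

    def slots(self) -> Tuple[Optional[str], ...]:
        """Return the slots in their canonical order."""
        return (self.theme, self.theme_property,
                self.process, self.process_property)


@dataclass(frozen=True)
class PredicateGrounding:
    # Hypothetical wrapper: one tuple plus an overall score (assumed field).
    predicate_tuple: PredicateTuple
    score: float


# Invented example: a mention like "cost of transporting food" could fill
# the theme, process, and process-property slots, leaving one slot empty.
example = PredicateTuple(theme="food", process="transportation",
                         process_property="price")
wrapped = PredicateGrounding(example, score=0.8)
```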