Zheng Tang edited this page Aug 21, 2021 · 3 revisions

Extraction flow

The main entry point to the EidosSystem for extraction is the extractFromDoc method (a comparable method for extracting from text annotates the text and then calls this one). In this method, you will see the creation of the Refiners that are used in the extraction pipeline. If you follow the call path into the overloaded method, you'll see that the flow is as follows:

  1. The DocumentRefiners filter or adjust the document prior to extraction (e.g., filtering out long sentences or those that are likely mis-parsed tables from a PDF).
  2. Odin is run over the document by applying each enabled Finder in succession.
  3. The Odin Mentions are post-processed with any enabled OdinRefiners (e.g., handling of hedging, negation, etc.).
  4. An AnnotatedDocument is created from the resulting Odin Mentions. During this process, each Odin Mention is mapped to an EidosMention, which is a wrapper class that holds additional metadata.
  5. The EidosMentions are post-processed with any enabled EidosRefiners (e.g., ontology grounding, grounding of gradable adjectives/quantifiers).
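
The five steps above can be sketched as a small, self-contained toy pipeline. This is not the Eidos code (Eidos is written in Scala) and not its API: every class, function, and parameter name below is a hypothetical stand-in chosen to mirror the terms used on this page, and the way Finders receive earlier mentions is an assumption made for illustration. It is only meant to show the order in which the stages run and what each stage consumes and produces.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# NOTE: all names here are hypothetical stand-ins, not the real Eidos classes.


@dataclass
class Document:
    sentences: List[str]


@dataclass
class OdinMention:
    text: str
    label: str
    attachments: List[str] = field(default_factory=list)


@dataclass
class EidosMention:
    # Wrapper around an Odin mention that carries extra metadata.
    odin_mention: OdinMention
    groundings: List[str] = field(default_factory=list)


@dataclass
class AnnotatedDocument:
    document: Document
    eidos_mentions: List[EidosMention]


DocumentRefiner = Callable[[Document], Document]
# Assumption: a finder sees the document plus the mentions found so far.
Finder = Callable[[Document, List[OdinMention]], List[OdinMention]]
OdinRefiner = Callable[[List[OdinMention]], List[OdinMention]]
EidosRefiner = Callable[[AnnotatedDocument], AnnotatedDocument]


def extract_from_doc(doc, document_refiners, finders, odin_refiners, eidos_refiners):
    # 1. Filter or adjust the document before extraction.
    for refine in document_refiners:
        doc = refine(doc)
    # 2. Apply each enabled finder in succession, accumulating mentions.
    mentions: List[OdinMention] = []
    for find in finders:
        mentions = mentions + find(doc, mentions)
    # 3. Post-process the Odin mentions.
    for refine in odin_refiners:
        mentions = refine(mentions)
    # 4. Wrap each Odin mention in an EidosMention inside an AnnotatedDocument.
    annotated = AnnotatedDocument(doc, [EidosMention(m) for m in mentions])
    # 5. Post-process the EidosMentions (e.g., grounding).
    for refine in eidos_refiners:
        annotated = refine(annotated)
    return annotated


def extract_from_text(text, **pipeline):
    # Stand-in for "annotate the text, then call extract_from_doc".
    doc = Document([s.strip() for s in text.split(".") if s.strip()])
    return extract_from_doc(doc, **pipeline)


# Toy stages, one per refiner/finder kind.
def drop_long_sentences(doc, max_words=5):
    return Document([s for s in doc.sentences if len(s.split()) <= max_words])


def keyword_finder(doc, previous):
    return [OdinMention(s, "Concept") for s in doc.sentences if "rain" in s.lower()]


def mark_negation(mentions):
    for m in mentions:
        if " not " in f" {m.text} ":
            m.attachments.append("Negation")
    return mentions


def toy_grounder(annotated):
    for em in annotated.eidos_mentions:
        em.groundings.append("toy/ontology/node")
    return annotated
```

A call such as `extract_from_text(text, document_refiners=[drop_long_sentences], finders=[keyword_finder], odin_refiners=[mark_negation], eidos_refiners=[toy_grounder])` runs the stages in the order listed above. Keeping each stage as a list of interchangeable callables is one simple way to model the "enabled" refiners and finders the page describes, since disabling a stage then just means leaving it out of the list.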