
v2.1.0


ti250 released this on 25 Feb at 17:21 (commit df508f6).

Implemented Enhancements:

  • An improved named-entity recognition (NER) system with substantially better performance on inorganic materials.
  • A new tokenizer designed to work with the new NER system.
  • The new InferredProperty lets users define explicit links between the different properties in their data models, removing a large amount of boilerplate parser code.
  • The new Every parse element lets users specify that a single token must satisfy multiple conditions.
  • A more flexible tagging system that supports taggers beyond part-of-speech and NER taggers.
  • Batch tagging.
  • A new, modern theme for the documentation, plus much more detailed documentation of certain parts of ChemDataExtractor, such as tagging and tokenization.
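To make the idea behind an "every condition must hold" parse element concrete, here is a minimal, self-contained sketch in plain Python. It does not use ChemDataExtractor: the `Element`, `regex` and `Every` classes below are hypothetical toy stand-ins that illustrate the concept of requiring one token to satisfy several conditions at once. They are not the library's actual API or implementation.

```python
import re
from typing import Callable, List


class Element:
    """Hypothetical toy parse element: matches a single token via a predicate."""

    def __init__(self, predicate: Callable[[str], bool]):
        self.predicate = predicate

    def matches(self, token: str) -> bool:
        return self.predicate(token)


def regex(pattern: str) -> Element:
    # Hypothetical helper: the whole token must match the regular expression
    compiled = re.compile(pattern)
    return Element(lambda tok: compiled.fullmatch(tok) is not None)


class Every(Element):
    """Toy version of the concept: a token matches only if ALL sub-elements match it."""

    def __init__(self, elements: List[Element]):
        self.elements = list(elements)
        super().__init__(lambda tok: all(e.matches(tok) for e in self.elements))


# Condition 1: the token looks like a simple chemical formula (e.g. NaCl, Fe2O3)
looks_like_formula = regex(r"(?:[A-Z][a-z]?\d*)+")
# Condition 2: the token is not a common English word that also looks like a formula
ambiguous_words = {"I", "In", "He", "As"}
not_common_word = Element(lambda tok: tok not in ambiguous_words)

formula_not_word = Every([looks_like_formula, not_common_word])

tokens = ["NaCl", "In", "the", "Fe2O3"]
matched = [t for t in tokens if formula_not_word.matches(t)]
print(matched)  # ['NaCl', 'Fe2O3']
```

Here "In" passes the formula check but fails the second condition, and "the" fails the first, so only tokens satisfying both conditions are kept.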

Breaking Changes:

  • Any custom taggers written for earlier versions will no longer work. Please refer to the migration guide for version 2.1.
  • The new tokenizer can break some user-written parse rules. This can be fixed either by making a few changes to the parse rules or by reverting to the previous NER system and tokenizer. Please refer to the migration guide for details.