standoff2inline -- Converting standoff annotations to inline annotations
Inline annotations are stored within the annotated text itself, as XML tags are:
The little <noun>cat</noun> drinks milk.
Standoff annotations are stored separately from the text and usually refer to it by character or token positions. For example, in the sentence:
The little cat drinks milk.
the third word, between the 12th and 14th characters, is a noun, so the standoff annotations may be something like this:
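As a minimal sketch, a standoff annotation can be held as a plain tuple kept apart from the text. The `(start, end, label)` layout below is an assumption made for illustration, not a format defined by this module. Positions are 0-based with an inclusive end, so the 12th to 14th characters become 11 to 13.

```python
text = "The little cat drinks milk."

# Hypothetical standoff record: (start, end, label), 0-based, end inclusive.
annotations = [(11, 13, "noun")]

start, end, label = annotations[0]
# The text itself is untouched; the annotation only points into it.
print(text[start:end + 1], label)
```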
This Python module offers classes and functions to:
- add inline annotations, like XML annotations, counting in characters or tokens,
- highlight some chunks of text, for example with styled
- remove parts without annotations and replace them with something like
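The last item in the list can be illustrated in plain Python, independently of the module. This is only a sketch of the idea: the function name `keep_annotated`, the `(start, end)` span layout (0-based, end inclusive) and the `"[...]"` placeholder are all assumptions, not the module's API.

```python
def keep_annotated(text, spans, placeholder="[...]"):
    """Keep annotated characters; collapse each unannotated run into a placeholder."""
    covered = [False] * len(text)
    for start, end in spans:
        for i in range(start, end + 1):  # end is inclusive
            covered[i] = True

    out = []
    in_gap = False
    for i, char in enumerate(text):
        if covered[i]:
            out.append(char)
            in_gap = False
        elif not in_gap:
            # Emit the placeholder once per run of unannotated characters.
            out.append(placeholder)
            in_gap = True
    return "".join(out)


text = "The little cat drinks milk."
print(keep_annotated(text, [(11, 13), (22, 25)]))
```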
Download the module and copy it into your current directory, or into a directory listed in your
PYTHONPATH variable, under the name
Create a new Python script:
from standoff2inline import Standoff2Inline

string = "The little cat drinks milk."
inliner = Standoff2Inline()
inliner.add((0, "<sent>"), (26, "</sent>"))
inliner.add((0, "<gn>"), (13, "</gn>"))
inliner.add((11, "<noun>"), (13, "</noun>"))
inliner.add((22, "<noun>"), (25, "</noun>"))
inliner.add((0, "<det>"), (2, "</det>"))
inliner.apply(string)
When you execute it, you will get:
<sent><gn><det>The</det> little <noun>cat</noun></gn> drinks <noun>milk</noun>.</sent>
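To see why the tags nest this way, here is a small standalone sketch of the conversion in plain Python. It is not the module's implementation: the `to_inline` function and its `(start, end, tag)` span layout are assumptions made for illustration. It only handles properly nested spans; crossing spans would produce ill-formed markup.

```python
def to_inline(text, spans):
    """Insert <tag>...</tag> around each (start, end, tag) span (0-based, end inclusive)."""
    opens, closes = {}, {}
    for index, (start, end, tag) in enumerate(spans):
        # Opening tags: the span that ends last is the outermost, so it comes first.
        opens.setdefault(start, []).append((-end, index, tag))
        # Closing tags go just after the last character of the span.
        closes.setdefault(end + 1, []).append((start, index, tag))

    out = []
    for pos in range(len(text) + 1):
        # Close inner spans first: the ones that started last.
        for _, _, tag in sorted(closes.get(pos, []), reverse=True):
            out.append(f"</{tag}>")
        for _, _, tag in sorted(opens.get(pos, [])):
            out.append(f"<{tag}>")
        out.append(text[pos:pos + 1])  # empty string once pos == len(text)
    return "".join(out)


text = "The little cat drinks milk."
spans = [
    (0, 26, "sent"),
    (0, 13, "gn"),
    (11, 13, "noun"),
    (22, 25, "noun"),
    (0, 2, "det"),
]
print(to_inline(text, spans))
```

With the same positions as in the script above, this sketch prints the same annotated sentence.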
Full documentation can be found in the doc/user_guide.html file.
A Jupyter notebook doc/user_guide.ipynb is available for you to play with.
Bruno Oberle. Please contact me at boberle.com.
See my other projects at boberle.com!
Copyright 2019 Bruno Oberle
This software is released under the terms of the Mozilla Public License 2.0. See the LICENSE file for details. This program comes with ABSOLUTELY NO WARRANTY.