STAM is a data model for stand-off text annotation and is described in detail here. This is a Python library (more specifically, a Python binding written in Rust) for working with the model.
What can you do with this library?
- Keep, build and manipulate an efficient in-memory store of texts and annotations on texts
- Search in annotations, data and text, either programmatically or via the STAM Query Language.
- Search annotations by data, textual content, and relations between text fragments (overlap, embedding, adjacency, etc.).
- Search in text (incl. via regular expressions) and find annotations targeting found text selections.
- Search in data (set, key, value) and find annotations that use the data.
- Elementary text operations with regard for text offsets (splitting text on a delimiter, stripping text).
- Convert between different kinds of offsets (absolute, relative to other structures, UTF-8 bytes vs. Unicode codepoints, etc.).
- Read and write resources and annotations from/to STAM JSON, STAM CSV, or an optimised binary (CBOR) representation.
- The underlying STAM model aims to be clear and simple. It is flexible and does not commit to any vocabulary or annotation paradigm other than stand-off annotation.
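
To illustrate what converting between offset kinds involves, here is a minimal sketch in plain Python. It does not use this library's API: the helper names ``codepoint_to_byte_offset``, ``byte_to_codepoint_offset`` and ``to_absolute`` are hypothetical and exist only for this example, and the sample text is made up.

```python
def codepoint_to_byte_offset(text: str, cp_offset: int) -> int:
    """Convert a Unicode codepoint offset into a UTF-8 byte offset."""
    if not 0 <= cp_offset <= len(text):
        raise IndexError(f"codepoint offset {cp_offset} out of range")
    # Everything before the offset is encoded; its byte length is the answer.
    return len(text[:cp_offset].encode("utf-8"))


def byte_to_codepoint_offset(text: str, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset into a Unicode codepoint offset.

    Raises UnicodeDecodeError if the byte offset falls inside a
    multi-byte character.
    """
    data = text.encode("utf-8")
    if not 0 <= byte_offset <= len(data):
        raise IndexError(f"byte offset {byte_offset} out of range")
    return len(data[:byte_offset].decode("utf-8"))


def to_absolute(parent_begin: int, rel_begin: int, rel_end: int) -> tuple[int, int]:
    """Turn offsets relative to an enclosing selection into absolute ones."""
    return (parent_begin + rel_begin, parent_begin + rel_end)


# Hypothetical sample text; "\u00e9" is a precomposed e-acute (2 bytes in UTF-8).
text = "Caf\u00e9 ol\u00e9"

# The second word starts at codepoint 5; relative to itself it spans (0, 3).
begin, end = to_absolute(5, 0, 3)
byte_begin = codepoint_to_byte_offset(text, begin)
byte_end = codepoint_to_byte_offset(text, end)

print(text[begin:end])
print(begin, end, byte_begin, byte_end)
```

Because the annotation only stores offsets and never modifies the text, the same fragment can be addressed in either unit as long as the conversion is applied consistently.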
This STAM library is intended as a foundation upon which further applications can be built that deal with stand-off annotations on text. We implement all the low-level logic for dealing with this, so you no longer have to and can focus on your actual application.
This library offers a higher-level interface than the underlying Rust library. We aim to implement the full model and most extensions.
A tutorial for working with this API is available in the form of an interactive Jupyter Notebook: STAM Tutorial: Standoff Text Annotation for Pythonistas.

.. toctree::
   :maxdepth: 5