Linear-progressive text discovery engine that exposes its functionality through simple service APIs. Break text into a stream of token and non-token slices. Tokens can be annotated with search term matches. Using adapters for popular DOM libraries (HtmlAgilityPack, AngleSharp), you can highlight HTML, break HTML at a word count, and more.

TextDiscovery

A linear-progressive text discovery engine in C# that exposes its functionality through simple service APIs. Break plain text into a sequence of slices that can be reconstituted as annotated text. Generate metadata-rich tokens from a search expression, then use them to annotate matches in the source text; noise-word detection, tokenization, and matching options are all configurable. Through a common adapter interface with interchangeable DOM libraries (HtmlAgilityPack, AngleSharp, etc.), you can mark search hits in the DOM, create HTML excerpts at a given word count with configurable element-breaking rules, and extract text content while selectively preserving formatting indicators. The engine is highly extensible through dependency injection. Regex can be used in advanced configurations but is not required.
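
To make the slice idea concrete, here is a minimal sketch of how plain text might be split into token and non-token slices, annotated against search terms, and reconstituted as marked-up text. It is not TextDiscovery's actual API: the `Slice` record, the `SliceSketch` class, and its `Split`, `Annotate`, and `Render` methods are hypothetical names invented for this illustration, and treating letters and digits as token characters is an assumed tokenization rule.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text;

// Hypothetical types for illustration only; not part of TextDiscovery.
public sealed record Slice(string Text, bool IsToken, bool IsMatch = false);

public static class SliceSketch
{
    // Walk the text once, left to right, emitting alternating
    // token / non-token slices. Assumption: a token is a run of
    // letters or digits; everything else is a non-token run.
    public static List<Slice> Split(string text)
    {
        var slices = new List<Slice>();
        int start = 0;
        while (start < text.Length)
        {
            bool isToken = char.IsLetterOrDigit(text[start]);
            int end = start + 1;
            while (end < text.Length && char.IsLetterOrDigit(text[end]) == isToken)
                end++;
            slices.Add(new Slice(text.Substring(start, end - start), isToken));
            start = end;
        }
        return slices;
    }

    // Flag token slices that equal a search term (case-insensitive),
    // ignoring any search term that is listed as a noise word.
    public static List<Slice> Annotate(
        IEnumerable<Slice> slices,
        IEnumerable<string> terms,
        IEnumerable<string> noiseWords)
    {
        var noise = new HashSet<string>(noiseWords, StringComparer.OrdinalIgnoreCase);
        var wanted = new HashSet<string>(
            terms.Where(t => !noise.Contains(t)),
            StringComparer.OrdinalIgnoreCase);
        return slices
            .Select(s => s.IsToken && wanted.Contains(s.Text) ? s with { IsMatch = true } : s)
            .ToList();
    }

    // Reconstitute the slices as HTML, wrapping matched tokens in <mark>.
    public static string Render(IEnumerable<Slice> slices)
    {
        var sb = new StringBuilder();
        foreach (var s in slices)
        {
            string encoded = WebUtility.HtmlEncode(s.Text);
            sb.Append(s.IsMatch ? $"<mark>{encoded}</mark>" : encoded);
        }
        return sb.ToString();
    }
}
```

Under these assumptions, `SliceSketch.Render(SliceSketch.Annotate(SliceSketch.Split("The quick fox."), new[] { "the", "fox" }, new[] { "the" }))` yields `The quick <mark>fox</mark>.`, because "the" is dropped as a noise word before matching. Concatenating the slices without annotation reproduces the original text exactly, which is what lets a single pass serve both highlighting and excerpting.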