Libgwords is a GObject-based library that lets application developers manipulate natural-language text easily. It is designed to supply consumers with data structures and algorithms for text iteration and manipulation, such as dictionaries, word and sentence segmenters, sentence manipulators, grammar trees, word conjugation and more.
Libgwords aims to provide the following functionality:
- Efficient associative array (ART)
- Language abstractions
- Word and sentence segmentation
- Sentence manipulation
- Grammar decomposition
- Dictionaries
- Operation pipelines
- Root extractor
- Word conjugation
- String iteration
- Synonyms and antonyms
This is a tentative roadmap for achieving the goals of libgwords. Since it is an auxiliary library that I'm building as part of (but not limited to) my master's program, the roadmap may change during development.
- Adaptive Radix Tree
  - GwRadixTree
- Dictionaries
  - GwDictionary (stub)
- Word segmentation
  - GwSegmenter
  - GwSegmenterFallback
  - GwSegmenterPtBr
- Refcounted strings
  - GwString
- Sentence manipulation
  - GwStringEditor
- Language abstractions
  - GwGrammarClass
  - GwWord
- Dictionaries
  - Word lookup
- Grammar abstraction
  - Grammar decomposition
- Language abstractions
  - GwMark
  - GwParagraph
  - GwSentence
  - GwWord
- Dictionaries
  - Word lookup