
bobbytables/treat

 
 


Treat is a toolkit for natural language processing and computational linguistics in Ruby. It provides a common API for a number of gems and external libraries for document retrieval, parsing, annotation, and information extraction.

Current features

  • Text extractors for PDF, HTML, XML, Word, AbiWord, OpenOffice and image formats (Ocropus)
  • Text retrieval with indexing and full-text search (Ferret)
  • Text chunkers, sentence segmenters, tokenizers, and parsers for several languages (Stanford & Enju)
  • Word inflectors, including stemmers, conjugators, declensors, and number inflection
  • Lexical resources (WordNet interface, several POS taggers for English, Stanford taggers for several languages)
  • Extraction of language, date/time, topic words (LDA) and keywords (TF*IDF)
  • Simple text statistics (frequency, TF*IDF)
  • Serialization of annotated entities to YAML or XML format
  • Visualization in ASCII tree, directed graph (DOT) and tag-bracketed (standoff) formats
  • Linguistic resources, including full ISO-639-1 and ISO-639-2 support, and tag alignments for several treebanks
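
To give a concrete sense of the TF*IDF keyword extraction mentioned in the list above, here is a minimal sketch in plain Ruby. It does not use Treat's API; the method names (`tokenize`, `term_frequencies`, `inverse_document_frequencies`, `keywords`) and the sample documents are hypothetical, and the tokenizer is deliberately naive. It assumes the common formulation where a term's score is its relative frequency in a document multiplied by log(N / number of documents containing the term).

```ruby
# Plain-Ruby TF*IDF sketch. Illustrative only: none of these
# methods belong to Treat, and all names are hypothetical.

# Split text into lowercase word tokens (a deliberately naive tokenizer).
def tokenize(text)
  text.downcase.scan(/[a-z']+/)
end

# Term frequency: occurrences of a term divided by the document's token count.
def term_frequencies(tokens)
  counts = tokens.each_with_object(Hash.new(0)) { |t, h| h[t] += 1 }
  counts.transform_values { |c| c.to_f / tokens.size }
end

# Inverse document frequency: log(N / number of documents containing the term).
def inverse_document_frequencies(tokenized_docs)
  n = tokenized_docs.size.to_f
  doc_counts = Hash.new(0)
  tokenized_docs.each { |tokens| tokens.uniq.each { |t| doc_counts[t] += 1 } }
  doc_counts.transform_values { |df| Math.log(n / df) }
end

# For each document, return its top `limit` terms ranked by TF*IDF.
# Ties are broken alphabetically so the result is deterministic.
def keywords(docs, limit: 3)
  tokenized = docs.map { |d| tokenize(d) }
  idf = inverse_document_frequencies(tokenized)
  tokenized.map do |tokens|
    term_frequencies(tokens)
      .map { |term, tf| [term, tf * idf[term]] }
      .sort_by { |term, score| [-score, term] }
      .first(limit)
      .map(&:first)
  end
end

docs = [
  "the cat sat on the mat",
  "the dog chased the cat",
  "the bird sang"
]
p keywords(docs, limit: 2)
```

A word such as "the", which appears in every sample document, gets an inverse document frequency of zero and so never ranks as a keyword, which is the point of the weighting.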

Resources


License

This software is released under the GPL and includes software released under the GPL, Ruby, Apache 2.0 and MIT licenses.

About

Text Retrieval, Extraction and Annotation Toolkit.
