lsdr/ptstemmer

PTStemmer - A stemming toolkit for Portuguese in Python

Features

  • Python implementations of Orengo, Porter, and Savoy stemmers
  • Fast: can stem more than 1.5M words/second on a normal desktop
  • Least Recently Used (LRU) stem cache
  • Support for lists of words to ignore (useful for stopword and named entity removal)
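The last two features can be pictured with a small sketch. It is not PTStemmer's actual API: `CachedStemmer`, `toy_stem`, and every parameter name below are hypothetical, and `toy_stem` is a placeholder suffix stripper, not an implementation of the Orengo, Porter, or Savoy algorithms. It only illustrates how an LRU stem cache and an ignore list might wrap a stemming function.

```python
from collections import OrderedDict


class CachedStemmer:
    """Hypothetical wrapper: LRU stem cache plus a list of words to ignore."""

    def __init__(self, stem_fn, cache_size=1024, ignore=()):
        self._stem_fn = stem_fn          # any callable mapping word -> stem
        self._cache_size = cache_size    # maximum number of cached stems
        self._ignore = {w.lower() for w in ignore}
        self._cache = OrderedDict()      # oldest entry first, newest last

    def stem(self, word):
        word = word.lower()
        # Ignored words (stopwords, named entities) are returned unstemmed.
        if word in self._ignore:
            return word
        # Cache hit: mark the entry as most recently used.
        if word in self._cache:
            self._cache.move_to_end(word)
            return self._cache[word]
        # Cache miss: compute, store, and evict the least recently used entry.
        result = self._stem_fn(word)
        self._cache[word] = result
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)
        return result


def toy_stem(word):
    """Placeholder stemmer for illustration only: strips a few suffixes."""
    for suffix in ("mente", "ções", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word


if __name__ == "__main__":
    stemmer = CachedStemmer(toy_stem, cache_size=2, ignore=["Lisboa"])
    for w in ["casas", "Lisboa", "rapidamente", "casas"]:
        print(w, "->", stemmer.stem(w))
```

Keeping the ignore check ahead of the cache lookup means ignored words never occupy cache slots, which matters when the cache is small relative to the vocabulary.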

About the original project

The project was originally developed by Pedro Oliveira.

This fork was automatically exported from the original Google Code repository, which lived at code.google.com/p/ptstemmer.

The original codebase also contained Java and C# implementations of the stemmers, which I removed because I had no interest in them. The original code is tagged as original-export and can be retrieved with a simple checkout:

$ git checkout original-export

Licensing

The original work, and therefore this fork, is licensed under the GNU Lesser General Public License, version 3.0 (LGPLv3).

A verbatim copy of the license can be found in the LICENSE file.
