Multiple implementations of the Paice/Husk (Lancaster) stemming algorithm
Lingua-Stem-PaiceHusk-1.01
README.md
paicehusk_ansic.c
paicehusk_java.java
paicehusk_pascal.pas
paicehusk_perl.pl
paicehusk_rules.txt
run_paicehusk.bash
wordlist.txt


Paice/Husk (Lancaster) Stemmer

This page offers interchangeable implementations of the Paice/Husk stemmer, developed by Chris Paice and Gareth Husk. The official stemmer website (archived copy) has more information.

The implementations here produce different results from the "official" releases. Each release on the Lancaster site produces slightly different stems, so it is not possible to mix, say, the C and Java stemmers in the same project and get consistent output. This project provides source code in C, Java, Perl, and Pascal; all four implementations produce identical stemming results, allowing programs written in different languages to work together. The source code is commented and laid out to mostly match this flowchart of the algorithm:

[Figure: flowchart of the Paice/Husk stemming algorithm]

Perl users may prefer the Perl module Lingua::Stem::PaiceHusk, which is suitable for incorporation into other programs.
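
The claim that all implementations produce identical stems can be spot-checked by comparing the output of any two of them over the same word list. The following is a minimal sketch in Python, not part of this project: the function name `find_stem_mismatches` is hypothetical, and it assumes you have already collected each stemmer's output as a list of stems in the same order as the input words. How each program is invoked and how its output is formatted is not covered here.

```python
from typing import Iterable, List, Tuple


def find_stem_mismatches(
    words: Iterable[str],
    stems_a: Iterable[str],
    stems_b: Iterable[str],
) -> List[Tuple[str, str, str]]:
    """Return (word, stem_a, stem_b) for every word where two stemmers disagree.

    Hypothetical helper: assumes both stem lists are aligned with `words`,
    one stem per input word.
    """
    words, stems_a, stems_b = list(words), list(stems_a), list(stems_b)
    if not (len(words) == len(stems_a) == len(stems_b)):
        raise ValueError("word list and both stem lists must have the same length")
    # Keep only the positions where the two implementations differ.
    return [(w, a, b) for w, a, b in zip(words, stems_a, stems_b) if a != b]


if __name__ == "__main__":
    # Placeholder data for illustration only; these are NOT real stemmer outputs.
    words = ["word1", "word2", "word3"]
    from_impl_a = ["s1", "s2", "s3"]
    from_impl_b = ["s1", "sX", "s3"]
    for word, a, b in find_stem_mismatches(words, from_impl_a, from_impl_b):
        print(f"{word}: {a!r} != {b!r}")
```

An empty result means the two implementations agree on every word in the list; any tuples returned identify the words worth investigating.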