Snowball stemmers for Deno. These stemmers are based on the compiled JavaScript stemmers from the Snowball project, version 2.2.0.
Each stemmer provides a `stem` method that returns the stem of the given word. It assumes that the input is lowercase.
```ts
import { assertStrictEquals } from "https://deno.land/std@0.126.0/testing/asserts.ts";
import { EnglishStemmer } from "https://deno.land/x/snowball/english_stemmer.ts";

const englishStemmer = new EnglishStemmer();
const stem = englishStemmer.stem("enthusiastically");
assertStrictEquals(stem, "enthusiast");
```
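Because `stem` assumes lowercase input, text that may contain capitals should be normalized first. The sketch below is a minimal illustration, not part of the module: `stemNormalized` and the `Stemmer` interface are hypothetical names, and the only assumption about the library is the `stem(word)` method used above.

```ts
// Minimal shape shared by the stemmers shown above: a `stem(word)` method.
interface Stemmer {
  stem(word: string): string;
}

// Hypothetical helper (not exported by the snowball module): lowercases the
// word before stemming, because the stemmers assume lowercase input.
// The optional locale applies language-specific case rules; for example,
// with "tr" the Turkish capital "I" lowercases to dotless "ı".
function stemNormalized(
  stemmer: Stemmer,
  word: string,
  locale?: string,
): string {
  return stemmer.stem(word.toLocaleLowerCase(locale));
}
```

With the `englishStemmer` above, `stemNormalized(englishStemmer, "Enthusiastically")` passes `"enthusiastically"` to the stemmer and so should return `"enthusiast"`.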
Here is an example with multiple words.
```ts
import { assertStrictEquals } from "https://deno.land/std@0.126.0/testing/asserts.ts";
import { EnglishStemmer } from "https://deno.land/x/snowball/english_stemmer.ts";

const sentence = "the quick brown fox jumped over the lazy dog";
const englishStemmer = new EnglishStemmer();
const stemmedSentence = sentence
  .match(/\b\w\w+\b/gu)! // matches runs of two or more ASCII word characters
  .map((token) => englishStemmer.stem(token))
  .join(" ");
assertStrictEquals(
  stemmedSentence,
  "the quick brown fox jump over the lazi dog",
);
```
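Stemming several words is mostly useful because different inflections of a word collapse onto one stem. As a minimal sketch, assuming only the `stem(word)` method shown above, the hypothetical `groupByStem` helper below collects words under their shared stem, which is one way to treat inflected forms as equal when matching text.

```ts
// Minimal shape shared by the stemmers: a `stem(word)` method.
interface Stemmer {
  stem(word: string): string;
}

// Hypothetical helper: maps each stem to the words that reduce to it,
// preserving the order in which the words appear.
function groupByStem(stemmer: Stemmer, words: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const word of words) {
    const key = stemmer.stem(word);
    const group = groups.get(key);
    if (group) {
      group.push(word);
    } else {
      groups.set(key, [word]);
    }
  }
  return groups;
}
```

With `new EnglishStemmer()`, the words "jump", "jumped", "jumping" and "jumps" should all be grouped under "jump".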
Many languages are supported. For example, Russian:
```ts
import { assertStrictEquals } from "https://deno.land/std@0.126.0/testing/asserts.ts";
import { RussianStemmer } from "https://deno.land/x/snowball/russian_stemmer.ts";

// "Be sure to drink your Ovaltine"
const sentence = "обязательно выпейте свой овалтин";
const russianStemmer = new RussianStemmer();
const stemmedSentence = sentence
  .split(/\s+/u)
  .map((token) => russianStemmer.stem(token))
  .join(" ");
assertStrictEquals(
  stemmedSentence,
  "обязательн вып сво овалтин",
);
```
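The English example's `\w` pattern would not work for Russian: `\w` only matches ASCII letters, digits and underscore, even with the `u` flag. Splitting on whitespace, on the other hand, keeps punctuation attached, so "овалтин!" would reach the stemmer with its exclamation mark. A minimal sketch of a Unicode-aware tokenizer follows; `tokenize` is a hypothetical helper, and including `\p{M}` is a design choice that keeps combining marks (such as Devanagari vowel signs) inside words, which matters for scripts like Hindi, Nepali and Tamil.

```ts
// Letters from any script, plus combining marks so that words written with
// vowel signs or viramas (e.g. Devanagari) are not split apart.
const WORD = /[\p{L}\p{M}]+/gu;

// Hypothetical helper: splits text into words, dropping whitespace,
// punctuation and digits. Returns an empty array when nothing matches,
// since String.prototype.match returns null in that case.
function tokenize(text: string): string[] {
  return text.match(WORD) ?? [];
}
```

Tokens still need lowercasing before stemming, for example `tokenize(text).map((t) => russianStemmer.stem(t.toLowerCase()))`.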
There is also an object, `LanguageStemmers`, defined in `mod.ts`, that contains all available languages and their stemmers.
```ts
import { assertStrictEquals } from "https://deno.land/std@0.126.0/testing/asserts.ts";
import { LanguageStemmers } from "https://deno.land/x/snowball/mod.ts";

const spanishStemmer = new LanguageStemmers["Spanish"]();
assertStrictEquals(
  spanishStemmer.stem("gracias"),
  "graci",
);
```
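When the language name comes from user input or configuration, indexing the object directly fails with a generic `TypeError` for unknown names, and inherited keys such as `"constructor"` would not fail at all. The sketch below assumes each entry of `LanguageStemmers` is a stemmer class with a no-argument constructor, as the `Spanish` entry above is; `createStemmer` and `StemmerRegistry` are hypothetical names.

```ts
// Minimal shape shared by the stemmers: a `stem(word)` method.
interface Stemmer {
  stem(word: string): string;
}

// Assumed registry shape: language name -> stemmer class.
type StemmerRegistry = Record<string, new () => Stemmer>;

// Hypothetical helper: checks own keys only, so prototype properties such as
// "toString" or "constructor" are rejected, and the error lists valid names.
function createStemmer(registry: StemmerRegistry, language: string): Stemmer {
  if (!Object.prototype.hasOwnProperty.call(registry, language)) {
    const available = Object.keys(registry).join(", ");
    throw new Error(`No stemmer for "${language}". Available: ${available}`);
  }
  const StemmerClass = registry[language];
  return new StemmerClass();
}
```

If `LanguageStemmers` matches this shape, `createStemmer(LanguageStemmers, "Spanish").stem("gracias")` should give the same result as the example above.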
Unless otherwise listed below, each language has exactly one stemmer, named `LanguageStemmer` and exported from both `language_stemmer.ts` and `mod.ts`, where `Language` is replaced by the desired language with the capitalization shown in the list (for example, `RussianStemmer` from `russian_stemmer.ts`).
- Arabic
- Armenian
- Basque
- Catalan
- Danish
- Dutch
  - DutchStemmer
  - KraaijPohlmannStemmer
- English
  - EnglishStemmer - the Porter2 (Snowball) English algorithm
  - PorterStemmer - the original Porter stemmer
  - LovinsStemmer - the first published stemming algorithm
- Finnish
- French
- German
  - GermanStemmer
  - German2Stemmer
- Greek
- Hindi
- Hungarian
- Indonesian
- Irish
- Italian
- Lithuanian
- Nepali
- Norwegian
- Portuguese
- Romanian
- Russian
- Serbian
- Spanish
- Swedish
- Tamil
- Turkish
- Yiddish
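The single-stemmer naming convention described above can be written down as a small function. This is a hypothetical sketch, assuming the convention holds exactly for every language without sub-items in the list; it may not apply to Dutch, English or German, which have several stemmers.

```ts
// Base URL used by the examples in this README.
const SNOWBALL_BASE_URL = "https://deno.land/x/snowball/";

interface StemmerLocation {
  url: string; // module that exports the stemmer
  exportName: string; // name of the exported stemmer class
}

// Hypothetical helper: `Language` -> `language_stemmer.ts` / `LanguageStemmer`.
function stemmerLocation(language: string): StemmerLocation {
  return {
    url: `${SNOWBALL_BASE_URL}${language.toLowerCase()}_stemmer.ts`,
    exportName: `${language}Stemmer`,
  };
}
```

The result could be loaded on demand, for example `const mod = await import(loc.url); const stemmer = new mod[loc.exportName]();`, which imports only the requested language module rather than everything exported from `mod.ts`.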