Commit

Update snowball (#1708)
* Update snowball

* use rm allocator
ashtul committed Dec 22, 2020
1 parent dae8d48 commit b3e86c2
Showing 109 changed files with 17,785 additions and 6,587 deletions.
14 changes: 6 additions & 8 deletions docs/Commands.md
@@ -62,10 +62,9 @@ FT.CREATE idx ON HASH PREFIX 1 doc: SCHEMA name TEXT SORTABLE age NUMERIC SORTAB
 If an unsupported language is sent, the command returns an error.
 The supported languages are:

-> "arabic", "danish", "dutch", "english", "finnish", "french",
-> "german", "hungarian", "italian", "norwegian", "portuguese", "romanian",
-> "russian", "spanish", "swedish", "tamil", "turkish"
-> "chinese"
+Arabic, Basque, Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian,
+Indonesian, Irish, Italian, Lithuanian, Nepali, Norwegian, Portuguese, Romanian, Russian,
+Spanish, Swedish, Tamil, Turkish, Chinese

 When adding Chinese-language documents, `LANGUAGE chinese` should be set in
 order for the indexer to properly tokenize the terms. If the default language
@@ -1329,10 +1328,9 @@ FT.ADD idx doc1 1.0 FIELDS title "hello world"
 If an unsupported language is sent, the command returns an error.
 The supported languages are:
-> "arabic", "danish", "dutch", "english", "finnish", "french",
-> "german", "hungarian", "italian", "norwegian", "portuguese", "romanian",
-> "russian", "spanish", "swedish", "tamil", "turkish"
-> "chinese"
+Arabic, Basque, Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian,
+Indonesian, Irish, Italian, Lithuanian, Nepali, Norwegian, Portuguese, Romanian, Russian,
+Spanish, Swedish, Tamil, Turkish, Chinese
 If indexing a Chinese language document, you must set the language to `chinese`
 in order for Chinese characters to be tokenized properly.
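
Reviewer note: since the docs state that an unsupported language makes the command return an error, a caller may want to check a name before sending it. The following is a minimal C sketch under stated assumptions, not code from this commit. The table `kSupportedLanguages` and the function `is_supported_language` are hypothetical names, the entries are simply the documented list lowercased, and the comparison is exact (case-sensitive) by choice of this sketch. The server's own list remains authoritative.

```c
#include <stddef.h>
#include <string.h>

/* Language names copied from the updated docs/Commands.md list, lowercased.
 * Hypothetical client-side table; it is not part of the commit. */
static const char *const kSupportedLanguages[] = {
    "arabic",     "basque",   "catalan",    "danish",   "dutch",
    "english",    "finnish",  "french",     "german",   "greek",
    "hungarian",  "indonesian", "irish",    "italian",  "lithuanian",
    "nepali",     "norwegian", "portuguese", "romanian", "russian",
    "spanish",    "swedish",  "tamil",      "turkish",  "chinese",
};

/* Returns 1 if name exactly matches an entry in the table, 0 otherwise.
 * A NULL name is treated as unsupported. */
int is_supported_language(const char *name) {
    size_t count = sizeof(kSupportedLanguages) / sizeof(kSupportedLanguages[0]);
    if (name == NULL) {
        return 0;
    }
    for (size_t i = 0; i < count; i++) {
        if (strcmp(name, kSupportedLanguages[i]) == 0) {
            return 1;
        }
    }
    return 0;
}
```

Note that `src_c/stem_UTF_8_hindi.c` appears in the new MANIFEST while Hindi is absent from the documented list, so this sketch reports `hindi` as unsupported. Whether the server accepts it is not stated in this diff.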
100 changes: 100 additions & 0 deletions src/dep/snowball/MANIFEST
@@ -0,0 +1,100 @@
README
src_c/stem_ISO_8859_1_basque.c
src_c/stem_ISO_8859_1_basque.h
src_c/stem_ISO_8859_1_catalan.c
src_c/stem_ISO_8859_1_catalan.h
src_c/stem_ISO_8859_1_danish.c
src_c/stem_ISO_8859_1_danish.h
src_c/stem_ISO_8859_1_dutch.c
src_c/stem_ISO_8859_1_dutch.h
src_c/stem_ISO_8859_1_english.c
src_c/stem_ISO_8859_1_english.h
src_c/stem_ISO_8859_1_finnish.c
src_c/stem_ISO_8859_1_finnish.h
src_c/stem_ISO_8859_1_french.c
src_c/stem_ISO_8859_1_french.h
src_c/stem_ISO_8859_1_german.c
src_c/stem_ISO_8859_1_german.h
src_c/stem_ISO_8859_1_indonesian.c
src_c/stem_ISO_8859_1_indonesian.h
src_c/stem_ISO_8859_1_irish.c
src_c/stem_ISO_8859_1_irish.h
src_c/stem_ISO_8859_1_italian.c
src_c/stem_ISO_8859_1_italian.h
src_c/stem_ISO_8859_1_norwegian.c
src_c/stem_ISO_8859_1_norwegian.h
src_c/stem_ISO_8859_1_porter.c
src_c/stem_ISO_8859_1_porter.h
src_c/stem_ISO_8859_1_portuguese.c
src_c/stem_ISO_8859_1_portuguese.h
src_c/stem_ISO_8859_1_spanish.c
src_c/stem_ISO_8859_1_spanish.h
src_c/stem_ISO_8859_1_swedish.c
src_c/stem_ISO_8859_1_swedish.h
src_c/stem_ISO_8859_2_hungarian.c
src_c/stem_ISO_8859_2_hungarian.h
src_c/stem_ISO_8859_2_romanian.c
src_c/stem_ISO_8859_2_romanian.h
src_c/stem_KOI8_R_russian.c
src_c/stem_KOI8_R_russian.h
src_c/stem_UTF_8_arabic.c
src_c/stem_UTF_8_arabic.h
src_c/stem_UTF_8_basque.c
src_c/stem_UTF_8_basque.h
src_c/stem_UTF_8_catalan.c
src_c/stem_UTF_8_catalan.h
src_c/stem_UTF_8_danish.c
src_c/stem_UTF_8_danish.h
src_c/stem_UTF_8_dutch.c
src_c/stem_UTF_8_dutch.h
src_c/stem_UTF_8_english.c
src_c/stem_UTF_8_english.h
src_c/stem_UTF_8_finnish.c
src_c/stem_UTF_8_finnish.h
src_c/stem_UTF_8_french.c
src_c/stem_UTF_8_french.h
src_c/stem_UTF_8_german.c
src_c/stem_UTF_8_german.h
src_c/stem_UTF_8_greek.c
src_c/stem_UTF_8_greek.h
src_c/stem_UTF_8_hindi.c
src_c/stem_UTF_8_hindi.h
src_c/stem_UTF_8_hungarian.c
src_c/stem_UTF_8_hungarian.h
src_c/stem_UTF_8_indonesian.c
src_c/stem_UTF_8_indonesian.h
src_c/stem_UTF_8_irish.c
src_c/stem_UTF_8_irish.h
src_c/stem_UTF_8_italian.c
src_c/stem_UTF_8_italian.h
src_c/stem_UTF_8_lithuanian.c
src_c/stem_UTF_8_lithuanian.h
src_c/stem_UTF_8_nepali.c
src_c/stem_UTF_8_nepali.h
src_c/stem_UTF_8_norwegian.c
src_c/stem_UTF_8_norwegian.h
src_c/stem_UTF_8_porter.c
src_c/stem_UTF_8_porter.h
src_c/stem_UTF_8_portuguese.c
src_c/stem_UTF_8_portuguese.h
src_c/stem_UTF_8_romanian.c
src_c/stem_UTF_8_romanian.h
src_c/stem_UTF_8_russian.c
src_c/stem_UTF_8_russian.h
src_c/stem_UTF_8_spanish.c
src_c/stem_UTF_8_spanish.h
src_c/stem_UTF_8_swedish.c
src_c/stem_UTF_8_swedish.h
src_c/stem_UTF_8_tamil.c
src_c/stem_UTF_8_tamil.h
src_c/stem_UTF_8_turkish.c
src_c/stem_UTF_8_turkish.h
runtime/api.c
runtime/api.h
runtime/header.h
runtime/utilities.c
libstemmer/libstemmer.c
libstemmer/libstemmer_utf8.c
libstemmer/modules.h
libstemmer/modules_utf8.h
include/libstemmer.h
7 changes: 4 additions & 3 deletions src/dep/snowball/Makefile
@@ -1,9 +1,10 @@
 include mkinc.mak
-CFLAGS+=-Iinclude
+CFLAGS=-O2
+CPPFLAGS=-Iinclude
 all: libstemmer.o stemwords
 libstemmer.o: $(snowball_sources:.c=.o)
 	$(AR) -cru $@ $^
 stemwords: examples/stemwords.o libstemmer.o
-	$(CC) -o $@ $^
+	$(CC) $(CFLAGS) -o $@ $^
 clean:
-	rm -f stemwords *.o src_c/*.o runtime/*.o libstemmer/*.o
+	rm -f stemwords *.o src_c/*.o examples/*.o runtime/*.o libstemmer/*.o
