### YamlMime:TSTypeAlias
name: TokenFilterName
uid: '@azure/search-documents.TokenFilterName'
package: '@azure/search-documents'
summary: >-
Defines values for TokenFilterName. \
<xref:KnownTokenFilterName> can be used interchangeably with TokenFilterName;
this enum contains the known values that the service supports.
### Known values supported by the service
**arabic_normalization**: A token filter that applies the Arabic normalizer to
normalize the orthography. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ar/ArabicNormalizationFilter.html
\
**apostrophe**: Strips all characters after an apostrophe (including the
apostrophe itself). See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/tr/ApostropheFilter.html
\
**asciifolding**: Converts alphabetic, numeric, and symbolic Unicode
characters which are not in the first 127 ASCII characters (the "Basic Latin"
Unicode block) into their ASCII equivalents, if such equivalents exist. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html
\
**cjk_bigram**: Forms bigrams of CJK terms that are generated from the
standard tokenizer. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/cjk/CJKBigramFilter.html
\
**cjk_width**: Normalizes CJK width differences. Folds fullwidth ASCII
variants into the equivalent basic Latin, and half-width Katakana variants
into the equivalent Kana. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/cjk/CJKWidthFilter.html
\
**classic**: Removes English possessives and dots from acronyms. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/ClassicFilter.html
\
**common_grams**: Constructs bigrams for frequently occurring terms while
indexing. Single terms are still indexed too, with bigrams overlaid. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/commongrams/CommonGramsFilter.html
\
**edgeNGram_v2**: Generates n-grams of the given size(s) starting from the
front or the back of an input token. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramTokenFilter.html
\
**elision**: Removes elisions. For example, "l'avion" (the plane) will be
converted to "avion" (plane). See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/util/ElisionFilter.html
\
**german_normalization**: Normalizes German characters according to the
heuristics of the German2 snowball algorithm. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html
\
**hindi_normalization**: Normalizes text in Hindi to remove some differences
in spelling variations. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/hi/HindiNormalizationFilter.html
\
**indic_normalization**: Normalizes the Unicode representation of text in
Indian languages. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/in/IndicNormalizationFilter.html
\
**keyword_repeat**: Emits each incoming token twice, once as a keyword and
once as a non-keyword. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/KeywordRepeatFilter.html
\
**kstem**: A high-performance kstem filter for English. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/en/KStemFilter.html
\
**length**: Removes words that are too long or too short. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/LengthFilter.html
\
**limit**: Limits the number of tokens while indexing. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/LimitTokenCountFilter.html
\
**lowercase**: Normalizes token text to lower case. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/LowerCaseFilter.html
\
**nGram_v2**: Generates n-grams of the given size(s). See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenFilter.html
\
**persian_normalization**: Applies normalization for Persian. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/fa/PersianNormalizationFilter.html
\
**phonetic**: Creates tokens for phonetic matches. See
https://lucene.apache.org/core/4_10_3/analyzers-phonetic/org/apache/lucene/analysis/phonetic/package-tree.html
\
**porter_stem**: Uses the Porter stemming algorithm to transform the token
stream. See http://tartarus.org/~martin/PorterStemmer \
**reverse**: Reverses the token string. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/reverse/ReverseStringFilter.html
\
**scandinavian_normalization**: Normalizes use of the interchangeable
Scandinavian characters. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/ScandinavianNormalizationFilter.html
\
**scandinavian_folding**: Folds Scandinavian characters åÅäæÄÆ->a and
öÖøØ->o. It also discriminates against use of double vowels aa, ae, ao, oe
and oo, leaving just the first one. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/ScandinavianFoldingFilter.html
\
**shingle**: Creates combinations of tokens as a single token. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html
\
**snowball**: A filter that stems words using a Snowball-generated stemmer.
See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/snowball/SnowballFilter.html
\
**sorani_normalization**: Normalizes the Unicode representation of Sorani
text. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ckb/SoraniNormalizationFilter.html
\
**stemmer**: Language-specific stemming filter. See
https://docs.microsoft.com/rest/api/searchservice/Custom-analyzers-in-Azure-Search#TokenFilters
\
**stopwords**: Removes stop words from a token stream. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/StopFilter.html
\
**trim**: Trims leading and trailing whitespace from tokens. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/TrimFilter.html
\
**truncate**: Truncates the terms to a specific length. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/TruncateTokenFilter.html
\
**unique**: Filters out tokens with the same text as the previous token. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/RemoveDuplicatesTokenFilter.html
\
**uppercase**: Normalizes token text to upper case. See
http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/UpperCaseFilter.html
\
**word_delimiter**: Splits words into subwords and performs optional
transformations on subword groups.
fullName: TokenFilterName
remarks: ''
isDeprecated: false
syntax: |
type TokenFilterName = string
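
Because TokenFilterName is just a string alias, the known values above are passed as plain string literals when building a custom analyzer. A minimal TypeScript sketch — note that `CustomAnalyzerSketch` is a simplified local interface written for illustration, not the SDK's actual `CustomAnalyzer` type (which also carries an OData type discriminator):

```typescript
// TokenFilterName is an alias for string; the service recognizes the
// known values documented above (e.g. "lowercase", "asciifolding").
type TokenFilterName = string;

// Simplified stand-in for an analyzer definition; field names here are
// illustrative, not the exact SDK shape.
interface CustomAnalyzerSketch {
  name: string;
  tokenizerName: string;
  tokenFilters: TokenFilterName[];
}

// Lowercase tokens, then fold accented characters to their ASCII
// equivalents — a common combination for accent-insensitive search.
const analyzer: CustomAnalyzerSketch = {
  name: "my_folding_analyzer",
  tokenizerName: "standard_v2",
  tokenFilters: ["lowercase", "asciifolding"],
};

console.log(analyzer.tokenFilters.join(","));
```

Filter order matters: the filters run in array order, so tokens are lowercased before ASCII folding in this sketch.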