Fast, general-purpose grammar-based tokenizers.
The `Lucene.Net.Analysis.Standard` namespace contains three fast grammar-based tokenizers constructed with JFlex:
- [StandardTokenizer](xref:Lucene.Net.Analysis.Standard.StandardTokenizer): as of Lucene 3.1, implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. Unlike [UAX29URLEmailTokenizer](xref:Lucene.Net.Analysis.Standard.UAX29URLEmailTokenizer), URLs and email addresses are not tokenized as single tokens, but are instead split up into tokens according to the UAX#29 word break rules. [StandardAnalyzer](xref:Lucene.Net.Analysis.Standard.StandardAnalyzer) includes [StandardTokenizer](xref:Lucene.Net.Analysis.Standard.StandardTokenizer), [StandardFilter](xref:Lucene.Net.Analysis.Standard.StandardFilter), [LowerCaseFilter](xref:Lucene.Net.Analysis.Core.LowerCaseFilter) and [StopFilter](xref:Lucene.Net.Analysis.Core.StopFilter). When the version specified in the constructor is lower than 3.1, the `ClassicTokenizer` implementation is invoked.
- [ClassicTokenizer](xref:Lucene.Net.Analysis.Standard.ClassicTokenizer): this class was formerly (prior to Lucene 3.1) named `StandardTokenizer`. (Its tokenization rules are not based on the Unicode Text Segmentation algorithm.) [ClassicAnalyzer](xref:Lucene.Net.Analysis.Standard.ClassicAnalyzer) includes [ClassicTokenizer](xref:Lucene.Net.Analysis.Standard.ClassicTokenizer), [StandardFilter](xref:Lucene.Net.Analysis.Standard.StandardFilter), [LowerCaseFilter](xref:Lucene.Net.Analysis.Core.LowerCaseFilter) and [StopFilter](xref:Lucene.Net.Analysis.Core.StopFilter).

- [UAX29URLEmailTokenizer](xref:Lucene.Net.Analysis.Standard.UAX29URLEmailTokenizer): implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs. [UAX29URLEmailAnalyzer](xref:Lucene.Net.Analysis.Standard.UAX29URLEmailAnalyzer) includes [UAX29URLEmailTokenizer](xref:Lucene.Net.Analysis.Standard.UAX29URLEmailTokenizer), [StandardFilter](xref:Lucene.Net.Analysis.Standard.StandardFilter), [LowerCaseFilter](xref:Lucene.Net.Analysis.Core.LowerCaseFilter) and [StopFilter](xref:Lucene.Net.Analysis.Core.StopFilter).
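The behavioral difference between `StandardTokenizer` and `UAX29URLEmailTokenizer` can be observed by running the same text through both. The following is a minimal sketch, assuming the Lucene.Net and Lucene.Net.Analysis.Common NuGet packages (4.8 API; constructor signatures and the `LuceneVersion` enum may differ in other versions):

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

class TokenizerDemo
{
    // Prints each token the given tokenizer produces, following the
    // standard TokenStream workflow: Reset, IncrementToken, End, Dispose.
    static void PrintTokens(Tokenizer tokenizer)
    {
        var termAtt = tokenizer.AddAttribute<ICharTermAttribute>();
        tokenizer.Reset();
        while (tokenizer.IncrementToken())
            Console.WriteLine(termAtt.ToString());
        tokenizer.End();
        tokenizer.Dispose();
    }

    static void Main()
    {
        const string text = "Send mail to test@example.com";

        // StandardTokenizer splits the email address into separate tokens
        // according to the UAX#29 word break rules.
        PrintTokens(new StandardTokenizer(
            LuceneVersion.LUCENE_48, new StringReader(text)));

        // UAX29URLEmailTokenizer emits the email address as a single token.
        PrintTokens(new UAX29URLEmailTokenizer(
            LuceneVersion.LUCENE_48, new StringReader(text)));
    }
}
```

In production code the tokenizer is normally not driven by hand like this; instead the corresponding analyzer (`StandardAnalyzer` or `UAX29URLEmailAnalyzer`) is passed to the `IndexWriter`, which consumes the token stream internally.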