---
uid: Lucene.Net.Analysis.Cn.Smart
summary: *content
---

Analyzer for Simplified Chinese, which indexes words. @lucene.experimental

Three analyzers are provided for Chinese, each of which treats Chinese text in a different way.

  • StandardAnalyzer: Index unigrams (individual Chinese characters) as tokens.

  • CJKAnalyzer (in the xref:Lucene.Net.Analysis.Cjk namespace of xref:Lucene.Net.Analysis.Common): Index bigrams (overlapping groups of two adjacent Chinese characters) as tokens.

  • SmartChineseAnalyzer (in this package): Index words (attempt to segment Chinese text into words) as tokens.

Example phrase: "我是中国人"

  1. StandardAnalyzer: 我-是-中-国-人

  2. CJKAnalyzer: 我是-是中-中国-国人

  3. SmartChineseAnalyzer: 我-是-中国-人
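
The three tokenizations above can be reproduced with a small, library-free sketch. It does not call Lucene.NET, and every function name in it is hypothetical. The word-level step is a toy greedy longest-match over a hand-written one-entry dictionary; it stands in for, and is not, the segmentation algorithm that SmartChineseAnalyzer actually uses. The bigram function only handles a single run of Chinese characters and ignores edge cases such as one-character input.

```python
def unigrams(text):
    """One token per character (the StandardAnalyzer-style result above)."""
    return list(text)


def bigrams(text):
    """Overlapping pairs of adjacent characters (the CJKAnalyzer-style result above)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def longest_match_words(text, dictionary, max_len=4):
    """Toy word segmentation: greedy longest match against a dictionary.

    Assumption: this is only an illustration of word-level tokens, not the
    algorithm SmartChineseAnalyzer implements.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, down to two characters.
        for n in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + n]
            if candidate in dictionary:
                tokens.append(candidate)
                i += n
                break
        else:
            # No dictionary word starts here: emit a single character.
            tokens.append(text[i])
            i += 1
    return tokens


if __name__ == "__main__":
    phrase = "我是中国人"
    toy_dictionary = {"中国"}  # hypothetical one-word dictionary
    print("-".join(unigrams(phrase)))
    print("-".join(bigrams(phrase)))
    print("-".join(longest_match_words(phrase, toy_dictionary)))
```

Run as a script, this prints one line per strategy for the example phrase, joined with hyphens in the same style as the list above. A real index would instead obtain these tokens from the analyzers' token streams.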