Add additional Analyzers, Tokenizers, and TokenFilters from Lucene #6693

Closed
wants to merge 2 commits into from

Commits on Jul 2, 2014

  1. Analysis: Add additional Analyzers, Tokenizers, and TokenFilters from Lucene
    
    Add `irish` analyzer
    Add `sorani` analyzer (Kurdish)
    
    Add `classic` tokenizer: specific to English text; tries to recognize hostnames, companies, acronyms, etc.
    Add `thai` tokenizer: segments Thai text into words.
    
    Add `classic` tokenfilter: cleans up acronyms and possessives from classic tokenizer
    Add `apostrophe` tokenfilter: removes text after apostrophe and the apostrophe itself
    Add `german_normalization` tokenfilter: umlaut/sharp S normalization
    Add `hindi_normalization` tokenfilter: accounts for Hindi spelling differences
    Add `indic_normalization` tokenfilter: accounts for different Unicode representations in Indian languages
    Add `sorani_normalization` tokenfilter: normalizes Kurdish text
    Add `scandinavian_normalization` tokenfilter: normalizes Norwegian, Danish, Swedish text
    Add `scandinavian_folding` tokenfilter: much more aggressive form of `scandinavian_normalization`
    Add additional languages to stemmer tokenfilter: `galician`, `minimal_galician`, `irish`, `sorani`, `light_nynorsk`, `minimal_nynorsk`
    
    Add support for accessing the default Thai stopword set "_thai_"
    
    Fix some bugs and broken links in documentation.
    
    Closes elastic#5935
    rmuir committed Jul 2, 2014
    a0d51e5
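
For illustration, the components listed in this commit could be combined into custom analyzers through index settings. The following is a minimal sketch, not taken from the PR: the analyzer names `classic_english` and `german_folded` are hypothetical, and the pairing with the existing `standard` tokenizer and `lowercase` filter is an assumption. Only `classic` (tokenizer and tokenfilter) and `german_normalization` come from the commit message.

```python
import json


def build_index_settings():
    """Build an index-settings body that wires the new components into analyzers.

    The analyzer names are hypothetical; the tokenizer and filter names
    `classic` and `german_normalization` are the ones added by this commit.
    """
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # `classic` tokenizer followed by the `classic` filter, which
                    # cleans up acronyms and possessives the tokenizer emits.
                    "classic_english": {
                        "type": "custom",
                        "tokenizer": "classic",
                        "filter": ["classic", "lowercase"],
                    },
                    # Umlaut / sharp S normalization applied after lowercasing.
                    "german_folded": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "german_normalization"],
                    },
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent when creating an index.
    print(json.dumps(build_index_settings(), indent=2))
```

The printed JSON is the request body one would send when creating an index; nothing here contacts a cluster.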

Commits on Jul 3, 2014

  1. 5d10836
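
As a rough illustration of the `apostrophe` tokenfilter described in the first commit ("removes text after apostrophe and the apostrophe itself"), here is a pure-Python sketch of that behavior on a single token. It is an approximation written for this page, not the Lucene implementation; the function name `strip_after_apostrophe` is hypothetical, and treating the typographic apostrophe (U+2019) the same as the ASCII one is an assumption.

```python
def strip_after_apostrophe(token: str) -> str:
    """Return the token truncated at its first apostrophe, if any.

    Assumption: both the ASCII apostrophe and U+2019 count as apostrophes.
    """
    for i, ch in enumerate(token):
        if ch in ("'", "\u2019"):
            # Drop the apostrophe and everything after it.
            return token[:i]
    # No apostrophe found: the token passes through unchanged.
    return token


if __name__ == "__main__":
    for tok in ["Türkiye'de", "O\u2019Reilly", "plain"]:
        print(tok, "->", strip_after_apostrophe(tok))
```

Truncating at the first apostrophe is useful for languages such as Turkish, where suffixes on proper nouns are attached after an apostrophe.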