Search function multi-language support #80

netcmcc · 2019-09-16T04:52:11Z

Other aspects of this theme handle multiple languages well, but the search function currently supports only English. I would like search to be extended to other languages, such as Chinese.

netcmcc · 2019-09-16T04:53:23Z

Titles and directories support Chinese, but searching for Chinese text does not work properly.

alex-shpak · 2019-09-22T13:19:59Z

Hi!
It looks like lunr.js, which is used for search, does not support Chinese (see olivernn/lunr.js#173).

I will need to check what can be done.
This might be an alternative: https://github.com/nextapps-de/flexsearch

alex-shpak · 2019-10-22T21:16:36Z

I tried flexsearch, and it looks like it works with this configuration:
https://github.com/nextapps-de/flexsearch#cjk-word-break-chinese-japanese-korean

Unfortunately, it is either Chinese or English (or another language), not both at the same time.
Or I have not found the correct way to support both.

oshliaer · 2019-10-24T06:56:39Z

Russian search is not supported either.

@alex-shpak Any recommendations? How can I help?

alex-shpak · 2019-10-27T11:14:06Z

Hi!
I pushed changes to master. They introduce https://github.com/nextapps-de/flexsearch as a replacement for lunr.js. FlexSearch has more configuration options for multi-language support.

There is now a BookSearchConfig parameter, which holds the flexsearch configuration object.
For example, for Chinese, according to https://github.com/nextapps-de/flexsearch#cjk-word-break-chinese-japanese-korean:

BookSearchConfig = '''{
  encode: false,
  tokenize: function(str){
    return str.replace(/[\x00-\x7F]/g, "").split("");
  }
}'''
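To illustrate why this configuration indexes Chinese but not English, here is a standalone sketch of the same tokenizer logic run outside FlexSearch. The function name `tokenizeCjk` is hypothetical; the body is the `tokenize` function from the config above.

```javascript
// Standalone copy of the tokenize function from the config above.
// `tokenizeCjk` is a hypothetical name used only for this illustration.
function tokenizeCjk(str) {
  // Remove every ASCII character (letters, digits, spaces, punctuation),
  // then emit each remaining UTF-16 code unit as its own token.
  return str.replace(/[\x00-\x7F]/g, "").split("");
}

// Mixed input: the Latin word is stripped, only the CJK characters remain.
console.log(tokenizeCjk("Hugo 搜索"));

// ASCII-only input yields no tokens at all, so English text is never indexed.
console.log(tokenizeCjk("search"));
```

Because ASCII characters are discarded before splitting, a page written only in English produces an empty token list under this configuration.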

For Russian, I think a stemmer needs to be set:
https://github.com/nextapps-de/flexsearch#add-language-specific-stemmer-andor-filter

Future work will include integration with multi-lang mode, with a different indexing configuration per language.
Unfortunately, there is no easy way to support multiple languages in the same index.

alex-shpak · 2019-11-11T22:28:04Z

I think this config should work for Russian; the filter is optional.

BookSearchConfig = '''{
  split: /[^a-zа-яё0-9]/gi,
  filter: [ 
    "в", "на", "и", "не", "о", "от", "с"
  ]
}'''
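As a rough illustration of what the `split` pattern and `filter` list above do to a sentence, here is a standalone sketch. It only approximates the behaviour and is not FlexSearch's actual implementation; the name `tokenizeRussian` is hypothetical, and the explicit lowercasing is an assumption standing in for FlexSearch's default case-insensitive encoding.

```javascript
// Same pattern and stop words as in the config above.
const splitPattern = /[^a-zа-яё0-9]/gi;
const stopWords = new Set(["в", "на", "и", "не", "о", "от", "с"]);

// Hypothetical helper: approximates splitting plus stop-word filtering.
function tokenizeRussian(text) {
  return text
    .toLowerCase()                 // assumption: mimic case-insensitive indexing
    .split(splitPattern)           // break on anything that is not a Latin/Cyrillic letter or digit
    .filter((token) => token !== "" && !stopWords.has(token)); // drop empties and stop words
}

console.log(tokenizeRussian("Поиск в документации Hugo"));
```

The stop word "в" is dropped, while both the Cyrillic and the Latin words survive, since the pattern treats letters of both alphabets as part of a word.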

alex-shpak · 2019-11-16T20:43:39Z

Changes have been merged to master.

kevinclcn · 2020-04-29T08:08:17Z

> Future work will include integration with multi-lang mode, with a different indexing configuration per language.
> Unfortunately, there is no easy way to support multiple languages in the same index.

I worked around this issue with the config below:

    {
      encode: false,
      tokenize: function(str) {
        return str.split(/\W+/).concat(str.replace(/[\x00-\x7F]/g, '').split('')).filter(e => !!e)
      }
    }

peter-liu · 2023-01-08T11:01:07Z

Never mind, it's wrong (it only works when you search in upper case).
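The upper-case symptom would be consistent with `encode: false` turning off FlexSearch's default case normalization, so that indexed terms and queries are compared case-sensitively. This is an assumption and is not confirmed anywhere in this thread. Under that assumption, a minimal sketch of a mixed tokenizer that lowercases before splitting could look like this (the name `tokenizeMixed` is hypothetical):

```javascript
// Hypothetical variant of the workaround above that lowercases its input first.
function tokenizeMixed(str) {
  const lowered = str.toLowerCase();

  // ASCII words: split on runs of non-word characters (CJK counts as non-word here).
  const asciiWords = lowered.split(/\W+/);

  // Non-ASCII text: strip ASCII, then emit one token per UTF-16 code unit.
  const nonAsciiChars = lowered.replace(/[\x00-\x7F]/g, "").split("");

  // Merge both token lists and drop the empty strings that split() can produce.
  return asciiWords.concat(nonAsciiChars).filter(Boolean);
}

console.log(tokenizeMixed("Hugo Book 搜索"));
```

The function body could be used as the `tokenize` function in BookSearchConfig. Whether it resolves the problem for a given FlexSearch version has not been tested here.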

alex-shpak added the enhancement New feature or request label Oct 3, 2019

alex-shpak added a commit that referenced this issue Oct 20, 2019

#80, Migrate to flexsearch

d776999

alex-shpak added a commit that referenced this issue Oct 23, 2019

#80, Migrate to flexsearch

b4307e7

alex-shpak added a commit that referenced this issue Oct 27, 2019

#80, Add search index configuration

a5788d7

alex-shpak closed this as completed Nov 18, 2019

adrawerofthings mentioned this issue Jul 20, 2022

About Chinese Support travis-r6s/gridsome-plugin-flexsearch#23

Closed

clsty mentioned this issue Feb 13, 2024

Chinese (maybe CJK) search broken loikein/hugo-book#4

Closed

loikein mentioned this issue May 8, 2024

Fix multilingual search loikein/hugo-book#6

Closed
