How can I search mixed language text? #357

yeonns · 2021-06-22T11:32:36Z

Hello.

My Markdown page contains both Korean and English text, like this:

### 코드code

So, I want to search for mixed-language text like "코드code", but I think search only supports one language per file/directory.
Is there any way to search mixed-language text?

alex-shpak · 2021-07-01T14:43:40Z

Hi!
The flexsearch library is used for searching.
I have never tried to set up mixed-language search with it, but you can give it a shot.

There is a param in i18n called bookSearchConfig, where you can specify flexsearch indexing settings.
https://github.com/alex-shpak/hugo-book/blob/master/i18n/zh.yaml#L13

The current settings are taken from their docs:
https://github.com/nextapps-de/flexsearch#cjk-word-break-chinese-japanese-korean

alex-shpak · 2021-07-01T14:45:23Z

Related issues in the flexsearch repo:
nextapps-de/flexsearch#207
nextapps-de/flexsearch#73

yeonns · 2021-07-02T09:42:32Z

Hi!
I will try editing bookSearchConfig.
Thank you 👍

yeonns · 2021-07-02T10:49:36Z

Hi!
I was able to search mixed-language text (Korean & English) with the bookSearchConfig below.
It was simpler than I thought.

- id: bookSearchConfig
  translation: |
    {
      split: " "
    }

I think it is a little strange, because I do not need any custom tokenize function and just use split. 😢
But the problem has been resolved, and thank you again! 😄
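
A rough way to see why `split: " "` can be enough is to compare plain string splitting on a space with splitting on non-word characters. This is only a sketch in plain JavaScript: it does not call flexsearch, the sample text is made up, and the non-word regex is an assumption used for comparison, not a confirmed description of what flexsearch does internally.

```javascript
// Sketch only: plain string splitting, not a call into flexsearch itself.
// The sample text is hypothetical.
const text = "코드code 검색 search";

// Splitting on a single space (mirrors the `split: " "` setting above)
// keeps the mixed Korean/English word together as one token.
const bySpace = text.split(" ");
console.log(bySpace); // [ '코드code', '검색', 'search' ]

// Splitting on non-word characters (assumed here for comparison) treats
// Hangul as separators, so only the ASCII words survive.
const byNonWord = text.split(/\W+/).filter(t => t.length > 0);
console.log(byNonWord); // [ 'code', 'search' ]
```

With a space as the only separator, "코드code" stays a single token, which would be consistent with the mixed-language query matching.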

alex-shpak · 2021-07-12T09:53:34Z

Cool 👍

marshall-999 · 2023-05-30T09:39:37Z

This does not work now.

kjs104901 · 2024-02-22T16:22:20Z

This is working for me.

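function(str){
  let result = [];
  // Non-ASCII text (e.g. Korean): strip ASCII, then emit one token per remaining character (UTF-16 code unit).
  result.push.apply(result, str.replace(/[\x00-\x7F]/g, "").split(""));
  // Latin text: split on anything that is not a Latin letter or digit to get whole words.
  str.split(/[^a-zA-Z0-9\u00C0-\u00ff]/g).forEach(t => {
    if (t.length > 0) {
      // Lowercase each word, then split again on separators, symbols, punctuation and control characters.
      result.push.apply(result, t.toLowerCase().split(/[\p{Z}\p{S}\p{P}\p{C}]+/u));
    }
  });
  return result;
}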
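
To see what the function above produces, it can be run on its own outside the search config. In this sketch the name `tokenizeMixed` and the sample strings are hypothetical and only there so the function can be called; the body is unchanged from the comment.

```javascript
// Hypothetical name; the body is the function from the comment above.
function tokenizeMixed(str) {
  let result = [];
  // Non-ASCII characters become one token each.
  result.push.apply(result, str.replace(/[\x00-\x7F]/g, "").split(""));
  // Latin runs become lowercased word tokens.
  str.split(/[^a-zA-Z0-9\u00C0-\u00ff]/g).forEach(t => {
    if (t.length > 0) {
      result.push.apply(result, t.toLowerCase().split(/[\p{Z}\p{S}\p{P}\p{C}]+/u));
    }
  });
  return result;
}

console.log(tokenizeMixed("코드code"));   // [ '코', '드', 'code' ]
console.log(tokenizeMixed("Hello 세계")); // [ '세', '계', 'hello' ]
```

Each Korean character is indexed separately while each English word is indexed whole, so both halves of a mixed term like "코드code" end up in the token list.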

alex-shpak closed this as completed Jul 12, 2021

dyxang mentioned this issue Oct 27, 2021

Does anyone know how to change the search function to Chinese? #327

Closed

wenbingzhang mentioned this issue Mar 6, 2024

Language zh or cn, support for Chinese and English word splitting #598

Closed
