How can I search mixed language text? #357

Closed
yeonns opened this issue Jun 22, 2021 · 7 comments

@yeonns

yeonns commented Jun 22, 2021

Hello.

My Markdown page contains both Korean and English text, like this:

### 코드code

I want to search for mixed-language text such as "코드code" ("코드" is Korean for "code"), but it seems that only one language is supported per file or directory.
Is there any way to search mixed-language text?

@alex-shpak
Owner

Hi!
Search is powered by the FlexSearch library.
I have never tried configuring it for mixed-language search, but you can give it a shot.

There is a parameter in the i18n files called bookSearchConfig, where you can specify FlexSearch indexing settings:
https://github.com/alex-shpak/hugo-book/blob/master/i18n/zh.yaml#L13

The current settings are taken from the FlexSearch docs:
https://github.com/nextapps-de/flexsearch#cjk-word-break-chinese-japanese-korean
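To illustrate why a CJK setting alone doesn't cover mixed text: the CJK example in the FlexSearch README uses a tokenize function that strips every ASCII character and indexes each remaining character on its own. Assuming the zh.yaml config linked above follows that example, running the same function on the heading from the question keeps only the Hangul characters and drops the English word. This is a sketch with plain string methods, not a call into FlexSearch.

```javascript
// Tokenizer from the FlexSearch README's CJK example (assumed to match the zh.yaml setting):
// remove every ASCII character, then treat each remaining character as a token.
const cjkTokenize = str => str.replace(/[\x00-\x7F]/g, "").split("");

const cjkTokens = cjkTokenize("코드code");
console.log(cjkTokens); // ["코", "드"]: the English "code" is dropped
```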

@alex-shpak
Owner

Related issues in the FlexSearch repo:
nextapps-de/flexsearch#207
nextapps-de/flexsearch#73

@yeonns
Author

yeonns commented Jul 2, 2021

Hi!
I will try editing bookSearchConfig.
Thank you 👍

@yeonns
Author

yeonns commented Jul 2, 2021

Hi!
I was able to search mixed-language (Korean and English) text with the bookSearchConfig below.
It was simpler than I thought.

- id: bookSearchConfig
  translation: |
    {
      split: " "
    }

It seems a little strange, because I don't need any custom tokenize function, just the split option. 😢
But the problem has been resolved, and thank you again! 😄
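A possible explanation for why `split: " "` is enough, sketched under an assumption: if the bundled FlexSearch version splits words with its documented default pattern `/\W+/`, then JavaScript's ASCII-only `\W` treats every Hangul character as a separator, so the Korean half of "코드code" never reaches the index. Splitting on spaces keeps the whole word intact. The snippet compares the two splits with plain `String.prototype.split`; it does not call FlexSearch itself.

```javascript
const heading = "코드code";

// Assumed default splitter: \W matches anything outside [A-Za-z0-9_],
// so the Hangul characters are consumed as a separator.
const defaultSplit = heading.split(/\W+/).filter(Boolean);

// The `split: " "` setting from the config above: only spaces separate words.
const spaceSplit = heading.split(" ").filter(Boolean);

console.log(defaultSplit); // ["code"]
console.log(spaceSplit);   // ["코드code"]
```

With the space split, "코드code" is indexed as one word; whether a partial query such as "코드" then matches it depends on the tokenize setting (for example "forward" or "full"), which this sketch does not model.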

@alex-shpak
Owner

Cool 👍

@marshall-999

It doesn't work now.

@kjs104901

This is working for me.

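function(str){
  let result = [];
  // Non-ASCII characters (Hangul, CJK, accented letters): each one becomes a single-character token
  result.push(...str.replace(/[\x00-\x7F]/g, "").split(""));
  // Runs of ASCII letters/digits and U+00C0..U+00FF characters: each run becomes a lowercase word
  str.split(/[^a-zA-Z0-9\u00C0-\u00ff]/g).forEach(t => {
    if (t.length > 0) {
      // Split again on Unicode separators, symbols, punctuation and control characters; drop empty pieces
      result.push(...t.toLowerCase().split(/[\p{Z}\p{S}\p{P}\p{C}]+/u).filter(w => w.length > 0));
    }
  });
  return result;
}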
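To check what the tokenizer produces, here is a standalone copy under the hypothetical name `mixedTokenize` (with empty tokens filtered out), run on the heading from the original question. Korean is tokenized character by character and English word by word, so the mixed heading yields tokens from both scripts. This is only a quick way to inspect the output; how FlexSearch matches these tokens at query time is not modeled here.

```javascript
// Standalone copy of the tokenizer above; `mixedTokenize` is a hypothetical name for illustration.
function mixedTokenize(str) {
  const result = [];
  // Each non-ASCII character becomes its own token.
  result.push(...str.replace(/[\x00-\x7F]/g, "").split(""));
  // Runs of ASCII letters/digits and U+00C0..U+00FF characters become lowercase words.
  str.split(/[^a-zA-Z0-9\u00C0-\u00ff]/g).forEach(t => {
    if (t.length > 0) {
      result.push(...t.toLowerCase().split(/[\p{Z}\p{S}\p{P}\p{C}]+/u).filter(w => w.length > 0));
    }
  });
  return result;
}

console.log(mixedTokenize("### 코드code"));   // ["코", "드", "code"]
console.log(mixedTokenize("Hello, World!")); // ["hello", "world"]
```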

4 participants