Custom tokenizer #17

Open
morygonzalez opened this issue May 19, 2022 · 4 comments
Labels
enhancement New feature or request

Comments

@morygonzalez

I want to use Tantiny with Japanese. There are several Tantivy tokenizers for the Japanese language. I'm currently considering lindera-tantivy, which supports not only Japanese but also Chinese and Korean. Is it possible to use these custom tokenizers with Tantivy via Tantiny?

@baygeldin added the enhancement label on May 21, 2022
@baygeldin
Owner

Hey @morygonzalez, currently Tantiny does not support custom tokenizers. I had some ideas about how to implement it, but it's a complex issue to tackle because it requires extending behaviour at runtime, which is not easy to do with Rust (let alone its interaction with Ruby).

However, it seems that lindera is quite a useful project, and it might make sense to just add a new tokenizer type to Tantiny that uses it. This is much easier than dealing with custom tokenizers. What do you think?

@morygonzalez
Author

@baygeldin Thank you! That's cool. I'm happy with your suggestion!!

@baygeldin
Owner

Okay, I'll see what I can do, but probably after I deal with aggregations (or you can make a PR yourself if you want).

@morygonzalez
Author

I see. I'll try to make a pull request, though I'm quite new to Rust, so it will take some time.
