Support to yaha tokenizer #21

keineahnung2345 · 2017-12-21T03:37:11Z

Hello, this is just a note for people who want to do NER.

I have found that the jieba tokenizer is not very good at tokenizing Chinese surnames. For example, "我姓林" ("My surname is Lin") is tokenized into "我" and "姓林", so the surname "林" does not come out as its own token. I therefore want to use the yaha tokenizer instead.
So far I have written my own yaha_tokenizer.py and made the corresponding changes in registry.py. If you also want to do NER, you can visit my repository:
https://github.com/keineahnung2345/Rasa_NLU_Chi
and find the two files there.
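
For anyone wiring a different segmenter into a tokenizer component, here is a minimal sketch of one step such a wrapper typically needs: turning a plain list of words into tokens with character offsets, which NER training usually relies on to align entities. This is not the code from yaha_tokenizer.py. The `Token` class and `words_to_tokens` function are hypothetical names used for illustration, and the segmentations are hard-coded rather than produced by jieba or yaha, since neither library's calls are shown in this thread.

```python
from typing import Iterable, List, NamedTuple


class Token(NamedTuple):
    """Hypothetical token type: the word plus its start offset in the text."""
    text: str
    offset: int


def words_to_tokens(text: str, words: Iterable[str]) -> List[Token]:
    """Attach character offsets to words returned by a segmenter.

    Assumes the words appear in `text` in order and without overlap.
    """
    tokens = []
    cursor = 0  # search position, so repeated words map to the right place
    for word in words:
        if not word:
            continue  # skip empty strings some segmenters may emit
        start = text.find(word, cursor)
        if start < 0:
            raise ValueError(f"word {word!r} not found in text after position {cursor}")
        tokens.append(Token(word, start))
        cursor = start + len(word)
    return tokens


sentence = "我姓林"

# Segmentation reported above for jieba: the surname stays glued to "姓".
glued = words_to_tokens(sentence, ["我", "姓林"])

# Segmentation one would want for NER: the surname is its own token.
split = words_to_tokens(sentence, ["我", "姓", "林"])

print(glued)
print(split)
```

With the second segmentation, "林" gets its own token at offset 2, so an entity annotation covering only the surname can line up with a token boundary.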

keineahnung2345 mentioned this issue Dec 21, 2017

Add support to yaha tokenizer #22

Merged

keineahnung2345 closed this as completed Dec 21, 2017
