Integrate AttaCut to PyThaiNLP #258

Closed
heytitle opened this issue Aug 25, 2019 · 5 comments

@heytitle heytitle commented Aug 25, 2019

AttaCut was recently released. It would be great to integrate it into PyThaiNLP's ecosystem.

Below is the speed benchmark of AttaCut compared with PyThaiNLP and DeepCut:
[image: speed benchmark chart]
Note:

  • AttaCut-SC uses syllable and character features, while AttaCut-C uses only character features.
  • This benchmark was run on Colab; results in actual production environments may differ.

From what I can see, the integration should be quite straightforward because AttaCut already has a higher-level API that can be imported and used directly.

from attacut import Tokenizer

# txt is the Thai text to segment
atta = Tokenizer(model="attacut-sc")
words = atta.tokenize(txt)

More information can be found at: https://github.com/heytitle/attacut#higher-level-inferface.

Related to #210

@bkktimber bkktimber commented Aug 30, 2019

I've already added AttaCut. Are there any contribution guidelines I should follow before I push to GitHub?

Thanks

@heytitle heytitle commented Aug 30, 2019

@bkktimber I've published a new version of AttaCut (v.0.0.6-dev) with a tokenize function that instantiates only one tokenizer. Please see the documentation on how to use it.

https://pythainlp.github.io/attacut/#higher-level-interface

Could you please update the PR accordingly? :)

@bkktimber bkktimber commented Aug 31, 2019

Sure, will do this weekend.

thank you

@wannaphongcom wannaphongcom commented Sep 3, 2019

Done

@bact bact commented Sep 9, 2019

@bact bact closed this Sep 9, 2019
@bact bact added this to the 2.1 milestone Oct 5, 2019