spacymoji support #465

Closed
DamienJacquemart opened this issue Aug 11, 2020 · 5 comments
Labels
bug bugs in the library

DamienJacquemart commented Aug 11, 2020

Hi guys,

Thank you for your work on Thai!

Running spacymoji in a spaCy pipeline raises the error below (the code that triggers it follows the traceback). Do you know how we could work around it?


AttributeError Traceback (most recent call last)
in
1 from spacymoji import Emoji
2 nlp = spacy.blank('th')
----> 3 emoji = Emoji(nlp)
4 nlp.add_pipe(emoji, first=True)

~/Library/DataScienceStudio/dss_home/code-envs/python/nlp_SpaCy/lib/python3.6/site-packages/spacymoji/__init__.py in __init__(self, nlp, merge_spans, lookup, pattern_id, attrs, force_extension)
58 self.lookup = lookup
59 self.matcher = PhraseMatcher(nlp.vocab)
---> 60 emoji_patterns = list(nlp.tokenizer.pipe(EMOJI.keys()))
61 self.matcher.add(pattern_id, None, *emoji_patterns)
62 # Add attributes

AttributeError: 'ThaiTokenizer' object has no attribute 'pipe'

import spacy
from spacymoji import Emoji

nlp = spacy.blank('th')
emoji = Emoji(nlp)  # raises AttributeError: 'ThaiTokenizer' object has no attribute 'pipe'
nlp.add_pipe(emoji, first=True)
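
For readers hitting the same traceback: the failure comes from calling `.pipe()` on a tokenizer object that is callable but does not define that method. The sketch below only illustrates that failure mode and a generic fallback. It does not patch spacymoji or spaCy, and `NoPipeTokenizer` and `pipe_or_call` are hypothetical names invented for this example, not part of either library.

```python
from typing import Any, Callable, Iterable, Iterator


class NoPipeTokenizer:
    """Hypothetical stand-in: callable like a tokenizer, but defines no pipe()."""

    def __call__(self, text: str) -> list:
        # Whitespace splitting is only a placeholder for real tokenization.
        return text.split()


def pipe_or_call(tokenizer: Callable[[str], Any], texts: Iterable[str]) -> Iterator[Any]:
    """Use tokenizer.pipe(texts) when it exists, otherwise call the tokenizer per text."""
    pipe = getattr(tokenizer, "pipe", None)
    if callable(pipe):
        yield from pipe(texts)
    else:
        for text in texts:
            yield tokenizer(text)


tokenized = list(pipe_or_call(NoPipeTokenizer(), ["hello world", "ok"]))
print(tokenized)
```

Whether something like this could be applied in practice depends on where the `.pipe()` call lives. Here it is inside spacymoji's constructor, so the fallback would have to be added there or the tokenizer would need to gain a `pipe` method.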
bact added the bug label Aug 11, 2020
DamienJacquemart added a commit to dataiku/dss-plugin-nlp-preparation that referenced this issue Aug 12, 2020
- disable tagger, parser

- fixed bug with url

- punctuation is retrieved, but it adds extra spaces ('hellu.' -> 'hello .'). No idea how to do better.

- fixed "hello." now it is tokenized ["hello", "."]

- hello.goodbye is still one token hello.goodbye which is spacy behaviour

- dictionary does not contain "'nt", "'d", ... hence they are considered spelling mistakes. We can add them to the dictionary.
bact (Member) commented Sep 7, 2020

@DamienJacquemart could you tell us which spacymoji and pythainlp versions you are using?

DamienJacquemart (Author) commented

spacy==2.3.2
spacymoji==2.0.0
pythainlp==2.2.3

Thanks @bact!
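
For anyone reporting a similar problem, a version list like the one above can be gathered with a short script. This is a minimal sketch that assumes Python 3.8 or newer, where `importlib.metadata` is in the standard library (the traceback in this issue shows Python 3.6, where it is not available). `report_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages):
    """Return {package name: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = report_versions(["spacy", "spacymoji", "pythainlp"])
for name, ver in versions.items():
    print(f"{name}=={ver}" if ver else f"{name}: not installed")
```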

wannaphong (Member) commented

spacymoji has not been updated since 2019. :(

polm commented Sep 19, 2021

A new version of spacymoji with support for spaCy v3 was released in April and I think it resolved this issue 😎

wannaphong (Member) commented

> A new version of spacymoji with support for spaCy v3 was released in April and I think it resolved this issue 😎

Thank you for the update!
