SpaCy Wrapper #1226
jankounchained started this conversation in Ideas
-
I hope commit #1195 addresses your points a) and b). I'll improve the docs later.
-
I also made a separate repository named cltk-with-spacy, https://github.com/clemsciences/cltk-with-spacy (hosted by me for now), which might become an independent package that handles only spaCy for CLTK. The motivation is to make CLTK lighter for those who do not need that dependency. I plan to do the same for stanza (see https://github.com/clemsciences/cltk-with-stanza).
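Moving spaCy into a separate package means the core code has to cope with it being absent. One common pattern is to check for the optional package and fail with a clear message only when a feature that needs it is used. The sketch below assumes that approach; the function names are hypothetical and are not taken from cltk or cltk-with-spacy.

```python
import importlib.util


def has_optional_dependency(package: str) -> bool:
    """Return True if a top-level package (e.g. "spacy") can be imported."""
    # find_spec returns None for a missing top-level module; dotted names with a
    # missing parent would raise instead, so only pass top-level names here.
    return importlib.util.find_spec(package) is not None


def require_optional_dependency(package: str, feature: str) -> None:
    """Raise a descriptive ImportError when an optional package is missing."""
    if not has_optional_dependency(package):
        raise ImportError(
            f"{feature} requires the optional package {package!r}, "
            "which is not installed."
        )
```

A wrapper would call something like `require_optional_dependency("spacy", "the spaCy wrapper")` before importing spaCy, so users who never touch the wrapper never need it installed.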
-
Hey all,
@x-tabdeveloping and I are looking into CLTK's spaCy wrapper.
Right now, spaCy models are hardcoded: each language has a single default model.
We think there should instead be two ways to load a spaCy model within CLTK:
a) Using custom models with CLTK
The `SpacyProcess` class should have an optional argument that accepts a spaCy `Language` object. Users would be able to download any spaCy model themselves, load it with `spacy.load()`, and then pass the object into `SpacyProcess`.
b) Sensible default spaCy models accessible through a language tag
This means another optional argument in `SpacyProcess`: a language tag, e.g. `grc`, that is paired with a default model for that language (e.g. odyCy or greCy). CLTK downloads the model, so the user doesn't have to touch spaCy. Long-time users of CLTK can keep using the API the way they are used to.
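To make the two options concrete, here is a minimal sketch of how a constructor could accept either a custom pipeline (option a) or a language tag (option b). `SpacyProcessSketch`, `DEFAULT_SPACY_MODELS`, and `resolve_default_model` are hypothetical names, not CLTK's actual API, and the model name `grc_odycy_joint_sm` is an assumed odyCy package name used only for illustration. Downloading the model is not shown; the sketch assumes it is already installed.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical tag-to-model table. "grc_odycy_joint_sm" is assumed to be the
# name of an installed odyCy pipeline; it is not a confirmed CLTK default.
DEFAULT_SPACY_MODELS: Dict[str, str] = {
    "grc": "grc_odycy_joint_sm",
}


def resolve_default_model(language: str) -> str:
    """Map a language tag such as "grc" to a default spaCy model name."""
    try:
        return DEFAULT_SPACY_MODELS[language]
    except KeyError:
        raise ValueError(f"No default spaCy model for language tag {language!r}") from None


class SpacyProcessSketch:
    """Hypothetical stand-in for CLTK's SpacyProcess; not its real API.

    Exactly one of `nlp` (option a) or `language` (option b) must be given.
    In practice `nlp` would be a spacy.Language, which is callable on a
    string and returns a Doc; any callable works for this sketch.
    """

    def __init__(
        self,
        nlp: Optional[Callable[[str], Any]] = None,
        language: Optional[str] = None,
    ) -> None:
        if (nlp is None) == (language is None):
            raise ValueError("Pass either a custom pipeline or a language tag.")
        self._nlp = nlp
        # Resolve the tag up front so unknown tags fail at construction time.
        self.model_name = (
            resolve_default_model(language) if language is not None else None
        )

    @property
    def nlp(self) -> Callable[[str], Any]:
        if self._nlp is None:
            # Option b): load the default model lazily. Assumes the model
            # package is already installed; downloading it is not shown here.
            import spacy

            self._nlp = spacy.load(self.model_name)
        return self._nlp

    def run(self, text: str) -> Any:
        # Run the pipeline (custom or default) on the input text.
        return self.nlp(text)
```

With a real pipeline, option a) would look like `SpacyProcessSketch(nlp=spacy.load("en_core_web_sm"))`, and option b) like `SpacyProcessSketch(language="grc")`. Resolving the tag in the constructor but loading the model lazily keeps construction cheap while still rejecting unknown tags immediately.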
Perhaps we can discuss which spaCy models we should pick as defaults here as well.
Another suggestion is to add a section to the CLTK documentation that will describe: