-
Notifications
You must be signed in to change notification settings - Fork 4.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
spaCy 3.0 #7869
spaCy 3.0 #7869
Conversation
Milestone: the docker containers build! |
…pacy-3-dot-0-support
I think, as is, this PR is a nice feature. Many more languages will be supported and many more tools can be built on the linguistic features inside of spaCy now. I haven't check support for the new transformer models though. These will require more work because the API is slightly different. We could say "this feature ain't done until those models go in", but I'll leave this up for discussion. Do we need to maybe do a performance check for models? The embeddings have been retrained if I recall correctly. Things to do:
|
@koaning Can we make this not breaking by using the |
I guess we could ... but it gets very tricky. The English models use From my perspective spaCy has made a very sensible change here, as it forces users to be explicit, which is more maintainable. |
@wochinge I'm wondering timeline-wise ... should I merge all of this work on top of the |
As it's feature it should go into
I agree. My issues are just that we shouldn't have this sort of breaking changes in a minor. Couldn't we do
In this case we would at least try? |
@wochinge thanks for pointing out the remaining flaws. Once again, I think the changes are now properly addressed. I didn't resolve any issues myself to make it easy for you to confirm. There's a new issue though, it seems like the doc tests aren't passing. Since the issue revolves around dead links on other parts of the repo, can I safely ignore these? I'm running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nearly there 💯
Co-authored-by: Tobias Wochinger <t.wochinger@rasa.com>
Co-authored-by: Tobias Wochinger <t.wochinger@rasa.com>
Co-authored-by: Tobias Wochinger <t.wochinger@rasa.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good 🚀
@joejuzl is it okay to dismiss your "Request for changes"?
rasa/nlu/utils/spacy_utils.py
Outdated
@@ -9,6 +10,8 @@ | |||
from rasa.shared.nlu.training_data.message import Message | |||
from rasa.nlu.model import InvalidModelError | |||
from rasa.nlu.constants import SPACY_DOCS, DENSE_FEATURIZABLE_ATTRIBUTES | |||
from rasa.shared.utils.io import raise_deprecation_warning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from rasa.shared.utils.io import raise_deprecation_warning | |
rasa.shared.utils.io |
According to our Python code conventions we don't import functions directly but reference them by their module. https://www.notion.so/rasa/Python-Code-Conventions-26835ffd72484dd699f28fa693916d2e#5b63fd5269514270aaf61ab8303ae319
Co-authored-by: Tobias Wochinger <t.wochinger@rasa.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job!
Booya! |
Thanks for this awesome change @koaning ! 💯 |
This PR should fix #7724, fix #6906, fix #5400 and add support for many languages!
This PR will add basic support for spaCy 3.0. I'll list a bunch of things that have changed.
poetry update
which caused a lot of changes inpoetry.lock
spacy link
command has been deprecated. That means that some of the Dockerfiles needed to be rewritten as well as ourMakefile
. This also means that this PR contains breaking changes. You must name your spaCy model explicitly in a pipeline from now on.SpacyNLP
component in the tests now needs a model attached. I've chosen to useen_core_web_md
(for English) andde_core_news_sm
(for German). These are already downloaded beforehand in the CI. Note that this means that loads of tests required an update.