Azimuth for French with language selection in config #239
Cool, this gives us a good idea of the changes required! Some comments.
tests/test_modules/test_dataset_analysis/test_syntax_tagging.py
Hooo yeah! 👍
@gabegma I'll wait for you to read docs (at your leisure - hope you enjoy nonfiction 😄 📖 ) before merging!
Awesome PR! And great work with the documentation. I have a few comments. I've left the regex review to @JosephMarinier ;)
docs/docs/reference/configuration/analyses/behavioral_testing.md
Awesome work!! Love the documentation :)
azimuth/config.py
# This class should remain empty!
pass
# Before adding attributes: Remember that dependence on an attribute in AzimuthConfig will
# cause a module to include all other configs in its scope.
I'd change only one word to be clear that it won't automatically "cause" that, but it will "force" you to set the whole AzimuthConfig as the module's scope.
- # cause a module to include all other configs in its scope.
+ # force the module to include all other configs in its scope.
- # cause a module to include all other configs in its scope.
+ # force a module to include all other configs in its scope.
Sounds good! Just going to use "a" instead of "the."
The reason why I used "the" was to indicate that I am referring to the same module as the one with a "dependence on an attribute in AzimuthConfig". I didn't want the sentence to be understood as "dependence on an attribute [...] will force [another random] module to include all other configs in its scope".
Maybe I should have modified the first line:
Remember that if a module depends on an attribute in AzimuthConfig, it will need to include all other configs in its scope.
Yes, I understand the intention exactly! When I read what I committed, I read it as exactly that, whereas with "the," the pronoun feels surprising. I'm having trouble figuring out how to explain the grammatical reasoning...
Currently:
Before adding attributes: Remember that dependence on an attribute in AzimuthConfig will cause a module to include all other configs in its scope.
It is clear to me that the first phrase ("dependence...AzimuthConfig") belongs to the "a module" mentioned later - that it's a specific module that is the topic (subject?) of the sentence.
With "the":
Before adding attributes: Remember that dependence on an attribute in AzimuthConfig will cause the module to include all other configs in its scope.
...I'm like, wait, what? Which module were we talking about? Were we talking about a module already?
But, all that said, even if I had a sound grammatical argument, it doesn't matter if it is confusing to you (or any other user). So to be super clear, we could change it to what you suggested, with a slight rephrasing in line with your earlier suggestion, and one of mine to be concise:
Reminder: If a module depends on an attribute in AzimuthConfig, the module will be forced to include all other configs in its scope.
What do you think? I'm not opposed to doing a new PR for one comment, all in the name of clear communication. :)
That's perfect! PR: #310
Beautiful! Thank you!
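For anyone reading this thread later, here is a minimal sketch of why the comment in `azimuth/config.py` matters. It assumes that a module's scope is a config class and that the scope determines which attributes the module depends on. The class names (`ProjectSettings`, `SyntaxSettings`, `SimilaritySettings`, `FullConfig`) and the `scope_fields` helper are hypothetical and are not Azimuth's actual classes; `FullConfig` only stands in for `AzimuthConfig`.

```python
from dataclasses import dataclass, fields
from typing import Set


@dataclass
class ProjectSettings:
    # Hypothetical shared settings that every module may need.
    name: str = "demo"


@dataclass
class SyntaxSettings(ProjectSettings):
    # Hypothetical: only syntax-related modules depend on this attribute.
    spacy_model: str = "fr_core_news_md"


@dataclass
class SimilaritySettings(ProjectSettings):
    # Hypothetical: only similarity-related modules depend on this attribute.
    faiss_encoder: str = "some-multilingual-encoder"


@dataclass
class FullConfig(SyntaxSettings, SimilaritySettings):
    # Stand-in for AzimuthConfig: kept empty on purpose.
    pass


def scope_fields(scope: type) -> Set[str]:
    """Return the names of all attributes visible to a module with this scope."""
    return {f.name for f in fields(scope)}


# A module scoped to SyntaxSettings only depends on two attributes.
print(sorted(scope_fields(SyntaxSettings)))  # ['name', 'spacy_model']

# A module scoped to FullConfig depends on every attribute of every config.
print(sorted(scope_fields(FullConfig)))  # ['faiss_encoder', 'name', 'spacy_model']
```

Under these assumptions, an attribute defined directly on the top-level class would only be reachable by widening a module's scope to the whole config, which is the situation the comment warns about.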
Co-authored-by: Joseph Marinier <joseph.marinier@servicenow.com>
Resolve #265
New news
No longer a draft PR; should run acceptably on French. Language-switching from config, updated behavioral tests, and multilingual FAISS encoder now included. Also more consistent reference to language-specific defaults in docs.
Old news (for reference):
Draft PR because:
Description:
New news
This code makes Azimuth work on French data/pipelines, and the language can be selected in the config.
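As a rough illustration of language selection from the config, the sketch below maps a language code to a default spaCy model name and lets an explicit setting override it. The `SupportedLanguage` enum, the `DEFAULT_SPACY_MODELS` table, and `resolve_spacy_model` are hypothetical names, not the ones used in this PR. The French default follows the `md` choice discussed under "Old news"; the English default is an assumption.

```python
from enum import Enum
from typing import Optional


class SupportedLanguage(str, Enum):
    # Hypothetical enum of languages selectable from the config.
    en = "en"
    fr = "fr"


# Assumed defaults; only the French "md" choice comes from this PR's discussion.
DEFAULT_SPACY_MODELS = {
    SupportedLanguage.en: "en_core_web_sm",
    SupportedLanguage.fr: "fr_core_news_md",
}


def resolve_spacy_model(language: str, override: Optional[str] = None) -> str:
    """Pick the spaCy model name: an explicit override wins, else the language default."""
    if override is not None:
        return override
    # SupportedLanguage(...) raises ValueError for an unsupported language code.
    return DEFAULT_SPACY_MODELS[SupportedLanguage(language)]
```

With this shape, a user would only set the language in the config and get language-specific defaults, while still being able to set a specific model explicitly.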
Main changes:
Out of scope things that could improve French Azimuth:
Old news (for reference)
Addressed one aspect of Azimuth running for French models: spaCy part-of-speech tagging. This branch loads a French model (instead of English), and the `missing_xyz` smart tag POS/dependency tag lists have been updated to include French tags.
Several issues came up:
- [...] `j'` was not tagged as a subject) and search from the control panel. I added a new function to replace single quotes with apostrophes and used it both for searching and filtering. It worked reasonably well; out of 370 instances of `j'` in the eval set, 326 were tagged `missing_verb` before the fix, and 22 afterward (most of which started with "j'aimerais").
- [...] `md` spaCy model (instead of `sm`).
- [...] `md` model was a substantial improvement and fixed the majority of the issue. The exceptions seem to all be cases when the verb is capitalized (but not all capitalized verbs have problems; most don't). The capitalization issue seems to be fixed with the `lg` model...but it's like 500 MB...so I'm not sure if that's worth it (and haven't done it yet).
Checklist:
You should check all boxes before the PR is ready. If a box does not apply, check it to acknowledge it.
- [...] ran `pre-commit run --all-files` at the end.
- [...] Run `cd webapp && yarn types` while the back-end is running.
- [...] our users.
- [...] `README` files and our wiki for any big design decisions, if relevant.
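The apostrophe handling described under "Old news" can be sketched roughly as follows. This is an assumption about the approach, not the function from this branch: `normalize_apostrophes`, `APOSTROPHE_VARIANTS`, and `matches` are hypothetical names, and the choice to map typographic variants to the ASCII `'` (rather than the other direction) is also an assumption.

```python
# Hypothetical list of characters treated as apostrophe variants.
APOSTROPHE_VARIANTS = ("\u2019", "\u2018", "\u02bc")


def normalize_apostrophes(text: str) -> str:
    """Replace typographic apostrophe variants with the ASCII apostrophe."""
    for variant in APOSTROPHE_VARIANTS:
        text = text.replace(variant, "'")
    return text


def matches(query: str, utterance: str) -> bool:
    """Case-insensitive substring search after normalizing both sides."""
    return normalize_apostrophes(query).lower() in normalize_apostrophes(utterance).lower()
```

Applying the same normalization to both the search query and the utterance is what lets a query typed as `j'` find utterances written with a curly apostrophe, and the same normalized text could be passed to the tagger before computing smart tags.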