Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

My french_conjugation_transformation #212

Closed

Conversation

Louanes1
Copy link

Faced this issue when using pre-commit :

image

When I went to check the "pre-commit-config.yaml" file, it says that the repo for this hook is local.

image

Since black, flake8 and isort hooks passed, I commited the code with following command :

git commit -m "My french_conjugation_transformation" -n

@AbinayaM02
Copy link
Collaborator

Hi @Louanes1 : Thanks for your contribution. The pre-commit hook failed because your test case didn't pass. Did you try running the pytest for your transformation locally?

@Louanes1
Copy link
Author

Louanes1 commented Aug 20, 2021

Hi @Louanes1 : Thanks for your contribution. The pre-commit hook failed because your test case didn't pass. Did you try running the pytest for your transformation locally?

Thanks for the reply @AbinayaM02 :) Yes when I run the pytest , the test passes :

image

Oh, I think it's because I added a direct link to download the "fr_core_news_lg" in my requirements file, the build in "Checks" fails because of some characters in the link. I'll try to add it differently

@AbinayaM02
Copy link
Collaborator

AbinayaM02 commented Aug 20, 2021

Oh, I think it's because I added a direct link to download the "fr_core_news_lg" in my requirements file, the build in "Checks" fails because of some characters in the link. I'll try to add it differently

Yes, you're right. The build is failing because of that.

Since your test case passed when you ran it separately, ideally pre-commit test hook shouldn't fail. Try running the pre-commit and see if it throws any specific message.

[Edit] Are you using ubuntu or windows while committing the code?

@Louanes1
Copy link
Author

Oh, I think it's because I added a direct link to download the "fr_core_news_lg" in my requirements file, the build in "Checks" fails because of some characters in the link. I'll try to add it differently

Yes, you're right. The build is failing because of that.

Since your test case passed when you ran it separately, ideally pre-commit test hook shouldn't fail. Try running the pre-commit and see if it throws any specific message.

[Edit] Are you using ubuntu or windows while committing the code?

I am using windows, and when I run the pre-commit, the test hook still fails :/
After removing the fr_core_news_lg, the build throws another error :
image

It looks like there is a conflict with the nltk version I use in my requirements. I removed the version so pip will attempt to resolve the dependecy conflict.

@Louanes1
Copy link
Author

Hi @AbinayaM02,
The dependencies installed correctly ! No more conflict :)
However, I use a spacy package called "fr_core_news_lg" and this one needs to be download with following : python -m spacy download fr_core_news_lg.

The installation is supposed to be similar to the en_core_web_sm found in initialize.py. I've added a link to download it in my requirements file (its commented otherwise special characters will fail the build : https://github.com/explosion/spacy-models/releases/download/fr_core_news_lg-3.0.0/fr_core_news_lg-3.0.0-py3-none-any.whl)

I have also tried to donwload the fr_core_news_lg, add it to my folder and add a link to that file in my requirements, but obviously it's too large, git won't let me push.

Do you have any idea on how I am suppose to proceed ? Thanks

@AbinayaM02
Copy link
Collaborator

AbinayaM02 commented Aug 23, 2021

Do you have any idea on how I am suppose to proceed ? Thanks

Try adding your model in the below format in the requirement.txt and uncomment it. (Hopefully, it should work!)
fr_core_news_lg @ https://github.com/explosion/spacy-models/releases/download/fr_core_news_lg-3.0.0/fr_core_news_lg-3.0.0-py3-none-any.whl

@Louanes1
Copy link
Author

Do you have any idea on how I am suppose to proceed ? Thanks

Try adding your model in the below format in the requirement.txt and uncomment it. (Hopefully, it should work!)
fr_core_news_lg @ https://github.com/explosion/spacy-models/releases/download/fr_core_news_lg-3.0.0/fr_core_news_lg-3.0.0-py3-none-any.whl

Yes it works thanks ! @AbinayaM02
I've faced an exit code 137, maybe due to memory the model takes, I've switched fr_core_new_lg to fr_core_news_md. I will try with fr_core_news_sm if the issue still persists

@richplant
Copy link

Seems inefficient to load a second Spacy model that might never get used for more than this single transformation in the initialize script. You should probably import it as a module and call the .load() function directly on it in your script.

You also need to add the appropriate language tags and keywords to your class and a robustness evaluation to the readme (check the evaluate.py script in the main dir).

@mille-s
Copy link
Contributor

mille-s commented Sep 21, 2021

Hi, nice transformation! Do we have an idea of the accuracy of the substitution? Does it fail sometimes, and if so how often?

@Louanes1
Copy link
Author

Hi, nice transformation! Do we have an idea of the accuracy of the substitution? Does it fail sometimes, and if so how often?

Hello @mille-s thank you ! The conjugation of the verbs is quite robust since it is relying on the mlconjug library and their model is trained on the different verb groups

In french, we have 3 different group of verbs

  • 1st group where verbs end with "er" like "manger, écouter"
  • 2nd group where they end with "ir" like "finir, courir"
  • 3rd group where they end with "re" like "boire, croire"

The conjugation of each group is different whatever the tense.

The french model behind mlconjug is trained to predict the conjugation based on these different groups, so that even when a verb do not exist like "facebooker", it will still consider it as a verb of 1st group and conjugate it accordingly.

However in order to conjugate a verb we first need to transform it into its "indicative" form (he ate --> to eat) and send it to the mlconjug function. And we do that with lemmatization, so I guess that if we can not get the lemma of the conjugated verb from the original sentence, we won't be able to conjugate it to a different tense. The spacy lemmatizer is quite good though so I didn't encounter verbs that could trigger these issue.

The common issues I encountered are more linked to the "pronoun" it should be conjugated to (because the conjugation also differs based on the pronoun used, and it is an entry of the mlconjug function). Right now it can handle sentences where the subject is a pronoun defined in our dictionnary. But if the subject is a group of words ( the parentsinstead of they) it won't be able to detect the pronoun, therefore won't know to which person the verb is supposed to be conjugated.

@mille-s
Copy link
Contributor

mille-s commented Sep 21, 2021

@Louanes1 ok thanks for the answer! Note that there should be some kind of filter applied on the candidate input sentences that returns only sentences with one main verb and a subject pronoun before applying your transformation. Do you have such a filter at hand?

@Louanes1
Copy link
Author

Louanes1 commented Oct 3, 2021

Hey @tuetschek , @mille-s

Re isn't is safe to assume that if there is no pronoun a verb is in third person? Then it's a matter of finding out if the verb is singular or plural, which should be doable. This could be a future improvement to reach more verbs.

Actually, this makes sense for verbs in english. In french the spelling is different when the verb is conjugated with each plural pronoun :

I ate --> Je mangeai
You ate --> Tu mangeas
He ate --> Il mangea
We ate --> Nous mangâmes
You ate --> Vous mangeâtes
They ate --> Ils mangèrent

There is quite a different ending depending on the pronoun used, that is why we cannot assume one of them in case we didn't find any.
I am trying to build a classification model where I pass a conjugated verb and I expect the pronoun it is conjugated to, to be returned.
I've gathered around 3000 verbs; I conjugate them to past, futur and present, then I map each one of them with its pronoun. (My dataset have 2 columns : all conjugated verbs, and pronoun) I am still working on it, trying to get the prediction, I will let you know as soon as I find something that could help us solve this issue.
By the way, which classification algorithm, you guys think I should use for this kind of task ?

Meanwhile, I believe it is better to conjugate verbs to the latest detected pronoun, so that it can handle cases where a subject does more than 1 action, many verbs should be conjugated to that pronoun.

@AbinayaM02
Copy link
Collaborator

Hi @Louanes1: Please add your transformation name to the test/mapper.py in the right dictionary for the pytest to pick up your test.json. By default, we're testing only light transformations and filters.

@mille-s
Copy link
Contributor

mille-s commented Oct 4, 2021

@Louanes1 : what I meant was that if there is no pronoun before a verb, this verb will very likely be in third person since it's almost mandatory to have a first or second person pronoun to have a verb conjugated in first or second person (except for the imperative mood, for which the verb is used with no pronouns, but this mood is overall quite unfrequent). So to get the right ending of a verb when there is no pronoun, it boils down to finding the number (third sigular or third plural), which I think can be derived from the original verb form with simple regex in many cases (this needs to be checked with more care). In any case this can be added later as an improvement, no need to do it now.

@AbinayaM02
Copy link
Collaborator

Hi @Louanes1: Please do not rebase the branch. Follow the below steps to add your changes only,

  1. Fetch and merge the main branch of your repository which you have already done on the UI.
    image
  2. Checkout the main branch and pull the main branch on to your local repository.
git checkout main
git pull origin main
  1. Switch to your branch where you're making changes and pull the main branch into that.
git checkout french_conjugation_transformation
git pull origin main
  1. If there are any conflicts, resolve them (mostly there won't be any except for the test/mapper.py where the latest file will have your changes). Add your changes pertaining to french conjugation transformation and commit.
  2. Push your changes to the upstream repository.
git push origin french_conjugation_transformation
  1. Your PR will be automatically updated.

You can either try to fix your current PR or open a new clean PR with only your changes.

@Louanes1
Copy link
Author

Louanes1 commented Oct 5, 2021

I've ended up creating a new PR : #308 :)
@AbinayaM02 @mille-s

@AbinayaM02
Copy link
Collaborator

Closing this PR since a PR is created with the requested changes.

@AbinayaM02 AbinayaM02 closed this Oct 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet