Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement augment.gender #43

Merged
merged 12 commits into from
Dec 5, 2023

Conversation

Alnusjaponica
Copy link
Contributor

Motivation

Implement augment.gender

@Alnusjaponica Alnusjaponica marked this pull request as ready for review November 8, 2023 13:57
@Alnusjaponica Alnusjaponica force-pushed the implement-en-gender branch 2 times, most recently from 5b00faa to cc1ac74 Compare November 8, 2023 14:13
@liwii liwii self-requested a review November 13, 2023 03:28
Copy link
Contributor

@liwii liwii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice!!

Made a few comments, will take another look after the pyright error is resolved.

Copy link
Contributor

@liwii liwii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes!!

I tried running the code and left one comment about nltk download.

I also found a small bug. When there are more than one sentence or the sentence has an extra space in the last, the space is handled weirdly. Could you investigate the reasons?

>>> langcheck.augment.en.gender("Actually she was a fish. Noooo")
['Actually they was a fish . Noooo']
>>> langcheck.augment.en.gender("Actually she was a fish. ")
['Actually they was a fish.']

src/langcheck/augment/en/_gender/_gender.py Show resolved Hide resolved
src/langcheck/augment/en/_gender/_gender.py Show resolved Hide resolved
@liwii
Copy link
Contributor

liwii commented Nov 22, 2023

Tell me once the small bug I mentioned here is also fixed!!

@Alnusjaponica
Copy link
Contributor Author

I looks like nltk library has a bug in detokenize method.
The first bug comes from here and the second one from here. Shall I

  • create issue in nltk's repository,
  • implement detokenize logic by my own in this PR,
  • or create issue here and fix them later?

@liwii
Copy link
Contributor

liwii commented Dec 2, 2023

Thanks for the investigation!!

In that case, it should be a lot of work to fix the problem and I don't think it's worth it.

As you mentioned,

  • create an issue in nltk's repository,
  • and create an issue here in case we want to fix it before nltk's fix?

After these issues are made and the conflicts are resolved, I'll approve this PR!

@Alnusjaponica
Copy link
Contributor Author

@liwii I opened the two issues I remarked.

Copy link
Contributor

@liwii liwii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!!

@Alnusjaponica Alnusjaponica merged commit de0c9fa into citadel-ai:main Dec 5, 2023
2 checks passed
@Alnusjaponica Alnusjaponica deleted the implement-en-gender branch December 5, 2023 06:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants