Plural Stemmer for English #1750

Closed
jgschis opened this issue Nov 1, 2022 · 10 comments

@jgschis

jgschis commented Nov 1, 2022

The Snowball and Porter stemmers are too aggressive for e-commerce. For example, the word "dressing" gets stemmed to "dress", but "dressing" and "dress" are two different concepts and shouldn't be conflated.

I think we need a stemmer that just reduces a plural to its singular form. The OpenSearch project recently added a stemmer that does this:
https://github.com/opensearch-project/OpenSearch/blob/main/modules/analysis-common/src/main/java/org/opensearch/analysis/common/EnglishPluralStemFilter.java

If no one else wants to do this, I will add this to Bleve...
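For readers comparing approaches, here is a minimal Go sketch of what plural-only stemming means in practice. The function name `singularize` and its handful of suffix rules are assumptions made for illustration; it is not a port of OpenSearch's EnglishPluralStemFilter, and it mishandles irregular words such as "series" or "movies". What it does show is the behaviour asked for above: "dresses" becomes "dress", while "dressing" is left alone.

```go
package main

import (
	"fmt"
	"strings"
)

// singularize reduces a few regular English plural forms to the singular.
// The rules are a simplified, hypothetical set chosen for illustration.
func singularize(word string) string {
	w := strings.ToLower(word)
	switch {
	case len(w) <= 3:
		// Leave very short words such as "bus" or "gas" untouched.
		return w
	case strings.HasSuffix(w, "ies"):
		// "batteries" -> "battery"
		return w[:len(w)-3] + "y"
	case strings.HasSuffix(w, "sses"), strings.HasSuffix(w, "shes"),
		strings.HasSuffix(w, "ches"), strings.HasSuffix(w, "xes"):
		// "dresses" -> "dress", "dishes" -> "dish", "boxes" -> "box"
		return w[:len(w)-2]
	case strings.HasSuffix(w, "ss"), strings.HasSuffix(w, "us"),
		strings.HasSuffix(w, "is"):
		// "dress", "cactus" and "analysis" are already singular.
		return w
	case strings.HasSuffix(w, "s"):
		// "shirts" -> "shirt"
		return w[:len(w)-1]
	}
	// Anything else, including "dressing", is returned unchanged.
	return w
}

func main() {
	for _, w := range []string{"dresses", "dressing", "shirts", "batteries", "glass"} {
		fmt.Printf("%s -> %s\n", w, singularize(w))
	}
}
```

Handling irregular words needs more rules than this; the OpenSearch filter linked above is a more complete starting point.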

@abhinavdangeti
Member

@jgschis Thanks for bringing this up.

We'd welcome a contribution from you. If not, we will look into this eventually.

@jgschis
Author

jgschis commented Nov 3, 2022

Yes, I am working on it now.

@jgschis
Author

jgschis commented Nov 3, 2022

I have ported the code to Go. This is the first Go program I have written, so I'm not sure if I've done it right:

https://github.com/jgschis/pluralstem/blob/master/english/english.go

@jgschis
Author

jgschis commented Nov 9, 2022

Hi,
I am now adding an "entry point" to the plural stemmer in Bleve.

Should I

@jgschis
Author

jgschis commented Nov 14, 2022

Feel free to add this to the Bleve project:
https://github.com/jgschis/pluralstem

Once that's done, I will make a pull request to Bleve that adds an entry point to this library.

@jgschis
Author

jgschis commented Dec 23, 2022

Hi,

@abhinavdangeti
What has to happen for my code to become part of Bleve? I will do all the work, but could you please tell me what I have to do? I have already contributed to Couchbase, so I know how to use Gerrit.

@abhinavdangeti
Member

@jgschis Go ahead and raise a pull request adding your code to this project.
There are instructions for it here.

@jgschis
Author

jgschis commented Jan 4, 2023

Hi,

I am not changing Bleve yet. First I need to add my plural stemmer repository to https://github.com/blevesearch. How do I do that?

Once it's added, I will make a pull request to update Bleve so that it calls the code in the pluralstem project.

  1. Add https://github.com/jgschis/pluralstem to https://github.com/blevesearch
  2. Make https://github.com/blevesearch/bleve call https://github.com/jgschis/pluralstem

abhinavdangeti added a commit that referenced this issue Apr 6, 2023
+ This contribution was made by https://github.com/jgschis .
+ This has not been incorporated into the `en` analyzer.
+ The user will however be able to build a custom analyzer
  with the `en` components alongside this.
+ For: #1750
@abhinavdangeti
Member

@jgschis I've ported your code over in this PR: #1808

abhinavdangeti added a commit that referenced this issue Apr 6, 2023
+ This contribution was made by https://github.com/jgschis .
+ This has not been incorporated into the `en` analyzer.
+ The user will however be able to build a custom analyzer
  with the `en` components alongside this.
+ For: #1750
+ Also: https://issues.couchbase.com/browse/MB-56359
abhinavdangeti added a commit that referenced this issue Apr 7, 2023
@abhinavdangeti abhinavdangeti added this to the v2.3.8 milestone Apr 7, 2023
@jgschis
Author

jgschis commented Apr 8, 2023

Thanks

abhinavdangeti added a commit that referenced this issue May 16, 2023