LUCENE-9030: Fix different Solr- and WordnetSynonymParser behaviour #981

Merged: 2 commits, Nov 13, 2019

cbuescher
Contributor

This fixes an issue where sets of equivalent synonyms in the WordNet format are
parsed and added to the SynonymMap in a way that causes the original input
token to be typed as SYNONYM instead of "word". In addition, the original token
does not appear first in the token stream output, as it does for equivalent
Solr-formatted synonym files.

Currently the WordnetSynonymParser adds all combinations of input/output pairs
of a synset entry to the synonym map, while the SolrSynonymParser excludes
those where the input and output terms are the same. This change adds the same
behaviour to WordnetSynonymParser and adds tests showing that the two formats
now produce the same token order and types.
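
To illustrate the difference in pair generation, here is a minimal sketch that does not use Lucene at all. The class `SynsetPairsSketch`, the method `expand`, and the `skipIdentity` flag are hypothetical names made up for this illustration; the actual parsers build a SynonymMap rather than a list of strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the pair generation described above; not Lucene code. */
final class SynsetPairsSketch {

  /**
   * Returns "input->output" rules for one synset of equivalent terms.
   * With skipIdentity = false every combination is emitted, including
   * self-mappings; with skipIdentity = true self-mappings are dropped.
   */
  static List<String> expand(List<String> synset, boolean skipIdentity) {
    List<String> rules = new ArrayList<>();
    for (String input : synset) {
      for (String output : synset) {
        if (skipIdentity && input.equals(output)) {
          continue; // exclude pairs where input and output are the same term
        }
        rules.add(input + "->" + output);
      }
    }
    return rules;
  }

  public static void main(String[] args) {
    List<String> synset = List.of("woods", "wood", "forest");
    System.out.println(expand(synset, false)); // all combinations
    System.out.println(expand(synset, true));  // identity pairs excluded
  }
}
```

For a three-term synset the first call yields nine rules and the second yields six, since the three self-mappings are skipped.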

@romseygeek
Contributor

Could you add a CHANGES.txt entry? All looks good other than that, and precommit passes locally for me.

@cbuescher
Contributor Author

@romseygeek thanks for the review, I updated the CHANGES.txt.

@romseygeek merged commit 3a7b25b into apache:master on Nov 13, 2019
asfgit pushed a commit that referenced this pull request Nov 13, 2019
…981)
