Multi-word synonyms are unaffected by slop settings #5

Closed
nolanlawson opened this issue Jan 31, 2013 · 7 comments

@nolanlawson
Member

Environment: Mac OS X, Solr 4.1.0

Steps to reproduce: Follow the "Getting Started" instructions, then enter the following test data:

URL='http://localhost:8983/solr/update'

curl $URL -H "Content-Type: text/xml" -d '<delete><query>*:*</query></delete>'

curl $URL/json -H 'Content-type:application/json' -d '
[ { 
    "id"   : "1",
    "name" : "dog"
  },
  { 
    "id"   : "2",
    "name" : "pooch"
  },
  { 
    "id"   : "3",
    "name" : "hound"
  },
  { 
    "id"   : "4",
    "name" : "canis familiaris"
  },
  { 
    "id"   : "5",
    "name" : "canis"
  },
  { 
    "id"   : "6",
    "name" : "familiaris"
  },
  { 
    "id"   : "7",
    "name" : "familiaris canis"
  } ]'
curl "$URL/?commit=true"

Then browse to the url:

http://localhost:8983/solr/select/?q=dog&debugQuery=on&qf=text&defType=synonym_edismax&synonyms=true

Expected result: "pooch", "dog", "hound", and "canis familiaris" are matched.

Actual result: In addition to these 4, "familiaris canis" is also matched, because the query parser doesn't construct the multi-word synonym as a phrase query.
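
To make the difference concrete, here is a toy Python model (not Solr code) that compares the two matching modes on the multi-word test documents. It assumes whitespace tokenization and that every synonym term is required, which is what the reported behaviour suggests; the function names are made up for illustration.

```python
# Toy model, not Solr code: compare "all terms, any order" with phrase matching.
docs = {
    "4": "canis familiaris",
    "5": "canis",
    "6": "familiaris",
    "7": "familiaris canis",
}


def matches_all_terms(synonym, text):
    """True if every synonym term occurs somewhere in the text, in any order."""
    tokens = text.split()
    return all(term in tokens for term in synonym.split())


def matches_phrase(synonym, text):
    """True if the synonym terms occur adjacently and in the same order."""
    terms = synonym.split()
    tokens = text.split()
    return any(
        tokens[i:i + len(terms)] == terms
        for i in range(len(tokens) - len(terms) + 1)
    )


synonym = "canis familiaris"
all_terms_hits = sorted(d for d, t in docs.items() if matches_all_terms(synonym, t))
phrase_hits = sorted(d for d, t in docs.items() if matches_phrase(synonym, t))

print("all terms:", all_terms_hits)
print("phrase:   ", phrase_hits)
```

Under these assumptions, document 7 ("familiaris canis") is hit only by the all-terms mode, which mirrors the behaviour described in this report.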

@janhoy
Contributor

janhoy commented Jan 31, 2013

I'd propose that it is constructed as a phrase query, and that you add another config option for phraseSlop, which will then be set for the generated phrases, e.g. "canis familiaris"~5

If you do not specify a phrase slop, all phrases will probably inherit the global "qs" parameter, but I'm not sure about this.

@nolanlawson
Member Author

It seems to me now that this is not a bug so much as missing functionality. It's not actually true that we want canis familiaris to work as a phrase query, because (as you say) there are issues of mm, qs, ps, ps2, and ps3 that take precedence here.

I would be more inclined to say that we should inherit the global settings for each of these parameters, but more on that in a moment...

Investigating the code a bit more, it appears that with the synonyms:

dog,hound,pooch,canis familiaris,man's best friend

... the mm is working as expected. E.g. with mm=100% I get a parsed query of:

+((text:dog) ((((text:canis) (text:familiaris))~2) (text:hound) 
(((text:man's) (text:best) (text:friend))~3) (text:pooch)))

and with mm=67% I get:

+((text:dog) ((((text:canis) (text:familiaris))~1) (text:hound) 
(((text:man's) (text:best) (text:friend))~2) (text:pooch)))

Both of these are correct.
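
The ~N values in the two parsed queries are consistent with the mm percentage being applied to each multi-word synonym's clause count and rounded down. A minimal sketch of that arithmetic, assuming positive percentages only (`min_should_match` is a name made up for this example, not a Solr API):

```python
def min_should_match(clause_count, percent):
    """Simplified model: percentage of the clause count, rounded down."""
    return clause_count * percent // 100


# Clause counts: "canis familiaris" has 2 terms, "man's best friend" has 3.
for percent in (100, 67):
    two = min_should_match(2, percent)
    three = min_should_match(3, percent)
    print(f"mm={percent}%: canis familiaris ~{two}, man's best friend ~{three}")
```

With mm=100% this gives ~2 and ~3, and with mm=67% it gives ~1 and ~2, matching the parsed queries quoted above.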

However, qs, ps, ps2, and ps3 only affect the main query, not the synonyms. I realize it's because I foresaw the problems with mm, but not the others (see this code).

Probably what I ought to be doing is just deferring to the superclass to handle all of this phrase boosting, rather than rewriting everything myself. But in any case, this bug is not as severe as I originally thought.

Really, I think it's just a new feature: Apply qs, ps, ps2, and ps3 to the synonym queries, the same as the main query.

@janhoy
Contributor

janhoy commented Feb 1, 2013

I still see no way of having multi-word synonyms applied as a phrase. The qs parameter only comes into play for explicit phrase queries.

I think some users may want the MM functionality to apply to multi-word synonyms, but others need an explicit phrase. Perhaps a config option, synonyms.constructPhrases=true, that would construct a phrase instead of MM-style clauses and would inherit the "qs" setting for slop.

@nolanlawson
Member Author

Ah, I see what you're saying. I suppose some users would prefer for dog to expand to "canis familiaris" (with the quotes) rather than canis familiaris (without). To me, it was more intuitive that the query stayed exactly the same as the user entered it, but I can see the other side of the argument as well.

Question, though: should this only apply if the user query is a single token and the synonym is multi-token? If the user enters canis familiaris (without quotes), it would be kind of strange for the QParser to generate "man's best friend" with quotes. And if that's the case, does that mean I need to detect how many tokens are in the original user query? It gets kinda tricky.

@janhoy
Contributor

janhoy commented Feb 1, 2013

I think most users enter queries without quotes, and we cannot know whether it is an AND or a PHRASE query in their head. But when we expand a synonym from the dictionary we KNOW that it is a phrase, and should be searched as a phrase in my opinion. Doesn't matter if user enters the original query quoted or not.

But if you have a param to enable/disable phrasing, then those who see a need for MM treatment of phrases from the synonym dictionary can use that, and those who want phrases will get them, possibly with a slop from qs.

@nolanlawson
Member Author

Point well taken. So just to summarize, synonyms.constructPhrases=true will cause all synonyms to be wrapped in a phrase query. I can work this into v1.2.2.
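
As a rough illustration of what the proposed switch would change, the Python sketch below renders one synonym either as separate term clauses or as a quoted phrase with an optional slop. It is not the plugin's implementation: `render_synonym`, its parameters, and the output format are hypothetical, and the slop value simply stands in for whatever would be inherited from qs.

```python
def render_synonym(synonym, field="text", construct_phrases=False, slop=0):
    """Render one synonym as a query-string fragment (hypothetical helper)."""
    terms = synonym.split()
    if len(terms) == 1:
        # Single-token synonyms are the same in both modes.
        return f"{field}:{terms[0]}"
    if construct_phrases:
        joined = " ".join(terms)
        phrase = f'{field}:"{joined}"'
        # Append a slop only when one is given, e.g. inherited from qs.
        return f"{phrase}~{slop}" if slop > 0 else phrase
    # Without phrase construction, emit one clause per term.
    return "(" + " ".join(f"{field}:{t}" for t in terms) + ")"


for name in ("dog", "canis familiaris", "man's best friend"):
    print(render_synonym(name))
    print(render_synonym(name, construct_phrases=True, slop=5))
```

The single-token case is untouched by the flag, so only multi-word dictionary entries change shape.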

@oweneustice
Contributor

👍

3 participants