BiG-CZ: Always conjunct CINERGI query items #2262
In a test area, the searches "water soil", "water and soil", and "water or soil" all returned the same number of results. Looking at the API requests, all of those searches are converted to "water and soil". Do we want to prevent a user from doing an OR-based search?
Hmm, my impression was that AND was significantly more valuable than OR, but perhaps we should support explicit ORs. I'll transform the query into sum-of-products form (an OR of AND-joined groups of terms), which should fix it. Fixup coming soon.
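A sum-of-products query is an OR of AND-groups. A minimal sketch of that shape, assuming the query is plain whitespace-separated words and that "and"/"or" (any case) are the only operators. The helper names `to_sum_of_products` and `render` are hypothetical and not taken from this PR:

```python
from typing import List


def to_sum_of_products(query: str) -> List[List[str]]:
    """Hypothetical helper: parse a free-text query into AND-groups.

    "or" (any case) starts a new group; "and" is dropped because
    adjacent terms inside a group are ANDed anyway.
    """
    groups: List[List[str]] = [[]]
    for word in query.split():
        lowered = word.lower()
        if lowered == 'or':
            groups.append([])  # start a new product term
        elif lowered != 'and':
            groups[-1].append(word)
    # Drop empty groups left by a leading, trailing, or doubled "or".
    return [g for g in groups if g]


def render(groups: List[List[str]]) -> str:
    """Hypothetical helper: AND terms within a group, OR the groups."""
    return ' OR '.join(' AND '.join(g) for g in groups)


print(render(to_sum_of_products('water soil')))        # water AND soil
print(render(to_sum_of_products('water and soil')))    # water AND soil
print(render(to_sum_of_products('water or soil')))     # water OR soil
print(render(to_sum_of_products('new york or soil')))  # new AND york OR soil
```

The rendered string has no parentheses, so it relies on the search backend binding AND more tightly than OR; if that isn't guaranteed, wrapping each group in parentheses would make the grouping explicit.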
With the latest commit 0135b0c:

    In [23]: prepare_query('new york')
    Out[23]: 'new AND york'
    In [24]: prepare_query('new and york')
    Out[24]: 'new AND york'
    In [25]: prepare_query('new or york')
    Out[25]: 'new OR york'
    factors = re.split(' or ', query, flags=re.IGNORECASE)

    return ' OR '.join(map(intersperse_and, factors))
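The diff calls an `intersperse_and` helper whose body isn't quoted in this thread. One plausible shape, assumed rather than taken from the PR, drops standalone "and" words and joins the remaining words with AND. Paired with the split shown in the diff, it reproduces the IPython outputs from the earlier comment:

```python
import re


def intersperse_and(phrase: str) -> str:
    """Assumed body for the helper named in the diff: 'new and york' -> 'new AND york'."""
    # Keep every word except a standalone "and" (case-insensitive).
    words = [w for w in phrase.split() if w.lower() != 'and']
    return ' AND '.join(words)


def prepare_query(query: str) -> str:
    # Split into OR-separated phrases, mirroring the reviewed diff.
    factors = re.split(' or ', query, flags=re.IGNORECASE)
    # AND the words inside each phrase, then OR the phrases together.
    return ' OR '.join(map(intersperse_and, factors))


print(prepare_query('new york'))      # new AND york
print(prepare_query('new and york'))  # new AND york
print(prepare_query('new or york'))   # new OR york
```

Because the split pattern needs a space on both sides, a trailing "or" (as in "water or") is not treated as an operator and ends up as a search term ("water AND or").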
Earlier I tried writing it as:
    return ' OR '.join([' AND '.join(w for w in f.split()
                                     if not re.match('and', w,
                                                     flags=re.IGNORECASE))
                        for f in factors])
but thought it was less clear. Any suggestions for improving readability are welcome.
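One thing worth checking in that one-liner: `re.match('and', w)` anchors only at the start of the string, so it also matches words that merely begin with "and" (for example "Andes" or "andesite") and would silently drop them from the query. A small demonstration with made-up words, comparing it against whole-word checks:

```python
import re

words = ['Andes', 'and', 'AND', 'andesite', 'soil']

# re.match anchors only at the start, so any word beginning with "and" matches.
prefix_hits = [w for w in words if re.match('and', w, flags=re.IGNORECASE)]

# re.fullmatch (or a plain equality test) requires the whole word to be "and".
whole_hits = [w for w in words if re.fullmatch('and', w, flags=re.IGNORECASE)]
equality_hits = [w for w in words if w.lower() == 'and']

print(prefix_hits)    # ['Andes', 'and', 'AND', 'andesite']
print(whole_hits)     # ['and', 'AND']
print(equality_hits)  # ['and', 'AND']
```

The equality test is arguably the most readable of the three and needs no regex at all.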
Maybe if I replace `factors` with `phrases`, it'll be a little nicer.
This is a small improvement, but if the API is not case-sensitive, you could convert everything to lowercase (or uppercase) beforehand, which would let you drop `flags=re.IGNORECASE`.
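Lowercasing once up front lets a plain `str.split` replace the case-insensitive regex split. A quick comparison on a made-up mixed-case query:

```python
import re

query = 'Water OR Soil or Rock'

# Current approach: case-insensitive regex split; the terms keep their case.
with_regex = re.split(' or ', query, flags=re.IGNORECASE)

# Suggested approach: lowercase first, then a plain str.split on ' or '.
lowered = query.lower().split(' or ')

print(with_regex)  # ['Water', 'Soil', 'Rock']
print(lowered)     # ['water', 'soil', 'rock']
```

The two results differ only in the case of the search terms themselves, which is harmless only when the API ignores case.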
I think the more verbose way, like what you have now, is clearer and more readable than the condensed version. Consider adding a quick comment before each statement.
Just checked: the API is not case-sensitive. I'll replace the `re` stuff, which should make it easier to read, and will add comments if it's still not clear enough. Thanks!
Taking another look.
Okay, this should be ready for another look now. It's functionally the same, but the code is a little neater and has more comments.
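The revised code isn't reproduced in this thread. As an illustration only, a regex-free version along the lines discussed (lowercase first, `phrases` instead of `factors`, a comment before each statement) might look like the following. It is a hypothetical sketch, not the merged code:

```python
def prepare_query(query: str) -> str:
    """Hypothetical sketch: turn free text into an OR of AND-joined phrases."""
    # The API is case-insensitive, so normalise once instead of using re flags.
    lowered = query.lower()

    # Each " or " separates one phrase (a product term) from the next.
    phrases = lowered.split(' or ')

    # Within a phrase, drop explicit "and" words; adjacent terms are ANDed.
    anded = [' AND '.join(w for w in phrase.split() if w != 'and')
             for phrase in phrases]

    # Join the phrases with OR to get sum-of-products form.
    return ' OR '.join(anded)


print(prepare_query('New York'))        # new AND york
print(prepare_query('water and soil'))  # water AND soil
print(prepare_query('water OR soil'))   # water OR soil
```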
Looks good. "water or soil" now returns different results from "water and soil" and "water soil". Nice job.
Thanks for the feedback! Going to squash and merge.
2bc313d to df3aa2f
Overview
By default, the CINERGI query items would be ORd, thus expanding the search as more terms were specified instead of narrowing it down. By always ANDing the terms, we ensure that the more information a user provides, the narrower the search becomes.
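To make the narrowing claim concrete: if each term matches a set of records, ANDing terms takes the intersection of those sets and ORing takes the union. Adding a term can therefore only shrink (or keep) an AND result and only grow (or keep) an OR result. A toy illustration with a hypothetical index and made-up record IDs, not real CINERGI data:

```python
from functools import reduce

# Hypothetical index: which record IDs mention each term.
index = {
    'water': {1, 2, 3, 4, 5},
    'soil': {2, 3, 6},
    'nitrogen': {3, 6, 7},
}


def search_and(terms):
    """Records matching every term (set intersection)."""
    return reduce(set.intersection, (index[t] for t in terms))


def search_or(terms):
    """Records matching any term (set union)."""
    return reduce(set.union, (index[t] for t in terms))


for terms in (['water'], ['water', 'soil'], ['water', 'soil', 'nitrogen']):
    print(terms, len(search_and(terms)), len(search_or(terms)))
# prints:
# ['water'] 5 5
# ['water', 'soil'] 2 6
# ['water', 'soil', 'nitrogen'] 1 7
```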
Connects #2228
Demo
Testing Instructions