Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BiG-CZ: Always conjunct CINERGI query items #2262

Merged
merged 1 commit into from
Sep 18, 2017
Merged

Conversation

rajadain
Copy link
Member

Overview

By default, the CINERGI query items would be ORd, thus expanding the search as more terms were specified instead of narrowing it down. By always ANDing the terms, we ensure that the more information a user provides, the narrower the search becomes.

Connects #2228

Demo

2017-09-15 17 57 30

Testing Instructions

  • Check out this branch, and go to :8000/?bigcz
  • Search for something on the CINERGI tab
  • Search for more things. Ensure that as the query gets longer, the results get fewer.

@caseycesari
Copy link
Member

In a test area, the searches "water soil", "water and soil", and "water or soil" all returned the same number of results. Looking at the API requests, all of those searches are converted to "water and soil". Do we want to prevent a user from doing an "OR" based search?

@rajadain
Copy link
Member Author

Hmm, my impression was that AND was significantly more valuable than OR, but perhaps we should support explicit ORs. I'll transform the query into a sum of products format and that should fix it. Fixup coming soon.

@rajadain
Copy link
Member Author

With the latest commit 0135b0c:

In [23]: prepare_query('new york')
Out[23]: 'new AND york'

In [24]: prepare_query('new and york')
Out[24]: 'new AND york'

In [25]: prepare_query('new or york')
Out[25]: 'new OR york'


factors = re.split(' or ', query, flags=re.IGNORECASE)

return ' OR '.join(map(intersperse_and, factors))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Earlier I tried writing it as:

return ' OR '.join([' AND '.join(w for w in f.split()
                                 if not re.match('and', w,
                                                 flags=re.IGNORECASE))
                    for f in factors])

but thought it less clear. Any suggestions for improving readability welcome.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe if I replace factors with phrases it'll be a little nicer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a small improvement, but if the API is not case sensitive, you could convert everything to lower/uppercase beforehand which would let you drop flags=re.IGNORECASE.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the more verbose way, like what you have now, is more clear and readable than the condensed version. Consider adding a quick comment before each statement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checked, the API is not case sensitive. Will replace the re stuff and that should help make it easier to read. Will add comments if it's still not clear enough. Thanks

@caseycesari
Copy link
Member

Taking another look.

@rajadain
Copy link
Member Author

Okay, this should be ready for another look now. Functionally the same, the code looks a little neater, and has more comments.

Copy link
Member

@caseycesari caseycesari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Getting different results between "water and soil" and "water soil", and "water or soil". Nice job.

@caseycesari caseycesari assigned rajadain and unassigned caseycesari Sep 18, 2017
@rajadain
Copy link
Member Author

Thanks for the feedback! Going to squash and merge.

By default, the CINERGI query items would be ORd, thus expanding
the search as more terms were specified instead of narrowing it
down. By always ANDing the terms, we ensure that the more
information a user provides, the narrower the search becomes.
@rajadain rajadain merged commit 25f06ed into develop Sep 18, 2017
@rajadain rajadain deleted the tt/bigcz-cinergi-and branch September 18, 2017 17:24
@rajadain rajadain mentioned this pull request Oct 16, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants