-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Correctly build solr request parameters avoiding lists #24
Correctly build solr request parameters avoiding lists #24
Conversation
- The previous method was creating lists inside the dictionary values such as: ```{'__bigquerySource': [u'Library: bla'], '__qid': [u'9297b2910ea5d9f2fff6fcb042d4edbc'], 'fl': u'title,abstract,comment,bibcode,author,keyword,id,citation_count,[citations],pub,aff,volume,pubdate,doi,pub_raw,page,links_data,property', 'fq': [u'{!bitset}'], 'hl': [u'true'], 'hl.fl': [u'title,abstract,body,ack'], 'hl.maxAnalyzedChars': [u'150000'], 'hl.q': [u'*:*'], 'hl.requireFieldMatch': [u'true'], 'hl.usePhraseHighlighter': [u'true'], 'q': [u'*:*'], 'rows': [u'25'], 'sort': [u'date desc, bibcode desc'], 'start': [u'0'], 'wt': 'json'} ``` And solr was not correctly interpreting the rows parameter, returning as many results as bibcodes were provided in the request data.
solr/views.py
Outdated
if 'rows' in payload and int(payload['rows'][0]) > max_rows: | ||
|
||
# Ensure all values are strings and not lists | ||
for key, value in payload.iteritems(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a problem with this approach is 'fq' - solr can accept multiple filters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code will convert:
{
'fq': [u'{!bitset}'],
'rows': [u'25'],
'sort': [u'date desc, bibcode desc'],
'start': [u'0'],
'wt': 'json'
}
into:
{
'fq': u'{!bitset}',
'rows': u'25',
'sort': u'date desc, bibcode desc',
'start': u'0',
'wt': 'json'
}
In case there are multiple fq, the payload will look like this:
{
'fq': [u'{!bitset}', u'author', u'title'],
'rows': [u'25'],
'sort': [u'date desc, bibcode desc'],
'start': [u'0'],
'wt': 'json'
}
and it will be converted to:
{
'fq': u'{!bitset},author,title',
'rows': u'25',
'sort': u'date desc, bibcode desc',
'start': u'0',
'wt': 'json'
}
Am I misunderstanding your comment? It does not seem to be a problem with multiple fq. Notice also that when the clean function is called:
self.cleanup_solr_request(dict(request.args)...)
The dict conversion of the argument converts multiple fq (e.g., "URL?fq=author&fq=title") into a single key (i.e. 'fq') with a list of values (i.e. ["author", "title"]).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fq needs to be a list of values, note that a single 'fq' conditin can contain a comma - so:
payload['fq'] = ','.join(['pos(1,author:foo)', 'title'])
will not give you the right data when you do payload['fq'].split(',')
-- and if we did that (I didn't look), it would be at least weird - to join array into a string, in order to create a list out of the string in the end...
solr/views.py
Outdated
elif len(filter(lambda x: '!bitset' in x, query['fq'])) == 0: | ||
query['fq'].append(u'{!bitset}') | ||
query['fq'] = '{!bitset}' | ||
elif '{!bitset}' not in query['fq']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this wouldn't match (a syntactically correct) '{ !bitset }' or `{!bitset compression=gzip}'
solr/views.py
Outdated
query['fq'].append(u'{!bitset}') | ||
query['fq'] = '{!bitset}' | ||
elif '{!bitset}' not in query['fq']: | ||
query['fq'] = ",".join(query['fq'].split(",") + [u'{!bitset}']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm slightly confused by this; {!....}
is usually used as a prefix; in the case of a bigquery it stands alone fq={!bitset}
but I think this is not going to work: title:redshift,{!bitset}
at least that would be a big surprise (for me) if it did :)
@marblestation not sure how the requests serializes the parameters, but when i try multiple row params, solr takes the first one: |
- Apart from '{!bibset}', recognise also '{ !bitset }' or `{!bitset compression=gzip}' as valid
43729cf changes will allow multiple rows parameters and select the first one as you mentioned that solr behaves. |
btw: i mentioned the settings in requests, i was confusing requests and urrlib https://docs.python.org/2/library/urllib.html#urllib.urlencode and requests seems to be already doing it: https://github.com/requests/requests/blob/master/requests/models.py#L105 |
I understood the first issue you pointed out with fq (thanks!), I changed things to ensure it is always a list and treated as so. |
After discussing in person, the strategy to convert everything into lists does not seem reasonable. It's better to discard this route and start over. |
And solr was not correctly interpreting the rows parameter, returning as many results as bibcodes were provided in the request data.