Correctly build solr request parameters avoiding lists #24

marblestation · 2018-03-02T16:57:32Z

The previous method was creating lists inside the dictionary values such as:

 '__qid': [u'9297b2910ea5d9f2fff6fcb042d4edbc'],
 'fl': u'title,abstract,comment,bibcode,author,keyword,id,citation_count,[citations],pub,aff,volume,pubdate,doi,pub_raw,page,links_data,property',
 'fq': [u'{!bitset}'],
 'hl': [u'true'],
 'hl.fl': [u'title,abstract,body,ack'],
 'hl.maxAnalyzedChars': [u'150000'],
 'hl.q': [u'*:*'],
 'hl.requireFieldMatch': [u'true'],
 'hl.usePhraseHighlighter': [u'true'],
 'q': [u'*:*'],
 'rows': [u'25'],
 'sort': [u'date desc, bibcode desc'],
 'start': [u'0'],
 'wt': 'json'}

And solr was not correctly interpreting the rows parameter, returning as many results as bibcodes were provided in the request data.

- The previous method was creating lists inside the dictionary values such as: ```{'__bigquerySource': [u'Library: bla'], '__qid': [u'9297b2910ea5d9f2fff6fcb042d4edbc'], 'fl': u'title,abstract,comment,bibcode,author,keyword,id,citation_count,[citations],pub,aff,volume,pubdate,doi,pub_raw,page,links_data,property', 'fq': [u'{!bitset}'], 'hl': [u'true'], 'hl.fl': [u'title,abstract,body,ack'], 'hl.maxAnalyzedChars': [u'150000'], 'hl.q': [u'*:*'], 'hl.requireFieldMatch': [u'true'], 'hl.usePhraseHighlighter': [u'true'], 'q': [u'*:*'], 'rows': [u'25'], 'sort': [u'date desc, bibcode desc'], 'start': [u'0'], 'wt': 'json'} ``` And solr was not correctly interpreting the rows parameter, returning as many results as bibcodes were provided in the request data.

coveralls · 2018-03-02T17:01:06Z

Coverage decreased (-0.2%) to 96.41% when pulling abad5dc on marblestation:bibquery_rows_param_not_in_array2 into fc01766 on adsabs:master.

romanchyla · 2018-03-02T17:51:57Z

solr/views.py

-        if 'rows' in payload and int(payload['rows'][0]) > max_rows:
+
+        # Ensure all values are strings and not lists
+        for key, value in payload.iteritems():


a problem with this approach is 'fq' - solr can accept multiple filters

The code will convert:

{ 'fq': [u'{!bitset}'], 'rows': [u'25'], 'sort': [u'date desc, bibcode desc'], 'start': [u'0'], 'wt': 'json' }

into:

{ 'fq': u'{!bitset}', 'rows': u'25', 'sort': u'date desc, bibcode desc', 'start': u'0', 'wt': 'json' }

In case there are multiple fq, the payload will look like this:

{ 'fq': [u'{!bitset}', u'author', u'title'], 'rows': [u'25'], 'sort': [u'date desc, bibcode desc'], 'start': [u'0'], 'wt': 'json' }

and it will be converted to:

{ 'fq': u'{!bitset},author,title', 'rows': u'25', 'sort': u'date desc, bibcode desc', 'start': u'0', 'wt': 'json' }

Am I misunderstanding your comment? It does not seem to be a problem with multiple fq. Notice also that when the clean function is called:

self.cleanup_solr_request(dict(request.args)...)

The dict conversion of the argument converts multiple fq (e.g., "URL?fq=author&fq=title") into a single key (i.e. 'fq') with a list of values (i.e. ["author", "title"]).

fq needs to be a list of values, note that a single 'fq' conditin can contain a comma - so:

payload['fq'] = ','.join(['pos(1,author:foo)', 'title'])

will not give you the right data when you do payload['fq'].split(',') -- and if we did that (I didn't look), it would be at least weird - to join array into a string, in order to create a list out of the string in the end...

romanchyla · 2018-03-02T17:52:46Z

solr/views.py

-        elif len(filter(lambda x: '!bitset' in x, query['fq'])) == 0:
-            query['fq'].append(u'{!bitset}')
+            query['fq'] = '{!bitset}'
+        elif '{!bitset}' not in query['fq']:


this wouldn't match (a syntactically correct) '{ !bitset }' or `{!bitset compression=gzip}'

romanchyla · 2018-03-02T17:56:28Z

solr/views.py

-            query['fq'].append(u'{!bitset}')
+            query['fq'] = '{!bitset}'
+        elif '{!bitset}' not in query['fq']:
+            query['fq'] = ",".join(query['fq'].split(",") + [u'{!bitset}'])


i'm slightly confused by this; {!....} is usually used as a prefix; in the case of a bigquery it stands alone fq={!bitset} but I think this is not going to work: title:redshift,{!bitset}

at least that would be a big surprise (for me) if it did :)

romanchyla · 2018-03-02T18:03:24Z

@marblestation not sure how the requests serializes the parameters, but when i try multiple row params, solr takes the first one: http://localhost:9984/solr/collection1/select?indent=on&q=*:*&rows=9&wt=json&rows=3&fl=bibcode will return 9 results; i think there is a setting in the requests to tell it to treat single item arrays as normal strings. that would be preferable

- Apart from '{!bibset}', recognise also '{ !bitset }' or `{!bitset compression=gzip}' as valid

marblestation · 2018-03-05T19:40:57Z

43729cf changes will allow multiple rows parameters and select the first one as you mentioned that solr behaves.

romanchyla · 2018-03-05T21:14:11Z

btw: i mentioned the settings in requests, i was confusing requests and urrlib

https://docs.python.org/2/library/urllib.html#urllib.urlencode

and requests seems to be already doing it: https://github.com/requests/requests/blob/master/requests/models.py#L105

marblestation · 2018-03-05T22:15:01Z

I understood the first issue you pointed out with fq (thanks!), I changed things to ensure it is always a list and treated as so.

marblestation · 2018-03-07T14:58:24Z

After discussing in person, the strategy to convert everything into lists does not seem reasonable. It's better to discard this route and start over.

romanchyla suggested changes Mar 2, 2018

View reviewed changes

marblestation added 2 commits March 5, 2018 20:24

Recognise accepted variations of !bibset

22f9f56

- Apart from '{!bibset}', recognise also '{ !bitset }' or `{!bitset compression=gzip}' as valid

Accept multiple row params and select the first one

43729cf

Expanded unittests

b26db44

Ensure 'fq' param is always a list

abad5dc

marblestation closed this Mar 7, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Correctly build solr request parameters avoiding lists #24

Correctly build solr request parameters avoiding lists #24

marblestation commented Mar 2, 2018

coveralls commented Mar 2, 2018 •

edited

Loading

romanchyla Mar 2, 2018

marblestation Mar 5, 2018

romanchyla Mar 5, 2018

romanchyla Mar 2, 2018

romanchyla Mar 2, 2018

romanchyla commented Mar 2, 2018

marblestation commented Mar 5, 2018

romanchyla commented Mar 5, 2018

marblestation commented Mar 5, 2018

marblestation commented Mar 7, 2018

Correctly build solr request parameters avoiding lists #24

Correctly build solr request parameters avoiding lists #24

Conversation

marblestation commented Mar 2, 2018

coveralls commented Mar 2, 2018 • edited Loading

romanchyla Mar 2, 2018

Choose a reason for hiding this comment

marblestation Mar 5, 2018

Choose a reason for hiding this comment

romanchyla Mar 5, 2018

Choose a reason for hiding this comment

romanchyla Mar 2, 2018

Choose a reason for hiding this comment

romanchyla Mar 2, 2018

Choose a reason for hiding this comment

romanchyla commented Mar 2, 2018

marblestation commented Mar 5, 2018

romanchyla commented Mar 5, 2018

marblestation commented Mar 5, 2018

marblestation commented Mar 7, 2018

coveralls commented Mar 2, 2018 •

edited

Loading