Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly build solr request parameters avoiding lists #24

Conversation

marblestation
Copy link
Contributor

  • The previous method was creating lists inside the dictionary values such as:
 '__qid': [u'9297b2910ea5d9f2fff6fcb042d4edbc'],
 'fl': u'title,abstract,comment,bibcode,author,keyword,id,citation_count,[citations],pub,aff,volume,pubdate,doi,pub_raw,page,links_data,property',
 'fq': [u'{!bitset}'],
 'hl': [u'true'],
 'hl.fl': [u'title,abstract,body,ack'],
 'hl.maxAnalyzedChars': [u'150000'],
 'hl.q': [u'*:*'],
 'hl.requireFieldMatch': [u'true'],
 'hl.usePhraseHighlighter': [u'true'],
 'q': [u'*:*'],
 'rows': [u'25'],
 'sort': [u'date desc, bibcode desc'],
 'start': [u'0'],
 'wt': 'json'}

And solr was not correctly interpreting the rows parameter, returning as many results as bibcodes were provided in the request data.

- The previous method was creating lists inside the dictionary values such as:

```{'__bigquerySource': [u'Library: bla'],
 '__qid': [u'9297b2910ea5d9f2fff6fcb042d4edbc'],
 'fl': u'title,abstract,comment,bibcode,author,keyword,id,citation_count,[citations],pub,aff,volume,pubdate,doi,pub_raw,page,links_data,property',
 'fq': [u'{!bitset}'],
 'hl': [u'true'],
 'hl.fl': [u'title,abstract,body,ack'],
 'hl.maxAnalyzedChars': [u'150000'],
 'hl.q': [u'*:*'],
 'hl.requireFieldMatch': [u'true'],
 'hl.usePhraseHighlighter': [u'true'],
 'q': [u'*:*'],
 'rows': [u'25'],
 'sort': [u'date desc, bibcode desc'],
 'start': [u'0'],
 'wt': 'json'}
```

And solr was not correctly interpreting the rows parameter, returning as many results as bibcodes were provided in the request data.
@coveralls
Copy link

coveralls commented Mar 2, 2018

Coverage Status

Coverage decreased (-0.2%) to 96.41% when pulling abad5dc on marblestation:bibquery_rows_param_not_in_array2 into fc01766 on adsabs:master.

solr/views.py Outdated
if 'rows' in payload and int(payload['rows'][0]) > max_rows:

# Ensure all values are strings and not lists
for key, value in payload.iteritems():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a problem with this approach is 'fq' - solr can accept multiple filters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code will convert:

{
 'fq': [u'{!bitset}'],
 'rows': [u'25'],
 'sort': [u'date desc, bibcode desc'],
 'start': [u'0'],
 'wt': 'json'
}

into:

{
 'fq': u'{!bitset}',
 'rows': u'25',
 'sort': u'date desc, bibcode desc',
 'start': u'0',
 'wt': 'json'
}

In case there are multiple fq, the payload will look like this:

{
 'fq': [u'{!bitset}', u'author', u'title'],
 'rows': [u'25'],
 'sort': [u'date desc, bibcode desc'],
 'start': [u'0'],
 'wt': 'json'
}

and it will be converted to:

{
 'fq': u'{!bitset},author,title',
 'rows': u'25',
 'sort': u'date desc, bibcode desc',
 'start': u'0',
 'wt': 'json'
}

Am I misunderstanding your comment? It does not seem to be a problem with multiple fq. Notice also that when the clean function is called:

self.cleanup_solr_request(dict(request.args)...)

The dict conversion of the argument converts multiple fq (e.g., "URL?fq=author&fq=title") into a single key (i.e. 'fq') with a list of values (i.e. ["author", "title"]).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fq needs to be a list of values, note that a single 'fq' conditin can contain a comma - so:

payload['fq'] = ','.join(['pos(1,author:foo)', 'title'])

will not give you the right data when you do payload['fq'].split(',') -- and if we did that (I didn't look), it would be at least weird - to join array into a string, in order to create a list out of the string in the end...

solr/views.py Outdated
elif len(filter(lambda x: '!bitset' in x, query['fq'])) == 0:
query['fq'].append(u'{!bitset}')
query['fq'] = '{!bitset}'
elif '{!bitset}' not in query['fq']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this wouldn't match (a syntactically correct) '{ !bitset }' or `{!bitset compression=gzip}'

solr/views.py Outdated
query['fq'].append(u'{!bitset}')
query['fq'] = '{!bitset}'
elif '{!bitset}' not in query['fq']:
query['fq'] = ",".join(query['fq'].split(",") + [u'{!bitset}'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm slightly confused by this; {!....} is usually used as a prefix; in the case of a bigquery it stands alone fq={!bitset} but I think this is not going to work: title:redshift,{!bitset}

at least that would be a big surprise (for me) if it did :)

@romanchyla
Copy link
Contributor

@marblestation not sure how the requests serializes the parameters, but when i try multiple row params, solr takes the first one: http://localhost:9984/solr/collection1/select?indent=on&q=*:*&rows=9&wt=json&rows=3&fl=bibcode will return 9 results; i think there is a setting in the requests to tell it to treat single item arrays as normal strings. that would be preferable

- Apart from '{!bibset}', recognise also '{ !bitset }' or `{!bitset compression=gzip}' as valid
@marblestation
Copy link
Contributor Author

43729cf changes will allow multiple rows parameters and select the first one as you mentioned that solr behaves.

@romanchyla
Copy link
Contributor

btw: i mentioned the settings in requests, i was confusing requests and urrlib

https://docs.python.org/2/library/urllib.html#urllib.urlencode

and requests seems to be already doing it: https://github.com/requests/requests/blob/master/requests/models.py#L105

@marblestation
Copy link
Contributor Author

I understood the first issue you pointed out with fq (thanks!), I changed things to ensure it is always a list and treated as so.

@marblestation
Copy link
Contributor Author

After discussing in person, the strategy to convert everything into lists does not seem reasonable. It's better to discard this route and start over.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants