Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accept multiple values per field in SearchService #2913

Merged
merged 2 commits into from
Dec 17, 2018

Conversation

jeanleonov
Copy link
Contributor

The PR contains two commits: the first is related to ability to debug the problem when the second is about actual fix.
solr_interface was building a list of fields to update in index schema by just looping through document fields and ignoring fields which are already in schema. But it didn't take in account that document can contain multiple values for the field, so it was sending duplicated fields in update_schema request.
https://ocd.appscale.com:8080/job/Daily%20Build/5623

@jeanleonov
Copy link
Contributor Author

I have to say that it actually doesn't bring multiValued functionality to SearchService.
It just allows user to index documents with multi-valued fields with no errors. The only one of multiple will be queryable.
Full multiValue field functionality will be possibly implemented before next release. Though it's much more complex task.

@jeanleonov
Copy link
Contributor Author

@jeanleonov
Copy link
Contributor Author

Demo

Prerequisite

Using remote_api_shell a list of fields with the same name and type are created (multivalued field).

Before

remote_api_shell

In [165]: index.put(search.Document(doc_id='doc2', fields=fields))
... stacktrace ...
PutError: one or more put document operations failed

search log

2018-12-06 10:59:12,854 ERROR search_api.py:117 Exception raised while indexing document 
2018-12-06 10:59:12,854 ERROR search_api.py:118 HTTP Error 400: Bad Request 
Traceback (most recent call last):
  File "/root/appscale/SearchService/search_api.py", line 114, in index_document
    self.solr_conn.update_document(request.app_id(), doc, index_spec)
  File "/root/appscale/SearchService/solr_interface.py", line 239, in update_document
    self.update_schema(updates)
  File "/root/appscale/SearchService/solr_interface.py", line 143, in update_schema
    conn = urllib2.urlopen(req)
  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 400: Bad Request

After

In [166]: index.put(search.Document(doc_id='doc2', fields=fields))
Out[166]: [search.PutResult(code='OK', id=u'doc2')]

Note

This PR doesn't actually make multivalued fields work. It just makes it possible to avoid error when document is indexed.

In [170]: index.search('one')
Out[170]: search.SearchResults(number_found=0L)

In [171]: index.search('two')
Out[171]: search.SearchResults(number_found=0L)

In [172]: index.search('three')
Out[172]: search.SearchResults(results=[search.ScoredDocument(doc_id=u'doc2', fields=[search.AtomField(name=u'multival', value=u'three')], language=u'en', rank=0), search.ScoredDocument(doc_id=u'doc2', fields=[search.AtomField(name=u'multival', value=u'three')], language=u'en', rank=0)], number_found=2L)

So the only latest value of the field is indexed and stored.

As Search API is still experimental and we're going to implement multivalued fields support I would vote for merging it anyway as a first step in that direction.

@scragraham scragraham added this to the 3.7.0 milestone Dec 6, 2018
@sjones4 sjones4 self-requested a review December 12, 2018 20:00
@jeanleonov
Copy link
Contributor Author

@scragraham scragraham merged commit b3ba67f into AppScale:master Dec 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants