Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solr6 template upgrade #1504

Merged
merged 55 commits into from
May 24, 2017
Merged
Show file tree
Hide file tree
Changes from 29 commits
Commits
Show all changes
55 commits
Select commit Hold shift + click to select a range
2955843
Fix django template context passing
RabidCicada Apr 6, 2017
a8d79ff
Fix to tests to run with context dicts instead of context objects for…
RabidCicada Apr 6, 2017
c4f73d6
Update solr template to be solr6 compatible
RabidCicada Apr 8, 2017
eaaf3a2
Reworked start-solr-test-server to work with modern solr. Reworked s…
RabidCicada Apr 11, 2017
d664828
Fixed LiveSolrContentExtractionTestCase Failure. Reworked core creat…
RabidCicada Apr 12, 2017
085fc91
Fixed LiveSolrAutocompleteTestCase Failure
RabidCicada Apr 12, 2017
493085c
Fix LiveSolrMoreLikeThisTestCase.
RabidCicada Apr 12, 2017
cfae3ed
Fix LiveSolrMoreLikeThisTestCase. Also fix the deferred case (whoops)
RabidCicada Apr 12, 2017
17fb620
Fix LiveSolrSearchQueryTestCase. Specifically spellcheck. Added spe…
RabidCicada Apr 13, 2017
9a375d4
Fix Collation based results. Add future plumbing for returning more …
RabidCicada Apr 13, 2017
3389712
Fix result_class swap failure
RabidCicada Apr 13, 2017
59bf22d
Attempting to get travis to see jdk8 request
RabidCicada Apr 13, 2017
ac52128
Trying to get a travis platform that supports jdk setting
RabidCicada Apr 13, 2017
f384e74
Adjusting matrix to include django 1.11. Adjusting wait_for_solr scr…
RabidCicada Apr 13, 2017
4197a9f
Troubleshooting travis failure that is not replicatable here
RabidCicada Apr 13, 2017
5a51fa0
more troubleshooting
RabidCicada Apr 13, 2017
544378a
more troubleshooting
RabidCicada Apr 13, 2017
0eae986
Fix wrong object in test for spelling suggestions.
RabidCicada Apr 14, 2017
db19db6
More troubleshooting
RabidCicada Apr 14, 2017
045fa19
More troubleshooting and fixing old test back to original check
RabidCicada Apr 14, 2017
0a6613d
Printing raw response that I found existed:)
RabidCicada Apr 14, 2017
d4e7a5b
Thinking solr versoin is wrong.
RabidCicada Apr 14, 2017
b356097
Final Fixes to support 6.4.0 and 6.5.0 spelling suggestions
RabidCicada Apr 14, 2017
f3a30f1
Working template management and tests. Lots of plumbing to test. Mo…
RabidCicada Apr 18, 2017
24ee453
Fixing a problem introduced in build_template.
RabidCicada Apr 18, 2017
19fbf00
Allow overriding collate for spellcheck at most entrypoints that acce…
RabidCicada Apr 18, 2017
30058af
Python 3 compatibility updates
RabidCicada Apr 18, 2017
49388f2
PEP8 Fixes. Mostly ignoring line length PEP violations due to concis…
RabidCicada Apr 19, 2017
042d6c3
Merge and deconflict of upstream PEP8 changes
RabidCicada Apr 19, 2017
9f8c583
Run isort on files updated in this branch
acdha Apr 19, 2017
3b94141
Remove unused imports
acdha Apr 19, 2017
ed228fb
PEP-8
acdha Apr 19, 2017
c4e30e1
PEP-8
acdha Apr 19, 2017
3d5583c
Tidy imports
acdha Apr 19, 2017
898a51b
build_solr_schema: less abbreviated keyword argument name
acdha Apr 19, 2017
0509703
build_solr_schema tidying
acdha Apr 19, 2017
4ccc96a
Test Solr launcher updates
acdha Apr 19, 2017
65c592c
Moved constants.HAYSTACK_DOCUMENT_FIELD to constants.DOCUMENT_FIELD t…
RabidCicada Apr 20, 2017
f8bad8f
Addressing Chris' comments on boolean check
RabidCicada Apr 20, 2017
1d810db
Addressing Chris' comments on comment style :) >.<
RabidCicada Apr 20, 2017
bd877a5
Update Solr spelling suggestion handling
acdha Apr 21, 2017
46fb6c2
Solr: ensure that the default document field is always applied
acdha Apr 21, 2017
6dca6ac
Update Solr management command tests
acdha Apr 25, 2017
477c677
Tests: better name for Solr-specific management commands
acdha Apr 25, 2017
168c9ae
Update build_solr_schema arguments
acdha Apr 25, 2017
6cf4f26
Tidy Solr backend tests
acdha Apr 25, 2017
245b946
Tidying test suite
acdha Apr 25, 2017
6501ce6
Updated docs to match new implementation
RabidCicada Apr 27, 2017
a46a515
Removed Unnecessary stopword files as requested
RabidCicada Apr 27, 2017
01dc17b
Updated docs to add warning about template filename change. Fixed typo
RabidCicada Apr 28, 2017
47a11b7
Cleaner approach based on acdh's comments. We don't carry around bag…
RabidCicada May 1, 2017
0e41ff5
Attempt to fix on Travis. I guess it runs from different directory
RabidCicada May 1, 2017
03209da
build_solr_schema: reload should not assume the backend name
acdha May 22, 2017
5608db1
Tidy build_solr_schema help text and exceptions
acdha May 22, 2017
038dd5f
Documentation copy-editing

acdha May 24, 2017
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
6 changes: 5 additions & 1 deletion .travis.yml
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
sudo: true
sudo: required
dist: trusty
language: python
python:
- 2.7
Expand All @@ -11,6 +12,8 @@ cache:
pip: true
directories:
- $HOME/download-cache
jdk:
- oraclejdk8

addons:
apt_packages:
Expand Down Expand Up @@ -59,6 +62,7 @@ env:
- DJANGO_VERSION=">=1.8,<1.9" VERSION_ES=">=2.0.0,<3.0.0"
- DJANGO_VERSION=">=1.9,<1.10" VERSION_ES=">=2.0.0,<3.0.0"
- DJANGO_VERSION=">=1.10,<1.11" VERSION_ES=">=2.0.0,<3.0.0"
- DJANGO_VERSION=">=1.11,<1.12" VERSION_ES=">=2.0.0,<3.0.0"

matrix:
allow_failures:
Expand Down
3 changes: 2 additions & 1 deletion haystack/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,8 @@
# DEBUG = True.
def reset_search_queries(**kwargs):
for conn in connections.all():
conn.reset_queries()
if conn:
conn.reset_queries()


if settings.DEBUG:
Expand Down
48 changes: 39 additions & 9 deletions haystack/backends/solr_backend.py
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,8 @@ def __init__(self, connection_alias, **connection_options):
if 'URL' not in connection_options:
raise ImproperlyConfigured("You must specify a 'URL' in your settings for connection '%s'." % connection_alias)

self.collate = connection_options.get('COLLATE_SPELLING', True)

self.conn = Solr(connection_options['URL'], timeout=self.timeout,
**connection_options.get('KWARGS', {}))
self.log = logging.getLogger('haystack')
Expand Down Expand Up @@ -151,7 +153,7 @@ def build_search_kwargs(self, query_string, sort_by=None, start_offset=0, end_of
narrow_queries=None, spelling_query=None,
within=None, dwithin=None, distance_point=None,
models=None, limit_to_registered_models=None,
result_class=None, stats=None,
result_class=None, stats=None, collate=None,
**extra_kwargs):
kwargs = {'fl': '* score'}

Expand Down Expand Up @@ -201,9 +203,11 @@ def build_search_kwargs(self, query_string, sort_by=None, start_offset=0, end_of
for key in highlight.keys()
})

if collate is None:
collate = self.collate
if self.include_spelling is True:
kwargs['spellcheck'] = 'true'
kwargs['spellcheck.collate'] = 'true'
kwargs['spellcheck.collate'] = str(collate).lower()
kwargs['spellcheck.count'] = 1

if spelling_query:
Expand Down Expand Up @@ -389,13 +393,38 @@ def _process_results(self, raw_results, highlight=False, result_class=None, dist
facets[key][facet_field][1::2]))

if self.include_spelling and hasattr(raw_results, 'spellcheck'):
# Solr 5+ changed the JSON response format so the suggestions will be key-value mapped rather
# than simply paired elements in a list, which is a nice improvement but incompatible with
# Solr 4: https://issues.apache.org/jira/browse/SOLR-3029
if len(raw_results.spellcheck.get('collations', [])):
spelling_suggestion = raw_results.spellcheck['collations'][-1]
elif len(raw_results.spellcheck.get('suggestions', [])):
spelling_suggestion = raw_results.spellcheck['suggestions'][-1]
# There are many different formats for Legacy, 6.4, and 6.5
# e.g. https://issues.apache.org/jira/browse/SOLR-3029
collations = raw_results.spellcheck.get('collations', [])
suggestions = raw_results.spellcheck.get('suggestions', [])
if len(collations):
#Handle sol6.5 collation format
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nitpick but we should probably have a space after # in the comments in this block

if isinstance(collations, dict):
spelling_suggestions= [col['collationQuery'] for col in collations.values()] #aggregate for future use in multi suggestion response
#Legacy Legacy & 6.4 handling
else:
if isinstance(collations[1], dict): #Solr6.4
spelling_suggestions = [item["collationQuery"] for item in collations if isinstance(item,dict)] #aggregate for future use in multi suggestion response
else: #Legacy Solr format
spelling_suggestions=collations[-1]

spelling_suggestion = spelling_suggestions[-1] #Keep current method of returning single value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we would want the last suggestion rather than the first? I'm thinking we should have a comment with the rationale

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just the way it was already done. I have no intuition as to why the last was taken. I believe solr CAN do some sorting of the returns based on some priority...But just wanted to keep convention.

elif len(suggestions):
#Handle sol6.5 suggestion format
if isinstance(suggestions, dict):
for word,sug in suggestions.items():
spelling_suggestions = [item["word"] for item in sug['suggestion']] #aggregate for future use in multi suggestion response
#Legacy Legacy & 6.4 handling
else:
spelling_suggestions = []
if isinstance(suggestions[1], dict): #Solr6.4
for item in suggestions:
if isinstance(item, dict):
spelling_suggestions += [subitem["word"] for subitem in item['suggestion']]
else: #Legacy Solr
spelling_suggestions=suggestions[-1]

spelling_suggestion = spelling_suggestions[-1] #Keep current method of returning single value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have this code in a few places, what do you think about declaring spelling_suggestions once, having the various branches populate it, and then having the check to return None or a single value at one place afterwards?


assert spelling_suggestion is None or isinstance(spelling_suggestion, six.string_types)

Expand Down Expand Up @@ -722,6 +751,7 @@ def run(self, spelling_query=None, **kwargs):
search_kwargs.update(kwargs)

results = self.backend.search(final_query, **search_kwargs)

self._results = results.get('results', [])
self._hit_count = results.get('hits', 0)
self._facet_counts = self.post_process_facets(results)
Expand Down
1 change: 1 addition & 0 deletions haystack/constants.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
ID = getattr(settings, 'HAYSTACK_ID_FIELD', 'id')
DJANGO_CT = getattr(settings, 'HAYSTACK_DJANGO_CT_FIELD', 'django_ct')
DJANGO_ID = getattr(settings, 'HAYSTACK_DJANGO_ID_FIELD', 'django_id')
HAYSTACK_DOCUMENT_FIELD = getattr(settings, 'HAYSTACK_DOCUMENT_FIELD', 'text')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this could be DOCUMENT_FIELD following the other settings losing the HAYSTACK_ prefix when retrieved.


# Default operator. Valid options are AND/OR.
DEFAULT_OPERATOR = getattr(settings, 'HAYSTACK_DEFAULT_OPERATOR', 'AND')
Expand Down
76 changes: 67 additions & 9 deletions haystack/management/commands/build_solr_schema.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,15 +3,21 @@
from __future__ import absolute_import, division, print_function, unicode_literals

from django.core.exceptions import ImproperlyConfigured
from django.core.management.base import BaseCommand
from django.core.management.base import BaseCommand,CommandError
from django.template import Context, loader
from django.conf import settings

from haystack import connections, connection_router, constants
from haystack.backends.solr_backend import SolrSearchBackend


import pysolr
import os
import traceback
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use isort to sort imports and split them into multiple sections (stdlib, third-party, this project). See 9f8c583

import requests
class Command(BaseCommand):
help = "Generates a Solr schema that reflects the indexes."
help = "Generates a Solr schema that reflects the indexes using templates under a django template dir 'search_configuration/*.xml'"
schema_template_loc = 'search_configuration/schema.xml'
solrcfg_template_loc = 'search_configuration/solrconfig.xml'

def add_arguments(self, parser):
parser.add_argument(
Expand All @@ -22,15 +28,66 @@ def add_arguments(self, parser):
"-u", "--using", default=constants.DEFAULT_ALIAS,
help='If provided, chooses a connection to work with.'
)
parser.add_argument(
"-c", "--configure_dir",
help='If provided, attempts to configure a core located in the given directory by removing the managed-schema.xml(renaming), configuring the core to use a classic (non-dynamic) schema, and generating the schema.xml from the template provided in'
)
parser.add_argument(
"-r", "--reload",
help='If provided, attempts to automatically reload the solr core'
)


def handle(self, **options):
"""Generates a Solr schema that reflects the indexes."""
using = options.get('using')
schema_xml = self.build_template(using=using)
if not isinstance(connections[using].get_backend(), SolrSearchBackend):
raise ImproperlyConfigured("'%s' isn't configured as a SolrEngine)." % connections[using].get_backend().connection_alias)

schema_xml = self.build_template(using=using,tfile=Command.schema_template_loc)
solrcfg_xml = self.build_template(using=using,tfile=Command.solrcfg_template_loc)

if options.get('filename'):
self.stdout.write("Trying to write schema file located at {}".format(options.get('filename')))
self.write_file(options.get('filename'), schema_xml)
else:
if options.get('reload'):
connections[using].get_backend().reload()

if options.get('configure_dir'):
cdir = options.get('configure_dir')
self.stdout.write("Trying to configure core located at {}".format(cdir))
if os.path.isfile(cdir+'/managed-schema'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use os.path.join for all of the filesystem path construction logic in this section to avoid issues with Windows

try:
os.rename(cdir+'/managed-schema',cdir+'/managed-schema.old')
except:
raise CommandError('Could not rename managed schema out of the way: {}'.format(cdir+'/managed-schema'))
try:
self.write_file(cdir+'/schema.xml', schema_xml)
except:
raise CommandError('Could not configure {}: {}'.format(cdir+'/schema.xml',traceback.format_exc()))

try:
self.write_file(cdir+'/solrconfig.xml',solrcfg_xml)
except:
raise CommandError('Could not configure core to use classic Schema Factory {}'.format(cdir+'/solrconfig.xml'))

if options.get('reload'):
core= settings.HAYSTACK_CONNECTIONS['solr']['URL'].rsplit('/',1)[-1]
if 'ADMIN_URL' not in settings.HAYSTACK_CONNECTIONS['solr']:
raise ImproperlyConfigured("'ADMIN_URL' must be specifid in the HAYSTACK_CONNECTIONS settins for the backend." )
if 'URL' not in settings.HAYSTACK_CONNECTIONS['solr']:
raise ImproperlyConfigured("'URL' to the core must be specifid in the HAYSTACK_CONNECTIONS settins for the backend.")
try:
self.stdout.write("Trying to relaod core named {}".format(core))
resp = requests.get(settings.HAYSTACK_CONNECTIONS['solr']['ADMIN_URL'],params="action=RELOAD&core="+core).text#TODO: Fix when pysolr passes params as request params instead of data
if resp.find('SolrException')!=-1:
raise CommandError('Solr Exception Thrown -- Failed to reload core: {}'.format(resp))
except CommandError:
raise
except:
raise CommandError('Failed to reload core: {}'.format(traceback.format_exc()))

if options.get('filename') is None and options.get('configure_dir') is None and options.get('reload') is None:
self.print_stdout(schema_xml)

def build_context(self, using):
Expand All @@ -42,17 +99,17 @@ def build_context(self, using):
content_field_name, fields = backend.build_schema(
connections[using].get_unified_index().all_searchfields()
)
return Context({
return {
'content_field_name': content_field_name,
'fields': fields,
'default_operator': constants.DEFAULT_OPERATOR,
'ID': constants.ID,
'DJANGO_CT': constants.DJANGO_CT,
'DJANGO_ID': constants.DJANGO_ID,
})
}

def build_template(self, using):
t = loader.get_template('search_configuration/solr.xml')
def build_template(self, using, tfile=schema_template_loc):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a keyword argument it should be something like template_filename which is a little more obvious when someone looks at the help, uses autocomplete in an editor, etc.

t = loader.get_template(tfile)
c = self.build_context(using=using)
return t.render(c)

Expand All @@ -68,3 +125,4 @@ def print_stdout(self, schema_xml):
def write_file(self, filename, schema_xml):
with open(filename, 'w') as schema_file:
schema_file.write(schema_xml)
os.fsync(schema_file.fileno())