Added geospatial search support!

I have anxiously waited to add this feature for almost 3 years now.
Support is finally present in more than one backend & I was
generously given some paid time to work on implementing this.

Thanks go out to:

  * CMGdigital, who paid for ~50% of the development of this feature
    & were awesomely supportive.
  * Jannis Leidel (jezdez), who did the original version of this
    patch & was an excellent sounding board.
  * Adam Fast, for patiently holding my hand through some of the
    geospatial confusions & for helping me verify GeoDjango
    functionality.
  * Justin Bronn, for the great work he originally did on
    GeoDjango, which served as a point of reference/inspiration
    on the API.

And thanks to all others who have submitted a variety of
patches/pull requests/interest throughout the years trying to get
this feature in place.
commit ad79f05b3ccdb53bfddae1e9d488f31968a57704 1 parent 419550a
Daniel Lindsley toastdriven authored
Showing with 1,509 additions and 275 deletions.
  1. +2 −1  AUTHORS
  2. +9 −8 docs/index.rst
  3. +6 −5 docs/installing_search_engines.rst
  4. +4 −3 docs/searchfield_api.rst
  5. +22 −0 docs/searchquery_api.rst
  6. +28 −0 docs/searchqueryset_api.rst
  7. +5 −0 docs/searchresult_api.rst
  8. +412 −0 docs/spatial.rst
  9. +7 −5 docs/toc.rst
  10. +58 −1 haystack/backends/__init__.py
  11. +21 −20 haystack/backends/simple_backend.py
  12. +96 −10 haystack/backends/solr_backend.py
  13. +2 −1  haystack/backends/whoosh_backend.py
  14. +4 −0 haystack/constants.py
  15. +4 −0 haystack/exceptions.py
  16. +113 −79 haystack/fields.py
  17. +78 −37 haystack/models.py
  18. +27 −0 haystack/query.py
  19. +72 −31 haystack/templates/search_configuration/solr.xml
  20. +74 −0 haystack/utils/geo.py
  21. +9 −1 tests/core/models.py
  22. +11 −12 tests/overrides/tests/altered_internal_names.py
  23. +4 −0 tests/run_all_tests.sh
  24. +2 −0  tests/settings.py
  25. +21 −17 tests/solr_test_schema.xml
  26. +13 −13 tests/solr_tests/tests/admin.py
  27. +31 −20 tests/solr_tests/tests/solr_backend.py
  28. +11 −11 tests/solr_tests/tests/templatetags.py
  29. 0  tests/spatial/__init__.py
  30. +112 −0 tests/spatial/fixtures/sample_spatial_data.json
  31. +27 −0 tests/spatial/models.py
  32. +19 −0 tests/spatial/search_indexes.py
  33. +178 −0 tests/spatial/tests.py
  34. +27 −0 tests/spatial_settings.py
3  AUTHORS
View
@@ -17,7 +17,7 @@ Thanks to
* Brian Rosner for various patches.
* Richard Boulton for feedback and suggestions.
* Cyberdelia for feedback and patches.
- * Jannis Leidel for consistently "walking behind the elephant" and cleaning up the setup.py.
+ * Jannis Leidel for consistently "walking behind the elephant", cleaning up the setup.py & the initial geospatial implementation.
Alex Gaynor
alex added a note

Oxford comma!

* Ask Solem for patching the setup.py.
* Ben Spaulding for feedback and documentation patches.
* smulloni for various patches.
@@ -55,4 +55,5 @@ Thanks to
* a multiprocessing-enabled version of ``update_index``.
* the addition of ``--start/--end`` options in ``update_index``.
* the ability to specify both apps & models to ``update_index``.
+ * A significant portion of the geospatial & function query features.
* Aram Dulyan (Aramgutang) for fixing the included admin class to be Django 1.4 compatible.
17 docs/index.rst
View
@@ -26,12 +26,12 @@ you up and running:
.. toctree::
:maxdepth: 2
-
+
tutorial
-
+
.. toctree::
:maxdepth: 1
-
+
views_and_forms
templatetags
glossary
@@ -41,7 +41,7 @@ you up and running:
other_apps
installing_search_engines
debugging
-
+
migration_from_1_to_2
@@ -53,7 +53,7 @@ you may want to include in your application.
.. toctree::
:maxdepth: 1
-
+
best_practices
highlighting
faceting
@@ -61,6 +61,7 @@ you may want to include in your application.
boost
multiple_index
rich_content_extraction
+ spatial
Reference
@@ -71,14 +72,14 @@ looking for API documentation and advanced usage as detailed in:
.. toctree::
:maxdepth: 2
-
+
searchqueryset_api
searchindex_api
searchfield_api
searchresult_api
searchquery_api
searchbackend_api
-
+
architecture_overview
backend_support
settings
@@ -94,7 +95,7 @@ additional backends:
.. toctree::
:maxdepth: 1
-
+
running_tests
creating_new_backends
11 docs/installing_search_engines.rst
View
@@ -11,18 +11,19 @@ Official Download Location: http://www.apache.org/dyn/closer.cgi/lucene/solr/
Solr is Java but comes in a pre-packaged form that requires very little other
than the JRE and Jetty. It's very performant and has an advanced featureset.
-Haystack requires Solr 1.3+. Installation is relatively simple::
+Haystack suggests using Solr 3.5+, though it's possible to get it working on
+Solr 1.4 with a little effort. Installation is relatively simple::
- curl -O http://apache.mirrors.tds.net/lucene/solr/1.4.1/apache-solr-1.4.1.tgz
- tar xvzf apache-solr-1.4.1.tgz
- cd apache-solr-1.4.1
+ curl -O http://apache.mirrors.tds.net/lucene/solr/3.5.0/apache-solr-3.5.0.tgz
+ tar xvzf apache-solr-3.5.0.tgz
+ cd apache-solr-3.5.0
cd example
java -jar start.jar
You'll need to revise your schema. You can generate this from your application
(once Haystack is installed and setup) by running
``./manage.py build_solr_schema``. Take the output from that command and place
-it in ``apache-solr-1.4.1/example/solr/conf/schema.xml``. Then restart Solr.
+it in ``apache-solr-3.5.0/example/solr/conf/schema.xml``. Then restart Solr.
You'll also need a Solr binding, ``pysolr``. The official ``pysolr`` package,
distributed via PyPI, is the best version to use (2.1.0+). Place ``pysolr.py``
7 docs/searchfield_api.rst
View
@@ -32,6 +32,7 @@ Included with Haystack are the following field types:
* ``EdgeNgramField``
* ``FloatField``
* ``IntegerField``
+* ``LocationField``
* ``MultiValueField``
* ``NgramField``
@@ -63,13 +64,13 @@ example::
from haystack import indexes
from myapp.models import Note
-
-
+
+
class NoteIndex(indexes.SearchIndex, indexes.Indexable):
text = indexes.CharField(document=True, use_template=True)
author = indexes.CharField(model_attr='user')
pub_date = indexes.DateTimeField(model_attr='pub_date')
-
+
def get_model(self):
return Note
22 docs/searchquery_api.rst
View
@@ -257,6 +257,28 @@ similar to the provided instance.
Adds highlighting to the search results.
+``add_within``
+~~~~~~~~~~~~~~~~~
+
+.. method:: SearchQuery.add_within(self, field, point_1, point_2):
+
+Adds bounding box parameters to search query.
+
+``add_dwithin``
+~~~~~~~~~~~~~~~~~
+
+.. method:: SearchQuery.add_dwithin(self, field, point, distance):
+
+Adds radius-based parameters to search query.
+
+``add_distance``
+~~~~~~~~~~~~~~~~~
+
+.. method:: SearchQuery.add_distance(self, field, point):
+
+Denotes that results should include distance measurements from the
+point passed in.
+
``add_field_facet``
~~~~~~~~~~~~~~~~~~~
28 docs/searchqueryset_api.rst
View
@@ -320,6 +320,34 @@ Example::
# Count document hits for authors that start with 'jo' within the index.
SearchQuerySet().filter(content='foo').query_facet('author', 'jo*')
+``within``
+~~~~~~~~~~
+
+.. method:: SearchQuerySet.within(self, field, point_1, point_2):
+
+Spatial: Adds a bounding box search to the query.
+
+See the :ref:`ref-spatial` docs for more information.
+
+``dwithin``
+~~~~~~~~~~~
+
+.. method:: SearchQuerySet.dwithin(self, field, point, distance):
+
+Spatial: Adds a distance-based search to the query.
+
+See the :ref:`ref-spatial` docs for more information.
+
+``distance``
+~~~~~~~~~~~~
+
+.. method:: SearchQuerySet.distance(self, field, point):
+
+Spatial: Denotes results must have distance measurements from the
+provided point.
+
+See the :ref:`ref-spatial` docs for more information.
+
``narrow``
~~~~~~~~~~
5 docs/searchresult_api.rst
View
@@ -23,6 +23,11 @@ The class exposes the following useful attributes/properties:
* ``object`` - The actual model instance (lazy loaded).
* ``model`` - The model class.
* ``verbose_name`` - A prettier version of the model's class name for display.
+* ``verbose_name_plural`` - A prettier version of the model's *plural* class name for display.
+* ``search_index`` - Returns the ``SearchIndex`` class associated with this
+ result.
+* ``distance`` - On geo-spatial queries, this returns a ``Distance`` object
+ representing the distance the result was from the focused point.
Method Reference
412 docs/spatial.rst
View
@@ -0,0 +1,412 @@
+.. _ref-spatial:
+
+==============
+Spatial Search
+==============
+
+Spatial search (also called geospatial search) allows you to take data that
+has a geographic location & enhance the search results by limiting them to a
+physical area. Haystack, combined with the latest versions of a couple engines,
+can provide this type of search.
+
+In addition, Haystack tries to implement these features in a way that is as
+close to GeoDjango_ as possible. There are some differences, which we'll
+highlight throughout this guide. Additionally, while the support isn't as
+comprehensive as PostGIS (for example), it is still quite useful.
+
+.. _GeoDjango: http://geodjango.org/
+
+
+Additional Requirements
+=======================
+
+The spatial functionality has only one non-included, non-available-in-Django
+dependency:
+
+* ``geopy`` - ``pip install geopy``
+
+If you do not ever need distance information, you may be able to skip
+installing ``geopy``.
+
+
+Support
+=======
+
+You need the latest & greatest of either Solr or Elasticsearch. None of the
+other backends (specifically the engines) support this kind of search.
+
+For Solr_, you'll need **v3.5+**. In addition, if you have an existing
+install of Haystack & Solr, you'll need to upgrade the schema & reindex your
+data. If you're adding geospatial data, you would have to reindex anyhow.
+
+For Elasticsearch, you'll need...
+
+.. _Solr: http://lucene.apache.org/solr/
+
+====================== ====== =============== ======== ======== ========
+Lookup Type Solr Elasticsearch Whoosh Xapian Simple
+====================== ====== =============== ======== ======== ========
+:lookup:`within` X X
+:lookup:`dwithin` X X
+`distance` X X
+`order_by('distance')` X X
+:lookup:`polygon` X
+====================== ====== =============== ======== ======== ========
+
+For more details, you can inspect http://wiki.apache.org/solr/SpatialSearch
+or http://www.elasticsearch.org/guide/reference/query-dsl/geo-bounding-box-filter.html.
+
+
+Geospatial Assumptions
+======================
+
+``Points``
+----------
+
+Haystack prefers to work with ``Point`` objects, which are located in
+``django.contrib.gis.geos.Point`` but conveniently importable out of
+``haystack.utils.geo.Point``.
+
+``Point`` objects use **LONGITUDE, LATITUDE** for their construction, regardless
+of whether you instantiate them from parameters or from WKT_/``GEOSGeometry``.
+
+.. _WKT: http://en.wikipedia.org/wiki/Well-known_text
+
+Examples::
+
+ # Using positional arguments.
+ from haystack.utils.geo import Point
+ pnt = Point(-95.23592948913574, 38.97127105172941)
+
+ # Using WKT.
+ from django.contrib.gis.geos import GEOSGeometry
+ pnt = GEOSGeometry('POINT(-95.23592948913574 38.97127105172941)')
+
+They are preferred over just providing ``latitude, longitude`` because they are
+more intelligent, have a spatial reference system attached & are more consistent
+with GeoDjango's use.
+
+
+``Distance``
+------------
+
+Haystack also uses the ``D`` (or ``Distance``) objects from GeoDjango,
+implemented in ``django.contrib.gis.measure.Distance`` but conveniently
+importable out of ``haystack.utils.geo.D`` (or ``haystack.utils.geo.Distance``).
+
+``Distance`` objects accept a very flexible set of measurements during
+instantiation and can convert amongst them freely. This is important, because
+the engines rely on measurements being in kilometers but you're free to use
+whatever units you want.
+
+Examples::
+
+ from haystack.utils.geo import D
+
+ # Start at 5 miles.
+ imperial_d = D(mi=5)
+
+ # Convert to fathoms...
+ fathom_d = imperial_d.fathom
+
+ # Now to kilometers...
+ km_d = imperial_d.km
+
+ # And back to miles.
+ mi = imperial_d.mi
+
+They are preferred over just providing a raw distance because they are
+more intelligent, have a well-defined unit system attached & are consistent
+with GeoDjango's use.
+
+
+``WGS-84``
+----------
+
+All engines assume WGS-84 (SRID 4326). At the time of writing, there does **not**
+appear to be a way to switch this. Haystack will transform all points into this
+coordinate system for you.
+
+
+Indexing
+========
+
+Indexing is relatively simple. Simply add a ``LocationField`` (or several)
+onto your ``SearchIndex`` class(es) & provide them a ``Point`` object. For
+example::
+
+ from haystack import indexes
+ from shops.models import Shop
+
+
+ class ShopIndex(indexes.SearchIndex, indexes.Indexable):
+ text = indexes.CharField(document=True, use_template=True)
+ # ... the usual, then...
+ location = indexes.LocationField(model_attr='coordinates')
+
+ def get_model(self):
+ return Shop
+
+If you must manually prepare the data, you have to do something slightly less
+convenient, returning a string-ified version of the coordinates in WGS-84 as
+``lat,long``::
+
+ from haystack import indexes
+ from shops.models import Shop
+
+
+ class ShopIndex(indexes.SearchIndex, indexes.Indexable):
+ text = indexes.CharField(document=True, use_template=True)
+ # ... the usual, then...
+ location = indexes.LocationField()
+
+ def get_model(self):
+ return Shop
+
+ def prepare_location(self, obj):
+ # If you're just storing the floats...
+ return "%s,%s" % (obj.latitude, obj.longitude)
+
+Alternatively, you could build a method/property onto the ``Shop`` model that
+returns a ``Point`` based on those coordinates::
+
+ # shops/models.py
+ from django.contrib.gis.geos import Point
+ from django.db import models
+
+
+ class Shop(models.Model):
+ # ... the usual, then...
+ latitude = models.FloatField()
+ longitude = models.FloatField()
+
+ # Usual methods, then...
+ def get_location(self):
+ # Remember, longitude FIRST!
+ return Point(self.longitude, self.latitude)
+
+
+ # shops/search_indexes.py
+ from haystack import indexes
+ from shops.models import Shop
+
+
+ class ShopIndex(indexes.SearchIndex, indexes.Indexable):
+ text = indexes.CharField(document=True, use_template=True)
+ location = indexes.LocationField(model_attr='get_location')
+
+ def get_model(self):
+ return Shop
+
+
+Querying
+========
+
+There are two types of geospatial queries you can run, ``within`` & ``dwithin``.
+Like their GeoDjango_ counterparts_, these methods focus on finding results
+within an area.
+
+.. _GeoDjango: https://docs.djangoproject.com/en/dev/ref/contrib/gis/geoquerysets/#within
+.. _counterparts: https://docs.djangoproject.com/en/dev/ref/contrib/gis/geoquerysets/#dwithin
+
+
+``within``
+----------
+
+.. method:: SearchQuerySet.within(self, field, point_1, point_2)
+
+``within`` is a bounding box comparison. A bounding box is a rectangular area
+within which to search. It's composed of a bottom-left point & a top-right
+point, though provided you give two opposing corners in either order, Haystack
+will determine the right coordinates. It is faster but slightly sloppier than
+its counterpart.
+
+Examples::
+
+ from haystack.query import SearchQuerySet
+ from haystack.utils.geo import Point
+
+ downtown_bottom_left = Point(-95.23947, 38.9637903)
+ downtown_top_right = Point(-95.23362278938293, 38.973081081164715)
+
+ # 'location' is the fieldname from our ``SearchIndex``...
+
+ # Do the bounding box query.
+ sqs = SearchQuerySet().within('location', downtown_bottom_left, downtown_top_right)
+
+ # Can be chained with other Haystack calls.
+ sqs = SearchQuerySet().auto_query('coffee').within('location', downtown_bottom_left, downtown_top_right).order_by('-popularity')
+
+.. note::
+
+ In GeoDjango, assuming the ``Shop`` model had been properly geo-ified, this
+ would have been implemented as::
+
+ from shops.models import Shop
+ Shop.objects.filter(location__within=(downtown_bottom_left, downtown_top_right))
+
+ Haystack's form differs because it yielded a cleaner implementation, was
+ no more typing than the GeoDjango version & tried to maintain the same
+ terminology/similar signature.
+
+
+``dwithin``
+-----------
+
+.. method:: SearchQuerySet.dwithin(self, field, point, distance)
+
+``dwithin`` is a radius-based search. A radius-based search is a circular area
+within which to search. It's composed of a center point & a radius (in
+kilometers, though Haystack will use the ``D`` object's conversion utilities to
+get it there). It is slower than ``within`` but very exact & can involve fewer
+calculations on your part.
+
+Examples::
+
+ from haystack.query import SearchQuerySet
+ from haystack.utils.geo import Point, D
+
+ ninth_and_mass = Point(-95.23592948913574, 38.96753407043678)
+ # Within two miles.
+ max_dist = D(mi=2)
+
+ # 'location' is the fieldname from our ``SearchIndex``...
+
+ # Do the radius query.
+ sqs = SearchQuerySet().dwithin('location', ninth_and_mass, max_dist)
+
+ # Can be chained with other Haystack calls.
+ sqs = SearchQuerySet().auto_query('coffee').dwithin('location', ninth_and_mass, max_dist).order_by('-popularity')
+
+.. note::
+
+ In GeoDjango, assuming the ``Shop`` model had been properly geo-ified, this
+ would have been implemented as::
+
+ from shops.models import Shop
+ Shop.objects.filter(location__dwithin=(ninth_and_mass, D(mi=2)))
+
+ Haystack's form differs because it yielded a cleaner implementation, was
+ no more typing than the GeoDjango version & tried to maintain the same
+ terminology/similar signature.
+
+
+``distance``
+------------
+
+.. method:: SearchQuerySet.distance(self, field, point)
+
+By default, search results will come back without distance information attached
+to them. In the context of a bounding box, it would be ambiguous what the
+distances would be calculated against. And it is more calculation that may not
+be necessary.
+
+So like GeoDjango, Haystack exposes a method to signify that you want to
+include these calculated distances on results.
+
+Examples::
+
+ from haystack.query import SearchQuerySet
+ from haystack.utils.geo import Point, D
+
+ ninth_and_mass = Point(-95.23592948913574, 38.96753407043678)
+
+ # On a bounding box...
+ downtown_bottom_left = Point(-95.23947, 38.9637903)
+ downtown_top_right = Point(-95.23362278938293, 38.973081081164715)
+
+ sqs = SearchQuerySet().within('location', downtown_bottom_left, downtown_top_right).distance('location', ninth_and_mass)
+
+ # ...Or on a radius query.
+ sqs = SearchQuerySet().dwithin('location', ninth_and_mass, D(mi=2)).distance('location', ninth_and_mass)
+
+You can even measure distances from a different point than the one you
+filtered on, for instance if you calculate results around key, well-cached
+hotspots in town but want distances from the user's current position::
+
+ from haystack.query import SearchQuerySet
+ from haystack.utils.geo import Point, D
+
+ ninth_and_mass = Point(-95.23592948913574, 38.96753407043678)
+ user_loc = Point(-95.23455619812012, 38.97240128290697)
+
+ sqs = SearchQuerySet().dwithin('location', ninth_and_mass, D(mi=2)).distance('location', user_loc)
+
+.. note::
+
+ The astute will notice this is Haystack's biggest departure from GeoDjango.
+ In GeoDjango, this would have been implemented as::
+
+ from shops.models import Shop
+ Shop.objects.filter(location__dwithin=(ninth_and_mass, D(mi=2))).distance(user_loc)
+
+ Note that, by default, the GeoDjango form leaves *out* the field to be
+ calculating against (though it's possible to override it & specify the
+ field).
+
+ Haystack's form differs because the same assumptions are difficult to make.
+ GeoDjango deals with a single model at a time, where Haystack deals with
+ a broad mix of models. Additionally, accessing ``Model`` information is a
+ couple hops away, so Haystack favors the explicit (if slightly more typing)
+ approach.
+
+
+Ordering
+========
+
+Because you're dealing with search, even with geospatial queries, results still
+come back in **RELEVANCE** order. If you want to offer the user ordering
+results by distance, there's a simple way to enable this ordering.
+
+Using the standard Haystack ``order_by`` method, if you specify ``distance`` or
+``-distance`` **ONLY**, you'll get geographic ordering. Additionally, you must
+have a call to ``.distance()`` somewhere in the chain, otherwise there is no
+distance information on the results & nothing to sort by.
+
+Examples::
+
+ from haystack.query import SearchQuerySet
+ from haystack.utils.geo import Point, D
+
+ ninth_and_mass = Point(-95.23592948913574, 38.96753407043678)
+ downtown_bottom_left = Point(-95.23947, 38.9637903)
+ downtown_top_right = Point(-95.23362278938293, 38.973081081164715)
+
+ # Non-geo ordering.
+ sqs = SearchQuerySet().within('location', downtown_bottom_left, downtown_top_right).order_by('title')
+ sqs = SearchQuerySet().within('location', downtown_bottom_left, downtown_top_right).distance('location', ninth_and_mass).order_by('-created')
+
+ # Geo ordering, closest to farthest.
+ sqs = SearchQuerySet().within('location', downtown_bottom_left, downtown_top_right).distance('location', ninth_and_mass).order_by('distance')
+ # Geo ordering, farthest to closest.
+ sqs = SearchQuerySet().dwithin('location', ninth_and_mass, D(mi=2)).distance('location', ninth_and_mass).order_by('-distance')
+
+.. note::
+
+ This call is identical to the GeoDjango usage.
+
+.. warning::
+
+ You cannot specify both a distance & lexicographic ordering. If you specify
+ more than just ``distance`` or ``-distance``, Haystack assumes ``distance``
+ is a field in the index & tries to sort on it. Example::
+
+ # May blow up!
+ sqs = SearchQuerySet().dwithin('location', ninth_and_mass, D(mi=2)).distance('location', ninth_and_mass).order_by('distance', 'title')
+
+ This is a limitation in the engine's implementation.
+
+ If you actually **have** a field called ``distance`` (& aren't using
+ calculated distance information), Haystack will do the right thing in
+ these circumstances.
+
+
+Caveats
+=======
+
+In all cases, you may call the ``within/dwithin/distance`` methods as many times
+as you like. However, the **LAST** call is the information that will be used.
+No combination logic is available, as this is largely a backend limitation.
+
+Combining calls to both ``within`` & ``dwithin`` may yield unexpected or broken
+results. They don't overlap when performing queries, so it may be possible to
+construct queries that work. Your Mileage May Vary.
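
The ``within`` section of the new docs/spatial.rst says two opposing corners may be given in either order. A minimal sketch of that normalization follows, using plain `(longitude, latitude)` tuples instead of `Point` objects. The name `normalize_bounding_box` is hypothetical; it is not Haystack's `generate_bounding_box`, whose body is not shown on this page. It only mirrors the `((min_lat, min_lng), (max_lat, max_lng))` shape that the Solr backend unpacks, and it assumes the box does not cross the antimeridian.

```python
def normalize_bounding_box(corner_1, corner_2):
    """Take two opposing (lng, lat) corners, in either order, and return
    ((min_lat, min_lng), (max_lat, max_lng))."""
    lng_1, lat_1 = corner_1
    lng_2, lat_2 = corner_2

    # Sort each axis independently so any pair of opposing corners works.
    min_lat, max_lat = min(lat_1, lat_2), max(lat_1, lat_2)
    min_lng, max_lng = min(lng_1, lng_2), max(lng_1, lng_2)

    return (min_lat, min_lng), (max_lat, max_lng)
```

Sorting each axis separately means top-left/bottom-right input is handled the same way as bottom-left/top-right input.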
12 docs/toc.rst
View
@@ -3,7 +3,7 @@ Table Of Contents
.. toctree::
:maxdepth: 2
-
+
index
tutorial
glossary
@@ -18,23 +18,25 @@ Table Of Contents
who_uses
other_apps
debugging
-
+
migration_from_1_to_2
-
+
best_practices
highlighting
faceting
autocomplete
boost
multiple_index
-
+ rich_content_extraction
+ spatial
+
searchqueryset_api
searchindex_api
searchfield_api
searchresult_api
searchquery_api
searchbackend_api
-
+
running_tests
creating_new_backends
utils
59 haystack/backends/__init__.py
View
@@ -9,6 +9,7 @@
from haystack.constants import DJANGO_CT, VALID_FILTERS, FILTER_SEPARATOR, DEFAULT_ALIAS
from haystack.exceptions import MoreLikeThisError, FacetingError
from haystack.models import SearchResult
+from haystack.utils.geo import ensure_point, ensure_distance
from haystack.utils.loading import UnifiedIndex
@@ -70,6 +71,7 @@ def __init__(self, connection_alias, **connection_options):
self.include_spelling = connection_options.get('INCLUDE_SPELLING', False)
self.batch_size = connection_options.get('BATCH_SIZE', 1000)
self.silently_fail = connection_options.get('SILENTLY_FAIL', True)
+ self.distance_available = connection_options.get('DISTANCE_AVAILABLE', False)
def update(self, index, iterable):
"""
@@ -104,7 +106,8 @@ def clear(self, models=[], commit=True):
@log_query
def search(self, query_string, sort_by=None, start_offset=0, end_offset=None,
fields='', highlight=False, facets=None, date_facets=None, query_facets=None,
- narrow_queries=None, spelling_query=None,
+ narrow_queries=None, spelling_query=None, within=None,
+ dwithin=None, distance_point=None,
limit_to_registered_models=None, result_class=None, **kwargs):
"""
Takes a query to search on and returns dictionary.
@@ -297,6 +300,11 @@ def __init__(self, using=DEFAULT_ALIAS):
#: and django_id when using code which expects those to be included in
#: the results
self.fields = []
+ # Geospatial-related information
+ self.within = {}
+ self.dwithin = {}
+ self.distance_point = {}
+ # Internal.
self._raw_query = None
self._raw_query_params = {}
self._more_like_this = False
@@ -363,6 +371,15 @@ def build_params(self, spelling_query=None):
if self.boost:
kwargs['boost'] = self.boost
+ if self.within:
+ kwargs['within'] = self.within
+
+ if self.dwithin:
+ kwargs['dwithin'] = self.dwithin
+
+ if self.distance_point:
+ kwargs['distance_point'] = self.distance_point
+
if self.result_class:
kwargs['result_class'] = self.result_class
@@ -603,6 +620,10 @@ def add_order_by(self, field):
"""Orders the search result by a field."""
self.order_by.append(field)
+ def add_order_by_distance(self, **kwargs):
+ """Orders the search result by distance from point."""
+ raise NotImplementedError("Subclasses must provide a way to add order by distance in the 'add_order_by_distance' method.")
+
def clear_order_by(self):
"""
Clears out all ordering that has been already added, reverting the
@@ -610,6 +631,13 @@ def clear_order_by(self):
"""
self.order_by = []
+ def clear_order_by_distance(self):
+ """
+ Clears out all distance ordering that has been already added, reverting the
+ query to relevancy.
+ """
+ self.order_by_distance = []
+
def add_model(self, model):
"""
Restricts the query requiring matches in the given model.
@@ -663,6 +691,32 @@ def add_highlight(self):
"""Adds highlighting to the search results."""
self.highlight = True
+ def add_within(self, field, point_1, point_2):
+ """Adds bounding box parameters to search query."""
+ self.within = {
+ 'field': field,
+ 'point_1': ensure_point(point_1),
+ 'point_2': ensure_point(point_2),
+ }
+
+ def add_dwithin(self, field, point, distance):
+ """Adds radius-based parameters to search query."""
+ self.dwithin = {
+ 'field': field,
+ 'point': ensure_point(point),
+ 'distance': ensure_distance(distance),
+ }
+
+ def add_distance(self, field, point):
+ """
+ Denotes that results should include distance measurements from the
+ point passed in.
+ """
+ self.distance_point = {
+ 'field': field,
+ 'point': ensure_point(point),
+ }
+
def add_field_facet(self, field):
"""Adds a regular facet on a field."""
from haystack import connections
@@ -769,6 +823,9 @@ def _clone(self, klass=None, using=None):
clone.start_offset = self.start_offset
clone.end_offset = self.end_offset
clone.result_class = self.result_class
+ clone.within = self.within.copy()
+ clone.dwithin = self.dwithin.copy()
+ clone.distance_point = self.distance_point.copy()
clone._raw_query = self._raw_query
clone._raw_query_params = self._raw_query_params
return clone
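
The Caveats section of docs/spatial.rst in this commit states that only the LAST call to `within`/`dwithin`/`distance` is used, and `_clone` above copies each geo dict. The sketch below illustrates both behaviours with a hypothetical stand-in class, `SketchQuery`; it is not Haystack's `BaseSearchQuery` and models only the geo state, with points as plain tuples and the distance as a bare number of kilometers.

```python
class SketchQuery(object):
    """Hypothetical stand-in for a search query; only geo state is modeled."""

    def __init__(self):
        self.within = {}
        self.dwithin = {}

    def add_dwithin(self, field, point, distance_km):
        # Assigning a fresh dict means a later call replaces an earlier one.
        self.dwithin = {
            'field': field,
            'point': point,
            'distance': distance_km,
        }

    def clone(self):
        other = SketchQuery()
        # Copy the dicts so the clone can change without touching the original.
        other.within = self.within.copy()
        other.dwithin = self.dwithin.copy()
        return other
```

Because each `add_*` call assigns a whole new dict, there is nothing to merge, which matches the "no combination logic" caveat in the docs.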
41 haystack/backends/simple_backend.py
View
@@ -10,7 +10,7 @@
if settings.DEBUG:
import logging
-
+
class NullHandler(logging.Handler):
def emit(self, record):
pass
@@ -29,26 +29,27 @@ class SimpleSearchBackend(BaseSearchBackend):
def update(self, indexer, iterable, commit=True):
if settings.DEBUG:
logger.warning('update is not implemented in this backend')
-
+
def remove(self, obj, commit=True):
if settings.DEBUG:
logger.warning('remove is not implemented in this backend')
-
+
def clear(self, models=[], commit=True):
if settings.DEBUG:
logger.warning('clear is not implemented in this backend')
-
+
@log_query
def search(self, query_string, sort_by=None, start_offset=0, end_offset=None,
fields='', highlight=False, facets=None, date_facets=None, query_facets=None,
- narrow_queries=None, spelling_query=None,
+ narrow_queries=None, spelling_query=None, within=None,
+ dwithin=None, distance_point=None,
limit_to_registered_models=None, result_class=None, **kwargs):
hits = 0
results = []
-
+
if result_class is None:
result_class = SearchResult
-
+
if query_string:
for model in connections[self.connection_alias].get_unified_index().get_indexed_models():
if query_string == '*':
@@ -56,35 +57,35 @@ def search(self, query_string, sort_by=None, start_offset=0, end_offset=None,
else:
for term in query_string.split():
queries = []
-
+
for field in model._meta._fields():
if hasattr(field, 'related'):
continue
-
+
if not field.get_internal_type() in ('TextField', 'CharField', 'SlugField'):
continue
-
+
queries.append(Q(**{'%s__icontains' % field.name: term}))
-
+
qs = model.objects.filter(reduce(lambda x, y: x|y, queries))
-
+
hits += len(qs)
-
+
for match in qs:
result = result_class(match._meta.app_label, match._meta.module_name, match.pk, 0, **match.__dict__)
# For efficiency.
result._model = match.__class__
result._object = match
results.append(result)
-
+
return {
'results': results,
'hits': hits,
}
-
+
def prep_value(self, db_field, value):
return value
-
+
def more_like_this(self, model_instance, additional_query_string=None,
start_offset=0, end_offset=None,
limit_to_registered_models=None, result_class=None, **kwargs):
@@ -98,18 +99,18 @@ class SimpleSearchQuery(BaseSearchQuery):
def build_query(self):
if not self.query_filter:
return '*'
-
+
return self._build_sub_query(self.query_filter)
-
+
def _build_sub_query(self, search_node):
term_list = []
-
+
for child in search_node.children:
if isinstance(child, SearchNode):
term_list.append(self._build_sub_query(child))
else:
term_list.append(child[1])
-
+
return (' ').join(term_list)
106 haystack/backends/solr_backend.py
View
@@ -1,4 +1,5 @@
import logging
+import warnings
from django.conf import settings
from django.core.exceptions import ImproperlyConfigured
from django.db.models.loading import get_model
@@ -7,6 +8,7 @@
from haystack.exceptions import MissingDependency, MoreLikeThisError
from haystack.models import SearchResult
from haystack.utils import get_identifier
+from haystack.utils.geo import Distance, generate_bounding_box
try:
from django.db.models.sql.query import get_proxied_model
except ImportError:
@@ -106,7 +108,8 @@ def clear(self, models=[], commit=True):
@log_query
def search(self, query_string, sort_by=None, start_offset=0, end_offset=None,
fields='', highlight=False, facets=None, date_facets=None, query_facets=None,
- narrow_queries=None, spelling_query=None,
+ narrow_queries=None, spelling_query=None, within=None,
+ dwithin=None, distance_point=None,
limit_to_registered_models=None, result_class=None, **kwargs):
if len(query_string) == 0:
return {
@@ -117,6 +120,7 @@ def search(self, query_string, sort_by=None, start_offset=0, end_offset=None,
kwargs = {
'fl': '* score',
}
+ geo_sort = False
if fields:
if isinstance(fields, (list, set)):
@@ -125,7 +129,23 @@ def search(self, query_string, sort_by=None, start_offset=0, end_offset=None,
kwargs['fl'] = fields
if sort_by is not None:
- kwargs['sort'] = sort_by
+ if sort_by in ['distance asc', 'distance desc'] and distance_point:

If you apply multiple sort_by, this part fails to work. For example, if my sort_by is:

sort_by: u"bookable desc, available_pitches desc, distance asc, has_primary_photo desc, expected_value desc"

Then the "in ['distance asc', ..." part won't match and replace distance with geodist() like it needs to and the query will fail. It works if distance is the ONLY sort by. This is because sort_by is a unicode string from this point and not a list.

I would change it to something like:

    if sort_by is not None:
        print "sort_by: %s" % (sort_by,)
        if 'distance ' in sort_by and distance_point:
            # Do the geo-enabled sort.
            lng, lat = distance_point['point'].get_coords()
            kwargs['sfield'] = distance_point['field']
            kwargs['pt'] = '%s,%s' % (lat, lng)

            if 'distance asc' in sort_by:
                kwargs['sort'] = 'geodist() asc'
            else:
                kwargs['sort'] = 'geodist() desc'
        else:
            if 'distance ' in sort_by:
                warnings.warn("In order to sort by distance, you must call the '.distance(...)' method.")

            # Regular sorting.
            kwargs['sort'] = sort_by

I will fork and submit a pull request, like I should.
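
One caveat with the snippet above: it still assigns a bare 'geodist() asc' or 'geodist() desc' to kwargs['sort'], so the other sort fields would be dropped. Here is a minimal sketch of rewriting only the distance clause and leaving the rest of the sort string alone. The helper name `rewrite_distance_sort` and its `have_distance_point` flag are hypothetical, not part of Haystack, and the caller would still need to set `sfield` and `pt` as in the diff.

```python
def rewrite_distance_sort(sort_by, have_distance_point):
    """
    Hypothetical helper (not part of Haystack).

    Takes a comma-separated Solr sort string and, when a distance point is
    available, swaps each ``distance <dir>`` clause for ``geodist() <dir>``.
    Returns a ``(sort_string, geo_sort)`` tuple.
    """
    clauses = [clause.strip() for clause in sort_by.split(',') if clause.strip()]
    rewritten = []
    geo_sort = False

    for clause in clauses:
        parts = clause.split()

        if parts[0] == 'distance' and have_distance_point:
            # Assumption: default to ascending when no direction is given.
            direction = parts[1] if len(parts) > 1 else 'asc'
            rewritten.append('geodist() %s' % direction)
            geo_sort = True
        else:
            # Leave every other clause untouched.
            rewritten.append(clause)

    return ', '.join(rewritten), geo_sort
```

With something like this, search() could set kwargs['sort'] from the first element of the tuple and only add 'sfield' and 'pt' when the second element is True. Without a distance point the string passes through unchanged, so the existing warning would still be needed.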

+ # Do the geo-enabled sort.
+ lng, lat = distance_point['point'].get_coords()
+ kwargs['sfield'] = distance_point['field']
+ kwargs['pt'] = '%s,%s' % (lat, lng)
+ geo_sort = True
+
+ if sort_by == 'distance asc':
+ kwargs['sort'] = 'geodist() asc'
+ else:
+ kwargs['sort'] = 'geodist() desc'
+ else:
+ if sort_by.startswith('distance '):
+ warnings.warn("In order to sort by distance, you must call the '.distance(...)' method.")
+
+ # Regular sorting.
+ kwargs['sort'] = sort_by
if start_offset is not None:
kwargs['start'] = start_offset
@@ -186,6 +206,31 @@ def search(self, query_string, sort_by=None, start_offset=0, end_offset=None,
if narrow_queries is not None:
kwargs['fq'] = list(narrow_queries)
+ if within is not None:
+ kwargs.setdefault('fq', [])
+ ((min_lat, min_lng), (max_lat, max_lng)) = generate_bounding_box(within['point_1'], within['point_2'])
+ # Bounding boxes are min, min TO max, max. Solr's wiki was *NOT*
+ # very clear on this.
+ bbox = '%s:[%s,%s TO %s,%s]' % (within['field'], min_lat, min_lng, max_lat, max_lng)
+ kwargs['fq'].append(bbox)
+
+ if dwithin is not None:
+ kwargs.setdefault('fq', [])
+ lng, lat = dwithin['point'].get_coords()
+ geofilt = '{!geofilt pt=%s,%s sfield=%s d=%s}' % (lat, lng, dwithin['field'], dwithin['distance'].km)
+ kwargs['fq'].append(geofilt)
+
+ # Check to see if the backend should try to include distances
+ # (Solr 4.X+) in the results.
+ if self.distance_available and distance_point:
+ # In early testing, you can't just hand Solr 4.X a proper bounding box
+ # & request distances. To enable native distance would take calculating
+ # a center point & a radius off the user-provided box, which kinda
+ # sucks. We'll avoid it for now, since Solr 4.x's release will be some
+ # time yet.
+ # kwargs['fl'] += ' _dist_:geodist()'
+ pass
+
try:
raw_results = self.conn.search(query_string, **kwargs)
except (IOError, SolrError), e:
@@ -195,7 +240,7 @@ def search(self, query_string, sort_by=None, start_offset=0, end_offset=None,
self.log.error("Failed to query Solr using '%s': %s", query_string, e)
raw_results = EmptyResults()
- return self._process_results(raw_results, highlight=highlight, result_class=result_class)
+ return self._process_results(raw_results, highlight=highlight, result_class=result_class, distance_point=distance_point)
def more_like_this(self, model_instance, additional_query_string=None,
start_offset=0, end_offset=None,
@@ -255,7 +300,7 @@ def more_like_this(self, model_instance, additional_query_string=None,
return self._process_results(raw_results, result_class=result_class)
- def _process_results(self, raw_results, highlight=False, result_class=None):
+ def _process_results(self, raw_results, highlight=False, result_class=None, distance_point=None):
from haystack import connections
results = []
hits = raw_results.hits
@@ -310,6 +355,14 @@ def _process_results(self, raw_results, highlight=False, result_class=None):
if raw_result[ID] in getattr(raw_results, 'highlighting', {}):
additional_fields['highlighted'] = raw_results.highlighting[raw_result[ID]]
+ if distance_point:
+ additional_fields['_point_of_origin'] = distance_point
+
+ if raw_result.get('__dist__'):
+ additional_fields['_distance'] = Distance(km=float(raw_result['__dist__']))
+ else:
+ additional_fields['_distance'] = None
+
result = result_class(app_label, model_name, raw_result[DJANGO_ID], raw_result['score'], **additional_fields)
results.append(result)
else:
@@ -329,7 +382,7 @@ def build_schema(self, fields):
for field_name, field_class in fields.items():
field_data = {
'field_name': field_class.index_fieldname,
- 'type': 'text',
+ 'type': 'text_en',
'indexed': 'true',
'stored': 'true',
'multi_valued': 'false',
@@ -344,15 +397,17 @@ def build_schema(self, fields):
if field_class.field_type in ['date', 'datetime']:
field_data['type'] = 'date'
elif field_class.field_type == 'integer':
- field_data['type'] = 'slong'
+ field_data['type'] = 'long'
elif field_class.field_type == 'float':
- field_data['type'] = 'sfloat'
+ field_data['type'] = 'float'
elif field_class.field_type == 'boolean':
field_data['type'] = 'boolean'
elif field_class.field_type == 'ngram':
field_data['type'] = 'ngram'
elif field_class.field_type == 'edge_ngram':
field_data['type'] = 'edge_ngram'
+ elif field_class.field_type == 'location':
+ field_data['type'] = 'location'
if field_class.is_multivalued:
field_data['multi_valued'] = 'true'
@@ -366,13 +421,13 @@ def build_schema(self, fields):
# If it's text and not being indexed, we probably don't want
# to do the normal lowercase/tokenize/stemming/etc. dance.
- if field_data['type'] == 'text':
+ if field_data['type'] == 'text_en':
field_data['type'] = 'string'
# If it's a ``FacetField``, make sure we don't postprocess it.
if hasattr(field_class, 'facet_for'):
# If it's text, it ought to be a string.
- if field_data['type'] == 'text':
+ if field_data['type'] == 'text_en':
field_data['type'] = 'string'
schema_fields.append(field_data)
@@ -416,6 +471,25 @@ class SolrSearchQuery(BaseSearchQuery):
def matching_all_fragment(self):
return '*:*'
+ def add_spatial(self, lat, lon, sfield, distance, filter='bbox'):
+ """Adds spatial query parameters to search query"""
+ kwargs = {
+ 'lat': lat,
+ 'long': lon,
+ 'sfield': sfield,
+ 'distance': distance,
+ }
+ self.spatial_query.update(kwargs)
+
+ def add_order_by_distance(self, lat, long, sfield):
+ """Orders the search result by distance from point."""
+ kwargs = {
+ 'lat': lat,
+ 'long': long,
+ 'sfield': sfield,
+ }
+ self.order_by_distance.update(kwargs)
+
def build_query_fragment(self, field, filter_type, value):
from haystack import connections
result = ''
@@ -469,8 +543,11 @@ def run(self, spelling_query=None, **kwargs):
'result_class': self.result_class,
}
+ order_by_list = None
+
if self.order_by:
- order_by_list = []
+ if order_by_list is None:
+ order_by_list = []
for order_by in self.order_by:
if order_by.startswith('-'):
@@ -504,6 +581,15 @@ def run(self, spelling_query=None, **kwargs):
if spelling_query:
search_kwargs['spelling_query'] = spelling_query
+ if self.within:
+ search_kwargs['within'] = self.within
+
+ if self.dwithin:
+ search_kwargs['dwithin'] = self.dwithin
+
+ if self.distance_point:
+ search_kwargs['distance_point'] = self.distance_point
+
results = self.backend.search(final_query, **search_kwargs)
self._results = results.get('results', [])
self._hit_count = results.get('hits', 0)
3  haystack/backends/whoosh_backend.py
@@ -255,7 +255,8 @@ def optimize(self):
@log_query
def search(self, query_string, sort_by=None, start_offset=0, end_offset=None,
fields='', highlight=False, facets=None, date_facets=None, query_facets=None,
- narrow_queries=None, spelling_query=None,
+ narrow_queries=None, spelling_query=None, within=None,
+ dwithin=None, distance_point=None,
limit_to_registered_models=None, result_class=None, **kwargs):
if not self.setup_complete:
self.setup()
4 haystack/constants.py
@@ -23,3 +23,7 @@
# A marker class in the hierarchy to indicate that it handles search data.
class Indexable(object):
haystack_use_for_indexing = True
+
+# For the geo bits, since that's what Solr & Elasticsearch seem to silently
+# assume...
+WGS_84_SRID = 4326
4 haystack/exceptions.py
@@ -25,3 +25,7 @@ class MoreLikeThisError(HaystackError):
class FacetingError(HaystackError):
"""Raised when incorrect arguments have been provided for faceting."""
pass
+
+class SpatialError(HaystackError):
+ """Raised when incorrect arguments have been provided for spatial."""
+ pass
192 haystack/fields.py
@@ -1,8 +1,8 @@
-from decimal import Decimal
import re
from django.utils import datetime_safe
from django.template import loader, Context
from haystack.exceptions import SearchFieldError
+from haystack.utils.geo import ensure_point, Point
class NOT_PROVIDED:
@@ -17,7 +17,7 @@ class NOT_PROVIDED:
class SearchField(object):
"""The base implementation of a search field."""
field_type = None
-
+
def __init__(self, model_attr=None, use_template=False, template_name=None,
document=False, indexed=True, stored=True, faceted=False,
default=NOT_PROVIDED, null=False, index_fieldname=None,
@@ -36,34 +36,34 @@ def __init__(self, model_attr=None, use_template=False, template_name=None,
self.index_fieldname = index_fieldname
self.boost = weight or boost
self.is_multivalued = False
-
+
# We supply the facet_class for making it easy to create a faceted
# field based off of this field.
self.facet_class = facet_class
-
+
if self.facet_class is None:
self.facet_class = FacetCharField
-
+
self.set_instance_name(None)
-
+
def set_instance_name(self, instance_name):
self.instance_name = instance_name
-
+
if self.index_fieldname is None:
self.index_fieldname = self.instance_name
-
+
def has_default(self):
"""Returns a boolean of whether this field has a default value."""
return self._default is not NOT_PROVIDED
-
+
@property
def default(self):
"""Returns the default value for the field."""
if callable(self._default):
return self._default()
-
+
return self._default
-
+
def prepare(self, obj):
"""
Takes data from the provided object and prepares it for storage in the
@@ -76,13 +76,13 @@ def prepare(self, obj):
# Check for `__` in the field for looking through the relation.
attrs = self.model_attr.split('__')
current_object = obj
-
+
for attr in attrs:
if not hasattr(current_object, attr):
raise SearchFieldError("The model '%s' does not have a model_attr '%s'." % (repr(obj), attr))
-
+
current_object = getattr(current_object, attr, None)
-
+
if current_object is None:
if self.has_default():
current_object = self._default
@@ -96,21 +96,21 @@ def prepare(self, obj):
break
else:
raise SearchFieldError("The model '%s' has an empty model_attr '%s' and doesn't allow a default or null value." % (repr(obj), attr))
-
+
if callable(current_object):
return current_object()
-
+
return current_object
-
+
if self.has_default():
return self.default
else:
return None
-
+
def prepare_template(self, obj):
"""
Flattens an object for indexing.
-
+
This loads a template
(``search/indexes/{app_label}/{model_name}_{field_name}.txt``) and
returns the result of rendering that template. ``object`` will be in
@@ -118,22 +118,22 @@ def prepare_template(self, obj):
"""
if self.instance_name is None and self.template_name is None:
raise SearchFieldError("This field requires either its instance_name variable to be populated or an explicit template_name in order to load the correct template.")
-
+
if self.template_name is not None:
template_names = self.template_name
-
+
if not isinstance(template_names, (list, tuple)):
template_names = [template_names]
else:
template_names = ['search/indexes/%s/%s_%s.txt' % (obj._meta.app_label, obj._meta.module_name, self.instance_name)]
-
+
t = loader.select_template(template_names)
return t.render(Context({'object': obj}))
-
+
def convert(self, value):
"""
Handles conversion between the data found and the type of the field.
-
+
Extending classes should override this method and provide correct
data coercion.
"""
@@ -142,30 +142,64 @@ def convert(self, value):
class CharField(SearchField):
field_type = 'string'
-
+
def __init__(self, **kwargs):
if kwargs.get('facet_class') is None:
kwargs['facet_class'] = FacetCharField
-
+
super(CharField, self).__init__(**kwargs)
-
+
def prepare(self, obj):
return self.convert(super(CharField, self).prepare(obj))
-
+
def convert(self, value):
if value is None:
return None
-
+
return unicode(value)
+class LocationField(SearchField):
+ field_type = 'location'
+
+ def prepare(self, obj):
+ value = super(LocationField, self).prepare(obj)
+
+ if value is None:
+ return None
+
+ pnt = ensure_point(value)
+ pnt_lng, pnt_lat = pnt.get_coords()
+ return "%s,%s" % (pnt_lat, pnt_lng)
+
+ def convert(self, value):
+ if value is None:
+ return None
+
+ if hasattr(value, 'geom_type'):
+ value = ensure_point(value)
+ return value
+
+ if isinstance(value, basestring):
+ lat, lng = value.split(',')
+ elif isinstance(value, (list, tuple)):
+ # GeoJSON-alike
+ lat, lng = value[1], value[0]
+ elif isinstance(value, dict):
+ lat = value.get('lat', 0)
+ lng = value.get('lon', 0)
+
+ value = Point(float(lng), float(lat))
+ return value
+
+
class NgramField(CharField):
field_type = 'ngram'
-
+
def __init__(self, **kwargs):
if kwargs.get('faceted') is True:
raise SearchFieldError("%s can not be faceted." % self.__class__.__name__)
-
+
super(NgramField, self).__init__(**kwargs)
@@ -175,150 +209,150 @@ class EdgeNgramField(NgramField):
class IntegerField(SearchField):
field_type = 'integer'
-
+
def __init__(self, **kwargs):
if kwargs.get('facet_class') is None:
kwargs['facet_class'] = FacetIntegerField
-
+
super(IntegerField, self).__init__(**kwargs)
-
+
def prepare(self, obj):
return self.convert(super(IntegerField, self).prepare(obj))
-
+
def convert(self, value):
if value is None:
return None
-
+
return int(value)
class FloatField(SearchField):
field_type = 'float'
-
+
def __init__(self, **kwargs):
if kwargs.get('facet_class') is None:
kwargs['facet_class'] = FacetFloatField
-
+
super(FloatField, self).__init__(**kwargs)
-
+
def prepare(self, obj):
return self.convert(super(FloatField, self).prepare(obj))
-
+
def convert(self, value):
if value is None:
return None
-
+
return float(value)
class DecimalField(SearchField):
field_type = 'string'
-
+
def __init__(self, **kwargs):
if kwargs.get('facet_class') is None:
kwargs['facet_class'] = FacetDecimalField
-
+
super(DecimalField, self).__init__(**kwargs)
-
+
def prepare(self, obj):
return self.convert(super(DecimalField, self).prepare(obj))
-
+
def convert(self, value):
if value is None:
return None
-
+
return unicode(value)
class BooleanField(SearchField):
field_type = 'boolean'
-
+
def __init__(self, **kwargs):
if kwargs.get('facet_class') is None:
kwargs['facet_class'] = FacetBooleanField
-
+
super(BooleanField, self).__init__(**kwargs)
-
+
def prepare(self, obj):
return self.convert(super(BooleanField, self).prepare(obj))
-
+
def convert(self, value):
if value is None:
return None
-
+
return bool(value)
class DateField(SearchField):
field_type = 'date'
-
+
def __init__(self, **kwargs):
if kwargs.get('facet_class') is None:
kwargs['facet_class'] = FacetDateField
-
+
super(DateField, self).__init__(**kwargs)
-
+
def convert(self, value):
if value is None:
return None
-
+
if isinstance(value, basestring):
match = DATETIME_REGEX.search(value)
-
+
if match:
data = match.groupdict()
return datetime_safe.date(int(data['year']), int(data['month']), int(data['day']))
else:
raise SearchFieldError("Date provided to '%s' field doesn't appear to be a valid date string: '%s'" % (self.instance_name, value))
-
+
return value
class DateTimeField(SearchField):
field_type = 'datetime'
-
+
def __init__(self, **kwargs):
if kwargs.get('facet_class') is None:
kwargs['facet_class'] = FacetDateTimeField
-
+
super(DateTimeField, self).__init__(**kwargs)
-
+
def convert(self, value):
if value is None:
return None
-
+
if isinstance(value, basestring):
match = DATETIME_REGEX.search(value)
-
+
if match:
data = match.groupdict()
return datetime_safe.datetime(int(data['year']), int(data['month']), int(data['day']), int(data['hour']), int(data['minute']), int(data['second']))
else:
raise SearchFieldError("Datetime provided to '%s' field doesn't appear to be a valid datetime string: '%s'" % (self.instance_name, value))
-
+
return value
class MultiValueField(SearchField):
field_type = 'string'
-
+
def __init__(self, **kwargs):
if kwargs.get('facet_class') is None:
kwargs['facet_class'] = FacetMultiValueField
-
+
if kwargs.get('use_template') is True:
raise SearchFieldError("'%s' fields can not use templates to prepare their data." % self.__class__.__name__)
-
+
super(MultiValueField, self).__init__(**kwargs)
self.is_multivalued = True
-
+
def prepare(self, obj):
return self.convert(super(MultiValueField, self).prepare(obj))
-
+
def convert(self, value):
if value is None:
return None
-
+
return list(value)
@@ -326,12 +360,12 @@ class FacetField(SearchField):
"""
``FacetField`` is slightly different than the other fields because it can
work in conjunction with other fields as its data source.
-
+
Accepts an optional ``facet_for`` kwarg, which should be the field name
(not ``index_fieldname``) of the field it should pull data from.
"""
instance_name = None
-
+
def __init__(self, **kwargs):
handled_kwargs = self.handle_facet_parameters(kwargs)
super(FacetField, self).__init__(**handled_kwargs)
@@ -339,28 +373,28 @@ def __init__(self, **kwargs):
def handle_facet_parameters(self, kwargs):
if kwargs.get('faceted', False):
raise SearchFieldError("FacetField (%s) does not accept the 'faceted' argument." % self.instance_name)
-
+
if not kwargs.get('null', True):
raise SearchFieldError("FacetField (%s) does not accept False for the 'null' argument." % self.instance_name)
-
+
if not kwargs.get('indexed', True):
raise SearchFieldError("FacetField (%s) does not accept False for the 'indexed' argument." % self.instance_name)
-
+
if kwargs.get('facet_class'):
raise SearchFieldError("FacetField (%s) does not accept the 'facet_class' argument." % self.instance_name)
-
+
self.facet_for = None
self.facet_class = None
-
+
# Make sure the field is nullable.
kwargs['null'] = True
-
+
if 'facet_for' in kwargs:
self.facet_for = kwargs['facet_for']
del(kwargs['facet_for'])
-
+
return kwargs
-
+
def get_facet_for_name(self):
return self.facet_for or self.instance_name
115 haystack/models.py
@@ -4,7 +4,13 @@
from django.db import models
from django.utils.encoding import force_unicode
from django.utils.text import capfirst
-from haystack.exceptions import NotHandled
+from haystack.exceptions import NotHandled, SpatialError
+from haystack.utils.geo import Distance
+
+try:
+ from geopy import distance as geopy_distance
+except ImportError:
+ geopy_distance = None
# Not a Django model, but tightly tied to them and there doesn't seem to be a
@@ -13,7 +19,7 @@ class SearchResult(object):
"""
A single search result. The actual object is loaded lazily by accessing
object; until then this object only stores the model, pk, and score.
-
+
Note that iterating over SearchResults and getting the object for each
result will do O(N) database queries, which may not fit your needs for
performance.
@@ -26,27 +32,29 @@ def __init__(self, app_label, model_name, pk, score, **kwargs):
self._model = None
self._verbose_name = None
self._additional_fields = []
+ self._point_of_origin = kwargs.pop('_point_of_origin', None)
+ self._distance = kwargs.pop('_distance', None)
self.stored_fields = None
self.log = self._get_log()
-
+
for key, value in kwargs.items():
if not key in self.__dict__:
self.__dict__[key] = value
self._additional_fields.append(key)
-
+
def _get_log(self):
return logging.getLogger('haystack')
-
+
def __repr__(self):
return "<SearchResult: %s.%s (pk=%r)>" % (self.app_label, self.model_name, self.pk)
-
+
def __unicode__(self):
return force_unicode(self.__repr__())
-
+
def __getattr__(self, attr):
if attr == '__getnewargs__':
raise AttributeError
-
+
return self.__dict__.get(attr, None)
def _get_searchindex(self):
@@ -60,7 +68,7 @@ def _get_object(self):
if self.model is None:
self.log.error("Model could not be found for SearchResult '%s'." % self)
return None
-
+
try:
try:
self._object = self.searchindex.read_queryset().get(pk=self.pk)
@@ -71,93 +79,126 @@ def _get_object(self):
except ObjectDoesNotExist:
self.log.error("Object could not be found in database for SearchResult '%s'." % self)
self._object = None
-
+
return self._object
-
+
def _set_object(self, obj):
self._object = obj
-
+
object = property(_get_object, _set_object)
-
+
def _get_model(self):
if self._model is None:
self._model = models.get_model(self.app_label, self.model_name)
-
+
return self._model
-
+
def _set_model(self, obj):
self._model = obj
-
+
model = property(_get_model, _set_model)
-
+
+ def _get_distance(self):
+ if self._distance is None:
+ # We didn't get it from the backend & we haven't tried calculating
+ # it yet. Check if geopy is available to do it the "slow" way
+ # (even though slow meant 100 distance calculations in 0.004 seconds
+ # in my testing).
+ if geopy_distance is None:
+ raise SpatialError("The backend doesn't have 'DISTANCE_AVAILABLE' enabled & the 'geopy' library could not be imported, so distance information is not available.")
+
+ if not self._point_of_origin:
+ raise SpatialError("The original point is not available.")
+
+ if not hasattr(self, self._point_of_origin['field']):
+ raise SpatialError("The field '%s' was not included in search results, so the distance could not be calculated." % self._point_of_origin['field'])
+
+ po_lng, po_lat = self._point_of_origin['point'].get_coords()
+ location_field = getattr(self, self._point_of_origin['field'])
+
+ if location_field is None:
+ return None
+
+ lf_lng, lf_lat = location_field.get_coords()
+ self._distance = Distance(km=geopy_distance.distance((po_lat, po_lng), (lf_lat, lf_lng)).km)
+
+ # We've either already calculated it or the backend returned it, so
+ # let's use that.
+ return self._distance
+
+ def _set_distance(self, dist):
+ self._distance = dist
+
+ distance = property(_get_distance, _set_distance)
+
def _get_verbose_name(self):
if self.model is None:
self.log.error("Model could not be found for SearchResult '%s'." % self)
return u''
-
+
return force_unicode(capfirst(self.model._meta.verbose_name))
-
+
verbose_name = property(_get_verbose_name)
-
+
def _get_verbose_name_plural(self):
if self.model is None:
self.log.error("Model could not be found for SearchResult '%s'." % self)
return u''
-
+
return force_unicode(capfirst(self.model._meta.verbose_name_plural))
-
+
verbose_name_plural = property(_get_verbose_name_plural)
-
+
def content_type(self):
"""Returns the content type for the result's model instance."""
if self.model is None:
self.log.error("Model could not be found for SearchResult '%s'." % self)
return u''
-
+
return unicode(self.model._meta)
-
+
def get_additional_fields(self):
"""
Returns a dictionary of all of the fields from the raw result.
-
+
Useful for serializing results. Only returns what was seen from the
search engine, so it may have extra fields Haystack's indexes aren't
aware of.
"""
additional_fields = {}
-
+
for fieldname in self._additional_fields:
additional_fields[fieldname] = getattr(self, fieldname)
-
+
return additional_fields
-
+
def get_stored_fields(self):
"""
Returns a dictionary of all of the stored fields from the SearchIndex.
-
+
Useful for serializing results. Only returns the fields Haystack's
indexes are aware of as being 'stored'.
"""
if self._stored_fields is None:
from haystack import connections
from haystack.exceptions import NotHandled
-
+
try:
index = connections['default'].get_unified_index().get_index(self.model)
except NotHandled:
# Not found? Return nothing.
return {}
-
+
self._stored_fields = {}
-
+
# Iterate through the index's fields, pulling out the fields that
# are stored.
for fieldname, field in index.fields.items():
if field.stored is True:
self._stored_fields[fieldname] = getattr(self, fieldname, u'')
-
+
return self._stored_fields
-
+
def __getstate__(self):
"""
Returns a dictionary representing the ``SearchResult`` in order to
@@ -168,7 +209,7 @@ def __getstate__(self):
ret_dict = self.__dict__.copy()
del(ret_dict['log'])
return ret_dict
-
+
def __setstate__(self, data_dict):
"""
Updates the object's attributes according to data passed by pickle.
@@ -181,7 +222,7 @@ def __setstate__(self, data_dict):
# ``RealTimeSearchIndex`` are setup in time to handle data changes.
def load_indexes(sender, instance, *args, **kwargs):
from haystack import connections
-
+
for conn in connections.all():
conn.get_unified_index().setup_indexes()
27 haystack/query.py
@@ -313,6 +313,12 @@ def order_by(self, *args):
return clone
+ def order_by_distance(self, **kwargs):
+ """Alters the order in which the results should appear."""
+ clone = self._clone()
+ clone.query.add_order_by_distance(**kwargs)
+ return clone
+
def highlight(self):
"""Adds highlighting to the results."""
clone = self._clone()
@@ -354,6 +360,27 @@ def facet(self, field):
clone.query.add_field_facet(field)
return clone
+ def within(self, field, point_1, point_2):
+ """Spatial: Adds a bounding box search to the query."""
+ clone = self._clone()
+ clone.query.add_within(field, point_1, point_2)
+ return clone
+
+ def dwithin(self, field, point, distance):
+ """Spatial: Adds a distance-based search to the query."""
+ clone = self._clone()
+ clone.query.add_dwithin(field, point, distance)
+ return clone
+
+ def distance(self, field, point):
+ """
+ Spatial: Denotes results must have distance measurements from the
+ provided point.
+ """
+ clone = self._clone()
+ clone.query.add_distance(field, point)
+ return clone
+
def date_facet(self, field, start_date, end_date, gap_by, gap_amount=1):
"""Adds faceting to a query for the provided field by date."""
clone = self._clone()
103 haystack/templates/search_configuration/solr.xml
@@ -16,42 +16,82 @@
limitations under the License.
-->
-<schema name="default" version="1.1">
+<schema name="default" version="1.4">
<types>
<fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
+ <fieldtype name="binary" class="solr.BinaryField"/>
<!-- Numeric field types that manipulate the value into
a string value that isn't human-readable in its internal form,
but with a lexicographic ordering the same as the numeric ordering,
so that range queries work correctly. -->
- <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
- <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
- <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
- <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
+ <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" sortMissingLast="true" positionIncrementGap="0"/>
+ <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" sortMissingLast="true" positionIncrementGap="0"/>
+ <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" sortMissingLast="true" positionIncrementGap="0"/>
+ <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" sortMissingLast="true" positionIncrementGap="0"/>
- <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
+ <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
+ <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
+ <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
+ <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
- <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
+ <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>
+ <!-- A Trie based date field for faster date range queries and date faceting. -->
+ <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
+
+ <fieldType name="point" class="solr.PointType" dimension="2" subFieldSuffix="_d"/>
+ <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
+ <fieldtype name="geohash" class="solr.GeoHashField"/>
+
+ <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
- <tokenizer class="solr.WhitespaceTokenizerFactory"/>
+ <tokenizer class="solr.StandardTokenizerFactory"/>
+ <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<!-- in this example, we will only use synonyms at query time
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
-->
- <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
- <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
- <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
- <tokenizer class="solr.WhitespaceTokenizerFactory"/>
+ <tokenizer class="solr.StandardTokenizerFactory"/>
+ <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
- <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
- <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
- <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
+ </analyzer>
+ </fieldType>
+
+ <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
+ <analyzer type="index">
+ <tokenizer class="solr.StandardTokenizerFactory"/>
+ <filter class="solr.StopFilterFactory"
+ ignoreCase="true"
+ words="stopwords_en.txt"
+ enablePositionIncrements="true"
+ />
+ <filter class="solr.LowerCaseFilterFactory"/>
+ <filter class="solr.EnglishPossessiveFilterFactory"/>
+ <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
+ <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
+ <filter class="solr.EnglishMinimalStemFilterFactory"/>
+ -->
+ <filter class="solr.PorterStemFilterFactory"/>
+ </analyzer>
+ <analyzer type="query">
+ <tokenizer class="solr.StandardTokenizerFactory"/>
+ <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
+ <filter class="solr.StopFilterFactory"
+ ignoreCase="true"
+ words="stopwords_en.txt"
+ enablePositionIncrements="true"
+ />
+ <filter class="solr.LowerCaseFilterFactory"/>
+ <filter class="solr.EnglishPossessiveFilterFactory"/>
+ <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
+ <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
+ <filter class="solr.EnglishMinimalStemFilterFactory"/>
+ -->
+ <filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>
@@ -60,7 +100,7 @@
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
</fieldType>
-
+
<fieldType name="ngram" class="solr.TextField" >
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
@@ -72,7 +112,7 @@
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
-
+
<fieldType name="edge_ngram" class="solr.TextField" positionIncrementGap="1">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory" />
@@ -88,21 +128,23 @@
</fieldType>
</types>
- <fields>
+ <fields>
<!-- general -->
<field name="{{ ID }}" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
- <field name="{{ DJANGO_CT }}" type="string" indexed="true" stored="true" multiValued="false" />
- <field name="{{ DJANGO_ID }}" type="string" indexed="true" stored="true" multiValued="false" />
+ <field name="{{ DJANGO_CT }}" type="string" indexed="true" stored="true" multiValued="false"/>
+ <field name="{{ DJANGO_ID }}" type="string" indexed="true" stored="true" multiValued="false"/>
- <dynamicField name="*_i" type="sint" indexed="true" stored="true"/>
+ <dynamicField name="*_i" type="int" indexed="true" stored="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
- <dynamicField name="*_l" type="slong" indexed="true" stored="true"/>
- <dynamicField name="*_t" type="text" indexed="true" stored="true"/>
+ <dynamicField name="*_l" type="long" indexed="true" stored="true"/>
+ <dynamicField name="*_t" type="text_en" indexed="true" stored="true"/>
<dynamicField name="*_b" type="boolean" indexed="true" stored="true"/>
- <dynamicField name="*_f" type="sfloat" indexed="true" stored="true"/>
- <dynamicField name="*_d" type="sdouble" indexed="true" stored="true"/>
- <dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
-
+ <dynamicField name="*_f" type="float" indexed="true" stored="true"/>
+ <dynamicField name="*_d" type="double" indexed="true" stored="true"/>
+ <dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
+ <dynamicField name="*_p" type="location" indexed="true" stored="true"/>
+ <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
+
{% for field in fields %}
<field name="{{ field.field_name }}" type="{{ field.type }}" indexed="{{ field.indexed }}" stored="{{ field.stored }}" multiValued="{{ field.multi_valued }}" />
{% endfor %}
@@ -115,6 +157,5 @@
<defaultSearchField>{{ content_field_name }}</defaultSearchField>
<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
- <solrQueryParser defaultOperator="{{ default_operator }}" />
+ <solrQueryParser defaultOperator="{{ default_operator }}"/>
</schema>
-
74 haystack/utils/geo.py
@@ -0,0 +1,74 @@
+from django.contrib.gis.geos import Point
+from django.contrib.gis.measure import Distance, D
+from haystack.constants import WGS_84_SRID
+from haystack.exceptions import SpatialError
+
+
+def ensure_geometry(geom):
+    """
+    Makes sure the parameter passed in looks like a GEOS ``GEOSGeometry``.
+    """
+    if not hasattr(geom, 'geom_type'):
+        raise SpatialError("Point '%s' doesn't appear to be a GEOS geometry." % geom)
+
+    return geom
+
+
+def ensure_point(geom):
+    """
+    Makes sure the parameter passed in looks like a GEOS ``Point``.
+    """
+    ensure_geometry(geom)
+
+    if geom.geom_type != 'Point':
+        raise SpatialError("Provided geometry '%s' is not a 'Point'." % geom)
+
+    return geom
+
+
+def ensure_wgs84(point):
+    """
+    Ensures the point passed in is a GEOS ``Point`` & returns a copy of that
+    point with its data in the WGS-84 spatial reference.
+    """
+    ensure_point(point)
+    # Clone it so we don't alter the original, in case they're using it for
+    # something else.
+    new_point = point.clone()
+
+    if not new_point.srid:
+        # It has no spatial reference id. Assume WGS-84.
+        new_point.set_srid(WGS_84_SRID)
+    elif new_point.srid != WGS_84_SRID:
+        # Transform it to get to the right system.
+        new_point.transform(WGS_84_SRID)
+
+    return new_point
+
+
+def ensure_distance(dist):
+    """
+    Makes sure the parameter passed in is a 'Distance' object.
+    """
+    try:
+        # Since we mostly only care about the ``.km`` attribute, make sure
+        # it's there.
+        km = dist.km
+    except AttributeError:
+        raise SpatialError("'%s' does not appear to be a 'Distance' object." % dist)
+
+    return dist
+
+
+def generate_bounding_box(point_1, point_2):
+    """
+    Takes two opposite corners of a bounding box (in any order) & generates
+    a two-tuple of the correct coordinates for the bounding box.
+
+    The two-tuple is in the form ``((min_lat, min_lng), (max_lat, max_lng))``.
+    """
+    lng_1, lat_1 = point_1.get_coords()
+    lng_2, lat_2 = point_2.get_coords()
+    min_lat, max_lat = min(lat_1, lat_2), max(lat_1, lat_2)
+    min_lng, max_lng = min(lng_1, lng_2), max(lng_1, lng_2)
+    return ((min_lat, min_lng), (max_lat, max_lng))
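As a rough illustration of what ``generate_bounding_box`` in the diff above returns, the sketch below repeats the function body and feeds it a hypothetical ``StubPoint`` class. ``StubPoint`` is not part of Haystack or GeoDjango; it only mimics a ``get_coords()`` call that returns ``(lng, lat)``, so the example runs without GEOS installed. The coordinates are made up.

```python
class StubPoint(object):
    """Hypothetical stand-in for a GEOS ``Point`` (illustration only)."""

    def __init__(self, lng, lat):
        self.lng = lng
        self.lat = lat

    def get_coords(self):
        # Same ordering the diff relies on: longitude first, then latitude.
        return (self.lng, self.lat)


def generate_bounding_box(point_1, point_2):
    # Body copied from the diff: order the two corners so the result is
    # always ((min_lat, min_lng), (max_lat, max_lng)).
    lng_1, lat_1 = point_1.get_coords()
    lng_2, lat_2 = point_2.get_coords()
    min_lat, max_lat = min(lat_1, lat_2), max(lat_1, lat_2)
    min_lng, max_lng = min(lng_1, lng_2), max(lng_1, lng_2)
    return ((min_lat, min_lng), (max_lat, max_lng))


# Two opposite corners, given as (lng, lat).
corner_1 = StubPoint(-94.5, 39.25)
corner_2 = StubPoint(-95.75, 38.5)

# The argument order should not matter.
box_a = generate_bounding_box(corner_1, corner_2)
box_b = generate_bounding_box(corner_2, corner_1)
print(box_a)
```

Note that the points carry ``(lng, lat)`` while the returned tuples are ``(lat, lng)``, so callers need to keep the two orderings straight.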
10 tests/core/models.py
@@ -59,4 +59,12 @@ class AFifthMockModel(models.Model):
     objects = SoftDeleteManager()

     def __unicode__(self):
-        return self.author
+        return self.author
+
+class ASixthMockModel(models.Model):
+    name = models.CharField(max_length=255)
+    lat = models.FloatField()
+    lon = models.FloatField()
+
+    def __unicode__(self):
+        return self.name
23 tests/overrides/tests/altered_internal_names.py
@@ -13,7 +13,7 @@ class MockModelSearchIndex(indexes.SearchIndex, indexes.Indexable):
text = indexes.CharField(model_attr='foo', document=True)
name = indexes.CharField(model_attr='author')
pub_date = indexes.DateField(model_attr='pub_date')