Support for fuzzy searches? #164

Open
toastdriven opened this issue Jan 13, 2010 · 12 comments

Comments

@toastdriven
Contributor

Solr supports fuzzy searching ("roam~" - http://lucene.apache.org/java/2_4_0/queryparsersyntax.html) which a lot of users have been looking for (perhaps better than wildcard search).

Investigate if Xapian/Whoosh support this and implement if so.

@notanumber
Contributor

Xapian does not support this as far as I can tell. However, it could possibly be faked using spelling suggestions.

@gilnyc

gilnyc commented Mar 4, 2010

I think Xapian does support this but the feature is turned off by default (see below from the documentation). Is there a way to supply flag arguments to Xapian (and presumably other backends) using the haystack framework?

Wildcards
The QueryParser supports using a trailing '*' wildcard, which matches any number of trailing characters, so wildc* would match wildcard, wildcarded, wildcards, wildcat, wildcats, etc. This feature is disabled by default - pass Xapian::QueryParser::FLAG_WILDCARD in the flags argument of Xapian::QueryParser::parse_query(query_string, flags) to enable it, and tell the QueryParser which database to expand wildcards from using the QueryParser::set_database(database) method.

@notanumber
Contributor

FLAG_WILDCARD isn't quite the same as fuzzy search. In fact, the xapian backend code is using FLAG_WILDCARD to support the '*' operator.

Fuzzy search is a more complicated problem.

For example, given an index with the following terms:

'foo', 'food', 'foobar', 'for', 'four',

Searching for 'foo' with wildcard enabled would return:

'foo', 'food', and 'foobar'

When using fuzzy search (depending on the proximity), we'd get all of the terms.
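To make the difference concrete, here is a minimal sketch in plain Python. It assumes "proximity" means Levenshtein edit distance, which is only one of several possible similarity measures, and the helper names (`levenshtein`, `wildcard_match`, `fuzzy_match`) are made up for illustration; none of this is Haystack or Xapian code.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


def wildcard_match(prefix, terms):
    """Trailing-wildcard behaviour: keep terms that start with the prefix."""
    return [t for t in terms if t.startswith(prefix)]


def fuzzy_match(query, terms, max_distance):
    """Fuzzy behaviour: keep terms within max_distance edits of the query."""
    return [t for t in terms if levenshtein(query, t) <= max_distance]


TERMS = ["foo", "food", "foobar", "for", "four"]

print("wildcard:", wildcard_match("foo", TERMS))
for distance in (1, 2, 3):
    print("fuzzy, max distance", distance, ":", fuzzy_match("foo", TERMS, distance))
```

With this measure, a maximum distance of 1 already picks up 'for' (which the wildcard misses) but drops 'foobar'; 'four' appears at distance 2, and all five terms only match once the distance reaches 3.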

See this Wikipedia article for more info: http://en.wikipedia.org/wiki/Approximate_string_matching

Just to clarify, I'm not saying it can't be done. Just that it's a little bit more complex than simply enabling FLAG_WILDCARD.

@gilnyc

gilnyc commented Mar 4, 2010

I didn't mean to imply that FLAG_WILDCARD is the end-all for fuzzy matching. I guess there are 2 issues:

  1. Haystack could have a generic method for passing backend-specific keywords. Another one I could use right now is FLAG_PARTIAL.
  2. For fuzzy matching I'm now using a combination of icontain and 'more like this' and will add a phonetic capability (double-metaphone or whatever I can find a Python implementation for). I'll also read up on what you suggested (e.g. n-gram, suffix-tree). Together they'll probably address what I need without too much custom coding for a quick-and-dirty fuzzy match.

On a somewhat different note, is there an example of sqs.spelling_suggestion? I'm not sure what it's supposed to return and so haven't been able to build a test case to see if I got it working (would it suggest, for example, "barn" for "barne"?). Also, with Xapian I can't get .highlight to work - I didn't think I needed any special HTML tags to get this to work. Again, maybe I don't understand what it is actually supposed to be doing (I assumed it would highlight the query text in the results with some color).

@LucianU

LucianU commented Feb 7, 2013

Whoosh supports this through the FuzzyTerm class found in whoosh.query.
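A rough sketch of what that looks like when talking to Whoosh directly (not through Haystack). It assumes Whoosh is installed and that `FuzzyTerm` accepts a `maxdist` keyword, as I recall from the Whoosh docs; the `fuzzy_titles` helper, the `title` field and the in-memory index are invented for the example.

```python
def fuzzy_titles(titles, word, maxdist=1):
    """Index the titles in memory and return those matching `word` fuzzily.

    Hypothetical helper: builds a throwaway Whoosh index on each call,
    which is fine for a demo but not for real use.
    """
    # Imported here so it is obvious which names come from Whoosh.
    from whoosh.fields import Schema, TEXT
    from whoosh.filedb.filestore import RamStorage
    from whoosh.query import FuzzyTerm

    schema = Schema(title=TEXT(stored=True))
    index = RamStorage().create_index(schema)

    writer = index.writer()
    for title in titles:
        writer.add_document(title=title)
    writer.commit()

    # maxdist is the maximum number of edits allowed between `word`
    # and an indexed term.
    query = FuzzyTerm("title", word, maxdist=maxdist)
    with index.searcher() as searcher:
        return [hit["title"] for hit in searcher.search(query)]


# Example call (requires Whoosh):
#   fuzzy_titles(["roam", "room", "roams", "roaming"], "roam")
```

Whether and how the Haystack Whoosh backend exposes this is a separate question; the sketch only shows the underlying Whoosh query class.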

@webyneter

webyneter commented Apr 10, 2017

Am I right that today there's no way to make Solr-driven fuzzy search work via haystack?

@acdha
Contributor

acdha commented Apr 10, 2017

@webyneter No - it works in every backend which supports it. What we don't have is an abstraction layer which tries to cover the differences between the different search engines, since that's a fairly hard problem for all but the most basic use-cases.

http://django-haystack.readthedocs.io/en/v2.6.0/inputtypes.html#raw
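As a hedged illustration of the Raw route for Solr: the helper below (`lucene_fuzzy`, a made-up name) only builds the "roam~" style string from the Lucene query syntax linked at the top of this issue, and the commented lines show how it could be handed to Haystack's `Raw` input type. The meaning of the optional number after `~` depends on the Lucene/Solr version, so treat that parameter as an assumption to check against your backend.

```python
def lucene_fuzzy(term, parameter=None):
    """Build a Lucene-style fuzzy query for a single term, e.g. 'roam~'.

    `parameter` is appended after the tilde when given (e.g. 'roam~0.8');
    how the backend interprets it varies between Lucene/Solr versions.
    No escaping of special query characters is attempted here.
    """
    term = term.strip()
    if not term or any(ch.isspace() for ch in term):
        raise ValueError("the fuzzy operator applies to a single term")
    return f"{term}~" if parameter is None else f"{term}~{parameter}"


print(lucene_fuzzy("roam"))
print(lucene_fuzzy("roam", 0.8))

# In a Django project with Haystack configured against Solr, the string
# can be passed through untouched via the Raw input type:
#
#   from haystack.inputs import Raw
#   from haystack.query import SearchQuerySet
#
#   results = SearchQuerySet().filter(content=Raw(lucene_fuzzy("roam")))
```

Because Raw bypasses Haystack's query cleaning, the resulting query is backend-specific and will not carry over to Xapian or Whoosh unchanged.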

@webyneter

@acdha, yep, I've just stumbled upon the Field Lookups section in the docs. Thanks for the quick response.

@SalahAdDin

Maybe it would be good to provide a way of extending haystack for people who need to add more features; that would make it easy to implement this kind of feature yourself.

@acdha
Contributor

acdha commented Apr 17, 2017

@SalahAdDin That's exactly what the input types system is designed to do: see http://django-haystack.readthedocs.io/en/v2.6.0/inputtypes.html

@divonelnc

I am using haystack with Whoosh as a backend. Is there any way to perform a fuzzy search with those? SearchQuerySet().filter(content=search_string) does not seem to handle fuzziness.

@prafgup

prafgup commented Jun 30, 2021

@divonelnc Any update regarding implementing fuzziness with Whoosh?

9 participants