How to use JSON queries with Haystack using Elasticsearch? #927

Open
rogaha opened this issue Jan 17, 2014 · 12 comments

@rogaha

rogaha commented Jan 17, 2014

I would like to use the following JSON query with Haystack, but I cannot find a way to send raw JSON queries instead of using SearchQuerySet (there seems to be no way to express this query with SearchQuerySet).

{
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "term": {
                                "django_ct": "repositories.repository"
                            }
                        },
                        {
                            "term": {
                                "status": 1
                            }
                        },
                        {
                            "term": {
                                "is_private": false
                            }
                        },
                        {
                            "term": {
                                "is_library": false
                            }
                        }
                    ]
                }
            },
            "query": {
                "function_score": {
                    "query": {
                        "bool": {
                            "should": [
                                {
                                    "match": {
                                        "name": {
                                            "query": "hipache"
                                        }
                                    }
                                },
                                {
                                    "custom_boost_factor": {
                                        "boost_factor": 0.005,
                                        "query": {
                                            "multi_match": {
                                                "fields": [
                                                    "name_auto^3",
                                                    "name_auto.partial^2",
                                                    "name_auto.partial_back",
                                                    "name_auto.partial_middle"
                                                ],
                                                "query": "hipache"
                                            }
                                        }
                                    }
                                }
                            ]
                        }
                    },
                    "script_score": {
                        "script": "_score * (1 + 10.0*log(1 + doc['pull_count'].value))"
                    }
                }
            }
        }
    }
}
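The same request body can be assembled in Python as a plain dict, which is what the low-level client ultimately sends. This is a minimal sketch, assuming the field names used in the query above; `build_repository_query` and its keyword arguments are hypothetical. Note that `filtered` and `custom_boost_factor` are query types from the Elasticsearch versions of that time and no longer exist in recent releases.

```python
def build_repository_query(term, status=1, is_private=False, is_library=False):
    """Return the request body from the question as a Python dict.

    Hypothetical helper: the field names (django_ct, status, is_private,
    is_library, name, name_auto, pull_count) come from the JSON above.
    """
    # Exact-match filters, mirroring "filtered" -> "filter" -> "bool" -> "must".
    must = [
        {"term": {"django_ct": "repositories.repository"}},
        {"term": {"status": status}},
        {"term": {"is_private": is_private}},
        {"term": {"is_library": is_library}},
    ]
    # Scoring part: a full-text match on "name" plus a lightly boosted
    # multi_match over the autocomplete sub-fields.
    should = [
        {"match": {"name": {"query": term}}},
        {
            "custom_boost_factor": {
                "boost_factor": 0.005,
                "query": {
                    "multi_match": {
                        "fields": [
                            "name_auto^3",
                            "name_auto.partial^2",
                            "name_auto.partial_back",
                            "name_auto.partial_middle",
                        ],
                        "query": term,
                    }
                },
            }
        },
    ]
    return {
        "query": {
            "filtered": {
                "filter": {"bool": {"must": must}},
                "query": {
                    "function_score": {
                        "query": {"bool": {"should": should}},
                        # Popularity boost based on the pull_count field.
                        "script_score": {
                            "script": "_score * (1 + 10.0*log(1 + doc['pull_count'].value))"
                        },
                    }
                },
            }
        }
    }
```

Building the body in code keeps the search term and the filter values as parameters instead of string-templating JSON.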
@honzakral
Contributor

Currently there is no clean and direct way to do it. I'd suggest either extending the backend to allow for such queries, or dropping down to the low-level client (by accessing the conn attribute on the backend) and then calling backend._process_results to hook back into Haystack's result parsing. See the backend's search method for exactly how to hook into it.
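A minimal sketch of the second option, assuming a Haystack Elasticsearch backend whose `conn` is an elasticsearch-py client (so the request body is passed as `body=`) and the `modelresult` document type used elsewhere in this thread. The helper name `raw_search` is hypothetical, and `_process_results()` is a private method, so its behaviour may change between Haystack releases.

```python
def raw_search(backend, body, doc_type="modelresult"):
    """Run a raw Elasticsearch body and parse it with Haystack.

    Hypothetical helper. Assumes ``backend`` behaves like Haystack's
    Elasticsearch backend: it has ``conn`` (an elasticsearch-py client),
    ``index_name``, ``setup_complete``/``setup()`` and ``_process_results()``.
    """
    # The backend's own search() runs setup() lazily; do the same here so
    # the index and mapping are prepared before querying.
    if not getattr(backend, "setup_complete", True):
        backend.setup()
    # Drop down to the low-level client with the raw JSON body.
    raw_results = backend.conn.search(
        body=body, index=backend.index_name, doc_type=doc_type
    )
    # Hand the raw response back to Haystack, which returns a dict with
    # "results" (SearchResult objects), "hits" and related keys.
    return backend._process_results(raw_results)
```

For example, `raw_search(connections['default'].get_backend(), {"query": {"match": {"name": "hipache"}}})`, with `connections` imported from `haystack`, would return the same kind of dict the backend's own search method produces.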

I will leave this open to serve as a place holder for the plug-in functionality I'd like to have - ability to provide a raw query and have haystack's query set functionality wrap it.

@rogaha
Author

rogaha commented Jan 17, 2014

Ok, cool! Thanks for all the description. Let's see if I can do what you suggested.

@rogaha
Author

rogaha commented Jan 18, 2014

Hi @honzakral, now it's working! I created custom search and build_search_kwargs methods to perform the custom queries! :)

Thank you very much for your help!

@honi

honi commented Jan 25, 2014

@rogaha could you please share your modifications? I'm after the same thing.

Did you manage to use the JSON query from a SearchQuerySet?

What I've managed to do so far is something like this:

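from haystack import connections

# query is the JSON query, as a Python dict
backend = connections.all()[0].get_backend()
# Query passed positionally, as pyelasticsearch expects; with elasticsearch-py use body=query
raw_results = backend.conn.search(query, index=backend.index_name, doc_type='modelresult')
# Let Haystack turn the raw hits into SearchResult objects
results = backend._process_results(raw_results)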

This actually works, but I feel like a better approach would be something like:

sqs = SearchQuerySet().raw_query(query)

That way, I suppose, it would also be possible to use the other methods provided by SearchQuerySet, such as using(), load_all(), etc.
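Until something like `raw_query()` exists, part of that idea can be approximated with a small lazy wrapper. This is a minimal sketch under stated assumptions: `RawSearchQuerySet` is hypothetical (not a Haystack class), it only covers counting and slicing (not `using()` or `load_all()`), and the backend is assumed to behave like Haystack's Elasticsearch backend (`conn`, `index_name`, `_process_results()`) with an elasticsearch-py client.

```python
class RawSearchQuerySet:
    """Lazy, sliceable wrapper around a raw Elasticsearch body (hypothetical)."""

    def __init__(self, backend, body, doc_type="modelresult"):
        self.backend = backend
        self.body = body
        self.doc_type = doc_type
        self._hit_count = None

    def _fetch(self, start, size):
        # Copy the body so paging never mutates the caller's query.
        body = {**self.body, "from": start, "size": size}
        raw = self.backend.conn.search(
            body=body, index=self.backend.index_name, doc_type=self.doc_type
        )
        processed = self.backend._process_results(raw)
        self._hit_count = processed.get("hits", 0)
        return processed.get("results", [])

    def count(self):
        # A size-0 request is enough to learn the total number of hits.
        if self._hit_count is None:
            self._fetch(0, 0)
        return self._hit_count

    def __len__(self):
        return self.count()

    def __getitem__(self, key):
        # Translate Python slices into Elasticsearch "from"/"size" paging.
        if isinstance(key, slice):
            if key.step is not None:
                raise ValueError("slice steps are not supported")
            start = key.start or 0
            stop = key.stop if key.stop is not None else self.count()
            if start < 0 or stop < 0:
                raise ValueError("negative indices are not supported")
            return self._fetch(start, max(stop - start, 0))
        if key < 0:
            raise IndexError("negative indices are not supported")
        results = self._fetch(key, 1)
        if not results:
            raise IndexError(key)
        return results[0]
```

Keeping paging on the server via `from`/`size` is similar in spirit to how SearchQuerySet slicing is turned into offsets by the backend; supporting `using()` or `load_all()` as well would mean subclassing SearchQuerySet and its query class instead.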

@rogaha
Author

rogaha commented Jan 25, 2014

Hi @honi,

I extended the ElasticsearchSearchBackend class with a custom_search() method that takes the query terms and builds the JSON query in Elasticsearch's format. Then I overrode the method I was using from SearchQuerySet(), which was autocomplete(). So now my autocomplete looks like this:

import re

from haystack.query import SearchQuerySet


class CustomSearchQuerySet(SearchQuerySet):  # illustrative subclass name

    def autocomplete(self, is_private=None, is_library=None, status=None, **kwargs):
        """
        A shortcut method to perform an autocomplete search.
        Must be run against fields that are either ``NgramField`` or
        ``EdgeNgramField``.
        """
        clone = self._clone()
        query_bits = []
        for field_name, query in kwargs.items():
            if not query:
                continue
            for word in query.split():
                bit = clone.query.clean(word)
                # fixes the issue with '/' from elasticsearch parser (issue: #2980)
                bit = bit.replace("/", "\\/")
                # Validate the term before adding it to the Elasticsearch request
                if re.match(r'[\w-]+', word):
                    if '.' in field_name:
                        # An explicit sub-field was requested: search only that one
                        fields = [field_name]
                    else:
                        # Search the main field and its n-gram sub-fields, with boosts
                        fields = [
                            field_name + '^3',
                            field_name + '.partial^2',
                            field_name + '.partial_back',
                            field_name + '.partial_middle'
                        ]
                    query_bits.append({bit: fields})
        if query_bits:
            # custom_search() is the method added to the ElasticsearchSearchBackend subclass
            results = clone.query.backend.custom_search(query_bits,
                                                        status=status,
                                                        is_private=is_private,
                                                        is_library=is_library)
            # Store the results on the query so get_results() does not search again
            clone.query._results = results.get('results', [])
            clone.query._hit_count = results.get('hits', 0)
            return clone.query.get_results()
        return []
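The body of custom_search() isn't shown in the thread. One possible way to turn the `query_bits` built above into a request body is sketched below; `build_custom_body` is hypothetical, and requiring every word to match is an assumption borrowed from SearchQuerySet.autocomplete(), which ANDs its terms. The `filtered` wrapper matches the Elasticsearch 1.x-era JSON in the question.

```python
def build_custom_body(query_bits, **filters):
    """Build an Elasticsearch body from autocomplete()'s query_bits.

    Hypothetical stand-in for the internals of custom_search():
    ``query_bits`` is a list of ``{term: [field, ...]}`` dicts and
    ``filters`` are exact-match flags such as status or is_private.
    """
    # One multi_match per word; all words must match.
    words = [
        {"multi_match": {"query": term, "fields": fields}}
        for bit in query_bits
        for term, fields in bit.items()
    ]
    # Only filter on flags the caller actually supplied (None means "ignore").
    terms = [
        {"term": {name: value}}
        for name, value in sorted(filters.items())
        if value is not None
    ]
    query = {"bool": {"must": words}}
    if not terms:
        return {"query": query}
    # ES 1.x-era "filtered" query, as in the JSON from the question.
    return {
        "query": {
            "filtered": {
                "query": query,
                "filter": {"bool": {"must": terms}},
            }
        }
    }
```

The resulting dict would then go to the low-level client through `backend.conn.search(...)`, with `backend._process_results()` turning the response back into Haystack results, as suggested earlier in the thread.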

@maltem-za

@rogaha
Author

rogaha commented Feb 3, 2016

Awesome, thanks for sharing @maltem-za

@gamesbrainiac

Has this been fixed yet?

@barseghyanartur
Contributor

@gamesbrainiac:

I doubt it will ever be fixed. Haystack is very slow to adopt new features; even crucial must-haves, like Elasticsearch 2.0 integration, are taking a long time to implement.

It has been a good library in the past, and most of us know how to use it, but at the moment I would think twice before using Haystack in new projects at all.

@acdha
Contributor

acdha commented Sep 19, 2016

Remember that Haystack is an entirely volunteer project. If something hasn't been implemented, it usually means that nobody has volunteered to do it or, as in the case of ES2, to get a pull request up to mergeable quality.

In this case, it hits one of the greatest challenges Haystack has: search engines are not as similar to one another as SQL databases are. If I needed this, I'd make a raw query against the backend connection as Holger suggested, since you're already tying your app tightly to ES.

@gamesbrainiac

@acdha I understand that, but what is currently the best way to do it? Should I use backend.conn.search, or would it be better to use the elasticsearch package and its DSL for querying, using Haystack only as a means to push data to ES and not to search it?

@acdha
Contributor

acdha commented Sep 19, 2016

@gamesbrainiac Using backend.conn.search avoids the need to maintain configuration in multiple places if you're using more than one backend server. The other question I'd ask is about response formats: Haystack is designed to return ORM-like results, so you might reasonably set a policy that anything you can easily do with the SearchQuerySet interface stays there, but once you start needing more complexity you move to the full native query interface.
