Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ElasticSearch 2.0 support #1336

Closed
wants to merge 53 commits into from

Conversation

PedroAquilino
Copy link

Gives haystack support for Elasticsearch 2.x

  • Adds a new backend haystack.backends.elasticsearch2_backend.Elasticsearch2SearchEngine
  • Reuses the code and tests for ES 1.x
  • Changes facets (aggregations on ES 2.x)
  • Changes delete_by_query (scroll API and bulk actions)
  • "more like this" (More Like This Query)
  • Modifies the dependecy on elasticsearch library version

Issue:
#1247

Pedro Aquilino added 6 commits March 23, 2016 16:18
- Fix localhost IP in elasticsearch2 settings
- Port to connect 29200
- daemonize ES 2.x
- commented out ES 1.x service
- Fix catching exception on skipping tests
@@ -49,8 +52,8 @@ matrix:
- env: DJANGO_VERSION=">=1.9,<1.10"
- python: "pypy"

services:
- elasticsearch
#services:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to keep testing with 1.x. I'm not sure whether it'd be easier to run the tests with an alternate port or to use a build matrix to pick one of the two. It looks like travis-ci/apt-source-safelist@189b706 was merged awhile back so I think we could use a matrix like what they're doing here:

insomnia-lab/libreant@49fc5d0

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using a build matrix seems the best solution, maybe with only elasticsearch-1.7 and elasticsearch-2.1 versions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1.7 and 2.x seems great to me - the main thing I was thinking is e.g. AWS with the hosted 1.7 used by many people

@acdha
Copy link
Contributor

acdha commented Mar 23, 2016

This is generally looking quite good — testing is always tedious with Haystack but I think you're very close.

@PedroAquilino
Copy link
Author

Finally elasticsearch in travis is installed using tar. The apt sources for es 2.x are broken, the url used is not longer valid.

W: Failed to fetch http://packages.elastic.co/elasticsearch/2.1/debian/dists/stable/main/binary-amd64/Packages  404  Not Found [IP: 184.73.186.87 80]
W: Failed to fetch http://packages.elastic.co/elasticsearch/2.1/debian/dists/stable/main/binary-i386/Packages  404  Not Found [IP: 184.73.186.87 80]

The versions are:
elasticsearch 1.7.5
elasticsearch 2.2.1

@PedroAquilino
Copy link
Author

Rebuilding on travis, I see than sometimes
test_haystack.solr_tests.test_management_commands.ManagementCommandTestCase#test_multiprocessing fails

Pedro Aquilino added 4 commits March 24, 2016 23:49
test_haystack.solr_tests.test_management_commands.ManagementCommandTestCase#test_multiprocessing
test_haystack.solr_tests.test_management_commands.ManagementCommandTestCase#test_multiprocessing
@acdha
Copy link
Contributor

acdha commented Mar 25, 2016

It would probably be a good idea to rebase this on 10bcc76 so to pick up Django 1.9.

re:the tests failing on multiprocessing, that should probably go into a separate ticket – there's a race condition which I have thought was fixed at least twice and it deserves a deeper investigation.

Pedro Aquilino added 7 commits March 25, 2016 19:55
- Fix localhost IP in elasticsearch2 settings
- Port to connect 29200
- daemonize ES 2.x
- commented out ES 1.x service
- Fix catching exception on skipping tests
@Jyrno42
Copy link

Jyrno42 commented Apr 5, 2016

Hey,

i wanted to point out that any query against elasticsearch>=2.1 will fail if one uses a slice where start_offset + end_offset is bigger than index.max_result_window which defaults to 10000. This is generally not an issue since standard use cases will not do this.

However, update_index --remove uses slices of SearchQuerySet(using=backend.connection_alias).models(model) to figure out which items in the index are stale which causes elasticsearch to return a 500 resulting in update index breaking

elasticsearch.exceptions.TransportError: TransportError(
    500, 
    'search_phase_execution_exception', 
    'Result window is too large, from + size must be less than or '
      'equal to: [10000] but was [10010]. See the scroll api for a '
      'more efficient way to request large data sets. This limit can be set by  '
      'changing the [index.max_result_window] index level parameter.')

@yeago
Copy link
Contributor

yeago commented Apr 11, 2016

Does anyone have a sense of what django versions will be still be supported by this changeset?

@PedroAquilino
Copy link
Author

The pull request is tested on django 1.8 and 1.9

@anmolghosh
Copy link

Any ETA for when these changes will be merged and released ?

@acdha
Copy link
Contributor

acdha commented Apr 25, 2016

@anmolghosh if you have time to contribute, please update this ticket with the results of your testing.

@frague59
Copy link

frague59 commented Jun 3, 2016

Add the AttachmentField from:

https://gist.github.com/frague59/8ab2470ed133754a6327

should be great, to support attachments : files indexed using apache Tika to explore documents.

@Alkalit
Copy link

Alkalit commented Jun 10, 2016

I use this branch on production. What I notice:
highlight doesn't work against autocomplete on Index class. Only against SearchQuerySet.

AdIndex.objects.autocomplete(text=request.GET['q']).highlight() # no highlight

SearchQuerySet().models(Ad).autocomplete(text_auto=request.GET['q']).highlight() # will highlight

highlight method doesn't accept arguments. And highlighted attribute contains list of highlighted snippets, not a dict.

@Alkalit
Copy link

Alkalit commented Jun 10, 2016

For arguments of highlighted - it would be enough to replace with this snippet

        if highlight:

            kwargs['highlight'] = {
                'fields': {
                    content_field: {'store': 'yes'},
                }
            }

            if isinstance(highlight, dict):
                kwargs['highlight'].update(highlight)

just here
https://github.com/PedroAquilino/django-haystack/blob/es2/haystack/backends/elasticsearch2_backend.py#L350

@PedroAquilino
Copy link
Author

Before adding new funcionality. (arguments of highlighted, AttachmentField) @acdha What we need to merge the changes?

@nherriot
Copy link

Hi Guys,

just to let you know - testing against elastic search 2.3 still gives errors with 'facets'. I'm just fresh into this and started looking at the commits.

Can someone get me started and point me to where the construction of the JSON object is done that gets passed to Elastic ? I'd like to just add some debug on my machine.

Kind regards, Nicholas.

@JanMalte
Copy link

@nherriot add simple logging to django maybe helps

LOGGING = {
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'stream': sys.stdout,
        }
    },
    'loggers': {
        'haystack': {
            'handlers': ['console'],
            'level': 'DEBUG',
            'propagate': False,
        },
        'elasticsearch': {
            'handlers': ['console'],
            'level': 'DEBUG',
            'propagate': False,
        },
    },
}

the JSON objects seems to be build in https://github.com/elastic/elasticsearch-py/

@nherriot
Copy link

Hi all,

OK this may be obvious to many of you. Just trying to checkout this branch. How do I do this?
I'm on origin/Head -> origin/master.

When I do a /> git branch -r . I can't see branch 'es2' ?
Is this private?
At the moment I can see that master does not have many of the v2 files - so I guess I'm doing something silly!

Thanks in advance to anyone who responds.

Kind regards, Nicholas.

@Alkalit
Copy link

Alkalit commented Jun 15, 2016

@nherriot I do this

git clone https://github.com/PedroAquilino/django-haystack.git
cd django-haystack
git checkout es2

@nherriot
Copy link

Hi all,

anything in particular that has to be done for Haystack to start using the new elasticsearch2_backend.py?
Looks like to me it still uses {'facet' .....bla } rather than {'aggs' .......bla}.

By the way if anyone is interested I have a hack that just really changes the string in the json request to ES to use the new 'aggs' parm. It works for us and gets us moving again. But I still see I might have some work to do on parsing the new response.
I take it the thinking is that Haystack is going to keep it's interface static and change the protocol to suit ES - is that correct? Is there anyone controlling this or managing our direction? Should this sort of chat be better on IRC or user groups?

Kind regards, Nicholas

@joaojunior
Copy link
Contributor

@PedroAquilino I see that exist some conflicts to merge, you have time to fix this?
I would like to refactoring this code because the code is very similar with version 1, What do you think about this?

@PedroAquilino
Copy link
Author

I see that exists some conflicts to merge, but I don't have time to fix it now...
About the refactoring... Go for it. My code is the elasticsearch_backend with little changes (deprecated ES APIs)

joaojunior added a commit to joaojunior/django-haystack that referenced this pull request Jun 28, 2016
… request is to permit use ElasticSearch 2.X
@joaojunior joaojunior mentioned this pull request Jun 28, 2016
joaojunior added a commit to joaojunior/django-haystack that referenced this pull request Jul 13, 2016
… request is to permit use ElasticSearch 2.X
@hanneman
Copy link

hanneman commented Aug 30, 2016

Is there already an idea about a potential delivery...? And will it be production-ready? We have tested the code but are already experiencing issues while getting it running... We really need the ES2.x support (and the announced version 5 after summer). We use ES for GEO-queries and facets. #just-asking

Is there any other way we can contribute?

@acdha
Copy link
Contributor

acdha commented Sep 28, 2016

@hanneman The main thing we're missing is someone who actually uses ES2 helping get a release-quality branch ready to merge. If memory serves, #1391 was the closest and could use more testing / review.

ElSaico pushed a commit to CraveFood/django-haystack that referenced this pull request Dec 5, 2016
… request is to permit use ElasticSearch 2.X
@seocam
Copy link

seocam commented Dec 13, 2016

This has been replaced by #1460.

@acdha acdha closed this Dec 13, 2016
acdha added a commit to acdha/django-haystack that referenced this pull request Jan 2, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet