Haystack + Elasticsearch strange intermittent bug #1620

Open
2 tasks done
pascallando opened this issue Jun 24, 2018 · 6 comments

@pascallando

I'm facing a strange issue with the Elasticsearch backend.

Everything works fine (indexing, querying…), but occasionally, and only occasionally, querying the index crashes with this error:

File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py" in inner
 35.             response = get_response(request)

File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/base.py" in _get_response
 128.                 response = self.process_exception_by_middleware(e, request)

File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/base.py" in _get_response
 126.                 response = wrapped_callback(request, *callback_args, **callback_kwargs)

File "/usr/local/lib/python3.5/dist-packages/django/contrib/auth/decorators.py" in _wrapped_view
 21.                 return view_func(request, *args, **kwargs)

File "/home/django/prod/mysite/forum/api.py" in get_topics
 66.     topics_list = SearchQuerySet().filter(**filter).models(Topic)

File "/usr/local/lib/python3.5/dist-packages/haystack/query.py" in __init__
 29.         self._determine_backend()

File "/usr/local/lib/python3.5/dist-packages/haystack/query.py" in _determine_backend
 55.         backend_alias = connection_router.for_read(**hints)

File "/usr/local/lib/python3.5/dist-packages/haystack/utils/loading.py" in for_read
 167.         return self._for_action('for_read', False, **hints)[0]

Exception Type: IndexError at /forum/api/get-topics/
Exception Value: list index out of range

The issue seems to be with the _determine_backend function.

My backend is configured this way in the Django settings file:

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch2_backend.Elasticsearch2SearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'myindex',
    },
}

  • Tested with the latest Haystack release
  • Tested with the current Haystack master branch

Expected behaviour

Search query should be issued to the search engine.

Actual behaviour

Fails at backend_alias = connection_router.for_read(**hints)

Steps to reproduce the behaviour

Hard to reproduce: the issue is intermittent and happens about 10% of the time.

Configuration

  • Operating system version: macOS 10.13.5 / Debian Stretch
  • Search engine version: Elasticsearch 2.4.6, Build: 5376dca/2017-07-18T12:17:44Z, JVM: 1.8.0_121
  • Python version: 3.5/3.6
  • Django version: 2.0.5
  • Haystack version: 2.8.1
@acdha
Contributor

acdha commented Jun 25, 2018

So there's clearly a problem with https://github.com/django-haystack/django-haystack/blob/master/haystack/utils/loading.py#L179: it doesn't handle the case where it gets no results. Since you have only one connection defined, it's not immediately obvious why that would happen.
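To illustrate the point, here is a simplified model of that lookup. It is not Haystack's code: `DefaultRouter` and `for_read` below are hypothetical stand-ins, written on the assumption that the linked code collects an alias from each configured router and, as the traceback's `self._for_action('for_read', False, **hints)[0]` line shows, returns the first entry without checking that the list is non-empty.

```python
class DefaultRouter:
    """Hypothetical stand-in: always routes reads to the "default" alias."""

    def for_read(self, **hints):
        return "default"


def for_read(routers, **hints):
    """Simplified model of ConnectionRouter.for_read(), not Haystack's code."""
    conns = []
    for router in routers:
        alias = router.for_read(**hints)
        if alias is not None:
            conns.append(alias)
            break  # a read only needs the first alias
    # Like the traceback's `...[0]`, this assumes at least one alias was found.
    return conns[0]


print(for_read([DefaultRouter()]))  # default

try:
    for_read([])  # the router list is empty
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```

In this model, with a single connection and a router that always answers, the only way to hit the `IndexError` is for the router list itself to be empty when the lookup runs.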

@acdha acdha added the bug label Jun 25, 2018
@pascallando
Author

Hi there! Is there anything I could do to help, such as running specific tests?

In addition to the details above, I noticed that although the error seems to occur "randomly", it is "more frequent" right after the application starts serving. For example, on the Django dev server (python manage.py runserver), the very first page refresh frequently fails with an "Internal Server Error - IndexError: list index out of range", as documented above.

@acdha
Contributor

acdha commented Jul 3, 2018 via email

@ckrybus

ckrybus commented May 4, 2021

I had the same error, and it turns out to be a concurrency issue. The docs warn that "The standard SearchView is not thread-safe. Use the search_view_factory function, which returns thread-safe instances of SearchView." I probably missed that because I'm using the SearchForm with a custom view.

Switching from gunicorn --workers=X --threads=Y to gunicorn --workers=Z fixed it for me. So the fix is either to run gunicorn without threads, or to use search_view_factory as the docs say.
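As a sketch of the processes-only setup, gunicorn can also read its settings from a Python configuration file passed with -c. This assumes a file named gunicorn.conf.py; the worker formula is the general rule of thumb from the gunicorn documentation, not a value from this thread, and should be tuned for the actual server.

```python
# gunicorn.conf.py: scale with processes only, no worker threads.
import multiprocessing

# Rule of thumb from the gunicorn docs: (2 x CPU cores) + 1 worker processes.
workers = multiprocessing.cpu_count() * 2 + 1

# Keep one thread per worker. With the default sync worker, threads > 1 makes
# gunicorn switch to the threaded gthread worker, bringing threads back.
threads = 1
```

It would be started with something like gunicorn -c gunicorn.conf.py mysite.wsgi, where mysite.wsgi is an assumed module path based on the project directory in the traceback.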

Below is simplified Haystack code that reproduces it:

reproduce.py

import random
from time import sleep
import threading


class DefaultRouter:
    pass


class ConnectionRouter(object):
    def __init__(self):
        self._routers = None

    @property
    def routers(self):
        # Lazy initialization without a lock: several threads can pass this check.
        if self._routers is None:
            print("None")
            router_list = [DefaultRouter]
            sleep(0.3)
            # Race window opens: the list is published while still empty...
            self._routers = []
            sleep(0.3)
            # ...and filled only afterwards, so other threads may return [].
            for router_class in router_list:
                self._routers.append(router_class())
        return self._routers

connection_router = ConnectionRouter()

def get_routers():
    sleep(random.choice([0.3, 0.5, 0.1]))
    # IndexError if this thread received the still-empty list.
    return connection_router.routers[0]


threads = []
for i in range(10):
    t = threading.Thread(target=get_routers)
    threads.append(t)

for t in threads:
    t.start()

for t in threads:
    t.join()

python reproduce.py

λ python bla.py
None
None
None
None
None
None
None
None
None
Exception in thread Thread-10:
Traceback (most recent call last):
  File "/home/User/.pyenv/versions/3.6.12/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/User/.pyenv/versions/3.6.12/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "reproduce.py", line 33, in get_routers
    return connection_router.routers[0]
IndexError: list index out of range
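For comparison, here is the same simplified reproduction with the race closed. This is a sketch, not a patch to Haystack: it assumes a threading.Lock around the lazy initialization (double-checked, so the lock is only taken while _routers is still None) and builds the list in a local variable before assigning it, so no thread can receive the empty list. The results and errors lists exist only so the outcome can be checked.

```python
import random
import threading
from time import sleep


class DefaultRouter:
    pass


class ConnectionRouter:
    def __init__(self):
        self._routers = None
        # Guards the one-time construction of the router list.
        self._lock = threading.Lock()

    @property
    def routers(self):
        if self._routers is None:
            with self._lock:
                # Re-check: another thread may have built the list while
                # this one was waiting for the lock.
                if self._routers is None:
                    router_list = [DefaultRouter]
                    sleep(0.3)  # same artificial delay as reproduce.py
                    # Build the complete list in a local variable first...
                    routers = [router_class() for router_class in router_list]
                    # ...then publish it with a single assignment, so no
                    # thread ever sees an empty, half-built list.
                    self._routers = routers
        return self._routers


connection_router = ConnectionRouter()
results = []  # router returned to each thread
errors = []   # any IndexError raised in a thread


def get_routers():
    sleep(random.choice([0.3, 0.5, 0.1]))
    try:
        results.append(connection_router.routers[0])
    except IndexError as exc:
        errors.append(exc)


threads = [threading.Thread(target=get_routers) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results), len(errors))  # 10 0
```

With the lock in place, only the first thread builds the routers; the others wait, re-check, and reuse the same list.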

@keeth

keeth commented Feb 3, 2023

It looks like ConnectionRouter is not thread-safe.

My hacky workaround was to hardcode the backend and skip the connection router entirely:

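from haystack.query import SearchQuerySet


class ThreadSafeSearchQuerySet(SearchQuerySet):
    def __init__(self, using=None, query=None):
        # With an explicit alias, _determine_backend() uses it directly and
        # never calls connection_router.for_read(), avoiding the race.
        if not using:
            using = "default"
        super().__init__(using, query)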

A better fix would be to store ConnectionRouter.routers in a thread local.
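The thread-local suggestion can be sketched on the simplified ConnectionRouter from reproduce.py. This is a hypothetical variant for illustration, not Haystack's actual class: each thread lazily builds its own router list inside a threading.local(), so a list is never shared while it is half built.

```python
import threading


class DefaultRouter:
    pass


class ConnectionRouter:
    """Hypothetical variant of the simplified router: one list per thread."""

    def __init__(self):
        # threading.local() gives every thread its own attribute namespace.
        self._local = threading.local()

    @property
    def routers(self):
        routers = getattr(self._local, "routers", None)
        if routers is None:
            # Only the current thread can see this list while it is filled.
            routers = [router_class() for router_class in [DefaultRouter]]
            self._local.routers = routers
        return routers


connection_router = ConnectionRouter()
results = []


def get_routers():
    results.append(connection_router.routers[0])


threads = [threading.Thread(target=get_routers) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every thread built (and received) its own DefaultRouter instance.
print(len(results), len({id(r) for r in results}))  # 10 10
```

The trade-off is one router list per thread. That is harmless for stateless routers such as DefaultRouter, but a router that keeps state would no longer be shared across threads.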

@cristihainic
Copy link

cristihainic commented Jul 19, 2023

@keeth wouldn't SearchQuerySet(using='default') achieve the same as your child class?
