get_indexing_queryset() returns unpredictable results with unordered querysets #29

Closed
cedricraud opened this issue Nov 9, 2022 · 0 comments · Fixed by #30

cedricraud commented Nov 9, 2022

The get_indexing_queryset() method slices the queryset into manageable chunks, but the result is unpredictable if the queryset isn't ordered. Querysets are not ordered by default, and without an ORDER BY clause the database is free to return rows in a different order for each sliced query, so successive slices can return the same objects several times and skip others.

Here is an illustration of this behavior on a PostgreSQL database:

In  [1]: a = [o.id for o in doc.get_queryset()[0:4096]]
In  [2]: b = [o.id for o in doc.get_queryset()[4096:8192]]
In  [3]: len(a), len(set(a))
Out [3]: (4096, 4096)
In  [4]: len(b), len(set(b))
Out [4]: (3664, 3664)
In  [5]: len(a+b), len(set(a+b))
Out [5]: (7760, 4096) # 3664 objects are missing 🤷‍♂️
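
The arithmetic in the session above can be checked with a small helper. This is only a sketch: `summarize_slices` is a hypothetical name, not part of django-opensearch-dsl, and it works on plain lists of ids such as `a` and `b`.

```python
def summarize_slices(*slices):
    """Count how many ids were returned across slices and how many are unique.

    Hypothetical diagnostic helper; each argument is a list of primary keys
    collected from one slice of the queryset.
    """
    returned = [pk for chunk in slices for pk in chunk]
    unique = set(returned)
    return {
        "returned": len(returned),
        "unique": len(unique),
        # Every duplicate took the place of a row that was never returned.
        "duplicates": len(returned) - len(unique),
    }
```

With the lists from the session, `summarize_slices(a, b)` would report 7760 returned ids, 4096 unique ids and 3664 duplicates, which matches the 3664 missing objects.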

This is unfortunate, since it prevents the document index management command from reliably indexing all documents.

We could advise users to always provide ordered querysets in the documentation, but ordering directly in get_indexing_queryset() would be more transparent and reliable.
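
A minimal sketch of the second option, assuming that ordering by primary key is acceptable for indexing. `iter_ordered_chunks` is a hypothetical standalone function, not the library's actual get_indexing_queryset() implementation nor necessarily the fix that was merged. It relies only on the `order_by()`, `count()` and slicing operations that Django querysets provide.

```python
def iter_ordered_chunks(queryset, chunk_size=4096):
    """Yield successive slices of `queryset` in a deterministic order.

    Hypothetical helper: `queryset` is expected to behave like a Django
    QuerySet (it must support order_by(), count() and slicing).
    """
    # Assumption: ordering by primary key is fine for indexing purposes.
    # order_by() replaces any ordering already set on the queryset, and the
    # primary key is unique, so every slice sees the same stable row order.
    ordered = queryset.order_by("pk")
    total = ordered.count()
    for start in range(0, total, chunk_size):
        yield ordered[start:start + chunk_size]
```

Ordering on a unique column matters here: an ordering on a non-unique field would still leave the order of tied rows up to the database, and slices could overlap again.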

Environment: Django 2.2.24, django-opensearch-dsl 0.2.0 and PostgreSQL 13.7.

cedricraud added a commit to cedricraud/django-opensearch-dsl that referenced this issue Nov 9, 2022
cedricraud changed the title from "get_indexing_queryset() returns unpredictable results with an unordered queryset" to "get_indexing_queryset() returns unpredictable results with unordered querysets" Nov 9, 2022
qcoumes pushed a commit that referenced this issue Nov 19, 2022