The get_indexing_queryset() method slices the queryset into manageable chunks, but the result is unpredictable if the queryset isn't ordered. Indeed, querysets are not ordered by default, so successive slices can return the same objects several times and skip others entirely.
Here is an illustration of this behavior, on a PostgreSQL database:
In [1]: a = [o.id for o in doc.get_queryset()[0:4096]]
In [2]: b = [o.id for o in doc.get_queryset()[4096:8192]]
In [3]: len(a), len(set(a))
Out [3]: (4096, 4096)
In [4]: len(b), len(set(b))
Out [4]: (3664, 3664)
In [5]: len(a+b), len(set(a+b))
Out [5]: (7760, 4096) # 3664 objects are missing 🤷♂️
This is unfortunate since it prevents the document index management command from actually indexing all documents.
We could advise users to always provide ordered querysets in the documentation, but ordering directly in get_indexing_queryset() would be more transparent and reliable.
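A minimal sketch of the second option, not the library's actual implementation: `iter_ordered_chunks` is a hypothetical helper name, and ordering by `pk` is an assumption that suits models with a unique primary key. It only relies on the standard Django QuerySet members `ordered`, `order_by()`, `count()` and slicing.

```python
from typing import Iterator, List


def iter_ordered_chunks(queryset, chunk_size: int = 4096) -> Iterator[List]:
    """Yield successive slices of `queryset`, forcing a stable order first.

    `queryset` is expected to behave like a Django QuerySet: it exposes
    `ordered`, `order_by()`, `count()` and supports slicing.
    """
    if not queryset.ordered:
        # Without an ORDER BY clause the database may return rows in a
        # different order for each sliced query, so chunks can overlap.
        # Ordering by primary key (an assumption) makes slices disjoint.
        queryset = queryset.order_by("pk")

    total = queryset.count()
    for start in range(0, total, chunk_size):
        # Each slice becomes a LIMIT/OFFSET query over the same ordering.
        yield list(queryset[start:start + chunk_size])
```

A queryset that already has an explicit ordering is left untouched, so users who need a specific order keep it. Rows inserted or deleted while the chunks are being read can still shift the offsets; the sketch only addresses the missing ordering.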
Environment: Django 2.2.24, django-opensearch-dsl 0.2.0 and PostgreSQL 13.7.
cedricraud added a commit to cedricraud/django-opensearch-dsl that referenced this issue on Nov 9, 2022
cedricraud changed the title from "get_indexing_queryset() returns unpredictable results with an unordered queryset" to "get_indexing_queryset() returns unpredictable results with unordered querysets" on Nov 9, 2022