Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large queries cause errors #9

Open
beruic opened this issue Oct 21, 2019 · 8 comments
Open

Large queries cause errors #9

beruic opened this issue Oct 21, 2019 · 8 comments

Comments

@beruic
Copy link

beruic commented Oct 21, 2019

I have a product where around 300k instances are created in an import.

Before importing (which I fear may fail as well), I perform an existence check against a single field in the form MyModel.objects.filter(my_field__in=new_values).exists() where new_values is a set of strings.

Here is the stack trace of the issue:

Traceback (most recent call last):
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/sql_server/pyodbc/base.py", line 536, in execute
    return self.cursor.execute(sql, params)
pyodbc.ProgrammingError: ('The SQL contains 6623 parameter markers, but 137695 parameters were supplied', 'HY000')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/home/jk/git/my-project/core/management/command_util.py", line 117, in wrapper
    return f(*args, **kwargs)
  File "/home/jk/git/my-project/core/management/command_util.py", line 145, in wrapper
    return f(*args, **options)
  File "/home/jk/git/my-project/core/management/command_util.py", line 117, in wrapper
    return f(*args, **kwargs)
  File "/home/jk/git/my-project/core/management/commands/my_import_command.py", line 159, in handle
    verify_only=verify_only,
  File "/home/jk/git/my-project/core/util.py", line 150, in import_and_verify_terms
    import_and_log_new_terms(user, model, new_terms[search_type], source_description)
  File "/home/jk/git/my-project/core/util.py", line 228, in import_and_log_new_terms
    if model.objects.filter(term__in=new_terms).exists():
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/db/models/query.py", line 766, in exists
    return self.query.has_results(using=self.db)
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/db/models/sql/query.py", line 522, in has_results
    return compiler.has_results()
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1070, in has_results
    return bool(self.execute_sql(SINGLE))
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1100, in execute_sql
    cursor.execute(sql, params)
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/db/backends/utils.py", line 99, in execute
    return super().execute(sql, params)
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/db/backends/utils.py", line 76, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/home/jk/.virtualenvs/my-project/lib/python3.6/site-packages/sql_server/pyodbc/base.py", line 536, in execute
    return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: ('The SQL contains 6623 parameter markers, but 137695 parameters were supplied', 'HY000')
@beruic
Copy link
Author

beruic commented Oct 21, 2019

This issue seems similar to this one from the original repository: michiya#143

At least the hot-fix solution if the same: Chunk requests in chunks of 2000 and swallow the bad performance :)

@etiennepouliot
Copy link

etiennepouliot commented Mar 12, 2020

Here a workaround :

new_values = ','.join(new_values)
MyModel.objects.extra(where=[f"my_field in ( select * FROM SPLIT_STRING('{new_values}'),',')"]).exists()

Basically what it does it send a comma seperated value and then resplit it to use it in a query.

@etiennepouliot
Copy link

Don't event need SPLIT_STRING...

new_values = str(new_values).replace( '[' , '(' ).replace( ']' ,')' )
MyModel.objects.extra(where=[f"my_field in {new_values}"])

@etiennepouliot
Copy link

The issue that I have with my previous solutions is that it only works on a case by case basis and it will not work if we use prefetch_related with more that 2100 related objects.

To fix this , create a lookup.py next to your models.py with :

from django.db.models.fields import Field from django.db.models.lookups import In
@Field.register_lookup
class In(In):
    lookup_name = 'in'

    def as_sql(self, compiler, connection):
        max_in_list_size = 2100
        if self.rhs_is_direct_value() and max_in_list_size and len(self.rhs) > max_in_list_size:
            return self.split_parameter_list_as_sql(compiler, connection)
        return super().as_sql(compiler, connection)


    def split_parameter_list_as_sql(self, compiler, connection):
        lhs, lhs_params = self.process_lhs(compiler, connection)
        rhs, rhs_params = self.batch_process_rhs(compiler, connection)
        in_clause_elements = f"( {lhs} IN ( SELECT * FROM SPLIT_STRING(%s)))"
        params = [','.join(map(str,rhs_params)) ]
        return in_clause_elements, params

make sure to import in your models.py file :

from .lookups import In

and voilà it will works!

@etiennepouliot
Copy link

Split_string was added in SQL 2016, for previous version this function works for me :
https://stackoverflow.com/questions/10914576/t-sql-split-string

@techknowfile
Copy link

@etiennepouliot I'm trying your solution, but despite importing it into models.py, those functions in the IN lookup don't seem to be ran when I call prefetch_related. Is this not meant to work for that?

I'm using graphene django with gql_optimizer, which results in one giant query with 25000+ parameters

@techknowfile
Copy link

A solution has been implemented on the dev branch of the Microsoft fork. See my issue on that repo for details.

The solution uses the Microsoft recommended solution to large parameter lists by creating a TEMP TABLE and joining over that.

@etiennepouliot
Copy link

@etiennepouliot I'm trying your solution, but despite importing it into models.py, those functions in the IN lookup don't seem to be ran when I call prefetch_related. Is this not meant to work for that?

I'm using graphene django with gql_optimizer, which results in one giant query with 25000+ parameters

Indeed it's not working for prefetch_related this way.
I ended up editing directly django/db/models/lookups.py in my project as I didn't know how to overwrite this file globally.

If anybody know how, I would appreciate. Maybe I need to dig this deeper.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants