New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed #26167 -- Added support for functional indexes #8056

Open
wants to merge 26 commits into
base: master
from

Conversation

@MarkusH
Member

MarkusH commented Feb 12, 2017

This is a first shot at it would or could look like. There are probably dozens more things to look into and consider.

$ cat app/models.py
from django.db import models
from django.db.models.expressions import Ref, Value
from django.db.models.functions import Concat, Lower


class Foo(models.Model):
    name = models.CharField(max_length=255)

    class Meta:
        indexes = [
            models.FuncIndex(
                Concat(
                    Lower(Ref('name', None), output_field=models.CharField()),
                    Value('blub'),
                    output_field=models.CharField(),
                ),
                name='some_func_index'
            ),
        ]
$ python manage.py makemigrations -v 3
Migrations for 'app':
  app/migrations/0001_initial.py
    - Create model Foo
    - Create index some_func_index for model foo
$ cat app/migrations/0001_initial.py
# Generated by Django 2.0.dev20170212181215 on 2017-02-12 18:14

from django.db import migrations, models
import django.db.models.expressions
import django.db.models.functions.base


class Migration(migrations.Migration):

    initial = True

    dependencies = [
    ]

    operations = [
        migrations.CreateModel(
            name='Foo',
            fields=[
                ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('name', models.CharField(max_length=255)),
            ],
        ),
        migrations.AddIndex(
            model_name='foo',
            index=models.FuncIndex(expression=django.db.models.functions.base.Concat(django.db.models.functions.base.Lower(django.db.models.expressions.Ref('name', None), output_field=models.CharField()), django.db.models.expressions.Value('blub'), output_field=models.CharField()), name='some_func_index'),
        ),
    ]
$ python manage.py migrate app
Operations to perform:
  Apply all migrations: app
Running migrations:
  Applying app.0001_initial... OK
$ sqlite3 db.sqlite3
SQLite version 3.16.2 2017-01-06 16:32:41
Enter ".help" for usage hints.
sqlite> .schema
CREATE TABLE IF NOT EXISTS "django_migrations" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "app" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, "applied" datetime NOT NULL);
CREATE TABLE IF NOT EXISTS "app_foo" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "name" varchar(255) NOT NULL);
CREATE INDEX "some_func_index" ON "app_foo" (COALESCE(LOWER("name"), '') || COALESCE('blub', ''));
sqlite> .quit
@charettes

This comment has been minimized.

Member

charettes commented Feb 12, 2017

Out of curiosity, what's the reason the example index can't be defined as

Concat(Lower(F('name')), Value('blub'))
@MarkusH

This comment has been minimized.

Member

MarkusH commented Feb 13, 2017

@charettes Apparently, I haven't fully understood it, F() looks up fields in the Compiler and refers to already mentioned fields. However, the Compiler used here doesn't have any mentioned fields, thus using F() fails. But that's something I want to get to: Concat(Lower('name'), Value('blub'))

@jarshwah

This comment has been minimized.

Member

jarshwah commented Feb 13, 2017

It's been a little while since I looked into this, but you can't use F() because:

  1. F doesn't define an as_sql() method because;
  2. F must be resolved by the query, which turns it into either a Col or a Ref (I suspect a Col would work as well), which do have as_sql() methods
  3. To resolve an F, you need to hand it a query (+ joins), and the query is used to setup joins if needed.

You might be able to work around this by passing F.resolve_expression a fake query object with a resolve_ref method that simply verifies the column existing on the model, and returning the appropriate Col or Ref. Col looks like it outputs "table.column" while Ref outputs "column_alias" which can just be the column name.

You could try building non-query support into the F expression by overloading the resolve_expression method, but that kind of defeats the purpose of using F anyway, while complicating the code. F doesn't do the column/reference verification, that's handled by query.resolve_ref.

Hope that helps. Also, nice work! I really look forward to seeing this available.

return self.__class__ == other.__class__ and self.name == other.name
def __hash__(self):
return hash(self.name)

This comment has been minimized.

@Ian-Foote

Ian-Foote Feb 19, 2017

Contributor

These should be on F. I fixed this in my rebase of #7517.

@MarkusH MarkusH force-pushed the MarkusH:ticket26167-func-indexes branch from 9fe867d to 4d430fb Mar 3, 2017

@MarkusH MarkusH force-pushed the MarkusH:ticket26167-func-indexes branch from 8eb1989 to b2a6e82 Apr 5, 2017

@mjtamlyn mjtamlyn referenced this pull request Apr 7, 2017

Closed

[WIP] Fixed #28053 -- Support db_index=Index() #8322

2 of 10 tasks complete

@MarkusH MarkusH force-pushed the MarkusH:ticket26167-func-indexes branch 2 times, most recently from f473af1 to 0d77c47 May 7, 2017

@MarkusH

This comment has been minimized.

Member

MarkusH commented May 7, 2017

This is a second shot at how function based indexes could look like.

$ cat app/models.py
from django.db import models
from django.db.models.functions import Lower


class Foo(models.Model):
    name = models.CharField(max_length=255)

    class Meta:
        indexes = [
            models.Index(
                [Lower('name').desc()],
                name='some_func_index'
            ),
        ]
$ python manage.py makemigrations -v 3
Migrations for 'app':
  app/migrations/0001_initial.py
    - Create model Foo
    - Create index some_func_index for model foo
$ cat app/migrations/0001_initial.py
# Generated by Django 2.0.dev20170507133517 on 2017-05-07 13:52

from django.db import migrations, models
import django.db.models.expressions
import django.db.models.functions.base


class Migration(migrations.Migration):

    initial = True

    dependencies = [
    ]

    operations = [
        migrations.CreateModel(
            name='Foo',
            fields=[
                ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('name', models.CharField(max_length=255)),
            ],
        ),
        migrations.AddIndex(
            model_name='foo',
            index=models.Index(fields=[django.db.models.expressions.OrderBy(django.db.models.functions.base.Lower('name'), descending=True)], name='some_func_index'),
        ),
    ]

On SQLite3

$ python manage.py sqlmigrate app 0001
BEGIN;
--
-- Create model Foo
--
CREATE TABLE "app_foo" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "name" varchar(255) NOT NULL);
--
-- Create index some_func_index on field(s) OrderBy(Lower(F(name)), descending=True) of model foo
--
CREATE INDEX "some_func_index" ON "app_foo" (LOWER("name") DESC);
COMMIT;
$ python manage.py migrate app
Operations to perform:
  Apply all migrations: app
Running migrations:
  Applying app.0001_initial... OK
$ python manage.py dbshell
SQLite version 3.16.2 2017-01-06 16:32:41
Enter ".help" for usage hints.
sqlite> .schema
CREATE TABLE IF NOT EXISTS "django_migrations" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "app" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, "applied" datetime NOT NULL);
CREATE TABLE IF NOT EXISTS "app_foo" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "name" varchar(255) NOT NULL);
CREATE INDEX "some_func_index" ON "app_foo" (LOWER("name") DESC);
sqlite> .quit

On PostgreSQL

$ python manage.py sqlmigrate app 0001 --settings=testproject.pg_settings
BEGIN;
--
-- Create model Foo
--
CREATE TABLE "app_foo" ("id" serial NOT NULL PRIMARY KEY, "name" varchar(255) NOT NULL);
--
-- Create index some_func_index on field(s) OrderBy(Lower(F(name)), descending=True) of model foo
--
CREATE INDEX "some_func_index" ON "app_foo" (LOWER("name") DESC);
COMMIT;
$ python manage.py migrate app --settings=testproject.pg_settings
Operations to perform:
  Apply all migrations: app
Running migrations:
  Applying app.0001_initial... OK
$ python manage.py dbshell --settings=testproject.pg_settings
psql (9.6.2)
Type "help" for help.

django=> \dS+ app1_foo
                                                     Table "public.app1_foo"
 Column |          Type          |                       Modifiers                       | Storage  | Stats target | Description 
--------+------------------------+-------------------------------------------------------+----------+--------------+-------------
 id     | integer                | not null default nextval('app1_foo_id_seq'::regclass) | plain    |              | 
 name   | character varying(255) | not null                                              | extended |              | 
Indexes:
    "app1_foo_pkey" PRIMARY KEY, btree (id)
    "some_func_index" btree (lower(name::text) DESC)
sql = func_index.create_sql(Book, editor)
sql = sql.upper()
self.assertIn('LOWER', sql)

This comment has been minimized.

@apollo13

apollo13 May 7, 2017

Member

Given that the index is named title_lower_id this LOWER check could check against that as well as lower(xxx) -- please rename the index so there is only one match for LOWER (same for TITLE)

@felixxm

This comment has been minimized.

Member

felixxm commented May 8, 2017

I found this error:

======================================================================
ERROR: test_func_index (schema.tests.SchemaTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/django/tests/schema/tests.py", line 1771, in test_func_index
    sql = func_index.create_sql(Book, editor)
  File "/django/django/db/models/indexes.py", line 78, in create_sql
    sql_parameters = self.get_sql_create_template_values(model, schema_editor, using)
  File "/django/django/db/models/indexes.py", line 58, in get_sql_create_template_values
    expression = column_expression.resolve_expression(query)
  File "/django/django/db/models/functions/datetime.py", line 60, in resolve_expression
    field = copy.lhs.output_field
  File "/django/django/utils/functional.py", line 31, in __get__
    res = instance.__dict__[self.name] = self.func(instance)
  File "/django/django/db/models/expressions.py", line 226, in output_field
    raise FieldError("Cannot resolve expression type, unknown output_field")
django.core.exceptions.FieldError: Cannot resolve expression type, unknown output_field

when I had added index with ExtractMonth to the test_func_index:

func_index = Index(fields=[ExtractMonth('pub_date')], name='pub_data_month_idx')
with connection.schema_editor() as editor:
    sql = func_index.create_sql(Book, editor)
@MarkusH

This comment has been minimized.

Member

MarkusH commented May 8, 2017

Thanks @apollo13 @felixxm . I think I took care of both your comments. The tests do need some major improvements.

sql = sql.upper()
self.assertIn('MONTH', sql)
self.assertIn('PUB_DATE', sql)
self.assertTrue('PUB_DATA_MONTH_IDX', sql)

This comment has been minimized.

@felixxm

felixxm May 8, 2017

Member

self.assertTrue -> self.assertIn above and below as well.

This comment has been minimized.

@MarkusH

MarkusH May 8, 2017

Member

🤦‍♂️ Thanks!

@MarkusH MarkusH force-pushed the MarkusH:ticket26167-func-indexes branch 2 times, most recently from 9f1972f to 4262296 May 8, 2017

@MarkusH MarkusH force-pushed the MarkusH:ticket26167-func-indexes branch from 4262296 to a593989 Jul 21, 2017

@MarkusH MarkusH changed the title from [WIP] Fixed #26167 -- Added support for functional indexes to Fixed #26167 -- Added support for functional indexes Jul 21, 2017

def __repr__(self):
return "{}({}, {})".format(
self.__class__.__name__, self.alias, self.target)
def as_sql(self, compiler, connection):
if self.index_col:
return "%s" % connection.ops.quote_name(self.alias), []

This comment has been minimized.

@felixxm

felixxm Jul 21, 2017

Member

I think that you can remove "%s" %, i.e.

return connection.ops.quote_name(self.alias), []
For example ``Index(fields=['headline', '-pub_date'])`` would create SQL with
``(headline, pub_date DESC)``. Index ordering isn't supported on MySQL. In that
case, a descending index is created as a normal index.
``Index(fields=[Lower('title').desc()])`` would create an index on the
lowercase value of the ``title`` field in descending order.

This comment has been minimized.

@felixxm

felixxm Jul 21, 2017

Member

Maybe it's also worth to add test and mention in doc that it's possible to create an index with math expressions, e.g.:

Index(fields=[F('height')*F('weight')], name='mul_idx')
.. code-block:: text
ERROR: functions in index expression must be marked IMMUTABLE

This comment has been minimized.

@felixxm

felixxm Jul 21, 2017

Member

There is a similar restriction on Oracle (see doc). Function must be declared as DETERMINISTIC.

@MarkusH

This comment has been minimized.

Member

MarkusH commented Jul 23, 2017

Thanks @felixxm. Updated.

@MarkusH MarkusH force-pushed the MarkusH:ticket26167-func-indexes branch 5 times, most recently from daf02f7 to 292c455 Jul 23, 2017

for index in indexes:
with self.subTest(index=index):
with self.assertRaisesRegex(ValueError, "Not creating expression index:.*"):

This comment has been minimized.

@atombrella

atombrella Jul 25, 2017

Contributor

assertRaisesRegex should be avoided, I believe, cc @timgraham

I've seen a pattern in the migration tests where self.assertIn is used to check for parts of the message.

This comment has been minimized.

@timgraham

timgraham Jul 25, 2017

Member

assertRaisesMessage uses assertIn so it works just as well without the need for the .*.

This comment has been minimized.

@MarkusH

MarkusH Jul 27, 2017

Member

Thanks. Fixed.

:PostgreSQL:
PostgreSQL requires that "*all functions and operators used in an index
definition must be 'immutable'*" --- `PostgreSQL documentation
<https://www.postgresql.org/docs/9.6/static/sql-createindex.html#AEN80057>`_.

This comment has been minimized.

@atombrella

atombrella Jul 25, 2017

Contributor

use current instead of 9.6

This comment has been minimized.

@MarkusH

MarkusH Jul 27, 2017

Member

Fixed.

from django.utils.encoding import force_bytes
__all__ = ['Index']
class ExpressionIndexNotSupported(ValueError):

This comment has been minimized.

@atombrella

atombrella Jul 25, 2017

Contributor

Inherit from django.db.NotSupportedError instead?

This comment has been minimized.

@MarkusH

MarkusH Jul 27, 2017

Member

Fixed

@MarkusH MarkusH force-pushed the MarkusH:ticket26167-func-indexes branch from 43c87b7 to 0cd5fb9 Jul 27, 2017

MarkusH added some commits May 7, 2017

Refs #26167 -- Used Ref over IndexCol
Reverts 0d77c47 and adds required
support to Ref instead.
@jarshwah

I've only done cursory code review, but it's looking nice! I haven't actually executed any of the code yet.

I'd encourage you to write tests as if you were trying to break Django. What happens if two indexes are created with the same name on the same model? Same name on different models? Should we namespace them perhaps? etc

I think this will be good to merge before cutoff. It's in really good shape, and it sounds like others have tested it quite well.

try:
sql = index.create_sql(model, self)
except ExpressionIndexNotSupported as e:
warnings.warn(str(e), RuntimeWarning)

This comment has been minimized.

@jarshwah

jarshwah Aug 7, 2017

Member

Why? Is this an error or not?

This comment has been minimized.

@MarkusH

MarkusH Aug 8, 2017

Member

While it's an error on platforms w/o support for expression indexes to create one, the presence of an index is often not required. Thus, I'm raising warning instead of blowing up. That's the only way I can imagine for 3rd party packages using expression indexes to use them while supporting all databases.

def __repr__(self):
return "{}({}, {})".format(
self.__class__.__name__, self.alias, self.target)
def as_sql(self, compiler, connection):
if self.index_col:
return connection.ops.quote_name(self.alias), []

This comment has been minimized.

@jarshwah

jarshwah Aug 7, 2017

Member

hmm alias generally refers to the table name - not the column name. You're changing the meaning of alias depending on the index_col.

Maybe just pass None, 'name', index_col=True ?

self.expressions = []
for field in self.fields:
if isinstance(field, str):
expression = F(field[1:]).desc() if field.startswith('-') else F(field)

This comment has been minimized.

@jarshwah

jarshwah Aug 7, 2017

Member

I believe we do something similar in order_by? Consider refactoring into a helper?

This comment has been minimized.

@MarkusH

MarkusH Aug 8, 2017

Member

All the occurrences of similar things in the order_by() and OrderBy() context (django.db.models.sql.query:add_ordering and django.db.models.sql.compiler:get_order_by) are based on using ASC / DESC over .asc() / .desc(). Further, the compiler related functions lookup table aliases and whatnot which we don't have in the index context.

This comment has been minimized.

@MarkusH

MarkusH Aug 8, 2017

Member

Lastly, the code doing ordering is already around. It's only updated.

This comment has been minimized.

@jarshwah

jarshwah Aug 8, 2017

Member

Yeah no need to worry about this. I didn't look up the other function, it was just from memory. Hard to refactor into common method so forget it.

if not any(f.name == name for f in self.model._meta.concrete_fields):
raise ValueError("Invalid reference to field '%s' on model '%s'" % (name, self.model._meta.label))
source_field = self.model._meta.get_field(name)
return Col(name, source_field, index_col=True)

This comment has been minimized.

@jarshwah

jarshwah Aug 7, 2017

Member

I think you should fix the args into here as mentioned above.

sql.index('WEIGHT') <
sql.index('+') <
sql.index('5')
)

This comment has been minimized.

@jarshwah

jarshwah Aug 7, 2017

Member

This test is cute <

@MarkusH MarkusH force-pushed the MarkusH:ticket26167-func-indexes branch from ff78154 to b4f6cbf Aug 8, 2017

@atombrella

A small style comment

@skipUnlessDBFeature('supports_expression_indexes')
def test_func_index_invalid_field(self):
func_index = Index(fields=[Lower('blub')], name='dolor_idx')
with self.assertRaisesMessage(ValueError, "Invalid reference to field 'blub' on model 'schema.Author'"):

This comment has been minimized.

@atombrella

atombrella Sep 1, 2017

Contributor

Use a msg-variable instead?

@atombrella

This comment has been minimized.

Contributor

atombrella commented Sep 8, 2017

@MarkusH I had a look in get_constraints and in PostgreSQL's backend the expression definition is part of this. Would it be an idea to include tests for this? The SQL for the index is only included in cases like this where you use an expression. However, a cast may be added to the SQL, e.g., lower(column) is lower(column::text) if column is not of type text. Oracle probably has something similar to pg_get_indexdef. SQLite doesn't seem like it does.

Please do rebase.

@bluetech

This comment has been minimized.

Contributor

bluetech commented Oct 12, 2017

Would be nice to have a test case using a subclass of Index, just to ensure it works. Example (IIUC):

from django.db import models, Func, F
from django.contrib.postgres.indexes import GistIndex


class Location(models.Model):
    lat = models.FloatField()
    lon = models.FloatField()

    class Meta:
        indexes = [
            GistIndex(fields=[Func(F('lat'), F('lon'), function='ll_to_earth')]),
        ]

should result in

CREATE INDEX idx_name ON app_location USING gist(ll_to_earth("lat", "lng"));
if not any(f.name == name for f in self.model._meta.concrete_fields):
raise ValueError("Invalid reference to field '%s' on model '%s'" % (name, self.model._meta.label))
source_field = self.model._meta.get_field(name)
return Col(None, source_field, index_col=True) # None refers to the table alias we don't have here

This comment has been minimized.

@charettes

charettes Jun 13, 2018

Member

Any thoughts about using a Col subclass here instead of this flag? You'd simply have to return IndexCol(None, source_field).

This comment has been minimized.

@Ian-Foote

Ian-Foote Jun 13, 2018

Contributor

I had a very similar requirement in my work on Check Constraints: 1a5ade0#diff-fe637a3ec6af9cbe2ee87293000e7c82R751

It's probably worthwhile checking if we can share some implementation details here.

@@ -686,22 +686,25 @@ class Col(Expression):
contains_column_references = True
def __init__(self, alias, target, output_field=None):
def __init__(self, alias, target, output_field=None, index_col=False):

This comment has been minimized.

@charettes

charettes Jun 13, 2018

Member

I think you should create a Col subclass for this purpose instead.

e.g.

class IndexCol(Col):
    def as_sql(self, compiler, connection):
        return connection.ops.quote_name(self.target.column), []

@atombrella atombrella referenced this pull request Jul 27, 2018

Merged

Fixed #29547 -- Added support for partial indexes. #10140

3 of 3 tasks complete
@schinckel

This comment has been minimized.

Contributor

schinckel commented Nov 9, 2018

I've been working on a CalculatedField, which has a significant overlap with this (in the sense of that the only part of my field that does not work is when you try to create an index on it).

https://github.com/schinckel/django-computed-field

Ignoring the fact I have to currently sniff up the stack to resolve the query (I'm thinking about trying to use Ref instead of F though, based on reading the stuff here), I wonder if this is a solution to this problem that could be considered. You could (hopefully) just use the computed field by name in the index stuff, and then it would handle all of the expression handling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment