New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

django.contrib.postgres.fields.ArrayField #2485

Closed
wants to merge 27 commits into
base: master
from

Conversation

Projects
None yet
9 participants
@mjtamlyn
Member

mjtamlyn commented Mar 26, 2014

This is a first draft of Array fields. The basic field definition is there, with the required functionality to handle arrays of almost any type. I've also written the lookups/transforms specific to array fields.

Work still to do:

  • Docs
  • Form fields (naive and admin) and data cleaning
  • Handling dimensions

The last of these is a particularly interesting case. Postgres has a "casual relationship" with the definition of an array field. You can create integer[], integer[][], integer[3][4] etc, but postgres docs state that this is basically just documentation as it is not enforced at all. We have a couple of options here:

  • Force single dimensional, unbounded arrays always. This would be pretty boring.
  • Allow max_size=4 and do python side only validation. We'd still pass the correct [4] to postgres, but it won't enforce integrity.
  • Allow a complex dimensions flag to be passed allowing for any option. I think this isn't needed as if you want a 2-dimensional array you could do ArrayField(ArrayField(IntegerField())). This also makes the code path much easier as all the functions which delegate to the base_field don't have to worry about its dimensions.

In the absence of strong opinion otherwise, I'm going to do option 2.

Other notes for reviewers:

  • Related fields are banned. For M2M this is quite obviously necessary, but I've done so for FKs as well as they currently do not support referential integrity, which is what Django FKs try to enforce. Otherwise just use an integer.
  • Postgres uses 1-based indexing, but I'm converting this in the lookups from 0-based indexing. If someone is used to writing a lot of raw pg queries directly, this will be confusing, but to a normal python user we expect 0-based indexing everywhere.
  • At present I have not implemented contained_by, which is contains with the arguments reversed. It's basically a "is subset" operator. Thinking about it as I'm writing this, I think it does have use cases so I should add it in.
  • String based lookups (__iexact, startswith etc) continue to be accepted, even though they are largely useless. contains has been overloaded with a more sensible implementation. This is on the principle that date based fields accept them, and the query is functional (casts everything to text). Personally, I would like fields to only support the lookups which make sense on them now that is easily done, but this is a backwards incompatible change. I may open it up as a ticket when working on refactoring __year etc into transforms.
  • The approach for handling test models is copied from gis. As Anssi said on IRC, it might be nice if runtests didn't need to know about this, but it'll do for now.
  • There's a bit of hackery with the deconstruct method which means the __init__ accepts two formats for the base field. I wonder whether this could be avoided if there is a suitable hook in migrations.writer to allow me to pass a string containing the correct field definition for the base_field from deconstruct. This would make the migration files look less weird. @andrewgodwin is this sensible? Also should I have explicit tests that migrations work, and if so what would that look like?
@alex

View changes

Show outdated Hide outdated django/contrib/postgres/fields/array.py
@@ -76,6 +77,8 @@ def get_test_modules():
os.path.isfile(f) or
not os.path.exists(os.path.join(dirpath, f, '__init__.py'))):
continue
if not connection.vendor == 'postgresql' and f == 'postgres_tests':
continue

This comment has been minimized.

@alex

alex Mar 26, 2014

Member

There's got to be a better way to do this than hacks in teh test runner.

@alex

alex Mar 26, 2014

Member

There's got to be a better way to do this than hacks in teh test runner.

This comment has been minimized.

@mjtamlyn

mjtamlyn Mar 26, 2014

Member

Options are:

  • Implement a way for the test runner to ask test modules if they can be included for a certain db, possibly via a default app config? This would be specific to our test suite but require changes in django.apps.
  • Put everything in if statements
  • Make all the models have managed=True if the connection is incorrect (this might not work, it's just an idea)
  • The current hack
  • Your ideas here...
@mjtamlyn

mjtamlyn Mar 26, 2014

Member

Options are:

  • Implement a way for the test runner to ask test modules if they can be included for a certain db, possibly via a default app config? This would be specific to our test suite but require changes in django.apps.
  • Put everything in if statements
  • Make all the models have managed=True if the connection is incorrect (this might not work, it's just an idea)
  • The current hack
  • Your ideas here...
@alex

This comment has been minimized.

Show comment
Hide comment
@alex

alex Mar 26, 2014

Member

I'd prefer these to live in the django.db.backends.postgresql_psycopg2 namespace than in contrib, but for the most part this looks awesome -- thanks for working on this!

Member

alex commented Mar 26, 2014

I'd prefer these to live in the django.db.backends.postgresql_psycopg2 namespace than in contrib, but for the most part this looks awesome -- thanks for working on this!

@andrewgodwin

This comment has been minimized.

Show comment
Hide comment
@andrewgodwin

andrewgodwin Mar 26, 2014

Member

Option 2 for dimensions looks good.

As for deconstruction, what extra control would you like? I'd rather this stuff was more achievable from inside fields themselves. Looking over the diff, it looks like you'd want the ability to pass out whole field instances? That should work...

And for testing things with migrations, it's enough to just add migrations into a test app, and they'll get run at test time. If you want to explicitly test individual migration operations, you'll need something like I have in the "migrations" tests, where you swap in different values of MIGRATION_MODULES for certain tests and run the migrate command (or the machinery underlying it) directly.

Member

andrewgodwin commented Mar 26, 2014

Option 2 for dimensions looks good.

As for deconstruction, what extra control would you like? I'd rather this stuff was more achievable from inside fields themselves. Looking over the diff, it looks like you'd want the ability to pass out whole field instances? That should work...

And for testing things with migrations, it's enough to just add migrations into a test app, and they'll get run at test time. If you want to explicitly test individual migration operations, you'll need something like I have in the "migrations" tests, where you swap in different values of MIGRATION_MODULES for certain tests and run the migrate command (or the machinery underlying it) directly.

@mjtamlyn

This comment has been minimized.

Show comment
Hide comment
@mjtamlyn

mjtamlyn Mar 26, 2014

Member

Thanks Andrew, I hadn't realised that deconstruction was recursive. I've added a test that MigrationWriter.serialize does what I expect it to, in addition to the deconstruct/reconstruct test. I think that should be sufficient.

Member

mjtamlyn commented Mar 26, 2014

Thanks Andrew, I hadn't realised that deconstruction was recursive. I've added a test that MigrationWriter.serialize does what I expect it to, in addition to the deconstruct/reconstruct test. I think that should be sufficient.

@mjtamlyn mjtamlyn referenced this pull request Apr 1, 2014

Merged

Django 1.7 compatibility #34

@mjtamlyn

This comment has been minimized.

Show comment
Hide comment
@mjtamlyn

mjtamlyn Apr 21, 2014

Member

Most of the forms code is now present. The js in the admin needs improving, and the admin integration needs some tests. I need to look at how we've tested similar things in other areas to know exactly what to write here.

SimpleArrayField and SplitArrayField can be reviewed pretty well already though.

Member

mjtamlyn commented Apr 21, 2014

Most of the forms code is now present. The js in the admin needs improving, and the admin integration needs some tests. I need to look at how we've tested similar things in other areas to know exactly what to write here.

SimpleArrayField and SplitArrayField can be reviewed pretty well already though.

@dbrgn

View changes

Show outdated Hide outdated django/contrib/postgres/fields/array.py
@dbrgn

View changes

Show outdated Hide outdated django/contrib/postgres/fields/array.py
@alex

View changes

Show outdated Hide outdated django/contrib/postgres/fields/array.py
@alex

View changes

Show outdated Hide outdated django/contrib/postgres/fields/array.py
@BertrandBordage

View changes

Show outdated Hide outdated django/contrib/postgres/admin.py
@BertrandBordage

View changes

Show outdated Hide outdated django/contrib/postgres/admin.py
@BertrandBordage

View changes

Show outdated Hide outdated django/contrib/postgres/fields/array.py
@BertrandBordage

View changes

Show outdated Hide outdated django/contrib/postgres/fields/array.py
@BertrandBordage

View changes

Show outdated Hide outdated django/contrib/postgres/fields/array.py
@BertrandBordage

View changes

Show outdated Hide outdated django/contrib/postgres/forms/array.py
if value:
items = value.split(self.delimiter)
else:
items = []

This comment has been minimized.

@BertrandBordage

BertrandBordage May 4, 2014

Contributor

items = value.split(self.delimiter) if value else [] is slightly faster.

@BertrandBordage

BertrandBordage May 4, 2014

Contributor

items = value.split(self.delimiter) if value else [] is slightly faster.

string_concat(self.error_messages['item_invalid'], error.message),
code='item_invalid',
params={'nth': i},
))

This comment has been minimized.

@BertrandBordage

BertrandBordage May 4, 2014

Contributor

A list comprehension would be a little faster.

EDIT: Forgive me, I misread the code.

@BertrandBordage

BertrandBordage May 4, 2014

Contributor

A list comprehension would be a little faster.

EDIT: Forgive me, I misread the code.

string_concat(self.error_messages['item_invalid'], error.message),
code='item_invalid',
params={'nth': i},
))

This comment has been minimized.

@BertrandBordage

BertrandBordage May 4, 2014

Contributor

Maybe a list comprehension here too.

@BertrandBordage

BertrandBordage May 4, 2014

Contributor

Maybe a list comprehension here too.

This comment has been minimized.

@mjtamlyn

mjtamlyn May 5, 2014

Member

This sort of example I don't think is worth making a list comp - the code is likely to be less readable and the loop will rarely contain many elements so there's no significant benefit.

@mjtamlyn

mjtamlyn May 5, 2014

Member

This sort of example I don't think is worth making a list comp - the code is likely to be less readable and the loop will rarely contain many elements so there's no significant benefit.

This comment has been minimized.

@BertrandBordage

BertrandBordage May 5, 2014

Contributor

That's why I wrote "maybe" ;)

@BertrandBordage

BertrandBordage May 5, 2014

Contributor

That's why I wrote "maybe" ;)

This comment has been minimized.

@BertrandBordage

BertrandBordage May 9, 2014

Contributor

Forget it, I misread the double for-loop.

@BertrandBordage

BertrandBordage May 9, 2014

Contributor

Forget it, I misread the double for-loop.

string_concat(self.error_messages['item_invalid'], error.message),
code='item_invalid',
params={'nth': i},
))

This comment has been minimized.

@BertrandBordage

BertrandBordage May 4, 2014

Contributor

Maybe a list comprehension here too.

EDIT: Forget it, I misread the double for-loop.

@BertrandBordage

BertrandBordage May 4, 2014

Contributor

Maybe a list comprehension here too.

EDIT: Forget it, I misread the double for-loop.

@BertrandBordage

View changes

Show outdated Hide outdated django/contrib/postgres/forms/array.py
if id_:
final_attrs = dict(final_attrs, id='%s_%s' % (id_, i))
output.append(self.widget.render(name + '_%s' % i, widget_value, final_attrs))
return mark_safe(self.format_output(output))

This comment has been minimized.

@BertrandBordage

BertrandBordage May 4, 2014

Contributor

Couldn't this mark_safe lead to an XSS?

@BertrandBordage

BertrandBordage May 4, 2014

Contributor

Couldn't this mark_safe lead to an XSS?

This comment has been minimized.

@mjtamlyn

mjtamlyn May 16, 2014

Member

The contents of each nested widget has already been formatted appropriately during its render, and the format_output should ensure that it's not adding any new user data.

@mjtamlyn

mjtamlyn May 16, 2014

Member

The contents of each nested widget has already been formatted appropriately during its render, and the format_output should ensure that it's not adding any new user data.

This comment has been minimized.

@BertrandBordage

BertrandBordage May 16, 2014

Contributor

Great :)

@BertrandBordage

BertrandBordage May 16, 2014

Contributor

Great :)

@BertrandBordage

View changes

Show outdated Hide outdated django/contrib/postgres/forms/array.py
@BertrandBordage

This comment has been minimized.

Show comment
Hide comment
@BertrandBordage

BertrandBordage May 4, 2014

Contributor

Great work :) I really can't wait to see it in django!
My review was only formal, I didn't dig to understand how it really works.

Contributor

BertrandBordage commented May 4, 2014

Great work :) I really can't wait to see it in django!
My review was only formal, I didn't dig to understand how it really works.

NullableIntegerArrayModel.objects.create(field=[2, 3]),
NullableIntegerArrayModel.objects.create(field=[20, 30, 40]),
NullableIntegerArrayModel.objects.create(field=None),
]

This comment has been minimized.

@BertrandBordage

BertrandBordage May 9, 2014

Contributor

Why not using a bulk_create here? I may be a bit obsessed with performance, but I like when tests also are fast ;)

self.objs = NullableIntegerArrayModel.objects.bulk_create([
    NullableIntegerArrayModel(field=[1]),
    NullableIntegerArrayModel(field=[2]),
    NullableIntegerArrayModel(field=[2, 3]),
    NullableIntegerArrayModel(field=[20, 30, 40]),
    NullableIntegerArrayModel(field=None),
])
@BertrandBordage

BertrandBordage May 9, 2014

Contributor

Why not using a bulk_create here? I may be a bit obsessed with performance, but I like when tests also are fast ;)

self.objs = NullableIntegerArrayModel.objects.bulk_create([
    NullableIntegerArrayModel(field=[1]),
    NullableIntegerArrayModel(field=[2]),
    NullableIntegerArrayModel(field=[2, 3]),
    NullableIntegerArrayModel(field=[20, 30, 40]),
    NullableIntegerArrayModel(field=None),
])

This comment has been minimized.

@mjtamlyn

mjtamlyn May 10, 2014

Member

Bulk create bypasses some logic so I'd rather stick to the "safe" option.

@mjtamlyn

mjtamlyn May 10, 2014

Member

Bulk create bypasses some logic so I'd rather stick to the "safe" option.

@Stranger6667

View changes

Show outdated Hide outdated django/contrib/postgres/fields/array.py

mjtamlyn added some commits May 15, 2014

Remove the admin code for now.
It needs a better way of handing JS widgets in the admin as a whole
before it is easy to write. In particular there are serious issues
involving DateTimePicker when used in an array.
Add a test for nested array fields with different delimiters.
This will be a documented pattern so having a test for it is useful.

@mjtamlyn mjtamlyn changed the title from django.contrib.postgres.fields.ArrayField - WIP to django.contrib.postgres.fields.ArrayField May 16, 2014

@mjtamlyn

This comment has been minimized.

Show comment
Hide comment
@mjtamlyn

mjtamlyn May 16, 2014

Member

Ok, so I have removed the admin functionality for now. In order to do this nicely, it seems likely I will need to do a more thorough review of how javascript widgets in the admin are built in order to make this work nicely. However, model field, form fields and documentation are ready for review. I think this is a complete enough patch for initial inclusion.

Member

mjtamlyn commented May 16, 2014

Ok, so I have removed the admin functionality for now. In order to do this nicely, it seems likely I will need to do a more thorough review of how javascript widgets in the admin are built in order to make this work nicely. However, model field, form fields and documentation are ready for review. I think this is a complete enough patch for initial inclusion.

mjtamlyn added some commits May 16, 2014

Fixed #22648 -- Transform.output_type should respect overridden custo…
…m_lookup and custom_transform.

Previously, class lookups from the output_type would be used, but any
changes to custom_lookup or custom_transform would be ignored.
Add some more tests for multidimensional arrays.
Also fix slicing as much as it can be fixed.
Simplify SplitArrayWidget's data loading.
If we aren't including the variable size one, we don't need to search
like this.
@mjtamlyn

This comment has been minimized.

Show comment
Hide comment
@mjtamlyn

mjtamlyn May 22, 2014

Member

Committed in 6041626

Member

mjtamlyn commented May 22, 2014

Committed in 6041626

@mjtamlyn mjtamlyn closed this May 22, 2014

@BertrandBordage

This comment has been minimized.

Show comment
Hide comment
@BertrandBordage

BertrandBordage May 22, 2014

You can also use isinstance(value, (list, tuple)).

You can also use isinstance(value, (list, tuple)).

@pauloxnet

This comment has been minimized.

Show comment
Hide comment
@pauloxnet

pauloxnet Oct 2, 2014

Contributor

Wath about basi admin functionality for array field ?

Contributor

pauloxnet commented Oct 2, 2014

Wath about basi admin functionality for array field ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment