django.contrib.postgres.fields.ArrayField #2485

Closed
wants to merge 27 commits into from

Conversation

mjtamlyn
Member

This is a first draft of Array fields. The basic field definition is there, with the required functionality to handle arrays of almost any type. I've also written the lookups/transforms specific to array fields.

Work still to do:

  • Docs
  • Form fields (naive and admin) and data cleaning
  • Handling dimensions

The last of these is a particularly interesting case. Postgres has a "casual relationship" with the definition of an array field. You can create integer[], integer[][], integer[3][4] etc., but the Postgres docs state that declared sizes and dimensions are effectively just documentation, as they are not enforced at all. We have a couple of options here:

  • Force single dimensional, unbounded arrays always. This would be pretty boring.
  • Allow max_size=4 and do python side only validation. We'd still pass the correct [4] to postgres, but it won't enforce integrity.
  • Allow a complex dimensions flag to be passed allowing for any option. I think this isn't needed as if you want a 2-dimensional array you could do ArrayField(ArrayField(IntegerField())). This also makes the code path much easier as all the functions which delegate to the base_field don't have to worry about its dimensions.

In the absence of strong opinion otherwise, I'm going to do option 2.
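To make option 2 concrete, here is a minimal sketch in plain Python, without Django. The helper names `array_db_type` and `validate_max_size` are hypothetical and only illustrate the idea: the size is passed through to the column type, while the actual limit is checked on the Python side.

```python
def array_db_type(base_db_type, size=None):
    """Build the column type string, e.g. 'integer[4]' or 'integer[]'."""
    return "%s[%s]" % (base_db_type, "" if size is None else size)


def validate_max_size(value, max_size=None):
    """Python-side check only; Postgres would accept a longer array."""
    if max_size is not None and len(value) > max_size:
        raise ValueError(
            "Array has %d items, at most %d allowed." % (len(value), max_size)
        )
    return value


# Nesting the helper mirrors nesting array fields for a second dimension.
two_dimensional = array_db_type(array_db_type("integer"))
```

Because the type string is built from the base type alone, nesting falls out naturally, which is the same reason a dedicated dimensions flag looks unnecessary.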

Other notes for reviewers:

  • Related fields are banned. For M2M this is quite obviously necessary, but I've done so for FKs as well, since array elements currently do not support referential integrity, which is what Django FKs try to enforce. Otherwise just use an integer.
  • Postgres uses 1-based indexing, but I'm converting this in the lookups from 0-based indexing. If someone is used to writing a lot of raw pg queries directly, this will be confusing, but a typical Python user expects 0-based indexing everywhere.
  • At present I have not implemented contained_by, which is contains with the arguments reversed. It's basically an "is subset" operator. Thinking about it as I'm writing this, I think it does have use cases so I should add it in.
  • String based lookups (__iexact, startswith etc) continue to be accepted, even though they are largely useless. contains has been overloaded with a more sensible implementation. This is on the principle that date based fields accept them, and the query is functional (casts everything to text). Personally, I would like fields to only support the lookups which make sense on them now that is easily done, but this is a backwards incompatible change. I may open it up as a ticket when working on refactoring __year etc into transforms.
  • The approach for handling test models is copied from gis. As Anssi said on IRC, it might be nice if runtests didn't need to know about this, but it'll do for now.
  • There's a bit of hackery with the deconstruct method which means the __init__ accepts two formats for the base field. I wonder whether this could be avoided if there is a suitable hook in migrations.writer to allow me to pass a string containing the correct field definition for the base_field from deconstruct. This would make the migration files look less weird. @andrewgodwin is this sensible? Also should I have explicit tests that migrations work, and if so what would that look like?
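Two of the notes above can be sketched in plain Python. This is an illustration under assumptions, not the code in the patch: the helper names are hypothetical, the SQL is built as plain strings, and containment is modelled with sets on the assumption that duplicates do not matter for the comparison.

```python
def index_sql(column, index):
    """Translate a 0-based Python index into a 1-based Postgres subscript."""
    return "%s[%d]" % (column, index + 1)


def slice_sql(column, start, end):
    """Translate a Python slice [start:end] into a Postgres slice.

    Python excludes the end position while a Postgres slice includes both
    bounds, so only the lower bound needs shifting.
    """
    return "%s[%d:%d]" % (column, start + 1, end)


def contains(array, items):
    """True if every element of items appears somewhere in array."""
    return set(items) <= set(array)


def contained_by(array, container):
    """contains with the arguments reversed: is array a subset of container?"""
    return contains(container, array)
```

For example, `index_sql("field", 0)` gives `field[1]`, and `slice_sql("field", 0, 2)` gives `field[1:2]`, which selects the same two elements as the Python slice `[0:2]`.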


def index_transform_factory(index, base_field):

class IndexTransform(Transform):
Member

Please don't create new classes dynamically for each query like this. IndexTransform should be factored out and take some parameter to its constructor (and then offer a __call__ or something); same with SliceTransform.
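One way to read this suggestion, sketched without Django: a single transform class keeps the index as instance state, and a small factory object exposes `__call__` so it can be used wherever a class would be instantiated. The `render` method, the string-based `lhs`, and both class bodies are assumptions made for illustration and do not follow Django's actual Transform API.

```python
class IndexTransform:
    """Hypothetical transform that stores its index as instance state."""

    def __init__(self, index, base_field, lhs):
        self.index = index
        self.base_field = base_field
        self.lhs = lhs

    def render(self):
        # Shift the 0-based Python index to a 1-based Postgres subscript.
        return "%s[%d]" % (self.lhs, self.index + 1)


class IndexTransformFactory:
    """Callable that produces IndexTransform instances for a fixed index."""

    def __init__(self, index, base_field):
        self.index = index
        self.base_field = base_field

    def __call__(self, lhs):
        return IndexTransform(self.index, self.base_field, lhs)


factory = IndexTransformFactory(0, "integer")
transform = factory("tags")
```

Every call to the factory returns an instance of the same class, so no new class object is created per query.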

@alex
Member

alex commented Mar 26, 2014

I'd prefer these to live in the django.db.backends.postgresql_psycopg2 namespace rather than in contrib, but for the most part this looks awesome -- thanks for working on this!

@andrewgodwin
Member

Option 2 for dimensions looks good.

As for deconstruction, what extra control would you like? I'd rather this stuff was more achievable from inside fields themselves. Looking over the diff, it looks like you'd want the ability to pass out whole field instances? That should work...

And for testing things with migrations, it's enough to just add migrations into a test app, and they'll get run at test time. If you want to explicitly test individual migration operations, you'll need something like I have in the "migrations" tests, where you swap in different values of MIGRATION_MODULES for certain tests and run the migrate command (or the machinery underlying it) directly.

@mjtamlyn
Member Author

Thanks Andrew, I hadn't realised that deconstruction was recursive. I've added a test that MigrationWriter.serialize does what I expect it to, in addition to the deconstruct/reconstruct test. I think that should be sufficient.
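To illustrate what recursive deconstruction means here, a toy sketch in plain Python. The `Toy*` classes and the `serialize` function are hypothetical stand-ins, not Django's Field classes or MigrationWriter, and the `deconstruct` return value is simplified to a class name plus keyword arguments.

```python
class ToyField:
    """Hypothetical stand-in for a model field."""

    def __init__(self, **options):
        self.options = options

    def deconstruct(self):
        # Simplified: just the class name and the keyword arguments.
        return self.__class__.__name__, dict(self.options)


class ToyIntegerField(ToyField):
    pass


class ToyArrayField(ToyField):
    def __init__(self, base_field, **options):
        super().__init__(base_field=base_field, **options)


def serialize(value):
    """Serialize a value, recursing into any field found among the kwargs."""
    if isinstance(value, ToyField):
        name, kwargs = value.deconstruct()
        rendered = ", ".join(
            "%s=%s" % (key, serialize(val)) for key, val in sorted(kwargs.items())
        )
        return "%s(%s)" % (name, rendered)
    return repr(value)
```

Because the serializer recurses into field instances, the array field can hand back its base field as an ordinary keyword argument, and nested arrays need no special handling.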

@mjtamlyn
Member Author

Most of the forms code is now present. The js in the admin needs improving, and the admin integration needs some tests. I need to look at how we've tested similar things in other areas to know exactly what to write here.

SimpleArrayField and SplitArrayField can be reviewed pretty well already though.

vals = json.loads(value)
value = []
for val in vals:
value.append(self.base_field.to_python(val))
Contributor

Why not a list comprehension here? (And on line 106)

value = [self.base_field.to_python(val) for val in vals]

Contributor

👍

Contributor

Or even the faster map(self.base_field.to_python, vals), but that's more arguable.
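For comparison, a small self-contained sketch of the three spellings discussed in this thread. The `to_python` function here is a hypothetical stand-in for `self.base_field.to_python`, and no speed claim is made. One thing worth keeping in mind is that on Python 3 `map` returns an iterator, so it has to be wrapped in `list()` to get the same result.

```python
def to_python(raw):
    """Stand-in for the base field's conversion; here it just parses an int."""
    return int(raw)


vals = ["1", "2", "3"]

# Explicit loop, as in the diff under review.
loop_result = []
for val in vals:
    loop_result.append(to_python(val))

# List comprehension, as suggested.
comprehension_result = [to_python(val) for val in vals]

# map() needs list() on Python 3 because it returns a lazy iterator.
map_result = list(map(to_python, vals))
```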

@BertrandBordage
Contributor

Great work :) I really can't wait to see it in django!
My review only covered form; I didn't dig into how it really works.

NullableIntegerArrayModel.objects.create(field=[2, 3]),
NullableIntegerArrayModel.objects.create(field=[20, 30, 40]),
NullableIntegerArrayModel.objects.create(field=None),
]
Contributor

Why not use bulk_create here? I may be a bit obsessed with performance, but I like it when tests are fast too ;)

self.objs = NullableIntegerArrayModel.objects.bulk_create([
    NullableIntegerArrayModel(field=[1]),
    NullableIntegerArrayModel(field=[2]),
    NullableIntegerArrayModel(field=[2, 3]),
    NullableIntegerArrayModel(field=[20, 30, 40]),
    NullableIntegerArrayModel(field=None),
])

Member Author

Bulk create bypasses some logic so I'd rather stick to the "safe" option.

@mjtamlyn mjtamlyn changed the title django.contrib.postgres.fields.ArrayField - WIP django.contrib.postgres.fields.ArrayField May 16, 2014
@mjtamlyn
Member Author

Ok, so I have removed the admin functionality for now. To do this nicely, it seems likely I will need a more thorough review of how JavaScript widgets in the admin are built. However, the model field, form fields and documentation are ready for review. I think this is a complete enough patch for initial inclusion.

self.base_field.set_attributes_from_name(name)

@property
def definition(self):
Member

Is this needed somewhere?

return '%s[%s]' % (self.base_field.db_type(connection), size)

def get_prep_value(self, value):
if isinstance(value, list):
Member

Is list sufficient here, or should this handle every iterable?
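A possible middle ground, sketched as a standalone function rather than the method in the patch: accept lists and tuples explicitly instead of every iterable. The reasoning is that strings are iterable too, so a blanket iterable check would split a string into characters. The `prep_item` parameter is a hypothetical stand-in for the base field's own preparation step.

```python
def get_prep_value(value, prep_item=int):
    """Prepare each element of a list or tuple; pass anything else through."""
    if isinstance(value, (list, tuple)):
        return [prep_item(item) for item in value]
    return value
```

With this check a tuple such as `("1", "2")` is prepared element by element, while a string or `None` is returned unchanged.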

return self.widget.is_hidden

def value_from_datadict(self, data, files, name):
regex = re.compile(name + '_([0-9]+).*')
Contributor

Are we sure that name does not have to be escaped? I guess it should be a valid Python identifier and thus be safe, but maybe it's worth leaving a comment here?
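A short sketch of what escaping would look like, using only the standard library. The `index_pattern` helper and the sample names are hypothetical. As far as I know, form prefixes can put hyphens into a widget name, so it is not always a plain Python identifier, and `re.escape` covers that case as well as names containing a dot.

```python
import re


def index_pattern(name):
    """Compile the per-index pattern with the widget name matched literally."""
    # re.escape neutralises any regex metacharacters that appear in the name.
    return re.compile(re.escape(name) + r"_([0-9]+).*")


# Without escaping, the dot in the name matches any character.
unescaped = re.compile("tags.extra" + r"_([0-9]+).*")
escaped = index_pattern("tags.extra")
```

Here `unescaped` matches the unrelated key `tagsXextra_0`, while `escaped` only matches keys that really start with `tags.extra_`.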

…m_lookup and custom_transform.

Previously, class lookups from the output_type would be used, but any
changes to custom_lookup or custom_transform would be ignored.
Also fix slicing as much as it can be fixed.
If we aren't including the variable size one, we don't need to search
like this.
@mjtamlyn
Member Author

Committed in 6041626

@mjtamlyn mjtamlyn closed this May 22, 2014
@pauloxnet
Contributor

pauloxnet commented Oct 2, 2014

What about basic admin functionality for array fields?

9 participants