Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed #24483 -- Greedily consume choices iterators #4326

Closed

Conversation

davidszotten
Copy link
Contributor

If Field.choices is provided as an iterator, consume in __init__ instead
of using itertools.tee (which ends up holding everything in memory
anyway). Fixes bug where deconstruct was consuming iterator but
bypassing the call to tee.

@@ -411,13 +411,6 @@ def test_empty_iterator_choices(self):
self.assertEqual(WhizIterEmpty(c=None).c, None) # Blank value
self.assertEqual(WhizIterEmpty(c='').c, '') # Empty value

def test_charfield_get_choices_with_blank_iterator(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe this test was only accidentally working, due to "empty" generators being truthy. E.g. calling get_choices with choices provided as an empty list raises an exception before this patch, and now "empty" generators have the same behaviour

@davidszotten davidszotten force-pushed the drop_choices_tee branch 2 times, most recently from acbfe36 to 1fe7e6c Compare March 15, 2015 19:24
@timgraham
Copy link
Member

buildbot, add to whitelist.

@timgraham
Copy link
Member

Could you try to add a test? (Hopefully something in model_fields that doesn't require keepdb.)

@davidszotten
Copy link
Contributor Author

sure, though one of the more obvious things to try is the bug(?) exposed by the test i deleted. didn't want to get into that rabbit-hole too.

Field.get_choices, first looks for choices, but if that's empty tries self.rel.to which of course breaks for fields where rel is None https://github.com/davidszotten/django/blob/drop_choices_tee/django/db/models/fields/__init__.py#L817

will have a look

If Field.choices is provided as an iterator, consume in __init__ instead
of using itertools.tee (which ends up holding everything in memory
anyway). Fixes bug where deconstruct was consuming iterator but
bypassing the call to `tee`.
@davidszotten
Copy link
Contributor Author

added a test that exposes the bug and breaks without the fix. the rest of the choice code should be plenty covered by existing tests

@timgraham
Copy link
Member

Thanks, I think we might need some more tests for the existing code before this patch just to help ensure we aren't causing a regression here. For example, the following patches seems to fix the problem reports in the ticket, but no tests fail in Django's test suite (seems there should be a test for that code if it's necessary):

diff --git a/django/db/models/fields/__init__.py b/django/db/models/fields/__init__.py
index 975653d..83243f3 100644
--- a/django/db/models/fields/__init__.py
+++ b/django/db/models/fields/__init__.py
@@ -395,7 +395,7 @@ class Field(RegisterLookupMixin):
             "unique_for_date": None,
             "unique_for_month": None,
             "unique_for_year": None,
-            "choices": [],
+#            "choices": [],
             "help_text": '',
             "db_column": None,
             "db_tablespace": settings.DEFAULT_INDEX_TABLESPACE,
@@ -405,7 +405,7 @@ class Field(RegisterLookupMixin):
         }
         attr_overrides = {
             "unique": "_unique",
-            "choices": "_choices",
+#            "choices": "_choices",
             "error_messages": "_error_messages",
             "validators": "_validators",
             "verbose_name": "_verbose_name",
@@ -414,8 +414,8 @@ class Field(RegisterLookupMixin):
         for name, default in possibles.items():
             value = getattr(self, attr_overrides.get(name, name))
             # Unroll anything iterable for choices into a concrete list
-            if name == "choices" and isinstance(value, collections.Iterable):
-                value = list(value)
+#            if name == "choices" and isinstance(value, collections.Iterable):
+#                value = list(value)
             # Do correct kind of comparison
             if name in equals_comparison:
                 if value != default:

@davidszotten
Copy link
Contributor Author

sure, something along those lines was my first fix. however, having been in the code that uses tee inside Field, i noticed that it wasn't needed, adds complexity, and, due to the usage patterns end up using at least as much memory as just consuming the iterator into a list and storing that

i'm not sure why deconstruct bypasses the choices property and goes straight for the _choices, the attr_overrides was added in the first commit of deconstruct. perhaps @andrewgodwin could shed some light?

@andrewgodwin
Copy link
Member

I believe it goes for _choices as that's usually much closer to what's passed in (some custom fields override choices() to return something entirely different) - the best variable for deconstruct is one that's assigned to directly in the constructor from the target variable and never touched.

@davidszotten
Copy link
Contributor Author

do you have an example (that we could turn into a test that would break with tim's inline patch)?

@timgraham
Copy link
Member

Here's a patch with some tests for deconstruction of Field.choices: #4346

As for this patch, please see ticket #2265 about why choices are evaluated lazily: "As for me i'm using generator to fetch choices list from DB on external host and advanges of lazy fetching should be seen quite well here." So I guess we might need another solution.

@andrewgodwin
Copy link
Member

I don't have an example to hand, I'm afraid - it's just something I dimly remember from the implementation last year.

The bigger problem is, of course, that deconstruct can never work with all dynamic iterators, like ones that pull from the database - thankfully, it is at least bounded. There's a potential call here that we could just special-case choices and blacklist it so it doesn't keep triggering changes, but I've been resisting special cases for ages...

@davidszotten
Copy link
Contributor Author

i need to have another look, but my understanding was that a choices iterator gets consumed at least once during the django startup sequence anyway (need to verify; but this may have changed since the original addition of tee several years ago), and if so, we may as well consume it in Field.__init__ and just store it as a list

@timgraham
Copy link
Member

Added a test to #4346 which is a custom field that overrides choices which breaks with this patch. Whether or not this case should be supported may be debatable.

@davidszotten
Copy link
Contributor Author

sure, crafting a test to trigger the behaviour is easy. my question was more: does this exist in the wild? i'm not familiar enough with the definition of "public api" to know whether overriding choices is considered ok or not, which as you say, is debatable

@timgraham
Copy link
Member

I think I'm okay with the change. If it breaks someone's code, we'll get a report about it and see what to do from there. Did you investigate enough to understand why this currently breaks with -k -v 2 though?

@davidszotten
Copy link
Contributor Author

autodetector = MigrationAutodetector(
is the reason it requires -v2. will have a look for why the -k is also required (at work during uk office hours; can reply but don't have time to dig though code while in the office atm, which is why a bunch of my answers say "will investigate" until i can find an evening to look more)

@davidszotten
Copy link
Contributor Author

ok. another data point: if a modelform that uses a field with iterator choices is defined, the iterator will be consumed in the modelform metaclass __new__ (i.e. at import time), which i think is why i was seeing these consumed at "startup time". However, this might be sufficiently later to warrant deferring the look-up (so the proposed patch in this pr is no good).

@davidszotten
Copy link
Contributor Author

so it seems to me that simply consuming any iterable in deconstruct can lead to unexpected behaviour. e.g. in the use-case mentioned in ticket #2265 it's not clear what the migrations framework should do. one option is an approach similar to the one for e.g. callable defaults, where we just reference the original function.

actually, while the docs (as we've discussed earlier) only require an "iterable", that might give people the (false) impression that a generator would be evaluated on access.

@timgraham
Copy link
Member

Field.choices didn't seem to support function references when I tested it (checks framework threw an error I think). I think your solution might be fine as fetching choices from an external database doesn't seem like a good idea these days, particularly as it would require generating new migrations each time those choices change.

@davidszotten
Copy link
Contributor Author

i guess the issue is that this sort of works if one isn't using migrations (though choices would still be set at import time, and not at request time)

@timgraham
Copy link
Member

I think we can make your change and see if anyone complains. If so, we could see if adding support for function references to Field.choices addresses the use case.

@davidszotten
Copy link
Contributor Author

ok. in that case, do you need anything else on this pr?

@davidszotten
Copy link
Contributor Author

hm. are we not at a point where this should be raised against master instead or 1.8.x?

@timgraham
Copy link
Member

I agree about fixing it in master and not 1.8.x at this point. The commit cherry picks cleanly though.

Merged in 80e3444, thanks!

@timgraham timgraham closed this Mar 24, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants