Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

select_sublcasses does not work with prefetch_related #110

Open
Anislav opened this issue Feb 3, 2014 · 8 comments
Open

select_sublcasses does not work with prefetch_related #110

Anislav opened this issue Feb 3, 2014 · 8 comments

Comments

@Anislav
Copy link

Anislav commented Feb 3, 2014

Consider having the following example models:

from django.db import models
from model_utils.managers import InheritanceManager

class Post(models.Model):
    creation_date = models.DateTimeField(auto_now_add=True)
    objects = InheritanceManager()

class TextPost(Post):
    body = models.TextField()

class Photo(models.Model):
    image = models.ImageField(upload_to='photos')

class PhotoPost(Post):
    photos = models.ManyToManyField(Photo)

When I attempt to select all posts and prefetch PhotoPost's photos:

Post.objects.select_subclasses().prefetch_related('photos').all()

The following error is raised:

    (0.001) SELECT `photos_post`.`id`, `photos_post`.`creation_date`, `photos_textpost`.`post_ptr_id`, `photos_textpost`.`body`, `photos_photopost`.`post_ptr_id` FROM `photos_post` LEFT OUTER JOIN `photos_textpost` ON (`photos_post`.`id` = `photos_textpost`.`post_ptr_id`) LEFT OUTER JOIN `photos_photopost` ON (`photos_post`.`id` = `photos_photopost`.`post_ptr_id`) LIMIT 21; args=()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Users/anislav/.virtualenvs/mysite/lib/python2.7/site-packages/django/db/models/query.py", line 77, in __repr__
    data = list(self[:REPR_OUTPUT_SIZE + 1])
  File "/Users/anislav/.virtualenvs/mysite/lib/python2.7/site-packages/django/db/models/query.py", line 102, in __iter__
    len(self)
  File "/Users/anislav/.virtualenvs/mysite/lib/python2.7/site-packages/django/db/models/query.py", line 94, in __len__
    self._prefetch_related_objects()
  File "/Users/anislav/.virtualenvs/mysite/lib/python2.7/site-packages/django/db/models/query.py", line 613, in _prefetch_related_objects
    prefetch_related_objects(self._result_cache, self._prefetch_related_lookups)
  File "/Users/anislav/.virtualenvs/mysite/lib/python2.7/site-packages/django/db/models/query.py", line 1742, in prefetch_related_objects
    (attr, first_obj.__class__.__name__, lookup))
AttributeError: Cannot find 'photos' on TextPost object, 'photos' is an invalid parameter to prefetch_related()

Is prefetch_related supported in combination with select_sublcasses?
Note that I have tested this on Django 1.5 and 1.6 with the latest release of django-model-utils.

Disclaimer: the given example is reworked version of the following one: http://stackoverflow.com/questions/19511368/prefetch-related-with-django-1-5-django-model-utils

@kezabelle
Copy link
Collaborator

I wouldn't be entirely surprised to find that the InheritanceManager doesn't play nicely with prefetch_related; there are no testcases which demonstrate whether it should or should not work that I can see, and to the best of my understanding of how prefetch_related works, it makes certain assumptions, one of which is that the _result_cache is a homogenous set of objects, which is at odds with what InheritanceManager is all about.

I assume from your opening the ticket that .prefetch_related('photopost__photo') doesn't do the expected thing either?

@kezabelle
Copy link
Collaborator

If prefetch_related cannot be supported, or has caveats in how it must be used, we should probably clarify that position in the documentation, and in test cases which are marked as @expectedFailure where appropriate.

@Anislav
Copy link
Author

Anislav commented Feb 4, 2014

Yes, it does not work in the expected by me way when .prefetch_related('photopost__photo') is used. In this case too many SQL queries are executed which is not optimal. Look at the samples below:

When I execute the following:

posts = list(Post.objects.select_subclasses().prefetch_related('photopost__photos').all())

The following 3 SQL queries are executed:

(0.025) SELECT `photos_post`.`id`, `photos_post`.`creation_date`, `photos_textpost`.`post_ptr_id`, `photos_textpost`.`body`, `photos_photopost`.`post_ptr_id` FROM `photos_post` LEFT OUTER JOIN `photos_textpost` ON (`photos_post`.`id` = `photos_textpost`.`post_ptr_id`) LEFT OUTER JOIN `photos_photopost` ON (`photos_post`.`id` = `photos_photopost`.`post_ptr_id`); args=()
[DEBUG] Logger:django.db.backends, Date: 2014-02-04 08:40:01,155
File: /Users/anislav/.virtualenvs/mysite/lib/python2.7/site-packages/django/db/backends/util.py, func/method: execute(), line: 51

    (0.008) SELECT `photos_post`.`id`, `photos_post`.`creation_date`, `photos_photopost`.`post_ptr_id` FROM `photos_photopost` INNER JOIN `photos_post` ON (`photos_photopost`.`post_ptr_id` = `photos_post`.`id`) WHERE `photos_photopost`.`post_ptr_id` IN (1, 2, 3); args=(1, 2, 3)
[DEBUG] Logger:django.db.backends, Date: 2014-02-04 08:40:01,157
File: /Users/anislav/.virtualenvs/mysite/lib/python2.7/site-packages/django/db/backends/util.py, func/method: execute(), line: 51

    (0.000) SELECT (`photos_photopost_photos`.`photopost_id`) AS `_prefetch_related_val`, `photos_photo`.`id`, `photos_photo`.`image` FROM `photos_photo` INNER JOIN `photos_photopost_photos` ON (`photos_photo`.`id` = `photos_photopost_photos`.`photo_id`) WHERE `photos_photopost_photos`.`photopost_id` IN (2, 3); args=(2, 3)

It looks like the first query finds that there are Posts with ids 1, 2 & 3. Then the second one finds that only those with ids: 2 & 3 are of PhotoPost type. The last query selects the Photos for all PhotoPosts.
Although, the Photos are retrieved it seems that they are not cached because when I try to retrieve the Photos for one of the PhotoPosts, a new SQL query is executed:

>>> posts[1].photos.all()
[DEBUG] Logger:django.db.backends, Date: 2014-02-04 08:40:32,993
File: /Users/anislav/.virtualenvs/mysite/lib/python2.7/site-packages/django/db/backends/util.py, func/method: execute(), line: 51

    (0.001) SELECT `photos_photo`.`id`, `photos_photo`.`image` FROM `photos_photo` INNER JOIN `photos_photopost_photos` ON (`photos_photo`.`id` = `photos_photopost_photos`.`photo_id`) WHERE `photos_photopost_photos`.`photopost_id` = 2  LIMIT 21; args=(2,)
[<Photo: Photo object>, <Photo: Photo object>]

@kezabelle
Copy link
Collaborator

The fact they're not cached may, tentatively be related to the solution proposed in #78.

@Anislav
Copy link
Author

Anislav commented Feb 4, 2014

Probably, you are right regarding the Photos cache part of the problem.

Anyway, executing the following

posts = list(Post.objects.select_subclasses().prefetch_related('photopost__photos').all())

results in 3 SQL queries which is an issue from my point of view.

@carljm
Copy link
Collaborator

carljm commented Feb 4, 2014

Hi @Anislav, thanks for the report. I think @kezabelle is likely correct that #78 would fix the reverse-related cache, which seems like part of the fix needed here.

Regarding the three queries, I'm not sure what's going on there, would need to dig into it. Probably won't have time to do that soon, but if you figure out the cause I'd be happy to take a pull request!

@BrendaH
Copy link

BrendaH commented Oct 3, 2018

I also bumped into this problem with prefetch_related as mentioned in #110 (comment)

Django version: 1.11.16
django-model-utils version: 3.1.2
Python version: 3.5
Windows machine

Looks like the prefetch cache is not where one would expect it.
With Post.objects.select_subclasses().prefetch_related('photopost__photos').all() we would expect all Post objects to have prefetched their PhotoPosts and their Photos. So looping over all PhotoPosts (as returned by select_subclasses) we would expect their Photos to be prefetched and photopost_instance.photos.all() not to result in extra queries. However, the actual prefetch result seems to be stored in photopost_instance.photoposts.photos.all() (so with the extra photoposts in there). Using this last way of accessing the photos does not result in extra queries, but it is a bit weird to have to do so.

I suspect that this happens because prefetch_related is handled after the iterator has converted the found instances to the appropriate child classes.

Unfortunately I do not have time (or the know-how) to make a PR to fix this. (I was hoping this package would be plug-and-play to save me time. ;-) ) So for now I'm going with the weird but working fix of using the extra classname when looking up the prefetched objects. I just wanted to make it known that the bug still exists and what might be causing it.

@vdboor
Copy link

vdboor commented Jan 23, 2024

I've bumped into this as well. From what I can see, this is hard to fix.

As workaround, you could override QuerySet._prefetch_related_objects() to implement some manual prefetching. In example, split the subclasses by model type, and perform manual prefetches for each of them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants