S3BotoStorage performance issues #195

Closed
facconi opened this Issue Jan 18, 2013 · 17 comments

9 participants

@facconi
facconi commented Jan 18, 2013

I am using easy-thumbnails 1.1 with THUMBNAIL_DEFAULT_STORAGE = 'storages.backends.s3boto.S3BotoStorage'

I have noticed that get_thumbnailer is 4 times slower than when using local storage.

In my API I have to return about 20 thumbnails, and I build them with the following code:

try:
    avatar = request.build_absolute_uri(
        get_thumbnailer(friendship.friend.get_profile().avatar)['iPhoneAvatar'].url
    )
except InvalidImageFormatError:
    self.logger.error("Error loading avatar for user %s" % friendship.friend.username)
    avatar = None

Am I doing anything wrong? Is there any way to optimize performance?

@SmileyChris
Owner
@facconi
facconi commented Jan 18, 2013

Yes, I have done multiple tests against my API using curl and Unix's time command.
The API runs 4 times slower when using the s3boto storage for easy_thumbnails.

@SmileyChris
Owner

There's nothing I can see off the top of my head that you're doing wrong. Perhaps you could try some code profiling between local and remote storage to see where the extra time is getting spent.
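A minimal sketch of the kind of profiling suggested here, assuming Python 3 and only the standard library. The helper name profile_call is hypothetical, not part of easy-thumbnails. Running it once with local storage and once with S3BotoStorage, then comparing the two reports, should show where the extra time goes.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report).

    The report lists the 15 entries with the highest cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(15)
    return result, stream.getvalue()


# Hypothetical usage inside the API view from the first comment:
#   url, report = profile_call(
#       lambda: get_thumbnailer(avatar)['iPhoneAvatar'].url)
#   print(report)
```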

@mlewis
mlewis commented Feb 6, 2013

A note on this: I am having a similar problem. It happens whenever it hits storage.exists. Checking if something exists on S3 (at least via s3boto) seems to require going through every single file, which is crazy when you have tens of thousands of thumbnails. Has anyone come up with a decent solution? I was thinking of a sort of mirroring, so that the thumbnail is generated both locally and on S3; then when I need to see if the thumbnail exists, I just check locally and assume that it has been uploaded to S3. Thoughts?

EDIT:

I ended up just overriding the exists method in the storage like so:

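from storages.backends.s3boto import S3BotoStorage


class ThumbnailS3BotoStorage(S3BotoStorage):
    def exists(self, filename):
        # Treat any failure to open the file as "does not exist".
        try:
            self.open(filename, 'rb')
            return True
        except Exception:
            return False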

This seems to work like a charm so far.

@fernandogrd

I had similar issues with Amazon S3. In my case, I got a lot of extra queries.
I found out that easy-thumbnails was checking against the database rather than checking the file directly, but it was not one query per image; there seemed to be a lot of repeated queries per image (around 20).

@bashu
bashu commented Mar 21, 2013

Yeah, the same problem! Looks like sorl.thumbnail works 4 times faster with remote storages (reducing number of queries by a smarter caching using memcache or redis).

@SmileyChris
Owner

I'd be happy to put something like ThumbnailS3BotoStorage in easy-thumbnails if it is going to be slow to get that in upstream (but it does really belong there).

Regarding the multi query issue - it'd be good to narrow that down and figure out where the problem lies. I'm definitely for decreasing the number of queries required!

@mlewis
mlewis commented Mar 21, 2013

I agree that it does not belong in the easy-thumbnails codebase. Adding it to the docs might help, though?

@epicserve

@SmileyChris, I'm having a similar multiple-query issue. I have an app called media with a model called Photo. Below is the Photo model.

In the admin list display view I have it display Photo._admin_thumbnail. On a page with 52 photos that have all had their thumbnails pregenerated using easy_thumbnails.files.generate_thumbnails, when the page first loads Django Debug Toolbar says the page took 1.84 minutes to load and had 422 queries. When I reload the page DDT says the page took 5.96 seconds with 160 queries.

I should also note that I've verified that easy_thumbnails.files.generate_thumbnails is working by looking on S3 to see if the thumbnail files have been created and I also looked in easy_thumbnails_source and easy_thumbnails_thumbnail tables to make sure all the required records had been created.

One possible solution I'm considering is modifying my Photo._admin_thumbnail method so it gets the URL from a Redis cache. Another solution I've thought of is to modify the admin class for Photo so it uses ModelAdmin.get_queryset with a query that joins the easy_thumbnails tables, in order to cut down on the number of queries.

I'm using:

  • easy-thumbnails==1.2
  • Django==1.5.1

class Photo(BaseFileModelMixin, AuthorsModelMixin, TaxonomyModelMixin, PublishModelMixin):
    """Photo model"""

    def get_photo_path(instance, filename):
        alt_filename = instance.slug
        if not alt_filename:
            alt_filename = slugify_filename(filename)[0]
        return get_file_upload_path(instance, filename, MEDIA_PHOTO_BASE_DIR, alt_filename)

    title = models.CharField(max_length=255, blank=True, help_text='If the title is left blank, the photos filename will be used.')
    slug = models.SlugField(max_length=255, blank=True, help_text='The slug is a URL-friendly version of the title and is auto-populated.')
    photo = ThumbnailerImageField(upload_to=get_photo_path, max_length=255)
    caption = models.TextField(blank=True)

    class Meta:
        ordering = ('-published', )

    def save(self, force_insert=False, force_update=False, **kwargs):

        if not self.title:
            self.title = self.photo.name.rsplit('.', 1)[0]
            self.slug = slugify(self.title).replace('_', '-')

        super(Photo, self).save(force_insert, force_update, **kwargs)

    def __unicode__(self):
        return self.caption_truncated

    def get_absolute_url(self):
        return reverse_date_detail_view('photo_detail', self.published, self.slug)

    def related_label(self):
        thumbnail_url = self.photo['admin_thumbnail_small'].url
        return """<img src="%s" />""" % thumbnail_url

    def delete(self, *args, **kwargs):

        for ps in self.photos_set.all():
            # if the photo being deleted is the only photo in the photo set then
            # delete the photo set
            if ps.photos.count() == 1:
                ps.delete()

        super(Photo, self).delete(*args, **kwargs)

    @property
    def caption_truncated(self):
        caption = self.caption.strip() if self.caption else None
        if caption and len(caption) >= 10:
            return self.caption[:80]  # truncate to 80 characters
        else:
            return '%s' % self.photo_filename

    @property
    def photo_filename(self):
        return str(self.photo.file).split('/')[-1]

    def _admin_thumbnail(self):
        try:
            thumbnail_url = self.photo['admin_thumbnail'].url
            return """<img src="%s" />""" % thumbnail_url
        except InvalidImageFormatError:
            return """<img src="http://www.placehold.it/120x80&text=Invalid+Image" />"""
    _admin_thumbnail.short_description = "Thumbnail"
    _admin_thumbnail.allow_tags = True

    def _filename(self):
        return """<a href="%s">%s</a>""" % (self.photo.url, self.photo_filename)
    _filename.short_description = "Filename"
    _filename.allow_tags = True

@epicserve

Also if it helps ... here is a screenshot of my DDT SQL panel.

[Screenshot: Django Debug Toolbar SQL panel, 2013-05-16 11:44 AM]

@epicserve

I'm not sure this is the greatest and most elegant solution, however this is something I hacked together that seems to work okay. If I go this route, I'll probably modify my signal that creates thumbnail aliases so it puts the information in the correct cache key and then set the timeout for something like two months or whatever. I also need to add a way to update the cache if the image gets updated.

I think the best solution would be to make it an option in easy-thumbnails so you could use Django's cache instead of the database.

def get_easy_thumb_alias_url(model_instance, field, alias_key, cache_timeout=300):
    """
    Returns the url for a thumbnail instance and returns false if there was an issue.

    :param model_instance: The instance of a Django model
    :param field: The name of the ThumbnailerImageField
    :param alias_key: The name of the alias_key

    Usage:

        get_easy_thumb_alias_url(object, 'photo', 'admin_thumbnail')

    """
    app_model = "{0}.{1}".format(model_instance._meta.app_label, model_instance._meta.object_name).lower()
    cache_key = 'easy_thumb_alias_cache_%s.%s_%s' % (app_model, field, model_instance.id)
    thumbnail_cache = cache.get(cache_key, {})
    # Check for the requested alias rather than a hard-coded one.
    if thumbnail_cache and alias_key in thumbnail_cache:
        return '%s%s' % (settings.MEDIA_URL, thumbnail_cache[alias_key])

    try:
        thumb = getattr(model_instance, field)[alias_key]
        thumbnail_cache.update({alias_key: thumb.name})
        cache.set(cache_key, thumbnail_cache, cache_timeout)
        return '%s%s' % (settings.MEDIA_URL, thumbnail_cache[alias_key])
    except InvalidImageFormatError:
        return False


class Photo(BaseFileModelMixin, AuthorsModelMixin, TaxonomyModelMixin, PublishModelMixin):
    """Photo model"""

    title = models.CharField(max_length=255, blank=True, help_text='If the title is left blank, the photos filename will be used.')
    slug = models.SlugField(max_length=255, blank=True, help_text='The slug is a URL-friendly version of the title and is auto-populated.')
    photo = ThumbnailerImageField(upload_to=get_photo_path, max_length=255)
    caption = models.TextField(blank=True)

    ...

    def _admin_thumbnail(self):
        thumb_url = get_easy_thumb_alias_url(self, 'photo', 'admin_thumbnail')

        if thumb_url:
            return """<img src="%s" />""" % thumb_url
        else:
            return """<img src="http://www.placehold.it/120x80&text=Invalid+Image" />"""
    _admin_thumbnail.short_description = "Thumbnail"
    _admin_thumbnail.allow_tags = True
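
The "update the cache if the image gets updated" step mentioned above could be sketched roughly as follows. This is only an illustration: thumb_cache_key and invalidate_thumb_cache are hypothetical helper names, and the only assumption about the cache object is that it has a delete(key) method, as Django's cache does. The key format mirrors the one built in get_easy_thumb_alias_url.

```python
def thumb_cache_key(app_label, object_name, field, pk):
    """Rebuild the cache key used by get_easy_thumb_alias_url."""
    app_model = "{0}.{1}".format(app_label, object_name).lower()
    return "easy_thumb_alias_cache_%s.%s_%s" % (app_model, field, pk)


def invalidate_thumb_cache(cache, model_instance, field):
    """Drop the cached thumbnail names for one image field of one instance.

    The next call to get_easy_thumb_alias_url then repopulates the entry.
    """
    meta = model_instance._meta
    key = thumb_cache_key(meta.app_label, meta.object_name,
                          field, model_instance.id)
    cache.delete(key)
    return key


# Hypothetical usage, e.g. from Photo.save() or a post_save handler:
#   from django.core.cache import cache
#   invalidate_thumb_cache(cache, photo, 'photo')
```

Deleting the whole entry, instead of rewriting one alias, keeps the helper simple at the cost of one extra lookup per alias after an image changes.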
@epicserve

I've done some more troubleshooting and these are my troubleshooting steps ...

Step 1. Delete all media from my S3 testing bucket

Step 2. Drop my db and then run syncdb

Step 3. Run an importer that imports some images. The importer also runs easy_thumbnails.files.generate_all_aliases on all images imported.

Step 4. Double check the db to make sure the thumbnails were created

my_app=> select count(*) from media_photo;
count
-------
  95

my_app=> select count(*) from easy_thumbnails_source;
count
-------
  95
(1 row)

Step 5. View my S3 testing bucket to make sure the thumbnails were created.

Step 6. View /admin/media/photo/ in the admin, which takes about 10 minutes to load the page. Also, the python process running the Django runserver grows from 46 MB to 1.22 GB after the page loaded. Django Debug Toolbar shows 855 queries to easy_thumbnail tables, which means that's 9 queries per image. The following are the queries for one image.

SELECT "easy_thumbnails_source"."id", "easy_thumbnails_source"."storage_hash", "easy_thumbnails_source"."name", "easy_thumbnails_source"."modified" FROM "easy_thumbnails_source" WHERE ("easy_thumbnails_source"."name" = 'img/photo/2013/05/17/21a66e0822-0dc6f6d57b79438f9e2023ad4eae126d-a62a20f24f4b463691d51c4ccc871171-3.jpg' AND "easy_thumbnails_source"."storage_hash" = '63c77af23a93fcbac67418e6938048ca' )
SELECT "easy_thumbnails_source"."id", "easy_thumbnails_source"."storage_hash", "easy_thumbnails_source"."name", "easy_thumbnails_source"."modified" FROM "easy_thumbnails_source" WHERE ("easy_thumbnails_source"."name" = 'img/photo/2013/05/17/21a66e0822-0dc6f6d57b79438f9e2023ad4eae126d-a62a20f24f4b463691d51c4ccc871171-3.jpg' AND "easy_thumbnails_source"."storage_hash" = '63c77af23a93fcbac67418e6938048ca' )
SELECT "easy_thumbnails_thumbnail"."id", "easy_thumbnails_thumbnail"."storage_hash", "easy_thumbnails_thumbnail"."name", "easy_thumbnails_thumbnail"."modified", "easy_thumbnails_thumbnail"."source_id" FROM "easy_thumbnails_thumbnail" WHERE ("easy_thumbnails_thumbnail"."source_id" = 29 AND "easy_thumbnails_thumbnail"."name" = 'img/photo/2013/05/17/21a66e0822-0dc6f6d57b79438f9e2023ad4eae126d-a62a20f24f4b463691d51c4ccc871171-3.jpg.120x80_q80_crop-scale.jpg' AND "easy_thumbnails_thumbnail"."storage_hash" = '63c77af23a93fcbac67418e6938048ca' )
SELECT "easy_thumbnails_source"."id", "easy_thumbnails_source"."storage_hash", "easy_thumbnails_source"."name", "easy_thumbnails_source"."modified" FROM "easy_thumbnails_source" WHERE ("easy_thumbnails_source"."name" = 'img/photo/2013/05/17/21a66e0822-0dc6f6d57b79438f9e2023ad4eae126d-a62a20f24f4b463691d51c4ccc871171-3.jpg' AND "easy_thumbnails_source"."storage_hash" = '63c77af23a93fcbac67418e6938048ca' )
SELECT "easy_thumbnails_source"."id", "easy_thumbnails_source"."storage_hash", "easy_thumbnails_source"."name", "easy_thumbnails_source"."modified" FROM "easy_thumbnails_source" WHERE ("easy_thumbnails_source"."name" = 'img/photo/2013/05/17/21a66e0822-0dc6f6d57b79438f9e2023ad4eae126d-a62a20f24f4b463691d51c4ccc871171-3.jpg' AND "easy_thumbnails_source"."storage_hash" = '63c77af23a93fcbac67418e6938048ca' )
SELECT "easy_thumbnails_thumbnail"."id", "easy_thumbnails_thumbnail"."storage_hash", "easy_thumbnails_thumbnail"."name", "easy_thumbnails_thumbnail"."modified", "easy_thumbnails_thumbnail"."source_id" FROM "easy_thumbnails_thumbnail" WHERE ("easy_thumbnails_thumbnail"."source_id" = 29 AND "easy_thumbnails_thumbnail"."name" = 'img/photo/2013/05/17/21a66e0822-0dc6f6d57b79438f9e2023ad4eae126d-a62a20f24f4b463691d51c4ccc871171-3.jpg.120x80_q80_crop-scale.png' AND "easy_thumbnails_thumbnail"."storage_hash" = '63c77af23a93fcbac67418e6938048ca' )
SELECT "easy_thumbnails_thumbnail"."id", "easy_thumbnails_thumbnail"."storage_hash", "easy_thumbnails_thumbnail"."name", "easy_thumbnails_thumbnail"."modified", "easy_thumbnails_thumbnail"."source_id" FROM "easy_thumbnails_thumbnail" WHERE ("easy_thumbnails_thumbnail"."source_id" = 29 AND "easy_thumbnails_thumbnail"."name" = 'img/photo/2013/05/17/21a66e0822-0dc6f6d57b79438f9e2023ad4eae126d-a62a20f24f4b463691d51c4ccc871171-3.jpg.120x80_q80_crop-scale.png' AND "easy_thumbnails_thumbnail"."storage_hash" = '63c77af23a93fcbac67418e6938048ca' )
SELECT "easy_thumbnails_thumbnail"."id", "easy_thumbnails_thumbnail"."storage_hash", "easy_thumbnails_thumbnail"."name", "easy_thumbnails_thumbnail"."modified", "easy_thumbnails_thumbnail"."source_id" FROM "easy_thumbnails_thumbnail" WHERE ("easy_thumbnails_thumbnail"."source_id" = 29 AND "easy_thumbnails_thumbnail"."name" = 'img/photo/2013/05/17/21a66e0822-0dc6f6d57b79438f9e2023ad4eae126d-a62a20f24f4b463691d51c4ccc871171-3.jpg.120x80_q80_crop-scale.png' AND "easy_thumbnails_thumbnail"."storage_hash" = '63c77af23a93fcbac67418e6938048ca' )
UPDATE "easy_thumbnails_thumbnail" SET "modified" = '2013-05-17 20:42:34.534960+00:00' WHERE "easy_thumbnails_thumbnail"."id" = 247

Step 7. Reloaded the view /admin/media/photo/, which still takes a while but only about 6 seconds. The python process grows from 46 MB to 115 MB. The Django Debug Toolbar shows 285 queries to easy_thumbnail tables, which means that's 3 queries per image. The following are the queries for one image.

SELECT "easy_thumbnails_source"."id", "easy_thumbnails_source"."storage_hash", "easy_thumbnails_source"."name", "easy_thumbnails_source"."modified" FROM "easy_thumbnails_source" WHERE ("easy_thumbnails_source"."name" = 'img/photo/2013/05/17/21a66e0822-0dc6f6d57b79438f9e2023ad4eae126d-a62a20f24f4b463691d51c4ccc871171-3.jpg' AND "easy_thumbnails_source"."storage_hash" = '63c77af23a93fcbac67418e6938048ca' )
SELECT "easy_thumbnails_source"."id", "easy_thumbnails_source"."storage_hash", "easy_thumbnails_source"."name", "easy_thumbnails_source"."modified" FROM "easy_thumbnails_source" WHERE ("easy_thumbnails_source"."name" = 'img/photo/2013/05/17/21a66e0822-0dc6f6d57b79438f9e2023ad4eae126d-a62a20f24f4b463691d51c4ccc871171-3.jpg' AND "easy_thumbnails_source"."storage_hash" = '63c77af23a93fcbac67418e6938048ca' )
SELECT "easy_thumbnails_thumbnail"."id", "easy_thumbnails_thumbnail"."storage_hash", "easy_thumbnails_thumbnail"."name", "easy_thumbnails_thumbnail"."modified", "easy_thumbnails_thumbnail"."source_id" FROM "easy_thumbnails_thumbnail" WHERE ("easy_thumbnails_thumbnail"."source_id" = 29 AND "easy_thumbnails_thumbnail"."name" = 'img/photo/2013/05/17/21a66e0822-0dc6f6d57b79438f9e2023ad4eae126d-a62a20f24f4b463691d51c4ccc871171-3.jpg.120x80_q80_crop-scale.jpg' AND "easy_thumbnails_thumbnail"."storage_hash" = '63c77af23a93fcbac67418e6938048ca' )
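
To put a number on the repetition in the query lists above, a small counter like the following may help. It is a sketch using only the standard library, and summarize_queries is a hypothetical name. In a Django shell with DEBUG = True, the SQL strings can be taken from the 'sql' key of each entry in django.db.connection.queries.

```python
from collections import Counter


def summarize_queries(sql_statements):
    """Count how many of the given SQL statements are exact repeats."""
    counts = Counter(s.strip() for s in sql_statements)
    total = sum(counts.values())
    distinct = len(counts)
    return {
        "total": total,
        "distinct": distinct,
        "redundant": total - distinct,
        "most_common": counts.most_common(3),
    }


# Hypothetical usage after rendering the admin page in a test client:
#   from django.db import connection
#   print(summarize_queries(q['sql'] for q in connection.queries))
```

Applied to the nine statements listed under Step 6, only four are distinct (one source SELECT, two thumbnail SELECTs, one UPDATE), so five of the nine are repeats. Under Step 7, two of the three are distinct.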
@camilonova

I'm having the same issue. Trying @mlewis's suggestion for the exists method increases the number of requests a lot.

I also tried @epicserve's patch, but I don't see much difference. Is there an explicit way to see the improvement?

Looking at this, it seems very reasonable to use the cache and not the database. @SmileyChris, any reason for not doing it?

@rofrankel

+1, and FWIW I get this with django-cumulus rather than S3Boto, so it's not specific to the S3 library.

@tsurantino

Any chance this can be re-reviewed? I really like working with easy_thumbnails but hesitate based on my remote performance and am considering switching to sorl with redis instead.

@SmileyChris
Owner

@tsurantino Some changes have already been made to master based on epicserve's fixes. Can you see if that's working faster for you?

@SmileyChris
Owner

I'm going to close this, as some optimisation changes from epicserve have now been merged into the latest release. The underlying issue of using a faster cache for remote requests is acknowledged, however, and if someone wants to have a crack at adding that, be my guest :)
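
For anyone picking this up, one possible shape for a "faster cache for remote requests" is a mixin that remembers which names were already found, so repeated exists() checks skip the remote call. This is a rough sketch, not something in easy-thumbnails or django-storages: CachedExistsMixin is a hypothetical name, the memory is per-process only, and it will not notice files deleted from the bucket afterwards.

```python
class CachedExistsMixin(object):
    """Remember names that were found, so repeat checks skip the remote call."""

    def exists(self, name):
        # Lazily create the per-instance set of names known to exist.
        known = self.__dict__.setdefault("_known_names", set())
        if name in known:
            return True
        found = super(CachedExistsMixin, self).exists(name)
        if found:
            known.add(name)
        return found


# Hypothetical usage with the storage class discussed in this thread:
#   class CachedS3BotoStorage(CachedExistsMixin, S3BotoStorage):
#       pass
```

Only positive results are remembered, so a thumbnail that is generated later is still picked up on the next check.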

@Gwildor Gwildor referenced this issue in antonagestam/collectfast Apr 1, 2014
Closed

Set preload_metadata on the fly #30
