
UnicodeDecodeError 'utf8' codec GZIP, S3, Cloudfront #404

Open

nkeilar opened this issue May 21, 2013 · 5 comments


nkeilar commented May 21, 2013

I'm trying to get gzip working with django-storages, using S3 with CloudFront. Everything is working except gzip. I did have it working for the compressor CSS file, but the JS file never seems to compress.

Similar/related to:

Comments of interest

pip freeze

-e git+https://github.com/madteckhead/django_compressor.git@8df09263cfad1f5705b164ff637449aff2d48d1a#egg=django_compressor-dev
django-storages==1.1.8

which is django-compressor 1.3 develop. I also tried django-storages==1.1.5 first.

CommandError: An error occured during rendering /home/user/workspace/noshlyfs/templates/theme_bootstrap/less_base.html: UnicodeDecodeError while processing '/home/user/workspace/noshlyfs/static_root/cache/js/a1576bd8c653.js' with charset utf-8: 'utf8' codec can't decode byte 0x8b in position 1: invalid start byte

Storages

import urlparse

from django.conf import settings
from django.core.files.storage import get_storage_class
from storages.backends.s3boto import S3BotoStorage

MediaS3BotoStorage = lambda: S3MediaStorage(location='media')
StaticS3BotoStorage = lambda: CachedS3StaticStorage(location='static')

def domain(url):
    return urlparse.urlparse(url).hostname

class S3MediaStorage(S3BotoStorage):
    """
    Subclasses :class:`storages.backends.s3boto.S3BotoStorage` and
    sets base location for files to ``/media``.
    """
    def __init__(self, *args, **kwargs):
        kwargs["location"] = "media"
        kwargs['bucket'] = settings.AWS_STORAGE_BUCKET_NAME
        kwargs['custom_domain'] = domain(settings.MEDIA_URL)
        super(S3MediaStorage, self).__init__(*args, **kwargs)

    def isfile(self, name):
        return self.exists(name)

    def isdir(self, name):
        # That's some inefficient implementation...
        # If there are some files having 'name' as their prefix, then
        # the name is considered to be a directory
        if not name: # Empty name is a directory
            return True

        if self.isfile(name):
            return False

        name = self._normalize_name(self._clean_name(name))
        dirlist = self.bucket.list(self._encode_name(name))

        # Check whether the iterator is empty
        for item in dirlist:
            return True
        return False

    def move(self, old_file_name, new_file_name, allow_overwrite=False):

        if self.exists(new_file_name):
            if allow_overwrite:
                self.delete(new_file_name)
            else:
                raise "The destination file '%s' exists and allow_overwrite is False" % new_file_name

        old_key_name = self._encode_name(self._normalize_name(self._clean_name(old_file_name)))
        new_key_name = self._encode_name(self._normalize_name(self._clean_name(new_file_name)))

        k = self.bucket.copy_key(new_key_name, self.bucket.name, old_key_name)

        if not k:
            raise "Couldn't copy '%s' to '%s'" % (old_file_name, new_file_name)

        self.delete(old_file_name)

    def makedirs(self, name):
        pass

    def rmtree(self, name):
        name = self._normalize_name(self._clean_name(name))
        dirlist = self.bucket.list(self._encode_name(name))
        for item in dirlist:
            item.delete()

class S3StaticStorage(S3BotoStorage):
    """
    Subclasses :class:`storages.backends.s3boto.S3BotoStorage` and
    sets base location for files to ``/static``.
    """
    def __init__(self, *args, **kwargs):
        kwargs["location"] = "static"
        super(S3StaticStorage, self).__init__(*args, **kwargs)

class CachedS3BotoStorage(S3BotoStorage):
    """
    S3 storage backend that saves the files locally, too.
    See http://django_compressor.readthedocs.org/en/latest/remote-storages/
    """
    def __init__(self, *args, **kwargs):
        super(CachedS3BotoStorage, self).__init__(*args, **kwargs)
        self.local_storage = get_storage_class(
            "compressor.storage.CompressorFileStorage")()

    def save(self, name, content):
        name = super(CachedS3BotoStorage, self).save(name, content)
        #self.local_storage._save(name, content) << this line
        return name

class CachedS3StaticStorage(CachedS3BotoStorage):
    """
    Mix of the :class:`S3MediaStorage` and :class:`CachedS3BotoStorage`,
    saves files in ``/static`` subdirectory
    """
    def __init__(self, *args, **kwargs):
        kwargs["location"] = "static"
        kwargs['bucket'] = settings.AWS_STORAGE_BUCKET_NAME
        kwargs['custom_domain'] = domain(settings.STATIC_URL)
        super(CachedS3StaticStorage, self).__init__(*args, **kwargs)

settings.py

LOCAL_DEV = True
DEBUG = False
TEMPLATE_DEBUG = DEBUG
SERVE_MEDIA = DEBUG
THUMBNAIL_DEBUG = DEBUG
USE_LOCAL_MEDIA = False

SOUTH_TESTS_MIGRATE = False # To disable migrations and use syncdb instead
SKIP_SOUTH_TESTS = True # To disable South's own unit tests

COMPRESS_ENABLED = True

...

MEDIA_ROOT = os.path.join(PROJECT_ROOT, "media_root")
STATIC_ROOT = os.path.join(PROJECT_ROOT, "static_root")

if USE_LOCAL_MEDIA:
    MEDIA_URL = "/media/"
    STATIC_URL = "/static/"

    COMPRESS_STORAGE = 'compressor.storage.GzipCompressorFileStorage'
    ADMIN_MEDIA_PREFIX = STATIC_URL + "grappelli/"

else:
    AWS_IS_GZIPPED = True
    DEFAULT_FILE_STORAGE = 'apps.huntedhive_contrib.utils.s3utils.MediaS3BotoStorage'
    STATICFILES_STORAGE = 'apps.huntedhive_contrib.utils.s3utils.StaticS3BotoStorage'
    AWS_ACCESS_KEY_ID = "XXXX"
    AWS_SECRET_ACCESS_KEY = "XXXXXXXX"
    AWS_STORAGE_BUCKET_NAME = "XXXX_development"
    AWS_BUCKET_NAME = "XXXX_development"
    AWS_S3_CUSTOM_DOMAIN = "XXXX.cloudfront.net" 
    #AWS_S3_SECURE_URLS = True  # must be set to False if using an alias on CloudFront
    STATIC_S3_PATH = "static"
    MEDIA_S3_PATH = "media"

    CLOUDFRONT_DOMAIN = AWS_S3_CUSTOM_DOMAIN
    S3_URL = 'http://%s.s3.amazonaws.com/' % AWS_STORAGE_BUCKET_NAME
    STATIC_URL = '//%s/%s/' % (CLOUDFRONT_DOMAIN, STATIC_S3_PATH)
    MEDIA_URL = '//%s/%s/' % (CLOUDFRONT_DOMAIN, MEDIA_S3_PATH)

    COMPRESS_ROOT = STATIC_ROOT
    COMPRESS_URL = '//%s/%s/' % (CLOUDFRONT_DOMAIN, STATIC_S3_PATH)
    COMPRESS_STORAGE = STATICFILES_STORAGE

    THUMBNAIL_BACKEND = 'apps.huntedhive_contrib.utils.thumbnail_backend.MyThumbnailBackend'
    THUMBNAIL_STORAGE = DEFAULT_FILE_STORAGE
    ADMIN_MEDIA_PREFIX = STATIC_URL + "grappelli/"

running
python manage.py compress --force

Whether I have 'this line' commented or uncommented, I get:

CommandError: An error occured during rendering /home/madteckhead/workspace/noshlyfs/templates/theme_bootstrap/less_base.html: UnicodeDecodeError while processing '/home/user/workspace/noshlyfs/static_root/cache/js/a1576bd8c653.js' with charset utf-8: 'utf8' codec can't decode byte 0x8b in position 1: invalid start byte
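
For what it's worth, byte 0x8b at position 1 is exactly what you would see at the start of a gzip stream (the gzip magic number is 0x1f 0x8b), which suggests the cached file on disk is already gzip-compressed when compressor tries to read it back as UTF-8. A quick check along these lines (a sketch, using the path from the traceback above) confirms it:

path = '/home/user/workspace/noshlyfs/static_root/cache/js/a1576bd8c653.js'
with open(path, 'rb') as f:
    print(f.read(2) == b'\x1f\x8b')  # True means the cached file is gzipped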

nkeilar commented May 21, 2013

Should gzipped files be uploaded with a .gz extension? I ask because with the setup above they aren't.

It seems that with this type of backend:

class CachedS3BotoStorage(S3BotoStorage):
    """
    S3 storage backend that saves the files locally, too.
    See http://django_compressor.readthedocs.org/en/latest/remote-storages/
    """
    def __init__(self, *args, **kwargs):
        super(CachedS3BotoStorage, self).__init__(*args, **kwargs)
        self.local_storage = get_storage_class(
            "compressor.storage.CompressorFileStorage")()

    def save(self, name, content):
        name = super(CachedS3BotoStorage, self).save(name, content)
        self.local_storage._save(name, content)
        return name

the compressed files are saved without the .gz extension. For some reason this then causes:

CommandError: An error occured during rendering /home/user/workspace/noshlyfs/templates/theme_bootstrap/less_base.html: UnicodeDecodeError while processing '/home/user/workspace/noshlyfs/static_root/cache/js/a1576bd8c653.js' with charset utf-8: 'utf8' codec can't decode byte 0x8b in position 1: invalid start byte


nkeilar commented May 22, 2013

Seems this is only happening with the python manage.py compress --force command. If the gzip files are generated dynamically during a request and sent to S3, using the approach from above modified to use #100 (comment), it works.

Not sure what is up with the command; while I've got things working, I believe it is still a bug, so I'll leave this issue open.

@vinaytota

I spent some time getting this to work; what it took was this modification to the CachedS3BotoStorage class. Note the difference in the save method.

class CachedS3BotoStorage(S3BotoStorage):
    """
    S3 storage backend that saves the files locally, too.
    """
    def __init__(self, *args, **kwargs):
        super(CachedS3BotoStorage, self).__init__(*args, **kwargs)
        self.local_storage = get_storage_class(
            "compressor.storage.CompressorFileStorage")()

    def save(self, name, content):
        # Keep a reference to the original (not yet gzipped) file object,
        # because S3BotoStorage._save replaces content.file with gzipped
        # data as a side effect when AWS_IS_GZIPPED = True.
        non_gzipped_file_content = content.file
        name = super(CachedS3BotoStorage, self).save(name, content)
        # Restore the un-gzipped file before saving the local copy, so
        # compressor can read it back from disk as plain text.
        content.file = non_gzipped_file_content
        self.local_storage._save(name, content)
        return name

The problem is that the _save method of S3BotoStorage modifies content.file as a side effect of its invocation when AWS_IS_GZIPPED = True: it replaces the file with a gzipped version before uploading. When the file is then saved locally after being saved to S3, the local copy is gzipped too, and when compressor tries to read it back from local disk it doesn't know what to do with it. In general this sort of side-effect change is bad; having _save in S3BotoStorage not modify 'content' would be ideal, but at least there's a workaround. I will submit a pull request to update the docs to use this version of CachedS3BotoStorage.
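
To make the side effect concrete, here is a tiny standalone sketch (no Django or boto involved; the names are made up for illustration) of a save routine that gzips its argument's file in place, the same shape as the behaviour described above:

import gzip
from io import BytesIO

def save_to_s3(content):
    # Stand-in for S3BotoStorage._save with AWS_IS_GZIPPED = True:
    # gzip the payload and, crucially, swap the caller's file in place.
    buf = BytesIO()
    gz = gzip.GzipFile(mode='wb', fileobj=buf)
    gz.write(content.file.read())
    gz.close()
    buf.seek(0)
    content.file = buf  # the side effect leaks back to the caller

class Content(object):
    def __init__(self, data):
        self.file = BytesIO(data)

c = Content(b'var x = 1;')
save_to_s3(c)
print(c.file.read(2))  # '\x1f\x8b': the caller is now holding gzip bytes

Any later consumer of c.file, such as the local save, sees gzip bytes rather than the original JS.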

alanjds added a commit to alanjds/django-cached-s3-storage that referenced this issue on Mar 4, 2014:
"This fixes the problem when compressing after pushing to S3. Implemented based on this comment on the issue: django-compressor/django-compressor#404 (comment)"

alanjds commented Mar 5, 2014

@vinaytota I spent some time trying your fix, with no luck here. I ended up doing a hack to uncompress gzip-encoded files, if needed, before consumption. See here: ulyssesv/django-cached-s3-storage#2

Kinda spartan, but it finally worked, and it's very clear. No more "UnicodeDecodeError 'utf8' codec" errors on my side. Is there something nasty that I haven't noticed in this approach?
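
For readers who don't want to click through, the gist of that hack, as a rough sketch (the mixin name and details here are illustrative, not the actual code in that PR): check for the gzip magic bytes on open and decompress before handing the file back.

import gzip
from django.core.files.base import ContentFile

class GunzipOnOpenMixin(object):
    """Sketch: transparently gunzip stored files that start with the
    gzip magic bytes, so consumers always see plain content."""
    def _open(self, name, mode='rb'):
        f = super(GunzipOnOpenMixin, self)._open(name, mode)
        head = f.read(2)
        f.seek(0)
        if head == b'\x1f\x8b':  # gzip magic number
            data = gzip.GzipFile(fileobj=f).read()
            return ContentFile(data, name=name)
        return f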

ntucker added a commit to ntucker/django-cached-s3-storage that referenced this issue on Jul 28, 2014:
"This fixes the problem when compressing after pushing to S3. Implemented based on this comment on the issue: django-compressor/django-compressor#404 (comment)"

mrcoles commented Apr 23, 2015

@vinaytota why not just do the local_storage call first? Are there other side-effects I might be missing?

    def save(self, name, content):
        self.local_storage._save(name, content)
        return super(CachedS3BotoStorage, self).save(name, content)

EDIT: Looks like my proposed simplification leads to an empty file getting stored on S3 (presumably because the local save leaves the file pointer at EOF and the S3 upload then reads from the current position), so stick with @vinaytota’s approach:

    def save(self, name, content):
        non_gzipped_file_content = content.file
        name = super(CachedS3BotoStorage, self).save(name, content)
        content.file = non_gzipped_file_content
        self.local_storage._save(name, content)
        return name
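
An untested alternative, assuming the empty upload really does come from the file pointer being left at EOF by the local save, would be to keep the local-save-first ordering and just rewind before handing the file to S3:

    def save(self, name, content):
        # Sketch only: save locally first, then rewind so the S3 upload
        # reads from the start of the file instead of from EOF.
        self.local_storage._save(name, content)
        content.file.seek(0)
        return super(CachedS3BotoStorage, self).save(name, content)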
