CompilerFilter fails with UnicodeDecodeError #311

Closed
anttihirvonen opened this Issue Sep 11, 2012 · 4 comments

anttihirvonen commented Sep 11, 2012

I'm having some problems when trying to compress CSS/JS files with YUICompressor. The files are generated by LESS/CoffeeScript compilers, and the following exception happens with both file types. Everything goes smoothly in development, where I'm using only CSSAbsoluteFilter, but when trying to compress the files with the aforementioned compressor, the following happens (inside CompilerFilter):

> $VENV/lib/python2.7/site-packages/compressor/filters/base.py(120)input()
-> filtered, err = proc.communicate(self.content.encode('utf8'))
(Pdb) type(self.content)
<type 'str'>
(Pdb) s
UnicodeDecodeError: UnicodeD...ge(128)')
> $VENV/lib/python2.7/site-packages/compressor/filters/base.py(120)input()
-> filtered, err = proc.communicate(self.content.encode('utf8'))
(Pdb) 

Resulting error:
'ascii' codec can't decode byte 0xc3 in position 8005: ordinal not in range(128)

As you can see, the content that was passed into CompilerFilter is a bytestring, not a unicode string. When a bytestring is encoded, Python first tries to convert it to Unicode with the ascii codec, and this fails when there are non-ASCII characters inside the string.

I guess this issue has something to do with precompilers, since ordinary JS/CSS files don't seem to produce this error (I tried with different characters). Maybe at line https://github.com/jezdez/django_compressor/blob/develop/compressor/base.py#L167 the resulting output has to be converted to unicode before passing it to filters?
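To illustrate the mechanism described above, here is a minimal sketch, not the actual django_compressor code. In Python 2, calling `.encode('utf8')` on a bytestring implicitly decodes it with the ascii codec first; the sketch spells that step out so it behaves the same on Python 2 and 3. The names `implicit_encode` and `ensure_text` are hypothetical helpers, and assuming UTF-8 for the precompiler output is only a guess:

```python
# -*- coding: utf-8 -*-

# A UTF-8 bytestring containing a non-ASCII character (note the 0xc3 byte).
raw = u"caf\u00e9".encode("utf-8")


def implicit_encode(bytestring):
    """Mimic what Python 2 does for str.encode('utf8'):
    decode with the ascii codec first, then encode."""
    return bytestring.decode("ascii").encode("utf-8")


try:
    implicit_encode(raw)
    failed = False
    message = ""
except UnicodeDecodeError as exc:
    failed = True
    message = str(exc)


def ensure_text(content, encoding="utf-8"):
    """Hypothetical helper: decode bytestrings to unicode before they are
    handed to a filter. The default encoding is an assumption."""
    if isinstance(content, bytes):
        return content.decode(encoding)
    return content


# Once the content is unicode, encoding it back to UTF-8 is safe.
safe = ensure_text(raw).encode("utf-8")
```

With content normalised this way before it reaches the filter, the `self.content.encode('utf8')` call from the traceback would operate on a unicode string and skip the implicit ascii step.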

This issue might also be related to #110.

I'm running into this same issue when trying to use this SCSS snippet:

https://github.com/simmo/symbolset

Any reason this hasn't been pulled yet?

gldnspud commented Apr 2, 2013

+1 for merging jezdez#312

In the meantime, my team will need to maintain a separate repo with that patch applied.

@jezdez jezdez closed this Apr 3, 2013

gldnspud commented Apr 3, 2013

I just discovered an issue with this patch, at least in my case, where if COMPRESS_ENABLED=False, any files generated by the precompile phase come out encoded in what appears to be UTF-32:

In [1]: with open('jquery.duplicate-remove.0972d31ed96d.js', 'rb') as f:
   ...:     bytes = f.read()

In [2]: bytes[:64]
Out[2]: '/\x00\x00\x00*\x00\x00\x00!\x00\x00\x00\n\x00\x00\x00*\x00\x00\x00 \x00\x00\x00D\x00\x00\x00u\x00\x00\x00p\x00\x00\x00l\x00\x00\x00i\x00\x00\x00c\x00\x00\x00a\x00\x00\x00t\x00\x00\x00e\x00\x00\x00 \x00\x00\x00'

In [3]: bytes.decode('utf32')[:64]
Out[3]: u'/*!\n* Duplicate / Remove: a jQuery Plugin\n* @author: Trevor Morr'

This causes the files to be interpreted incorrectly by both Chrome and Firefox. Here's an example from Chrome's console:

Uncaught SyntaxError: Invalid regular expression: missing /
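For anyone trying to reproduce this, the byte pattern in the session above (one ASCII character followed by three NUL bytes) is what little-endian UTF-32 without a BOM looks like. A rough sketch for spotting such output follows; `looks_like_utf32_le` is a hypothetical heuristic written for this comment, not part of django_compressor, and it only recognises ASCII-range text:

```python
# The first bytes from the session above: "/*!" as little-endian UTF-32.
sample = b"/\x00\x00\x00*\x00\x00\x00!\x00\x00\x00"


def looks_like_utf32_le(data, probe=16):
    """Hypothetical heuristic: True if each of the first `probe` 4-byte
    groups is one ASCII byte followed by three NUL bytes."""
    head = bytearray(data[:probe * 4])
    if len(head) < 4 or len(head) % 4:
        return False
    return all(
        head[i] < 0x80 and head[i + 1] == head[i + 2] == head[i + 3] == 0
        for i in range(0, len(head), 4)
    )


is_utf32 = looks_like_utf32_le(sample)

# Decoding with the explicit little-endian codec recovers the text,
# which can then be re-encoded as UTF-8 for the browser.
text = sample.decode("utf-32-le")
as_utf8 = text.encode("utf-8")
```

This only helps detect the symptom; it does not explain why the precompiled output ends up in that encoding.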

Without #312, I don't get this behavior, but then there is a UnicodeDecodeError during compression.

I'll try to isolate this into a failing test within django_compressor.

@jezdez #383 is a fix that is working for us so far, but I need some guidance to get it to pass in Travis, since I can't replicate the problem.
