question:UnicodeDecodeError at /admin/blog/author/import/ #72

Closed
hebijiandai opened this issue Mar 6, 2014 · 31 comments

Comments

@hebijiandai

Sir, thanks a lot for developing this great repository.
I have a small problem to solve, described below.
The code:

#in models.py
class Author(models.Model):
    author = models.CharField('作者', max_length=50)
    title = models.CharField('标题', max_length=150)
    qualification = models.ForeignKey(Qualification)
    mark = models.ManyToManyField(Mark)
    blog = models.TextField('博客内容')
    time = models.DateField('写作日期')

    def __unicode__(self):
        return unicode(self.author)

    class Meta:
        ordering = ['time']
#in admin.py
class AuthorAdmin(ImportExportModelAdmin, admin.ModelAdmin):
    search_fields = ('author', 'title', 'mark', 'blog')
    list_display = ('author', 'title', 'time')

When I export the data,
[screenshot 1]
the CSV file appears garbled.
[screenshot 2]
When I modify the author in the second row and import the file,

[screenshot 3]
it causes the error.
[screenshot 4]

How can I modify the file and import it smoothly? Thanks.
Allenlee

@bmihelac
Member

bmihelac commented Mar 6, 2014

What tablib version do you use? 0.9.11 is very old, and many fixes have been added to the master branch since it was released.

@hebijiandai
Author

Sir, I installed it from pip and the latest version is 0.9.11. I also downloaded the zip file from GitHub, and that is 0.9.11 too, so the problem still exists. Is there any other tablib version? Thanks for your concern.

@bmihelac
Member

bmihelac commented Mar 6, 2014

You can install the development version with:

pip install -e git+https://github.com/kennethreitz/tablib.git#egg=tablib

@bmihelac
Member

bmihelac commented Mar 6, 2014

The development version of tablib also reports itself as 0.9.11.
Please paste the whole traceback.
If you import the exported file without changing anything, does it work?
Note that if it is a tablib issue, I cannot help.

@hebijiandai
Author

Sir, the exported file is also not correct. Here is the traceback from the import process:

Environment:


Request Method: POST
Request URL: http://127.0.0.1:8000/admin/blog/mark/import/

Django Version: 1.6.2
Python Version: 2.7.6
Installed Applications:
('blog',
 'django.contrib.auth',
 'django.contrib.contenttypes',
 'django.contrib.sessions',
 'django.contrib.messages',
 'django.contrib.staticfiles',
 'suit',
 'django.contrib.admin',
 'south',
 'django_select2',
 'suit_redactor',
 'suit_ckeditor',
 'import_export',
 'reversion')
Installed Middleware:
('django.contrib.sessions.middleware.SessionMiddleware',
 'django.middleware.common.CommonMiddleware',
 'django.middleware.csrf.CsrfViewMiddleware',
 'django.contrib.auth.middleware.AuthenticationMiddleware',
 'django.contrib.messages.middleware.MessageMiddleware',
 'django.middleware.clickjacking.XFrameOptionsMiddleware')


Traceback:
File "C:\Python27\lib\site-packages\django\core\handlers\base.py" in get_response
  114.                     response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "C:\Python27\lib\site-packages\django\utils\decorators.py" in _wrapped_view
  99.                     response = view_func(request, *args, **kwargs)
File "C:\Python27\lib\site-packages\django\views\decorators\cache.py" in _wrapped_view_func
  52.         response = view_func(request, *args, **kwargs)
File "C:\Python27\lib\site-packages\django\contrib\admin\sites.py" in inner
  198.             return view(request, *args, **kwargs)
File "C:\Python27\lib\site-packages\django_import_export-0.2.2.dev0-py2.7.egg\import_export\admin.py" in import_action
  181.                 dataset = input_format.create_dataset(data)
File "C:\Python27\lib\site-packages\django_import_export-0.2.2.dev0-py2.7.egg\import_export\formats\base_formats.py" in create_dataset
  79.         self.get_format().import_set(data, in_stream)
File "build\bdist.win32\egg\tablib\formats\_csv.py" in import_set
  41.     for i, row in enumerate(rows):
File "build\bdist.win32\egg\tablib\packages\unicodecsv\__init__.py" in next
  55.         return [unicode(s, self.encoding) for s in row]

Exception Type: UnicodeDecodeError at /admin/blog/mark/import/
Exception Value: 'utf8' codec can't decode byte 0xb5 in position 0: invalid start byte

@bmihelac
Member

bmihelac commented Mar 7, 2014

The exception shows a path starting with "C:\Python27\lib\site-packages", while the installed tablib shows the path "/Users/allenlee/PycharmProjects/".

Can you try to export and import using tablib alone (without django-import-export) to see where the error is coming from?
http://docs.python-tablib.org/en/latest/

@hebijiandai
Author

Hello Sir, I sometimes use Windows to modify the code, so the paths are not consistent.

I tested tablib:

#coding:utf-8
import tablib

data = tablib.Dataset(headers=['name', 'address', 'sex'])

data.append(['Lucy', 'ACT Australian Capital Territory', 'female'])
data.append(['Allen', ' NSW New South Wales', 'male'])
data.append(['何必见戴', ' SA  South Australia', 'male'])

print data.csv

and the output is ok

name,address,sex
Lucy,ACT Australian Capital Territory,female
Allen, NSW New South Wales,male
何必见戴, SA  South Australia,male

I think it may be a Unicode error. When I use only English letters and digits, import and export work fine, but when I use Chinese characters it causes the error.

@bmihelac
Member

bmihelac commented Mar 7, 2014

Can you append data directly from the model (the data that was corrupted) and
save the file to disk?

regards

@hebijiandai
Author

Hello Sir, I tried it in the Django console. The export works, and the Chinese characters appear correctly.

import tablib
from blog.models import Author

data = tablib.Dataset()
data.headers = ['id', 'author', 'title', 'qualification', 'mark', 'blog', 'time']
Author_list = Author.objects.all()
for Author_unit in Author_list:
    data.append([Author_unit.id, Author_unit.author, Author_unit.title,
                 Author_unit.qualification, Author_unit.mark, Author_unit.blog, Author_unit.time])

print data.csv

output is

id,author,title,qualification,mark,blog,time
2,ffff,dddd,Angola,<django.db.models.fields.related.ManyRelatedManager object at 0x03F4BBB0>,<p>dfadsfffdff</p>,2014-03-03
1,Footman,ffff,Nicaragua,<django.db.models.fields.related.ManyRelatedManager object at 0x03F4BF30>,<p>dsfdfffdf</p>,2014-03-07
3,何必见戴,何必见戴blog,Afghanistan,<django.db.models.fields.related.ManyRelatedManager object at 0x03F1DCF0>,<p>fffffffffffffff</p>,2014-03-07

But when I use import-export, the table is:
[screenshot]

@bmihelac
Member

bmihelac commented Mar 7, 2014

Can you write this file to disk and diff it with the file exported from the admin?

regards
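
One way to run the comparison requested above is a byte-for-byte check in Python. This is only a sketch: `files_identical` and `first_difference` are hypothetical helper names that do not come from tablib or django-import-export, and the paths you pass in are your own.

```python
import filecmp


def files_identical(path_a, path_b):
    """Return True if both files contain exactly the same bytes."""
    # shallow=False compares file contents, not only the os.stat() signature.
    return filecmp.cmp(path_a, path_b, shallow=False)


def first_difference(path_a, path_b):
    """Return the offset of the first differing byte, or None if identical."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # No mismatch in the common prefix: one file may simply be longer.
    return None if len(a) == len(b) else min(len(a), len(b))
```

If the two exports differ only in encoding, the first differing offset should land on the first non-ASCII character.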

@hebijiandai
Author

Sir, I added this code:

# Raw string so the backslash in the Windows path is not read as an escape.
with open(r'D:\Author.csv', 'w') as newFile:
    newFile.write(data.csv)

And the result:
[screenshot]
is the same as the CSV file I exported from django-import-export.

@bmihelac
Member

bmihelac commented Mar 7, 2014

I just made a test by adding the string "何必见戴" to a Book instance in the example app that is bundled with django-import-export. Both import and export worked without any errors. Please try reproducing the issue with the import-export example app.

@hebijiandai
Author

OK, thanks a lot for your concern.

@hebijiandai
Author

Sorry... I tried the tests app from your repository.
[screenshot: qq20140307-1]
[screenshot: qq20140307-4]
Is it a problem with my OS? I am now using a Mac to test.
And on import:

UnicodeDecodeError at /admin/core/book/import/
'utf8' codec can't decode byte 0xc4 in position 0: invalid continuation byte
Request Method: POST
Request URL:    http://127.0.0.1:8000/admin/core/book/import/
Django Version: 1.6.2
Exception Type: UnicodeDecodeError
Exception Value:    
'utf8' codec can't decode byte 0xc4 in position 0: invalid continuation byte
Exception Location: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tablib/packages/unicodecsv/__init__.py in next, line 55
Python Executable:  /Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
Python Version: 2.7.6
Python Path:    
['..',
 '/Users/allenlee/Downloads/django-import-export-master/tests',
 '/Library/Frameworks/Python.framework/Versions/2.7/lib/python27.zip',
 '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7',
 '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-darwin',
 '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac',
 '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/plat-mac/lib-scriptpackages',
 '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-tk',
 '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-old',
 '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload',
 '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages',
 '/Library/Python/2.7/site-packages']
Server time:    Fri, 7 Mar 2014 08:43:01 -0600
Environment:


Request Method: POST
Request URL: http://127.0.0.1:8000/admin/core/book/import/

Django Version: 1.6.2
Python Version: 2.7.6
Installed Applications:
[u'django.contrib.admin',
 u'django.contrib.auth',
 u'django.contrib.contenttypes',
 u'django.contrib.sessions',
 u'django.contrib.sites',
 u'import_export',
 u'core']
Installed Middleware:
('django.middleware.common.CommonMiddleware',
 'django.contrib.sessions.middleware.SessionMiddleware',
 'django.middleware.csrf.CsrfViewMiddleware',
 'django.contrib.auth.middleware.AuthenticationMiddleware',
 'django.contrib.messages.middleware.MessageMiddleware')


Traceback:
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/django/core/handlers/base.py" in get_response
  114.                     response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/django/utils/decorators.py" in _wrapped_view
  99.                     response = view_func(request, *args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/django/views/decorators/cache.py" in _wrapped_view_func
  52.         response = view_func(request, *args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/django/contrib/admin/sites.py" in inner
  198.             return view(request, *args, **kwargs)
File "../import_export/admin.py" in import_action
  181.                 dataset = input_format.create_dataset(data)
File "../import_export/formats/base_formats.py" in create_dataset
  79.         self.get_format().import_set(data, in_stream)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tablib/formats/_csv.py" in import_set
  41.     for i, row in enumerate(rows):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tablib/packages/unicodecsv/__init__.py" in next
  55.         return [unicode(s, self.encoding) for s in row]

Exception Type: UnicodeDecodeError at /admin/core/book/import/
Exception Value: 'utf8' codec can't decode byte 0xc4 in position 0: invalid continuation byte

@hebijiandai
Author

Sir, I have thought over this error in Django:

UnicodeDecodeError at /admin/core/book/import/
'utf8' codec can't decode byte 0xc4 in position 0: invalid continuation byte

Then I used vim's set fileencoding and found that the CSV file exported from django-import-export is encoded as UTF-8. I searched the internet and also found that if I import a UTF-8 encoded file, it works.

Maybe the problem occurs because my OS uses the Chinese language. Would you please modify the project so that, if the import or export file is not encoded as UTF-8, it is first converted to UTF-8 before any other processing?

Thanks for your concern.
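
The conversion requested here could be sketched as follows. This is not how django-import-export handles uploads; `to_utf8` and its candidate list are assumptions chosen for illustration. The idea is to try strict UTF-8 first and fall back to a short list of likely legacy encodings.

```python
def to_utf8(raw, candidates=("utf-8", "gb18030", "cp1252")):
    """Return `raw` (bytes) re-encoded as UTF-8.

    Each candidate is tried with strict decoding and the first one that
    succeeds wins. Order matters: a wrong but permissive encoding can
    decode without error and silently yield garbled text.
    """
    for encoding in candidates:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        return text.encode("utf-8")
    raise ValueError("none of the candidate encodings could decode the data")


# Example input: a CSV saved in GBK, as a Chinese-locale editor may do.
gbk_bytes = u"id,author\n3,\u4f55\u5fc5\u89c1\u6234\n".encode("gbk")
utf8_bytes = to_utf8(gbk_bytes)
```

Guessing an encoding is inherently unreliable, so letting the user choose the encoding at upload time may be the safer design.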

@smcoll

smcoll commented Mar 31, 2014

I'm running into the same problem with a CSV. It appears that for Python 2.7, "from_encoding" is not relevant to an import, because CSV.is_binary() returns True during the check in ImportMixin.import_action().

@bmihelac
Member

bmihelac commented Apr 1, 2014

OK, we need a failing test for this.

@smcoll

smcoll commented Apr 11, 2014

This is not a test, but a step towards one: the file I was working with at the time choked on these bytes: 0x96, 0xae, 0x99, 0xa0. In Windows-1252, 0x96 is an en dash (U+2013). I discovered that I was able to read the file using the encoding "Windows-1252", or by passing 'ignore' or 'replace' when decoding as UTF-8, both of which have their own drawbacks.

>>> with open('my.csv', 'rb') as f:
...     csv = f.read()
>>> csv.decode("utf-8")
...
Traceback (most recent call last):
  [...]
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 33423: invalid start byte
>>> csv.decode('ascii')
Traceback (most recent call last):
  [...]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x96 in position 33423: ordinal not in range(128)
>>> csv.decode('Windows-1252')
Out[7]: u'"Title","First Name",[...]'
>>> csv.decode('utf-8', 'ignore')
Out[8]: u'"Title","First Name",[...]'
>>> csv.decode('utf-8', 'replace')
Out[9]: u'"Title","First Name",[...]'```
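
To make the drawbacks of 'ignore' and 'replace' concrete, here is a small sketch on a made-up byte string (not the file from this comment). It compares decoding with the correct codec against the two lossy error handlers.

```python
raw = b"2013 \x96 2014"  # 0x96 is an en dash in Windows-1252

as_cp1252 = raw.decode("cp1252")           # en dash preserved as U+2013
ignored = raw.decode("utf-8", "ignore")    # offending byte silently dropped
replaced = raw.decode("utf-8", "replace")  # offending byte becomes U+FFFD

print(repr(as_cp1252))
print(repr(ignored))
print(repr(replaced))
```

Only the first result keeps the original character; the other two import without an exception but lose data.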

@chris-maestro

reader = csv.DictReader(io.StringIO(unicode(my_file.read()), newline=None))

The issue is reading the file in universal newline mode. The line above is one way to do that which is compatible with unicode.
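
A self-contained Python 3 variant of the same idea is sketched below. The sample text is made up, and it assumes the upload has already been decoded to text. Passing newline=None makes io.StringIO translate "\r" and "\r\n" line endings to "\n" before the csv module sees them.

```python
import csv
import io

# Made-up sample using old Mac-style "\r" line endings.
text = u"name,city\r\u4f55\u5fc5\u89c1\u6234,Sydney\rLucy,Canberra\r"

# newline=None enables universal newline translation inside StringIO.
reader = csv.DictReader(io.StringIO(text, newline=None))
rows = list(reader)

for row in rows:
    print(row["name"], row["city"])
```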

@bmihelac
Member

bmihelac commented Jul 9, 2014

A pull request with a test would be welcome :)

@bmihelac
Member

bmihelac commented Jul 9, 2014

Or at least a failing test.

@traverseda

Not sure if this is relevant. http://pastebin.com/ivXprEhp

Name: tablib
Version: 0.10.0
Location: /home/alex/GIS/src/tablib
Requires: 
---
Name: django-import-export
Version: 0.2.6
Location: /home/alex/GIS/lib/python2.7/site-packages
Requires: tablib, diff-match-patch

Python 2.7.8

Seems to me that this problem should just show up whenever you try to import data? Export works fine for me.

@bmihelac
Member

I would really like to fix this, but I need to be able to reproduce the issue.

Starting the bundled example application with cd tests && ./manage.py runserver and creating a failing import file from it should be enough. Adding a failing test case would be great.

@traverseda

Cannot reproduce in the example app. It is a mystery.

@traverseda

I can confirm that it had something to do with the __unicode__ method on my model, which is really very strange, since it seems to show up all right in the admin interface proper.

Wish I was working on something open source right now, so I could just show you the code.

My __unicode__ method looked like

return("{},{}".format(self.company_name,self.pk))

and now it looks like

return self.company_name

Everything seems to be working. I'm going to guess that this probably wasn't the same error everyone else was having. But hopefully this will help other people.

Be forewarned: sometimes your model's __unicode__ method will break django-import-export in hard-to-diagnose ways.

@bmihelac
Member

@traverseda, shouldn't it be:

return(u"{},{}".format(self.company_name,self.pk))

Note the u prefix.
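
Why the prefix matters, as a hedged sketch: in Python 2, calling format on a byte-string template with a unicode argument implicitly encodes that argument with the ASCII codec. The snippet below makes that hidden step explicit so it behaves the same on Python 2.7 and Python 3; `company_name` is a made-up sample value.

```python
# -*- coding: utf-8 -*-
company_name = u"\u4f55\u5fc5\u89c1\u6234"  # sample non-ASCII value

# Python 2 only: "{},{}".format(company_name, 1) has a byte-string template,
# so the unicode argument is implicitly encoded with the ASCII codec.
# The explicit equivalent of that hidden step fails the same way:
try:
    company_name.encode("ascii")
    implicit_step_failed = False
except UnicodeEncodeError:
    implicit_step_failed = True

# With a unicode template nothing needs encoding, so formatting succeeds.
label = u"{},{}".format(company_name, 1)
```

This would also explain why the model looked fine as long as every company name happened to be plain ASCII.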

@traverseda

That explains it. I am not a clever man. Thanks. Why it wasn't throwing an error earlier (in the admin interface, or my other export script) I don't know.

@bmihelac
Member

Regarding errors and debugging: django-import-export displays the traceback when an error occurs, which lets you insert import pdb; pdb.set_trace() at the failing line and inspect the state.

@pawelnowak

Hi guys,

I've stumbled upon the exact same issue in one of my company's projects. We're using Python 3 and I'm 100% sure there's no u' missing on any model's __str__ method. Yet the UnicodeDecodeError exception still occurs when trying to import a CSV file containing UTF-8 encoded characters in the Django admin.

It turns out that adding the encoding parameter with a value of utf-8 to TempFolderStorage's open method seems to solve the problem, at least in our environment.

Here's a quick'n'dirty fix in case anyone googles this issue trying to find a solution, like I did a few hours ago.

First we subclass the TempFolderStorage class, adding said param:

import tempfile
from import_export.tmp_storages import TempFolderStorage

class Utf8TempFolderStorage(TempFolderStorage):

    def open(self, mode='r'):
        if self.name:
            return open(self.get_full_path(), mode, encoding='utf-8')
        else:
            tmp_file = tempfile.NamedTemporaryFile(delete=False)
            self.name = tmp_file.name
            return tmp_file

Then we set the new class as IMPORT_EXPORT_TMP_STORAGE_CLASS in settings.py:

IMPORT_EXPORT_TMP_STORAGE_CLASS = 'path.to.storages.Utf8TempFolderStorage'
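
As background for why the explicit encoding helps, here is a hedged, standalone sketch that does not touch django-import-export. In the Python 3 versions of that era, open() in text mode without an encoding argument falls back to locale.getpreferredencoding(False), which varies between machines. The temporary file and its contents are made up for the demonstration.

```python
import locale
import os
import tempfile

# Write a small UTF-8 CSV to a temporary file.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(u"id,name\n1,\u4f55\u5fc5\u89c1\u6234\n".encode("utf-8"))

# What open() falls back to when no encoding is passed (platform dependent).
default_encoding = locale.getpreferredencoding(False)

# An explicit encoding gives the same result on every platform.
with open(path, "r", encoding="utf-8") as f:
    content = f.read()

# A mismatched codec raises UnicodeDecodeError, as in the failing import.
try:
    with open(path, "r", encoding="ascii") as f:
        f.read()
    mismatch_failed = False
except UnicodeDecodeError:
    mismatch_failed = True

os.remove(path)
```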

@bmihelac
Member

@pawelnowak this should be a bug introduced in 0.2.9. It is explained here:
#347

Can you please check whether the fix works for you?

@bmihelac
Member

Fixed in 0.3. Please reopen if needed.


6 participants