Error "'ascii' codec can't encode character '\xe9' in position 5: ordinal not in range(128)" thrown on file with unicode data #36

Closed
gvenzl opened this issue Mar 20, 2020 · 3 comments

@gvenzl
Collaborator

gvenzl commented Mar 20, 2020

When a file includes Unicode (or probably any non-ASCII) data, csv2db throws:

Loading file allCountries.1000.txt.gz
Error while loading file: allCountries.1000.txt.gz
'ascii' codec can't encode character '\xe9' in position 5: ordinal not in range(128)
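
The message comes from Python's ASCII codec rather than from csv2db itself. A minimal sketch that reproduces the same text without any database, assuming a sample string with `é` at index 5 (the helper name `ascii_encode_error` and the sample value are illustrative, not taken from csv2db or the data file):

```python
def ascii_encode_error(text):
    """Return the message raised when `text` cannot be encoded as ASCII, else None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


# "Beauté" has the character é (U+00E9) at index 5.
message = ascii_encode_error("Beauté")
print(message)
```

Any layer that encodes strings as ASCII on the way to the database would likely fail in the same way on such a value.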
@gvenzl gvenzl self-assigned this Mar 20, 2020
@gvenzl gvenzl added the bug Something isn't working label Mar 20, 2020
@gvenzl gvenzl added this to the v1.5.1 milestone Mar 20, 2020
@cofin

cofin commented Mar 20, 2020

FYI - I ran into this issue yesterday. In addition to setting the character encoding when opening the file, I also had to set the Oracle connection encoding like so:

conn = cx_Oracle.connect(user,
                         password,
                         db_name,
                         encoding="UTF-8", nencoding="UTF-8")
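
For the file-open half of that workaround, a rough sketch of reading a plain or gzip-compressed text file with an explicit encoding could look like the following. The helper `read_lines`, the sample file name, and its contents are assumptions made for illustration; this is not csv2db's actual code.

```python
import gzip
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Read a plain or gzip-compressed text file using an explicit encoding."""
    opener = gzip.open if path.endswith(".gz") else open
    # Text mode ("rt") with an explicit encoding avoids relying on the platform default.
    with opener(path, mode="rt", encoding=encoding) as handle:
        return [line.rstrip("\n") for line in handle]


# Round trip with a small hypothetical sample file containing non-ASCII data.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt.gz")
    with gzip.open(sample_path, mode="wt", encoding="utf-8") as handle:
        handle.write("1\tBeauté\n2\tZürich\n")
    rows = read_lines(sample_path)

print(rows)
```

Reading the file correctly is only half of the picture: the decoded strings still have to pass through a connection that can encode them, which is why the connection encoding matters as well.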

@gvenzl
Collaborator Author

gvenzl commented Mar 21, 2020

Thanks @cofin,

Indeed, it's an issue with the database connection encoding, which defaults to ASCII for Oracle while it defaults to UTF-8 for every other major database (I have also opened an ER to get that default changed).

gvenzl added a commit that referenced this issue Mar 22, 2020
All other databases use UTF-8 by default.

Bug fix for bug #36

Signed-off-by: Gerald Venzl <gerald.venzl@gmail.com>
@gvenzl
Collaborator Author

gvenzl commented Mar 22, 2020

Fixed.

@gvenzl gvenzl closed this as completed Mar 22, 2020
@gvenzl gvenzl mentioned this issue Mar 22, 2020
gvenzl added a commit that referenced this issue Mar 22, 2020
* Introduce .gitattributes

Signed-off-by: Gerald Venzl <gerald.venzl@gmail.com>

* Remove tabs in .gitattribute

Signed-off-by: Gerald Venzl <gerald.venzl@gmail.com>

* Add password prompt

Signed-off-by: Gerald Venzl <gerald.venzl@gmail.com>

* Correct path in unit test test file

Signed-off-by: Gerald Venzl <gerald.venzl@gmail.com>

* Introduce interactive password

Signed-off-by: Gerald Venzl <gerald.venzl@gmail.com>

* Move version number forward.

Signed-off-by: Gerald Venzl <gerald.venzl@gmail.com>

* Introduce tests for different databases

Signed-off-by: Gerald Venzl <gerald.venzl@gmail.com>

* Include server authentication and protocol in DB2 connect string.

Otherwise the driver tries to authenticate against a local user instead of the one on the database server.

Signed-off-by: Gerald Venzl <gerald.venzl@gmail.com>

* Add unicode tests.

Signed-off-by: Gerald Venzl <gerald.venzl@gmail.com>

* Rename for new test case

Signed-off-by: Gerald Venzl <gerald.venzl@gmail.com>

* Add load test for Oracle.

Signed-off-by: Gerald Venzl <gerald.venzl@gmail.com>

* Convert data list to tuple at array append.

Sql Server requires a tuple or dictionary for insert.
Rather than converting the line straight to a tuple and losing functionality like list.pop(), the conversion now happens at the very last point where the row values are touched (when appending the row to the input_array).

Bug fix for bug #35

Signed-off-by: Gerald Venzl <gerald.venzl@gmail.com>

* Use UTF-8 connection encoding for Oracle (default: ascii)

All other databases use UTF-8 by default.

Bug fix for bug #36

Signed-off-by: Gerald Venzl <gerald.venzl@gmail.com>

* Enhance error reporting (ER #37)

Signed-off-by: Gerald Venzl <gerald.venzl@gmail.com>

* Separate functional tests from loading tests.

Signed-off-by: Gerald Venzl <gerald.venzl@gmail.com>

* Fix text description prints

Signed-off-by: Gerald Venzl <gerald.venzl@gmail.com>

* Using `psycopg2.extras.execute_batch()` for Postgres (ER #39)

Signed-off-by: Gerald Venzl <gerald.venzl@gmail.com>

* v.1.5.1 release

Signed-off-by: Gerald Venzl <gerald.venzl@gmail.com>

* Fix doc string for executemany

Signed-off-by: Gerald Venzl <gerald.venzl@gmail.com>