Error "'ascii' codec can't encode character '\xe9' in position 5: ordinal not in range(128)" thrown on file with unicode data #36

gvenzl · 2020-03-20T03:39:47Z

When a file includes Unicode (or, more precisely, non-ASCII) data, csv2db throws:

Loading file allCountries.1000.txt.gz
Error while loading file: allCountries.1000.txt.gz
'ascii' codec can't encode character '\xe9' in position 5: ordinal not in range(128)
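For readers who want to see where a message of this shape comes from, here is a minimal sketch in plain Python. The sample value `"Montréal"` is hypothetical (the actual row in allCountries.1000.txt.gz is not shown in this report); it was picked only because its `é` sits at index 5. The helper `try_ascii` is likewise an illustration and not part of csv2db.

```python
# Hypothetical sample value; the real data from the reported file is not shown in the issue.
sample = "Montréal"


def try_ascii(text):
    """Return the error message if text cannot be encoded as ASCII, otherwise None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as err:
        return str(err)
    return None


# 'é' is U+00E9, which lies outside the 7-bit ASCII range (0 to 127),
# so encoding with the 'ascii' codec fails at that character.
message = try_ascii(sample)
print(message)

# The same string encodes without error as UTF-8.
print(sample.encode("utf-8"))
```

Any layer that encodes text as ASCII on the way to the database would fail in the same way on such a value.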


cofin · 2020-03-20T15:42:05Z

FYI - I ran into this issue yesterday. In addition to setting the character encoding on the file open, I also had to set the Oracle connection encoding like so:

conn = cx_Oracle.connect(user,
                         password,
                         db_name,
                         encoding="UTF-8", nencoding="UTF-8")
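The file-open half of that workaround can be sketched as follows. This is only an illustration under stated assumptions, not csv2db's actual loader: the helper name `read_lines_utf8`, the file name `sample.txt.gz`, and the sample rows are all hypothetical. It uses the standard library's `gzip.open` in text mode with an explicit `encoding`, so decoding does not depend on the platform's default locale.

```python
import gzip
import os
import tempfile


def read_lines_utf8(path):
    """Read a gzip-compressed text file, decoding it explicitly as UTF-8."""
    # Mode "rt" opens the archive as text; encoding is passed explicitly
    # rather than relying on the locale default.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


# Build a small throwaway file with non-ASCII data (hypothetical rows).
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt.gz")
    with gzip.open(sample_path, mode="wt", encoding="utf-8") as out:
        out.write("1\tMontréal\n2\tZürich\n")
    rows = read_lines_utf8(sample_path)

print(rows)
```

The connection half needs a reachable Oracle database, so it is not exercised here.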

gvenzl · 2020-03-21T21:26:55Z

Thanks @cofin,

Indeed, it's an issue with the database connection encoding, which defaults to ASCII on Oracle while it defaults to UTF-8 on every other major database (I opened an ER to change that as well).


gvenzl · 2020-03-22T00:48:37Z

Fixed.

* Introduce .gitattributes
* Remove tabs in .gitattribute
* Add password prompt
* Correct path in unit test test file
* Introduce interactive password
* Move version number forward.
* Introduce tests for different databases
* Include server authentication and protocol in DB2 connect string. Otherwise the driver tries to authenticate against a local user instead of the one on the database server.
* Add unicode tests.
* Rename for new test case
* Add load test for Oracle.
* Convert data list to tuple at array append. Sql Server requires a tuple or dictionary for insert. Rather than converting the line straight to a tuple and losing functionality like list.pop(), conversion is now happening at the very last time that the row values are touched (when appending it to the input_array). Bug fix for bug #35
* Use UTF-8 connection encoding for Oracle (default: ascii). All other databases use UTF-8 by default. Bug fix for bug #36
* Enhance error reporting (ER #37)
* Separate functional tests from loading tests.
* Fix text description prints
* Using `psycopg2.extras.execute_batch()` for Postgres (ER #39)
* v.1.5.1 release
* Fix doc string for executemany

Signed-off-by: Gerald Venzl <gerald.venzl@gmail.com>

gvenzl self-assigned this Mar 20, 2020

gvenzl added the bug label Mar 20, 2020

gvenzl added this to the v1.5.1 milestone Mar 20, 2020

gvenzl added a commit that referenced this issue Mar 22, 2020

Use UTF-8 connection encoding for Oracle (default: ascii)

6c6ca68

All other databases use UTF-8 by default. Bug fix for bug #36 Signed-off-by: Gerald Venzl <gerald.venzl@gmail.com>

gvenzl closed this as completed Mar 22, 2020

gvenzl mentioned this issue Mar 22, 2020

v1.5.1 #40

Merged
