How do I set encoding to utf8? #47

Proteusiq · 2017-12-04T13:05:07Z

I have a DataFrame with columns holding non-ASCII characters. I would like to change the session encoding, similar to udaExec.connect(....,charset='utf8') in teradata and cnxn.setencoding(encoding='utf-8') in pyodbc.

I am able to load the data, but I get garbled characters with:

with giraffez.BulkLoad("BDA.Table", print_error_table=True) as load:
    load.columns = df.columns.tolist()
    for row in df.values.tolist():
        load.put(row)

How do I set the encoding to UTF-8? Is it done with giraffez.encoders? Is there a how-to code example?


ktravis · 2018-01-12T13:10:32Z

The session encoding used is already UTF8, which giraffez requires. Have you encountered an issue with loading UTF8 encoded data?

Proteusiq · 2018-01-13T18:28:28Z

Yes. I get weird characters, which I could reproduce with teradata and pyodbc when the encoding is ASCII. It seems that the default is ASCII and not UTF-8.

ktravis · 2018-01-13T19:20:49Z

I would be happy to help diagnose further if you can provide the steps to reproduce the issue you're seeing. Please make sure to include all relevant configuration information, such as python version, giraffez version, Teradata client library and server version. Thank you.
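The Python-side part of the requested configuration details can be gathered with a short script. This is only a sketch: the function name `collect_environment` is hypothetical, `importlib.metadata` is assumed to be available (Python 3.8+), and the Teradata client library and server versions cannot be read this way and still have to be reported by hand.

```python
import platform
import sys

try:
    from importlib import metadata  # standard library from Python 3.8
except ImportError:
    metadata = None


def collect_environment(packages=("giraffez", "pandas")):
    """Return Python, platform, and installed package versions as a dict."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        if metadata is None:
            info[name] = "unknown (importlib.metadata unavailable)"
            continue
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print("{}: {}".format(key, value))
```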

ktravis · 2018-01-29T22:44:20Z

Closing without further details. I would be happy to reopen and address this if you can provide an example.

ausiddiqui · 2019-01-03T22:47:50Z

Wanted to reopen this as I'm having the same issue.

giraffez 2.0.24
macOS 10.14.2 / Ubuntu 16
python 3.5 (on macOS) / python 3.6 (on Ubuntu box)
Teradata ODBC / TTU 16.10
Teradata Server 15.10
SQL client: dbeaver / Aqua Data Studio (tried both)
JDBC for SQL client: TeraJDBC_16.20.00.02

import giraffez as g
import pandas as pd

with g.Cmd() as cmd:
    cmd.execute("""
CREATE SET TABLE mydb.mytable ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      location VARCHAR(250) CHARACTER SET UNICODE NOT CASESPECIFIC
     )
PRIMARY INDEX ( location );""")

df = pd.DataFrame({'location': ['Düsseldorf','Marché Saint-Germain','İstanbul']})
df.to_csv('location.csv', index=False)

with g.BulkLoad(table='mydb.mytable') as load:
    load.from_file('location.csv', table='mydb.mytable', delimiter=",", null='')

select *
from mydb.mytable

location             
---------------------
DÃ¼sseldorf          
MarchÃ© Saint-Germain
Ä°stanbul

I get the same result if I convert the DataFrame to a list (df.values.tolist()) and iterate over the rows, as the OP did. The character set of my connection to TD when I run the select * is UTF-8, and I'm using a GUI client. I am able to see these specific Unicode characters, as well as non-Latin characters, in other tables on the same database whose columns are declared as CHARACTER SET UNICODE in the CREATE TABLE statement. So I believe this is isolated and not related to either my SQL client or the TD server itself.
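The garbled values in the query output are consistent with UTF-8 bytes being decoded as a single-byte character set such as Latin-1. A minimal sketch using only the Python standard library reproduces the same strings. It does not show where in the load path the mis-decoding happens, and the helper names `misdecode` and `repair` are made up for illustration.

```python
originals = ["Düsseldorf", "Marché Saint-Germain", "İstanbul"]

# Values as they appear in the query output, written with escapes so the
# exact code points are unambiguous:
#   \u00c3\u00bc -> "Ã¼", \u00c3\u00a9 -> "Ã©", \u00c4\u00b0 -> "Ä°"
observed = [
    "D\u00c3\u00bcsseldorf",
    "March\u00c3\u00a9 Saint-Germain",
    "\u00c4\u00b0stanbul",
]


def misdecode(text, wrong_charset="latin-1"):
    """Encode text as UTF-8, then decode those bytes with the wrong charset."""
    return text.encode("utf-8").decode(wrong_charset)


def repair(garbled, wrong_charset="latin-1"):
    """Reverse misdecode(): recover the original text from the garbled form."""
    return garbled.encode(wrong_charset).decode("utf-8")


reproduced = [misdecode(s) for s in originals]
print("matches observed output:", reproduced == observed)
print("round trip restores originals:", [repair(s) for s in reproduced] == originals)
```

If both checks print True, the stored data is most likely double-encoded rather than lost, so the original text should still be recoverable from what is in the table.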

ktravis closed this as completed Jan 29, 2018

ausiddiqui mentioned this issue Jan 5, 2019

UTF8 / Unicode encoding for csv/dataframe load not storing correctly in DB #68

Closed
