This repository has been archived by the owner on Nov 19, 2021. It is now read-only.

How do I set encoding to utf8? #47

Closed
Proteusiq opened this issue Dec 4, 2017 · 5 comments

@Proteusiq

Proteusiq commented Dec 4, 2017

I have a DataFrame with columns holding non-ASCII characters. I would like to change the session encoding, similar to udaExec.connect(....,charset='utf8') in teradata and cnxn.setencoding(encoding='utf-8') in pyodbc.

I am able to load the data, but the non-ASCII characters come out garbled with:

with giraffez.BulkLoad("BDA.Table", print_error_table=True) as load:
    load.columns = df.columns.tolist()
    for row in df.values.tolist():
        load.put(row)

How do I set the encoding to UTF-8? Is it done through giraffez.encoders? Is there a how-to code example?

@ktravis
Collaborator

ktravis commented Jan 12, 2018

The session encoding used is already UTF8, which giraffez requires. Have you encountered an issue with loading UTF8 encoded data?

@Proteusiq
Author

Yes. I get the same weird characters that I can reproduce with teradata and pyodbc when the encoding is ASCII. It seems that the default is ASCII and not UTF-8.

@ktravis
Collaborator

ktravis commented Jan 13, 2018

I would be happy to help diagnose further if you can provide the steps to reproduce the issue you're seeing. Please make sure to include all relevant configuration information, such as python version, giraffez version, Teradata client library and server version. Thank you.
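
For anyone putting together such a report, the Python-side details can be collected with a short script. This is only a sketch: `environment_report` is a hypothetical helper name, it assumes Python 3.8+ for `importlib.metadata`, and it cannot detect the Teradata client library or server version, which still have to be added by hand.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("giraffez", "pandas")):
    """Collect interpreter, OS, and installed package versions for a bug report.

    Hypothetical helper, not part of giraffez.
    """
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report the gap instead of failing, so the output stays usable.
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```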

@ktravis
Collaborator

ktravis commented Jan 29, 2018

Closing without further details. I would be happy to reopen and address this if you can provide an example.

@ktravis ktravis closed this as completed Jan 29, 2018
@ausiddiqui

Wanted to reopen this as I'm having the same issue.

giraffez 2.0.24
macOS 10.14.2 / Ubuntu 16
python 3.5 (on macOS) / python 3.6 (on Ubuntu box)
Teradata ODBC / TTU 16.10
Teradata Server 15.10
SQL client: dbeaver / Aqua Data Studio (tried both)
JDBC for SQL client: TeraJDBC_16.20.00.02

import giraffez as g
import pandas as pd

with g.Cmd() as cmd:
    cmd.execute("""
CREATE SET TABLE mydb.mytable ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      location VARCHAR(250) CHARACTER SET UNICODE NOT CASESPECIFIC
     )
PRIMARY INDEX ( location );""")

df = pd.DataFrame({'location': ['Düsseldorf','Marché Saint-Germain','İstanbul']})
df.to_csv('location.csv', index=False)

with g.BulkLoad(table='mydb.mytable') as load:
    load.from_file('location.csv', table='mydb.mytable', delimiter=",", null='')

select *
from mydb.mytable

location             
---------------------
Düsseldorf          
Marché Saint-Germain
Ä°stanbul            

I get the same result if I convert the DataFrame to a list and iterate over the rows as the OP did. The character set of my connection to TD when I run the select * is UTF-8, and I'm using a GUI client. On the same database, I can see these specific Unicode characters and other non-Latin characters in other tables whose columns are declared with CHARACTER SET UNICODE in the create table statement. So I believe this is isolated to the load and not related to either my SQL client or the TD server itself.
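
A note for anyone diagnosing this: the garbled values in the query output above are exactly what you get when the UTF-8 bytes of the original strings are decoded as Latin-1. The sketch below reproduces them in plain Python, without giraffez or Teradata. It only shows that the output is consistent with that kind of mis-decoding; it does not establish where in the load path it happens. The function names are hypothetical.

```python
def utf8_read_as_latin1(text):
    """Encode text as UTF-8, then deliberately mis-decode the bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def undo_mis_decoding(garbled):
    """Reverse the mis-decoding: recover the UTF-8 bytes, then decode them properly."""
    return garbled.encode("latin-1").decode("utf-8")


for value in ["Düsseldorf", "Marché Saint-Germain", "İstanbul"]:
    garbled = utf8_read_as_latin1(value)
    # The round trip restores the original string.
    print(value, "->", garbled, "->", undo_mis_decoding(garbled))
```

If the round trip restores the expected text for values read back from the table, the stored data was most likely written as UTF-8 bytes that were interpreted as a single-byte character set somewhere along the way.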
