How do I set encoding to utf8? #47
Comments
The session encoding used is already UTF8, which giraffez requires. Have you encountered an issue with loading UTF8 encoded data?
Yes. I get weird characters, which I reproduced with teradata and pyodbc when the encoding is ASCII. It seems that the default is ASCII and not UTF-8.
I would be happy to help diagnose further if you can provide the steps to reproduce the issue you're seeing. Please make sure to include all relevant configuration information, such as Python version, giraffez version, Teradata client library and server version. Thank you.
Closing without further details. I would be happy to reopen and address this if you can provide an example.
Wanted to reopen this as I'm having the same issue.
giraffez 2.0.24
import giraffez as g
import pandas as pd

# Create a target table with a UNICODE column
with g.Cmd() as cmd:
    cmd.execute("""
        CREATE SET TABLE mydb.mytable ,NO FALLBACK ,
        NO BEFORE JOURNAL,
        NO AFTER JOURNAL,
        CHECKSUM = DEFAULT,
        DEFAULT MERGEBLOCKRATIO
        (
        location VARCHAR(250) CHARACTER SET UNICODE NOT CASESPECIFIC)
        PRIMARY INDEX ( location );""")

# Write three non-ASCII values to a CSV file and bulk load it
df = pd.DataFrame({'location': ['Düsseldorf','Marché Saint-Germain','İstanbul']})
df.to_csv('location.csv', index=False)
with g.BulkLoad(table='mydb.mytable') as load:
    load.from_file('location.csv', table='mydb.mytable', delimiter=",", null='')

select *
from mydb.mytable

location
---------------------
Düsseldorf
Marché Saint-Germain
Ä°stanbul

I get the same result if I use the dataframe to_list and iterate over the rows as the OP had done. The character set of my connection to TD when I do the select * is utf-8, and I'm using a GUI client. I am able to see these specific unicode characters and non-latin characters in other tables on the same database where the column's character set is UNICODE in the create table statement. So I believe this is isolated and not related to either my SQL client or the TD server itself.
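For anyone comparing notes: the garbled `Ä°stanbul` is consistent with UTF-8 bytes being read back as a single-byte Latin-1 style encoding at some point. The sketch below only illustrates that mechanism in plain Python. It does not use giraffez and makes no claim about where in the load path the mis-decoding would happen.

```python
# Illustration only: reproduce the garbled string from the correct one.
original = "İstanbul"

# "İ" (U+0130) is two bytes in UTF-8: 0xC4 0xB0.
utf8_bytes = original.encode("utf-8")

# Decoding those bytes as Latin-1 maps each byte to one character:
# 0xC4 -> "Ä", 0xB0 -> "°".
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # Ä°stanbul

# The mapping is reversible, so an affected value can be checked after the fact.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```

If a value read back from the table round-trips like this, that suggests the stored text was UTF-8 that got interpreted with a different character set, rather than data that was wrong in the source file.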
I have a DataFrame with columns holding non-ASCII characters. I would like to change the session encoding, similar to udaExec.connect(....,charset='utf8') in teradata and cnxn.setencoding(encoding='utf-8') in pyodbc.
I am able to load data, but I get funny characters with:
How do I set encoding to UTF-8? Is it done by giraffez.encoders? Is there a how-to code example?
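Separately from any session setting, it can help to first rule out the input file as the source of the problem. The following is a minimal sketch, assuming the data is written to a file before loading; `is_valid_utf8` is a hypothetical helper written for this illustration and is not part of giraffez.

```python
import os
import tempfile


def is_valid_utf8(path):
    """Return True if the whole file at `path` decodes cleanly as UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Demo: write the same text with two encodings and check each file.
with tempfile.TemporaryDirectory() as tmp:
    utf8_path = os.path.join(tmp, "utf8.csv")
    latin1_path = os.path.join(tmp, "latin1.csv")

    with open(utf8_path, "w", encoding="utf-8", newline="") as f:
        f.write("location\nDüsseldorf\n")
    with open(latin1_path, "w", encoding="latin-1", newline="") as f:
        f.write("location\nDüsseldorf\n")

    utf8_ok = is_valid_utf8(utf8_path)
    latin1_ok = is_valid_utf8(latin1_path)

print(utf8_ok)    # True
print(latin1_ok)  # False
```

A file that passes this check can still be mis-decoded later in the pipeline, so this only narrows the search; it does not by itself show that the session encoding is correct.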