Latin Extended-A characters being converted to ASCII on INSERT #39
Comments
I'll have to check if this is some bug in turbodbc. I have a test case for selecting unicode characters: The complementary test for inserts sadly lacks unicode characters, that's something I'll have to add: Even with this, it is possible that turbodbc is missing some MS SQL oddities (as every ODBC driver seems to have peculiarities of some sort or another). I don't have access to an MS SQL database myself, so if I don't find anything with the existing databases, debugging will be very hard for me. |
In any case, thanks for reporting! |
Thanks Michael! By the way, Also, just in case you weren't aware, MS SQL Server is available on Ubuntu (and other Linux flavours) right now. https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-setup-ubuntu . Here's the main page: https://www.microsoft.com/en-us/sql-server/sql-server-vnext-including-Linux |
Thanks for the numbers and associated praise :-). I was not aware of MS SQL on Linux, and even for free! I think I have a new project... |
On the question of handling Unicode, you might want to read Michael Kleehammer's thoughts on this. He recently did a fairly major update of pyodbc to handle all the different RDBMS's and drivers, adding "encoding" parameters to the I notice that turbodbc doesn't use SQLDriverConnectW, SQLPrepareW, or SQLExecDirectW. I'm guessing this might have something to do with this issue. Here's an example of its use in pyodbc: |
Just FYI, here's the log from unixODBC when making the database connection:
|
I've installed MSSQL today and can reproduce issues with Unicode and stuff. This may require a more thorough investigation. |
Just digging around with this issue a little left me with some new funny stuff. Trying to insert Unicode (UTF-8 encoding) stuff into MSSQL failed with both Microsoft's ODBC driver and FreeTDS. If I query Unicode from the database, however, it works with Microsoft's ODBC driver. It also works with FreeTDS, but only if I set the So the driver definitely plays a role. But still the available MSSQL drivers seem to have a problem with UTF-8 encoded strings. |
Yes, Microsoft SQL Server drivers don't really do UTF-8. Most of the time they use UCS-2 instead (i.e. two bytes, fixed width), together with the "wide" versions of the ODBC functions (SQLExecDirectW, etc.). See here for some more info. |
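To make the difference between the two encodings concrete, here is a minimal sketch in plain Python (no database or ODBC driver involved). The sample string is the one from the truncation report in this thread, and Python's `utf-16-le` codec is used as a stand-in for UCS-2, which is an assumption that only holds for characters in the Basic Multilingual Plane.

```python
# Sample string taken from the truncation report in this thread.
text = "hęīhœøōõ"

# UTF-8 is variable width: 1 byte for ASCII, 2 bytes for each accented letter here.
utf8_bytes = text.encode("utf-8")

# UTF-16-LE (standing in for UCS-2) uses 2 bytes for every character in this string.
utf16_bytes = text.encode("utf-16-le")

print(len(text), len(utf8_bytes), len(utf16_bytes))  # 8 14 16
print(utf8_bytes.hex())
print(utf16_bytes.hex())
```

The byte counts differ, so a buffer sized for one encoding cannot be assumed to fit the other.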
There also seems to be some truncation happening with unicode characters.
>>> cursor.execute("SELECT N'hęīhœøōõ'")
>>> cursor.fetchall()
[['hęīhœ']]
However, that does not happen with an explicit cast:
>>> cursor.execute("SELECT CAST(N'hęīhœøōõ' AS NVARCHAR)")
>>> cursor.fetchall()
[['hęīhœøōõ']]
pyodbc returns the correct string in both cases. |
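One possible explanation for the truncation, offered as an untested hypothesis and not confirmed anywhere in this thread: if the receive buffer is sized at one byte per reported character while the data arrives as UTF-8, the 8-character string gets an 8-byte buffer and is cut off early. The sketch below only simulates that in plain Python; `truncate_utf8` is a hypothetical helper, not part of turbodbc.

```python
def truncate_utf8(text: str, buffer_bytes: int) -> str:
    """Encode to UTF-8, cut at buffer_bytes, and keep only the complete characters."""
    raw = text.encode("utf-8")[:buffer_bytes]
    # errors="ignore" drops a trailing partial multi-byte sequence, if any.
    return raw.decode("utf-8", errors="ignore")


original = "hęīhœøōõ"

# Assumption: the buffer holds one byte per character of the reported column size.
result = truncate_utf8(original, buffer_bytes=len(original))
print(result)  # hęīhœ
```

The simulated result matches the truncated value reported above, which is consistent with the hypothesis but does not prove it.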
There's no way around the wide character functions for MSSQL, I fear. I'll try to figure out how to safely determine what the driver's preferred way for transferring text is. Many drivers seem to cope with UTF-8 just fine, so I don't want to force them to use 2-byte UCS-2 when most of the characters they will be transferring are just plain old ASCII. |
@keitherskine I have started implementation for unicode support for MSSQL and other databases that do not really support UTF-8. You can find the code in the The build still breaks on old compilers thanks to overly optimistic usage of standard library features not implemented in GCC 4.8. Retrieving wide characters works, and so does using unicode characters in SQL commands. Parameter support is still missing. To benefit from the new things, you need to pass the parameter It's still work in progress, but I hope there is something releasable by the end of the week. |
I just merged all the unicode fixes to master. Here's how to use it:
|
Hi @MathMagique , has version 1.1 been released to PyPi yet? I tried installing the latest version of turbodbc using |
For building from scratch you need |
And it has not been released yet ;-) |
Thank you @xhochy for your suggestion. I managed to create a build in the end. But for the sake of the other kids at the back of the class like me, it might be helpful to update the build instructions to start off as follows:
After that, the build process should run smoothly, it did for me. |
I've been trying the new build, but I don't seem to be able to insert any string into a table, even ascii. Here is my setup:
Here's the code I'm running:
options = turbodbc.make_options(prefer_unicode=True)
conn = turbodbc.connect(dsn='xxx', uid='xxx', pwd='xxx', database='testdb', turbodbc_options=options)
crsr = conn.cursor()
string = 'c'
crsr.execute("INSERT INTO dbo.turbodbc VALUES (?, ?)", (string, string))
This is the result:
It doesn't seem to matter whether I set |
One other thing on the build instructions. It says tests can be executed with |
@keitherskine Concerning the build instructions: That would be a valuable pull request. As for the py.test stuff, you need to The integration tests work on travis using Ubuntu 14 and OSX. And they also work with my local setup that uses a Ubuntu 16.04 VM and Microsoft's ODBC driver. The error you experience could be caused by a version of pybind11 that is too old. Pybind11 only recently started to add support for C++'s |
Thank you for the feedback @MathMagique . I managed to get the As suggested, I have raised a PR for the docs update #61. |
Hi @MathMagique , just wanted to make sure you were aware that pybind released version 2.1.0 yesterday, so it may be possible to release v1.1 of turbodbc now. |
Hey @keitherskine! Thanks for the note. I was aware of it already. I am hoping to include #57 in the release, but I am planning for a release this weekend. |
Excellent, many thanks @MathMagique ! I look forward to trying out turbodbc on Windows. |
If I insert a Latin Extended-A character into my MS SQL Server database, it appears to be getting converted to its ASCII equivalent character.
Here's my setup:
Python 3.4.4, running on CentOS 6.6
MS SQL Server 2008 R2
Using the Microsoft ODBC Driver 11 for SQL Server on Linux with unixODBC 2.3.2
If I query for the hex equivalent of the two columns in the table, I get: 0x6320E72063 and 0x63002000E70020006300. As you can see, the third test character has been converted from c-with-an-acute-accent to plain old c, in both columns. I can kind-of understand this would happen in the first (ascii) column because the column can contain only single-byte characters, but the second "unicode" column should be able to handle this and hence contain 0x63002000E70020000701, i.e. end with 0701 not 6300 (note the little-endianness).
Somewhere along the line, the "Latin Extended-A" character \u0107 is being translated to \u0063, even if the target column is a unicode column. Would this be happening within turbodbc? Or somewhere else? Is turbodbc designed to work only with single-byte characters?
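The hex values above can be reproduced in plain Python without a database. This is a rough sketch: the test string "c ç ć" is inferred from the hex dumps and the \u0107 mentioned above rather than quoted from the original code, and `latin-1` / `utf-16-le` are assumed stand-ins for the single-byte and unicode column encodings.

```python
# Inferred test string: c, c-cedilla (U+00E7), c-acute (U+0107), separated by spaces.
intended = "c ç ć"
# What the reported hex dumps decode to: the last character has become a plain c.
stored = "c ç c"


def hex_upper(data: bytes) -> str:
    """Format bytes the way SQL Server prints binary values, e.g. 0x6320E72063."""
    return "0x" + data.hex().upper()


single_byte_stored = hex_upper(stored.encode("latin-1"))
unicode_stored = hex_upper(stored.encode("utf-16-le"))
unicode_expected = hex_upper(intended.encode("utf-16-le"))

print(single_byte_stored)  # 0x6320E72063
print(unicode_stored)      # 0x63002000E70020006300
print(unicode_expected)    # 0x63002000E70020000701
```

The last two values differ only in the final two bytes: 0701 is U+0107 in little-endian order, and 6300 is U+0063.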