ANSI Client clobbered data ::SQLTables #2336
Date: 2010-03-14 03:40:27 +0100
Last updated: 2011-09-16 15:04:41 +0200
Date: 2010-03-14 03:40:27 +0100
I am using an ANSI ODBC client (not Unicode) with latest version of MonetDB ODBC driver 1.36.01.01 2/24/2010 and I am getting clobbed resultset comparing demo database loaded with VOC dataset to itself by using Zidsoft CompareData app.
Does the ODBC driver not support ANSI clients?
Date: 2010-03-15 18:03:09 +0100
The MonetDB ODBC driver only support Unicode and UTF-8. That is to say, the wide character calls are expected to use UCS-16, and the narrow character calls are expected to use UTF-8. The driver does not support any other character sets (well, ASCII since it is a proper subset of UTF-8).
Date: 2010-03-15 18:28:28 +0100
My app is ANSI so I am using narrow calls. So that's not the cause of this issue. I see the cause of the problem: the driver is returning incorrect information for ::SQLTables character resultset columns:
The driver is returning zero for SQL_DESC_OCTET_LENGTH and SQL_DESC_LENGTH for ::SQLTables character resultsetset columns instead of actual values.
Date: 2010-04-15 21:55:07 +0200
If by ANSI you mean one of the ISO-8859 versions (e.g. ISO-8859-1 aka "latin 1") then that is definitely a problem. Those are not compatible with UTF-8.
What are the types of pRow->buffer_length and pRow->column_size?
Is your application 32 bit or 64 bit?
The MonetDB ODBC driver may not be completely compliant with the 64-bit definition of the ODBC interface. I'm hoping that can be fixed in the next feature release (not in a bug fix release since it's an API change).
Date: 2010-04-15 22:42:37 +0200
I am running the 32-bit version of my app on Windows XP SP3 32-bit and using Windows characters set 1252 (US-EN).
I am using ODBC types for column_size and buffer_length which in 32-bit version of my app are SQUINTEGER and SQLINTEGER.
My app is tested successfully with 30-40 different DBMS ODBC drivers without encountering this issue http://www.zidsoft.com/dbmsnotes.html
The driver is returning zero for SQL_DESC_OCTET_LENGTH and SQL_DESC_LENGTH
Date: 2010-04-16 10:07:58 +0200
I have been able to also get the value 0 from the call to SQLColAttribute on column 1 after a call to SQLTables.
Date: 2010-04-16 14:43:07 +0200
The driver needs to return valid values (not 0 and certainly usually greater than 1) for SQL_DESC_LENGTH and SQL_DESC_OCTET_LENGTH for the resultset columns as per ODBC descriptions for these.
So for SQLTables resultset table names these would be for example, 128, 128 assuming that the maximum length for a table name in MonetDB is 128 character. Similar info should be returned by the driver for SQLStatistics, SQLPrimaryKeys and SQLForeignKesy, SQLGetTypeInfo and all other ODBC catalog functions for these attributes.
Date: 2010-04-16 15:46:14 +0200
I read that too, but I'm not sure I can come to the same conclusions. I'm struggling to understand what the specification means.
The columns SQLTables returns are all varchar, without mention how large they should be. The first column is the catalog name. The values here can be NULL if the data source does not support catalogs. Since MonetDB indeed doesn't support catalogs, the value returned is indeed NULL.
The description in SQLTables doesn't say what size varchar column to return, but presumably one that is large enough to hold the data. In this case, varchar(1) is plenty long enough (for this particular column).
For SQLColAttribute, the description of SQL_DESC_LENGTH says that, for variable-length data (which varchar is), the size returned is the "actual character length" (we can ignore the fixed-length case). What is the actual character length of a NULL value? I can argue 0.
The problem for SQL_DESC_OCTET_LENGTH is that, for variable-length character types, it returns the maximum length in bytes. Again, what is the maximum length of a column that is always going to be NULL? The query that SQLTables does under the hood uses cast(null as varchar(1)) for the catalog column.
There is, however, a problem in MonetDB. The server returns a single column length value, and for varchar columns that value is the actual length of the longest value in the column (0 if all values are NULL or if there are no rows). (The value returned is the declared length for fixed-length columns, i.e. 50 for char(50).) In other words, the client does not know the maximum allowed length of the column if the longest value is shorter. So if a column is declared varchar(50) but only contains NULLs, the value returned by the server is 0. This means that determining SQL_DESC_LENGTH is relatively easy, but determining SQL_DESC_OCTET_LENGTH is not currently possible for varchar columns.
Another problem is that SQL_DESC_LENGTH counts characters and SQL_DESC_OCTET_LENGTH counts bytes. These are not interchangeable. MonetDB uses the UTF-8 encoding, which is a variable-length encoding. A character can take from 1 to 6 bytes. Should we then multiply the value by 6? If a column is declared as e.g. varchar(10), that means that it can store up to 10 characters in MonetDB.
The problem we're having here doesn't really have anything to do with SQLTables as such. The implementation does a SQLExecDirect call under the hood, and SQLColAttribute doesn't know (or care) how the result set was created. The problem has to do with getting metainformation about the result set. In other words, your suggestion of using 128 characters (or is that bytes?) doesn't help. That information is not passed on to SQLColAttribute, and anyway, the same problems would exist for hand-created SQL queries.
I do agree that all metainformation-returning queries should return compatible information.
Date: 2010-04-16 16:16:28 +0200
You can ignore the catalog column of the resultset since MonetDB does not support catalogs and if the driver correctly reports that catalog are not supported by the DBMS.
Example column declared as varchar(30) the driver would return
SQL_DESC_LENGTH and SQL_DESC_OCTET_LENGTH are used by the ODBC client to create and allocate buffers for the resultset data and without correct information for these attributes, the ODBC client can not correctly retrieve data from the database
Date: 2010-04-16 18:17:26 +0200
What does "the actual character length for a variable-length data type" mean? I thought it meant the length of the (longest) actual value in the column. Since this is from SQLColAttribute, there are actual values in the result set to talk about. From your comment I understand you have a different interpretation (namely the maximum allowed length for the column). Is there anywhere a clearer definition of these attributes?
The definition of SQL_DESC_OCTET_LENGTH is perhaps a little clearer, although I could interpret that either way as well: "For variable-length character or binary types, this is the maximum length in bytes". Which maximum? The maximum according to the column definition or the maximum in the current result set?
In addition to SQL_DESC_LENGTH and SQL_DESC_OCTET_LENGTH, there is also SQL_DESC_DISPLAY_SIZE which has a similar issue. Does that value depend on the actual content of the column (as returned by the query) or does that depend on the definition of the column?
I can see that the width of the columns returned by SQLTables could depend on the limitations of the server for the type of information (schema name length, etc.) Unfortunately, that doesn't help currently since, as mentioned, the information is not passed on.
As to ANSI vs. Unicode. We only support UTF-8 in the narrow calls and UTF-16 in the wide calls. As I understand it, the term ANSI refers to a Windows code page which is close to ISO-8859-1, but in any case not compatible with UTF-8.
I suppose I could change the code so that the narrow interface is CP-1252 and if you want to use characters that don't occur in that code page you're forced to use the wide interface. I could also differentiate between the calls with or without A at the end (e.g. SQLTables vs. SQLTablesA) where the calls with A use the "ANSI" code page and the calls without use UTF-8. But I don't really like this idea.
(My weekend is starting now, so I probably won't respond before Monday.)
Date: 2010-04-16 18:52:39 +0200
When I am getting resultset description from the driver no data is retrieved (I don't want to retrieve any data yet, the table could have 4 million rows or no rows, but client only wants to know the resultset column descriptions). This is usually done by preparing the select statement "select * from"
Remember I don't want the driver to retrieve the resultset I only want to know what is resultset description is. So the database can have 1000 tables and all the tables are empty, but the client should still be able to determine the resultset description for each table by preparing "select * from
create table test (col1 varchar(30))
I should be able to determine that col1 resultset column length is 30 and octet length is 30 (ANSI) even though the table is empty by calling SQLColAttribute and SQL_DESC_LENGTH, SQL_DESC_OCTET_LENGTH
Whether the table has data or not is not-relevant to the resultset description.
Since the server uses UTF8 then when transferring data to an ANSI client (narrow calls) the driver needs to convert from UTF8 to the codepage returned by GetACP() call. The conversion of course could fail if the client codepage has no equivalent to the UTF8 character and in this case the driver would report a conversion error for only that piece of data.
Similarly for wide clients, the driver would convert from UTF8 to UTF16 (UCS16, actually) and report any conversion errors (less likely).
Date: 2010-05-04 09:32:09 +0200
This bug was previously known as tracker item 2970087 at http://sourceforge.net/support/tracker.php?aid=2970087
Date: 2011-08-22 20:21:27 +0200
A number of recent changes in the Aug2011 branch have hopefully solved the problems described here. The server now returns the size of the underlying datatype to the client (the ODBC driver) which passes it on to the application. Also, the various lengths that can be requested from SQLColAttribute (SQL_DESC_LENGTH, SQL_DESC_OCTET_LENGTH, SQL_DESC_DISPLAY_SIZE) now result in the values as described in the ODBC documentation. In particular, the octet length for character types (char, varchar, clob) is 6 times the value of the SQL_DESC_LENGTH value since the latter is number of characters, and UTF-8 encoding can take up to 6 bytes for a Unicode code point (character).
I will call this bug resolved, although you can always reopen the bug if you don't agree with the resolution.
Date: 2011-09-16 15:04:41 +0200
The Aug2011 version has been released.
The text was updated successfully, but these errors were encountered: