
Panic when binding varchar/string types: slice bounds out of range [:668] with capacity 256 #165

Closed
hawkaa opened this issue Oct 22, 2021 · 3 comments

hawkaa commented Oct 22, 2021

Dear @alexbrainman,

Thank you for an excellent library! I can really see how much hard work has been put into making ODBC available to Go users. I was wondering if you could help us out a little with reading variable-length strings from Spark/Databricks.

We currently run our backend services in Go. Where we previously used PostgreSQL and pq, we now need to connect to a Spark SQL endpoint (via Databricks). Drivers are offered for both JDBC and ODBC, and the latter is what we need. We have successfully connected to and queried our Spark SQL endpoints with unixODBC, the Simba ODBC driver, and this package.

However, I have run into an issue reading columns with the STRING type. Here's the code to reproduce:

	log.Println("connecting to spark endpoint")
	dsn := fmt.Sprintf("Driver=%s;HOST=...;SparkServerType=3;PORT=443;ssl=1;AuthMech=3;ThriftTransport=2;transportMode=http;httpPath=...;Schema=default;UID=...;PWD=%s;", driver, pwd)
	db, err := sql.Open("odbc", dsn)
	if err != nil {
		return errors.Wrap(err, "unable to connect to spark endpoint")
	}

	log.Println("selecting first row of transaction table")
	rows, err := db.QueryContext(ctx, "select data from default.transactions limit 1")
	if err != nil {
		return errors.Wrap(err, "unable to query first row from the transactions table")
	}
	log.Println("calling next on rows")
	rows.Next()
	log.Println("BOOM!")

This panics with the following error message:

panic: runtime error: slice bounds out of range [:668] with capacity 256

goroutine 1 [running]:
github.com/alexbrainman/odbc.(*BindableColumn).Value(0xc00009a000, 0xc000129d70, 0x4049952)
	/Users/hakon/go/pkg/mod/github.com/alexbrainman/odbc@v0.0.0-20210605012845-39f8520b0d5f/column.go:259 +0x1b5
github.com/alexbrainman/odbc.(*Rows).Next(0xc00009c000, {0xc000092020, 0x1, 0xc00013e000})
	/Users/hakon/go/pkg/mod/github.com/alexbrainman/odbc@v0.0.0-20210605012845-39f8520b0d5f/rows.go:35 +0xef
database/sql.(*Rows).nextLocked(0xc00009e000)
	/usr/local/go/src/database/sql/sql.go:2967 +0x111
database/sql.(*Rows).Next.func1()
	/usr/local/go/src/database/sql/sql.go:2945 +0x2f
database/sql.withLock({0x4120290, 0xc00009e030}, 0xc000129e68)
	/usr/local/go/src/database/sql/sql.go:3396 +0x8c
database/sql.(*Rows).Next(0xc00009e000)
	/usr/local/go/src/database/sql/sql.go:2944 +0x6f
github.com/duneanalytics/hakon-experimenting/code/004-odbc-spark-debugging/internal/testbench.Run({0x4120d50, 0xc0000180b0}, {0x7ffeefbff9cd, 0x2f}, {0x7ffeefbff9ad, 0x1c})
	/Users/hakon/code/hakon-experimenting/code/004-odbc-spark-debugging/internal/testbench/testbench.go:34 +0x254
main.main()
	/Users/hakon/code/hakon-experimenting/code/004-odbc-spark-debugging/cmd/main.go:24 +0xcc
exit status 2

The default.transactions table contains a data column of type STRING; this is what DESCRIBE default.transactions returns. STRING is the only data type for strings in Spark.

I have seen other issues mention this in the past. #98 is particularly relevant: there, @joshuasprow mentions that VARCHAR columns are assigned 256 bytes by default, while for some databases the length can be much greater. I believe that is the issue we are running into as well.

I think I have narrowed the problem down a little.

When binding columns, a call to C.SQLDescribeColW is made:

	r := C.SQLDescribeColW(C.SQLHSTMT(statementHandle), C.SQLUSMALLINT(columnNumber), (*C.SQLWCHAR)(unsafe.Pointer(columnName)), C.SQLSMALLINT(bufferLength), (*C.SQLSMALLINT)(nameLengthPtr), (*C.SQLSMALLINT)(dataTypePtr), (*C.SQLULEN)(columnSizePtr), (*C.SQLSMALLINT)(decimalDigitsPtr), (*C.SQLSMALLINT)(nullablePtr))

This takes a statement handle and a column number as input, and outputs metadata about the column, including its length. The length returned for the data column is 256. This length is then passed on to NewVariableWidthColumn, which rightfully creates a buffer of size 256 for that column.
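For reference, the reported size could in principle also be inspected from the application side. This assumes the driver exposes SQLDescribeCol's column size through database/sql's ColumnType.Length(), which I have not verified for this package, so treat it as a hypothetical check (continuing the snippet above):

	colTypes, err := rows.ColumnTypes()
	if err != nil {
		return errors.Wrap(err, "unable to read column types")
	}
	if length, ok := colTypes[0].Length(); ok {
		log.Printf("driver reports length %d for column %q", length, colTypes[0].Name())
	}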

When calling rows.Next(), it reads the first result and tries to fit my 668-character string into a buffer that is only 256 bytes long, which makes the program panic.
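In essence, the failure mode boils down to this (a simplified standalone illustration, not the actual code in column.go):

	package main

	func main() {
		// The bound buffer is sized from SQLDescribeCol's reported column size,
		// while the fetch reports the true byte length of the value.
		buf := make([]byte, 256) // capacity taken from SQLDescribeColW
		n := 668                 // length indicator from the fetched row
		_ = buf[:n]              // panic: runtime error: slice bounds out of range [:668] with capacity 256
	}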

I have tried this on both macOS and Linux (with unixODBC) and hit the same issue. Using unixODBC and the same driver with pyodbc works as expected.

Now, my question to you is whether you have some pointers on how to solve my problem:

  1. Have I configured my driver wrong?
  2. Are there any input parameters I can use to override the default behaviour for strings without length?
  3. Is this something that can be fixed in this repository? If so, I would be happy to open a PR, but I would need some help and pointers to do so.

Thank you!

Håkon


hawkaa commented Oct 22, 2021

I actually fixed it with a small hack: duneanalytics@ae0055c

But I am unsure whether this is a good idea and whether this is something we can make generic.
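The gist of it is to stop binding variable-length string columns so their values are fetched per row with SQLGetData instead. Roughly (a pseudocode sketch of the idea, not the actual diff; the constant and constructor names only approximate the library's internals):

	// In the column-setup path, treat variable-length string columns as
	// non-bindable, forcing the per-row SQLGetData path with a buffer
	// sized to the actual value instead of a fixed bound buffer.
	switch sqltype {
	case api.SQL_VARCHAR, api.SQL_WVARCHAR:
		return NewNonBindableColumn(b), nil // hypothetical constructor name
	}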

alexbrainman (Owner) commented

Hello @hawkaa

> When binding columns, a call to C.SQLDescribeColW is made. This takes a statement handle and a column number as input, and outputs metadata about the column, including its length. The length returned for the data column is 256. This length is then passed on to NewVariableWidthColumn, which rightfully creates a buffer of size 256 for that column.
> When calling rows.Next(), it reads the first result and tries to fit my 668-character string into a buffer that is only 256 bytes long, which makes the program panic.

Your explanation sounds reasonable. Your ODBC driver uses variable-size buffers for SQL_VARCHAR columns, so the SQLBindCol ODBC API cannot be used for such columns. The SQLBindCol API is memory efficient because it does not require allocating new buffers every time you read a column result.
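As a toy illustration of the trade-off (simulated with plain byte slices, not real ODBC calls):

	package main

	import "fmt"

	func main() {
		values := []string{"short", "a value much longer than the reported column size"}
		reported := 8 // pretend SQLDescribeCol reported a column size of 8

		// SQLBindCol style: one buffer allocated up front and reused for
		// every row. Cheap, but only safe if `reported` is a true upper bound.
		bound := make([]byte, reported)
		for _, v := range values {
			if len(v) > cap(bound) {
				fmt.Printf("bound fetch would overflow: need %d, have %d\n", len(v), cap(bound))
				continue
			}
			copy(bound, v)
		}

		// SQLGetData style: learn the length per row and allocate per value.
		// More calls and allocations, but safe when the reported size is wrong.
		for _, v := range values {
			dst := make([]byte, len(v))
			copy(dst, v)
		}
	}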

Your change duneanalytics@ae0055c looks reasonable in your situation. Your approach is sound; it just requires more ODBC calls and more memory allocation and deallocation.

I don't want to accept your change into the main repo because I believe your driver's SQLDescribeCol implementation is incorrect: it reports a column size of 256, and then returns more than 256 bytes.

Alex


hawkaa commented Oct 24, 2021

Thank you for a detailed answer, @alexbrainman!

> I don't want to accept your change into the main repo because I believe your driver's SQLDescribeCol implementation is incorrect: it reports a column size of 256, and then returns more than 256 bytes.

Of course that change does not belong in this repository; I just wanted to show you how I worked around the problem. Technically, we could have expanded this library's API with some bindVarcharColumns setting or similar, but as you say, there's a bug in the underlying driver. I will make sure to report it.

Thank you again!

hawkaa closed this as completed Oct 24, 2021