
Panic when binding varchar/string types: slice bounds out of range [:668] with capacity 256 #165

Closed
hawkaa opened this issue Oct 22, 2021 · 3 comments

hawkaa commented Oct 22, 2021

Dear @alexbrainman,

Thank you for an excellent library! I can really see how much hard work has been put into making ODBC available to Go users. I was wondering if you could help us out a little with reading variable-length strings from Spark/Databricks.

We currently run our backend services in Go. Where we previously used PostgreSQL and pq, we now need to connect to a Spark SQL endpoint (via Databricks). Drivers are offered for both JDBC and ODBC, and the latter is what we need. We have successfully connected to and queried our Spark SQL endpoints with unixODBC, the Simba ODBC driver, and this package.

However, I have run into an issue reading columns with the STRING type. Here's the code to reproduce:

	log.Println("connecting to spark endpoint")
	dsn := fmt.Sprintf("Driver=%s;HOST=...;SparkServerType=3;PORT=443;ssl=1;AuthMech=3;ThriftTransport=2;transportMode=http;httpPath=...;Schema=default;UID=...;PWD=%s;", driver, pwd)
	db, err := sql.Open("odbc", dsn)
	if err != nil {
		return errors.Wrap(err, "unable to connect to spark endpoint")
	}

	log.Println("selecting first row of transaction table")
	rows, err := db.QueryContext(ctx, "select data from default.transactions limit 1")
	if err != nil {
		return errors.Wrap(err, "unable to query first row from the transactions table")
	}
	log.Println("calling next on rows")
	rows.Next()
	log.Println("BOOM!")

This panics with the following error message:

panic: runtime error: slice bounds out of range [:668] with capacity 256

goroutine 1 [running]:
github.com/alexbrainman/odbc.(*BindableColumn).Value(0xc00009a000, 0xc000129d70, 0x4049952)
	/Users/hakon/go/pkg/mod/github.com/alexbrainman/odbc@v0.0.0-20210605012845-39f8520b0d5f/column.go:259 +0x1b5
github.com/alexbrainman/odbc.(*Rows).Next(0xc00009c000, {0xc000092020, 0x1, 0xc00013e000})
	/Users/hakon/go/pkg/mod/github.com/alexbrainman/odbc@v0.0.0-20210605012845-39f8520b0d5f/rows.go:35 +0xef
database/sql.(*Rows).nextLocked(0xc00009e000)
	/usr/local/go/src/database/sql/sql.go:2967 +0x111
database/sql.(*Rows).Next.func1()
	/usr/local/go/src/database/sql/sql.go:2945 +0x2f
database/sql.withLock({0x4120290, 0xc00009e030}, 0xc000129e68)
	/usr/local/go/src/database/sql/sql.go:3396 +0x8c
database/sql.(*Rows).Next(0xc00009e000)
	/usr/local/go/src/database/sql/sql.go:2944 +0x6f
github.com/duneanalytics/hakon-experimenting/code/004-odbc-spark-debugging/internal/testbench.Run({0x4120d50, 0xc0000180b0}, {0x7ffeefbff9cd, 0x2f}, {0x7ffeefbff9ad, 0x1c})
	/Users/hakon/code/hakon-experimenting/code/004-odbc-spark-debugging/internal/testbench/testbench.go:34 +0x254
main.main()
	/Users/hakon/code/hakon-experimenting/code/004-odbc-spark-debugging/cmd/main.go:24 +0xcc
exit status 2

The default.transactions table contains a data column of type STRING; this is what DESCRIBE default.transactions returns. STRING is the only data type for strings in Spark.

I have seen other issues mention this in the past. #98 is particularly relevant: there, @joshuasprow mentions that VARCHAR columns are assigned 256 bytes by default, while for some databases the length can be much greater. I believe that is the issue we are running into as well.

I think I have narrowed the problem down a little.

When binding columns, a call to C.SQLDescribeColW is made:

	r := C.SQLDescribeColW(C.SQLHSTMT(statementHandle), C.SQLUSMALLINT(columnNumber), (*C.SQLWCHAR)(unsafe.Pointer(columnName)), C.SQLSMALLINT(bufferLength), (*C.SQLSMALLINT)(nameLengthPtr), (*C.SQLSMALLINT)(dataTypePtr), (*C.SQLULEN)(columnSizePtr), (*C.SQLSMALLINT)(decimalDigitsPtr), (*C.SQLSMALLINT)(nullablePtr))

This takes a statement handle and a column number as input, and outputs metadata about the column, including its length. The length returned for the data column is 256. This length is then passed on to NewVariableWidthColumn, which rightfully creates a buffer of size 256 for that column.
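For reference, the reported size could in principle also be inspected from the application side. This assumes the driver exposes SQLDescribeCol's column size through database/sql's ColumnType.Length(), which I have not verified for this package, so treat it as a hypothetical check (continuing the snippet above):

	colTypes, err := rows.ColumnTypes()
	if err != nil {
		return errors.Wrap(err, "unable to read column types")
	}
	if length, ok := colTypes[0].Length(); ok {
		log.Printf("driver reports length %d for column %q", length, colTypes[0].Name())
	}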

When calling rows.Next(), it reads the first result and tries to fit my 668-character string into a buffer that is only 256 bytes long, which makes the program panic.
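In essence, the failure mode boils down to this (a simplified standalone illustration, not the actual code in column.go):

	package main

	func main() {
		// The bound buffer is sized from SQLDescribeCol's reported column size,
		// while the fetch reports the true byte length of the value.
		buf := make([]byte, 256) // capacity taken from SQLDescribeColW
		n := 668                 // length indicator from the fetched row
		_ = buf[:n]              // panic: runtime error: slice bounds out of range [:668] with capacity 256
	}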

I have tried this on both macOS and Linux (with unixODBC) and hit the same issue. Using unixODBC and the same driver with pyodbc works as expected.

Now, my question to you is whether you have some pointers on how to solve my problem:

  1. Have I configured my driver wrong?
  2. Are there any input parameters I can use to override the default behaviour for strings without length?
  3. Is this something that can be fixed in this repository? If so, I would be happy to open a PR, but I would need some help and pointers to do so.

Thank you!

Håkon


hawkaa commented Oct 22, 2021

I actually fixed it with a small hack: duneanalytics@ae0055c

But I am unsure whether this is a good idea and whether this is something we can make generic.
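The gist of it is to stop binding variable-length string columns so their values are fetched per row with SQLGetData instead. Roughly (a pseudocode sketch of the idea, not the actual diff; the constant and constructor names only approximate the library's internals):

	// In the column-setup path, treat variable-length string columns as
	// non-bindable, forcing the per-row SQLGetData path with a buffer
	// sized to the actual value instead of a fixed bound buffer.
	switch sqltype {
	case api.SQL_VARCHAR, api.SQL_WVARCHAR:
		return NewNonBindableColumn(b), nil // hypothetical constructor name
	}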

alexbrainman (Owner) commented

Hello @hawkaa

> When binding columns, a call to C.SQLDescribeColW is made. This takes a statement handle and a column number as input, and outputs metadata about the column, including its length. The length returned for the data column is 256. This length is then passed on to NewVariableWidthColumn, which rightfully creates a buffer of size 256 for that column.
> When calling rows.Next(), it reads the first result and tries to fit my 668-character string into a buffer that is only 256 bytes long, which makes the program panic.

Your explanation sounds reasonable. Your ODBC driver uses variable-size buffers for SQL_VARCHAR columns, so the SQLBindCol ODBC API cannot be used for such columns. The SQLBindCol API is memory efficient because it does not require allocating new buffers every time you read a column result.
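As a toy illustration of the trade-off (simulated with plain byte slices, not real ODBC calls):

	package main

	import "fmt"

	func main() {
		values := []string{"short", "a value much longer than the reported column size"}
		reported := 8 // pretend SQLDescribeCol reported a column size of 8

		// SQLBindCol style: one buffer allocated up front and reused for
		// every row. Cheap, but only safe if `reported` is a true upper bound.
		bound := make([]byte, reported)
		for _, v := range values {
			if len(v) > cap(bound) {
				fmt.Printf("bound fetch would overflow: need %d, have %d\n", len(v), cap(bound))
				continue
			}
			copy(bound, v)
		}

		// SQLGetData style: learn the length per row and allocate per value.
		// More calls and allocations, but safe when the reported size is wrong.
		for _, v := range values {
			dst := make([]byte, len(v))
			copy(dst, v)
		}
	}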

Your change duneanalytics@ae0055c looks reasonable in your situation. Your approach is sound; it just requires more ODBC calls and more memory allocation and deallocation.

I don't want to accept your change into the main repo because I believe your driver's SQLDescribeCol implementation is incorrect: it reports a column size of 256, and then returns more than 256 bytes.

Alex


hawkaa commented Oct 24, 2021

Thank you for a detailed answer, @alexbrainman!

> I don't want to accept your change into the main repo because I believe your driver's SQLDescribeCol implementation is incorrect: it reports a column size of 256, and then returns more than 256 bytes.

Of course that change does not belong in this repository; I just wanted to show you how I worked around the problem. Technically, we could have expanded this library's API with some bindVarcharColumns setting or similar, but as you say, there's a bug in the underlying driver. I will make sure to report it.

Thank you again!

hawkaa closed this as completed Oct 24, 2021