I have a question regarding accessing Delta Lake tables from Python. Similar to this SQLAlchemy example, I tried to save something in a BINARY column like:
example_bytes = Column(BINARY)  # also tried Binary, LargeBinary, ...
but that crashes with:
sqlalchemy.exc.StatementError: (builtins.AttributeError) module 'databricks.sql' has no attribute 'Binary'
I'm wondering if databricks-python-sql doesn't support BINARY columns yet? Other basic types work fine, and executing raw SQL through the connector also works. (I'm using the Hive metastore, if that's relevant.)
Following up on this now that we've implemented SQLAlchemy 2 on main. We have a much better understanding of how SQLAlchemy handles types now, and there are some limitations of Databricks itself that make this unfeasible for the moment. Namely, databricks-sql-connector doesn't have a way to bind parameter values of BINARY type, and SQLAlchemy needs every data type to be bindable for parameterisation. Since DBR doesn't support binding BINARY values, our dialect doesn't either. The same limitation applies to the ARRAY, MAP, and STRUCT types.
That said, where there is a will there is a way. While DBR doesn't support parameterisation of these types, you can write a string SQL query that includes them inline. DBR understands hexadecimal literals and can translate them to BINARY values during ingestion, and pysql will deliver them as io.Buffer objects during egress.
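As a minimal sketch of the ingestion side, assuming a hypothetical table and column name (and that your DBR version accepts Spark SQL's X'...' hexadecimal binary literals), you can render the bytes into the query text yourself:

```python
# Sketch: inline a BINARY value into raw SQL as a hexadecimal literal,
# since the connector cannot bind BINARY parameters directly.
# "my_table" and "example_bytes" are hypothetical names for illustration.
payload = b"\x00\xffhello"

# bytes.hex() produces the hexadecimal string; X'...' is Spark SQL's
# binary-literal syntax.
sql = (
    "INSERT INTO my_table (example_bytes) "
    f"VALUES (X'{payload.hex().upper()}')"
)
print(sql)
```

You would then pass `sql` to `cursor.execute()` as an ordinary raw query, which the question above already confirms works.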
So if this is necessary for your use case, you can implement a workaround with your own custom SQLAlchemy type class (one that extends TypeDecorator). You'll need to implement the process_bind_param, process_literal_param, and process_result_value methods for this custom type in such a way that the ...param methods accept an io.Buffer, convert it to a hexadecimal string, and render it inline with the query text.
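A rough, untested sketch of what such a type could look like. The class name is made up, and the exact inline-rendering behaviour will depend on how your dialect compiles literals; the conversion logic itself is the point here:

```python
# Hypothetical TypeDecorator that represents binary values as hex strings,
# so they can be inlined into query text instead of bound as parameters.
from sqlalchemy.types import TypeDecorator, String


class InlineBinary(TypeDecorator):
    """Carries bytes as hexadecimal; renders X'...' literals inline."""

    impl = String
    cache_ok = True

    def process_bind_param(self, value, dialect):
        # Outbound: normalise buffer-like values to a hex string.
        if value is None:
            return None
        return bytes(value).hex().upper()

    def process_literal_param(self, value, dialect):
        # Outbound, literal rendering: emit a Spark SQL binary literal.
        if value is None:
            return "NULL"
        return f"X'{bytes(value).hex().upper()}'"

    def process_result_value(self, value, dialect):
        # Inbound: the connector may hand back a buffer-like object;
        # normalise to plain bytes for the caller.
        if value is None:
            return None
        if isinstance(value, (bytes, bytearray, memoryview)):
            return bytes(value)
        return bytes.fromhex(value)
```

You'd then declare the column as `Column(InlineBinary)` and use `literal_binds` (or equivalent) so the value is rendered into the statement text rather than bound.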
We've marked this as a FutureFeature to be added to this dialect here. If DBR adds support for binding these parameter types, it would be very easy to implement as part of our repository. If not, then inlining these values could work, though it would not be strictly safe from SQL injection, since an arbitrary value would be rendered into the resulting SQL string.
That said, the risk here is limited, since hexadecimal strings cannot encode the dangerous SQL constructs: the hexadecimal charset doesn't include ; or enough consonants to spell DROP, UPDATE, CREATE, etc.
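The claim about the hexadecimal charset is easy to check mechanically:

```python
import string

# The full hexadecimal alphabet: digits 0-9 plus a-f / A-F.
hex_chars = set(string.hexdigits)

# Characters needed to terminate a literal or chain a second statement
# are absent from the hex alphabet...
assert hex_chars.isdisjoint({";", "'", '"'})

# ...and keywords like DROP can't even be spelled: R, O, and P are not
# hex digits (only the D survives).
assert not set("DROP").issubset(hex_chars)
```

So even when a hex string is rendered inline, the worst a malformed value can do is fail to parse as a binary literal.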