Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BINARY column type support #108

Open
bebaek opened this issue Apr 24, 2023 · 1 comment
Open

BINARY column type support #108

bebaek opened this issue Apr 24, 2023 · 1 comment
Labels
enhancement New feature or request sqlalchemy

Comments

@bebaek
Copy link

bebaek commented Apr 24, 2023

I have a question regarding accessing delta lake tables by python. Similar to this SQLAlchemy example, I tried to save something in a BINARY column like:

    example_bytes = Column(BINARY)  # also tried Binary, LargeBinary, ...

but that crashes with:

sqlalchemy.exc.StatementError: (builtins.AttributeError) module 'databricks.sql' has no attribute 'Binary'

I'm wondering if databricks-python-sql doesn't support binary column yet? Other basic types work fine. Also executing raw sql by the connector works fine. (I'm using hive metastore if this is relevant.)

@susodapop
Copy link
Contributor

Following up on this now that we've implemented SQLAlchemy 2 on main. We have a much better understanding of how SQLAlchemy handles types now and there are some limitations of Databricks itself that make this unfeasible for the moment. Namely, databricks-sql-connector doesn't have a way to bind parameter values of BINARY type. sqla needs every data type to bindable for parameterisation. Since DBR doesn't support binding BINARY values, our dialect doesn't either. This same limitation applies to ARRAY, MAP, and STRUCT types.

That said, where there is a will there is a way. While DBR doesn't support parameterisation of these types, you can write a string SQL query that includes them. DBR understands hexadecimal strings and can translate them to BINARY types during ingestion. And pysql will deliver them as io.Buffer's during egress.

So if this is necessary for your use-case, you can implement a workaround by using your own custom SQLAlchemy type class (which extends TypeDecorator). You'll need to implement the process_bind_param, process_literal_param, and process_result_value methods for this custom type in such a way that the ...param methods accept an io.Buffer and convert it to a hexadecimal string and render it inline with the query text.

We've marked this as a FutureFeature to be added to this dialect here. If DBR adds support for binding these parameter types then it would be very easy to implement as part of our repository. If not, then inlining these values could work. Although would not be strictly safe from SQL injection, since an arbitrary value would be rendered in the resulting SQL string.

Although this may not be as risky, since hexadecimal strings cannot include any of the dangerous SQL commands: the hexadecimal charset doesn't include ; or enough consonants to spell DROP, UPDATE, CREATE etc.

@susodapop susodapop added enhancement New feature or request sqlalchemy labels Oct 24, 2023
@susodapop susodapop changed the title Binary sqlalchemy column errors out BINARY column type support Oct 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request sqlalchemy
Projects
None yet
Development

No branches or pull requests

2 participants