Inserting Data into Databricks via the databricks-sql-python library (Leveraging SQLALCHEMY) #299
Comments
Thanks for the detailed write-up. There are a few things happening here.

First up, you cannot use [...].

Second, it looks like you're using the old SQLAlchemy 1.x syntax for your model code. The dialect included in databricks-sql-connector>=3.0.0 is built exclusively for SQLAlchemy 2.x. The 1.x syntax may work, but we can't guarantee it. You can see an example of the new syntax in our e2e tests.

Third, the actual exception is a syntax error: SQLAlchemy is writing an invalid SQL statement that omits the column names in the INSERT. I'm not clear how this is happening, as it's not something we observe in the 1000+ INSERT test cases that we run during development. I wonder if you may be using an SQLAlchemy version below 2.0.0. Which SQLAlchemy version do you have installed?
I tried reproducing this error locally with:
and am not able to make SQLAlchemy emit an INSERT statement that omits the column names. Can you provide a reproducible example? FWIW: I don't think the Excel file has any bearing on this behaviour. In my attempts to reproduce, I used both an Excel file as input and a randomly generated pandas dataframe; it worked as expected in both cases.
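The reproduction attempt described above can be sketched roughly as follows. The random dataframe stands in for the Excel data (with the real file you would call `pd.read_excel(...)` instead), and SQLite again substitutes for a live Databricks connection so the snippet runs on its own.

```python
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

# Randomly generated frame standing in for the Excel file's contents
df = pd.DataFrame(np.random.rand(5, 3), columns=["a", "b", "c"])

# Against Databricks, the engine would come from a databricks:// URL
engine = create_engine("sqlite://")
df.to_sql("demo", engine, index=False, if_exists="replace")

# Read the rows back to confirm the INSERT named its columns correctly
written = pd.read_sql("SELECT * FROM demo", engine)
print(len(written))
```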
Thanks @susodapop. I was running SQLAlchemy==1.4.49 and upgraded to 2.0.22, but I'm still facing the same issue. I think the problem is that the Excel file has no "id" column. I want SQLAlchemy to add the "Model" to Databricks, have Databricks + SQLAlchemy issue an id (primary key), and then return that id so I can use it as a foreign key in future tables. This works for a PostgreSQL database with the exact same data.
Thank you,
Just getting back to this after traveling. Can you please provide a runnable reproduction? I have attempted to reproduce the issue based on the code you provided, but the code works. I don't have access to your Excel file (and, as indicated above, I don't think the Excel file is really the problem). Without reproduction steps we're blocked on implementing a fix.
Hi @susodapop, here is runnable, reproducible code. You'll just need to add your Databricks placeholders and the location of the Excel file (attached).
Issue Description: Inserting Data into Databricks via the databricks-sql-python library (Leveraging SQLALCHEMY)
Error Message:
Overview:
I'm encountering an issue when attempting to insert records into a Databricks database using SQLAlchemy. The error suggests that the id column is not specified in the INSERT statement, leading to a ServerOperationError.
For what it's worth, this works perfectly fine when inserting into a PostgreSQL database.
Steps to Reproduce:
Expected Behavior:
I expect the records to be inserted successfully into the Databricks database, with the auto-incrementing id column being generated by the database.
Environment:
Python Version: 3.11.4
Databricks-sql-python: 3.0.1
I have verified that a similar approach works for a PostgreSQL database but fails in Databricks.
The issue seems to be related to the auto-incrementing primary key behavior.
Code Snippet:
Note:
I have also reached out to the Databricks community for assistance.
Thank you,
Brent