schema doesn't change from default #2
Comments
Nope, the workaround doesn't work. This is just broken and I have no idea how to fix it. |
Databricks connection URLs are a bit different from other database URLs, but what we do here is for adherence to the SQLAlchemy spec. Maybe the implementation is non-intuitive or confusing (or even wrong), but the schema value in the URL isn't applied as a default. Generally, if you want a specific schema (Databricks calls them databases) you have to pass it to methods explicitly, as in the sketch below. |
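For example, a minimal sketch of passing the schema to a reflection call (`analytics_layer` is a placeholder name, and `engine` is the Databricks engine from the issue below):

```python
from sqlalchemy import inspect

# reflection helpers accept the schema (what Databricks calls a database) explicitly
inspector = inspect(engine)
print(inspector.get_table_names(schema="analytics_layer"))
```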
We might actually want to add a way to handle this in the dialect itself. |
Can you try doing something like the sketch below as a workaround? |
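Perhaps something along these lines (a sketch only, assuming the suggestion was to set the schema on a MetaData object and reflect through it):

```python
from sqlalchemy import MetaData

# sketch: bind the schema once on the MetaData, then reflect through it
meta = MetaData(schema="analytics_layer")
meta.reflect(bind=engine, only=["table_in_database"])
```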
I'll try that. In the meantime, this is what I've done that's been working: in the MetaData object I can specify a schema, and then use that to tell SQLAlchemy to use the engine but override the schema. At least that's how I'm understanding it; I'm learning this aspect of SQLAlchemy just now. Before this I'd only created an engine and queried it directly. (I reopened this, if that's alright. It keeps it on my radar.)

```python
from sqlalchemy import Column, MetaData, String, Table
from sqlalchemy.orm import Session
from sqlalchemy.ext.automap import automap_base

# generate metadata and be able to tell it which schema to use
meta = MetaData(bind=engine, schema="analytics_layer")

# the fake primary key is only there because none of the tables in this schema
# have primary keys (I live in hell); rough and ready to get it to work.
# I think I can make a composite key instead.
table_of_interest = Table(
    "table_in_database",
    meta,
    Column("fake_primary_key_column", String, primary_key=True),
    extend_existing=True,
    autoload=True,
    autoload_with=engine,
)

# I can create a base with the metadata object
Base = automap_base(metadata=meta)

# reflect the tables from that metadata object, restricting the reflection
# options because this layer has hundreds of tables
Base.prepare(engine, reflect=True, reflection_options={"only": ["table_in_database"]})
Table_Of_Interest = Base.classes.table_in_database
```

Since Base is made with the schema, I believe that if I were to create tables on it they would go in the right schema. I'm doing read-only work so I haven't really tested that; I believe creating tables in this layer would get me promoted to customer.

This is an even quicker/dirtier method that I'm actually using, since the process I'm doing right now doesn't require anything too complicated by way of the ORM:

```python
from sqlalchemy import MetaData, Table
from sqlalchemy.orm import Session

# generate metadata and be able to tell it which schema to use
meta = MetaData(bind=engine, schema="analytics_layer")
Table_Of_Interest = Table("table_in_database", meta, autoload=True, autoload_with=engine)

session = Session(bind=engine)
query = session.query(Table_Of_Interest).distinct().limit(10)
```

edit: fix code blocks |
When I used pyhive underneath, this would have worked fine because pyhive runs a `USE <database>` statement when it opens a connection. Apparently the databricks sql connector doesn't do this, but since the code isn't public I can't request a change. I did ask if they have plans on making the code public. |
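If the connector won't issue that statement itself, one way to emulate pyhive from the outside is a connect-event hook. A sketch, assuming the DBAPI connection accepts a plain `USE` statement and `analytics_layer` is a placeholder schema:

```python
from sqlalchemy import event

# run USE on every new DBAPI connection, mimicking pyhive's behaviour
@event.listens_for(engine, "connect")
def _set_default_schema(dbapi_connection, connection_record):
    cursor = dbapi_connection.cursor()
    cursor.execute("USE analytics_layer")
    cursor.close()
```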
The code for the connector is actually included in the published package, so you can read it even though it isn't developed in the open. [edit] Also the code includes a … [edit2] Just for giggles I created https://github.com/susodapop/databricks-sql-connector-mirror which mirrors the codebase so you can examine it in github. [edit3] I set up the versions in github so you can also observe the change from one version to the next |
Thanks for mirroring it. I didn't have any trouble seeing the code, I just wasn't able to PR an upstream change since it's not maintained publicly. I reached out to DB though and they said it will be made public "eventually" so we'll see. |
We might be able to override this method though, which I just found: https://github.com/zzzeek/sqlalchemy/blob/fc5c54fcd4d868c2a4c7ac19668d72f506fe821e/lib/sqlalchemy/engine/interfaces.py#L533, or use the connect-time event hooks mentioned above. |
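A rough sketch of that idea, subclassing the dialect and overriding `on_connect` (the `DatabricksDialect` import path and the approach are assumptions, untested):

```python
from sqlalchemy_databricks import DatabricksDialect  # assumed import path

class SchemaDatabricksDialect(DatabricksDialect):
    def on_connect(self):
        # on_connect returns a callable invoked with each raw DBAPI connection
        def run_use(dbapi_connection):
            cursor = dbapi_connection.cursor()
            cursor.execute("USE analytics_layer")  # placeholder schema
            cursor.close()
        return run_use
```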
Also from the version history: … so I assume that when 2.0 is released we can pass the catalog + schema and it will set those on creation. |
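If so, connecting with the 2.x connector directly would presumably look something like this (a sketch; the `catalog` and `schema` parameter names are an assumption based on the history above):

```python
from databricks import sql

# sketch: catalog and schema set at connection creation time
connection = sql.connect(
    server_hostname=host,
    http_path=http_path,
    access_token=token,
    catalog="hive_metastore",   # assumed parameter, placeholder value
    schema="analytics_layer",   # assumed parameter, placeholder value
)
```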
Yes, I believe so. I'm developing the same behaviour in an unrelated context using the v2 beta and it works like you describe. Prior to the v2 release of the connector there was no way to set these at connection creation. |
Just figured out the `USE <schema>` statement before finding this thread. And for Unity Catalog, I'm doing `USE CATALOG <catalog>` as well. Note: if you're not using plain SQL but the SQLAlchemy helpers, you can also pass the schema name as a named argument, as in the sketch below. (EDIT: which @crflynn already referred to) |
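For instance, with a core `Table` the schema can be passed straight to the helper (a sketch with placeholder names):

```python
from sqlalchemy import MetaData, Table

meta = MetaData()

# the schema goes on the Table itself; no USE statement needed
table = Table(
    "table_in_database",
    meta,
    schema="analytics_layer",
    autoload_with=engine,
)
```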
@susodapop @crflynn hi, in this version: … According to Databricks, they are planning to open-source the python-sql-connector lib.
@susodapop @crflynn I don't know if you noticed, but python-sql-connector 2.0.1 is out |
Sorry if this is a really dumb/misguided question; if it is, I'm hoping you could point me to a place where I can learn more.
I set up an engine like

```python
from sqlalchemy import create_engine

engine = create_engine(
    f"databricks+connector://token:{token}@{host}:443/{my_schema}",
    connect_args={"http_path": http_path},
)
```

but

```python
engine.table_names()
```

and anything else I try to do with that engine returns the default schema's tables. I have to work around it by doing a schema translation, but that can't be the right way to do this, or is it?

```python
engine1 = engine.execution_options(schema_translate_map={"default": my_schema})
```
edit: whoops put some private data in there