
schema doesn't change from default #2

Closed
jonathan-dufault-kr opened this issue Jan 26, 2022 · 14 comments · Fixed by #4

Comments

@jonathan-dufault-kr

jonathan-dufault-kr commented Jan 26, 2022

Sorry if this is a really dumb/misguided question; if it is, I'm hoping you could point me to a place where I can learn more.

I set up an engine like

engine = create_engine( f"databricks+connector://token:{token}@{host}:443/{my_schema}", connect_args={"http_path": http_path} )

but engine.table_names() and anything else I try to do with that engine returns tables from the default schema.

I have to work around it by doing a schema translation, but that can't be the right way to do this, or is it?

engine1 = engine.execution_options(schema_translate_map={ "default": my_schema })

edit: whoops put some private data in there

@jonathan-dufault-kr
Author

Nope, the workaround doesn't work. This is just broken and I have no idea how to fix it.

@crflynn
Owner

crflynn commented Jan 26, 2022

Databricks connection URLs are a bit different from other database URLs, but what we do here is for adherence to the SQLAlchemy spec. Maybe the implementation is non-intuitive or confusing (or even wrong), but the value for the <database> is ignored by the connector, which only cares about the http_path and the token.

Generally, if you want a specific schema (Databricks calls them databases) you have to pass it to methods explicitly, like engine.table_names(schema="myschema").
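The same schema argument also works with SQLAlchemy's inspector (a minimal sketch; "myschema" is a placeholder name):

from sqlalchemy import inspect

# list tables from an explicit, non-default schema
inspector = inspect(engine)
print(inspector.get_table_names(schema="myschema"))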

@crflynn
Owner

crflynn commented Jan 26, 2022

We might actually want to add a USE SCHEMA <schema> call here on creating the connection. It might be that the sql connector doesn't do this, but the connector is not open source so I can't request a change there. I will take a deeper look.
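In the meantime, something like this might approximate it from the user side (an untested sketch using a standard SQLAlchemy connect event; whether the Databricks cursor accepts a bare USE statement here is an assumption, and "myschema" is a placeholder):

from sqlalchemy import event

@event.listens_for(engine, "connect")
def set_default_schema(dbapi_connection, connection_record):
    # runs on every new DBAPI connection the pool opens
    cursor = dbapi_connection.cursor()
    cursor.execute("USE myschema")
    cursor.close()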

@crflynn
Owner

crflynn commented Jan 26, 2022

Can you try doing something like

engine.execute("use myschema")

as a workaround?

@jonathan-dufault-kr
Author

jonathan-dufault-kr commented Jan 27, 2022

I'll try that. In the meantime, this is what I've done that's been working: in the MetaData object I can specify a schema and then use that to tell SQLAlchemy to use the engine but override the schema.

I reopened this if that's alright. It keeps it on my radar.

That's how I'm understanding it. I'm only now learning this aspect of SQLAlchemy. Before, I just created an engine and did pd.read_sql(text(sql_query_text), engine) over an ODBC connection. With the switch to Databricks I'm trying to learn the SQLAlchemy/Pythonic way of doing things, so you're getting a double-whammy.

from sqlalchemy import MetaData
from sqlalchemy import Table, Column, String
from sqlalchemy.orm import Session
from sqlalchemy.ext.automap import automap_base


# generate metadata and be able to tell which schema to use
meta = MetaData(bind=engine, schema="analytics_layer")

# the fake primary key below is just because none of the tables in this schema have primary keys (I live in hell);
# rough and ready to get it to work. I think I can make a composite key
table_of_interest = Table(
    "table_in_database",
    meta,
    Column("fake_primary_key_column", String, primary_key=True),
    extend_existing=True,
    autoload=True,
    autoload_with=engine,
)

# I can create a base with the metadata object
Base = automap_base(metadata=meta)


# reflecting the tables from that metadata object, specifying the reflection options
# because this layer has hundreds of tables
Base.prepare(engine, reflect=True, reflection_options={"only": ["table_in_database"]})
Table_Of_Interest = Base.classes.table_in_database

# Since Base is made with the schema, I believe if I were to create tables on it then they would go in the right schema.
# I am doing read-only so I haven't really tested that. I believe creating tables in this layer
# would get me promoted to customer.

This is an even quicker/dirtier method that I'm actually using, since the process I'm doing right now doesn't require anything too complicated by way of the ORM.

# generate metadata and be able to tell which schema to use
meta = MetaData(bind=engine, schema="analytics_layer")

Table_Of_Interest = Table("table_in_database", meta, autoload=True, autoload_with=engine)
session = Session(bind=engine)

query = session.query(Table_Of_Interest).distinct().limit(10)
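And since I usually just want a DataFrame anyway, I can hand that query straight to pandas (assuming pandas is available; query.statement gives the underlying select):

import pandas as pd

# pull the limited result set into a DataFrame
df = pd.read_sql(query.statement, engine)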

edit: fix code blocks
edits 2 and 3: still fixing code blocks

@crflynn
Owner

crflynn commented Jan 27, 2022

When this dialect used pyhive underneath, this would have worked fine because pyhive issues a USE <schema> call on connection here: https://github.com/dropbox/PyHive/blob/d199a1bd55c656b5c28d0d62f2d3f2e6c9a82a54/pyhive/hive.py#L256

Apparently the Databricks SQL connector doesn't do this, but since the code isn't public I can't request a change. I did ask if they have plans to make the code public.

@susodapop
Contributor

susodapop commented Mar 16, 2022

The code for databricks-sql-connector isn't hosted on a public Git repository, but it's still Apache licensed (open source) and accessible because it's hosted on PyPI. If you pip install databricks-sql-connector you can see all of its source code in your site-packages directory.

[edit] Also, the code includes a HISTORY file. You can download the .tar.gz file from here and unzip it.

[edit2] Just for giggles I created https://github.com/susodapop/databricks-sql-connector-mirror which mirrors the codebase so you can examine it on GitHub.

[edit3] I set up the versions in GitHub so you can also observe the changes from one version to the next.

@crflynn
Owner

crflynn commented Mar 16, 2022

Thanks for mirroring it. I didn't have any trouble seeing the code, I just wasn't able to PR an upstream change since it's not maintained publicly. I reached out to DB though and they said it will be made public "eventually" so we'll see.


@crflynn
Owner

crflynn commented Mar 16, 2022

Also from history:

v2.0.0b - February 8, 2022
- Initial beta release of V2. V2 is an internal re-write of large parts of the connector to use Databricks edge features. All public APIs from V1 remain.
- Added Unity Catalog support (pass catalog and / or  schema key word args to the .connect method to select initial schema and catalog)

So I assume that when 2.0 is released we can pass the catalog and schema and it will set those on connection creation.
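If the dialect just forwards connect_args to the connector's .connect(), that would presumably look something like this (untested sketch against the 2.0 beta; the catalog/schema values are placeholders and the pass-through is an assumption):

from sqlalchemy import create_engine

engine = create_engine(
    f"databricks+connector://token:{token}@{host}:443/default",
    connect_args={
        "http_path": http_path,
        "catalog": "my_catalog",  # new keyword per the 2.0 changelog above
        "schema": "my_schema",
    },
)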

@susodapop
Contributor

susodapop commented Mar 17, 2022

Yes, I believe so. I'm developing the same behaviour in an unrelated context using the v2 beta and it works like you describe. Prior to the v2 release of databricks-sql-connector I've been automatically prepending USE {schema} to every call, which is inelegant but has done the trick.
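For reference, that prepend pattern on a raw databricks-sql-connector connection looks roughly like this (a sketch; host, http_path, token and the schema name are placeholders):

from databricks import sql

connection = sql.connect(server_hostname=host, http_path=http_path, access_token=token)

def run_query(query, schema="my_schema"):
    cursor = connection.cursor()
    try:
        # inelegant but effective: select the schema before every statement
        cursor.execute(f"USE {schema}")
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        cursor.close()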

@ebarault

ebarault commented Apr 1, 2022

Just figured this out before finding this thread. I'm also doing USE {schema}.

And for Unity Catalog, I'm doing USE CATALOG xxx.

Note: if you're not using plain SQL but using SQLAlchemy helpers, you can also pass the schema name as a named argument. (EDIT: which @crflynn already referred to)

@ebarault

ebarault commented Apr 7, 2022

@susodapop @crflynn hi,
pip install databricks-sql-connector==2.0.0b1

In this version it:

  • honors the value passed in for database/schema 👍
  • honors the new 3-level namespace used in Unity Catalog 👍
  • allows specifying the catalog 👎

According to Databricks, they are planning to open-source the databricks-sql-connector lib.

@ebarault

ebarault commented Apr 27, 2022

@susodapop @crflynn I don't know if you noticed, but databricks-sql-connector 2.0.1 is out.
