Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrating DuckDB #162

Closed
RAbraham opened this issue Aug 7, 2021 · 5 comments
Closed

Integrating DuckDB #162

RAbraham opened this issue Aug 7, 2021 · 5 comments

Comments

@RAbraham
Copy link
Contributor

RAbraham commented Aug 7, 2021

Hi,
I'm interested in using DuckDB(https://duckdb.org/) as a data source for Logica and I was wondering what would be the steps to integrate it. It's like SQLite for analytics and is a subset of the PostgreSQL syntax.

We could use sqlalchemy but 1) I don't know how to proceed after and 2) I think the ORM design of sqlalchemy may make it slow if we are returning a lot of data(just an hypothesis based on duckdb/duckdb#305)

So the question is two fold.

  1. If we use the sqlalchemy interface, the setup code would be something like this.
# Install Logica.
!pip install logica
!pip install duckdb
!pip install duckdb-engine



# Connect to the database.
from logica import colab_logica
from sqlalchemy import create_engine

engine = create_engine('duckdb:///:memory:');
connection = engine.connect();
colab_logica.SetDbConnection(connection)

but how do I use it after that?

  1. If I don't want to use the sqlachemy interface, how would I go about doing that? The general usage is:
import duckdb
con = duckdb.connect(database=':memory:', read_only=False) #can also use a file like sqlite
# create a table
con.execute("CREATE TABLE items(item VARCHAR, value DECIMAL(10,2), count INTEGER)")
# insert two items into the table
con.execute("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)")

# retrieve the items again
con.execute("SELECT * FROM items")
print(con.fetchall())
# [('jeans', 20.0, 1), ('hammer', 42.2, 2)]
@EvgSkv
Copy link
Owner

EvgSkv commented Aug 9, 2021

For now all of the engines are hard-coded, so do it properly we'll need to change Logica's code. It shouldn't be too hard, as code is reasonably modularized.

We'd need to add dialect class for duckdb here:
https://github.com/EvgSkv/logica/blob/main/compiler/dialects.py

Add a library here:
https://github.com/EvgSkv/logica/tree/main/compiler/dialect_libraries

And to make colab call it properly handle it here:
https://github.com/EvgSkv/logica/blob/main/colab_logica.py#L139

If you want to try doing it in your colab you can probably just use Python freedom to assign appropriate classes/dictionary members.

You can also run PostgreSQL-dialect queries on it. E.g. in my colab the following cell runs after you set connection to DuckDB with SQLAlchemy:

%%logica Q

@Engine("psql");

Q(x, x * x) :- x == 1 | x == 2 | x == 3;

However more complex quries fail, as it appears DuckDB uses it's own syntax for arrays and structs.

Let me know if you have further questions, or if you're interested contributing DuckDB support to Logica! :-)

@RAbraham
Copy link
Contributor Author

RAbraham commented Aug 9, 2021

Thanks for that info :). I had some follow up questions.

  • Can I avoid using sqlalchemy and directly interface logica with duckdb? I think there are some performance constraints if I use sqlalchemy when a large number of rows are returned.
  • As complex queries fail as DuckDB uses it's own syntax for arrays and structs, is this something I can fix and adapt in the dialects.py or some other file or will it be a hard limitation for duckdb due to logica's design?

@EvgSkv
Copy link
Owner

EvgSkv commented Aug 12, 2021

Can I avoid using sqlalchemy and directly interface logica with duckdb? I think there are some performance constraints if I use sqlalchemy when a large number of rows are returned.

Yes, we can use any Python API.

As complex queries fail as DuckDB uses it's own syntax for arrays and structs, is this something I can fix and adapt in the dialects.py or some other file or will it be a hard limitation for duckdb due to logica's design?

I believe it can all be solved in dialects.py. Maybe we will need to factor out more flexibility into dialects.py, I'll be happy to help if it comes to that.

@RAbraham
Copy link
Contributor Author

that's great :). I'll come back to this when I get some bandwidth but we can close this for now.

@EvgSkv
Copy link
Owner

EvgSkv commented Aug 13, 2021

Sounds good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants