
A minimally working version of dbt-duckdb that uses sqlglot for transpiling #22


Closed
wants to merge 1 commit into from

Conversation

jwills
Collaborator

@jwills jwills commented Oct 13, 2022

This allows a user to use dbt-duckdb with a dbt project that was written for another database; there is more testing I need to do to verify that this works (e.g. I don't think it handles ephemeral and/or incremental materializations correctly yet.)

… to transpile SQL from another database to DuckDB
@Bl3f

Bl3f commented May 11, 2023

Hey @jwills, do you still think it's a good idea to introduce this in the adapter? I've forked the code and tried it, and it works (with just a few changes). I'd love to contribute with a PR, but to be honest there isn't much additional stuff right now, so I don't want to create something useless.

Tho, I still have to try it with incremental/ephemeral.

@jwills
Collaborator Author

jwills commented May 11, 2023

Hey @Bl3f , thanks for the ping! I am still deeply curious about the pros/cons of doing this kind of thing-- in particular, do we have a good mechanism to detect when a sqlglot translation doesn't actually work and are able to provide some details on how to go about fixing it via e.g. a cross-platform macro?

Like I am open to it as a really compelling mechanism for doing e.g. fast unit and integration tests of dbt projects w/o needing to invoke the full warehouse, but I was a little wary about providing a sub-par developer experience that would lead to a lot of support requests/bugs for me (as a solo and otherwise unemployed open-source developer) to triage and fix.

@jwills
Collaborator Author

jwills commented May 15, 2023

hrm, I'm wondering if the advent of https://github.com/duckdb/duckdb/pull/7171/files in the next DuckDB release means we can go for it here 🤔

@Bl3f

Bl3f commented May 20, 2023

Hey @jwills, sorry I've been late to come back to this 🙈.

I agree with your concerns about implementing the translation like this in the adapter. It sits really deep in the code, and the if/else on the emulate param is critical, which could lead users to believe it's the adapter's responsibility to translate everything correctly.

On the other hand, integrated like this it looks magical from a user's point of view: a single parameter lets you run a BigQuery / Snowflake project anywhere. On my side I'm also working on some kind of larger standard to make that real because, if we're honest, to make it work correctly you need all your dbt sources in the DuckDB database, which in the end implies more than just a parameter.

Regarding sqlglot translation, it can lead to two kinds of errors, I guess:

  • Errors because sqlglot can't translate
  • Errors because the sqlglot translation did something unexpected
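A minimal sketch of how a test harness might tell these two failure modes apart. Everything here is hypothetical and not part of dbt-duckdb or sqlglot: `check_translation`, `TranslationCheck`, and the `translate` / `run_on_duckdb` callables are illustrative names, and the caller is assumed to supply the expected result for each query.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TranslationCheck:
    # One of "ok", "translation_failed", "unexpected_result" (hypothetical labels).
    status: str
    detail: Optional[str] = None


def check_translation(
    sql: str,
    translate: Callable[[str], str],
    run_on_duckdb: Callable[[str], object],
    expected: object,
) -> TranslationCheck:
    """Classify a translation attempt into one of the two error kinds."""
    # Kind 1: the translator cannot handle the input and raises.
    try:
        translated = translate(sql)
    except Exception as exc:
        return TranslationCheck("translation_failed", str(exc))

    # Kind 2: translation "succeeds", but the translated query either
    # fails to run or returns something other than what was expected.
    try:
        actual = run_on_duckdb(translated)
    except Exception as exc:
        return TranslationCheck("unexpected_result", f"query failed: {exc}")
    if actual != expected:
        return TranslationCheck(
            "unexpected_result", f"expected {expected!r}, got {actual!r}"
        )
    return TranslationCheck("ok")
```

In practice `translate` could wrap sqlglot and `run_on_duckdb` could wrap a DuckDB connection; the second kind of error is only detectable if a known-good expected result exists for the query.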

Regarding the scalar UDFs, I'd say this is good news, but in the end, apart from giving users the possibility to add UDFs to fix issues on the fly, I think, as you said, you don't want to open Pandora's box and take on responsibility for it.

As I write this: the solution might be for the configuration to ask not for a dialect but for a Python function to run as the translation, with sqlglot as the default. That way it's clear to everyone that translation is not the adapter's responsibility?
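A rough sketch of what that configuration idea could look like, assuming the user supplies a `"module:function"` string. The names `load_translator` and `sqlglot_translator` and the spec format are assumptions for illustration, not anything the adapter actually provides, and the Snowflake source dialect is only an example.

```python
import importlib
from typing import Callable, Optional

Translator = Callable[[str], str]


def sqlglot_translator(sql: str) -> str:
    """Default translator: delegate to sqlglot (imported lazily)."""
    import sqlglot  # only required when the default is actually used

    # "snowflake" is an illustrative source dialect; only the first
    # statement is returned, so multi-statement SQL would need more care.
    return sqlglot.transpile(sql, read="snowflake", write="duckdb")[0]


def load_translator(spec: Optional[str] = None) -> Translator:
    """Resolve a user-supplied "module:function" spec to a callable.

    With no spec, fall back to the sqlglot-based default.
    """
    if spec is None:
        return sqlglot_translator
    module_name, _, attr = spec.partition(":")
    if not module_name or not attr:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    func = getattr(importlib.import_module(module_name), attr)
    if not callable(func):
        raise TypeError(f"{spec!r} does not refer to a callable")
    return func
```

Because the translation function belongs to the user, a bad translation can be fixed by swapping in or wrapping a different function rather than by patching the adapter.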

@tekumara

tekumara commented Jun 4, 2023

FYI I've been chipping away at converting snowflake sql to duckdb over here using sqlglot. See also the tests. I've found a bunch of edge cases so far, and there's probably more I'm yet to find.

@dioptre

dioptre commented Jun 14, 2023

Can we merge this? If "emulate" is not set, it seems pretty innocuous?

@jwills
Collaborator Author

jwills commented Jun 15, 2023

Hey @dioptre I have an updated version of this work in this branch: https://github.com/jwills/dbt-duckdb/tree/jwills_snowflake_compat

I've been hesitant to merge ^^ as I'm unsure how well it would work in the general case and I don't have a plan (yet) for how I would allow someone to work around bugs/limitations by overriding e.g. the dialect that sqlglot uses for the read-side/write-side.

@davidgasquez

What is the latest state of this? Just out of curiosity. Was looking if someone had tried doing this and here I am! 😅

@jwills
Collaborator Author

jwills commented Feb 23, 2024

This is the best effort I am aware of to do this seriously (i.e., create a Snowflake emulator on DuckDB using sqlglot + a bunch of other stuff): https://github.com/tekumara/fakesnow

I am periodically tempted to work on this stuff again, though I go back and forth on whether sqlglot would actually help in solving the problem: e.g., I have gotten https://github.com/dpguthrie/snowflake-dbt-demo-project to run against dbt-duckdb without using sqlglot at all, instead defining the missing data types and SQL functions that Snowflake's SQL dialect uses (and that sqlglot doesn't support translating AFAICT.)

The rub is that doing this kind of emulation well would involve a level of effort and a support burden that I can't realistically take on as a solo developer; it's also not the sort of thing that I would ever find "fun" to do (like, I'm not sure there is an amount of money you could pay me to get me to work on it.)


5 participants