A minimally working version of dbt-duckdb that uses sqlglot for transpiling #22
… to transpile SQL from another database to DuckDB
Hey @jwills, do you still think it is a good idea to introduce this in the adapter? I've forked the code and tried it, and it works (there are just a few changes). I'd love to contribute with a PR, but to be honest there isn't a lot of additional stuff right now, so I don't want to create something useless. Though I still have to try it with incremental/ephemeral.
Hey @Bl3f, thanks for the ping! I am still deeply curious about the pros/cons of doing this kind of thing. In particular, do we have a good mechanism to detect when a sqlglot translation doesn't actually work, and are we able to provide some details on how to go about fixing it via e.g. a cross-platform macro? I am open to it as a really compelling mechanism for doing e.g. fast unit and integration tests of dbt projects w/o needing to invoke the full warehouse, but I was a little wary of providing a sub-par developer experience that would lead to a lot of support requests/bugs for me (as a solo and otherwise unemployed open-source developer) to triage and fix.
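To make the "detect when a translation doesn't actually work" question concrete, here is a minimal sketch of one possible shape for it. Everything in it is an assumption for illustration: `TranslationResult` and `try_translate` are hypothetical names, not part of dbt-duckdb or sqlglot, and the translator is just any callable that raises when it cannot handle the SQL.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TranslationResult:
    """Outcome of one translation attempt (hypothetical helper type)."""
    original_sql: str
    translated_sql: Optional[str]
    error: Optional[str]

    @property
    def ok(self) -> bool:
        return self.error is None


def try_translate(sql: str, translate: Callable[[str], str]) -> TranslationResult:
    """Run `translate` and capture any failure alongside the original SQL."""
    try:
        return TranslationResult(sql, translate(sql), None)
    except Exception as exc:  # the translator is arbitrary, so catch broadly
        hint = "consider overriding this logic with a cross-platform macro"
        return TranslationResult(sql, None, f"{type(exc).__name__}: {exc} ({hint})")
```

This only catches translations that fail loudly. A translation that succeeds but produces SQL with different semantics on DuckDB would still need to be caught by actually running the model or its tests.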
hrm, I'm wondering if the advent of https://github.com/duckdb/duckdb/pull/7171/files in the next DuckDB release means we can go for it here 🤔
Hey @jwills, sorry I've been late coming back to this 🙈. I agree with your concerns about implementing the translation like this in the adapter. It sits really deep in the code, and the if/else on the emulate param is critical, which could give users the impression that the adapter is responsible for translating everything correctly. On the other hand, integrated like this it looks magical from a user's point of view: a single parameter lets you run a BigQuery / Snowflake project anywhere. On my side I'm also working on some kind of larger standard to make it real, because if we are honest, making it work correctly requires having all your dbt sources in the DuckDB database, which in the end implies more than just a parameter. Regarding the sqlglot translation, it can lead to two kinds of errors, I guess:
Regarding the scalar UDFs, I'd say this is good news, but in the end, apart from giving users the possibility to add UDFs to fix issues on the fly, I think that, as you said, you don't want to open Pandora's box and take responsibility for it. As I write this, it occurs to me that the solution might be for the configuration to ask not for a dialect but for a Python function to run as the translation, with people using sqlglot by default. That would make it clear to everyone that translation is not the adapter's responsibility?
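A rough sketch of what "ask for a Python function instead of a dialect" could look like. The names `Translator`, `make_sqlglot_translator`, and `resolve_translator` are hypothetical and do not exist in dbt-duckdb; the only real library call is `sqlglot.transpile(sql, read=..., write=...)`, which returns a list with one SQL string per statement.

```python
from typing import Callable, Optional

# Hypothetical contract: SQL for the source warehouse in, DuckDB SQL out.
Translator = Callable[[str], str]


def make_sqlglot_translator(read_dialect: str) -> Translator:
    """Default translator backed by sqlglot, imported lazily."""
    def translate(sql: str) -> str:
        import sqlglot  # only required when the default translator is used
        statements = sqlglot.transpile(sql, read=read_dialect, write="duckdb")
        return ";\n".join(statements)
    return translate


def resolve_translator(custom: Optional[Translator],
                       emulate: Optional[str]) -> Translator:
    """Pick the translator: user function first, then sqlglot, else no-op."""
    if custom is not None:
        return custom  # the user owns the translation logic
    if emulate:
        return make_sqlglot_translator(emulate)
    return lambda sql: sql  # nothing configured: pass SQL through unchanged
```

With this shape, someone hitting a sqlglot limitation could wrap or replace the default, e.g. `resolve_translator(my_fixups, "snowflake")`, where `my_fixups` is their own function, rather than filing the bug against the adapter.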
Can we merge this? If "emulate" is not set, it seems pretty innocuous?
Hey @dioptre, I have an updated version of this work in this branch: https://github.com/jwills/dbt-duckdb/tree/jwills_snowflake_compat I've been hesitant to merge it, as I'm unsure how well it would work in the general case and I don't have a plan (yet) for how I would allow someone to work around bugs/limitations by overriding e.g. the dialect that sqlglot uses for the read-side/write-side.
What is the latest state of this? Just out of curiosity. Was looking to see if someone had tried doing this and here I am! 😅
This is the best effort I am aware of to do this seriously (i.e., create a Snowflake emulator on DuckDB using sqlglot + a bunch of other stuff): https://github.com/tekumara/fakesnow I am periodically tempted to work on this stuff again, though I go back and forth on whether sqlglot would actually help in solving the problem: e.g., I have gotten https://github.com/dpguthrie/snowflake-dbt-demo-project to run against dbt-duckdb without using sqlglot at all, instead defining the missing data types and SQL functions that Snowflake's SQL dialect uses (and that sqlglot doesn't support translating AFAICT). The rub is that doing this kind of emulation well would involve a level of effort and a support burden that I can't realistically take on as a solo developer; it's also not the sort of thing that I would ever find "fun" to do (like, I'm not sure there is an amount of money you could pay me to get me to work on it).
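For readers wondering what "defining the missing SQL functions" might look like in practice, here is a small sketch using DuckDB's `CREATE MACRO`. The two shims (`zeroifnull`, `nullifzero`) are illustrative picks of Snowflake functions and are not taken from the actual setup described above; `SHIMS`, `shim_statements`, and `install_shims` are hypothetical names, and whether a given DuckDB version already ships any of these functions is not checked here.

```python
from typing import Dict, List

# Illustrative Snowflake-style functions written as DuckDB macro bodies.
# These are assumptions for the sketch, not the list used in the thread.
SHIMS: Dict[str, str] = {
    "zeroifnull(x)": "coalesce(x, 0)",
    "nullifzero(x)": "nullif(x, 0)",
}


def shim_statements(shims: Dict[str, str] = SHIMS) -> List[str]:
    """Build one CREATE MACRO statement per shim."""
    return [f"CREATE OR REPLACE MACRO {sig} AS {body}"
            for sig, body in shims.items()]


def install_shims(conn, shims: Dict[str, str] = SHIMS) -> None:
    """Run the statements on any object with an execute(sql) method."""
    for stmt in shim_statements(shims):
        conn.execute(stmt)
```

Assuming the `duckdb` Python package is installed, `install_shims(duckdb.connect())` would register the macros on a fresh in-memory connection; in a dbt project the same statements could presumably be run from an `on-run-start` hook instead.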
This allows a user to use dbt-duckdb with a dbt project that was written for another database; there is more testing I need to do to verify that this works (e.g. I don't think it handles ephemeral and/or incremental materializations correctly yet.)