draft: allow users to augment their jinja context with python objects #5274
This is a bit of a proposal and proof of concept at the same time. It gives people a handle on their Jinja context, allowing them to inject Python objects into it under an `extra_jinja_context` namespace.

why?

Writing Jinja is OK when you're writing a lot of SQL with a bit of logic in it, but highly suboptimal when writing complex logic with a bit of string output. Clearly Python is superior to Jinja in many ways:

* access to the full Python standard lib
* access to external/powerful libs
* Turing complete, object oriented
* testable using sane mechanisms

use cases

Use cases are limitless. Bind, hook, trigger, generate SQL. In order of legitimacy:

* string processing, what you'd typically use Jinja macros for, but maybe you prefer Python functions over Jinja macros. Our main use case is creating a `generate_incremental_load_date_bounds()` where we'll look at `vars` to offer different loading modes (catchup, date range, date list, offsets, ...)
* custom logging similar to [this](https://github.com/dbt-labs/dbt-event-logging) but much more flexible; we happen to use BigQuery, and this particular approach doesn't work for BigQuery
* custom logic for hooks
* trigger external things (webhooks!)
* ...

need more thinking / conversation

* I'm introducing `dbt_config.py`, a new place where Python logic can be injected into dbt-core. It can be powerful, but can lead to environment issues / complexity. People need to not abuse this file. No `import pandas as pd` in there please! Where should this live? `~/.dbt/`?
* Seems like it'd be great to do this at the project level too, so a project can be packaged with the extra Jinja context it needs to operate. `macros/my_macros.py` anyone!? Conceptually that makes a dbt project also a Python app in some ways, and that may not be ideal.
* interoperability: people sharing projects may need to also share `dbt_config.py` and put it in their PYTHONPATH. It's not that outrageous, but it raises the complexity of the project / env setup. This kind of complexity already exists with stuff like `profiles.yml`.
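To make the main use case above more concrete, here is a rough sketch of what a `generate_incremental_load_date_bounds()` helper might look like as a plain Python function. The PR does not show the actual implementation, so everything here is an assumption for illustration: the var names (`load_mode`, `start_date`, `end_date`, `dates`, `last_loaded_date`, `offset_days`), the `today` parameter, and the choice to return ISO date strings.

```python
from datetime import date, timedelta


def generate_incremental_load_date_bounds(vars, today=None):
    """Return (start, end) ISO date strings for an incremental load.

    `vars` is a plain dict standing in for dbt's project vars.
    All keys read below are hypothetical names, not dbt conventions.
    """
    today = today or date.today()
    mode = vars.get("load_mode", "offset")

    if mode == "date_range":
        # Explicit bounds supplied by the caller.
        start = date.fromisoformat(vars["start_date"])
        end = date.fromisoformat(vars["end_date"])
    elif mode == "date_list":
        # Cover the span of an explicit list of dates.
        days = sorted(date.fromisoformat(d) for d in vars["dates"])
        start, end = days[0], days[-1]
    elif mode == "catchup":
        # Resume from the day after the last successful load.
        start = date.fromisoformat(vars["last_loaded_date"]) + timedelta(days=1)
        end = today
    elif mode == "offset":
        # Reload a trailing window of N days up to today.
        start = today - timedelta(days=int(vars.get("offset_days", 1)))
        end = today
    else:
        raise ValueError(f"unknown load_mode: {mode!r}")

    if start > end:
        raise ValueError(f"start {start} is after end {end}")
    return start.isoformat(), end.isoformat()
```

Because it is an ordinary function, it can be unit tested without rendering any Jinja, which is the "testable using sane mechanisms" point made above.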
Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA. In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above. If you have questions about the CLA, or if you believe you've received this message in error, don't hesitate to ping @drewbanin. CLA has not been signed by users: @mistercrunch
@mistercrunch Thanks for the PR, and for the thoughtful accompanying writeup! I like this idea better than giving people the ability to write custom Jinja filters, or hooking into other Jinja-specific "extensions." There's a very old issue proposing just that: #480. I find myself agreeing with the two most-recent comments there:

This feels like a case for plugins! I'll leave some more detailed thoughts below. I'm indebted to @jwills @gshank @nathaniel-may for talking through this with me yesterday. Maybe this PR actually wants to be a discussion?

Why not?

This is the name of the game. The limitation of Jinja is one of the best things about it :) At definite risk of mixing my mythological metaphors, custom Python code within the dbt project environment opens a Pandora's box of anti-pattern practices, and books us a one-way passage over the river [

One other quick note about Python models: It will of course be possible to write and use

So what then?

Here's my take, with which you're more than welcome to disagree: I think what you're describing is actually a dbt plugin, written in Python, which has the ability to register and expose some of its methods in the user's Jinja context. That plugin would be installed alongside

We have a pattern for that today, in the form of adapter plugins. We've understood for a long time that the work of translating between dbt <> Another Analytical Database is a much bigger lift than translating business/transformation logic into Jinja-SQL. Adapter authors need the ability to write and test their functionality in a real programming language, and in many cases to access data platform APIs that simply aren't exposed via SQL. (BigQuery is the most prominent example.)

It's also a recognition that the "adapter maintainer" persona is a very different one from "dbt user" / project code writer. There are many fewer adapter maintainers; we (as

Concretely, we allow adapter maintainers to write whatever Python code they need, and to register some of them as class methods and members on the

How could this be better?

Down the line, I could see a generalized pattern for supporting dbt plugins that don't actually want to be database adapters. They could register custom namespaces, instead of over-loading the

In the meantime: What do you think of forking of
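As a purely illustrative sketch of the "plugins register custom namespaces" idea described in the comment above (this is not dbt's adapter API, and the names `PluginRegistry`, `expose`, `build_context`, and `event_log` are all hypothetical), a registry could collect functions per namespace and merge them into a rendering context like so:

```python
from collections import defaultdict


class PluginRegistry:
    """Collects plugin functions under named namespaces (hypothetical design)."""

    def __init__(self):
        self._namespaces = defaultdict(dict)

    def expose(self, namespace):
        """Decorator: register a function under the given namespace."""
        def decorator(func):
            self._namespaces[namespace][func.__name__] = func
            return func
        return decorator

    def build_context(self, base_context):
        """Return a copy of base_context with one key per plugin namespace."""
        context = dict(base_context)
        for namespace, functions in self._namespaces.items():
            if namespace in context:
                # Refuse to shadow a built-in context member.
                raise ValueError(f"namespace {namespace!r} collides with an existing context key")
            context[namespace] = dict(functions)
        return context


registry = PluginRegistry()


@registry.expose("event_log")
def format_event(model_name, status):
    return f"{model_name}: {status}"


# "adapter" stands in for whatever the base context already contains.
context = registry.build_context({"adapter": object()})
message = context["event_log"]["format_event"]("orders", "success")
```

Keeping each plugin in its own namespace, and failing loudly on a name collision, is one way to avoid overloading an existing context member.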
I'm going to close this PR since we're not going to merge these changes. Please continue the conversation if you're so inclined.
Hey @mistercrunch - did you ever continue with something like this? I just proposed a similar thing (#8000) and agree that it would be very useful to have this sort of flexibility within dbt.
how?

Assuming a file `~/.dbt/dbt_config.py`

assuming a model `test.sql`
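The original snippets for these two files did not survive on this page, so the following is only a minimal sketch of how the mechanism could work, not the PR's actual code. The `EXTRA_JINJA_CONTEXT` dict, the `qualified_name` helper, and the `augment_context` function are all assumed names; only the `extra_jinja_context` namespace comes from the description.

```python
# --- Hypothetical contents of ~/.dbt/dbt_config.py ---

def qualified_name(dataset, table):
    """Build a dataset-qualified table name."""
    return f"{dataset}.{table}"


# Assumed convention: a module-level dict of objects to expose to Jinja.
EXTRA_JINJA_CONTEXT = {"qualified_name": qualified_name}


# --- Hypothetical merge step performed when building the Jinja context ---

def augment_context(base_context, extras):
    """Return a copy of base_context with extras under one namespace."""
    context = dict(base_context)
    context["extra_jinja_context"] = dict(extras)
    return context


# An empty "var" entry stands in for the regular dbt context members.
context = augment_context({"var": {}}, EXTRA_JINJA_CONTEXT)
rendered = context["extra_jinja_context"]["qualified_name"]("analytics", "events")
```

In a model such as `test.sql`, the helper would then presumably be called as `{{ extra_jinja_context.qualified_name('analytics', 'events') }}`, since Jinja falls back to item lookup when attribute lookup on a dict fails.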