python model #5421
Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
Feature/python model v1 incremental
{%- set res, table = adapter.execute(compiled_code, auto_begin=auto_begin, fetch=fetch_result) -%}
{%- elif language == 'python' -%}
{%- set res = adapter.submit_python_job(model, compiled_code) -%}
{#-- TODO: What should table be for python models? --#}
Question: Doesn't a Python job also generate a table on the data warehouse?
Yes! This `table` refers to metadata returned when the query completes, which dbt uses to create its "result" object. Depending on how we submit and return the query, that table can include information like status code and number of rows created/updated. It's orthogonal to the actual table (relational object) produced by the query in the database.
Got it! Makes sense. Thanks for the explanation. Since it's a submitted job I can see why there's no number of rows/status information.
We are still working on that part. The goal, in the end, is to gather all of that information the same way as for SQL models.
@chamini2 I took a swing at the 3 adapters we added Python support for and realized that we don't have much info to give for now other than status. Snowflake only returns rows affected as 1 for a stored procedure; for Databricks and Dataproc on GCP there seems to be no such info right now. We will probably circle back later to see if things change.
database = '{{ this.database }}'
schema = '{{ this.schema }}'
identifier = '{{ this.identifier }}'
Can we check whether `this.database`, etc. are `None` and set `None` if so, instead of always setting strings?
I think we can probably do something like that. For now this only affects folks using `dbt.this.database` in a Python script, so we will probably leave it for the polishing at the end.
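As a minimal sketch of the suggestion above: each relation component could be rendered as a Python literal instead of being wrapped in quotes unconditionally, so that a missing value becomes `None` rather than the string `'None'`. The helpers `as_python_literal` and `render_relation_header` are hypothetical names used for illustration only; they are not part of dbt or of this PR.

```python
from typing import Optional


def as_python_literal(value: Optional[str]) -> str:
    """Return source text for a Python literal (hypothetical helper)."""
    # repr() quotes and escapes a string; a missing value renders as the bare name None.
    return "None" if value is None else repr(value)


def render_relation_header(
    database: Optional[str], schema: Optional[str], identifier: Optional[str]
) -> str:
    """Build the assignment lines at the top of a compiled Python model (illustrative)."""
    return "\n".join(
        [
            f"database = {as_python_literal(database)}",
            f"schema = {as_python_literal(schema)}",
            f"identifier = {as_python_literal(identifier)}",
        ]
    )


# Example: an adapter with no database component, so database is None.
print(render_relation_header(None, "analytics", "my_model"))
```

Using `repr()` also escapes any quote characters inside a name, which plain `'{{ ... }}'` interpolation does not.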
This reverts commit faaa5e7.
* validate python model args
* add file_name to parser for better error message
* more info for easier searching during syntax error
@@ -1157,7 +1157,7 @@ def __init__(self, macros):

 @dataclass
-@schema_version("manifest", 6)
+@schema_version("manifest", 7)
All manifest versions here need to be double-checked/updated before merging.
)

if get_manifest_schema_version(data) <= 6:
This is the logic to load an existing manifest written before the update. Double-check the version before merge/release.
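To illustrate the kind of upgrade step this check guards, here is a rough sketch that renames `raw_sql` and `compiled_sql` to `raw_code` and `compiled_code` on nodes of a manifest at version 6 or lower. It is not the code from this PR: `get_schema_version`, `upgrade_manifest`, and the assumption that the version can be parsed from a `metadata.dbt_schema_version` URL ending in `v6.json` are stand-ins for illustration.

```python
from typing import Any, Dict

# Old node keys mapped to their new names, following the rename described in this PR.
RENAMED_NODE_KEYS = {"raw_sql": "raw_code", "compiled_sql": "compiled_code"}


def get_schema_version(manifest: Dict[str, Any]) -> int:
    """Parse the version from a schema URL such as '.../manifest/v6.json' (assumed layout)."""
    url = manifest["metadata"]["dbt_schema_version"]
    file_name = url.split("/")[-1]  # e.g. "v6.json"
    return int(file_name.split(".")[0].lstrip("v"))


def upgrade_manifest(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Rename legacy keys in place on manifests written at version 6 or earlier."""
    if get_schema_version(manifest) > 6:
        return manifest  # already in the new shape, nothing to do
    for node in manifest.get("nodes", {}).values():
        for old_key, new_key in RENAMED_NODE_KEYS.items():
            if old_key in node:
                node[new_key] = node.pop(old_key)
    return manifest


old_manifest = {
    "metadata": {"dbt_schema_version": "https://schemas.getdbt.com/dbt/manifest/v6.json"},
    "nodes": {"model.demo.a": {"raw_sql": "select 1", "compiled_sql": "select 1"}},
}
upgraded = upgrade_manifest(old_manifest)
print(sorted(upgraded["nodes"]["model.demo.a"]))
```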
It's looking good!
Looks good!
Comments addressed, new additions look good 👍
* Python model beta version with update to manifest that renames `raw_sql` and `compiled_sql` to `raw_code` and `compiled_code`

Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
Co-authored-by: Ian Knox <ian.knox@dbtlabs.com>
Co-authored-by: Stu Kilgore <stuart.kilgore@gmail.com>
This is the initial change for #5261
This PR enables `ref`, `source`, and `config` for Python models.
We added checks for Python models that enforce the definition of a model, the args passed in to the model, and a restriction to returning only one dataframe.
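The checks described above could look roughly like the following sketch, which uses the standard `ast` module to inspect a model file without executing it. This is an illustration, not the parser code from this PR: `validate_python_model` is a hypothetical function, and the `model(dbt, session)` signature in the sample is an assumption about the expected args.

```python
import ast
from typing import List


def validate_python_model(source: str, expected_args: List[str]) -> None:
    """Raise ValueError if the source does not look like a valid Python model."""
    tree = ast.parse(source)

    # Check 1: exactly one top-level function named "model" is defined.
    models = [
        n for n in tree.body if isinstance(n, ast.FunctionDef) and n.name == "model"
    ]
    if len(models) != 1:
        raise ValueError("expected exactly one top-level function named 'model'")
    func = models[0]

    # Check 2: the positional args match what the caller will pass in.
    arg_names = [a.arg for a in func.args.args]
    if arg_names != expected_args:
        raise ValueError(f"model args must be {expected_args}, got {arg_names}")

    # Check 3: a single return statement yielding one value (not a tuple).
    # Simplification: ast.walk also visits returns inside nested functions.
    returns = [n for n in ast.walk(func) if isinstance(n, ast.Return)]
    if len(returns) != 1:
        raise ValueError("model must contain exactly one return statement")
    value = returns[0].value
    if value is None or isinstance(value, ast.Tuple):
        raise ValueError("model must return a single dataframe")


# Sample model source; it is only parsed here, never run.
SAMPLE_MODEL = '''
def model(dbt, session):
    dbt.config(materialized="table")
    df = dbt.ref("upstream_model")
    return df
'''

validate_python_model(SAMPLE_MODEL, ["dbt", "session"])
print("sample model passed validation")
```

Static inspection like this cannot confirm that the returned value is actually a dataframe; it only rejects shapes that are clearly wrong before a job is submitted.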
Also super happy to hear any feedback and what you think we missed.
TODOs before this merge to main: