Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python model #5421

Merged
merged 34 commits into from Jul 28, 2022
Merged

python model #5421

merged 34 commits into from Jul 28, 2022

Conversation

ChenyuLInx
Copy link
Contributor

@ChenyuLInx ChenyuLInx commented Jun 29, 2022

This is the initial change for #5261

This PR would enable ref, source, and config for the python model
We added checks for python model that enforce definition of a model, args to pass in to the model, restriction on only return one dataframe.

Also super happy to hear any feedback and what you think we missed.

TODOs before this merge to main:

@cla-bot cla-bot bot added the cla:yes label Jun 29, 2022
{%- set res, table = adapter.execute(compiled_code, auto_begin=auto_begin, fetch=fetch_result) -%}
{%- elif language == 'python' -%}
{%- set res = adapter.submit_python_job(model, compiled_code) -%}
{#-- TODO: What should table be for python models? --#}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Doesn't a Python job also generate a table on the data warehouse?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! This table refers to metadata returned when the query completes, which dbt uses to create its "result" object. Depending on how we submit and return the query, that table can include information like status code and number of rows created/updated. It's orthogonal to the actual table (relational object) produced by the query in the database.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! Makes sense. Thanks for the explanation. Since it's a submitted job I can see why there's no number of rows/status information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are still working on that part, the goal, in the end, is having all those information still gathered the same as SQL models

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chamini2 I tool a swing at the 3 adapters we added python support for and realized that we don't have much info to give for now other than status. Snowflake will only return row affected as 1 for stored procedure, for databricks and dataproc on gcp there seems to be no such info right now. We will probably circle back later on to see if things changes

Comment on lines +64 to +66
database = '{{ this.database }}'
schema = '{{ this.schema }}'
identifier = '{{ this.identifier }}'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we check the this.database, etc. are None or not and set None if it's None instead of always setting strings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can probably do something like that, this, for now, is only going to affect folks using dbt.this.database in python script so we will probably leave it at the end to polish

@ChenyuLInx ChenyuLInx marked this pull request as ready for review July 24, 2022 18:37
@ChenyuLInx ChenyuLInx requested a review from a team July 24, 2022 18:37
stu-k and others added 2 commits July 25, 2022 14:58
We also moved language as a new required attribute for node. And added support to load existing manifest
Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
@@ -1157,7 +1157,7 @@ def __init__(self, macros):


@dataclass
@schema_version("manifest", 6)
@schema_version("manifest", 7)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All manifest versions here need to be double check/updated before merging in

)

if get_manifest_schema_version(data) <= 6:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic to load existing manifest pre update, double check version before merge/release

Copy link
Contributor

@gshank gshank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking good!

core/dbt/main.py Outdated Show resolved Hide resolved
core/dbt/include/global_project/macros/etc/statement.sql Outdated Show resolved Hide resolved
@ChenyuLInx ChenyuLInx requested a review from gshank July 27, 2022 19:46
Copy link
Contributor

@gshank gshank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Copy link
Contributor

@stu-k stu-k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments addressed, new additions look good 👍

@ChenyuLInx ChenyuLInx merged commit a7ff003 into main Jul 28, 2022
@ChenyuLInx ChenyuLInx deleted the feature/python-model-v1 branch July 28, 2022 18:43
@ChenyuLInx ChenyuLInx changed the title Draft for python model python model Jul 28, 2022
agoblet pushed a commit to BigDataRepublic/dbt-core that referenced this pull request Sep 16, 2022
* Python model beta version with update to manifest that renames `raw_sql` and `compiled_sql` to `raw_code` and `compiled_code`
Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
Co-authored-by: Ian Knox <ian.knox@dbtlabs.com>
Co-authored-by: Stu Kilgore <stuart.kilgore@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants