python model #5421

ChenyuLInx · 2022-06-29T04:31:24Z

This is the initial change for #5261

This PR enables `ref`, `source`, and `config` for Python models.
We also added checks that enforce that a Python model defines a model function, that the function takes the expected arguments, and that it returns only one dataframe.

We're also happy to hear any feedback, including anything you think we missed.
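For readers new to the feature, here is a minimal sketch of the shape a Python model takes, assuming the `model(dbt, session)` signature. `FakeDbt` is a hypothetical stub (not part of dbt) so the example runs without a warehouse, and plain lists of dicts stand in for dataframes.

```python
class FakeDbt:
    """Hypothetical stub exposing the three functions this PR enables."""

    def __init__(self, tables):
        self._tables = tables
        self.config_calls = []

    def config(self, **kwargs):
        # Record config so the example can inspect it afterwards.
        self.config_calls.append(kwargs)

    def ref(self, name):
        return self._tables[name]

    def source(self, source_name, table_name):
        return self._tables[f"{source_name}.{table_name}"]


def model(dbt, session):
    # Model names and columns below are made up for illustration.
    dbt.config(materialized="table")
    orders = dbt.ref("stg_orders")
    payments = dbt.source("raw", "payments")
    paid_ids = {p["order_id"] for p in payments}
    # Return exactly one "dataframe" (here: one list of rows).
    return [o for o in orders if o["id"] in paid_ids]
```

In a real project the adapter supplies `dbt` and `session`, and the return value would be a dataframe from the platform's own library.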

TODOs before this merges to main:

* A decorator for log code execution #5510
* validate python model args #5511
* add model_language to run #5512
* An extra traceback.print_exc() was added recently; do we want to put it behind a flag? Gerda fixed it in "Fix handling of top-level exceptions" #5560
* add check for materialization: this can be done after the beta release, and is tracked in "[CT-968] [Feature] Disallow “view” materialization if Python" #5569
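As an illustration of the kind of static check that "validate python model args" implies, here is a rough sketch using the standard `ast` module. It is not dbt's actual implementation: the function name `check_python_model`, the error messages, and the exact rules (one `model` function, arguments `dbt` and `session`, no tuple return) are assumptions made for the example.

```python
import ast


def check_python_model(source: str) -> None:
    """Raise ValueError if `source` breaks the assumed Python model rules."""
    tree = ast.parse(source)

    # Rule 1: exactly one top-level function named `model`.
    models = [
        node for node in tree.body
        if isinstance(node, ast.FunctionDef) and node.name == "model"
    ]
    if len(models) != 1:
        raise ValueError(f"expected exactly one model function, found {len(models)}")

    # Rule 2: the function takes (dbt, session).
    func = models[0]
    arg_names = [arg.arg for arg in func.args.args]
    if arg_names != ["dbt", "session"]:
        raise ValueError(f"model must take (dbt, session), got {arg_names}")

    # Rule 3: it returns something, and never a tuple of several objects.
    returns = [n for n in ast.walk(func) if isinstance(n, ast.Return)]
    if not returns:
        raise ValueError("model must return a dataframe")
    for ret in returns:
        if isinstance(ret.value, ast.Tuple):
            raise ValueError("model must return a single dataframe, not a tuple")
```

Parsing instead of importing means the check can run at parse time without executing user code.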

Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>

chamini2 · 2022-06-29T15:50:47Z

core/dbt/include/global_project/macros/etc/statement.sql

+      {%- set res, table = adapter.execute(compiled_code, auto_begin=auto_begin, fetch=fetch_result) -%}
+    {%- elif language == 'python' -%}
+      {%- set res = adapter.submit_python_job(model, compiled_code) -%}
+      {#-- TODO: What should table be for python models? --#}


Question: Doesn't a Python job also generate a table on the data warehouse?

Yes! This table refers to metadata returned when the query completes, which dbt uses to create its "result" object. Depending on how we submit and return the query, that table can include information like status code and number of rows created/updated. It's orthogonal to the actual table (relational object) produced by the query in the database.

Got it! Makes sense. Thanks for the explanation. Since it's a submitted job I can see why there's no number of rows/status information.

We are still working on that part. The end goal is to gather all of that information the same way we do for SQL models.

@chamini2 I took a swing at the 3 adapters we added Python support for and realized that, for now, we don't have much info to give other than status. Snowflake only returns rows affected as 1 for a stored procedure, and for Databricks and Dataproc on GCP there seems to be no such info right now. We will probably circle back later to see if things change.
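To make the distinction in this thread concrete, here is a small sketch of the dispatch in the `statement` macro, written in Python. Only the method names `execute` and `submit_python_job` come from the diff; `Response`, `FakeAdapter`, `run_statement`, and the returned values are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    """Hypothetical stand-in for the metadata dbt keeps about a finished run."""
    status: str
    rows_affected: Optional[int] = None


class FakeAdapter:
    """Hypothetical adapter; real adapters talk to a data platform."""

    def execute(self, code, auto_begin=False, fetch=False):
        # SQL path: a response plus an optional table of fetched results.
        table = [("ok",)] if fetch else []
        return Response(status="SUCCESS", rows_affected=1), table

    def submit_python_job(self, model, code):
        # Python path: for now only a status is available.
        return Response(status="OK")


def run_statement(adapter, language, model, compiled_code, fetch_result=False):
    if language == "sql":
        return adapter.execute(compiled_code, auto_begin=True, fetch=fetch_result)
    if language == "python":
        res = adapter.submit_python_job(model, compiled_code)
        return res, None  # no result table for a submitted job (assumption)
    raise ValueError(f"unsupported language: {language}")
```

Either way the relation built in the warehouse is separate from what is returned here; the return value only feeds dbt's result object.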

ueshin · 2022-07-06T22:47:22Z

core/dbt/include/global_project/macros/python_model/python.sql

+    database = '{{ this.database }}'
+    schema = '{{ this.schema }}'
+    identifier = '{{ this.identifier }}'


Can we check whether `this.database`, etc. are None, and set None in that case instead of always setting strings?

I think we can probably do something like that. For now this only affects folks using `dbt.this.database` in a Python script, so we will probably leave it for the final polish.
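One possible way to address this, sketched in plain Python, not the macro itself: emit each value with `repr`, so a missing database becomes the literal `None` and not the string `'None'`. The helper `render_this_block` and the sample values are hypothetical.

```python
def render_this_block(database, schema, identifier):
    """Emit Python source lines, writing None as a bare literal."""
    return "\n".join([
        f"database = {database!r}",
        f"schema = {schema!r}",
        f"identifier = {identifier!r}",
    ])


generated = render_this_block(None, "analytics", "my_model")

# Execute the generated source to confirm the types survive the round trip.
namespace = {}
exec(generated, namespace)
```

With quoted templating such as `'{{ this.database }}'`, a missing value would reach the script as a string, which is the case the question above is about.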

ChenyuLInx · 2022-07-26T00:13:57Z

core/dbt/contracts/graph/manifest.py

@@ -1157,7 +1157,7 @@ def __init__(self, macros):


 @dataclass
-@schema_version("manifest", 6)
+@schema_version("manifest", 7)


All manifest versions here need to be double-checked and updated before merging.

ChenyuLInx · 2022-07-26T00:20:21Z

core/dbt/contracts/util.py

                    )
-
+        if get_manifest_schema_version(data) <= 6:


This is the logic to load an existing manifest from before the update. Double-check the version before merge/release.
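A rough sketch of what such a compatibility step could look like, assuming the schema version is stored as a URL ending in `/vN.json` under `metadata.dbt_schema_version`. The `raw_sql`/`compiled_sql` renames and the new `language` attribute come from this PR; the function names, the URL parsing, and the `"sql"` default are assumptions for illustration.

```python
import re


def get_schema_version(manifest: dict) -> int:
    """Extract N from a schema URL that ends in /vN.json (assumed format)."""
    url = manifest["metadata"]["dbt_schema_version"]
    match = re.search(r"/v(\d+)\.json$", url)
    if match is None:
        raise ValueError(f"unrecognized schema version: {url}")
    return int(match.group(1))


def upgrade_node(node: dict) -> dict:
    """Rename the SQL-specific keys and fill in the new required attribute."""
    upgraded = dict(node)
    if "raw_sql" in upgraded:
        upgraded["raw_code"] = upgraded.pop("raw_sql")
    if "compiled_sql" in upgraded:
        upgraded["compiled_code"] = upgraded.pop("compiled_sql")
    upgraded.setdefault("language", "sql")  # older manifests only held SQL
    return upgraded


def upgrade_manifest(manifest: dict) -> dict:
    """Upgrade manifests at version 6 or below; leave newer ones untouched."""
    if get_schema_version(manifest) > 6:
        return manifest
    upgraded = dict(manifest)
    upgraded["nodes"] = {
        key: upgrade_node(node) for key, node in manifest.get("nodes", {}).items()
    }
    return upgraded
```

Keeping the upgrade in one place means the threshold (`<= 6` in the diff) is the only thing to revisit when the manifest version is bumped again.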

gshank

It's looking good!

gshank

Looks good!

stu-k

Comments addressed, new additions look good 👍

* Python model beta version with update to manifest that renames `raw_sql` and `compiled_sql` to `raw_code` and `compiled_code`

Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
Co-authored-by: Ian Knox <ian.knox@dbtlabs.com>
Co-authored-by: Stu Kilgore <stuart.kilgore@gmail.com>

ChenyuLInx and others added 15 commits June 2, 2022 21:25

Python model draft

adb1ba6

Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>

cleaned up incremental logic

6edb9fd

cleanup

cfb90a0

changelog

da70895

whitespace issue

e8839f9

Merge pull request #35 from dbt-labs/feature/python-model-v1-incremental

359589f

Feature/python model v1 incremental

Misc experimentation

e7565c6

only relative config, disallow jinja

0db6d53

remove package detectiona and fix this

bf4c010

rebase and adjust context

e6ea97e

compilation without quoting for python models

7d0fca1

more error for wrong language

bffa6f4

clean up

bca78a8

new submit python args

84bb087

name adjustments

cf9d020

cla-bot bot added the cla:yes label Jun 29, 2022

chamini2 reviewed Jun 29, 2022

jtcohen6 mentioned this pull request Jul 5, 2022

Beta docs: Python models dbt-labs/docs.getdbt.com#1664

Closed

1 task

ueshin reviewed Jul 6, 2022

ChenyuLInx and others added 9 commits July 7, 2022 14:14

merge main

81e84c7

fix all tests

54c9aa2

Add language to tracked fields in run_model

faaa5e7

Revert "Add language to tracked fields in run_model"

92c9013

This reverts commit faaa5e7.

update to include python files (#5470)

876ee6b

add model_language to run (#5512)

34469c3

validate python model args (#5511)

820e031

* validate python model args
* add file_name to parser for better error message
* more info for easier searching during syntax error

A decorator for log code execution (#5510)

445a4dc

Merge branch 'main' into feature/python-model-v1

b370b94

ChenyuLInx marked this pull request as ready for review July 24, 2022 18:37

ChenyuLInx requested a review from a team July 24, 2022 18:37

ChenyuLInx requested review from iknox-fa, McKnight-42, gshank and jtcohen6 July 24, 2022 18:37

stu-k and others added 2 commits July 25, 2022 14:58

Add language to tracked fields in run_model (#5469)

b510dd8

Update raw_sql to raw_code(#5499)

09f0f84

We also made `language` a new required attribute for nodes and added support for loading existing manifests.

Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>

ChenyuLInx commented Jul 26, 2022

ChenyuLInx added 2 commits July 25, 2022 20:48

update version in manifest

6cbf1a4

add compatible previous version

2439c0c

ChenyuLInx requested a review from VersusFacit July 26, 2022 04:39

gshank reviewed Jul 26, 2022

core/dbt/include/global_project/macros/etc/statement.sql (outdated, resolved)

gshank reviewed Jul 26, 2022

core/dbt/main.py (outdated, resolved)

core/dbt/include/global_project/macros/etc/statement.sql (outdated, resolved)

lostmygithubaccount reviewed Jul 27, 2022

core/dbt/compilation.py (outdated, resolved)

ChenyuLInx commented Jul 27, 2022

core/dbt/parser/models.py (resolved)

review feedback

1d1f915

ChenyuLInx requested a review from gshank July 27, 2022 19:46

gshank approved these changes Jul 27, 2022

ChenyuLInx and others added 5 commits July 27, 2022 17:37

add check for model function, return and add comment to compile code

48aca3b

fix test

191b320

only dis-allow tuple

e870000

Add python incremental materialization test (#5571)

728897d

new name for code in docs

8433945

stu-k approved these changes Jul 28, 2022

ChenyuLInx merged commit a7ff003 into main Jul 28, 2022

ChenyuLInx deleted the feature/python-model-v1 branch July 28, 2022 18:43

ChenyuLInx changed the title ~~Draft for python model~~ python model Jul 28, 2022

jtcohen6 mentioned this pull request Sep 19, 2022

[CT-1208] [Bug] Manifest deserialization error with disabled models (v1.2 -> v1.3) #5883

Closed

2 tasks

imkehno mentioned this pull request Oct 14, 2022

DBT-core 1.3 incompatibility mjirv/dbt-datamocktool#55

Closed
