More flexible cluster configuration #467

ChenyuLInx · 2022-09-16T00:37:41Z

resolves #444

Description

When using notebook submission, if `job_cluster_config` is specified, we run that model on a job cluster.
This PR also lets users specify a separate `cluster_id` or `job_cluster_config` for each individual model through config.
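
For illustration, a per-model override might look roughly like the sketch below. The file name, the upstream model name, and every value inside the cluster spec are placeholders rather than anything taken from this PR; only the `job_cluster_config` key itself comes from the description above.

```python
# models/my_python_model.py (hypothetical file name)

def model(dbt, session):
    # Per-model override: ask for a dedicated job cluster for this model only.
    # The cluster spec values are placeholders, not recommendations.
    dbt.config(
        materialized="table",
        job_cluster_config={
            "spark_version": "11.3.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
        },
    )
    # "upstream_model" is a hypothetical upstream model name.
    return dbt.ref("upstream_model")
```

A model that should instead run on a specific existing cluster would set `cluster_id` in the same `dbt.config(...)` call.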

`job_cluster_config` currently overrides `cluster_id`: if `job_cluster_config` is set for a model, we always use it.
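
The precedence described above can be sketched as a small standalone function. This is not the code in this PR: `resolve_cluster_spec` and its arguments are hypothetical names, and the `new_cluster` / `existing_cluster_id` keys are assumed to be how the result is passed to the Databricks Jobs API.

```python
from typing import Any, Dict, Optional


def resolve_cluster_spec(
    model_config: Dict[str, Any], default_cluster_id: Optional[str]
) -> Dict[str, Any]:
    """Pick the cluster settings for one model (illustrative only)."""
    job_cluster_config = model_config.get("job_cluster_config")
    if job_cluster_config:
        # A job cluster spec always wins, even if cluster_id is also set.
        return {"new_cluster": job_cluster_config}

    # Otherwise use the model-level cluster_id, falling back to the profile's.
    cluster_id = model_config.get("cluster_id", default_cluster_id)
    if cluster_id is None:
        raise ValueError("no cluster_id or job_cluster_config available")
    return {"existing_cluster_id": cluster_id}
```

With both keys set on a model, the sketch returns only the `new_cluster` entry, which mirrors the "always use it" behavior stated above.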

This PR also removes the need for users to put dbt model files under /dbt_python_mode/{$SCHEMA} in the Databricks workspace.

Checklist

I have read the contributing guide and understand what's expected of me
I have signed the CLA
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
I have opened an issue to add/update docs, or docs changes are not required/relevant for this PR
I have run changie new to create a changelog entry

github-actions · 2022-09-16T00:38:00Z

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the dbt-spark contributing guide.

ChenyuLInx · 2022-09-23T01:25:58Z

`job_cluster_config` currently overrides `cluster_id`: if `job_cluster_config` is set for a model, we always create a job cluster for that model, regardless of `cluster_id`.

@jtcohen6 @lostmygithubaccount does the current override logic seem odd?

ueshin · 2022-09-28T20:58:50Z

dbt/adapters/spark/python_submissions.py

+
+    @property
+    def cluster_id(self) -> str:
+        return self.parsed_model.get("cluster_id", self.credentials.cluster_id)


@ChenyuLInx I'm updating and testing dbt-databricks, and found out that this is a mistake.

I think this should be:

self.parsed_model["config"].get("cluster_id", self.credentials.cluster_id)
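
A minimal reproduction of the difference, assuming the parsed model is a plain dict with user-supplied settings nested under a `"config"` key (as the suggested fix implies). The dict contents and cluster IDs are made up for illustration.

```python
# Hypothetical parsed model: user config lives under "config", not at top level.
parsed_model = {
    "alias": "my_model",
    "config": {"cluster_id": "model-cluster"},
}
profile_cluster_id = "profile-cluster"  # stand-in for self.credentials.cluster_id

# Lookup as merged: the top-level dict has no "cluster_id" key,
# so the model-level setting is silently ignored.
buggy = parsed_model.get("cluster_id", profile_cluster_id)

# Lookup as suggested: reads the model-level setting, with the same fallback.
fixed = parsed_model["config"].get("cluster_id", profile_cluster_id)

print(buggy)  # profile-cluster
print(fixed)  # model-cluster
```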

### Description

Follows "More flexible cluster configuration" at dbt-labs/dbt-spark#467.

- Reuse `dbt-spark`'s implementation
- Remove the dependency on `databricks-cli`
- Internal refactorings

Co-authored-by: allisonwang-db <allison.wang@databricks.com>

cla-bot bot added the cla:yes label Sep 16, 2022

ChenyuLInx changed the title ~~cluster submission playaround~~ More flexible cluster configuration Sep 23, 2022

ChenyuLInx marked this pull request as ready for review September 23, 2022 17:19

ChenyuLInx requested review from stu-k, gshank and McKnight-42 September 23, 2022 17:19

gshank approved these changes Sep 23, 2022

ChenyuLInx added 5 commits September 23, 2022 14:45

some hacky test

f0edf96

add job_cluster_config, make cluster_id configurable

b98d246

fix check credential

4a5f36f

add changelog

f548013

update config hierachy for python submission method

37c5377

ChenyuLInx force-pushed the enhancement/job_cluster_run branch from 904bbd1 to 37c5377 Compare September 23, 2022 21:45

ChenyuLInx merged commit f20aecd into main Sep 23, 2022

ChenyuLInx deleted the enhancement/job_cluster_run branch September 23, 2022 22:59

ueshin mentioned this pull request Sep 26, 2022

Follow "More flexible cluster configuration". databricks/dbt-databricks#194

Merged

ueshin reviewed Sep 28, 2022
