Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More flexible cluster configuration #467

Merged
merged 5 commits into from
Sep 23, 2022
Merged

Conversation

ChenyuLInx
Copy link
Contributor

@ChenyuLInx ChenyuLInx commented Sep 16, 2022

resolves #444

Description

Screenshot 2022-09-22 at 6 15 44 PM

Screenshot 2022-09-22 at 6 15 49 PM

When using notebook submission, if job_cluster_config is specified, we will run that model with a job_cluster.
This PR also makes user being able to specify a separate cluster_id or job_cluster_config for each individual model through config.

job_cluster_config will overwrite cluster_id in current situation.(if job_cluster_config is set for a model, we will always use it).

This PR also removes the need for user and put dbt model files under /dbt_python_mode/{$SCHEMA} in Databricks workspace

Checklist

@cla-bot cla-bot bot added the cla:yes label Sep 16, 2022
@github-actions
Copy link
Contributor

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the dbt-spark contributing guide.

@ChenyuLInx ChenyuLInx changed the title cluster submission playaround More flexible cluster configuration Sep 23, 2022
@ChenyuLInx
Copy link
Contributor Author

`job_cluster_config` will overwrite cluster_id in current situation.(if job_cluster_config is set for a model, we will always create a job cluster for that model, regardless of cluster_id).

@jtcohen6 @lostmygithubaccount is the current overwriting logic odd?

@ChenyuLInx ChenyuLInx marked this pull request as ready for review September 23, 2022 17:19

@property
def cluster_id(self) -> str:
return self.parsed_model.get("cluster_id", self.credentials.cluster_id)
Copy link
Contributor

@ueshin ueshin Sep 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChenyuLInx I'm updating and testing dbt-databricks, and found out that this is a mistake.

I think this should be:

self.parsed_model["config"].get("cluster_id", self.credentials.cluster_id)

ueshin added a commit to databricks/dbt-databricks that referenced this pull request Sep 28, 2022
### Description

Follows "More flexible cluster configuration" at dbt-labs/dbt-spark#467.

- Reuse `dbt-spark`'s implementation
- Remove the dependency on `databricks-cli`
- Internal refactorings

Co-authored-by: allisonwang-db <allison.wang@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CT-1095] Support Jobs Cluster for python models
3 participants