Added source parameter for spark_python_task in databricks_job #2157

Merged 1 commit on Mar 28, 2023
7 changes: 5 additions & 2 deletions docs/resources/job.md
@@ -143,6 +143,7 @@ Each entry in `webhook_notification` block takes a list `webhook` blocks. The fi…
Note that the `id` is not to be confused with the name of the alert destination. The `id` can be retrieved through the API or the URL of Databricks UI `https://<workspace host>/sql/destinations/<notification id>?o=<workspace id>`

Example

```hcl
webhook_notifications {
  on_failure {
    # …
  }
}
```

@@ -170,13 +171,14 @@ You can invoke Spark submit tasks only on new clusters. In the `new_cluster` s…

### spark_python_task Configuration Block

-* `python_file` - (Required) The URI of the Python file to be executed. [databricks_dbfs_file](dbfs_file.md#path), cloud file URIs (e.g. `s3:/`, `abfss:/`, `gs:/`) and workspace paths are supported. For python files stored in the Databricks workspace, the path must be absolute and begin with `/Repos`. This field is required.
+* `python_file` - (Required) The URI of the Python file to be executed. [databricks_dbfs_file](dbfs_file.md#path), cloud file URIs (e.g. `s3:/`, `abfss:/`, `gs:/`), workspace paths, and remote repositories are supported. For Python files stored in the Databricks workspace, the path must be absolute and begin with `/Repos`. For files stored in a remote repository, the path must be relative. This field is required.
+* `source` - (Optional) Location type of the Python file, can only be `GIT`. When set to `GIT`, the Python file will be retrieved from a Git repository defined in `git_source` (see the example after this list).
* `parameters` - (Optional) (List) Command line parameters passed to the Python file.
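
Example of running a Python file from a remote repository with the new `source` field (a minimal sketch; the repository URL, branch, file path, and cluster settings are placeholders, not values from this PR):

```hcl
resource "databricks_job" "python_from_git" {
  name = "Python file from Git"

  git_source {
    url      = "https://github.com/example/jobs-repo" # placeholder repository
    provider = "gitHub"
    branch   = "main"
  }

  task {
    task_key = "main"

    new_cluster {
      num_workers   = 1
      spark_version = "12.2.x-scala2.12" # any supported runtime
      node_type_id  = "i3.xlarge"        # cloud-specific node type
    }

    spark_python_task {
      python_file = "jobs/main.py" # relative path within the repository
      source      = "GIT"
      parameters  = ["--env", "dev"]
    }
  }
}
```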

### notebook_task Configuration Block

* `notebook_path` - (Required) The path of the [databricks_notebook](notebook.md#path) to be run in the Databricks workspace or remote repository. For notebooks stored in the Databricks workspace, the path must be absolute and begin with a slash. For notebooks stored in a remote repository, the path must be relative. This field is required.
-* `source` - (Optional) Location type of the notebook, can only be `WORKSPACE` or `GIT`. When set to `WORKSPACE`, the notebook will be retrieved from the local Databricks workspace. When set to `GIT`, the notebook will be retrieved from a Git repository defined in git_source. If the value is empty, the task will use `GIT` if `git_source` is defined and `WORKSPACE` otherwise.
+* `source` - (Optional) Location type of the notebook, can only be `WORKSPACE` or `GIT`. When set to `WORKSPACE`, the notebook will be retrieved from the local Databricks workspace. When set to `GIT`, the notebook will be retrieved from a Git repository defined in `git_source`. If the value is empty, the task will use `GIT` if `git_source` is defined and `WORKSPACE` otherwise.
* `base_parameters` - (Optional) (Map) Base parameters to be used for each run of this job. If the run is initiated by a call to run-now with parameters specified, the two parameters maps will be merged. If the same key is specified in `base_parameters` and in `run-now`, the value from `run-now` will be used. If the notebook takes a parameter that is not specified in the job’s `base_parameters` or the `run-now` override parameters, the default value from the notebook will be used. Retrieve these parameters in a notebook using `dbutils.widgets.get`.
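
Example of a notebook task that pins the location and passes base parameters (a minimal sketch; the notebook path and parameter names are illustrative):

```hcl
notebook_task {
  notebook_path = "/Repos/me/project/ingest" # absolute workspace path
  source        = "WORKSPACE"

  base_parameters = {
    "date" = "2023-03-28" # read in the notebook via dbutils.widgets.get("date")
  }
}
```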

### pipeline_task Configuration Block
@@ -214,6 +216,7 @@ One of the `query`, `dashboard` or `alert` needs to be provided.
* `alert` - (Optional) block consisting of a single string field: `alert_id` - identifier of the Databricks SQL Alert.

Example

```hcl
resource "databricks_job" "sql_aggregation_job" {
name = "Example SQL Job"
Expand Down
1 change: 1 addition & 0 deletions jobs/resource_job.go
@@ -31,6 +31,7 @@ type NotebookTask struct {
// SparkPythonTask contains the information for python jobs
type SparkPythonTask struct {
    PythonFile string   `json:"python_file"`
+   Source     string   `json:"source,omitempty" tf:"suppress_diff"`
    Parameters []string `json:"parameters,omitempty"`
}
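
The `tf:"suppress_diff"` tag appears intended to stop Terraform from flagging a spurious plan diff when the backend fills in a default `source` value; that reading is an inference from how the provider uses the tag on other fields with server-side defaults, not something this PR states.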
