Terraform module for Databricks Workspace Management (Part 2)
❗️ Important

👉 This Terraform module assumes you already have a Databricks workspace deployed, and that you have the following on hand:

👉 Workspace URL

👉 DAPI Token
- Module tested for Terraform 1.0.1.
- `databrickslabs/databricks` provider version 0.3.5.
- AWS provider version 3.47.0.
- `main` branch: provider versions are not pinned, to keep up with Terraform releases.
- `tags` releases: provider versions are pinned (use these).
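For reference, a minimal sketch of how the pinned providers and workspace credentials might be wired up (the URL and token values are placeholders; the module itself takes `workspace_url` and `dapi_token` directly, as shown later):

```hcl
terraform {
  required_version = ">= 1.0.1"

  required_providers {
    databricks = {
      source  = "databrickslabs/databricks"
      version = "~> 0.3.5"
    }
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.47"
    }
  }
}

# Placeholder values: point the provider at your existing workspace.
provider "databricks" {
  host  = "https://<workspace-url>.cloud.sample.com"
  token = "<dapi_token>"
}
```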
- This is where you would normally start if you have just deployed your Databricks workspace.
Two cluster modes are supported by this module:

**Single Node** mode: To deploy a cluster in Single Node mode, set `fixed_value` to `0`:

```hcl
fixed_value = 0
```

**Standard** mode: To deploy a cluster in Standard mode, two options are available (see the sketch after these options):

```hcl
fixed_value = 1 # or more
```

OR

```hcl
auto_scaling = [1, 3]
```
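A hedged sketch of how these knobs might look in a module call (variable names match the inputs table below; whether `fixed_value` and `auto_scaling` may be set together is not specified, so only one is active here):

```hcl
module "databricks_workspace_management" {
  source = "git::git@github.com:tomarv2/terraform-databricks-workspace-management.git"

  workspace_url = "https://<workspace-url>.cloud.sample.com"
  dapi_token    = "<dapi_token>"

  deploy_cluster = true

  # Single Node mode:
  fixed_value = 0

  # ...or Standard mode with autoscaling (instead of fixed_value above):
  # auto_scaling = [1, 3]

  teamid = var.teamid
  prjid  = var.prjid
}
```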
A cluster can have one of these permissions: `CAN_ATTACH_TO`, `CAN_RESTART`, and `CAN_MANAGE`.
```hcl
cluster_access_control = [
  {
    group_name       = "<group_name>"
    permission_level = "CAN_RESTART"
  }
]
```
- To build a cluster with a new cluster policy, use:

```hcl
deploy_cluster_policy = true

policy_overrides = {
  "dbus_per_hour" : {
    "type" : "range",
    "maxValue" : 10
  },
  "autotermination_minutes" : {
    "type" : "fixed",
    "value" : 30,
    "hidden" : true
  }
}
```
- To use an existing cluster policy, specify the existing policy id:

```hcl
cluster_policy_id = "E0123456789"
```

To get an existing policy id, use:

```sh
curl -X GET --header "Authorization: Bearer $DAPI_TOKEN" https://<workspace_name>/api/2.0/policies/clusters/list \
  --data '{ "sort_order": "DESC", "sort_column": "POLICY_CREATION_TIME" }'
```
A policy can have the `CAN_USE` permission:

```hcl
policy_access_control = [
  {
    group_name       = "<group_name>"
    permission_level = "CAN_USE"
  }
]
```
- To add an instance profile to the workspace, use:

```hcl
add_instance_profile_to_workspace = true # default false

aws_attributes = {
  instance_profile_arn = "arn:aws:iam::123456789012:instance-profile/aws-instance-role"
}
```

Note: Set `add_instance_profile_to_workspace` to `true` to add the instance profile to the Databricks workspace. To use an existing instance profile, set it to `false`.
Note: To configure an instance pool, add the configuration below:

```hcl
deploy_worker_instance_pool           = true
min_idle_instances                    = 1
max_capacity                          = 5
idle_instance_autotermination_minutes = 30
```
An instance pool can have one of these permissions: `CAN_ATTACH_TO` and `CAN_MANAGE`.
```hcl
instance_pool_access_control = [
  {
    group_name       = "<group_name>"
    permission_level = "CAN_ATTACH_TO"
  }
]
```
❗️ Important

If `deploy_worker_instance_pool` is set to `true` and `auto_scaling` is enabled, ensure that the `max_capacity` of the cluster instance pool is greater than the `auto_scaling` max value for the cluster.
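For example, a consistent pairing might look like this (values are illustrative):

```hcl
# Worker instance pool sized to cover the cluster's autoscaling ceiling.
deploy_worker_instance_pool           = true
min_idle_instances                    = 1
max_capacity                          = 5 # must exceed the auto_scaling max below
idle_instance_autotermination_minutes = 30

deploy_cluster = true
auto_scaling   = [1, 3] # max (3) stays below pool max_capacity (5)
```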
Two options are available to deploy a job (see the sketch after these lists):

- Deploy the job to an existing cluster.
- Deploy a new cluster and then deploy the job.

Two options are available to attach notebooks to a job:

- Attach an existing notebook to a job.
- Create a new notebook and attach it to a job.
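A hedged sketch of the two job-deployment options, using inputs from the table below (whether `cluster_id` and the `deploy_cluster` flag interact exactly this way is an assumption; check the examples directory):

```hcl
# Option 1: run the job on an existing cluster.
deploy_jobs = true
cluster_id  = "<existing_cluster_id>"

# Option 2: create a new cluster and run the job on it.
# deploy_cluster = true
# deploy_jobs    = true
```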
A job can have one of these permissions: `CAN_VIEW`, `CAN_MANAGE_RUN`, `IS_OWNER`, and `CAN_MANAGE`.

Admins have the `CAN_MANAGE` permission by default, and they can assign that permission to non-admin users and service principals.

The job creator has the `IS_OWNER` permission. Destroying the `databricks_permissions` resource for a job reverts ownership to the creator.

Note:

- A job must have exactly one owner. If the resource is changed and no owner is specified, the currently authenticated principal becomes the new owner of the job.
- A job cannot have a group as an owner.
- Jobs triggered through Run Now assume the permissions of the job owner, not of the user or service principal who issued Run Now.
```hcl
jobs_access_control = [
  {
    group_name       = "<group_name>"
    permission_level = "CAN_MANAGE_RUN"
  }
]
```
Put notebooks in the `notebooks` folder and provide the information below:

```hcl
notebooks = [
  {
    name       = "demo_notebook1"
    language   = "PYTHON"
    local_path = "notebooks/sample1.py"
    path       = "/Shared/demo/sample1.py"
  },
  {
    name       = "demo_notebook2"
    local_path = "notebooks/sample2.py"
  }
]
```
A notebook can have one of these permissions: `CAN_READ`, `CAN_RUN`, `CAN_EDIT`, and `CAN_MANAGE`.
```hcl
notebooks_access_control = [
  {
    group_name       = "<group_name>"
    permission_level = "CAN_MANAGE"
  }
]
```
- Try this if you want to test which resources get deployed:

```sh
terraform init
terraform plan -var='teamid=tryme' -var='prjid=project'
terraform apply -var='teamid=tryme' -var='prjid=project'
terraform destroy -var='teamid=tryme' -var='prjid=project'
```

Note: With this option, please take care of remote state storage yourself, for example with an S3 backend (see the sketch below).
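A minimal S3 backend sketch, assuming a pre-existing bucket (the bucket, key, and region values are placeholders):

```hcl
terraform {
  backend "s3" {
    bucket = "<remote state bucket name>"
    key    = "databricks-workspace-management/terraform.tfstate"
    region = "us-west-2"
  }
}
```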
- Create a Python 3.6+ virtual environment:

```sh
python3 -m venv <venv name>
```

- Install the package:

```sh
pip install tfremote
```

- Set the environment variables below:

```sh
export TF_AWS_BUCKET=<remote state bucket name>
export TF_AWS_PROFILE=default
export TF_AWS_BUCKET_REGION=us-west-2
```

- Update the `examples` directory with the required values.
- Run the following and verify the output before deploying:

```sh
tf -c=aws plan -var='teamid=foo' -var='prjid=bar'
```

- Run the following to deploy:

```sh
tf -c=aws apply -var='teamid=foo' -var='prjid=bar'
```

- Run the following to destroy:

```sh
tf -c=aws destroy -var='teamid=foo' -var='prjid=bar'
```
NOTE:
- Read more on tfremote
module "databricks_workspace_management" {
source = "git::git@github.com:tomarv2/terraform-databricks-workspace-management.git"
workspace_url = "https://<workspace-url>.cloud.sample.com"
dapi_token = "dapi123456789012"
deploy_cluster = true
deploy_jobs = true
deploy_notebook = true
notebook_path = "notebooks/sample.py"
notebook_name = "demo-notebook"
# -----------------------------------------
# Do not change the teamid, prjid once set.
teamid = var.teamid
prjid = var.prjid
}
Please refer to the examples directory for references.
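Since tagged releases pin provider versions, you may want to pin the module source to a tag as well; a sketch, assuming a hypothetical tag `v0.x.x`:

```hcl
module "databricks_workspace_management" {
  # "?ref=" pins the module to a specific git tag; v0.x.x is a placeholder.
  source = "git::git@github.com:tomarv2/terraform-databricks-workspace-management.git?ref=v0.x.x"
  # ...remaining inputs as in the example above.
}
```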
- Databricks Sync - Tool for multi-cloud migrations and DR sync of workspaces. It uses Terraform in the backend. Run it from the command line or from a notebook.
- Databricks Migrate - Tool to migrate a workspace (a one-time tool).
- Databricks CICD Templates
Common error messages. Retry the step if you see one of the errors below:

```
Error: Failed to delete token in Scope <scope name>
Error: Scope <scope name> does not exist!
```
Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.0.1 |
| aws | ~> 3.47 |
| databricks | ~> 0.3.5 |
Providers

| Name | Version |
|------|---------|
| databricks | 0.3.5 |

Modules

No modules.
Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| add_instance_profile_to_workspace | Existing AWS instance profile ARN | `any` | `false` | no |
| allow_cluster_create | This is a field to allow the group to have cluster create privileges. More fine-grained permissions could be assigned with databricks_permissions and the cluster_id argument. Everyone without the allow_cluster_create argument set, but with permission to use a cluster policy, would be able to create clusters, but within the boundaries of that specific policy. | `bool` | `true` | no |
| allow_instance_pool_create | This is a field to allow the group to have instance pool create privileges. More fine-grained permissions could be assigned with databricks_permissions and the instance_pool_id argument. | `bool` | `true` | no |
| allow_sql_analytics_access | This is a field to allow the group to have access to the SQL Analytics feature through databricks_sql_endpoint. | `bool` | `true` | no |
| auto_scaling | Number of min and max workers in auto scale. | `list(any)` | `null` | no |
| aws_attributes | Optional configuration block containing attributes related to clusters running on AWS | `any` | `null` | no |
| category | Node category, which can be one of: General purpose, Memory optimized, Storage optimized, Compute optimized, GPU | `string` | `"General purpose"` | no |
| cluster_access_control | Cluster access control | `any` | `null` | no |
| cluster_autotermination_minutes | Cluster auto-termination duration | `number` | `30` | no |
| cluster_id | Existing cluster id | `string` | `null` | no |
| cluster_policy_id | Existing cluster policy id | `string` | `null` | no |
| create_group | Create a new group; if the group already exists the deployment will fail. | `bool` | `false` | no |
| create_user | Create a new user; if the user already exists the deployment will fail. | `bool` | `false` | no |
| dapi_token | Databricks DAPI token | `string` | n/a | yes |
| dapi_token_duration | Databricks DAPI token duration | `number` | `3600` | no |
| databricks_secret_key | Databricks token type | `string` | `"token"` | no |
| databricks_username | User allowed to access the platform. | `string` | `""` | no |
| deploy_cluster | Feature flag, true or false | `bool` | `false` | no |
| deploy_cluster_policy | Feature flag, true or false | `bool` | `false` | no |
| deploy_driver_instance_pool | Driver instance pool | `bool` | `false` | no |
| deploy_jobs | Feature flag, true or false | `bool` | `false` | no |
| deploy_worker_instance_pool | Worker instance pool | `bool` | `false` | no |
| driver_node_type_id | The node type of the Spark driver. This field is optional; if unset, the API will set the driver node type to the same value as node_type_id. | `string` | `null` | no |
| email_notifications | Email notification block. | `any` | `null` | no |
| fixed_value | Number of nodes in the cluster. | `number` | `0` | no |
| gb_per_core | Number of gigabytes per core available on the instance. Conflicts with min_memory_gb. Defaults to 0. | `string` | `0` | no |
| gpu | GPU required or not. | `bool` | `false` | no |
| group_can_attach_to | Group allowed to access the platform. | `string` | `""` | no |
| group_can_manage | Group allowed to access the platform. | `string` | `""` | no |
| group_can_restart | Group allowed to access the platform. | `string` | `""` | no |
| idle_instance_autotermination_minutes | Idle instance auto-termination duration | `number` | `20` | no |
| instance_pool_access_control | Instance pool access control | `any` | `null` | no |
| instance_profile_arn | ARN attribute of the aws_iam_instance_profile output, the EC2 instance profile association to the AWS IAM role. This ARN is validated upon resource creation, and it is not possible to skip validation. | `any` | `null` | no |
| is_meta_instance_profile | Whether the instance profile is a meta instance profile. Used only in IAM credential passthrough. | `any` | `false` | no |
| jobs_access_control | Jobs access control | `any` | `null` | no |
| language | Notebook language | `string` | `"PYTHON"` | no |
| local_disk | Pick only nodes with local storage. Defaults to false. | `string` | `true` | no |
| local_notebooks | Nested block: NestingSet, min items: 0, max items: 0 | `any` | `[]` | no |
| local_path | Notebook location on the user machine | `string` | `null` | no |
| max_capacity | Instance pool maximum capacity | `number` | `3` | no |
| max_concurrent_runs | An optional maximum allowed number of concurrent runs of the job. | `number` | `null` | no |
| max_retries | An optional maximum number of times to retry an unsuccessful run. A run is considered unsuccessful if it completes with a FAILED result_state or INTERNAL_ERROR life_cycle_state. The value -1 means to retry indefinitely and the value 0 means to never retry. The default behavior is to never retry. | `number` | `0` | no |
| min_cores | Minimum number of CPU cores available on the instance. Defaults to 0. | `string` | `0` | no |
| min_gpus | Minimum number of GPUs attached to the instance. Defaults to 0. | `string` | `0` | no |
| min_idle_instances | Instance pool minimum idle instances | `number` | `1` | no |
| min_memory_gb | Minimum amount of memory per node in gigabytes. Defaults to 0. | `string` | `0` | no |
| min_retry_interval_millis | An optional minimal interval in milliseconds between the start of the failed run and the subsequent retry run. The default behavior is that unsuccessful runs are immediately retried. | `number` | `null` | no |
| ml | ML required or not. | `bool` | `false` | no |
| notebook_access_control | Notebook access control | `any` | `null` | no |
| notebooks | Nested block: NestingSet, min items: 0, max items: 0 | `any` | `[]` | no |
| num_workers | Number of workers for the job | `number` | `1` | no |
| policy_access_control | Policy access control | `any` | `null` | no |
| policy_overrides | Cluster policy overrides | `any` | `null` | no |
| prjid | (Required) Name of the project/stack, e.g. mystack, nifieks, demoaci. Should not be changed after running 'tf apply'. | `string` | n/a | yes |
| remote_notebooks | Nested block: NestingSet, min items: 0, max items: 0 | `any` | `[]` | no |
| retry_on_timeout | An optional policy to specify whether to retry a job when it times out. The default behavior is to not retry on timeout. | `bool` | `false` | no |
| schedule | Job schedule configuration. | `map(any)` | `null` | no |
| spark_conf | Optional Spark configuration block | `any` | `null` | no |
| spark_version | Runtime version of the cluster. Any supported databricks_spark_version id. We advise using cluster policies to restrict the list of versions for simplicity while maintaining enough control. | `string` | `null` | no |
| task_parameters | Base parameters to be used for each run of this job. | `map(any)` | `{}` | no |
| teamid | (Required) Name of the team/group, e.g. devops, dataengineering. Should not be changed after running 'tf apply'. | `string` | n/a | yes |
| timeout | An optional timeout applied to each run of this job. The default behavior is to have no timeout. | `number` | `null` | no |
| worker_node_type_id | The node type of the Spark worker. | `string` | `null` | no |
| workspace_url | Databricks workspace URL | `string` | n/a | yes |