Terraform module for Databricks Workspace Management (Part 2)
❗️ Important

👉 This Terraform module assumes you already have a Databricks workspace deployed, and that you have the following on hand:

👉 Workspace URL

👉 DAPI Token
- Module tested for Terraform 1.0.1.
- `databrickslabs/databricks` provider version 0.3.5.
- AWS provider version 3.47.0.
- `main` branch: provider versions are not pinned, to keep up with Terraform releases.
- `tags` releases: provider versions are pinned (use these).
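For reference, a minimal sketch of how the pinned providers and workspace credentials might be wired up (the URL and token values are placeholders; the module itself takes `workspace_url` and `dapi_token` directly, as shown later):

```hcl
terraform {
  required_version = ">= 1.0.1"

  required_providers {
    databricks = {
      source  = "databrickslabs/databricks"
      version = "~> 0.3.5"
    }
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.47"
    }
  }
}

# Placeholder values: point the provider at your existing workspace.
provider "databricks" {
  host  = "https://<workspace-url>.cloud.sample.com"
  token = "<dapi_token>"
}
```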
- This is where you would normally start if you have just deployed your Databricks workspace.
Two cluster modes are supported by this module:

**Single Node** mode: To deploy a cluster in Single Node mode, set `fixed_value` to `0`:

```hcl
fixed_value = 0
```

**Standard** mode: To deploy a cluster in Standard mode, two options are available (see the sketch after these options):

```hcl
fixed_value = 1 # or more
```

OR

```hcl
auto_scaling = [1, 3]
```
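A hedged sketch of how these knobs might look in a module call (variable names match the inputs table below; whether `fixed_value` and `auto_scaling` may be set together is not specified, so only one is active here):

```hcl
module "databricks_workspace_management" {
  source = "git::git@github.com:tomarv2/terraform-databricks-workspace-management.git"

  workspace_url = "https://<workspace-url>.cloud.sample.com"
  dapi_token    = "<dapi_token>"

  deploy_cluster = true

  # Single Node mode:
  fixed_value = 0

  # ...or Standard mode with autoscaling (instead of fixed_value above):
  # auto_scaling = [1, 3]

  teamid = var.teamid
  prjid  = var.prjid
}
```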
A cluster can have one of these permissions: `CAN_ATTACH_TO`, `CAN_RESTART`, and `CAN_MANAGE`.
```hcl
cluster_access_control = [
  {
    group_name       = "<group_name>"
    permission_level = "CAN_RESTART"
  }
]
```
- To build a cluster with a new cluster policy, use:

```hcl
deploy_cluster_policy = true

policy_overrides = {
  "dbus_per_hour" : {
    "type" : "range",
    "maxValue" : 10
  },
  "autotermination_minutes" : {
    "type" : "fixed",
    "value" : 30,
    "hidden" : true
  }
}
```
- To use an existing cluster policy, specify the existing policy id:

```hcl
cluster_policy_id = "E0123456789"
```

To get an existing policy id, use:

```sh
curl -X GET --header "Authorization: Bearer $DAPI_TOKEN" https://<workspace_name>/api/2.0/policies/clusters/list \
  --data '{ "sort_order": "DESC", "sort_column": "POLICY_CREATION_TIME" }'
```
A policy can have the `CAN_USE` permission:

```hcl
policy_access_control = [
  {
    group_name       = "<group_name>"
    permission_level = "CAN_USE"
  }
]
```
- To add an instance profile to the workspace, use:

```hcl
add_instance_profile_to_workspace = true # default false

aws_attributes = {
  instance_profile_arn = "arn:aws:iam::123456789012:instance-profile/aws-instance-role"
}
```

Note: Set `add_instance_profile_to_workspace` to `true` to add the instance profile to the Databricks workspace. To use an existing instance profile, set it to `false`.
Note: To configure an instance pool, add the configuration below:

```hcl
deploy_worker_instance_pool           = true
min_idle_instances                    = 1
max_capacity                          = 5
idle_instance_autotermination_minutes = 30
```
An instance pool can have one of these permissions: `CAN_ATTACH_TO` and `CAN_MANAGE`.
```hcl
instance_pool_access_control = [
  {
    group_name       = "<group_name>"
    permission_level = "CAN_ATTACH_TO"
  }
]
```
❗️ Important

If `deploy_worker_instance_pool` is set to `true` and `auto_scaling` is enabled, ensure that the `max_capacity` of the cluster instance pool is greater than the `auto_scaling` max value for the cluster.
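For example, a consistent pairing might look like this (values are illustrative):

```hcl
# Worker instance pool sized to cover the cluster's autoscaling ceiling.
deploy_worker_instance_pool           = true
min_idle_instances                    = 1
max_capacity                          = 5 # must exceed the auto_scaling max below
idle_instance_autotermination_minutes = 30

deploy_cluster = true
auto_scaling   = [1, 3] # max (3) stays below pool max_capacity (5)
```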
Two options are available to deploy a job (see the sketch after these lists):

- Deploy the job to an existing cluster.
- Deploy a new cluster and then deploy the job.

Two options are available to attach notebooks to a job:

- Attach an existing notebook to a job.
- Create a new notebook and attach it to a job.
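A hedged sketch of the two job-deployment options, using inputs from the table below (whether `cluster_id` and the `deploy_cluster` flag interact exactly this way is an assumption; check the examples directory):

```hcl
# Option 1: run the job on an existing cluster.
deploy_jobs = true
cluster_id  = "<existing_cluster_id>"

# Option 2: create a new cluster and run the job on it.
# deploy_cluster = true
# deploy_jobs    = true
```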
A job can have one of these permissions: `CAN_VIEW`, `CAN_MANAGE_RUN`, `IS_OWNER`, and `CAN_MANAGE`.

Admins have the `CAN_MANAGE` permission by default, and they can assign that permission to non-admin users and service principals.

The job creator has the `IS_OWNER` permission. Destroying the `databricks_permissions` resource for a job reverts ownership to the creator.

Note:

- A job must have exactly one owner. If the resource is changed and no owner is specified, the currently authenticated principal becomes the new owner of the job.
- A job cannot have a group as an owner.
- Jobs triggered through Run Now assume the permissions of the job owner, not of the user or service principal who issued Run Now.
```hcl
jobs_access_control = [
  {
    group_name       = "<group_name>"
    permission_level = "CAN_MANAGE_RUN"
  }
]
```
Put notebooks in the `notebooks` folder and provide the information below:

```hcl
notebooks = [
  {
    name       = "demo_notebook1"
    language   = "PYTHON"
    local_path = "notebooks/sample1.py"
    path       = "/Shared/demo/sample1.py"
  },
  {
    name       = "demo_notebook2"
    local_path = "notebooks/sample2.py"
  }
]
```
A notebook can have one of these permissions: `CAN_READ`, `CAN_RUN`, `CAN_EDIT`, and `CAN_MANAGE`.
```hcl
notebooks_access_control = [
  {
    group_name       = "<group_name>"
    permission_level = "CAN_MANAGE"
  }
]
```
- Try this if you want to test which resources get deployed:

```sh
terraform init
terraform plan -var='teamid=tryme' -var='prjid=project'
terraform apply -var='teamid=tryme' -var='prjid=project'
terraform destroy -var='teamid=tryme' -var='prjid=project'
```

Note: With this option, please take care of remote state storage yourself, for example with an S3 backend (see the sketch below).
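A minimal S3 backend sketch, assuming a pre-existing bucket (the bucket, key, and region values are placeholders):

```hcl
terraform {
  backend "s3" {
    bucket = "<remote state bucket name>"
    key    = "databricks-workspace-management/terraform.tfstate"
    region = "us-west-2"
  }
}
```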
- Create a Python 3.6+ virtual environment:

```sh
python3 -m venv <venv name>
```

- Install the package:

```sh
pip install tfremote
```

- Set the environment variables below:

```sh
export TF_AWS_BUCKET=<remote state bucket name>
export TF_AWS_PROFILE=default
export TF_AWS_BUCKET_REGION=us-west-2
```

- Update the `examples` directory with the required values.
- Run the following and verify the output before deploying:

```sh
tf -c=aws plan -var='teamid=foo' -var='prjid=bar'
```

- Run the following to deploy:

```sh
tf -c=aws apply -var='teamid=foo' -var='prjid=bar'
```

- Run the following to destroy:

```sh
tf -c=aws destroy -var='teamid=foo' -var='prjid=bar'
```
NOTE:
- Read more on tfremote
module "databricks_workspace_management" {
source = "git::git@github.com:tomarv2/terraform-databricks-workspace-management.git"
workspace_url = "https://<workspace-url>.cloud.sample.com"
dapi_token = "dapi123456789012"
deploy_cluster = true
deploy_jobs = true
deploy_notebook = true
notebook_path = "notebooks/sample.py"
notebook_name = "demo-notebook"
# -----------------------------------------
# Do not change the teamid, prjid once set.
teamid = var.teamid
prjid = var.prjid
}
Please refer to the examples directory for references.
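Since tagged releases pin provider versions, you may want to pin the module source to a tag as well; a sketch, assuming a hypothetical tag `v0.x.x`:

```hcl
module "databricks_workspace_management" {
  # "?ref=" pins the module to a specific git tag; v0.x.x is a placeholder.
  source = "git::git@github.com:tomarv2/terraform-databricks-workspace-management.git?ref=v0.x.x"
  # ...remaining inputs as in the example above.
}
```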
- Databricks Sync - Tool for multi-cloud migrations and DR sync of workspaces. It uses Terraform in the backend. Run it from the command line or from a notebook.
- Databricks Migrate - Tool to migrate a workspace (a one-time tool).
- Databricks CICD Templates
Common error messages. Retry the step if you see one of the errors below:

```
Error: Failed to delete token in Scope <scope name>
Error: Scope <scope name> does not exist!
```
Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.0.1 |
| aws | ~> 3.47 |
| databricks | ~> 0.3.5 |
Providers

| Name | Version |
|------|---------|
| databricks | 0.3.5 |

Modules

No modules.
Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| add_instance_profile_to_workspace | Existing AWS instance profile ARN | `any` | `false` | no |
| allow_cluster_create | This is a field to allow the group to have cluster create privileges. More fine-grained permissions could be assigned with databricks_permissions and the cluster_id argument. Everyone without the allow_cluster_create argument set, but with permission to use a cluster policy, would be able to create clusters, but within the boundaries of that specific policy. | `bool` | `true` | no |
| allow_instance_pool_create | This is a field to allow the group to have instance pool create privileges. More fine-grained permissions could be assigned with databricks_permissions and the instance_pool_id argument. | `bool` | `true` | no |
| allow_sql_analytics_access | This is a field to allow the group to have access to the SQL Analytics feature through databricks_sql_endpoint. | `bool` | `true` | no |
| auto_scaling | Number of min and max workers in auto scale. | `list(any)` | `null` | no |
| aws_attributes | Optional configuration block containing attributes related to clusters running on AWS | `any` | `null` | no |
| category | Node category, which can be one of: General purpose, Memory optimized, Storage optimized, Compute optimized, GPU | `string` | `"General purpose"` | no |
| cluster_access_control | Cluster access control | `any` | `null` | no |
| cluster_autotermination_minutes | Cluster auto-termination duration | `number` | `30` | no |
| cluster_id | Existing cluster id | `string` | `null` | no |
| cluster_policy_id | Existing cluster policy id | `string` | `null` | no |
| create_group | Create a new group; if the group already exists the deployment will fail. | `bool` | `false` | no |
| create_user | Create a new user; if the user already exists the deployment will fail. | `bool` | `false` | no |
| dapi_token | Databricks DAPI token | `string` | n/a | yes |
| dapi_token_duration | Databricks DAPI token duration | `number` | `3600` | no |
| databricks_secret_key | Databricks token type | `string` | `"token"` | no |
| databricks_username | User allowed to access the platform. | `string` | `""` | no |
| deploy_cluster | Feature flag, true or false | `bool` | `false` | no |
| deploy_cluster_policy | Feature flag, true or false | `bool` | `false` | no |
| deploy_driver_instance_pool | Driver instance pool | `bool` | `false` | no |
| deploy_jobs | Feature flag, true or false | `bool` | `false` | no |
| deploy_worker_instance_pool | Worker instance pool | `bool` | `false` | no |
| driver_node_type_id | The node type of the Spark driver. This field is optional; if unset, the API will set the driver node type to the same value as node_type_id. | `string` | `null` | no |
| email_notifications | Email notification block. | `any` | `null` | no |
| fixed_value | Number of nodes in the cluster. | `number` | `0` | no |
| gb_per_core | Number of gigabytes per core available on the instance. Conflicts with min_memory_gb. Defaults to 0. | `string` | `0` | no |
| gpu | GPU required or not. | `bool` | `false` | no |
| group_can_attach_to | Group allowed to access the platform. | `string` | `""` | no |
| group_can_manage | Group allowed to access the platform. | `string` | `""` | no |
| group_can_restart | Group allowed to access the platform. | `string` | `""` | no |
| idle_instance_autotermination_minutes | Idle instance auto-termination duration | `number` | `20` | no |
| instance_pool_access_control | Instance pool access control | `any` | `null` | no |
| instance_profile_arn | ARN attribute of the aws_iam_instance_profile output, the EC2 instance profile association to the AWS IAM role. This ARN is validated upon resource creation, and it is not possible to skip validation. | `any` | `null` | no |
| is_meta_instance_profile | Whether the instance profile is a meta instance profile. Used only in IAM credential passthrough. | `any` | `false` | no |
| jobs_access_control | Jobs access control | `any` | `null` | no |
| language | Notebook language | `string` | `"PYTHON"` | no |
| local_disk | Pick only nodes with local storage. Defaults to false. | `string` | `true` | no |
| local_notebooks | Nested block: NestingSet, min items: 0, max items: 0 | `any` | `[]` | no |
| local_path | Notebook location on the user machine | `string` | `null` | no |
| max_capacity | Instance pool maximum capacity | `number` | `3` | no |
| max_concurrent_runs | An optional maximum allowed number of concurrent runs of the job. | `number` | `null` | no |
| max_retries | An optional maximum number of times to retry an unsuccessful run. A run is considered unsuccessful if it completes with a FAILED result_state or INTERNAL_ERROR life_cycle_state. The value -1 means to retry indefinitely and the value 0 means to never retry. The default behavior is to never retry. | `number` | `0` | no |
| min_cores | Minimum number of CPU cores available on the instance. Defaults to 0. | `string` | `0` | no |
| min_gpus | Minimum number of GPUs attached to the instance. Defaults to 0. | `string` | `0` | no |
| min_idle_instances | Instance pool minimum idle instances | `number` | `1` | no |
| min_memory_gb | Minimum amount of memory per node in gigabytes. Defaults to 0. | `string` | `0` | no |
| min_retry_interval_millis | An optional minimal interval in milliseconds between the start of the failed run and the subsequent retry run. The default behavior is that unsuccessful runs are immediately retried. | `number` | `null` | no |
| ml | ML required or not. | `bool` | `false` | no |
| notebook_access_control | Notebook access control | `any` | `null` | no |
| notebooks | Nested block: NestingSet, min items: 0, max items: 0 | `any` | `[]` | no |
| num_workers | Number of workers for the job | `number` | `1` | no |
| policy_access_control | Policy access control | `any` | `null` | no |
| policy_overrides | Cluster policy overrides | `any` | `null` | no |
| prjid | (Required) Name of the project/stack, e.g. mystack, nifieks, demoaci. Should not be changed after running 'tf apply'. | `string` | n/a | yes |
| remote_notebooks | Nested block: NestingSet, min items: 0, max items: 0 | `any` | `[]` | no |
| retry_on_timeout | An optional policy to specify whether to retry a job when it times out. The default behavior is to not retry on timeout. | `bool` | `false` | no |
| schedule | Job schedule configuration. | `map(any)` | `null` | no |
| spark_conf | Optional Spark configuration block | `any` | `null` | no |
| spark_version | Runtime version of the cluster. Any supported databricks_spark_version id. We advise using cluster policies to restrict the list of versions for simplicity while maintaining enough control. | `string` | `null` | no |
| task_parameters | Base parameters to be used for each run of this job. | `map(any)` | `{}` | no |
| teamid | (Required) Name of the team/group, e.g. devops, dataengineering. Should not be changed after running 'tf apply'. | `string` | n/a | yes |
| timeout | An optional timeout applied to each run of this job. The default behavior is to have no timeout. | `number` | `null` | no |
| worker_node_type_id | The node type of the Spark worker. | `string` | `null` | no |
| workspace_url | Databricks workspace URL | `string` | n/a | yes |