# terraform-aws-databricks-workspace-custom-vpc

Nextail Terraform module for creating a Databricks workspace in AWS within an existing VPC

## Releasing a new version

  1. Run `make update-docs` and commit the updated README
  2. Tag the commit with the proper version

## Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.1.2 |
| aws | >= 3.37.0 |
| databricks | >= 1.0.0 |

## Providers

| Name | Version |
|------|---------|
| aws | >= 3.37.0 |
| databricks | >= 1.0.0 |
| time | n/a |
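
The version constraints above can be satisfied by declaring the providers in the calling root module. A minimal sketch, assuming the usual `hashicorp/aws`, `databricks/databricks` and `hashicorp/time` provider sources:

```hcl
terraform {
  required_version = ">= 1.1.2"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 3.37.0"
    }
    databricks = {
      source  = "databricks/databricks"
      version = ">= 1.0.0"
    }
    # The time provider is used internally by the module (see time_sleep.wait below)
    time = {
      source = "hashicorp/time"
    }
  }
}
```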

## Modules

| Name | Source | Version |
|------|--------|---------|
| vpc_endpoints | terraform-aws-modules/vpc/aws//modules/vpc-endpoints | 3.2.0 |

## Resources

| Name | Type |
|------|------|
| aws_eip.databricks_nat_gateways_eips | resource |
| aws_iam_role.cross_account_role | resource |
| aws_iam_role_policy.this | resource |
| aws_nat_gateway.databricks_nat_gateways | resource |
| aws_route_table.databricks_main_route_tables | resource |
| aws_route_table.databricks_nat_route_tables | resource |
| aws_route_table_association.databricks_main_route_tables_associations | resource |
| aws_route_table_association.databricks_nat_route_tables_associations | resource |
| aws_s3_bucket.root_storage_bucket | resource |
| aws_s3_bucket_policy.root_bucket_policy | resource |
| aws_s3_bucket_public_access_block.root_storage_bucket | resource |
| aws_s3_bucket_versioning.root_storage_bucket | resource |
| aws_security_group.this | resource |
| aws_subnet.databricks_nat_subnets | resource |
| aws_subnet.databricks_subnets | resource |
| databricks_mws_credentials.this | resource |
| databricks_mws_networks.this | resource |
| databricks_mws_storage_configurations.this | resource |
| databricks_mws_workspaces.this | resource |
| time_sleep.wait | resource |
| aws_internet_gateway.default | data source |
| databricks_aws_assume_role_policy.assume_cross_acount_role | data source |
| databricks_aws_bucket_policy.root_storage_bucket | data source |
| databricks_aws_crossaccount_policy.this | data source |

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| add_deployment_name | Whether to add the workspace name as a deployment name. The capability of adding a deployment name must be provided by Databricks: https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/mws_workspaces#deployment_name | `bool` | `true` | no |
| aws_region | AWS Region in which to provision the workspace, e.g. `eu-west-1` | `string` | n/a | yes |
| create_root_bucket | Whether to create and configure the root bucket. If false, the module will assume that `root_bucket_name` belongs to a valid root bucket that has already been created by the module | `bool` | `true` | no |
| databricks_account_id | Databricks account ID under which to provision the workspace | `string` | n/a | yes |
| default_tags | (Optional) Tags to be set by default on all resources created for the workspace | `map(string)` | `{}` | no |
| resource_prefix | Prefix to apply to the names of shared AWS resources to be created for the workspace | `string` | n/a | yes |
| root_bucket_name | Name of the root bucket for the workspace, e.g. `myworkspace-root-bucket`. It can be one already in use by other workspaces | `string` | n/a | yes |
| security_group_egress_ports | (Optional) List of custom ports to allow TCP egress access to 0.0.0.0/0 outside the security group. There is no need to specify ports 443, 3306 and 6666, as they are open by default, as recommended by Databricks | `list(number)` | `[]` | no |
| security_groups_to_allow_egress_to | (Optional) List of security group IDs to allow egress to within the VPC | `list(string)` | `null` | no |
| subnets | List of subnet definitions per Availability Zone. Each item creates two subnets:<br>- Databricks compute resources subnet: each compute resource takes two IPs, so a good range would be from 512 to 4096 IPs, depending on specific needs<br>- NAT Gateway subnet: each NAT subnet takes just one IP, so a /24 CIDR is more than enough<br>At least two items in different Availability Zones are required.<br>Note: Internet access is required for Databricks clusters to work. Every NAT Gateway requires an available EIP and a default Internet Gateway in the VPC. | <pre>list(object({<br>  main_subnet_cidr_block = string<br>  nat_subnet_cidr_block  = string<br>  availability_zone      = string<br>}))</pre> | n/a | yes |
| vpc_endpoints | (Optional) Map of VPC endpoints to create. The valid keys are `s3`, `kinesis-streams` and `sts`. If not specified, no VPC endpoints will be created. It is recommended to create all of them where possible. More info: https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html#regional-endpoints-1 | `map(bool)` | <pre>{<br>  "kinesis-streams": false,<br>  "s3": false,<br>  "sts": false<br>}</pre> | no |
| vpc_id | ID of the VPC in which to provision the workspace. The VPC must have a valid Internet Gateway | `string` | n/a | yes |
| workspace | Databricks workspace name. Optionally used as the deployment name, if `add_deployment_name` is true. | `string` | n/a | yes |
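
A minimal usage sketch follows. All values are placeholders, the `databricks_workspace` module label is arbitrary, and the GitHub source reference is an assumption based on the repository path; adjust everything to your environment:

```hcl
module "databricks_workspace" {
  # Hypothetical source reference; pin to a released tag with ?ref=<version>
  source = "github.com/OSS-Nextail/terraform-aws-databricks-workspace-custom-vpc"

  workspace             = "myworkspace"
  aws_region            = "eu-west-1"
  databricks_account_id = var.databricks_account_id # declared elsewhere
  resource_prefix       = "myworkspace"
  root_bucket_name      = "myworkspace-root-bucket"
  vpc_id                = "vpc-0123456789abcdef0"

  # At least two items in different Availability Zones are required
  subnets = [
    {
      main_subnet_cidr_block = "10.0.0.0/21"
      nat_subnet_cidr_block  = "10.0.8.0/24"
      availability_zone      = "eu-west-1a"
    },
    {
      main_subnet_cidr_block = "10.0.16.0/21"
      nat_subnet_cidr_block  = "10.0.24.0/24"
      availability_zone      = "eu-west-1b"
    },
  ]

  # All endpoints enabled, as recommended where possible
  vpc_endpoints = {
    "s3"              = true
    "kinesis-streams" = true
    "sts"             = true
  }

  default_tags = {
    environment = "production"
  }
}
```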

## Outputs

| Name | Description |
|------|-------------|
| cross_account_role_name | Name of the cross-account IAM role created for the Databricks workspace |
| databricks_host | Databricks workspace URL for the created workspace. |
| databricks_token | Databricks workspace token for the created workspace. Can be used to create resources in the workspace in the same Terraform state. |
| security_group_id | ID of the security group created for the Databricks workspace |
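
For example, the `databricks_host` and `databricks_token` outputs can feed a workspace-level Databricks provider in the same configuration. A minimal sketch, assuming the module instance is named `databricks_workspace` as in the usage example above (the provider alias and the resource shown are illustrative):

```hcl
# Workspace-level provider configured from the module outputs
provider "databricks" {
  alias = "workspace"

  host  = module.databricks_workspace.databricks_host
  token = module.databricks_workspace.databricks_token
}

# Illustrative workspace-level resource created through the aliased provider
resource "databricks_token" "automation" {
  provider = databricks.workspace

  comment          = "Token for CI automation"
  lifetime_seconds = 86400
}
```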
