Provisioning Azure Databricks workspace with a Hub & Spoke firewall for data exfiltration protection
This template provides an example deployment of Hub-Spoke networking with an egress firewall that controls all outbound traffic from the Databricks subnets. Details are described in: https://databricks.com/blog/2020/03/27/data-exfiltration-protection-with-azure-databricks.html
With this setup, you can set up firewall rules to block or allow egress traffic from your Databricks clusters. You can also use the firewall to block all access to storage accounts and use private endpoint connections to bypass it, so that access is allowed only to specific storage accounts.
To find the IP addresses and FQDNs for your deployment, go to: https://docs.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/udr
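As a minimal sketch of the private endpoint part, assuming the storage account and a subnet for the endpoint already exist. Every variable and resource name below is hypothetical and not defined by this module:

```hcl
# Hypothetical inputs; none of these names come from this module.
variable "resource_group_name" { type = string }
variable "location" { type = string }
variable "endpoint_subnet_id" { type = string }
variable "storage_account_id" { type = string }

# Private endpoint that reaches one specific storage account over the
# virtual network, so this traffic does not pass through the hub firewall.
resource "azurerm_private_endpoint" "allowed_storage" {
  name                = "allowed-storage-pe"
  location            = var.location
  resource_group_name = var.resource_group_name
  subnet_id           = var.endpoint_subnet_id

  private_service_connection {
    name                           = "allowed-storage-psc"
    private_connection_resource_id = var.storage_account_id
    is_manual_connection           = false
    # "dfs" targets the ADLS Gen2 endpoint; use "blob" for Blob storage.
    subresource_names = ["dfs"]
  }
}
```

Clusters also need to resolve the storage account name to the endpoint's private IP, typically through a private DNS zone, which is omitted here.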
Resources to be created:
- Resource group with random prefix
- Tags, including `Owner`, which is taken from `az account show --query user`
- Hub-Spoke topology, with hub firewall in hub vnet's subnet.
- Associated firewall rules: both FQDN-based rules and network rules using IP addresses.
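The two kinds of firewall rules listed above could look roughly like the following. This is a sketch under assumptions, not the module's actual rule set: `firewall_name` and `resource_group_name` are hypothetical inputs pointing at an existing firewall, the rule names and priorities are placeholders, and port 443 is assumed for both rules.

```hcl
# Hypothetical inputs describing an existing hub firewall.
variable "firewall_name" { type = string }
variable "resource_group_name" { type = string }

# These mirror the module inputs of the same names.
variable "spokecidr" { type = string }
variable "firewallfqdn" { type = list(string) }
variable "webappip" { type = list(string) }

# FQDN rule: allow HTTPS from the spoke only to the listed hostnames.
resource "azurerm_firewall_application_rule_collection" "adb_fqdn" {
  name                = "adb-allowed-fqdns"
  azure_firewall_name = var.firewall_name
  resource_group_name = var.resource_group_name
  priority            = 200
  action              = "Allow"

  rule {
    name             = "databricks-fqdns"
    source_addresses = [var.spokecidr]
    target_fqdns     = var.firewallfqdn

    protocol {
      port = "443"
      type = "Https"
    }
  }
}

# IP rule: allow TCP 443 from the spoke to the control plane web app IPs.
resource "azurerm_firewall_network_rule_collection" "adb_ip" {
  name                = "adb-control-plane"
  azure_firewall_name = var.firewall_name
  resource_group_name = var.resource_group_name
  priority            = 200
  action              = "Allow"

  rule {
    name                  = "databricks-webapp"
    source_addresses      = [var.spokecidr]
    destination_ports     = ["443"]
    destination_addresses = var.webappip
    protocols             = ["TCP"]
  }
}
```

Azure Firewall denies traffic that matches no rule, so only the destinations listed in these collections remain reachable from the spoke.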
Note: You can customize this module by adding, deleting, or updating the Azure resources to adapt it to your requirements. A deployment example using this module can be found in examples/adb-exfiltration-protection.

To use this module:
- Reference this module using one of the different module source types.
- Add a `variables.tf` with the same content as this module's variables.tf.
- Add a `terraform.tfvars` file and provide values for each defined variable.
- Add an `output.tf` file.
- (Optional) Configure your remote backend.
- Run `terraform init` to initialize Terraform and get the providers ready.
- Run `terraform apply` to create the resources.
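The steps above can be sketched as a small root configuration. The module label, the local `source` path, and all backend values are placeholders chosen for illustration; the `var.*` references assume the variables have been declared in your own variables.tf as described in the steps.

```hcl
terraform {
  # Optional remote backend; every value here is a placeholder.
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"
    storage_account_name = "tfstatestorage"
    container_name       = "tfstate"
    key                  = "adb-exfiltration-protection.tfstate"
  }
}

module "adb_exfiltration_protection" {
  # Hypothetical local path; replace with the module source type you use.
  source = "./modules/adb-exfiltration-protection"

  # Inputs with defaults, overridden here only for illustration.
  hubcidr    = var.hubcidr
  spokecidr  = var.spokecidr
  rglocation = var.rglocation

  # Required inputs, with values supplied through a .tfvars file.
  metastore    = var.metastore
  scc_relay    = var.scc_relay
  webappip     = var.webappip
  eventhubs    = var.eventhubs
  firewallfqdn = var.firewallfqdn
}
```

Terraform loads `terraform.tfvars` automatically. A file with another name, such as `variables.tfvars`, has to be passed explicitly with `-var-file`.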
Most of the variable values can be found at: https://learn.microsoft.com/en-us/azure/databricks/resources/supported-regions and https://docs.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/udr
In `variables.tfvars`, set these variables (bigger regions have multiple instances of each service):
metastore = ["consolidated-westeurope-prod-metastore.mysql.database.azure.com"]
scc_relay = ["tunnel.westeurope.azuredatabricks.net"]
webappip = ["52.230.27.216/32"] # given at UDR page
eventhubs = ["prod-westeurope-observabilityeventhubs.servicebus.windows.net"]
# find these for your region, follow Databricks blog tutorial.
firewallfqdn = ["dbartifactsprodseap.blob.core.windows.net","dbartifactsprodeap.blob.core.windows.net","dblogprodseasia.blob.core.windows.net","cdnjs.com"]
Requirements:

Name | Version |
---|---|
azurerm | =2.83.0 |
databricks | 0.3.10 |

Providers:

Name | Version |
---|---|
azurerm | 2.83.0 |
external | 2.2.0 |
random | 3.1.0 |
dns | 3.3.0 |

Modules:

No modules.
Inputs:

Name | Description | Type | Default | Required |
---|---|---|---|---|
bypass_scc_relay | n/a | bool | true | no |
dbfs_prefix | n/a | string | "dbfs" | no |
eventhubs | n/a | list(string) | n/a | yes |
firewallfqdn | n/a | list(string) | n/a | yes |
hubcidr | n/a | string | "10.178.0.0/20" | no |
metastore | n/a | list(string) | n/a | yes |
no_public_ip | n/a | bool | true | no |
private_subnet_endpoints | n/a | list | [] | no |
rglocation | n/a | string | "southeastasia" | no |
scc_relay | n/a | list(string) | n/a | yes |
spokecidr | n/a | string | "10.179.0.0/20" | no |
tags | n/a | map | {} | no |
webappip | n/a | list(string) | n/a | yes |
workspace_prefix | n/a | string | "adb" | no |
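For illustration, a few of these inputs could be declared in a variables.tf as follows. The types and defaults are taken from the table; the descriptions and the `validation` block are additions for this sketch and are not stated to be part of the module.

```hcl
variable "hubcidr" {
  type        = string
  default     = "10.178.0.0/20"
  description = "Address space of the hub virtual network (description added for this sketch)."

  # Added check, not from the module: reject values that are not valid CIDR blocks.
  validation {
    condition     = can(cidrhost(var.hubcidr, 0))
    error_message = "The hubcidr value must be a valid CIDR block, for example 10.178.0.0/20."
  }
}

variable "spokecidr" {
  type    = string
  default = "10.179.0.0/20"
}

# Required input: no default, so a value must be supplied.
variable "metastore" {
  type        = list(string)
  description = "Metastore FQDNs for your region (description added for this sketch)."
}

# Required input: no default, so a value must be supplied.
variable "webappip" {
  type        = list(string)
  description = "Control plane web app IPs in CIDR notation (description added for this sketch)."
}
```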
Outputs:

Name | Description |
---|---|
arm_client_id | n/a |
arm_subscription_id | n/a |
arm_tenant_id | n/a |
azure_region | n/a |
databricks_azure_workspace_resource_id | n/a |
resource_group | n/a |
workspace_url | n/a |
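A root configuration that calls this module can re-export the outputs it needs in its own output.tf. The sketch below assumes the module block is labelled `adb_exfiltration_protection`, which is a hypothetical label; adjust it to match your own module block.

```hcl
# Assumes a block `module "adb_exfiltration_protection" { ... }` exists
# in the same configuration (hypothetical label).
output "workspace_url" {
  value = module.adb_exfiltration_protection.workspace_url
}

output "databricks_azure_workspace_resource_id" {
  value = module.adb_exfiltration_protection.databricks_azure_workspace_resource_id
}

output "resource_group" {
  value = module.adb_exfiltration_protection.resource_group
}
```

After `terraform apply`, `terraform output workspace_url` prints the exported value.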