Provisioning Azure Databricks workspace with a Hub & Spoke firewall for data exfiltration protection

This template provides an example deployment of Hub-Spoke networking with an egress firewall that controls all outbound traffic from the Databricks subnets. The design is described in: https://databricks.com/blog/2020/03/27/data-exfiltration-protection-with-azure-databricks.html

With this setup, you can define firewall rules to block or allow egress traffic from your Databricks clusters. You can also use the firewall to block all access to storage accounts and use private endpoint connections to bypass the firewall, so that access is allowed only to specific storage accounts.
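As a rough illustration of an allow rule, the sketch below shows an Azure Firewall application rule collection that permits HTTPS egress from the spoke network to the FQDNs in firewallfqdn. It is not copied from this module's source: the resource label example_allow_fqdn, the rule names, and the priority are assumptions, while azurerm_firewall.hubfw, azurerm_resource_group.this, var.spokecidr, and var.firewallfqdn are names listed in the Resources and Inputs sections.

```hcl
# Sketch only: label, names, and priority are illustrative assumptions.
resource "azurerm_firewall_application_rule_collection" "example_allow_fqdn" {
  name                = "example-allow-fqdn"
  azure_firewall_name = azurerm_firewall.hubfw.name
  resource_group_name = azurerm_resource_group.this.name
  priority            = 200
  action              = "Allow"

  rule {
    name             = "allowed-fqdns"
    source_addresses = [var.spokecidr]    # traffic originating in the spoke VNet
    target_fqdns     = var.firewallfqdn   # only these FQDNs are reachable

    protocol {
      port = "443"
      type = "Https"
    }
  }
}
```

Traffic that matches no allow rule is denied by Azure Firewall by default, so FQDNs missing from the list stay unreachable from the clusters.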

To find the IP addresses and FQDNs for your deployment's region, go to: https://docs.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/udr

Overall Architecture

(Architecture diagram)

Resources to be created:

  • Resource group with random prefix
  • Tags, including Owner, which is taken from az account show --query user
  • Hub-Spoke topology, with the hub firewall in a subnet of the hub VNet.
  • Associated firewall rules: application rules using FQDNs and network rules using IP addresses.
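The egress path in a topology like this usually combines a user-defined route that sends spoke traffic to the firewall with a network rule that allows specific IP destinations. The following is a minimal sketch under that assumption, not the module's actual code: the labels example_udr and example_allow_ip, the names, the priority, and port 443 are illustrative, while azurerm_firewall.hubfw, azurerm_resource_group.this, var.spokecidr, and var.webappip come from the Resources and Inputs sections.

```hcl
# Sketch only: send all outbound traffic from the spoke subnets to the hub firewall.
resource "azurerm_route_table" "example_udr" {
  name                = "example-udr"
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name

  route {
    name                   = "to-firewall"
    address_prefix         = "0.0.0.0/0"
    next_hop_type          = "VirtualAppliance"
    next_hop_in_ip_address = azurerm_firewall.hubfw.ip_configuration[0].private_ip_address
  }
}

# Sketch only: allow the spoke network to reach the listed IP ranges over TCP 443.
resource "azurerm_firewall_network_rule_collection" "example_allow_ip" {
  name                = "example-allow-ip"
  azure_firewall_name = azurerm_firewall.hubfw.name
  resource_group_name = azurerm_resource_group.this.name
  priority            = 200
  action              = "Allow"

  rule {
    name                  = "webapp"
    source_addresses      = [var.spokecidr]
    destination_addresses = var.webappip
    destination_ports     = ["443"]
    protocols             = ["TCP"]
  }
}
```

The route table only takes effect once it is associated with the Databricks subnets, which is what the azurerm_subnet_route_table_association resources in the Resources section are for.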

How to use

Note: You can customize this module by adding, deleting, or updating Azure resources to fit your requirements. A deployment example using this module can be found in examples/adb-exfiltration-protection

  1. Reference this module using one of the different module source types
  2. Add a variables.tf with the same content as this module's variables.tf
  3. Add a terraform.tfvars file and provide values to each defined variable
  4. Add an output.tf file.
  5. (Optional) Configure your remote backend
  6. Run terraform init to initialize Terraform and download the providers.
  7. Run terraform apply to create the resources.
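Steps 1, 4, and 5 might look roughly like the sketch below. The module label adb_exfiltration_protection, the relative source path, and all backend values are hypothetical placeholders; the input and output names follow the Inputs and Outputs sections.

```hcl
# main.tf (sketch): the source path is an assumption, adjust it to your layout
# or use another module source type.
module "adb_exfiltration_protection" {
  source = "../../modules/adb-exfiltration-protection"

  # Required inputs, passed through from variables.tf / terraform.tfvars
  metastore    = var.metastore
  scc_relay    = var.scc_relay
  webappip     = var.webappip
  eventhubs    = var.eventhubs
  firewallfqdn = var.firewallfqdn
}

# output.tf (sketch): re-export a module output
output "workspace_url" {
  value = module.adb_exfiltration_protection.workspace_url
}

# Optional remote backend (sketch): every value here is a placeholder
terraform {
  backend "azurerm" {
    resource_group_name  = "example-tfstate-rg"
    storage_account_name = "exampletfstate"
    container_name       = "tfstate"
    key                  = "adb-exfiltration-protection.tfstate"
  }
}
```

With these files in place, terraform init configures the backend and downloads the providers, and terraform apply creates the resources.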

How to fill in variable values

Most of the values can be found at: https://learn.microsoft.com/en-us/azure/databricks/resources/supported-regions and https://docs.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/udr

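In terraform.tfvars, set these variables. Larger regions have multiple instances of each service, so each variable takes a list. The values shown are examples; replace them with the entries for your own region: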

metastore         = ["consolidated-westeurope-prod-metastore.mysql.database.azure.com"]
scc_relay         = ["tunnel.westeurope.azuredatabricks.net"]
webappip          = ["52.230.27.216/32"] # given at UDR page
eventhubs         = ["prod-westeurope-observabilityeventhubs.servicebus.windows.net"]
# find these for your region, follow Databricks blog tutorial.
firewallfqdn = ["dbartifactsprodseap.blob.core.windows.net","dbartifactsprodeap.blob.core.windows.net","dblogprodseasia.blob.core.windows.net","cdnjs.com"]
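For reference, the required variables above could be declared in variables.tf along these lines. The types follow the Inputs section; the description strings are my own reading of the example values and are not taken from the module.

```hcl
# variables.tf (sketch): descriptions are assumptions, types follow the Inputs section
variable "metastore" {
  type        = list(string)
  description = "FQDNs of the regional metastore"
}

variable "scc_relay" {
  type        = list(string)
  description = "FQDNs of the regional SCC relay"
}

variable "webappip" {
  type        = list(string)
  description = "Webapp IP ranges in CIDR notation, from the UDR page"
}

variable "eventhubs" {
  type        = list(string)
  description = "FQDNs of the regional Event Hubs endpoints"
}

variable "firewallfqdn" {
  type        = list(string)
  description = "Additional FQDNs to allow through the firewall"
}
```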

Requirements

Name Version
azurerm =2.83.0
databricks 0.3.10
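These constraints could be pinned in a root configuration roughly as follows. The provider source addresses are assumptions: hashicorp/azurerm is the standard address, and databrickslabs/databricks is the namespace that 0.3.x releases of the Databricks provider were published under.

```hcl
# versions.tf (sketch): source addresses are assumptions, versions are from the table above
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=2.83.0"
    }
    databricks = {
      source  = "databrickslabs/databricks"
      version = "0.3.10"
    }
  }
}

# The azurerm provider requires a features block, even if empty.
provider "azurerm" {
  features {}
}
```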

Providers

Name Version
azurerm 2.83.0
external 2.2.0
random 3.1.0
dns 3.3.0

Modules

No modules.

Resources

Name Type
azurerm_databricks_workspace.this resource
azurerm_firewall.hubfw resource
azurerm_firewall_application_rule_collection.adbfqdn resource
azurerm_firewall_network_rule_collection.adbfnetwork resource
azurerm_network_security_group.this resource
azurerm_public_ip.fwpublicip resource
azurerm_resource_group.this resource
azurerm_route_table.adbroute resource
azurerm_storage_account.allowedstorage resource
azurerm_storage_account.deniedstorage resource
azurerm_subnet.hubfw resource
azurerm_subnet.private resource
azurerm_subnet.public resource
azurerm_subnet_network_security_group_association.private resource
azurerm_subnet_network_security_group_association.public resource
azurerm_subnet_route_table_association.privateudr resource
azurerm_subnet_route_table_association.publicudr resource
azurerm_virtual_network.hubvnet resource
azurerm_virtual_network.this resource
azurerm_virtual_network_peering.hubvnet resource
azurerm_virtual_network_peering.spokevnet resource
random_string.naming resource
azurerm_client_config.current data source
external_external.me data source

Inputs

Name                      Description  Type          Default          Required
bypass_scc_relay          n/a          bool          true             no
dbfs_prefix               n/a          string        "dbfs"           no
eventhubs                 n/a          list(string)  n/a              yes
firewallfqdn              n/a          list(string)  n/a              yes
hubcidr                   n/a          string        "10.178.0.0/20"  no
metastore                 n/a          list(string)  n/a              yes
no_public_ip              n/a          bool          true             no
private_subnet_endpoints  n/a          list          []               no
rglocation                n/a          string        "southeastasia"  no
scc_relay                 n/a          list(string)  n/a              yes
spokecidr                 n/a          string        "10.179.0.0/20"  no
tags                      n/a          map           {}               no
webappip                  n/a          list(string)  n/a              yes
workspace_prefix          n/a          string        "adb"            no
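The optional inputs can be overridden in the same terraform.tfvars file. The values below are purely illustrative assumptions; only the variable names and types come from the table above.

```hcl
# terraform.tfvars (sketch): optional overrides, all values are examples
rglocation       = "westeurope"      # default is "southeastasia"
workspace_prefix = "adbdemo"
hubcidr          = "10.178.0.0/20"   # hub VNet address space
spokecidr        = "10.179.0.0/20"   # spoke VNet address space

tags = {
  Environment = "dev"
}
```

Because the hub and spoke VNets are peered, hubcidr and spokecidr must not overlap.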

Outputs

Name Description
arm_client_id n/a
arm_subscription_id n/a
arm_tenant_id n/a
azure_region n/a
databricks_azure_workspace_resource_id n/a
resource_group n/a
workspace_url n/a
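One possible use of these outputs is to configure the Databricks provider against the newly created workspace. This is a sketch under assumptions: the module label adb_exfiltration_protection is hypothetical, and authentication is assumed to come from the Azure CLI or environment variables.

```hcl
# Sketch only: the module label is a hypothetical name for your module block.
provider "databricks" {
  azure_workspace_resource_id = module.adb_exfiltration_protection.databricks_azure_workspace_resource_id
}

output "workspace_url" {
  value = module.adb_exfiltration_protection.workspace_url
}
```

Configuring a provider from values that are only known after the workspace exists may require applying the workspace first and the Databricks resources in a second run.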