In both of the specified staging and prod workspaces, this module:
- Links a pre-existing AAD application and associates it with a newly created Azure Databricks service principal, configuring appropriate permissions and entitlements to run CI/CD for a project.
- Creates a workspace directory as a container for project-specific resources.
The service principals are granted CAN_MANAGE permissions on the created workspace directories.
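
To make the directory permission concrete, here is a minimal sketch of how a workspace directory and a `CAN_MANAGE` grant might be expressed for the staging workspace. It is not the module's actual source. The resource names are taken from the resource list in this README, while the variable `staging_sp_application_id` is a hypothetical input introduced only for illustration. The sketch also assumes a `databricks.staging` provider alias is configured and that the provider version in use supports `directory_path` on `databricks_permissions`.

```hcl
# Illustrative only: an approximation of what the module does in staging.
variable "project_directory_path" {
  type        = string
  description = "Workspace directory to create; parent directories must already exist."
}

# Hypothetical input: application (client) ID of the staging service principal.
variable "staging_sp_application_id" {
  type = string
}

resource "databricks_directory" "staging_directory" {
  provider = databricks.staging
  path     = var.project_directory_path
}

resource "databricks_permissions" "staging_directory_usage" {
  provider       = databricks.staging
  directory_path = databricks_directory.staging_directory.path

  access_control {
    # Service principals are referenced by their application ID.
    service_principal_name = var.staging_sp_application_id
    permission_level       = "CAN_MANAGE"
  }
}
```

The prod workspace would presumably mirror this with the `databricks.prod` provider.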
NOTE:
- This module is in preview so it is still experimental and subject to change. Feedback is welcome!
- The Databricks providers that are passed into the module should be configured with workspace admin permissions.
- The module assumes that one of the two Azure Infrastructure Modules (with Creation or Linking) has already been applied, namely that service principal groups with token usage permissions have been created, either with the default name `"mlops-service-principals"` or with the name specified in the `service_principal_group_name` field.
- The service principal AAD tokens are short-lived (<60 minutes in most cases). If a long-lived token is desired, the AAD token can be used to authenticate into a Databricks provider and provision a personal access token (PAT) for the service principal.
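
The last note can be sketched as follows for the staging workspace: the module's AAD token output authenticates a second provider alias, which then provisions a PAT through the `databricks_token` resource. The alias `staging_sp`, the resource name `staging_sp_pat`, the output name, and the 90-day lifetime are all assumptions chosen for illustration. Treat this as a starting point rather than a prescribed setup.

```hcl
# Look up the workspace URL using the admin-configured staging provider.
data "databricks_current_user" "staging_user" {
  provider = databricks.staging
}

# Provider that authenticates AS the service principal via its short-lived AAD token.
provider "databricks" {
  alias = "staging_sp"
  host  = data.databricks_current_user.staging_user.workspace_url
  token = module.mlops_azure_project_with_sp_linking.staging_service_principal_aad_token
}

# Long-lived PAT owned by the service principal (lifetime is a hypothetical choice).
resource "databricks_token" "staging_sp_pat" {
  provider         = databricks.staging_sp
  comment          = "PAT for the staging CI/CD service principal"
  lifetime_seconds = 60 * 60 * 24 * 90 # 90 days
}

output "staging_sp_pat" {
  value     = databricks_token.staging_sp_pat.token_value
  sensitive = true
}
```

The PAT can only be created while the AAD token is still valid, so this would need to be applied in the same run that produces the token, or with a freshly supplied one.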
This option uses the client secrets and tenant IDs to generate AAD tokens for authentication. The advantage of this approach is that short-lived AAD tokens (normally with a maximum lifetime of 60 minutes) do not have to be generated manually each time this module is used.
NOTE: This option requires that Python 3.8+ be installed to obtain the service principal's AAD token.
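
The Option 1 example references `var.azure_staging_client_secret` and `var.azure_prod_client_secret` without declaring them. A minimal sketch of how these might be declared in the calling configuration is shown here. The descriptions are illustrative, and marking the variables as `sensitive` is a suggestion rather than a requirement stated by the module.

```hcl
variable "azure_staging_client_secret" {
  type        = string
  description = "Client secret of the AAD service principal for the staging workspace."
  sensitive   = true # Redacts the value in plan/apply output.
}

variable "azure_prod_client_secret" {
  type        = string
  description = "Client secret of the AAD service principal for the prod workspace."
  sensitive   = true
}
```

Values for such variables can be supplied through environment variables named `TF_VAR_azure_staging_client_secret` and `TF_VAR_azure_prod_client_secret`, which keeps the secrets out of version-controlled files.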
provider "databricks" {
  alias = "staging" # Authenticate using preferred method as described in Databricks provider
}

provider "databricks" {
  alias = "prod" # Authenticate using preferred method as described in Databricks provider
}

module "mlops_azure_project_with_sp_linking" {
  source = "databricks/mlops-azure-project-with-sp-linking/databricks"
  providers = {
    databricks.staging = databricks.staging
    databricks.prod    = databricks.prod
  }
  service_principal_name      = "example-name"
  project_directory_path      = "/dir-name"
  azure_staging_client_id     = "k9l8m7n6o5-e5f6-g7h8-i9j0-a1b2c3d4p4"
  azure_staging_client_secret = var.azure_staging_client_secret # This value is sensitive.
  azure_staging_tenant_id     = "a1b2c3d4-e5f6-g7h8-i9j0-k9l8m7n6o5p4"
  azure_prod_client_id        = "k9l8m7n6p4-e5f6-g7h8-i9j0-a1b2c3d4o5"
  azure_prod_client_secret    = var.azure_prod_client_secret # This value is sensitive.
  azure_prod_tenant_id        = "a1b2c3d4-e5f6-g7h8-i9j0-k9l8m7n6o5p4"
}

This option will use the provided AAD tokens for authentication. The advantage to this approach is not having to create/provide client secrets and tenant IDs.
provider "databricks" {
  alias = "staging" # Authenticate using preferred method as described in Databricks provider
}

provider "databricks" {
  alias = "prod" # Authenticate using preferred method as described in Databricks provider
}

module "mlops_azure_project_with_sp_linking" {
  source = "databricks/mlops-azure-project-with-sp-linking/databricks"
  providers = {
    databricks.staging = databricks.staging
    databricks.prod    = databricks.prod
  }
  service_principal_name  = "example-name"
  project_directory_path  = "/dir-name"
  azure_staging_client_id = "k9l8m7n6o5-e5f6-g7h8-i9j0-a1b2c3d4p4"
  azure_staging_aad_token = var.azure_staging_aad_token # This value is sensitive.
  azure_prod_client_id    = "k9l8m7n6p4-e5f6-g7h8-i9j0-a1b2c3d4o5"
  azure_prod_aad_token    = var.azure_prod_aad_token # This value is sensitive.
}

Configuring Git credentials for the service principals can be helpful for common use cases such as Git authorization for Remote Git Jobs.
data "databricks_current_user" "staging_user" {
  provider = databricks.staging
}

data "databricks_current_user" "prod_user" {
  provider = databricks.prod
}

provider "databricks" {
  alias = "staging_sp"
  host  = data.databricks_current_user.staging_user.workspace_url
  token = module.mlops_azure_project_with_sp_linking.staging_service_principal_aad_token
}

provider "databricks" {
  alias = "prod_sp"
  host  = data.databricks_current_user.prod_user.workspace_url
  token = module.mlops_azure_project_with_sp_linking.prod_service_principal_aad_token
}

resource "databricks_git_credential" "staging_git" {
  provider              = databricks.staging_sp
  git_username          = var.git_username
  git_provider          = var.git_provider
  personal_access_token = var.git_token # This should be configured with `repo` scope for Databricks Repos.
}

resource "databricks_git_credential" "prod_git" {
  provider              = databricks.prod_sp
  git_username          = var.git_username
  git_provider          = var.git_provider
  personal_access_token = var.git_token # This should be configured with `repo` scope for Databricks Repos.
}

Usage example with MLOps Azure Infrastructure Module with Service Principal Linking
provider "databricks" {
  alias = "dev" # Authenticate using preferred method as described in Databricks provider
}

provider "databricks" {
  alias = "staging" # Authenticate using preferred method as described in Databricks provider
}

provider "databricks" {
  alias = "prod" # Authenticate using preferred method as described in Databricks provider
}

module "mlops_azure_infrastructure_with_sp_linking" {
  source = "databricks/mlops-azure-infrastructure-with-sp-linking/databricks"
  providers = {
    databricks.dev     = databricks.dev
    databricks.staging = databricks.staging
    databricks.prod    = databricks.prod
  }
  staging_workspace_id          = "123456789"
  prod_workspace_id             = "987654321"
  azure_staging_client_id       = "k9l8m7n6o5-e5f6-g7h8-i9j0-a1b2c3d4p4"
  azure_staging_client_secret   = var.azure_staging_client_secret # This value is sensitive.
  azure_staging_tenant_id       = "a1b2c3d4-e5f6-g7h8-i9j0-k9l8m7n6o5p4"
  azure_prod_client_id          = "k9l8m7n6p4-e5f6-g7h8-i9j0-a1b2c3d4o5"
  azure_prod_client_secret      = var.azure_prod_client_secret # This value is sensitive.
  azure_prod_tenant_id          = "a1b2c3d4-e5f6-g7h8-i9j0-k9l8m7n6o5p4"
  additional_token_usage_groups = ["users"] # This field is optional.
}

module "mlops_azure_project_with_sp_linking" {
  source = "databricks/mlops-azure-project-with-sp-linking/databricks"
  providers = {
    databricks.staging = databricks.staging
    databricks.prod    = databricks.prod
  }
  service_principal_name       = "example-name"
  project_directory_path       = "/dir-name"
  azure_staging_client_id      = "k9l8m7n6o5-e5f6-g7h8-i9j0-a1b2c3d4p4"
  azure_staging_client_secret  = var.azure_staging_client_secret # This value is sensitive.
  azure_staging_tenant_id      = "a1b2c3d4-e5f6-g7h8-i9j0-k9l8m7n6o5p4"
  azure_prod_client_id         = "k9l8m7n6p4-e5f6-g7h8-i9j0-a1b2c3d4o5"
  azure_prod_client_secret     = var.azure_prod_client_secret # This value is sensitive.
  azure_prod_tenant_id         = "a1b2c3d4-e5f6-g7h8-i9j0-k9l8m7n6o5p4"
  service_principal_group_name = module.mlops_azure_infrastructure_with_sp_linking.service_principal_group_name
  # The above field is optional, especially since in this case service_principal_group_name will be mlops-service-principals either way,
  # but this also serves to create an implicit dependency. Can also be replaced with the following line to create an explicit dependency:
  # depends_on = [module.mlops_azure_infrastructure_with_sp_linking]
}

| Name | Version |
|---|---|
| terraform | >=1.1.6 |
| databricks | >=0.5.8 |
| python (Option 1 only) | >=3.8 |

| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| service_principal_name | The display name for the service principals. | string | N/A | yes |
| project_directory_path | Path/Name of Azure Databricks workspace directory to be created for the project. NOTE: The parent directories in the path must already be created. | string | N/A | yes |
| azure_staging_client_id | The client ID of the AAD service principal in the staging workspace that will be used to READ the model registry from the dev workspace. | string | N/A | yes |
| azure_staging_aad_token | The AAD token of the service principal in the staging workspace. This will need to be manually refreshed once it expires (often within several minutes). NOTE: Only this or both azure_staging_client_secret & azure_staging_tenant_id need to be provided. | string | null | no |
| azure_staging_client_secret | The client secret of the AAD service principal in the staging workspace. NOTE: If azure_staging_aad_token is not provided, this and azure_staging_tenant_id must be provided to generate an AAD token. | string | null | no |
| azure_staging_tenant_id | The tenant ID of the AAD service principal in the staging workspace. NOTE: If azure_staging_aad_token is not provided, this and azure_staging_client_secret must be provided to generate an AAD token. | string | null | no |
| azure_prod_client_id | The client ID of the AAD service principal in the prod workspace that will be used to READ the model registry from the dev & staging workspaces. | string | N/A | yes |
| azure_prod_aad_token | The AAD token of the service principal in the prod workspace. This will need to be manually refreshed once it expires (often within several minutes). NOTE: Only this or both azure_prod_client_secret & azure_prod_tenant_id need to be provided. | string | null | no |
| azure_prod_client_secret | The client secret of the AAD service principal in the prod workspace. NOTE: If azure_prod_aad_token is not provided, this and azure_prod_tenant_id must be provided to generate an AAD token. | string | null | no |
| azure_prod_tenant_id | The tenant ID of the AAD service principal in the prod workspace. NOTE: If azure_prod_aad_token is not provided, this and azure_prod_client_secret must be provided to generate an AAD token. | string | null | no |
| service_principal_group_name | The name of the service principal group in the staging and prod workspace. The created service principals will be added to this group. | string | "mlops-service-principals" | no |

| Name | Description | Type | Sensitive |
|---|---|---|---|
| project_directory_path | Path/Name of Azure Databricks workspace directory created for the project. | string | no |
| staging_service_principal_application_id | Application ID of the created Azure Databricks service principal in the staging workspace. Identical to the Azure client ID of the linked AAD application associated with the service principal. | string | no |
| staging_service_principal_aad_token | Sensitive AAD token value of the created Azure Databricks service principal in the staging workspace. | string | yes |
| prod_service_principal_application_id | Application ID of the created Azure Databricks service principal in the prod workspace. Identical to the Azure client ID of the linked AAD application associated with the service principal. | string | no |
| prod_service_principal_aad_token | Sensitive AAD token value of the created Azure Databricks service principal in the prod workspace. | string | yes |
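
As a hedged illustration of consuming the outputs listed above, a calling configuration might re-export them as shown here. The output names on the left (`staging_sp_application_id`, `staging_sp_aad_token`) are hypothetical. The module output names on the right come from the table. Terraform requires `sensitive = true` on any root-level output derived from a sensitive value, which applies to the AAD token outputs.

```hcl
# Non-sensitive: safe to display in plan/apply output.
output "staging_sp_application_id" {
  value = module.mlops_azure_project_with_sp_linking.staging_service_principal_application_id
}

# Sensitive: must be flagged, otherwise Terraform rejects the configuration.
output "staging_sp_aad_token" {
  value     = module.mlops_azure_project_with_sp_linking.staging_service_principal_aad_token
  sensitive = true
}
```

A sensitive output is redacted in normal CLI output but can still be read explicitly, for example with `terraform output -raw staging_sp_aad_token`.
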
| Name | Authentication | Use |
|---|---|---|
| databricks.staging | Provided by the user. | Create group, directory, and service principal module in the staging workspace. |
| databricks.prod | Provided by the user. | Create group, directory, and service principal module in the prod workspace. |

| Name | Type |
|---|---|
| databricks_group.staging_sp_group | data source |
| databricks_group.prod_sp_group | data source |
| databricks_directory.staging_directory | resource |
| databricks_permissions.staging_directory_usage | resource |
| databricks_directory.prod_directory | resource |
| databricks_permissions.prod_directory_usage | resource |
| azure-link-service-principal.link_staging_sp | module |
| azure-link-service-principal.link_prod_sp | module |

- AAD token generation occasionally fails with `"HTTP Error 400: Bad Request"` but the query is not actually invalid.
  - Solution: Re-run `terraform apply` and the error should disappear.