This module creates and configures service principals with appropriate permissions and entitlements to run CI/CD for a project, and creates a workspace directory as a container for project-specific resources for the Databricks AWS staging and prod workspaces.

databricks/terraform-databricks-mlops-aws-project

MLOps AWS Project Module

In both of the specified staging and prod workspaces, this module:

  • Creates and configures a service principal with appropriate permissions and entitlements to run CI/CD for a project.
  • Creates a workspace directory as a container for project-specific resources.

The service principals are granted CAN_MANAGE permissions on the created workspace directories.
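
To illustrate what a CAN_MANAGE grant on a directory looks like in Terraform, here is a minimal sketch for a single workspace. It is not the module's actual source: the resource labels (`example_sp`, `example_dir`, `example_dir_usage`) are hypothetical, and the module may wire these resources differently.

```hcl
# Hypothetical stand-alone sketch; the module creates equivalent resources for you.
resource "databricks_service_principal" "example_sp" {
  provider     = databricks.staging
  display_name = "example-name"
}

resource "databricks_directory" "example_dir" {
  provider = databricks.staging
  path     = "/dir-name"
}

# Grant the service principal CAN_MANAGE on the directory.
resource "databricks_permissions" "example_dir_usage" {
  provider       = databricks.staging
  directory_path = databricks_directory.example_dir.path

  access_control {
    service_principal_name = databricks_service_principal.example_sp.application_id
    permission_level       = "CAN_MANAGE"
  }
}
```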

NOTE:

  1. This module is in preview so it is still experimental and subject to change. Feedback is welcome!
  2. The Databricks providers that are passed into the module must be configured with workspace admin permissions.
  3. The module assumes that the MLOps AWS Infrastructure Module has already been applied, that is, that a service principal group with token usage permissions already exists in each workspace, either under the default name "mlops-service-principals" or under the name passed in the service_principal_group_name field.
  4. The service principal tokens are created with a default expiration of 100 days (8640000 seconds), and the module will need to be re-applied after this time to refresh the tokens.

Usage

provider "databricks" {
  alias = "staging"     # Authenticate using preferred method as described in Databricks provider
}

provider "databricks" {
  alias = "prod"     # Authenticate using preferred method as described in Databricks provider
}

module "mlops_aws_project" {
  source = "databricks/mlops-aws-project/databricks"
  providers = {
    databricks.staging = databricks.staging
    databricks.prod    = databricks.prod
  }
  service_principal_name = "example-name"
  project_directory_path = "/dir-name"
}

Usage example with MLOps AWS Infrastructure Module

provider "databricks" {
  alias = "dev" # Authenticate using preferred method as described in Databricks provider
}

provider "databricks" {
  alias = "staging"     # Authenticate using preferred method as described in Databricks provider
}

provider "databricks" {
  alias = "prod"     # Authenticate using preferred method as described in Databricks provider
}

module "mlops_aws_infrastructure" {
  source = "databricks/mlops-aws-infrastructure/databricks"
  providers = {
    databricks.dev     = databricks.dev
    databricks.staging = databricks.staging
    databricks.prod    = databricks.prod
  }
  staging_workspace_id          = "123456789"
  prod_workspace_id             = "987654321"
  additional_token_usage_groups = ["users"]     # This field is optional.
}


module "mlops_aws_project" {
  source = "databricks/mlops-aws-project/databricks"
  providers = {
    databricks.staging = databricks.staging
    databricks.prod    = databricks.prod
  }
  service_principal_name       = "example-name"
  project_directory_path       = "/dir-name"
  service_principal_group_name = module.mlops_aws_infrastructure.service_principal_group_name
  # The field above is optional: service_principal_group_name is "mlops-service-principals" either way.
  # Referencing the module output also creates an implicit dependency on the infrastructure module.
  # To declare the dependency explicitly instead, replace the field with:
  # depends_on = [module.mlops_aws_infrastructure]
}

Usage example with Git credentials for service principal

Registering Git credentials for the service principals can be helpful for common use cases such as Git authorization for Remote Git Jobs.

data "databricks_current_user" "staging_user" {
  provider = databricks.staging
}

data "databricks_current_user" "prod_user" {
  provider = databricks.prod
}

provider "databricks" {
  alias = "staging_sp"
  host  = data.databricks_current_user.staging_user.workspace_url
  token = module.mlops_aws_project.staging_service_principal_token
}

provider "databricks" {
  alias = "prod_sp"
  host  = data.databricks_current_user.prod_user.workspace_url
  token = module.mlops_aws_project.prod_service_principal_token
}

resource "databricks_git_credential" "staging_git" {
  provider              = databricks.staging_sp
  git_username          = var.git_username
  git_provider          = var.git_provider
  personal_access_token = var.git_token    # This should be configured with `repo` scope for Databricks Repos.
}

resource "databricks_git_credential" "prod_git" {
  provider              = databricks.prod_sp
  git_username          = var.git_username
  git_provider          = var.git_provider
  personal_access_token = var.git_token    # This should be configured with `repo` scope for Databricks Repos.
}
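
The example above references `var.git_username`, `var.git_provider`, and `var.git_token` without declaring them. A minimal sketch of the declarations might look like the following; the descriptions are assumptions, and the accepted `git_provider` values should be checked against the Databricks provider documentation.

```hcl
variable "git_username" {
  type        = string
  description = "Git username associated with the personal access token."
}

variable "git_provider" {
  type        = string
  description = "Git provider identifier as accepted by databricks_git_credential."
}

variable "git_token" {
  type        = string
  description = "Git personal access token used by the service principals."
  sensitive   = true # Keeps the token out of plan and apply output.
}
```

Marking `git_token` as sensitive only redacts it from CLI output; the value is still written to the Terraform state, so the state backend should be access-controlled.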

Requirements

| Name | Version |
| --- | --- |
| terraform | >=1.1.6 |
| databricks | >=0.5.8 |
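
A root module that consumes this module could pin these versions as sketched below. The provider source address `databricks/databricks` is an assumption: older provider releases were published under a different registry namespace, so adjust it to match the release you install.

```hcl
terraform {
  required_version = ">= 1.1.6"

  required_providers {
    databricks = {
      # Assumed registry address; verify it against the provider release you use.
      source  = "databricks/databricks"
      version = ">= 0.5.8"
    }
  }
}
```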

Inputs

| Name | Description | Type | Default | Required |
| --- | --- | --- | --- | --- |
| service_principal_name | The display name for the service principals. | string | N/A | yes |
| project_directory_path | Path/Name of Databricks workspace directory to be created for the project. NOTE: The parent directories in the path must already be created. | string | N/A | yes |
| service_principal_group_name | The name of the service principal group in the staging and prod workspace. The created service principals will be added to this group. | string | "mlops-service-principals" | no |
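
Because the parent directories of `project_directory_path` must already exist, one option is to create them in the same configuration before the module runs. The sketch below assumes a hypothetical parent directory `/projects` and hypothetical resource labels; it is one possible arrangement, not a requirement of the module.

```hcl
# Hypothetical parent directory, created in both workspaces.
resource "databricks_directory" "staging_parent" {
  provider = databricks.staging
  path     = "/projects"
}

resource "databricks_directory" "prod_parent" {
  provider = databricks.prod
  path     = "/projects"
}

module "mlops_aws_project" {
  source = "databricks/mlops-aws-project/databricks"
  providers = {
    databricks.staging = databricks.staging
    databricks.prod    = databricks.prod
  }
  service_principal_name = "example-name"
  project_directory_path = "/projects/dir-name"

  # Ensure both parent directories exist before the project directory is created.
  depends_on = [
    databricks_directory.staging_parent,
    databricks_directory.prod_parent,
  ]
}
```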

Outputs

| Name | Description | Type | Sensitive |
| --- | --- | --- | --- |
| project_directory_path | Path/Name of Databricks workspace directory created for the project. | string | no |
| staging_service_principal_application_id | Application ID of the created Databricks service principal in the staging workspace. | string | no |
| staging_service_principal_token | Sensitive personal access token (PAT) value of the created Databricks service principal in the staging workspace. | string | yes |
| prod_service_principal_application_id | Application ID of the created Databricks service principal in the prod workspace. | string | no |
| prod_service_principal_token | Sensitive personal access token (PAT) value of the created Databricks service principal in the prod workspace. | string | yes |
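
If the tokens need to be handed to a CI/CD system, the root module can re-export them. The sketch below uses hypothetical output names (`staging_sp_token`, `prod_sp_token`); Terraform requires `sensitive = true` on any root output whose value is derived from a sensitive value.

```hcl
output "staging_sp_token" {
  value     = module.mlops_aws_project.staging_service_principal_token
  sensitive = true # Required because the module output is marked sensitive.
}

output "prod_sp_token" {
  value     = module.mlops_aws_project.prod_service_principal_token
  sensitive = true
}
```

A value exported this way can be read with `terraform output -raw staging_sp_token`. It is also stored in the state file, so the state should be treated as a secret.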

Providers

| Name | Authentication | Use |
| --- | --- | --- |
| databricks.staging | Provided by the user. | Look up the service principal group; create the directory and service principal in the staging workspace. |
| databricks.prod | Provided by the user. | Look up the service principal group; create the directory and service principal in the prod workspace. |

Resources

| Name | Type |
| --- | --- |
| databricks_group.staging_sp_group | data source |
| databricks_group.prod_sp_group | data source |
| databricks_directory.staging_directory | resource |
| databricks_permissions.staging_directory_usage | resource |
| databricks_directory.prod_directory | resource |
| databricks_permissions.prod_directory_usage | resource |
| aws-service-principal.staging_sp | module |
| aws-service-principal.prod_sp | module |
