
MLOps for Azure Databricks Example

This repo is used in a tutorial for learning how to do DevOps for Machine Learning (also called MLOps) using Azure Databricks and Azure ML Services.

The DevOps pipelines are defined in azure-pipelines.yml for Azure DevOps and in main.yml for GitHub Actions.

Using This Sample Project

If you want to run this example in Azure DevOps, you need to prepare your environment with the following steps.

Required Accounts And Resources

This example uses Azure DevOps as the CI/CD toolset and the Microsoft Azure platform to host your trained machine learning model.

You can get started with both platforms completely free.

Azure Databricks Workspace

In your Azure subscription, you need to create an Azure Databricks workspace to get started.

NOTE: I recommend placing the Azure Databricks workspace in a new resource group, so you can clean everything up more easily afterwards.
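If you prefer the command line, the workspace can also be created via the Azure CLI. A minimal sketch, assuming the Azure CLI databricks extension is available and using hypothetical resource names:

# Create a dedicated resource group (hypothetical names)
az group create --name mlops-databricks-rg --location westeurope

# Create the Databricks workspace inside that resource group
az extension add --name databricks
az databricks workspace create \
  --resource-group mlops-databricks-rg \
  --name mlops-databricks-ws \
  --location westeurope \
  --sku standard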

Importing This DevOps Project

Once you have access to the Azure DevOps platform, create a project to host your MLOps pipeline.

After the project is created, you can import this GitHub repository into it.
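If you prefer to script the import, the azure-devops extension for the Azure CLI can do it as well. A minimal sketch with hypothetical organization and project names (the target repository has to exist and be empty):

az extension add --name azure-devops

# Create an empty target repository, then import this GitHub repository into it
az repos create --name MLOps-Databricks \
  --organization https://dev.azure.com/<your-org> --project <your-project>
az repos import create \
  --git-source-url https://github.com/SaschaDittmann/MLOps-Databricks \
  --repository MLOps-Databricks \
  --organization https://dev.azure.com/<your-org> --project <your-project>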

Set up The Build Pipeline

By importing the repository, you also imported the azure-pipelines.yml file.

This file can be used to create your first Build Pipeline.

NOTE: This Build Pipeline uses a preview feature called "Multi-Stage Pipelines", which you need to enable in your Azure DevOps preview features before running it.

Connecting Azure Databricks

To be able to run this pipeline, you also need to connect your Azure Databricks workspace.

To do so, you first need to generate a personal access token in Databricks.

This token must be stored as an encrypted secret in your Azure DevOps Build Pipeline...

[Screenshot: Adding an Azure Pipeline variable]

NOTE: The variable must be called databricks.token

[Screenshot: Azure Pipeline variables]

... or in your GitHub project.

[Screenshot: Adding a GitHub secret]

NOTE: The GitHub Secret must be called DATABRICKS_TOKEN

[Screenshot: GitHub secrets]
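Both variants can also be set from the command line. A minimal sketch, assuming the azure-devops extension and the GitHub CLI (gh) are installed; pipeline and repository names are hypothetical:

# Azure DevOps: store the token as a secret pipeline variable
az pipelines variable create \
  --name databricks.token \
  --pipeline-name MLOps-Databricks \
  --secret true \
  --value <your-databricks-token>

# GitHub: store the token as an encrypted repository secret (prompts for the value)
gh secret set DATABRICKS_TOKEN --repo <your-user>/MLOps-Databricks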

Connecting the Azure ML Service Workspace

Step 1: Create Azure AD Service Principal

The Databricks notebooks for serving your model will create an Azure Machine Learning workspace (and other resources) for you.

To grant Azure Databricks access rights to your Azure Subscription, you need to create a Service Principal in your Azure Active Directory.

You can do that directly in the Cloud Shell of the Azure Portal, by using one of these two commands:

az ad sp create-for-rbac -n "http://MLOps-Databricks"

Least Privilege Principle: If you want to narrow that down to a specific resource group and Azure role, use the following command instead:

az ad sp create-for-rbac -n "http://MLOps-Databricks" --role contributor --scopes /subscriptions/{SubID}/resourceGroups/{ResourceGroup1}

Make a note of the result of this command, as you will need it in a later step.
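For reference, the output looks roughly like this (all values below are placeholders; the exact set of fields can vary between Azure CLI versions):

{
  "appId": "00000000-0000-0000-0000-000000000000",
  "displayName": "MLOps-Databricks",
  "password": "<generated-client-secret>",
  "tenant": "00000000-0000-0000-0000-000000000000"
}

The tenant, appId and password properties map to the Databricks secrets created in Step 3 below.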

Step 2: Install / Update Databricks CLI

Azure Databricks has its own place to store secrets.

At the time of writing, this store can only be accessed via the Databricks command-line interface (CLI).

Therefore, you should install this CLI on your local machine or in the Azure Cloud Shell.

pip install -U databricks-cli

NOTE: You need Python 2.7.9 or later / 3.6 or later to install and use the Databricks command-line interface (CLI).
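Before the CLI can access your workspace, it needs to be configured with the workspace URL and the access token you generated earlier. A minimal sketch:

databricks configure --token
# Databricks Host: https://<your-workspace-region>.azuredatabricks.net
# Token: <your-databricks-token>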

Step 3: Store Databricks Secrets

Using the Databricks CLI, you can now create your own section (scope) for your secrets...

databricks secrets create-scope --scope azureml

... and add the required secrets to the scope.

# Use the "tenant" property from the Azure AD Service Principal command output
databricks secrets put --scope azureml --key tenant_id
# Use the "appId" property from the Azure AD Service Principal command output
databricks secrets put --scope azureml --key client_id
# Use the "password" property from the Azure AD Service Principal command output
databricks secrets put --scope azureml --key client_secret

databricks secrets put --scope azureml --key subscription_id
databricks secrets put --scope azureml --key resource_group
databricks secrets put --scope azureml --key workspace_name
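To verify that all secrets arrived in the scope, you can list them (this shows only the key names, never the values):

databricks secrets list --scope azureml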

OPTIONAL: Pre-Approval Checks (Azure DevOps)

To avoid high costs from the Azure Kubernetes Service that is created by the "Deploy To Production" job, I recommend setting up a Pre-Approval Check for the wine-quality-production environment.

This can be done in the Environments section of your Azure Pipelines.

[Screenshot: Azure Pipeline Environments]
