
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>




# MLflow Lab

In this lab we will explore the path to moving models to production with MLflow using the following steps:

1. Load the Airbnb dataset and save both the training and test datasets as Delta tables
1. Train an MLlib linear regression model using all the listing features, tracking parameters, metrics, artifacts, and the Delta table version to MLflow
1. Register this initial model and move it to staging using the MLflow Model Registry
1. Add a new column, **`log_price`**, to both our train and test tables and update the corresponding Delta tables
1. Train a second MLlib linear regression model, this time using **`log_price`** as our target and training on all features, tracking to MLflow
1. Compare the performance of the different runs by looking at the underlying data versions for both models
1. Move the better performing model to production in the MLflow Model Registry

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Learning Objectives:<br>

By the end of this lab, you should be able to:
* Create Delta tables from existing data
* Explain the Delta history feature and its retention policy
* Track a model fit process with MLflow
* Register a model with the MLflow Model Registry
* Manage the MLflow model lifecycle
* Select the best model and move it to production with MLflow

## Lab Setup

The first thing we're going to do is **run the setup script**. This script defines the configuration variables that are scoped to each user.

In [0]:
%run "../Includes/Classroom-Setup"

Python interpreter will be restarted.
Python interpreter will be restarted.


Resetting the learning environment:
| No action taken

Skipping install of existing datasets to "dbfs:/mnt/dbacademy-datasets/scalable-machine-learning-with-apache-spark/v02"

Validating the locally installed datasets:
| listing local files...(3 seconds)
| validation completed...(3 seconds total)

Creating & using the schema "charlie_ohara_4mi2_da_sml" in the catalog "hive_metastore"...(0 seconds)

Predefined tables in "charlie_ohara_4mi2_da_sml":
| -none-

Predefined paths variables:
| DA.paths.working_dir: dbfs:/mnt/dbacademy-users/charlie.ohara@standard.ai/scalable-machine-learning-with-apache-spark
| DA.paths.user_db:     dbfs:/mnt/dbacademy-users/charlie.ohara@standard.ai/scalable-machine-learning-with-apache-spark/database.db
| DA.paths.datasets:    dbfs:/mnt/dbacademy-datasets/scalable-machine-learning-with-apache-spark/v02

Setup completed (9 seconds)






## Step 1. Creating Delta Tables




One advantage of Delta Lake is data versioning: it preserves previous versions of a dataset so that you can restore or query them later.

Let's split our dataset into train and test datasets and write them out in Delta format. You can read more in the Delta Lake <a href="https://docs.delta.io/latest/index.html" target="_blank">documentation</a>.

In [0]:
file_path = f"{DA.paths.datasets}/airbnb/sf-listings/sf-listings-2019-03-06-clean.delta/"
airbnb_df = spark.read.format("delta").load(file_path)

train_df, test_df = airbnb_df.randomSplit([.8, .2], seed=42)
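`randomSplit` normalizes the weights and assigns each row to a split independently at random, so the resulting split is only approximately 80/20; it is reproducible for a fixed seed, although in Spark it can also depend on how the data is partitioned. The pure-Python sketch below illustrates that idea with a hypothetical `random_split` helper; it is not Spark's implementation, which samples each partition separately.

```python
import random

def random_split(rows, weights, seed):
    """Assign each row independently to one split, in proportion to the normalized weights."""
    total = sum(weights)
    # Cumulative upper bounds, e.g. [0.8, 1.0] for weights [.8, .2]
    bounds = []
    acc = 0.0
    for w in weights:
        acc += w / total
        bounds.append(acc)

    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for row in rows:
        u = rng.random()  # uniform in [0, 1)
        for i, upper in enumerate(bounds):
            if u < upper:
                splits[i].append(row)
                break
        else:
            # Guard against floating-point rounding leaving the last bound just below 1.0
            splits[-1].append(row)
    return splits

train, test = random_split(range(10_000), [.8, .2], seed=42)
print(len(train), len(test))  # roughly 8000 and 2000, but not exactly
```

Because each row is sampled independently, the split sizes vary slightly with the seed, which is why the lab fixes `seed=42`.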

In [0]:
train_delta_path = f"{DA.paths.working_dir}/train.delta"
test_delta_path = f"{DA.paths.working_dir}/test.delta"

# Remove the paths in case they already exist from a previous run
dbutils.fs.rm(train_delta_path, True)
dbutils.fs.rm(test_delta_path, True)

# Write the train and test data as Delta tables so we keep version history
train_df.write.mode("overwrite").format("delta").save(train_delta_path)
test_df.write.mode("overwrite").format("delta").save(test_delta_path)





Let's now read in our train and test Delta tables, specifying that we want the first version of these tables. This <a href="https://databricks.com/blog/2019/02/04/introducing-delta-time-travel-for-large-scale-data-lakes.html" target="_blank">blog post</a> has a great example of how to read in a Delta table at a given version.

In [0]:
# TODO
data_version = 0
train_delta = spark.read.format("delta").option("versionAsOf", data_version).load(train_delta_path)
test_delta = spark.read.format("delta").option("versionAsOf", data_version).load(test_delta_path)




### Review Delta Table History
Every transaction on this table is recorded in its transaction log, from the initial write through any subsequent inserts, updates, deletes, and merges.

In [0]:
display(spark.sql(f"DESCRIBE HISTORY delta.`{train_delta_path}`"))

version,timestamp,userId,userName,operation,operationParameters,job,notebook,clusterId,readVersion,isolationLevel,isBlindAppend,operationMetrics,userMetadata,engineInfo
0,2024-02-09T16:55:56.235+0000,6043631322962989,charlie.ohara@standard.ai,WRITE,"Map(mode -> Overwrite, partitionBy -> [])",,List(3021276973847115),0105-195350-nkhe7nqr,,WriteSerializable,False,"Map(numFiles -> 4, numOutputRows -> 5786, numOutputBytes -> 209126)",,Databricks-Runtime/12.2.x-cpu-ml-scala2.12






By default, Delta tables <a href="https://docs.databricks.com/delta/delta-batch.html#data-retention" target="_blank">keep a commit history of 30 days</a>. This retention period can be adjusted by setting **`delta.logRetentionDuration`**, which determines how far back in time you can go. Note that increasing it can raise storage costs.
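Time travel also needs the underlying data files, and `VACUUM` deletes unreferenced files older than **`delta.deletedFileRetentionDuration`** (7 days by default). Both properties can be changed with `ALTER TABLE ... SET TBLPROPERTIES`. As a rough, conservative sketch (the `guaranteed_time_travel_start` and `is_safely_reachable` helpers are hypothetical, not Delta APIs), the window you can safely time travel within is bounded by the shorter of the two retention periods when `VACUUM` runs regularly:

```python
from datetime import datetime, timedelta, timezone

def guaranteed_time_travel_start(now, log_retention_days=30,
                                 deleted_file_retention_days=7, vacuum_runs=True):
    """Return the earliest commit time that is safe to time travel to.

    Conservative bound: log entries older than the log retention may be cleaned up,
    and if VACUUM runs, files removed longer ago than the deleted-file retention may
    already be gone, so only commits inside the shorter window are guaranteed readable.
    """
    window = timedelta(days=log_retention_days)
    if vacuum_runs:
        window = min(window, timedelta(days=deleted_file_retention_days))
    return now - window


def is_safely_reachable(commit_time, now, **retention):
    """True if a version committed at commit_time falls inside the guaranteed window."""
    return commit_time >= guaranteed_time_travel_start(now, **retention)


# Commit time of version 0 (to the second) from the DESCRIBE HISTORY output above
v0 = datetime(2024, 2, 9, 16, 55, 56, tzinfo=timezone.utc)
print(is_safely_reachable(v0, v0 + timedelta(days=5)))                      # True: within 7 days
print(is_safely_reachable(v0, v0 + timedelta(days=10)))                     # False: VACUUM may have run
print(is_safely_reachable(v0, v0 + timedelta(days=10), vacuum_runs=False))  # True: within 30 days
```

Older versions may still be readable in practice, since cleanup is not immediate; the sketch only marks what is guaranteed under the default settings.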

<img src="https://files.training.databricks.com/images/icon_note_24.png"/> Be aware that versioning with Delta in this manner may not be feasible as a long-term solution. The retention period of Delta tables can be increased, but that comes with additional storage costs. Alternative ways to version data when training models and tracking to MLflow are to save copies of the datasets, either as an MLflow artifact (for a small dataset) or in a separate distributed location, recording the location of the underlying dataset as a tag in MLflow.
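When you keep a separate copy of the data, a stable identifier for that copy helps tie a run to the exact data it used. The sketch below is one possible approach, not an MLflow feature: a hypothetical `fingerprint_directory` helper hashes every file under a local directory (on Databricks, DBFS paths are typically visible locally under `/dbfs/...`), and you would record the result alongside the data location with `mlflow.set_tag`.

```python
import hashlib
from pathlib import Path

def fingerprint_directory(root):
    """Return a SHA-256 hex digest over the relative paths and contents of all files under root."""
    root = Path(root)
    digest = hashlib.sha256()
    # Sort so the result does not depend on filesystem listing order
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator between the file name and its bytes
        with path.open("rb") as f:
            # Read in 1 MiB chunks so large files are not loaded into memory at once
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        digest.update(b"\0")
    return digest.hexdigest()

# Hypothetical usage inside an MLflow run (data_copy_path and local_copy_dir are placeholders):
# mlflow.set_tag("data_location", data_copy_path)
# mlflow.set_tag("data_sha256", fingerprint_directory(local_copy_dir))
```

Any change to a file's name or contents changes the digest, so two runs tagged with the same value were trained on byte-identical data.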





## Step 2. Log Initial Run to MLflow

Let's first log a run to MLflow where we use all features. We use the same RFormula approach as before. This time, however, let's also log both the version of our data and the data path to MLflow.

In [0]:
# TODO
import mlflow
import mlflow.spark
from pyspark.ml.regression import LinearRegression
from pyspark.ml import Pipeline
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.feature import RFormula

with mlflow.start_run(run_name="lr_model") as run:
    # Log parameters, recorded for later reference
    mlflow.log_param("data_path", train_delta_path)
    # TODO: Log label: price-all-features
    mlflow.log_param("label", "price-all-features")
    # TODO: Log data_version: data_version
    mlflow.log_param("data_version", data_version)

    # Create pipeline
    # RFormula assembles every column except the label into a single "features" vector
    # (string columns are one-hot encoded); rows with invalid values are skipped
    r_formula = RFormula(formula="price ~ .", featuresCol="features", labelCol="price", handleInvalid="skip")
    # The assembled features are then fed to a linear regression
    lr = LinearRegression(labelCol="price", featuresCol="features")
    # Chain the data preparation step and the estimator into a single pipeline
    pipeline = Pipeline(stages=[r_formula, lr])
    # Fit the pipeline: learn the mapping from features to price on the training data
    model = pipeline.fit(train_delta)

    # Log pipeline
    # TODO: Log model: model
    mlflow.spark.log_model(model, "model")

    # Create predictions and metrics
    # Apply the fitted pipeline to predict the price of the test data
    pred_df = model.transform(test_delta)
    # Evaluate performance using RMSE and R^2
    regression_evaluator = RegressionEvaluator(labelCol="price", predictionCol="prediction")
    rmse = regression_evaluator.setMetricName("rmse").evaluate(pred_df)
    r2 = regression_evaluator.setMetricName("r2").evaluate(pred_df)

    # Log metrics
    # TODO: Log RMSE
    mlflow.log_metric("rmse", rmse)
    # TODO: Log R2. An R^2 of 1 is a perfect fit; 0 means no better than always predicting
    # the mean price (on test data it can even be negative)
    mlflow.log_metric("r2", r2)

    run_id = run.info.run_id

2024/02/09 17:32:49 INFO mlflow.spark: Inferring pip requirements by reloading the logged model from the databricks artifact repository, which can be time-consuming. To speed up, explicitly specify the conda_env or pip_requirements when calling log_model().
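`RegressionEvaluator` computes these metrics for us. To make the R² comment concrete, here is a plain-Python sketch of the standard definitions, RMSE = sqrt(mean squared residual) and R² = 1 - SS_res / SS_tot, where SS_tot measures variation around the mean label. It is for illustration only and is not Spark's code.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

prices = [100.0, 200.0, 300.0]
print(rmse(prices, [100.0, 200.0, 300.0]), r2(prices, [100.0, 200.0, 300.0]))  # 0.0 1.0
print(r2(prices, [200.0, 200.0, 200.0]))  # 0.0: no better than predicting the mean
print(r2(prices, [300.0, 300.0, 300.0]))  # -1.5: worse than predicting the mean
```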






## Step 3. Register Model and Move to Staging Using MLflow Model Registry

We are happy with the performance of the above model and want to move it to staging. Let's register it in the MLflow Model Registry.

<img src="https://files.training.databricks.com/images/icon_note_24.png"/> Make sure the path in **`model_uri`** matches the subdirectory (the second argument, **`artifact_path`**, of **`mlflow.spark.log_model()`**) used above.

In [0]:
model_uri = f"runs:/{run_id}/model"

model_name = "mllib-lr_charlie-ohara-4mi2-da-sml"
print(f"Model Name: {model_name}\n")

# Register the model in the Model Registry (listed under Models in the Databricks Machine Learning UI), from where it can be promoted to production
model_details = mlflow.register_model(model_uri=model_uri, name=model_name)

Model Name: mllib-lr_charlie-ohara-4mi2-da-sml



Successfully registered model 'mllib-lr_charlie-ohara-4mi2-da-sml'.
2024/02/09 17:36:46 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: mllib-lr_charlie-ohara-4mi2-da-sml, version 1
Created version '1' of model 'mllib-lr_charlie-ohara-4mi2-da-sml'.






Transition the model to staging.

In [0]:
from mlflow.tracking.client import MlflowClient

client = MlflowClient()

client.transition_model_version_stage(
    name=model_name,
    version=1,
    stage="Staging"
)

Out[15]: <ModelVersion: creation_timestamp=1707500206031, current_stage='Staging', description='', last_updated_timestamp=1707500260994, name='mllib-lr_charlie-ohara-4mi2-da-sml', run_id='9a48e8310b384809882fd0b8b48e0cd6', run_link='', source='dbfs:/databricks/mlflow-tracking/3021276973847115/9a48e8310b384809882fd0b8b48e0cd6/artifacts/model', status='READY', status_message='', tags={}, user_id='6043631322962989', version='1'>

In [0]:
import time

# Define a utility method that blocks until the model version reaches the given stage and status
def wait_for_model(model_name, version, stage="None", status="READY", timeout=300):
    last_stage = "unknown"
    last_status = "unknown"

    # Poll roughly once per second, for up to `timeout` attempts
    for _ in range(timeout):
        model_version_details = client.get_model_version(name=model_name, version=version)
        last_stage = str(model_version_details.current_stage)
        last_status = str(model_version_details.status)
        if last_status == str(status) and last_stage == str(stage):
            return

        time.sleep(1)

    raise Exception(f"The model {model_name} v{version} was not {status} after {timeout} seconds: {last_status}/{last_stage}")

In [0]:
# Force our notebook to block until the model is ready
wait_for_model(model_name, 1, stage="Staging")
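`wait_for_model` counts loop iterations, so the time spent inside each `get_model_version` call is not included in its timeout. A slightly more general variant, sketched below under the assumption that you pass in any zero-argument check function, uses a monotonic-clock deadline instead; `wait_until` is a hypothetical helper, not part of MLflow.

```python
import time

def wait_until(check, timeout=300.0, interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns True, raising TimeoutError once timeout seconds have passed.

    clock and sleep are injectable so the helper can be tested without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return
        if clock() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout} seconds")
        sleep(interval)

# Hypothetical usage with the MLflow client from this notebook:
# def model_in_staging():
#     mv = client.get_model_version(name=model_name, version=1)
#     return mv.status == "READY" and mv.current_stage == "Staging"
# wait_until(model_in_staging)
```

Separating "what to wait for" from "how to wait" lets the same helper poll for any registry state, not only stage and status.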





Add a model description using <a href="https://mlflow.org/docs/latest/python_api/mlflow.client.html#mlflow.client.MlflowClient.update_registered_model" target="_blank">update_registered_model</a>.

In [0]:
# TODO
client.update_registered_model(model_name, description="Demo model description")  # general description for the model, not a specific version

Out[18]: <RegisteredModel: creation_timestamp=1707500205685, description='Demo model description', last_updated_timestamp=1707500352145, latest_versions=[], name='mllib-lr_charlie-ohara-4mi2-da-sml', tags={}>

In [0]:
wait_for_model(model_details.name, 1, stage="Staging")





## Step 4. Feature Engineering: Evolve Data Schema

Next, we want to do some feature engineering to try to improve model performance, using Delta Lake to keep track of older versions of the dataset.

We will add **`log_price`** as a new column and update our Delta table with it.

In [0]:
from pyspark.sql.functions import col, log, exp

# Create a new log_price column for both train and test datasets
train_new = train_delta.withColumn("log_price", log(col("price")))
test_new = test_delta.withColumn("log_price", log(col("price")))
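Training on **`log_price`** means the model minimizes squared error on the log scale, and exponentiating its predictions back to dollars tends to give values closer to a geometric mean than an arithmetic mean. That is one possible reason a log-target model can score a higher dollar-scale RMSE. The plain-Python sketch below shows the gap on three illustrative prices (not taken from the Airbnb data):

```python
import math

prices = [100.0, 200.0, 400.0]  # illustrative values only

# Arithmetic mean: the constant that minimizes squared error in dollars
arithmetic_mean = sum(prices) / len(prices)

# Mean of the logs, mapped back with exp: the constant that minimizes squared error on the log scale
log_mean = sum(math.log(p) for p in prices) / len(prices)
back_transformed = math.exp(log_mean)  # equals the geometric mean

print(round(arithmetic_mean, 2), round(back_transformed, 2))  # 233.33 200.0
```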




Save the updated DataFrames to **`train_delta_path`** and **`test_delta_path`**, respectively, passing the **`mergeSchema`** option to safely evolve their schemas.

Take a look at this <a href="https://databricks.com/blog/2019/09/24/diving-into-delta-lake-schema-enforcement-evolution.html" target="_blank">blog post</a> on Delta Lake for more information about **`mergeSchema`**.

In [0]:
# TODO
train_new.write.mode("overwrite").format("delta").option("mergeSchema", "true").save(train_delta_path)  # schema changed because we added a new column
test_new.write.mode("overwrite").format("delta").option("mergeSchema", "true").save(test_delta_path)  # schema changed because we added a new column
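Conceptually, **`mergeSchema`** lets a write add columns the table does not have yet: existing columns keep their order and new ones are appended at the end. The sketch below models schemas as plain `{name: type}` dictionaries with a hypothetical `merge_schemas` helper; it is a simplification that treats any type change as a conflict, whereas Delta allows a small number of specific type changes.

```python
def merge_schemas(table_schema, incoming_schema):
    """Merge two {column: type} schemas the way a schema-evolving write conceptually does.

    Existing columns keep their position, new columns are appended, and a column whose
    type differs raises an error (a simplification of Delta's actual rules).
    """
    merged = dict(table_schema)  # dicts preserve insertion order
    for name, dtype in incoming_schema.items():
        if name not in merged:
            merged[name] = dtype
        elif merged[name] != dtype:
            raise TypeError(f"Column {name!r}: cannot merge {merged[name]} with {dtype}")
    return merged

old = {"bedrooms": "double", "price": "double"}
new = {"bedrooms": "double", "price": "double", "log_price": "double"}
print(list(merge_schemas(old, new)))  # ['bedrooms', 'price', 'log_price']
```

Without `mergeSchema`, Delta's schema enforcement would instead reject the write because of the unexpected **`log_price`** column.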





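Look at the difference between the original and modified schemas (the set symmetric difference **`^`** keeps only the fields present in one schema but not the other):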

In [0]:
set(train_new.schema.fields) ^ set(train_delta.schema.fields)

Out[24]: {StructField('log_price', DoubleType(), True)}





Let's review the Delta history of our **`train_delta`** table and load in the most recent versions of our train and test Delta tables.

In [0]:
display(spark.sql(f"DESCRIBE HISTORY delta.`{train_delta_path}`"))  # Delta keeps the history even when a table is overwritten

version,timestamp,userId,userName,operation,operationParameters,job,notebook,clusterId,readVersion,isolationLevel,isBlindAppend,operationMetrics,userMetadata,engineInfo
1,2024-02-09T17:42:00.198+0000,6043631322962989,charlie.ohara@standard.ai,WRITE,"Map(mode -> Overwrite, partitionBy -> [])",,List(3021276973847115),0105-195350-nkhe7nqr,0.0,WriteSerializable,False,"Map(numFiles -> 4, numOutputRows -> 5786, numOutputBytes -> 226210)",,Databricks-Runtime/12.2.x-cpu-ml-scala2.12
0,2024-02-09T16:55:56.235+0000,6043631322962989,charlie.ohara@standard.ai,WRITE,"Map(mode -> Overwrite, partitionBy -> [])",,List(3021276973847115),0105-195350-nkhe7nqr,,WriteSerializable,False,"Map(numFiles -> 4, numOutputRows -> 5786, numOutputBytes -> 209126)",,Databricks-Runtime/12.2.x-cpu-ml-scala2.12


In [0]:
data_version = 1  # Read version 1 explicitly; omitting versionAsOf would read the latest version
train_delta_new = spark.read.format("delta").option("versionAsOf", data_version).load(train_delta_path)  
test_delta_new = spark.read.format("delta").option("versionAsOf", data_version).load(test_delta_path)





## Step 5. Use **`log_price`** as Target and Track Run with MLflow

Retrain the model on the updated data and compare its performance to the original, logging results to MLflow.
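In an `RFormula`, `.` stands for every column except the label. Because the updated tables now contain both **`price`** and **`log_price`**, the formula below subtracts **`price`** explicitly; otherwise the model would see the untransformed target as a feature. The hypothetical `expand_formula` helper below only handles the `label ~ . - col` pattern used in this lab and illustrates the expansion; it does not reproduce Spark's parser.

```python
def expand_formula(formula, columns):
    """Expand 'label ~ .' or 'label ~ . - a - b' into (label, feature_columns)."""
    label, rhs = (part.strip() for part in formula.split("~"))
    terms = [term.strip() for term in rhs.split("-")]
    if terms[0] != ".":
        raise ValueError("This sketch only supports formulas of the form 'label ~ . - col ...'")
    excluded = {term for term in terms[1:] if term}
    # '.' means every column except the label; then drop any explicitly subtracted columns
    features = [c for c in columns if c != label and c not in excluded]
    return label, features

columns = ["bedrooms", "bathrooms", "price", "log_price"]  # illustrative subset of column names
print(expand_formula("log_price ~ .", columns))          # price would leak in as a feature
print(expand_formula("log_price ~ . - price", columns))  # price excluded
```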

In [0]:
with mlflow.start_run(run_name="lr_log_model") as run:
    # Log parameters
    # This time the label is log_price instead of price
    mlflow.log_param("label", "log-price")
    # Record the Delta data version used
    mlflow.log_param("data_version", data_version)
    # Record the source path
    mlflow.log_param("data_path", train_delta_path)

    # Create pipeline
    # Use every column except price as a feature, so the original target does not leak into the features
    r_formula = RFormula(formula="log_price ~ . - price", featuresCol="features", labelCol="log_price", handleInvalid="skip")
    # Linear regression on the log-scale label
    lr = LinearRegression(labelCol="log_price", predictionCol="log_prediction")
    # Combine both steps into a pipeline
    pipeline = Pipeline(stages=[r_formula, lr])
    # Fit on the new data version, predicting log price
    pipeline_model = pipeline.fit(train_delta_new)

    # Log the model and register it as a new version of the existing registered model
    mlflow.spark.log_model(
        spark_model=pipeline_model,
        artifact_path="log-model",
        registered_model_name=model_name
    )

    # Create predictions and metrics
    # Evaluate on the matching version of the test table
    pred_df = pipeline_model.transform(test_delta_new)
    # Convert log predictions back to dollars so the metrics are comparable with the first run
    exp_df = pred_df.withColumn("prediction", exp(col("log_prediction")))
    # regression_evaluator (labelCol="price", predictionCol="prediction") is reused from Step 2
    rmse = regression_evaluator.setMetricName("rmse").evaluate(exp_df)
    r2 = regression_evaluator.setMetricName("r2").evaluate(exp_df)

    # Log metrics
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)

    run_id = run.info.run_id

2024/02/09 17:45:48 INFO mlflow.spark: Inferring pip requirements by reloading the logged model from the databricks artifact repository, which can be time-consuming. To speed up, explicitly specify the conda_env or pip_requirements when calling log_model().
Registered model 'mllib-lr_charlie-ohara-4mi2-da-sml' already exists. Creating a new version of this model...
2024/02/09 17:47:11 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: mllib-lr_charlie-ohara-4mi2-da-sml, version 2
Created version '2' of model 'mllib-lr_charlie-ohara-4mi2-da-sml'.






## Step 6. Compare Performance Across Runs by Looking at Delta Table Versions 

Use MLflow's <a href="https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.search_runs" target="_blank">**`mlflow.search_runs`**</a> API to identify runs according to the version of data the run was trained on. Let's compare our runs according to our data versions.

Filter based on **`params.data_path`** and **`params.data_version`**.
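MLflow stores parameters as strings, so their values are quoted in the filter, and clauses can be combined with `and`. As a sketch, a hypothetical `build_run_filter` helper can assemble the filter from both parameters; it optionally adds `attributes.status = 'FINISHED'` to skip failed runs, and it assumes the path contains no single quotes.

```python
def build_run_filter(data_path, data_version, finished_only=True):
    """Build an MLflow search filter string for runs trained on a given Delta path and version."""
    # Param values are compared as strings, so they are always quoted
    clauses = [
        f"params.data_path = '{data_path}'",
        f"params.data_version = '{data_version}'",
    ]
    if finished_only:
        clauses.append("attributes.status = 'FINISHED'")
    return " and ".join(clauses)

# Hypothetical usage, ordering the matching runs from lowest to highest RMSE:
# mlflow.search_runs(filter_string=build_run_filter(train_delta_path, 0), order_by=["metrics.rmse ASC"])
```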

In [0]:
# TODO
data_version = 0

mlflow.search_runs(filter_string=f"params.data_path = '{train_delta_path}' and params.data_version = '{data_version}'")

Unnamed: 0,run_id,experiment_id,status,artifact_uri,start_time,end_time,metrics.rmse,metrics.r2,params.data_path,params.label,...,tags.mlflow.databricks.notebookPath,tags.mlflow.source.name,tags.mlflow.runName,tags.mlflow.databricks.notebookID,tags.mlflow.source.type,tags.mlflow.log-model.history,tags.mlflow.databricks.cluster.info,tags.mlflow.databricks.notebook.commandID,tags.mlflow.databricks.webappURL,tags.mlflow.databricks.cluster.libraries
0,9a48e8310b384809882fd0b8b48e0cd6,3021276973847115,FINISHED,dbfs:/databricks/mlflow-tracking/3021276973847...,2024-02-09 17:31:30.133000+00:00,2024-02-09 17:34:21.085000+00:00,350.46762,0.119544,dbfs:/mnt/dbacademy-users/charlie.ohara@standa...,price-all-features,...,/Users/charlie.ohara@standard.ai/scalable-mach...,/Users/charlie.ohara@standard.ai/scalable-mach...,lr_model,3021276973847115,NOTEBOOK,"[{""artifact_path"":""model"",""flavors"":{""spark"":{...","{""cluster_name"":""charlie"",""spark_version"":""12....",4404113587905816260_6297555016341071165_4a05bd...,https://us-central1.gcp.databricks.com,"{""installable"":[],""redacted"":[]}"
1,97598c3e8b2a435b864e2e014d63392b,3021276973847115,FAILED,dbfs:/databricks/mlflow-tracking/3021276973847...,2024-02-09 17:30:37.254000+00:00,2024-02-09 17:30:50.984000+00:00,,,dbfs:/mnt/dbacademy-users/charlie.ohara@standa...,price-all-features,...,/Users/charlie.ohara@standard.ai/scalable-mach...,/Users/charlie.ohara@standard.ai/scalable-mach...,lr_model,3021276973847115,NOTEBOOK,,"{""cluster_name"":""charlie"",""spark_version"":""12....",4404113587905816260_5855800030435895322_06cca6...,https://us-central1.gcp.databricks.com,"{""installable"":[],""redacted"":[]}"


In [0]:
# TODO
data_version = 1

mlflow.search_runs(filter_string=f"params.data_path = '{train_delta_path}' and params.data_version = '{data_version}'")

Unnamed: 0,run_id,experiment_id,status,artifact_uri,start_time,end_time,metrics.rmse,metrics.r2,params.data_path,params.label,...,tags.mlflow.databricks.notebookPath,tags.mlflow.source.name,tags.mlflow.runName,tags.mlflow.databricks.notebookID,tags.mlflow.source.type,tags.mlflow.log-model.history,tags.mlflow.databricks.cluster.info,tags.mlflow.databricks.notebook.commandID,tags.mlflow.databricks.webappURL,tags.mlflow.databricks.cluster.libraries
0,5989338c807248b988596efed30605bc,3021276973847115,FINISHED,dbfs:/databricks/mlflow-tracking/3021276973847...,2024-02-09 17:44:34.614000+00:00,2024-02-09 17:47:54.258000+00:00,355.003697,0.096606,dbfs:/mnt/dbacademy-users/charlie.ohara@standa...,log-price,...,/Users/charlie.ohara@standard.ai/scalable-mach...,/Users/charlie.ohara@standard.ai/scalable-mach...,lr_log_model,3021276973847115,NOTEBOOK,"[{""artifact_path"":""log-model"",""flavors"":{""spar...","{""cluster_name"":""charlie"",""spark_version"":""12....",4404113587905816260_8057573663244688583_4c76f0...,https://us-central1.gcp.databricks.com,"{""installable"":[],""redacted"":[]}"



**Question:** Which version of the data produced the best model?
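One way to answer is to compare the logged metrics directly. The sketch below copies the RMSE and R² values of the two FINISHED runs from the search results above (your values will differ each time the lab runs) and picks the run with the lowest RMSE; passing `order_by=["metrics.rmse ASC"]` to `mlflow.search_runs` sorts runs the same way.

```python
# Metrics copied from the FINISHED runs in the search results above
runs = [
    {"run_name": "lr_model", "data_version": 0, "rmse": 350.46762, "r2": 0.119544},
    {"run_name": "lr_log_model", "data_version": 1, "rmse": 355.003697, "r2": 0.096606},
]

# Lower RMSE is better; R^2 agrees here (higher is better)
best = min(runs, key=lambda run: run["rmse"])
print(best["run_name"], best["data_version"])  # lr_model 0
```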


## Step 7. Move the Best Performing Model to Production Using MLflow Model Registry

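Get the most recent model version and move it to production. (In the runs shown above, this version actually scored a slightly higher RMSE than version 1; in practice, check the logged metrics before promoting a version.)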

In [0]:
model_version_infos = client.search_model_versions(f"name = '{model_name}'")
# Versions are returned as strings, so convert to int before taking the max
new_model_version = max(int(model_version_info.version) for model_version_info in model_version_infos)
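The outputs in this notebook show `version='1'` and `version='2'` as strings. Taking `max` over strings compares them lexicographically, which gives the wrong answer once a model reaches ten versions; converting to `int` first avoids that:

```python
versions = ["1", "2", "9", "10"]  # model versions as strings

print(max(versions))                  # 9   (lexicographic: "9" > "10")
print(max(int(v) for v in versions))  # 10
```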

In [0]:
client.update_model_version(
    name=model_name,
    version=new_model_version,
    description="This model version was built using an MLlib Linear Regression model with all features and log_price as the target."
)

Out[31]: <ModelVersion: creation_timestamp=1707500831187, current_stage='None', description=('This model version was built using a MLlib Linear Regression model with all '
 'features and log_price as predictor.'), last_updated_timestamp=1707501163578, name='mllib-lr_charlie-ohara-4mi2-da-sml', run_id='5989338c807248b988596efed30605bc', run_link='', source='dbfs:/databricks/mlflow-tracking/3021276973847115/5989338c807248b988596efed30605bc/artifacts/log-model', status='READY', status_message='', tags={}, user_id='6043631322962989', version='2'>

In [0]:
model_version_details = client.get_model_version(name=model_name, version=new_model_version)
model_version_details.status

Out[32]: 'READY'

In [0]:
wait_for_model(model_name, new_model_version)

In [0]:
# TODO
# Move Model into Production
client.transition_model_version_stage(name=model_name, version=new_model_version, stage="Production")

Out[35]: <ModelVersion: creation_timestamp=1707500831187, current_stage='Production', description=('This model version was built using a MLlib Linear Regression model with all '
 'features and log_price as predictor.'), last_updated_timestamp=1707501248525, name='mllib-lr_charlie-ohara-4mi2-da-sml', run_id='5989338c807248b988596efed30605bc', run_link='', source='dbfs:/databricks/mlflow-tracking/3021276973847115/5989338c807248b988596efed30605bc/artifacts/log-model', status='READY', status_message='', tags={}, user_id='6043631322962989', version='2'>

In [0]:
wait_for_model(model_name, new_model_version, "Production")



 

Have a look at the MLflow Model Registry UI to check that your models have been successfully registered. You should see that version 1 of your model is still in staging, with version 2 in production.





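To finish the lab, let's clean up by archiving both model versions and then deleting the whole model from the registry. On Databricks, a registered model cannot be deleted while any of its versions are in Staging or Production, which is why we archive them first.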

In [0]:
client.transition_model_version_stage(
    name=model_name,
    version=1,
    stage="Archived"
)

Out[36]: <ModelVersion: creation_timestamp=1707500206031, current_stage='Archived', description='', last_updated_timestamp=1707501265930, name='mllib-lr_charlie-ohara-4mi2-da-sml', run_id='9a48e8310b384809882fd0b8b48e0cd6', run_link='', source='dbfs:/databricks/mlflow-tracking/3021276973847115/9a48e8310b384809882fd0b8b48e0cd6/artifacts/model', status='READY', status_message='', tags={}, user_id='6043631322962989', version='1'>

In [0]:
wait_for_model(model_name, 1, "Archived")

In [0]:
client.transition_model_version_stage(
    name=model_name,
    version=2,
    stage="Archived"
)

Out[38]: <ModelVersion: creation_timestamp=1707500831187, current_stage='Archived', description=('This model version was built using a MLlib Linear Regression model with all '
 'features and log_price as predictor.'), last_updated_timestamp=1707501278413, name='mllib-lr_charlie-ohara-4mi2-da-sml', run_id='5989338c807248b988596efed30605bc', run_link='', source='dbfs:/databricks/mlflow-tracking/3021276973847115/5989338c807248b988596efed30605bc/artifacts/log-model', status='READY', status_message='', tags={}, user_id='6043631322962989', version='2'>

In [0]:
wait_for_model(model_name, 2, "Archived")

In [0]:
client.delete_registered_model(model_name)


## Classroom Cleanup

Run the following cell to remove lesson-specific assets created during this lesson:

In [0]:
DA.cleanup()

Resetting the learning environment:
| dropping the schema "charlie_ohara_4mi2_da_sml"...(1 seconds)
| removing the working directory "dbfs:/mnt/dbacademy-users/charlie.ohara@standard.ai/scalable-machine-learning-with-apache-spark"...(0 seconds)

Validating the locally installed datasets:
| listing local files...(3 seconds)
| validation completed...(3 seconds total)


&copy; 2023 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>