<a href="https://colab.research.google.com/github/dave-killough/databricks-colab/blob/main/Databricks_CLI_Secrets.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Databricks CLI Secrets 🔐
It's a critical best practice to keep sensitive information separate from code and logs. One option is a Databricks secret, which guards the sensitive value while still letting code use it safely and redacting it in logs. Although configuring a Databricks secret is not supported in the Workspace UI, it is easily done with the Databricks CLI, and the entire process can be run from a notebook.

In the example below, a storage account connection string is saved in a secret that can be securely accessed from Databricks code for operations such as updating a blob's content type. The first step is configuring the notebook's instance for running the Databricks CLI. In the configuration cell further down, replace the `<WORKSPACE_HOST>` part with the beginning of the URL used to access your Workspace. Your host value should look similar to this:

`https://adb-9999999999999999.10.azuredatabricks.net`

The next step is setting the access token. Click your user name in the top right of your Databricks Workspace UI, then select `User Settings`, then `Developer` under **Settings**, and then `MANAGE` in the **Access tokens** section. Select `Generate new token`, enter a comment, and generate the token. Replace the `<ACCESS_TOKEN>` part below with the generated token.

⚠️ CAUTION: KEEP YOUR NOTEBOOK PRIVATE TO ENSURE THE SECURITY OF YOUR TOKEN AND PROTECT SENSITIVE INFORMATION.

After making the replacements, run the code cell below to create the configuration file that the Databricks CLI will use to authenticate your session:

In [None]:
with open("/root/.databrickscfg", "w") as file:
    file.write("""\
[DEFAULT]
host = <WORKSPACE_HOST>
token = <ACCESS_TOKEN>
""")
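
If you would rather not leave the token in the notebook source, one option is to prompt for it at run time. The following is a minimal sketch, not part of the original workflow: the helper name `write_databricks_cfg` is hypothetical. It writes the same `[DEFAULT]` profile as the cell above, then restricts the file to its owner.

```python
import os
from pathlib import Path


def write_databricks_cfg(host: str, token: str, path: str = "/root/.databrickscfg") -> Path:
    """Write a [DEFAULT] profile for the Databricks CLI (hypothetical helper)."""
    cfg_path = Path(path)
    # Same layout as the hand-written config: one profile with a host and a token.
    cfg_path.write_text(f"[DEFAULT]\nhost = {host.strip()}\ntoken = {token.strip()}\n")
    # Limit the file to the current user, since it holds a credential.
    os.chmod(cfg_path, 0o600)
    return cfg_path
```

In a notebook you could call it as `write_databricks_cfg(host, getpass.getpass("Access token: "))` after `import getpass`, so the token never appears in a saved cell.
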

Now run the following cell as is. It removes any existing CLI binary, installs the latest Databricks CLI, and tests your configuration by listing your clusters:

In [None]:
!rm -f '/usr/local/bin/databricks'
!curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
!databricks -v
!databricks clusters list
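
If a later cell fails with a "command not found" style error, it can help to check from Python whether the CLI actually landed on the `PATH`. This is a small sketch under that assumption; the function name `databricks_cli_version` is hypothetical, and it only relies on the `-v` flag already used above.

```python
import shutil
import subprocess


def databricks_cli_version(executable: str = "databricks"):
    """Return the CLI's version output, or None if it is not on PATH."""
    cli_path = shutil.which(executable)
    if cli_path is None:
        return None
    result = subprocess.run([cli_path, "-v"], capture_output=True, text=True)
    # Some tools print their version to stderr, so fall back to it.
    return (result.stdout or result.stderr).strip()
```

A `None` result suggests the install step did not complete, in which case rerunning the install cell is the first thing to try.
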

If the configuration and installation were successful, you are now able to run CLI commands directly in the notebook! To proceed with setting up the secret, you'll need a scope to contain it. Below, I have set the scope to be `eo990pipeline`, which the code uses for accessing the secret. Run the following cell to create this scope:

In [None]:
!databricks secrets create-scope eo990pipeline

With the scope created, we can now put a secret in it. For this example, we're using a connection string for an Azure storage account. You can obtain this from Azure by going to the storage account and selecting `Access keys` in the left pane under **Security + Networking**. Then, click `Show` next to the **Connection string** for either key1 or key2. Copy the value and put it in the cell below in place of `<SERVICE_ACCOUNT_CONNECTION_STRING>`. Run the cell to place the secret in the scope:

⚠️ CAUTION: ENSURE THE VALUE IS INSIDE OF DOUBLE QUOTES TO WORK PROPERLY.

In [None]:
!databricks secrets put-secret eo990pipeline sa_connection_string --string-value "<SERVICE_ACCOUNT_CONNECTION_STRING>"
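
The double quotes matter because a storage connection string contains semicolons, which a shell would otherwise treat as command separators. As an illustration only, the sketch below builds the same command as an argument list and shows how Python's `shlex` would quote it for a shell. The helper name `build_put_secret_command` and the example value are made up for this sketch; the value is not a real key.

```python
import shlex


def build_put_secret_command(scope: str, key: str, value: str) -> list:
    """Build the argument list for `databricks secrets put-secret` (hypothetical helper)."""
    return ["databricks", "secrets", "put-secret", scope, key, "--string-value", value]


# Placeholder with the usual shape of a storage connection string (not a real key).
example = (
    "DefaultEndpointsProtocol=https;AccountName=example;"
    "AccountKey=abc123==;EndpointSuffix=core.windows.net"
)
args = build_put_secret_command("eo990pipeline", "sa_connection_string", example)

# shlex.join quotes each argument so a shell reads the whole value as one word.
print(shlex.join(args))
```

Passing `args` to `subprocess.run(args, check=True)` would sidestep shell quoting entirely, since no shell parses the value in that case.
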

Run the following code cell to verify the new scope and secret have been created successfully:

In [None]:
!databricks secrets list-scopes
!databricks secrets list-secrets eo990pipeline

The code below is meant to run in Databricks, where it reads the secret without revealing its value. In this example, we're changing the content type of an HTML blob to 'text/html', so that it can be viewed in a browser instead of downloaded.

``` python
import os

# Only run inside Databricks, where DATABRICKS_RUNTIME_VERSION is set.
if 'DATABRICKS_RUNTIME_VERSION' in os.environ:
    # The content type needs to be changed for serving the output.
    !pip install azure-storage-blob -q
    from azure.storage.blob import BlobServiceClient, ContentSettings
    from pyspark.dbutils import DBUtils
    dbutils = DBUtils(spark)
    # Read the connection string from the secret scope created with the CLI.
    connection_string = dbutils.secrets.get(scope="eo990pipeline", key="sa_connection_string")
    container_name = "eo990pipeline"
    blob_name = "clustering.html"
    blob_service_client = BlobServiceClient.from_connection_string(connection_string)
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
    # Mark the blob as HTML so browsers render it rather than download it.
    blob_client.set_http_headers(content_settings=ContentSettings(content_type='text/html'))
```
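
If you need to do the same for files other than HTML, the content type could be derived from the blob name instead of being hard-coded. The following is a small sketch using Python's standard `mimetypes` module; the helper name `guess_content_type` and the fallback value are choices made for this example, not part of the original code.

```python
import mimetypes


def guess_content_type(blob_name: str, default: str = "application/octet-stream") -> str:
    """Guess a MIME type from the blob's file extension (hypothetical helper)."""
    content_type, _encoding = mimetypes.guess_type(blob_name)
    # Fall back to a generic binary type when the extension is unknown.
    return content_type or default
```

The result could then be passed as `content_type` to `ContentSettings` in place of the literal `'text/html'`.
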

And that's it! You can save this notebook to a secure location so you can revisit and modify your secrets setup or script the Databricks CLI for other tasks. This approach enables you to centralize your ACCESS_TOKEN, making it accessible from any computer, while also helping you reduce cluster costs and stay up to date with the latest Databricks CLI. Enjoy! -- Dave