
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>

# ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Key Vault-Backed Secret Scopes

## Learning Objectives
By the end of this lesson, you should be able to:
* Configure Databricks to access Key Vault secrets
* Read data from and write data to Blob Storage directly using secrets stored in Key Vault
* Set different levels of access permission using SAS at the Storage service level
* Mount Blob Storage into DBFS
* Describe how mounting impacts secure access to data
 
### Online Resources

- [Azure Databricks Secrets](https://docs.azuredatabricks.net/user-guide/secrets/index.html)
- [Azure Key Vault](https://docs.microsoft.com/en-us/azure/key-vault/key-vault-whatis)
- [Azure Databricks DBFS](https://docs.azuredatabricks.net/user-guide/dbfs-databricks-file-system.html)
- [Introduction to Azure Blob storage](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction)
- [Databricks with Azure Blob Storage](https://docs.databricks.com/spark/latest/data-sources/azure/azure-storage.html)
- [Azure Data Lake Storage Gen1](https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-datalake.html#mount-azure-data-lake)
- [Azure Data Lake Storage Gen2](https://docs.databricks.com/spark/latest/data-sources/azure/azure-datalake-gen2.html)

### Classroom setup

A quick script to define a username variable in Python and Scala.

In [0]:
%run ./Includes/User-Name

### List Secret Scopes

Use the `dbutils.secrets` utility to list all secret scopes currently available in your workspace:

In [0]:
%python
dbutils.secrets.listScopes()

### List Secrets within a specific scope


To list the secrets within a specific scope, you can supply that scope name.

In [0]:
%python
dbutils.secrets.list("demo")
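
Before running the rest of the notebook, it can help to confirm that the scope contains every key used later (`storageaccount`, `storageread`, `storagewrite`). The following is a minimal sketch, not part of the original lesson. The `missing_secret_keys` helper and the `SecretMetadata` stand-in are hypothetical, and the sketch assumes that each entry in the listing exposes a `.key` attribute.

```python
from collections import namedtuple

# Hypothetical stand-in for the entries returned by dbutils.secrets.list;
# assumed here to expose the secret name as `.key`.
SecretMetadata = namedtuple("SecretMetadata", ["key"])


def missing_secret_keys(listed, required):
    """Return the required key names absent from a secrets listing, sorted."""
    present = {item.key for item in listed}
    return sorted(set(required) - present)


# Hand-built listing used only for illustration.
listing = [SecretMetadata("storageaccount"), SecretMetadata("storageread")]
missing = missing_secret_keys(
    listing, ["storageaccount", "storageread", "storagewrite"]
)
print(missing)  # ['storagewrite']
```

In a notebook, the listing would come from `dbutils.secrets.list("demo")` rather than the hand-built `listing`.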

### Using your Secrets

To use your secrets, you supply the scope and key to the `get` method.

Run the following cell to retrieve and print a secret.

In [0]:
%python
print(dbutils.secrets.get(scope="demo", key="storageread"))

### Secrets are not displayed in clear text

Notice that the value when printed out is `[REDACTED]`. This is to prevent your secrets from being exposed.

## Mount Azure Blob Container - Read/List

In this section, we'll demonstrate mounting a container with a SAS token (`SASTOKEN`) that has only list and read permissions, managed at the Storage Account level.

**This means:**
- Any user within the workspace can list and read the files mounted using this token
- This token can be used to mount any container within the storage account with these privileges

In [0]:
# Unmount directory if previously mounted.
MOUNTPOINT = "/mnt/commonfiles"
if MOUNTPOINT in [mnt.mountPoint for mnt in dbutils.fs.mounts()]:
  dbutils.fs.unmount(MOUNTPOINT)

# Add the Storage Account, Container, and reference the secret to pass the SAS Token
STORAGE_ACCOUNT = dbutils.secrets.get(scope="demo", key="storageaccount")
CONTAINER = "commonfiles"
SASTOKEN = dbutils.secrets.get(scope="demo", key="storageread")

# Do not change these values
SOURCE = "wasbs://{container}@{storage_acct}.blob.core.windows.net/".format(container=CONTAINER, storage_acct=STORAGE_ACCOUNT)
URI = "fs.azure.sas.{container}.{storage_acct}.blob.core.windows.net".format(container=CONTAINER, storage_acct=STORAGE_ACCOUNT)

try:
  dbutils.fs.mount(
    source=SOURCE,
    mount_point=MOUNTPOINT,
    extra_configs={URI:SASTOKEN})
except Exception as e:
  if "Directory already mounted" in str(e):
    pass # Ignore error if already mounted.
  else:
    raise  # Re-raise any other error with its original traceback.

display(dbutils.fs.ls(MOUNTPOINT))
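
The two format strings in the cell above follow a fixed pattern: the `wasbs://` source names the container and storage account, and the configuration key names the same pair under the `fs.azure.sas.` prefix. As a small sketch, the helpers below (hypothetical names, not part of the lesson) reproduce both patterns. The account name `examplestorage` is a made-up placeholder.

```python
def wasbs_source(container, storage_account):
    """Build the wasbs:// URL for a container in a storage account."""
    return f"wasbs://{container}@{storage_account}.blob.core.windows.net/"


def sas_config_key(container, storage_account):
    """Build the Spark/Hadoop config key under which the SAS token is set."""
    return f"fs.azure.sas.{container}.{storage_account}.blob.core.windows.net"


# "examplestorage" is a placeholder account name used only for illustration.
example_source = wasbs_source("commonfiles", "examplestorage")
example_key = sas_config_key("commonfiles", "examplestorage")
print(example_source)
print(example_key)
```

Keeping both strings in helpers makes it harder for the container or account name to differ between the source URL and the configuration key, in which case the token would be set under a key that does not correspond to the mounted source.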

### Define and display a DataFrame that reads a file from the mounted directory

In [0]:
salesDF = (spark.read
              .option("header", True)
              .option("inferSchema", True)
              .csv(MOUNTPOINT + "/sales.csv"))

display(salesDF)

### Filter the DataFrame and display the results

In [0]:
from pyspark.sql.functions import col

sales2004DF = (salesDF
                  .filter((col("ShipDateKey") > 20031231) &
                          (col("ShipDateKey") <= 20041231)))
display(sales2004DF)
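
The filter above keeps rows whose `ShipDateKey` is greater than 20031231 and at most 20041231. Assuming the column stores dates as `yyyymmdd` integers (an inference from the literal bounds, not something the dataset documentation confirms here), that range corresponds to ship dates in 2004. The plain-Python sketch below uses a hypothetical helper to show the same predicate on single values.

```python
def ship_date_in_year(ship_date_key, year):
    """Apply the notebook's range check to one yyyymmdd integer key.

    True when the key is after Dec 31 of the previous year and
    on or before Dec 31 of `year`.
    """
    lower = (year - 1) * 10000 + 1231  # e.g. 20031231 for year=2004
    upper = year * 10000 + 1231        # e.g. 20041231 for year=2004
    return lower < ship_date_key <= upper


print(ship_date_in_year(20040101, 2004))  # True
print(ship_date_in_year(20031231, 2004))  # False
```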

### Writing with a read-only token

While we can list and read files with this token, the job aborts when we try to write because the token grants no write permission.

In [0]:
try:
  sales2004DF.write.mode("overwrite").parquet(MOUNTPOINT + "/sales2004")
except Exception as e:
  print(e)
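
The write is rejected because of the permissions encoded in the token. An Azure Storage SAS token is a query string, and its `sp` field lists the signed permissions as single letters (for example `r` for read, `l` for list, `w` for write, `d` for delete). The sketch below parses that field from a made-up token. The `sas_permissions` helper and every value in the example token are illustrative assumptions, not real credentials.

```python
from urllib.parse import parse_qs


def sas_permissions(sas_token):
    """Return the set of permission letters in a SAS token's `sp` field."""
    params = parse_qs(sas_token.lstrip("?"))
    return set(params.get("sp", [""])[0])


# Made-up token for illustration only; the signature is not valid.
fake_read_token = (
    "?sv=2019-02-02&ss=b&srt=sco&sp=rl"
    "&se=2030-01-01T00:00:00Z&spr=https&sig=FAKE"
)

permissions = sas_permissions(fake_read_token)
print(sorted(permissions))   # ['l', 'r']
print("w" in permissions)    # False
```

A token whose `sp` field lacks `w` can back a mount for listing and reading, but any write through that mount fails, as in the cell above.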

### Review

At this point you should see how to:
* Use secrets to access Blob Storage
* Mount a Blob Storage container to DBFS (Databricks File System)

Mounting data to DBFS makes that content available to anyone in the workspace.

If you want to access Blob Storage directly without mounting, the rest of the notebook demonstrates that process.

## Writing Directly to Blob using SAS token

Note that when you mount a directory, by default all users within the workspace have the same privileges to interact with that directory. Here, we'll use a SAS token to write directly to Blob Storage without mounting. This ensures that only users within the workspace who have access to the associated Key Vault will be able to write.

In [0]:
spark.conf.set(URI, dbutils.secrets.get(scope="demo", key="storagewrite"))

### Listing Directory Contents and writing using SAS token

Because the configured container SAS gives us full permissions, we can interact with the blob storage using our `dbutils.fs` methods.

In [0]:
dbutils.fs.ls(SOURCE)

We can write to this container directly, without exposing a mount to others in our workspace.

In [0]:
sales2004DF.write.mode("overwrite").parquet(SOURCE + "/sales2004")

In [0]:
dbutils.fs.ls(SOURCE)

### Deleting using SAS token

This SAS token also has delete permissions.

In [0]:
dbutils.fs.rm(SOURCE + "/sales2004", True)

### Cleaning up mounts

If you don't explicitly unmount, the read-only blob that you mounted at the beginning of this notebook will remain accessible in your workspace.

In [0]:
if MOUNTPOINT in [mnt.mountPoint for mnt in dbutils.fs.mounts()]:
  dbutils.fs.unmount(MOUNTPOINT)
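
The same membership check appears at the start and end of this notebook, so it could be factored into a small helper. This is a sketch under the assumption that each entry returned by `dbutils.fs.mounts()` exposes a `mountPoint` attribute, as the cells above rely on. The `is_mounted` helper and the `MountInfo` stand-in are hypothetical.

```python
from collections import namedtuple

# Hypothetical stand-in for the entries returned by dbutils.fs.mounts().
MountInfo = namedtuple("MountInfo", ["mountPoint", "source"])


def is_mounted(mount_point, mounts):
    """Return True when mount_point appears in a list of mount entries."""
    return any(m.mountPoint == mount_point for m in mounts)


# Hand-built mount table used only for illustration.
example_mounts = [
    MountInfo(
        "/mnt/commonfiles",
        "wasbs://commonfiles@examplestorage.blob.core.windows.net/",
    )
]
print(is_mounted("/mnt/commonfiles", example_mounts))  # True
print(is_mounted("/mnt/other", example_mounts))        # False
```

In a notebook, the guard would then read `if is_mounted(MOUNTPOINT, dbutils.fs.mounts()):` followed by the unmount call.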

## Congratulations!

You should now be able to use the following tools in your workspace:

* Databricks Secrets
* Azure Key Vault
* SAS token
* `dbutils.fs.mount`

&copy; 2020 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="http://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="http://help.databricks.com/">Support</a>