# DEMO: Mounting Azure Blob Storage and Azure Data Lake Store
---
This demo will show you how to *mount* two of the most popular big data stores on Azure: Azure Blob Storage and Azure Data Lake Store.  
  
*Mounting* storage in Databricks means establishing a connection between your Azure Databricks workspace and another Azure storage service. Once mounted, the storage is accessible through tools like `dbutils`. More importantly, mounting makes interacting with data stores on Azure simpler.

First, let's take a look at mounting Azure Blob Storage.  
  
However, we need an Azure Blob Storage instance before we continue (if you haven't provisioned one, do so now).
  
Within the Azure Portal you need to grab two pieces of information:
1. The Endpoint for the blob with the format of 'wasbs://[your-container-name]@[your-storage-account-name].blob.core.windows.net'
2. The value associated with one of the Access Keys

In [0]:
# Name of the Storage Account
blobAccountName = "<ENTER YOUR STORAGE ACCOUNT NAME>"

# Name of the Container in the Blob Storage Account
containerName = "<ENTER IN THE NAME OF THE CONTAINER>"

# the second piece of information described above goes into blobKey
blobKey = "<ENTER IN THE STORAGE ACCOUNT ACCESS KEY>"

# the first piece of information described above goes into blobEndpoint
blobEndpoint = "wasbs://{0}@{1}.blob.core.windows.net/".format(containerName, blobAccountName)

Below is a command to mount the Azure Blob Storage container within your Databricks workspace.  
**source**: The endpoint for your blob  
**mount_point**: The filepath that you will use to access and interact with the mounted storage within Azure Databricks  
**extra_configs**: Here, the only extra configuration you need is the Access Key value

## Explanation of extra_configs and How it should be constructed
In extra_configs, Databricks needs the key-value pair of "fs.azure.account.key.STORAGE-ACCOUNT-NAME-HERE.blob.core.windows.net" : blobKey  
  
The access key (i.e. the variable blobKey) should have already been provided in the above cell  
  
The config "fs.azure.account.key.STORAGE-ACCOUNT-NAME-HERE.blob.core.windows.net" is telling Databricks what kind of credential you are using for authenticating the connection to the storage account  
  
To fill this out you need to supply the storage account name. For example, if you deployed a storage account named **workshopstorage** (storage account names may contain only lowercase letters and numbers), then your configuration key would be fs.azure.account.key.**workshopstorage**.blob.core.windows.net  
  
The full `extra_configs` would look like the following: `extra_configs = {"fs.azure.account.key.workshopstorage.blob.core.windows.net": blobKey}`
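
The construction described in this section can be sketched as a small helper. This is a minimal illustration rather than part of the original demo: the function name `build_blob_extra_configs`, the sample account name, and the placeholder key are all hypothetical.

```python
def build_blob_extra_configs(storage_account_name, access_key):
    """Return an extra_configs dict for mounting a Blob Storage container.

    Hypothetical helper: the storage account name is embedded in the
    config key, and the access key becomes the value.
    """
    config_key = "fs.azure.account.key.{0}.blob.core.windows.net".format(
        storage_account_name
    )
    return {config_key: access_key}


# Sample account name and placeholder key, used only for illustration.
example_configs = build_blob_extra_configs("workshopstorage", "<ACCESS KEY>")
print(example_configs)
```

The returned dictionary has exactly one entry, which is the shape `dbutils.fs.mount` expects for `extra_configs` when authenticating with an access key.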

In [0]:
configKey = "fs.azure.account.key.{0}.blob.core.windows.net".format(blobAccountName)

dbutils.fs.mount(
  source = blobEndpoint,
  mount_point = "/mnt/blobmount",
  extra_configs = {configKey:blobKey})
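
Running the mount cell a second time fails because the mount point is already in use. One way to guard against that, sketched here under assumptions, is to check the existing mounts first. The helper name `is_mounted` and the `FakeMount` stand-in are hypothetical; in a notebook, the list of entries would come from `dbutils.fs.mounts()`, whose entries expose a `mountPoint` attribute.

```python
from collections import namedtuple


def is_mounted(mounts, mount_point):
    """Return True if mount_point appears among the given mount entries."""
    return any(m.mountPoint == mount_point for m in mounts)


# Stand-in for the entries returned by dbutils.fs.mounts(), so that this
# sketch also runs outside Databricks.
FakeMount = namedtuple("FakeMount", ["mountPoint", "source"])
sample_mounts = [
    FakeMount("/mnt/blobmount", "wasbs://container@account.blob.core.windows.net/")
]

print(is_mounted(sample_mounts, "/mnt/blobmount"))  # True
print(is_mounted(sample_mounts, "/mnt/other"))      # False
```

In a Databricks notebook, the mount call could then be wrapped in `if not is_mounted(dbutils.fs.mounts(), "/mnt/blobmount"):` so that re-running the cell is harmless.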

Now let's take a look and make sure it is all connected.

In [0]:
dbutils.fs.ls("/mnt/blobmount")

As you can see, there is a directory, albeit an empty one. Nonetheless, this shows that the mount was successful and that files placed in this location in your Azure Blob Storage instance can be accessed here in Azure Databricks.

Now let's take a look at mounting Azure Data Lake Store. We will be using Azure Data Lake Store Gen2 for this demo. Further details on mounting Azure Data Lake Store Gen2 can be found in the Azure Databricks documentation (https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/adls-gen2/)

**Required before proceeding**  
- An Azure Data Lake Store Gen2 instance deployed into your Azure environment
- A folder called 'main' created at the root of the Azure Data Lake Store Gen2 instance
- An app registered with Azure Active Directory (https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-create-service-principal-portal)
- Correct ACL-level permissions, set using the service principal Object ID, NOT the app Object ID:
  - Navigate to the Azure Portal and select "Azure Active Directory" in the left pane. Then select "App Registrations". Find the registered app that was set up in the step above. Record the "Application (Client) ID".
  - Open the Azure CLI (must be preinstalled) and run the command `az ad sp show --id <Application (Client) ID>`.
  - In the returned results, you will see a field called "Object ID". Record the value.
  - Open Storage Explorer (must be installed). Navigate to your ADLS Gen2 account and expand the hierarchy until you get to your container name.
  - Right-click the container name and select "Manage Access". A new window will pop up. Enter the Object ID recorded earlier into the "Add user or Group" field.
  - Select "Add" and "Save". You will now be ready to complete the rest of the demo.


(https://deep.data.blog/2019/03/28/avoiding-error-403-request-not-authorized-when-accessing-adls-gen-2-from-azure-databricks-while-using-a-service-principal/)


Once this has been set up, we can get started with mounting Azure Data Lake Store Gen2 in your Databricks workspace. First, let's define the information needed for the mount.  
  
**Note**: As you will see, mounting Azure Data Lake Store Gen2 is a little more involved because of how access and permissions work.

In [0]:
configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "<Application ID>", #Application ID found on the information blade of the App within App Registrations in Azure Active Directory
           "fs.azure.account.oauth2.client.secret": "<key>", # This is the key value that is only shown once, if you did not write it down create a new key and put the value here
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<Directory ID>/oauth2/token"} # This is found under Azure Active Directory > Properties - Directory ID (bottom part of the blade)

# Optionally, you can add <your-directory-name> to the source URI of your mount point.
dbutils.fs.mount(
  source = "abfss://<filesystem name>@<storage account name>.dfs.core.windows.net/",
  mount_point = "/mnt/adlsgen2",
  extra_configs = configs)
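
The placeholders in the cell above can also be filled in programmatically. The following is a minimal sketch, not the demo's own code: the function name `build_adls_gen2_mount_args` and its parameter names are hypothetical, while the config keys and URI formats are copied from the cell above.

```python
def build_adls_gen2_mount_args(application_id, client_secret, directory_id,
                               filesystem_name, storage_account_name):
    """Return (source, configs) for mounting ADLS Gen2 with a service principal.

    Hypothetical helper that only assembles strings; it does not contact Azure.
    """
    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": application_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            "https://login.microsoftonline.com/{0}/oauth2/token".format(directory_id),
    }
    source = "abfss://{0}@{1}.dfs.core.windows.net/".format(
        filesystem_name, storage_account_name
    )
    return source, configs


# Placeholder values, used only for illustration.
example_source, example_configs = build_adls_gen2_mount_args(
    "<Application ID>", "<key>", "<Directory ID>",
    "<filesystem name>", "<storage account name>",
)
print(example_source)
```

The two return values correspond to the `source` and `extra_configs` arguments of `dbutils.fs.mount`.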

Let's make sure it is connected!

In [0]:
dbutils.fs.ls("/mnt/adlsgen2/")

Another useful feature is the `refreshMounts` command, available through `dbutils.fs`. It forces all machines in the cluster to refresh their mount cache so that they pick up the most recent mount information. It is a good way of ensuring that every node has an up-to-date view of your mounts.

In [0]:
dbutils.fs.refreshMounts()

Before we finish with this demo, let's take a look at how to *unmount* storage locations from your Databricks workspace.

In [0]:
dbutils.fs.unmount("/mnt/adlsgen2")
dbutils.fs.unmount("/mnt/blobmount")
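
Unmounting a path that is not currently mounted can raise an error, so a re-runnable notebook may want to check first. The sketch below is an illustration under assumptions: `unmount_if_mounted` and the `FakeFs` stand-in are hypothetical names, and in a notebook the `fs` argument would be `dbutils.fs`.

```python
from collections import namedtuple


def unmount_if_mounted(fs, mount_point):
    """Unmount mount_point through fs only if it is currently mounted.

    Returns True if an unmount was performed, False otherwise.
    """
    mounted_points = {m.mountPoint for m in fs.mounts()}
    if mount_point in mounted_points:
        fs.unmount(mount_point)
        return True
    return False


# Minimal stand-in for dbutils.fs so that the sketch runs outside Databricks.
MountEntry = namedtuple("MountEntry", ["mountPoint"])


class FakeFs:
    def __init__(self, mount_points):
        self._mount_points = list(mount_points)

    def mounts(self):
        return [MountEntry(p) for p in self._mount_points]

    def unmount(self, mount_point):
        self._mount_points.remove(mount_point)


fake_fs = FakeFs(["/mnt/adlsgen2", "/mnt/blobmount"])
print(unmount_if_mounted(fake_fs, "/mnt/adlsgen2"))  # True
print(unmount_if_mounted(fake_fs, "/mnt/adlsgen2"))  # False
```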