# Mounting the Azure Container to Databricks File System

The data from the competition (https://bitgrit.net/competition/23) has been uploaded to Azure Blob Storage to take advantage of the big data tools the platform offers and to make collaboration easier.
To use the data from Databricks, I decided to mount the container to the Databricks File System (DBFS).
The steps are described below.

In [0]:
# Read the secrets from the Key Vault-backed secret scope with dbutils.secrets
# this way I don't leak my secrets to the Bitcoin miners :')
scope_name = 'key-vault-scope'
# SAS token for the 0-raw-data container (kept for reference; the mount uses the account key)
secret_name = dbutils.secrets.get(scope = scope_name, key = 'nasa-comp-data-sas-token-0-raw-data')
# Access key of the storage account, used to authenticate the mount
key_secret_name = dbutils.secrets.get(scope = scope_name, key = 'competitiondata-storage-access-key1')
container_name = "0-raw-data"
storage_account_name = "competitiondata"

# Creating a secret scope in Databricks
- Go to the Databricks workspace.
- Navigate to the URL: https://<databricks-instance>#secrets/createScope (here: https://adb-2293949851761684.4.azuredatabricks.net/#secrets/createScope).
- Fill in the required fields:
  - Scope Name: Enter a name for your secret scope.
  - DNS Name: Enter the DNS name of your Key Vault (https://atl-nasa-challenge-kv.vault.azure.net/).
  - Resource ID: Enter the Resource ID of your Key Vault (/subscriptions/b0b9b7ad-f69c-4572-a875-9a1fecc576f3/resourceGroups/NASA_AIRPORT_CHALLENGE_2024/providers/Microsoft.KeyVault/vaults/atl-nasa-challenge-kv).
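
Once the scope exists, it can be worth checking that it actually exposes the secrets the notebook expects. The following is only a minimal sketch: `missing_secret_keys` is a hypothetical helper name, and it assumes that `dbutils.secrets.listScopes()` and `dbutils.secrets.list(scope)` return objects with `.name` and `.key` attributes respectively. `dbutils` is passed in as an argument so the function does not depend on the notebook global.

```python
def missing_secret_keys(dbutils, scope, required_keys):
    """Return the keys from required_keys that are absent from the secret scope.

    Hypothetical helper for illustration; raises ValueError if the scope
    itself is not visible from this workspace.
    """
    # Names of all secret scopes visible to the current user
    scope_names = {s.name for s in dbutils.secrets.listScopes()}
    if scope not in scope_names:
        raise ValueError(f"Secret scope {scope!r} not found")

    # Only key names (metadata) are listed here; secret values are never read
    available = {m.key for m in dbutils.secrets.list(scope)}
    return [k for k in required_keys if k not in available]
```

In a notebook cell this could be called as `missing_secret_keys(dbutils, 'key-vault-scope', ['competitiondata-storage-access-key1'])`; an empty list would mean every expected key is present.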

In [0]:
spark.conf.set("fs.azure.account.key.{0}.blob.core.windows.net".format(storage_account_name), key_secret_name)

In [0]:
dbutils.fs.mount(
    source = "wasbs://{0}@{1}.blob.core.windows.net".format(container_name, storage_account_name),
    mount_point = "/mnt/nasa_challenge",
    extra_configs = {"fs.azure.account.key.{0}.blob.core.windows.net".format(storage_account_name): key_secret_name}
)
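
Running the mount cell a second time fails because the directory is already mounted. One way around this, sketched here under assumptions, is to check `dbutils.fs.mounts()` first. `mount_if_absent` is a hypothetical helper name, and the sketch assumes each entry returned by `dbutils.fs.mounts()` exposes a `mountPoint` attribute.

```python
def mount_if_absent(dbutils, source, mount_point, extra_configs):
    """Mount source at mount_point unless something is already mounted there.

    Hypothetical helper for illustration. Returns True if a new mount was
    created and False if the mount point already existed.
    """
    # Collect the mount points that are currently registered
    existing = {m.mountPoint for m in dbutils.fs.mounts()}
    if mount_point in existing:
        return False

    dbutils.fs.mount(
        source=source,
        mount_point=mount_point,
        extra_configs=extra_configs,
    )
    return True
```

With this in place, the cell above could be replaced by a call to `mount_if_absent(dbutils, ...)` with the same `source`, `mount_point`, and `extra_configs` values, which makes the notebook safe to re-run from the top.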

The cells below are quick checks that list the contents of the mounted directories.

In [0]:
dbutils.fs.ls("dbfs:/mnt/nasa_challenge")

In [0]:
dbutils.fs.ls("dbfs:/mnt/nasa_challenge/1-raw-unzipped-files/")

In [0]:
dbutils.fs.ls("dbfs:/mnt/nasa_challenge/1-raw-unzipped-files/CWAM_train_part_1_220901_220924/220901_220924/09/01")
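
Listing one directory at a time gets tedious for deeply nested folders like the one above. As a rough sketch, the listing can be made recursive. `walk_files` and `summarize` are hypothetical helper names, and the code assumes the entries returned by `dbutils.fs.ls` expose `path`, `size`, and an `isDir()` method.

```python
def walk_files(dbutils, root):
    """Yield every file entry (not directories) under root, depth first.

    Hypothetical helper for illustration.
    """
    for entry in dbutils.fs.ls(root):
        if entry.isDir():
            # Recurse into sub-directories using the path reported by ls
            yield from walk_files(dbutils, entry.path)
        else:
            yield entry


def summarize(dbutils, root):
    """Return (number_of_files, total_size_in_bytes) for everything under root."""
    files = list(walk_files(dbutils, root))
    return len(files), sum(f.size for f in files)
```

For example, `summarize(dbutils, "dbfs:/mnt/nasa_challenge/1-raw-unzipped-files/")` would give a file count and total byte size for the unzipped data. On very large trees this makes one `ls` call per directory, so it can be slow.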