# Mounting the Azure Container to Databricks File System

The data from the competition (https://bitgrit.net/competition/23) has been uploaded to Azure Blob Storage to take advantage of the big data tools the platform offers and to make collaboration easier.
To use the data with Databricks, I decided to mount the container to the Databricks File System (DBFS).
The steps are described below.

In [0]:
# Read the storage credentials from the Key Vault-backed secret scope
# this way I don't leak my secrets to the Bitcoin miners :')
scope_name = 'key-vault-scope'
# NOTE: despite their names, the next two variables hold secret values, not secret names
# SAS token for the container (judging by the key name)
secret_name = dbutils.secrets.get(scope = scope_name, key = 'nasa-comp-data-sas-token-0-raw-data')
# Storage account access key, used for the mount below
key_secret_name = dbutils.secrets.get(scope = scope_name, key = 'competitiondata-storage-access-key1')
container_name = "0-raw-data"
storage_account_name = "competitiondata"

# Creating a secret scope in Databricks
- Go to the Databricks workspace.
- Navigate to the URL: https://<databricks-instance>#secrets/createScope (here: https://adb-2293949851761684.4.azuredatabricks.net/#secrets/createScope).
- Fill in the required fields:
  - Scope Name: Enter a name for your secret scope.
  - DNS Name: Enter the DNS name of your Key Vault (https://atl-nasa-challenge-kv.vault.azure.net/).
  - Resource ID: Enter the Resource ID of your Key Vault (/subscriptions/b0b9b7ad-f69c-4572-a875-9a1fecc576f3/resourceGroups/NASA_AIRPORT_CHALLENGE_2024/providers/Microsoft.KeyVault/vaults/atl-nasa-challenge-kv).
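
Once the scope exists, it can be worth confirming from a notebook that it is visible and holds the expected keys. The sketch below relies on `dbutils.secrets.listScopes()` and `dbutils.secrets.list(scope)`; the helper name `missing_secret_keys` is hypothetical, and `dbutils` is passed in as an argument so the function can also be exercised outside Databricks.

```python
def missing_secret_keys(dbutils, scope, expected_keys):
    """Return the entries of expected_keys that are not present in the scope.

    Raises ValueError if the scope itself is not visible from this workspace.
    """
    scope_names = {s.name for s in dbutils.secrets.listScopes()}
    if scope not in scope_names:
        raise ValueError(f"Secret scope {scope!r} not found")
    present = {meta.key for meta in dbutils.secrets.list(scope)}
    return [key for key in expected_keys if key not in present]

# In the notebook (hypothetical usage with the keys read in the first cell):
# missing_secret_keys(dbutils, 'key-vault-scope',
#                     ['nasa-comp-data-sas-token-0-raw-data',
#                      'competitiondata-storage-access-key1'])
```

An empty list means every expected key was found; only key names are listed, never the secret values.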

In [0]:
spark.conf.set("fs.azure.account.key.{0}.blob.core.windows.net".format(storage_account_name), key_secret_name)
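
The cell above authenticates with the storage account key. The notebook also reads a SAS token, which the WASB driver can use instead through a `fs.azure.sas.<container>.<account>.blob.core.windows.net` setting. Below is a minimal sketch that builds either configuration; the helper name `wasbs_settings` is hypothetical, and the commented usage assumes `secret_name` really holds the SAS token.

```python
def wasbs_settings(storage_account, container, credential, use_sas=False):
    """Build the wasbs:// source URL and the matching config entry.

    use_sas=False -> account key setting; use_sas=True -> per-container SAS setting.
    """
    host = f"{storage_account}.blob.core.windows.net"
    source = f"wasbs://{container}@{host}"
    if use_sas:
        conf_key = f"fs.azure.sas.{container}.{host}"
    else:
        conf_key = f"fs.azure.account.key.{host}"
    return source, {conf_key: credential}

# In the notebook (assumption: secret_name is the SAS token read earlier):
# source, configs = wasbs_settings(storage_account_name, container_name,
#                                  secret_name, use_sas=True)
```

The returned dictionary has the shape expected by the `extra_configs` argument of `dbutils.fs.mount`.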

In [0]:
dbutils.fs.mount(
 source = "wasbs://{0}@{1}.blob.core.windows.net".format(container_name, storage_account_name),
 mount_point = "/mnt/nasa_challenge",
 extra_configs = {"fs.azure.account.key.{0}.blob.core.windows.net".format(storage_account_name): key_secret_name}
)

---------------------------------------------------------------------------
ExecutionError                            Traceback (most recent call last)
File <command-1131958806848271>, line 1
----> 1 dbutils.fs.mount(
      2  source = "wasbs://{0}@{1}.blob.core.windows.net".format(container_name, storage_account_name),
      3  mount_point = "/mnt/nasa_challenge",
      4  extra_configs = {"fs.azure.account.key.{0}.blob.core.windows.net".format(storage_account_name): key_secret_name}
      5 )

File /databricks/python_shell/dbru
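
The traceback is cut off before the underlying message, so the cause of the `ExecutionError` is not visible here. One common cause when a mount cell is re-run is that the mount point already exists; that is an assumption, not something this output confirms. Below is a sketch that guards against that case by checking `dbutils.fs.mounts()` first (the helper name `mount_if_needed` is hypothetical):

```python
def mount_if_needed(dbutils, source, mount_point, extra_configs):
    """Mount source at mount_point unless something is already mounted there.

    Returns True if a new mount was created, False if one already existed.
    """
    existing = {m.mountPoint for m in dbutils.fs.mounts()}
    if mount_point in existing:
        return False
    dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=extra_configs)
    return True

# In the notebook, with the same arguments as the mount cell above:
# mount_if_needed(
#     dbutils,
#     "wasbs://{0}@{1}.blob.core.windows.net".format(container_name, storage_account_name),
#     "/mnt/nasa_challenge",
#     {"fs.azure.account.key.{0}.blob.core.windows.net".format(storage_account_name): key_secret_name},
# )
```

With this guard the cell can be re-run safely, which fits a notebook that several collaborators may execute.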

The cells below are quick checks that list the mounted directories to see what the container holds.

In [0]:
dbutils.fs.ls("dbfs:/mnt/nasa_challenge")

[FileInfo(path='dbfs:/mnt/nasa_challenge/0-original-zip-files/', name='0-original-zip-files/', size=0, modificationTime=0),
 FileInfo(path='dbfs:/mnt/nasa_challenge/1-raw-unzipped-files/', name='1-raw-unzipped-files/', size=0, modificationTime=0)]

In [0]:
dbutils.fs.ls("dbfs:/mnt/nasa_challenge/1-raw-unzipped-files/")

[FileInfo(path='dbfs:/mnt/nasa_challenge/1-raw-unzipped-files/CWAM_test/', name='CWAM_test/', size=0, modificationTime=0),
 FileInfo(path='dbfs:/mnt/nasa_challenge/1-raw-unzipped-files/CWAM_train_part_10_2230616_230709/', name='CWAM_train_part_10_2230616_230709/', size=0, modificationTime=0),
 FileInfo(path='dbfs:/mnt/nasa_challenge/1-raw-unzipped-files/CWAM_train_part_11_2230718_230810/', name='CWAM_train_part_11_2230718_230810/', size=0, modificationTime=0),
 FileInfo(path='dbfs:/mnt/nasa_challenge/1-raw-unzipped-files/CWAM_train_part_12_2230819_230831/', name='CWAM_train_part_12_2230819_230831/', size=0, modificationTime=0),
 FileInfo(path='dbfs:/mnt/nasa_challenge/1-raw-unzipped-files/CWAM_train_part_1_220901_220924/', name='CWAM_train_part_1_220901_220924/', size=0, modificationTime=0),
 FileInfo(path='dbfs:/mnt/nasa_challenge/1-raw-unzipped-files/CWAM_train_part_2_2221003_221026/', name='CWAM_train_part_2_2221003_221026/', size=0, modificationTime=0),
 FileInfo(path='dbfs:/mnt/na
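
Listing one directory at a time gets tedious for a tree this deep. Below is a rough sketch of a recursive helper that counts files and sums their sizes. It assumes, as in the listings in this notebook, that directory entries have a `name` ending in `/`. It also skips an entry whose path equals the directory being listed, since some listings on this mount return the directory itself. The helper name `total_size` is hypothetical.

```python
def total_size(dbutils, path):
    """Recursively count files and sum their sizes in bytes under path."""
    n_files, n_bytes = 0, 0
    root = path.rstrip("/")
    for info in dbutils.fs.ls(path):
        if info.path.rstrip("/") == root:
            continue  # self-entry for the listed directory, not a child
        if info.name.endswith("/"):
            # Directory: descend and accumulate its totals
            sub_files, sub_bytes = total_size(dbutils, info.path)
            n_files += sub_files
            n_bytes += sub_bytes
        else:
            n_files += 1
            n_bytes += info.size
    return n_files, n_bytes

# In the notebook:
# total_size(dbutils, "dbfs:/mnt/nasa_challenge/1-raw-unzipped-files/")
```

This runs on the driver and issues one `ls` call per directory, so it may be slow on the full dataset; trying it on a single part first is probably wise.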

In [0]:
dbutils.fs.ls("dbfs:/mnt/nasa_challenge/1-raw-unzipped-files/CWAM_train_part_1_220901_220924/220901_220924/09/01")

[FileInfo(path='dbfs:/mnt/nasa_challenge/1-raw-unzipped-files/CWAM_train_part_1_220901_220924/220901_220924/09/01', name='01', size=0, modificationTime=1730035491000),
 FileInfo(path='dbfs:/mnt/nasa_challenge/1-raw-unzipped-files/CWAM_train_part_1_220901_220924/220901_220924/09/01/2022_09_01_00_00_GMT.Forecast.h5.CWAM.h5.bz2', name='2022_09_01_00_00_GMT.Forecast.h5.CWAM.h5.bz2', size=8490160, modificationTime=1730035488000),
 FileInfo(path='dbfs:/mnt/nasa_challenge/1-raw-unzipped-files/CWAM_train_part_1_220901_220924/220901_220924/09/01/2022_09_01_00_15_GMT.Forecast.h5.CWAM.h5.bz2', name='2022_09_01_00_15_GMT.Forecast.h5.CWAM.h5.bz2', size=7697723, modificationTime=1730035477000),
 FileInfo(path='dbfs:/mnt/nasa_challenge/1-raw-unzipped-files/CWAM_train_part_1_220901_220924/220901_220924/09/01/2022_09_01_00_30_GMT.Forecast.h5.CWAM.h5.bz2', name='2022_09_01_00_30_GMT.Forecast.h5.CWAM.h5.bz2', size=6911056, modificationTime=1730035489000),
 FileInfo(path='dbfs:/mnt/nasa_challenge/1-raw-un
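
The file names in this listing start with a `YYYY_MM_DD_HH_MM_GMT` stamp. Below is a small sketch for turning that prefix into a timezone-aware `datetime`, which may help when sorting or filtering files later. The pattern is inferred only from the names shown above, and the helper name `forecast_time` is hypothetical.

```python
from datetime import datetime, timezone

def forecast_time(file_name):
    """Parse the leading 'YYYY_MM_DD_HH_MM_GMT' stamp of a CWAM file name."""
    # Keep everything before the first '_GMT', e.g. '2022_09_01_00_15'
    stamp = file_name.split("_GMT", 1)[0]
    parsed = datetime.strptime(stamp, "%Y_%m_%d_%H_%M")
    # The stamp is labelled GMT, so attach UTC explicitly
    return parsed.replace(tzinfo=timezone.utc)

# In the notebook (pass the `name` field of a FileInfo entry):
# forecast_time("2022_09_01_00_15_GMT.Forecast.h5.CWAM.h5.bz2")
```

A name that does not follow the pattern raises `ValueError` from `strptime`, which makes unexpected files easy to spot.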