
## Connect Azure Databricks to Azure Storage using access keys

1. Set up the Spark config
2. List the files from the container
3. Read the data

In [0]:
dbutils.secrets.help()

In [0]:
dbutils.secrets.listScopes()

In [0]:
storage_key = dbutils.secrets.get("formula1abcd", "formula1abcd-storage-key")

In [0]:
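# get() requires a scope and a key; calling it without arguments raises an error.
# The notebook redacts the returned value if it is printed.
dbutils.secrets.get(scope="formula1abcd", key="formula1abcd-storage-key")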

In [0]:
spark.conf.set("fs.azure.account.key.formula1abcd.dfs.core.windows.net",
               storage_key)
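
The config key set above embeds the storage account name. As a minimal sketch, the key can be built by a small helper so the account name is written only once. The function name `account_key_conf` is hypothetical (it is not part of Databricks or Spark), and the sketch assumes the public Azure endpoint suffix `dfs.core.windows.net`.

```python
def account_key_conf(storage_account: str) -> str:
    """Return the Spark config key that holds the access key for one storage account."""
    # Assumption: the account is reached through the public Azure endpoint suffix.
    return f"fs.azure.account.key.{storage_account}.dfs.core.windows.net"


# In a Databricks notebook, where `spark` and `storage_key` already exist:
# spark.conf.set(account_key_conf("formula1abcd"), storage_key)
print(account_key_conf("formula1abcd"))
```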

In [0]:
dbutils.fs.ls("abfss://demo@formula1abcd.dfs.core.windows.net")
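
The `abfss://` URI used above follows the pattern `abfss://<container>@<storage-account>.dfs.core.windows.net/<path>`. The sketch below assembles such a URI from its parts. The helper name `abfss_uri` is an assumption made for illustration, not an existing API.

```python
def abfss_uri(container: str, storage_account: str, path: str = "") -> str:
    """Build an abfss:// URI for a container, optionally pointing at a path inside it."""
    # Assumption: the account is reached through the public Azure endpoint suffix.
    base = f"abfss://{container}@{storage_account}.dfs.core.windows.net"
    # Drop any leading slash so the result never contains a double slash.
    path = path.lstrip("/")
    return f"{base}/{path}" if path else base


# In a Databricks notebook, where `dbutils` already exists:
# dbutils.fs.ls(abfss_uri("demo", "formula1abcd"))
print(abfss_uri("demo", "formula1abcd", "circuits.csv"))
```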

In [0]:
display(spark.read.csv("abfss://demo@formula1abcd.dfs.core.windows.net/circuits.csv").option("header", "true"))

# Error: the cell above fails because .csv() returns a DataFrame, which has no .option() method. Reader options such as header must be set before the file is read.

In [0]:
display(spark.read.option("header", "true").csv("abfss://demo@formula1abcd.dfs.core.windows.net/circuits.csv"))
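
The working call order can be wrapped in a small function so the option is always applied before the file is read. This is only a sketch: the name `read_csv_with_header` is hypothetical, and the Spark session is passed in as a parameter so the function does not rely on the notebook's global `spark`.

```python
def read_csv_with_header(spark, path: str):
    """Read a CSV file, treating its first line as the column names."""
    # option() is a DataFrameReader method, so it must be called before csv(),
    # which performs the read and returns a DataFrame.
    return spark.read.option("header", "true").csv(path)


# In a Databricks notebook, where `spark` and `display` already exist:
# display(read_csv_with_header(
#     spark, "abfss://demo@formula1abcd.dfs.core.windows.net/circuits.csv"))
```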


## Connect Azure Databricks to Azure Storage using a SAS token

1. Set up the Spark config
2. List the files from the container
3. Read the data

In [0]:
sas_token = dbutils.secrets.get("formula1abcd", "formula1abcd-sas-token")

In [0]:
spark.conf.set("fs.azure.account.auth.type.formula1abcd.dfs.core.windows.net", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type.formula1abcd.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
spark.conf.set("fs.azure.sas.fixed.token.formula1abcd.dfs.core.windows.net", sas_token)
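
The three SAS settings above share the same account suffix, so they can be gathered into one dictionary and applied in a loop. The sketch below shows one way to do that. The function name `sas_conf` is an assumption for illustration; the keys and the provider class are the ones used in the cell above.

```python
def sas_conf(storage_account: str, sas_token: str) -> dict:
    """Return the Spark settings needed for SAS token access to one storage account."""
    # Assumption: the account is reached through the public Azure endpoint suffix.
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "SAS",
        f"fs.azure.sas.token.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider",
        f"fs.azure.sas.fixed.token.{suffix}": sas_token,
    }


# In a Databricks notebook, where `spark` and `sas_token` already exist:
# for key, value in sas_conf("formula1abcd", sas_token).items():
#     spark.conf.set(key, value)
```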

In [0]:
dbutils.fs.ls("abfss://demo@formula1abcd.dfs.core.windows.net")

In [0]:
display(spark.read.option("header", "true").csv("abfss://demo@formula1abcd.dfs.core.windows.net/circuits.csv"))


## Connect Azure Databricks to Azure Storage using a service principal

1. Register an Azure AD application (the service principal)
2. Generate a secret/password for the application
3. Set the Spark config with the App/Client ID, Directory/Tenant ID & Secret
4. Assign the 'Storage Blob Data Contributor' role on the Data Lake to the application

In [0]:
client_id = dbutils.secrets.get("formula1abcd", "formula1abcd-client-id")
tenant_id = dbutils.secrets.get("formula1abcd", "formula1abcd-tenant-id")
client_secret = dbutils.secrets.get("formula1abcd", "formula1abcd-client-secret")

In [0]:
spark.conf.set("fs.azure.account.auth.type.formula1abcd.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.formula1abcd.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.formula1abcd.dfs.core.windows.net", client_id)
spark.conf.set("fs.azure.account.oauth2.client.secret.formula1abcd.dfs.core.windows.net", client_secret)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.formula1abcd.dfs.core.windows.net", f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")
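
The five OAuth settings above can likewise be collected in one place. The sketch below builds them as a dictionary from the client ID, tenant ID, and client secret. The function name `oauth_conf` is hypothetical; the keys, provider class, and token endpoint are copied from the cell above.

```python
def oauth_conf(storage_account: str, client_id: str,
               tenant_id: str, client_secret: str) -> dict:
    """Return the Spark settings needed for service principal (OAuth) access."""
    # Assumption: the account is reached through the public Azure endpoint suffix.
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# In a Databricks notebook, where `spark` and the three secrets already exist:
# for key, value in oauth_conf("formula1abcd", client_id, tenant_id, client_secret).items():
#     spark.conf.set(key, value)
```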

In [0]:
dbutils.fs.ls("abfss://demo@formula1abcd.dfs.core.windows.net")

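# Permission error: the cell above fails until the Azure AD application is assigned the 'Storage Blob Data Contributor' role on the storage account. After the role is assigned, the same call succeeds.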

In [0]:
dbutils.fs.ls("abfss://demo@formula1abcd.dfs.core.windows.net")

In [0]:
display(spark.read.option("header", "true").csv("abfss://demo@formula1abcd.dfs.core.windows.net/circuits.csv"))