### Accessing ADLS

- #### [Access directly using storage account key](https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-datalake-gen2#--access-directly-using-the-storage-account-access-key)
- #### Access using service principal
    - [Directly](https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-datalake-gen2#create-and-grant-permissions-to-service-principal), or
    - Through [mount](https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-datalake-gen2#--mount-an-azure-data-lake-storage-gen2-account-using-a-service-principal-and-oauth-20) points
      - Mount points are available to all users
    - #### Best suited for:
      - Jobs and other automated workloads
      - Unattended ("auto-pilot") data engineering workloads
- #### [AAD credential passthrough](https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-datalake-gen2#---access-automatically-with-your-azure-active-directory-credentials)
  - Ideal for interactive analytical workloads
  - Supports mount points
  - Available for ADLS Gen2, Power BI, Synapse DW

#### Download these notebooks from my [GitHub](https://github.com/bhavink/databricks/tree/master/adb4u) repo

In [0]:
%run ./secrets

### Access directly using storage account key

- Open to anyone who holds the storage account access key
- No data access controls or restrictions
- Quick and easy
- Not recommended

#### You'll need
- Data Lake Storage account and container name
- ADLS Gen2 account access key

In [0]:
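%scala
// labsdatalake_key is assumed to be defined by the ./secrets notebook run above.
// Set the account key on the Spark session conf (DataFrame/SQL APIs) ...
spark.conf.set(
  "fs.azure.account.key.labsdatalake.dfs.core.windows.net", labsdatalake_key)

// ... and on the Hadoop configuration (RDD API).
spark.sparkContext.hadoopConfiguration.set(
  "fs.azure.account.key.labsdatalake.dfs.core.windows.net", labsdatalake_key)

// List the folder to confirm that access works.
display(dbutils.fs.ls("abfss://container@labsdatalake.dfs.core.windows.net/listonly/"))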

path,name,size
abfss://container@labsdatalake.dfs.core.windows.net/listonly/chick_menu.csv,chick_menu.csv,873
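
The storage account name appears both in the configuration key and in the `abfss://` URI, so it can help to derive the two from one place. A minimal Python sketch, assuming two hypothetical helper names (`account_key_conf_name` and `abfss_uri`) that are not part of any Databricks API:

```python
def account_key_conf_name(account: str) -> str:
    """Spark/Hadoop config key that holds the access key for one storage account."""
    return f"fs.azure.account.key.{account}.dfs.core.windows.net"


def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build an abfss:// URI for a path inside a container of an ADLS Gen2 account."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# In a Databricks notebook these would be used roughly as follows
# (spark, dbutils and labsdatalake_key come from the notebook environment):
#   spark.conf.set(account_key_conf_name("labsdatalake"), labsdatalake_key)
#   display(dbutils.fs.ls(abfss_uri("container", "labsdatalake", "listonly/")))
print(account_key_conf_name("labsdatalake"))
print(abfss_uri("container", "labsdatalake", "listonly/"))
```

The helpers only build strings; the key itself still has to come from a secret store or from the `./secrets` notebook.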


### Access using service principal

#### You'll need
- Data Lake Storage account and container name
- Azure Tenant Id
- Service Principal Client Id
- Service Principal Client Secret
- A role assignment on the Data Lake Storage account for the service principal (or a group it belongs to), one of:
  - Storage Blob Data Reader
  - Storage Blob Data Owner
  - Storage Blob Data Contributor

In [0]:
%scala
// Account-scoped OAuth 2.0 (client credentials) settings for the service principal.
val account = "labsdatalake.dfs.core.windows.net"
val oauthConf = Map(
  s"fs.azure.account.auth.type.$account" -> "OAuth",
  s"fs.azure.account.oauth.provider.type.$account" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  s"fs.azure.account.oauth2.client.id.$account" -> labs_sp_client_id,
  s"fs.azure.account.oauth2.client.secret.$account" -> labs_sp_client_key,
  s"fs.azure.account.oauth2.client.endpoint.$account" -> s"https://login.microsoftonline.com/$tenant_id/oauth2/token")

// Apply every setting to the Spark session conf (DataFrame/SQL APIs)
// and to the Hadoop configuration (RDD API).
oauthConf.foreach { case (key, value) =>
  spark.conf.set(key, value)
  spark.sparkContext.hadoopConfiguration.set(key, value)
}

// List the folder to confirm that access works.
display(dbutils.fs.ls("abfss://container@labsdatalake.dfs.core.windows.net/listonly/"))

path,name,size
abfss://container@labsdatalake.dfs.core.windows.net/listonly/chick_menu.csv,chick_menu.csv,873
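
For Python notebooks, the same account-scoped settings can be generated for any storage account. This is a sketch only: `oauth_conf` is a hypothetical helper name, and the angle-bracket values are placeholders rather than real credentials.

```python
def oauth_conf(account: str, tenant_id: str, client_id: str, client_secret: str) -> dict:
    """Account-scoped ABFS OAuth settings for a service principal."""
    suffix = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# Placeholder values only; in the notebook the real ones come from ./secrets.
conf = oauth_conf("labsdatalake", "<tenant-id>", "<client-id>", "<client-secret>")
for key in conf:
    print(key)

# In a Databricks notebook (spark is provided by the environment):
#   for key, value in conf.items():
#       spark.conf.set(key, value)
```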


In [0]:
%python
df = spark.sql("select * from csv.`abfss://container@labsdatalake.dfs.core.windows.net/listonly/chick_menu.csv`")
display(df)

display(dbutils.fs.ls("abfss://container@labsdatalake.dfs.core.windows.net/listonly/"))

_c0,_c1
SKU,NAME
90002,Grilled Chicken Cool Wrap
90003,Grilled Nuggets
90010,Egg White Grill
90011,Chicken Noodle Soup
90018,"Chicken, Egg and Cheese Bagel"
90019,Chicken Nuggets
90026,Hash Browns
90027,Greek Yogurt Parfait
90000,"Sausage/Bacon, Egg and Cheese Biscuit"


path,name,size
abfss://container@labsdatalake.dfs.core.windows.net/listonly/chick_menu.csv,chick_menu.csv,873


In [0]:
%sql

select * from csv.`abfss://container@labsdatalake.dfs.core.windows.net/listonly/chick_menu.csv`

_c0,_c1
SKU,NAME
90002,Grilled Chicken Cool Wrap
90003,Grilled Nuggets
90010,Egg White Grill
90011,Chicken Noodle Soup
90018,"Chicken, Egg and Cheese Bagel"
90019,Chicken Nuggets
90026,Hash Browns
90027,Greek Yogurt Parfait
90000,"Sausage/Bacon, Egg and Cheese Biscuit"


In [0]:
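%scala
// Read the CSV, then write it back to the lake as Parquet.
val df = spark.read.csv("abfss://container@labsdatalake.dfs.core.windows.net/listonly/chick_menu.csv")
// The output is a Parquet directory, so name it accordingly rather than data.csv.
df.write.format("parquet").save("abfss://container@labsdatalake.dfs.core.windows.net/listonly/data.parquet")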

In [0]:
%scala

val configs = Map(
  "fs.azure.account.auth.type" -> "OAuth",
  "fs.azure.account.oauth.provider.type" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id" -> labs_sp_client_id,
  "fs.azure.account.oauth2.client.secret" -> labs_sp_client_key,
  "fs.azure.account.oauth2.client.endpoint" -> s"https://login.microsoftonline.com/$tenant_id/oauth2/token")


// Optionally, you can add <directory-name> to the source URI of your mount point.
dbutils.fs.mount(
  source = "abfss://container@labsdatalake.dfs.core.windows.net/",
  mountPoint = "/mnt/labsdatalake/container",
  extraConfigs = configs)

In [0]:
%fs ls /mnt/labsdatalake/container/listonly/chick_menu.csv

path,name,size
dbfs:/mnt/labsdatalake/container/listonly/chick_menu.csv,chick_menu.csv,873
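
Re-running a mount cell fails when the directory is already mounted, so notebooks often guard the call. Below is a minimal Python sketch of that guard. `mount_if_absent` is a hypothetical helper, and the `_Fake*` classes are stand-ins so the example also runs outside Databricks; in a notebook you would pass the real `dbutils` and the OAuth settings instead.

```python
def mount_if_absent(dbutils, source, mount_point, extra_configs):
    """Mount `source` at `mount_point` unless that mount point is already in use.

    Returns True if a new mount was created, False otherwise.
    """
    if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        return False
    dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=extra_configs)
    return True


# --- Stand-ins for dbutils, for illustration only. ---
class _Mount:
    def __init__(self, source, mount_point):
        self.source, self.mountPoint = source, mount_point


class _FakeFs:
    def __init__(self):
        self._mounts = []

    def mounts(self):
        return list(self._mounts)

    def mount(self, source, mount_point, extra_configs):
        self._mounts.append(_Mount(source, mount_point))
        return True


class _FakeDbutils:
    def __init__(self):
        self.fs = _FakeFs()


fake = _FakeDbutils()
args = ("abfss://container@labsdatalake.dfs.core.windows.net/",
        "/mnt/labsdatalake/container", {})
first = mount_if_absent(fake, *args)   # creates the mount
second = mount_if_absent(fake, *args)  # mount point already exists, so it is skipped
print(first, second)  # prints: True False
```

Note that the Python `dbutils.fs.mount` call takes `mount_point` and `extra_configs`, whereas the Scala call above uses `mountPoint` and `extraConfigs`.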


### ADB with an [egress appliance](https://databricks.com/blog/2020/03/27/data-exfiltration-protection-with-azure-databricks.html) and ADLS access

#### If you have a firewall in front of ADB, the access pattern looks like this:

- AAD service endpoint enabled on the ADB subnets
- Storage service endpoint disabled on the ADB subnets (yes, you have to disable the service endpoint)
- Storage service endpoint enabled on the subnet hosting the firewall
- Firewall subnet whitelisted on ADLS Gen2 --> Networking --> Firewalls and virtual networks --> Selected networks
- Because the firewall, ADB, and ADLS are all on Azure, traffic stays on the Azure backbone.
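
The checklist above can be written down as a small sanity check. The sketch below is illustrative only: the `Subnet` class, the function name, and the subnet names are assumptions, and the data would have to be filled in by hand or from your own inventory (nothing here calls Azure). The endpoint identifiers follow Azure's service endpoint naming.

```python
from dataclasses import dataclass, field

AAD_ENDPOINT = "Microsoft.AzureActiveDirectory"
STORAGE_ENDPOINT = "Microsoft.Storage"


@dataclass
class Subnet:
    name: str
    service_endpoints: set = field(default_factory=set)


def check_egress_layout(adb_subnets, firewall_subnet, storage_allowed_subnets):
    """Return a list of deviations from the access pattern; empty means it matches."""
    problems = []
    for subnet in adb_subnets:
        if AAD_ENDPOINT not in subnet.service_endpoints:
            problems.append(f"{subnet.name}: AAD service endpoint should be enabled")
        if STORAGE_ENDPOINT in subnet.service_endpoints:
            problems.append(f"{subnet.name}: storage service endpoint should be disabled")
    if STORAGE_ENDPOINT not in firewall_subnet.service_endpoints:
        problems.append(f"{firewall_subnet.name}: storage service endpoint should be enabled")
    if firewall_subnet.name not in storage_allowed_subnets:
        problems.append(f"{firewall_subnet.name}: not allowed on the storage account firewall")
    return problems


# Hypothetical subnet names, for illustration only.
adb = [Subnet("adb-public", {AAD_ENDPOINT}), Subnet("adb-private", {AAD_ENDPOINT})]
fw = Subnet("firewall", {STORAGE_ENDPOINT})
ok = check_egress_layout(adb, fw, storage_allowed_subnets={"firewall"})

# A layout that leaves the storage endpoint enabled on an ADB subnet.
bad = check_egress_layout(
    [Subnet("adb-public", {AAD_ENDPOINT, STORAGE_ENDPOINT})], fw, {"firewall"})
print(ok)
print(bad)
```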