This example notebook closely follows the [Databricks documentation](https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-datalake.html) for how to set up Azure Data Lake Store as a data source in Databricks.

### 0) Prerequisites

Before we can load the data, we need to take care of two things:
* Give our Databricks workspace permission to access the data lake where our data resides
* Create a place where we can manage our secrets/credentials, so we don't have to put our credentials in the notebooks

### 0.1) Get service credentials
<ul>
  <li> Get the client ID `<aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee>` and client credential `<NzQzY2QzYTAtM2I3Zi00NzFmLWI3MGMtMzc4MzRjZmk=>`: follow the instructions in [Create service principal with portal](https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-create-service-principal-portal). </li>
<li> Get the directory ID `<ffffffff-gggg-hhhh-iiii-jjjjjjjjjjjj>`, also referred to as the *tenant ID*: follow the instructions in [Get tenant ID](https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-create-service-principal-portal#get-tenant-id). </li>
<li> If you haven't set up the service app, follow this [tutorial](https://docs.microsoft.com/en-us/azure/azure-databricks/databricks-extract-load-sql-data-warehouse). Grant access to the service app (or to everyone) at the root directory or at the desired folder level of the data lake.</li>
    </ul>

### 0.2) Create Secret Scope
<ul>
<li> Create the secret scope via the Databricks CLI, which is accessible through the Azure Cloud Shell (Bash): https://docs.microsoft.com/en-us/azure/azure-databricks/databricks-cli-from-azure-cloud-shell </li>
<li> Configure the secret scope within the Databricks CLI: https://docs.databricks.com/user-guide/secrets/example-secret-workflow.html </li>
<li> Fill the defined secret scope with the secrets (client ID, client credential, and directory ID) needed to connect to ADLS without exposing the credentials. </li>
  </ul>
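
Before mounting, it can help to confirm that the scope actually contains every secret the notebook will read. The following is a minimal sketch, not part of the referenced documentation: `missing_secret_keys` is a hypothetical helper, and it assumes the notebook's `dbutils.secrets.list` utility, which returns metadata objects with a `key` attribute and never the secret values themselves.

```python
def missing_secret_keys(dbutils, scope, required_keys):
    """Return the required keys that are not stored in the given secret scope.

    `dbutils` is passed in explicitly so the helper can also be exercised
    outside a Databricks notebook with a stand-in object.
    """
    # Collect the key names currently present in the scope (metadata only).
    present = {meta.key for meta in dbutils.secrets.list(scope)}
    # Report whatever is required but absent, in a stable order.
    return sorted(set(required_keys) - present)
```

In a notebook cell, `missing_secret_keys(dbutils, "scopename", ["clientid", "adlskeys", "tenantid"])` would return an empty list once all three secrets used in the next section have been stored.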

### 1) Mount ADLS
Next, we mount ADLS to a DBFS file path (in Python), so that all users of the cluster have access to the data.
We first set variables from the predefined secrets, then mount the ADLS to a folder of our own choice, e.g. /mnt/MNIST/

In [7]:
%python
# Read the service credentials from the secret scope
clientid = dbutils.secrets.get(scope = "scopename", key = "clientid")
credential = dbutils.secrets.get(scope = "scopename", key = "adlskeys")
# Assumption: the "tenantid" secret holds only the directory (tenant) ID,
# so the OAuth2 token endpoint is built from it
tenantid = dbutils.secrets.get(scope = "scopename", key = "tenantid")
refreshurl = "https://login.microsoftonline.com/" + tenantid + "/oauth2/token"

# OAuth2 settings for the ADLS connector
configs = {"dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
       "dfs.adls.oauth2.client.id": clientid,
       "dfs.adls.oauth2.credential": credential,
       "dfs.adls.oauth2.refresh.url": refreshurl}

# Mount the ADLS
dbutils.fs.mount(
       source = "adl://<name of adls>.azuredatalakestore.net/",
       mount_point = "/mnt/MNIST/",
       extra_configs = configs)

We can list our mounted data sources with:

In [9]:
%fs ls /mnt/

path,name,size
dbfs:/mnt/MNIST/,MNIST/,0
dbfs:/mnt/training-sources/,training-sources/,0


In case we want to unmount, we can do that with:

In [11]:
dbutils.fs.unmount("/mnt/MNIST/")
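
Mounting a path that is already mounted, or unmounting one that is not, typically raises an error, which makes re-running the notebook awkward. The sketch below guards both operations by checking `dbutils.fs.mounts()` first. The helper names (`is_mounted`, `mount_if_absent`, `unmount_if_mounted`) are hypothetical and not part of Databricks; `dbutils` is passed in as an argument so the helpers do not depend on notebook globals.

```python
def _normalize(path):
    # Strip trailing slashes so "/mnt/MNIST/" and "/mnt/MNIST" compare equal.
    return path.rstrip("/") or "/"


def is_mounted(dbutils, mount_point):
    """Return True if mount_point appears in the current DBFS mount list."""
    target = _normalize(mount_point)
    return any(_normalize(m.mountPoint) == target for m in dbutils.fs.mounts())


def mount_if_absent(dbutils, source, mount_point, configs):
    """Mount source at mount_point unless it is already mounted.

    Returns True if a mount was performed, False if it was skipped.
    """
    if is_mounted(dbutils, mount_point):
        return False
    dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=configs)
    return True


def unmount_if_mounted(dbutils, mount_point):
    """Unmount mount_point only if it is currently mounted.

    Returns True if an unmount was performed, False if it was skipped.
    """
    if not is_mounted(dbutils, mount_point):
        return False
    dbutils.fs.unmount(mount_point)
    return True
```

For example, `mount_if_absent(dbutils, "adl://<name of adls>.azuredatalakestore.net/", "/mnt/MNIST/", configs)` can be run repeatedly without failing on the second call.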

### 2) Direct Access

With Spark configs, the Azure Data Lake Store settings can be specified per notebook. To keep things simple, the example below includes the credentials in plaintext. However, we strongly discourage you from storing secrets in plaintext. Instead, we recommend storing the credentials as [Databricks Secrets](https://docs.azuredatabricks.net/user-guide/secrets/index.html#secrets-user-guide).

**Note:** `spark.conf` values are visible only to the Dataset and DataFrame APIs. If you need access to them from an RDD, refer to the [documentation](https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-datalake.html#access-azure-data-lake-store-using-the-rdd-api).

In [13]:
spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("dfs.adls.oauth2.client.id", "<ApplicationIDfromRegisteredApp>")
spark.conf.set("dfs.adls.oauth2.credential", "<Keys generated from app>")
spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/<Tenant-ID>/oauth2/token")

In [14]:
val df = spark.read.parquet("adl://<Name of ADLS>.azuredatalakestore.net/<Directory of files>")

dbutils.fs.ls("adl://<Name of ADLS>.azuredatalakestore.net/<Directory of files>")
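
The plaintext placeholders in the Spark configuration above can instead be filled from the secret scope, in line with the recommendation to avoid plaintext credentials. This is a minimal Python sketch (for a `%python` cell) under stated assumptions: `set_adls_conf_from_secrets` is a hypothetical helper, the `dbutils.secrets` utility is available, the secret keys `clientid`, `adlskeys`, and `tenantid` match the ones used in the mount example, and the `tenantid` secret holds only the directory ID.

```python
def set_adls_conf_from_secrets(spark, dbutils, scope):
    """Set the ADLS OAuth2 Spark configs for this session from a secret scope.

    Returns the names of the settings that were applied (never their values).
    """
    # Key names are assumed to match those used in the mount example.
    client_id = dbutils.secrets.get(scope=scope, key="clientid")
    credential = dbutils.secrets.get(scope=scope, key="adlskeys")
    tenant_id = dbutils.secrets.get(scope=scope, key="tenantid")

    settings = {
        "dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
        "dfs.adls.oauth2.client.id": client_id,
        "dfs.adls.oauth2.credential": credential,
        # Assumption: the secret stores the bare tenant ID, not the full URL.
        "dfs.adls.oauth2.refresh.url":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }
    for name, value in settings.items():
        spark.conf.set(name, value)
    return sorted(settings)
```

Calling `set_adls_conf_from_secrets(spark, dbutils, "scopename")` in a notebook applies the same four settings as the cell above, without any credential appearing in the notebook source.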