# Process disparate paths into a single view with `eval_mount`/`eval_download` 

This example demonstrates how to use MLTable to create a table of different storage paths that can then be mounted or downloaded into a single view on the compute target's filesystem with the `eval_mount` and `eval_download` modes. You can take specific folders and files from the same or different storage accounts and containers and expose them as one view on the compute target's filesystem, either by mounting or by downloading. For example:

<img src="./media/eval_mount1.png" alt="evaluate mount" width="600"/>

This avoids having to create multiple inputs in your training jobs when the data is spread across different storage locations. 

In this notebook, we show a scenario where we want to download a folder of images *and* an annotations file to our compute target. The annotations file is located in the root directory on the storage account. If you were to use the standard `download` mode in your AzureML job, you would either need to create two inputs (one pointing to the folder and one to the annotations file) or download everything in the root directory. In this case the data consists of both the images and the annotations file, so we want to keep them together.

## 📦 Install dependencies

Ensure you have the latest MLTable library and dependencies.

In [None]:
%pip install -r ../mltable-requirements.txt

## 🐍 Create an MLTable using the Python SDK

Here you build your data loading steps using the `mltable` Python SDK. The `show()` method allows you to see the effect of the data loading transformation.

In [None]:
import mltable

# create paths to the data files
# NOTE: all paths must use the same URI scheme (e.g. all wasbs://, all abfss://, or all azureml://)
paths = [
    {"folder": "wasbs://data@azuremlexampledata.blob.core.windows.net/pet-images/cat"},
    {"folder": "wasbs://data@azuremlexampledata.blob.core.windows.net/pet-images/dog"},
    {
        "file": "wasbs://data@azuremlexampledata.blob.core.windows.net/pet-images-annotations.csv"
    },
]

# create the mltable
tbl = mltable.from_paths(paths)
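
The cell above notes that every path must share one URI scheme. As a minimal sketch, a small helper could check this before calling `mltable.from_paths`. The function `check_same_scheme` and the `example_paths` list are hypothetical names used for illustration. They are not part of the `mltable` library, and the check relies only on the Python standard library.

```python
from urllib.parse import urlparse


def check_same_scheme(paths):
    """Return the single URI scheme shared by all paths, or raise ValueError.

    Hypothetical helper for illustration; not part of mltable.
    """
    schemes = set()
    for entry in paths:
        # each entry is a one-key dict such as {"file": uri} or {"folder": uri}
        for uri in entry.values():
            schemes.add(urlparse(uri).scheme)
    if len(schemes) != 1:
        raise ValueError(f"Expected exactly one URI scheme, found: {sorted(schemes)}")
    return schemes.pop()


example_paths = [
    {"folder": "wasbs://data@azuremlexampledata.blob.core.windows.net/pet-images/cat"},
    {"file": "wasbs://data@azuremlexampledata.blob.core.windows.net/pet-images-annotations.csv"},
]

print(check_same_scheme(example_paths))
```

Running a check like this first surfaces a mixed-scheme list as a clear error in the notebook rather than later in the workflow.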

### 🐼 Load into a Pandas data frame

You can load the MLTable into a Pandas data frame using:

In [None]:
df = tbl.to_pandas_dataframe()
df.head(5)

### 💾 Save data loading steps 
Next, you'll save all your data loading steps into an `MLTable` file. This allows you to *reproduce* your Pandas data frame at a later point in time without having to redefine the data loading steps in your code.

In [None]:
# save the data loading steps in an MLTable file
tbl.save("./disparate-files")

#### 🔍 View the saved file

In the next code cell, we show you the `MLTable` file so you can understand how the data loading steps are serialized into a file.

In [None]:
with open("./disparate-files/MLTable", "r") as f:
    print(f.read())

### 🤝 Create a data asset to aid sharing and reproducibility

You'll now create a data asset, which will automatically upload the `MLTable` to cloud storage (the default AzureML datastore) so that others can use it easily.

In [None]:
subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace = "<AML_WORKSPACE_NAME>"

In [None]:
import time
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

# set the version number of the data asset to the current UTC time
VERSION = time.strftime("%Y.%m.%d.%H%M%S", time.gmtime())

# connect to the AzureML workspace
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)

my_data = Data(
    path="./disparate-files",
    type=AssetTypes.MLTABLE,
    description="A sample of cat and dog images with an annotation file.",
    name="pets-mltable-example",
    version=VERSION,
)

ml_client.data.create_or_update(my_data)
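
The version string in the cell above is derived from the current UTC time. The sketch below shows what that format produces, using a fixed timestamp so the result is reproducible. The helper name `make_version` is an assumption introduced here for illustration only.

```python
import time


def make_version(epoch_seconds=None):
    """Format a UTC timestamp as a version string (hypothetical helper)."""
    # time.gmtime(None) falls back to the current time
    return time.strftime("%Y.%m.%d.%H%M%S", time.gmtime(epoch_seconds))


# A fixed timestamp (the Unix epoch) gives a deterministic result.
print(make_version(0))  # 1970.01.01.000000
```

Because every field is zero-padded, comparing two such version strings gives the same order as comparing their timestamps.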

### 📖 Read the data asset in a job

You can also access your Table in a job, using:

In [None]:
from azure.ai.ml import MLClient, command, Input
from azure.ai.ml.entities import Environment
from azure.ai.ml.constants import AssetTypes, InputOutputModes
from azure.identity import DefaultAzureCredential

# connect to the AzureML workspace
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)

# get the data asset version that was registered above
# Note: the VERSION was set in a previous cell.
data_asset = ml_client.data.get(name="pets-mltable-example", version=VERSION)

# named `inputs` to avoid shadowing the built-in `input` function
inputs = {
    "pets": Input(
        type=AssetTypes.MLTABLE, path=data_asset.id, mode=InputOutputModes.EVAL_DOWNLOAD
    )
}

cmd = """
    find ${{inputs.pets}}
"""

job = command(
    command=cmd,
    inputs=inputs,
    compute="cpu-cluster",
    environment="azureml://registries/azureml/environments/sklearn-1.1/versions/4",
)

ml_client.jobs.create_or_update(job)

## Job output

This job simply lists the absolute paths of the downloaded files on the compute target's filesystem:

<img src="./media/eval_mount_output.png" alt="evaluate mount job output" width="600"/>

Notice that the data is downloaded into the following directory structure on the filesystem:

```
/https%3A
    └── %2Fazuremlexampledata.blob.core.windows.net
        └── data
            ├── pet-images-annotations.csv
            └── pet-images
                ├── cat
                │   ├── cat0.jpg
                │   └── cat10.jpg
                └── dog
                    ├── dog0.jpg
                    └── dog10.jpg
```
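
The files sit beneath a percent-encoded prefix, so a script that consumes this input may prefer to search the input root rather than hard-code the full path. The following is a minimal sketch using only the standard library. The helper `find_by_suffix` is a hypothetical name, and the sketch assumes the script receives the input root (the value of `${{inputs.pets}}`) as a string.

```python
import os


def find_by_suffix(root, suffix):
    """Return sorted file paths under root whose relative path ends with suffix.

    Hypothetical helper for illustration; relative paths are compared
    with '/' separators regardless of the operating system.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            if rel.endswith(suffix):
                matches.append(full)
    return sorted(matches)
```

In a training script, `find_by_suffix(root, "pet-images-annotations.csv")` would locate the annotations file and `find_by_suffix(root, ".jpg")` would collect the images, without depending on the encoded storage prefix.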

