# 🛣️ Creating a Table from paths

You can create a table whose rows are the paths of files in cloud storage. In this example, there are some dog and cat images stored in cloud storage in the following folder structure:

```
/pet-images
  /cat
    0.jpg
    1.jpg
    ...
  /dog
    0.jpg
    1.jpg
```

MLTable can extract the storage URIs of these images and the useful folder names for labelling purposes.

## 📦 Install dependencies

Ensure you have the latest MLTable library and dependencies.

In [None]:
%pip install -r ../mltable-requirements.txt

## 🐍 Create an MLTable using the Python SDK

Here you build your data loading steps using the `mltable` Python SDK. The `show()` method allows you to see the effect of the data loading transformation.

In [None]:
import mltable

# create paths to the data files
paths = [
    {
        "pattern": "wasbs://data@azuremlexampledata.blob.core.windows.net/pet-images/**/*.jpg"
    }
]

# create the mltable
tbl = mltable.from_paths(paths)

# extract useful information from the path
tbl = tbl.extract_columns_from_partition_format(
    "{account}/{container}/{folder}/{label}"
)

tbl = tbl.drop_columns(["account", "container", "folder"])

# preview the first few rows to check the effect of the steps above
tbl.show(5)
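
To make the partition format easier to picture, here is a rough pure-Python sketch of the idea: each `{name}` placeholder is matched, in order, to one leading segment of the path. This is only an illustration, not MLTable's implementation. The helper `extract_partitions` and the relative paths are hypothetical, and the sketch assumes a format made of simple placeholders separated by `/`. In the notebook the format is applied to the full storage path, which is why the account, container and folder columns are named first and then dropped.

```python
import re
from pathlib import PurePosixPath


def extract_partitions(path: str, partition_format: str) -> dict:
    """Map leading path segments onto the {name} placeholders, in order.

    Hypothetical helper for illustration only; not part of mltable.
    """
    names = re.findall(r"\{(\w+)\}", partition_format)
    segments = PurePosixPath(path).parts
    if len(segments) < len(names):
        raise ValueError(f"{path!r} has fewer segments than {partition_format!r}")
    # zip stops at the shorter input, so trailing segments (the file name) are ignored
    return dict(zip(names, segments))


# Hypothetical relative paths that mirror the folder layout described earlier.
paths = ["pet-images/cat/0.jpg", "pet-images/dog/1.jpg"]
rows = [extract_partitions(p, "{folder}/{label}") for p in paths]
labels = [row["label"] for row in rows]
print(labels)
```

Only the placeholder you care about (here `label`) needs to survive; the others exist to consume the path segments that come before it.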

### 🐼 Load into a Pandas data frame

You can load your MLTable into a Pandas data frame using:

In [None]:
df = tbl.to_pandas_dataframe()
df.head(5)

### 📉 Plot the images
The `Path` column is of type `StreamInfo`, which means you can open each entry and read its data.

In [None]:
# plot images on a grid. Note this takes ~1min to execute.
import matplotlib.pyplot as plt
from PIL import Image

fig = plt.figure(figsize=(20, 20))
columns = 4
rows = 5
# rows of the data frame are numbered from 0, subplot positions from 1
for i in range(min(columns * rows, len(df))):
    with df.Path[i].open() as f:
        img = Image.open(f)
        fig.add_subplot(rows, columns, i + 1)
        plt.imshow(img)
        plt.title(df.label[i])

### 💾 Save data loading steps
Next, you'll save all your data loading steps into an `MLTable` file. This allows you to *reproduce* your Pandas data frame at a later point in time without having to redefine the data loading steps in your code.

In [None]:
# save the data loading steps in an MLTable file
tbl.save("./pets")

#### 🔍 View the saved file

The next code cell prints the `MLTable` file so you can see how the data loading steps are serialized.

In [None]:
with open("./pets/MLTable", "r") as f:
    print(f.read())

## ♻️ Reproduce data loading steps

Now that the data loading steps have been serialized into a file, you can reproduce them at any point in time using the `load()` method. This means you do not need to redefine your data loading steps in code, and it makes them easier to share with others.

In [None]:
import mltable

# load the previously saved MLTable file
tbl = mltable.load("./pets/")
df = tbl.to_pandas_dataframe()
df.head(5)

### 🤝 Create a data asset to aid sharing and reproducibility

You'll now create a data asset, which will automatically upload the `MLTable` to cloud storage (the default AzureML datastore) so that others can use it easily.

In [None]:
subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace = "<AML_WORKSPACE_NAME>"

In [None]:
import time
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

# set the version number of the data asset to the current UTC time
VERSION = time.strftime("%Y.%m.%d.%H%M%S", time.gmtime())

# connect to the AzureML workspace
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)

my_data = Data(
    path="./pets",
    type=AssetTypes.MLTABLE,
    description="A sample of cat and dog images",
    name="pets-mltable-example",
    version=VERSION,
)

ml_client.data.create_or_update(my_data)
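
The version string above uses fixed-width, zero-padded date and time fields, so comparing two versions as plain strings gives the same order as comparing their timestamps. A small sketch of that property follows; the helper `version_from_epoch` is hypothetical and only wraps the same `time.strftime` call with an explicit timestamp so the result is repeatable.

```python
import time


def version_from_epoch(seconds: float) -> str:
    """Format a Unix timestamp as a UTC version string (YYYY.MM.DD.HHMMSS)."""
    return time.strftime("%Y.%m.%d.%H%M%S", time.gmtime(seconds))


earlier = version_from_epoch(0)
later = version_from_epoch(86_400)  # exactly one day later

print(earlier)  # 1970.01.01.000000
print(later)  # 1970.01.02.000000
print(earlier < later)  # True: string order follows time order
```

Because the resolution is one second, two assets registered within the same second would get the same version string.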

### 📖 Read the data asset in an interactive session

Now that your MLTable is stored in the cloud, you and your team members can access it using a friendly name in an interactive session (for example, a notebook).

In [None]:
import mltable
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# connect to the AzureML workspace
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)

# get the version of the data asset registered above
# Note: the variable VERSION is set in the previous code cell
data_asset = ml_client.data.get(name="pets-mltable-example", version=VERSION)

# load the table from the data asset ID
tbl = mltable.load(f"azureml:/{data_asset.id}")

# load into pandas
df = tbl.to_pandas_dataframe()
df.head()
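
The single slash in `azureml:/` can look like a typo. The sketch below shows how the string is assembled, under the assumption that the asset ID is an Azure resource ID beginning with `/subscriptions/...`; the exact layout shown here is an assumption, and the values are placeholders, so print `data_asset.id` in your own session to confirm what it looks like.

```python
# Placeholder ID for illustration; the real value comes from data_asset.id.
# The layout below is assumed, not taken from the notebook.
asset_id = (
    "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>"
    "/providers/Microsoft.MachineLearningServices"
    "/workspaces/<AML_WORKSPACE_NAME>"
    "/data/pets-mltable-example/versions/<VERSION>"
)

# The ID already starts with "/", so "azureml:/" + ID yields "azureml://..."
uri = f"azureml:/{asset_id}"
print(uri)
```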