# Azure ML Tables: Local to cloud example ☁️

In this notebook, you'll see how Azure ML Tables (the `mltable` type) can be used locally, without any connection to the cloud or to Azure ML. Then you'll see how to upload your data to the cloud by creating a data asset, and how to consume that asset in a notebook or a job.

In [None]:
%pip install -r ../mltable-requirements.txt

## Create an MLTable file using Python 🐍

Here you build your data loading steps using the `mltable` Python SDK. The `show()` method displays the first rows of the table, so you can check the effect of each data loading transformation.

In [None]:
import mltable
from mltable import DataType

# Here we use a glob pattern to find the titanic.csv file by recursively searching the current directory
paths = [{"pattern": "./**/titanic.csv"}]

# create a table from the titanic.csv file. In this case, we don't want to infer the column types
tbl = mltable.from_delimited_files(paths, infer_column_types=False)

# set the column types
column_types = {
    "PassengerId": DataType.to_int(),
    "Survived": DataType.to_int(),
    "Pclass": DataType.to_int(),
    "Name": DataType.to_string(),
    "Sex": DataType.to_string(),
    "Age": DataType.to_float(),  # some ages are fractional (e.g. estimated ages such as 28.5)
    "SibSp": DataType.to_int(),
    "Parch": DataType.to_int(),
    "Ticket": DataType.to_string(),
    "Fare": DataType.to_float(),
    "Cabin": DataType.to_string(),
    "Embarked": DataType.to_string(),
}
tbl = tbl.convert_column_types(column_types)

# drop the PassengerId, Name, and Ticket columns
tbl = tbl.drop_columns(["PassengerId", "Ticket", "Name"])

# display the first 5 rows of the table
tbl.show(5)
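
The glob `./**/titanic.csv` searches recursively, so it picks up *every* `titanic.csv` below the current directory, not just the one you intended. The sketch below uses Python's `pathlib` as a stand-in to show this; `mltable`'s own glob matching may differ in edge cases, and the folder layout is hypothetical. The leading `./` is dropped because `pathlib` patterns are already relative to the folder you glob from.

```python
import tempfile
from pathlib import Path
from typing import List


def find_matches(root: Path, pattern: str) -> List[str]:
    """Return files under `root` that match a glob pattern, as sorted POSIX-style relative paths."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: the intended file, a stale copy elsewhere, and an unrelated file.
    for rel in ["data/titanic.csv", "archive/old/titanic.csv", "data/readme.txt"]:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("PassengerId,Survived\n")
    # "**" matches zero or more directories, so both CSV copies are found.
    matches = find_matches(root, "**/titanic.csv")

print(matches)  # ['archive/old/titanic.csv', 'data/titanic.csv']
```

If stray copies like this could exist, a narrower pattern or an explicit file path avoids loading the wrong data.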

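Pay attention to the `Age` column when choosing types: the widely used Kaggle version of the Titanic data documents that ages under 1 are fractional and that estimated ages take the form xx.5, so an integer type can't represent every value and a float type is the safer choice. The plain-Python sketch below (hypothetical sample values, not `mltable` code) shows an integer conversion failing on such a value, then applies per-column conversions and drops columns, mirroring the convert-then-drop order of the cell above.

```python
from typing import Callable, Dict, Iterable, Optional


def convert_row(row: Dict[str, str], converters: Dict[str, Callable[[str], object]]) -> Dict[str, Optional[object]]:
    """Apply a converter per column; empty strings (missing values) become None."""
    return {col: (converters[col](value) if value != "" else None) for col, value in row.items()}


def drop_columns(row: Dict[str, object], columns: Iterable[str]) -> Dict[str, object]:
    """Return a copy of `row` without the named columns."""
    dropped = set(columns)
    return {col: value for col, value in row.items() if col not in dropped}


# Hypothetical raw values, all strings as a CSV reader returns them (not read from titanic.csv).
raw = {"PassengerId": "7", "Name": "Doe, Mr. John", "Age": "28.5", "Fare": "7.25", "Ticket": "A/5 123"}
converters = {"PassengerId": int, "Name": str, "Age": float, "Fare": float, "Ticket": str}

# An integer conversion rejects an estimated age such as "28.5".
try:
    int(raw["Age"])
    age_as_int_ok = True
except ValueError:
    age_as_int_ok = False

row = drop_columns(convert_row(raw, converters), ["PassengerId", "Ticket", "Name"])
print(age_as_int_ok)  # False
print(row)            # {'Age': 28.5, 'Fare': 7.25}
```
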
### Note: You can load an Azure ML Table into a pandas DataFrame 🐼

To load your Azure ML Table into pandas, call `to_pandas_dataframe()`:

In [None]:
df = tbl.to_pandas_dataframe()
df.head(5)

## 💾 Save the data loading steps to an MLTable file

We recommend storing the `MLTable` file alongside the data, so that the folder is a self-contained artifact holding everything needed to load the table.

In [None]:
# save the MLTable to local disk - in the same directory as the titanic.csv file
tbl.save("./data")
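
As we understand the `MLTable` format, relative paths inside the file are resolved against the folder that contains it, which is what makes that folder portable. The hypothetical helper `is_self_contained` below (not part of `mltable`) checks that a set of relative paths resolves to existing files inside a folder, rejecting any path that escapes it, for example via `..`.

```python
import tempfile
from pathlib import Path
from typing import List


def is_self_contained(folder: Path, relative_paths: List[str]) -> bool:
    """True if every relative path resolves to an existing file inside `folder`."""
    root = folder.resolve()
    for rel in relative_paths:
        target = (root / rel).resolve()
        # Reject missing files and anything that escapes the folder (e.g. via "..").
        if not target.is_file() or root not in target.parents:
            return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "data"
    folder.mkdir()
    (folder / "titanic.csv").write_text("PassengerId,Survived\n")
    # A copy outside the folder: it exists, but depending on it breaks portability.
    (Path(tmp) / "titanic.csv").write_text("PassengerId,Survived\n")
    inside = is_self_contained(folder, ["./titanic.csv"])
    outside = is_self_contained(folder, ["../titanic.csv"])

print(inside, outside)  # True False
```
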

## ♻️ Reproduce the data loading steps

With your data loading steps saved in an `MLTable` file, you can reload the table at any time using the `load()` method:

In [None]:
# load in the table from the saved location
tbl = mltable.load("./data")

# load the table into a pandas dataframe
df = tbl.to_pandas_dataframe()

# show the first few rows of the dataframe
df.head()
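
Saving the *steps* rather than the transformed data is what makes the reload reproducible: the same steps replayed on the same files give the same table. The toy below illustrates the idea in plain Python with JSON-serialized steps; the step format, the `replay` function, and the sample CSV are all hypothetical and are not the `MLTable` file format.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical step list: a record of *what to do*, not the resulting data.
steps = [
    {"op": "convert", "columns": {"Age": "float", "Fare": "float"}},
    {"op": "drop", "columns": ["Name"]},
]

CASTS = {"float": float, "int": int, "str": str}


def replay(csv_text: str, steps: List[dict]) -> List[Dict[str, object]]:
    """Read CSV text and apply each recorded step in order."""
    rows: List[Dict[str, object]] = list(csv.DictReader(io.StringIO(csv_text)))
    for step in steps:
        if step["op"] == "convert":
            for row in rows:
                for col, type_name in step["columns"].items():
                    value = row[col]
                    row[col] = CASTS[type_name](value) if value != "" else None
        elif step["op"] == "drop":
            rows = [{k: v for k, v in row.items() if k not in step["columns"]} for row in rows]
        else:
            raise ValueError(f"unknown step: {step['op']}")
    return rows


saved = json.dumps(steps)  # the steps as they might be written to disk
sample_csv = "Name,Age,Fare\nDoe,28.5,7.25\nRoe,,8.05\n"  # hypothetical data

first = replay(sample_csv, steps)
again = replay(sample_csv, json.loads(saved))  # reload the steps and replay them
print(first == again)  # True
print(again)  # [{'Age': 28.5, 'Fare': 7.25}, {'Age': None, 'Fare': 8.05}]
```
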

## Create a data asset to aid sharing and reproducibility 🤝

Your data (including the `MLTable` file) is currently saved on local disk, which makes it hard to share with your team. When you create a *data asset* in Azure ML, your data is uploaded to cloud storage and "bookmarked", so your team members can access the MLTable by a friendly name. Data assets are also *versioned*.

In [None]:
subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace = "<AML_WORKSPACE_NAME>"
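
If you'd rather not hard-code these values, one option is to read them from environment variables, falling back to the placeholders. The variable names below (`AZUREML_SUBSCRIPTION_ID` and so on) are hypothetical; use whatever names your team has agreed on.

```python
import os
from typing import Mapping, Optional, Tuple


def workspace_settings(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str, str]:
    """Read workspace coordinates from (hypothetically named) environment variables."""
    env = os.environ if env is None else env
    return (
        env.get("AZUREML_SUBSCRIPTION_ID", "<SUBSCRIPTION_ID>"),
        env.get("AZUREML_RESOURCE_GROUP", "<RESOURCE_GROUP>"),
        env.get("AZUREML_WORKSPACE_NAME", "<AML_WORKSPACE_NAME>"),
    )


subscription_id, resource_group, workspace = workspace_settings()
```

Alternatively, `MLClient.from_config()` can read the workspace details from a `config.json` file.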

In [None]:
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

# connect to the AzureML workspace
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)

my_path = "./data"

my_data = Data(
    path=my_path,
    type=AssetTypes.MLTABLE,
    description="The titanic dataset.",
    name="titanic-example",
)

ml_client.data.create_or_update(my_data)
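
Because data assets are versioned, you can either pin a specific version or ask for the latest one. The toy registry below is a hypothetical in-memory stand-in, not the Azure ML API, that shows the difference; its auto-incrementing version numbers are an assumption of the toy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyAssetRegistry:
    """Hypothetical in-memory stand-in for a versioned asset registry (not the Azure ML API)."""

    _versions: Dict[str, List[str]] = field(default_factory=dict)

    def create(self, name: str, path: str) -> str:
        """Register a new version of `name` and return its version string ("1", "2", ...)."""
        versions = self._versions.setdefault(name, [])
        versions.append(path)
        return str(len(versions))

    def get(self, name: str, version: Optional[str] = None) -> str:
        """Return the path for a pinned version, or for the latest version if none is given."""
        versions = self._versions[name]
        if version is None:
            return versions[-1]
        return versions[int(version) - 1]


registry = ToyAssetRegistry()
v1 = registry.create("titanic-example", "cloud/titanic/v1")
v2 = registry.create("titanic-example", "cloud/titanic/v2")
print(v1, v2)                                        # 1 2
print(registry.get("titanic-example"))               # cloud/titanic/v2
print(registry.get("titanic-example", version="1"))  # cloud/titanic/v1
```

Pinning a version keeps a job reproducible even after someone uploads newer data under the same name; asking for the latest is convenient for interactive exploration.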

## Access data asset in an interactive session

Now that your MLTable is stored in the cloud, you and your team members can access it by its friendly name in an interactive session (for example, a notebook).

In [None]:
import mltable
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# connect to the AzureML workspace
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)

# get the latest version of the data asset
data_asset = ml_client.data.get(name="titanic-example", label="latest")

# create a table
tbl = mltable.load(f"azureml:/{data_asset.id}")

# load into a pandas dataframe
df = tbl.to_pandas_dataframe()
df.head()
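
`mltable.load()` accepts an `azureml://` URI. The cell above builds one by prefixing `azureml:/` to `data_asset.id`, which works because that ID already begins with `/subscriptions/`. The sketch below shows that string manipulation with a hypothetical ID of the assumed shape, plus a hypothetical helper, `data_asset_uri`, that assembles the long-form data-asset URI from its parts following the format described in the `mltable` documentation; check that format against your SDK version.

```python
def data_asset_uri(subscription_id: str, resource_group: str, workspace: str, name: str, version: str) -> str:
    """Build a long-form azureml:// URI for one version of a data asset."""
    return (
        f"azureml://subscriptions/{subscription_id}/resourcegroups/{resource_group}"
        f"/workspaces/{workspace}/data/{name}/versions/{version}"
    )


# Hypothetical asset ID with the assumed ARM-style shape of `data_asset.id`.
asset_id = (
    "/subscriptions/0000/resourceGroups/my-rg/providers/Microsoft.MachineLearningServices"
    "/workspaces/my-ws/data/titanic-example/versions/1"
)

# Prefixing "azureml:/" to an ID that starts with "/" yields "azureml://subscriptions/...".
prefixed = f"azureml:/{asset_id}"
built = data_asset_uri("0000", "my-rg", "my-ws", "titanic-example", "1")
print(prefixed)
print(built)
```
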

## Access the data asset in a job

You can also pass your table to a job as an input:

In [None]:
from azure.ai.ml import MLClient, command, Input
from azure.ai.ml.entities import Environment
from azure.identity import DefaultAzureCredential

# connect to the AzureML workspace
ml_client = MLClient(
    DefaultAzureCredential(), subscription_id, resource_group, workspace
)

# get a pinned version of the data asset, so the job always uses the same data
data_asset = ml_client.data.get(name="titanic-example", version="1")

job = command(
    command="python train.py --input ${{inputs.titanic}}",
    inputs={"titanic": Input(type="mltable", path=data_asset.id)},
    compute="cpu-cluster",
    environment=Environment(
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
        conda_file="./job-env/conda_dependencies.yml",
    ),
    code="./src",
)

ml_client.jobs.create_or_update(job)
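
The job runs `train.py` from `./src`, which this notebook doesn't show. Below is a hypothetical sketch of its core: at run time the `${{inputs.titanic}}` placeholder is replaced with a path (or URI, depending on the input mode) that `mltable.load()` can open, so the script only needs to parse `--input` and pass it on. The `mltable` import sits inside `main()` so the argument parsing can be tried without `mltable` installed.

```python
# src/train.py (hypothetical contents; the notebook does not show this script)
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the --input argument that the job command fills in from the titanic input."""
    parser = argparse.ArgumentParser(description="Load the titanic MLTable input.")
    parser.add_argument("--input", required=True, help="MLTable location passed in by the job")
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> None:
    args = parse_args(argv)
    import mltable  # imported here so parse_args can be exercised without mltable installed

    tbl = mltable.load(args.input)
    df = tbl.to_pandas_dataframe()
    print(df.head())
```

In the actual script you would finish with the usual `if __name__ == "__main__": main()` guard so the job's command line triggers `main()`.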