Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Verify the NFS Setup in AMLArc

## Prerequisites

* [Setup Azure Arc-enabled Machine Learning Training and Inferencing on AKS on Azure Stack HCI](https://github.com/Azure/AML-Kubernetes/tree/master/docs/AKS-HCI/AML-ARC-Compute.md)

* [Setup NFS Server on Azure Stack HCI and Use your Data and run managed Machine Learning Experiments On-Premises](https://github.com/Azure/AML-Kubernetes/tree/master/docs/AKS-HCI/Train-AzureArc.md)

* (Optional) Upload some training data to the NFS server for verification

## Initialize AzureML workspace

Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`. 

If you have not already done so, open the `config.json` file and fill in your workspace information.
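For reference, `Workspace.from_config()` reads three keys from `config.json`: `subscription_id`, `resource_group`, and `workspace_name`. The sketch below writes such a file using only the standard library. The helper name `write_workspace_config` is hypothetical and not part of the SDK; you can equally edit the file by hand or download it from the Azure portal.

```python
import json
from pathlib import Path


def write_workspace_config(path, subscription_id, resource_group, workspace_name):
    """Write the three fields read by Workspace.from_config() (hypothetical helper)."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    # Serialize as indented JSON so the file stays easy to edit by hand.
    Path(path).write_text(json.dumps(config, indent=4))
    return config
```

A call such as `write_workspace_config("config.json", "<SUBSCRIPTION_ID>", "<RESOURCE_GROUP>", "<WORKSPACE_NAME>")` would create the file next to the notebook; the placeholder values must be replaced with your own.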

In [None]:
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget
from azureml.exceptions import ComputeTargetException

ws = Workspace.from_config()
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep='\n')

## Set up compute target

Find the attach name of the Arc-enabled AKS-HCI cluster in your AzureML workspace.

`attach_name` is the name under which you attached your AKS-HCI cluster in [this step](https://github.com/Azure/AML-Kubernetes/blob/master/docs/AKS-HCI/AML-ARC-Compute.md#attach-your-azure-arc-enabled-cluster-to-your-azure-machine-learning-workspace-as-a-compute-target).

In [None]:
from azureml.core.compute import KubernetesCompute

attach_name = "<NAME_OF_AML_ATTACHED_COMPUTE_OF_YOUR_AKS-HCI_CLUSTER>"
arcK_target = KubernetesCompute(ws, attach_name)

print(f"compute target id in endpoint yaml: azureml:{arcK_target.name}, instance type name in deployment yaml: {arcK_target.default_instance_type}")

## Configure the training job and submit

This experiment lists the contents of the NFS mount point on the training pods.

### Create an experiment

In [None]:
from azureml.core import Experiment

experiment_name = 'nfs-demo'

exp = Experiment(workspace=ws, name=experiment_name)

### Create an environment

In [None]:
# customized environment

from azureml.core.environment import Environment
from azureml.core.conda_dependencies import CondaDependencies
# to install required packages
env = Environment('tutorial-env')
cd = CondaDependencies.create(pip_packages=['azureml-dataset-runtime[pandas,fuse]', 'azureml-defaults'], conda_packages = ['scikit-learn==0.22.1'])

env.python.conda_dependencies = cd

### Configure the training job

`<MountPathOnTrainingPod>` is the same as the mountPath defined in mount-config.yaml.
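Because the path is resolved inside a Linux pod, a quick local check of the string can catch simple mistakes before a job is submitted. The sketch below assumes the mount path should be absolute and use `/` as the separator. The helper `check_mount_path` and the sample path `/mnt/nfs/data` are hypothetical, and the check does not confirm that the path matches your config map.

```python
from pathlib import PurePosixPath


def check_mount_path(path):
    """Return a list of problems found in a pod mount path string (empty if none)."""
    problems = []
    # Backslashes suggest a Windows-style path, which the pod will not resolve.
    if "\\" in path:
        problems.append("use '/' as the path separator, not '\\'")
    # Assumption: the mount path is absolute, so it should start with '/'.
    if not PurePosixPath(path).is_absolute():
        problems.append("path should be absolute, starting with '/'")
    return problems


print(check_mount_path("/mnt/nfs/data"))  # hypothetical, well-formed path
print(check_mount_path("mnt\\nfs"))       # relative path with a backslash
```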

In [None]:
from azureml.core import ScriptRunConfig

nfs_folder = "<MountPathOnTrainingPod>"  # mountPath where the training data is stored (use / as the path separator)

args = ['--nfs-folder', nfs_folder]
script_folder =  "nfs_script"
src = ScriptRunConfig(source_directory=script_folder,
                      script='test.py', 
                      arguments=args,
                      compute_target=arcK_target,
                      environment=env)
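The `test.py` script in the `nfs_script` folder is not reproduced in this notebook. As a rough sketch of what such a script could look like, assuming it only needs to parse `--nfs-folder` and print at most 1000 entries, consider the following. The names `list_nfs_contents` and `MAX_ENTRIES` and the default of `.` are illustrative; the script in the repository may differ.

```python
import argparse
import os

MAX_ENTRIES = 1000  # assumed cap on the number of entries printed


def list_nfs_contents(folder, limit=MAX_ENTRIES):
    """Return up to `limit` entry names found directly under `folder`, sorted."""
    if not os.path.isdir(folder):
        raise FileNotFoundError(f"NFS mount path not found: {folder}")
    return sorted(os.listdir(folder))[:limit]


def main(argv=None):
    parser = argparse.ArgumentParser(description="List the contents of an NFS mount")
    # Defaults to the current directory so the script also runs without arguments.
    parser.add_argument("--nfs-folder", default=".", help="mount path on the training pod")
    args = parser.parse_args(argv)
    for name in list_nfs_contents(args.nfs_folder):
        print(name)  # printed names end up in the job's driver log


if __name__ == "__main__":
    main()
```

Raising `FileNotFoundError` for a missing folder makes a wrong mount path fail the job with a clear message instead of printing an empty listing.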

### Submit the job

Run your experiment by submitting your `ScriptRunConfig` object. The `submit` call is asynchronous and returns immediately; `wait_for_completion` then blocks until the run finishes.

In [None]:
run = exp.submit(config=src)
run.wait_for_completion(show_output=True)  # specify True for a verbose log

### Verify the job

Go to Azure Machine Learning studio to verify the job status.

* If the job succeeded, the driver log lists the contents of the NFS mount path on the training pods (up to 1000 entries).
* If the job failed, check the error message on the experiment page. The example below shows the error you get when you give a **wrong** NFS mount path. Confirm the NFS mount path against your config map in [mount-config.yaml](https://github.com/Azure/AML-Kubernetes/blob/master/examples/train-using-nfs/amlarc-nfs-setup/mount-config.yaml). Once you see the error message, you can cancel the NFS verification experiment to avoid further retries.

![Example error for a wrong NFS mount path](images/verify-nfs-training.png)