## Code to mount and enumerate files (images) on blob storage

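We want to mount the files rather than download them: a mount loads files on demand, while a download copies every file to the compute target first. See the documentation on mount vs. download: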

https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-with-datasets#mount-vs-download

In this example I use the training images from another exercise: folders of images of camping equipment, one folder per category.

This SAS token grants read/list access (`sp=rl`) to the training data, which I want to enumerate using Azure Machine Learning (AMLS) file datasets.

```code
https://davewdemoblobs.blob.core.windows.net/gear-images?sv=2019-12-12&st=2020-03-10T16%3A15%3A00Z&se=2030-03-11T16%3A15%3A00Z&sr=c&sp=rl&sig=hjISASpvLRY%2F77wvJ04IQmz00dObhQXp%2FP3wYT9y8%2BY%3D
?sv=2019-12-12&st=2020-03-10T16%3A15%3A00Z&se=2030-03-11T16%3A15%3A00Z&sr=c&sp=rl&sig=hjISASpvLRY%2F77wvJ04IQmz00dObhQXp%2FP3wYT9y8%2BY%3D

```
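The container URL above already carries the three values the datastore registration further down needs: the account name, the container name, and the SAS token. A minimal sketch of splitting such a URL with only the standard library; the helper name `split_sas_url` and the example URL are hypothetical and not part of the AzureML SDK:

```python
from urllib.parse import urlsplit, parse_qs


def split_sas_url(sas_url):
    """Split a blob container SAS URL into (account, container, token, params).

    Hypothetical helper for illustration; assumes the usual
    https://<account>.blob.core.windows.net/<container>?<token> layout.
    """
    parts = urlsplit(sas_url)
    account_name = parts.netloc.split(".")[0]
    container_name = parts.path.strip("/").split("/")[0]
    sas_token = parts.query  # still percent-encoded, without the leading '?'
    # Decoded query parameters, one value per key
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return account_name, container_name, sas_token, params


# Made-up URL with a placeholder signature
example_url = (
    "https://exampleaccount.blob.core.windows.net/example-container"
    "?sv=2019-12-12&se=2030-03-11T16%3A15%3A00Z&sr=c&sp=rl&sig=PLACEHOLDER"
)
account, container, token, params = split_sas_url(example_url)
print(account, container)
print(params["sp"], params["se"])
```

In the decoded parameters, `sp` lists the granted permissions (`r` for read, `l` for list) and `se` is the expiry time, so both can be checked before registering the datastore.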



In [1]:
from azureml.core import Workspace, Dataset, Datastore
from azureml.data import OutputFileDatasetConfig
from azureml.data.datapath import DataPath

## change these as needed
subscription_id = '52061d21-01dd-4f9e-aca9-60fff4d67ee2'
resource_group = 'MLOpsWorkshop'
workspace_name = 'mlops'

workspace = Workspace(subscription_id, resource_group, workspace_name)

In [3]:
## build a datastore that points to the storage acct using the SAS token
# https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore(class)?view=azure-ml-py#register-azure-blob-container-workspace--datastore-name--container-name--account-name--sas-token-none--account-key-none--protocol-none--endpoint-none--overwrite-false--create-if-not-exists-false--skip-validation-false--blob-cache-timeout-none--grant-workspace-access-false--subscription-id-none--resource-group-none-

datastore_name='training_files' # name under which the datastore is registered in the workspace
account_name='davewdemoblobs'
sas_token="sv=2019-12-12&st=2020-03-10T16%3A15%3A00Z&se=2030-03-11T16%3A15%3A00Z&sr=c&sp=rl&sig=hjISASpvLRY%2F77wvJ04IQmz00dObhQXp%2FP3wYT9y8%2BY%3D"

blob_datastore = Datastore.register_azure_blob_container(
    workspace=workspace, 
    datastore_name=datastore_name, 
    container_name='gear-images',
    account_name=account_name,
    sas_token=sas_token)

In [5]:
datastores = workspace.datastores
for name, datastore in datastores.items():
    print(name, datastore.datastore_type)

training_files AzureBlob
gear_images2 AzureBlob
gear_images AzureBlob
davewdemoblobs AzureBlob
azureml_globaldatasets AzureBlob
workspacefilestore AzureFile
workspaceblobstore AzureBlob


In [6]:
datastore_paths = [(blob_datastore, 'gear_images')]

In [8]:
datastore_paths

[({
    "name": "training_files",
    "container_name": "gear-images",
    "account_name": "davewdemoblobs",
    "protocol": "https",
    "endpoint": "core.windows.net"
  },
  'gear_images')]

In [9]:
ds = Dataset.File.from_files(path=datastore_paths)

In [10]:
ds

{
  "source": [
    "('training_files', 'gear_images')"
  ],
  "definition": [
    "GetDatastoreFiles"
  ]
}

In [15]:
# now register the ds for future use
ds.register(
    workspace=workspace,
    name = 'training_files_gear_images',
    description = "for CNN training",
    create_new_version=True)

{
  "source": [
    "('training_files', 'gear_images')"
  ],
  "definition": [
    "GetDatastoreFiles"
  ],
  "registration": {
    "id": "3f44d614-6d93-4d65-8da4-7c6659a7967c",
    "name": "training_files_gear_images",
    "version": 2,
    "description": "for CNN training",
    "workspace": "Workspace.create(name='mlops', subscription_id='52061d21-01dd-4f9e-aca9-60fff4d67ee2', resource_group='MLOpsWorkshop')"
  }
}

In [16]:
workspace.datasets

{'training_files_gear_images': DatasetRegistration(id='3f44d614-6d93-4d65-8da4-7c6659a7967c', name='training_files_gear_images', version=2, description='for CNN training', tags={}), 'gear-images-axes': DatasetRegistration(id='2e5f29bc-3ff1-417d-aeff-48ae125ce32c', name='gear-images-axes', version=1, description='', tags={}), 'gear-images': DatasetRegistration(id='5f14aeb3-6cde-4a0b-a11f-da33d2e44b28', name='gear-images', version=1, description='', tags={}), 'testds': DatasetRegistration(id='ebfc0767-b85d-4e0e-8f0b-cbbc7eb2f031', name='testds', version=1, description='', tags={}), 'customer488': DatasetRegistration(id='893f1e93-1df4-4129-a2ce-4ef279488342', name='customer488', version=1, description='', tags={}), 'synovos-curated488': DatasetRegistration(id='c4c98e2c-ea72-46df-964f-a27e2786c105', name='synovos-curated488', version=1, description='', tags={}), 'synovos488class-tabular': DatasetRegistration(id='3b7e5398-963d-4840-882d-665159b0293e', name='synovos488class-tabular', version

In [17]:
# when I want to reference these in later training runs...
image_ds = workspace.datasets['training_files_gear_images']

In [18]:
image_ds

{
  "source": [
    "('training_files', 'gear_images')"
  ],
  "definition": [
    "GetDatastoreFiles"
  ],
  "registration": {
    "id": "3f44d614-6d93-4d65-8da4-7c6659a7967c",
    "name": "training_files_gear_images",
    "version": 2,
    "description": "for CNN training",
    "workspace": "Workspace.create(name='mlops', subscription_id='52061d21-01dd-4f9e-aca9-60fff4d67ee2', resource_group='MLOpsWorkshop')"
  }
}

In [19]:
import tempfile
mnt_path = tempfile.mkdtemp()
mnt_path

'/tmp/tmp0xi3jonv'

In [20]:
# mount() only returns a mount context; nothing is mounted until start() is called
mounted_imgs = image_ds.mount(mnt_path)

In [21]:
mounted_imgs.start()

In [22]:
!ls $mnt_path

axes   carabiners  gloves	      harnesses  insulated_jackets  rope
boots  crampons    hardshell_jackets  helmets	 pulleys	    tents
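
Because the mount point behaves like an ordinary directory, the standard library is enough to enumerate it. A minimal sketch, assuming one subfolder per category as in the listing above; the helper name `count_images` and the set of extensions are assumptions, and the demo runs against a throwaway directory rather than the real mount:

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}  # assumed set of image extensions


def count_images(root):
    """Return {folder_name: image_count} for each immediate subfolder of root."""
    counts = {}
    for folder in sorted(Path(root).iterdir()):
        if folder.is_dir():
            counts[folder.name] = sum(
                1
                for f in folder.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


# Stand-in for the mount point: a temporary tree with two category folders
demo_root = Path(tempfile.mkdtemp())
demo_files = {"axes": ["a.jpeg"], "rope": ["b.jpeg", "c.jpeg", "notes.txt"]}
for category, names in demo_files.items():
    (demo_root / category).mkdir()
    for name in names:
        (demo_root / category / name).touch()

demo_counts = count_images(demo_root)
print(demo_counts)
```

With the real data, the same function would be called as `count_images(mnt_path)` once `mounted_imgs.start()` has run.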


In [23]:
import os  # not imported in any earlier cell

os.listdir(os.path.join(mnt_path, 'carabiners'))

['10007284x1065726_zm.jpeg',
 '10007285x1065726_zm.jpeg',
 '10011087x1013367_zm.jpeg',
 '10019159x1036971_zm.jpeg',
 '10019160x1036971_zm.jpeg',
 '10019161x1036971_zm.jpeg',
 '10019208_zm.jpeg',
 '10019209_zm.jpeg',
 '10019210x1014941_zm.jpeg',
 '10019223_zm.jpeg',
 '10019224_zm.jpeg',
 '10019231_zm.jpeg',
 '10019232_zm.jpeg',
 '100205.jpeg',
 '100207.jpeg',
 '100208.jpeg',
 '100231.jpeg',
 '10044113x1010913_zm.jpeg',
 '10044113x1012338_zm.jpeg',
 '10080850x1149501_zm.jpeg',
 '10080851x1149501_zm.jpeg',
 '10080854x1046934_zm.jpeg',
 '10085897x1010938_zm.jpeg',
 '10085899x1012905_zm.jpeg',
 '10085902x1011898_zm.jpeg',
 '10085903x1012549_zm.jpeg',
 '10090684x1010913_zm.jpeg',
 '10094158x1024698_zm.jpeg',
 '10094159x1024698_zm.jpeg',
 '10094191x1013039_zm.jpeg',
 '10094266x1012549_zm.jpeg',
 '10110498x1050047_zm.jpeg',
 '10110581x1050047_zm.jpeg',
 '10110590x1010913_zm.jpeg',
 '10110590x1010938_zm.jpeg',
 '10110590x1012204_zm.jpeg',
 '10110590x1012549_zm.jpeg',
 '10110590x1013005_zm.jpeg'