## Azure ML Stable Diffusion Experiment

To use this notebook, download the `config.json` file from your Azure ML workspace and place it in this folder. This lets us obtain the workspace reference right away.
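
A quick sanity check before running the cells can catch a missing or incomplete file early. The sketch below assumes the downloaded file carries the usual three keys (`subscription_id`, `resource_group`, `workspace_name`). The helper name `check_workspace_config` is hypothetical and is not part of the Azure ML SDK.

```python
import json
from pathlib import Path

# Keys assumed to be present in the config.json downloaded from the workspace.
EXPECTED_KEYS = ("subscription_id", "resource_group", "workspace_name")

def check_workspace_config(path="config.json"):
    """Return the expected keys that are missing from the config file.

    Hypothetical helper for illustration; not part of the Azure ML SDK.
    """
    config_path = Path(path)
    if not config_path.is_file():
        raise FileNotFoundError(f"{config_path} not found; download it from the workspace first")
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    return [key for key in EXPECTED_KEYS if key not in config]
```

Calling `check_workspace_config()` from this folder returns an empty list when all three keys are present.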

First, we point to the dataset in the local directory.

In [1]:
# Change this to wherever the art data is stored, either locally or in the Azure ML notebooks directory
dataset = r'Ukiyo_e'

Next, we strip the split keywords (`train`, `test`, `valid`) from the image filenames.

In [3]:
import os
import re
# List of words to remove from filenames
words_to_remove = ["train", "test", "valid"]
# Case-insensitive pattern that matches any of the words above
pattern = re.compile("|".join(map(re.escape, words_to_remove)), re.IGNORECASE)

# Function to rename files
def rename_files(directory):
    for foldername, subfolders, filenames in os.walk(directory):
        for filename in filenames:
            # Strip every occurrence of the listed words from the filename
            new_filename = pattern.sub('', filename)
            
            # Check if the filename has changed
            if new_filename != filename:
                old_path = os.path.join(foldername, filename)
                new_path = os.path.join(foldername, new_filename)
                # Rename the file
                os.rename(old_path, new_path)
                print(f'Renamed: {filename} -> {new_filename}')

# Call the function to rename files
rename_files(dataset)
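
Stripping the keywords can make two different files map to the same name (for example, `train_1.jpg` and `test_1.jpg` both become `_1.jpg`), in which case `os.rename` either overwrites a file or raises an error, depending on the platform. The following is a minimal sketch of a dry-run check that could be run before renaming. The helper name `find_rename_collisions` is hypothetical, and the pattern simply mirrors the one used in the renaming cell.

```python
import os
import re
from collections import defaultdict

# Same case-insensitive keywords as the renaming cell.
SPLIT_WORDS = re.compile(r'(train|test|valid)', re.IGNORECASE)

def find_rename_collisions(directory):
    """Map each colliding target path to the filenames that would share it.

    Hypothetical helper for illustration; it only reads the directory tree
    and never renames anything.
    """
    targets = defaultdict(list)
    for foldername, _subfolders, filenames in os.walk(directory):
        for filename in filenames:
            new_filename = SPLIT_WORDS.sub('', filename)
            targets[os.path.join(foldername, new_filename)].append(filename)
    # Keep only targets that more than one source file would map to
    return {path: sorted(names) for path, names in targets.items() if len(names) > 1}
```

If `find_rename_collisions(dataset)` returns an empty dictionary, no two files in the same folder would end up with the same name.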

Next we get the Azure ML workspace. 

In [2]:
# Remember to download the config file from the Azure Portal for your workspace
from azureml.core import Workspace

try:
    ws = Workspace.from_config()
    print(ws.name, ws.location, ws.resource_group, ws.location, sep='\t')
    print('Library configuration succeeded')
except Exception as e:
    # Report the underlying error instead of silently swallowing it
    print(f'Workspace not found: {e}')

stable-diffusion	southeastasia	Azure-ML-Workshop	southeastasia
Library configuration succeeded


Now we get the compute cluster. If the cluster does not exist, we create it programmatically.

In [3]:
# If you have already created a cluster, change cluster_name = "AzMlCluster" to the name of the cluster you created
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your CPU cluster
cluster_name = "AzMlCluster"

# Reuse the cluster if it already exists; otherwise create it
try:
    cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D3_V2',
                                                           vm_priority='lowpriority',
                                                           min_nodes=1,
                                                           max_nodes=4)
    cluster = ComputeTarget.create(ws, cluster_name, compute_config)

cluster.wait_for_completion(show_output=True)

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


Now we get the default datastore in the workspace.

In [4]:
from azureml.core import Dataset
from azureml.data.datapath import DataPath
ds = ws.get_default_datastore()

Finally, we upload the image dataset to the Azure ML workspace's default datastore:

In [6]:
Dataset.File.upload_directory(src_dir=dataset, target=DataPath(ds,"ukiyo_e"), overwrite = True, show_progress=True)

Validating arguments.
Arguments validated.
'overwrite' is set to True. Any file already present in the target will be overwritten.
Uploading files from 'C:/Marcus/Important Docs/MLSA/WikiArt/Ukiyo_e' to 'ukiyo_e'
Creating new dataset


{
  "source": [
    "('workspaceblobstore', '/ukiyo_e')"
  ],
  "definition": [
    "GetDatastoreFiles"
  ]
}
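
To cross-check the upload, it can help to know how many files the local folder holds and of which types. The sketch below uses only the standard library. The helper name `count_files_by_extension` is hypothetical, and comparing its totals against the datastore contents is left as a manual step.

```python
import os
from collections import Counter

def count_files_by_extension(directory):
    """Count the files under a directory, grouped by lowercase extension.

    Hypothetical helper for illustration; it walks subfolders as well.
    """
    counts = Counter()
    for _foldername, _subfolders, filenames in os.walk(directory):
        for filename in filenames:
            extension = os.path.splitext(filename)[1].lower()
            counts[extension] += 1
    return counts
```

For example, `sum(count_files_by_extension(dataset).values())` gives the total number of local files that the upload should have transferred.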