## Scikit-Learn Hierarchical Clustering
### Using MALL_CUSTOMERS_VIEW from DWC. This view has 200 records

## Install the fedml_azure package

In [1]:
pip install fedml_azure --force-reinstall

Processing ./fedml_azure-1.0.0-py3-none-any.whl
Collecting hdbcli
  Using cached hdbcli-2.10.13-cp34-abi3-manylinux1_x86_64.whl (11.7 MB)
Installing collected packages: hdbcli, fedml-azure
  Attempting uninstall: hdbcli
    Found existing installation: hdbcli 2.10.13
    Uninstalling hdbcli-2.10.13:
      Successfully uninstalled hdbcli-2.10.13
  Attempting uninstall: fedml-azure
    Found existing installation: fedml-azure 1.0.0
    Uninstalling fedml-azure-1.0.0:
      Successfully uninstalled fedml-azure-1.0.0
Successfully installed fedml-azure-1.0.0 hdbcli-2.10.13
Note: you may need to restart the kernel to use updated packages.


## Import the libraries needed in this notebook

In [2]:
from fedml_azure import DwcAzureTrain

## Set up
### Creating a Training object and setting the workspace, compute target, and environment.

Before running the cell below, make sure you have an Azure Machine Learning workspace, and replace subscription_id, 
resource_group, and workspace_name with your own values.

The whl file for the fedml_azure library is passed through the pip_wheel_files key in environment_args; the cell below instead lists the package by name under pip_packages. 
To use scikit-learn, you must also pass its name to conda_packages.
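
If you would rather not paste workspace identifiers into the notebook, one option is to read them from environment variables and build the workspace_args dictionary from those. The sketch below is only an illustration: the environment variable names and the `load_workspace_args` helper are hypothetical and are not part of fedml_azure.

```python
import os

# Hypothetical environment variable names; pick whatever fits your setup.
ENV_KEYS = {
    "subscription_id": "AZURE_SUBSCRIPTION_ID",
    "resource_group": "AZURE_RESOURCE_GROUP",
    "workspace_name": "AZURE_ML_WORKSPACE",
}


def load_workspace_args(environ=os.environ):
    """Build the workspace_args dict, failing early if a value is missing."""
    missing = [var for var in ENV_KEYS.values() if not environ.get(var)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    return {key: environ[var] for key, var in ENV_KEYS.items()}
```

The returned dictionary has the same three keys as the workspace_args used in this notebook, so it could be passed in place of the hard-coded placeholders.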

In [3]:
# Create the training object. The constructor sets up the workspace,
# experiment, compute target, and environment.

training = DwcAzureTrain(
    workspace_args={"subscription_id": '<subscription_id>',
                    "resource_group": '<resource_group>',
                    "workspace_name": '<workspace_name>'},
    experiment_args={'name': 'test-2'},
    environment_type='CondaPackageEnvironment',
    environment_args={'name': 'test-env-hc',
                      'conda_packages': ['scikit-learn'],
                      'pip_packages': ['fedml_azure']},
    compute_type='AmlComputeCluster',
    compute_args={'vm_size': 'Standard_D12_v2',
                  'vm_priority': 'lowpriority',
                  'compute_name': 'cpu-clu-hc',
                  'min_nodes': 0,
                  'max_nodes': 1,
                  'idle_seconds_before_scaledown': 1700})


Getting existing Workspace
Creating Experiment
Creating Compute_target
Found compute target. just use it. cpu-clu-hc
Creating Environment


### Generating the run config

The run config packages the configuration specified above so that a job can be submitted for training.

Before running the following cell, you need a config.json file containing the values required to access DWC. Pass the path of this file as config_file_path in the cell below.
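
A malformed or missing config file typically only surfaces once the job is already being prepared, so it can be worth checking the file locally first. The following is a minimal sketch, assuming the file is a JSON object at the top level; `check_config_file` is a hypothetical helper, and it deliberately does not check for specific keys because the required keys are not listed here.

```python
import json
from pathlib import Path


def check_config_file(config_file_path):
    """Return the parsed config, or raise a readable error if it is unusable."""
    path = Path(config_file_path)
    if not path.is_file():
        raise FileNotFoundError(f"No config file at {path}")
    with path.open(encoding="utf-8") as handle:
        # json.load raises json.JSONDecodeError on malformed JSON.
        config = json.load(handle)
    if not isinstance(config, dict):
        # Assumption: the config is a JSON object, not a list or a scalar.
        raise ValueError("Expected a JSON object at the top level of the config file")
    return config
```

For example, `check_config_file('dwc_configs/config.json')` could be run before generating the run config.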

You should also have the view MALL_CUSTOMERS_VIEW created in your DWC. To obtain the data, refer to https://www.kaggle.com/roshansharma/mall-customers-clustering-analysis/data
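
The training script itself is not shown here. For orientation, the sketch below shows what hierarchical (agglomerative) clustering with scikit-learn looks like on 200 rows of synthetic data. Everything in it is an assumption made for illustration: the two synthetic feature columns stand in for numeric customer attributes, and the choice of four clusters and Ward linkage is not taken from the actual script.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Synthetic stand-in for two numeric customer features; the real view's
# columns are not shown in this notebook.
rng = np.random.default_rng(0)
centers = np.array([[20.0, 20.0], [20.0, 80.0], [80.0, 20.0], [80.0, 80.0]])
features = np.vstack(
    [center + rng.normal(scale=3.0, size=(50, 2)) for center in centers]
)  # 4 groups x 50 rows = 200 rows

# Ward linkage repeatedly merges the two clusters whose union gives the
# smallest increase in within-cluster variance.
model = AgglomerativeClustering(n_clusters=4, linkage="ward")
labels = model.fit_predict(features)

# Number of rows assigned to each cluster label.
print(np.bincount(labels))
```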

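For details on the parameters used in config_args (source_directory, script, arguments), see the ScriptRunConfig documentation: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.scriptrunconfig?view=azure-ml-py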

In [4]:
# Generate the run config
src = training.generate_run_config(
    config_file_path='dwc_configs/config.json',
    config_args={'source_directory': 'Scikit-Learn-Hierarchical-Clustering',
                 'script': 'train_script.py',
                 'arguments': ['--model_file_name', 'regression.pkl',
                               '--table_name', 'MALL_CUSTOMERS_VIEW',
                               '--table_size', '1']})

Generating script run config
Config file already exists in the script_directory Scikit-Learn-Hierarchical-Clustering
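
The arguments list in the cell above reaches train_script.py as ordinary command-line flags. A minimal sketch of how a script might read them with argparse follows. The `parse_args` helper is hypothetical and is not the actual train_script.py; in particular, what `--table_size` means is defined by the real script, so it is kept as a plain string here.

```python
import argparse


def parse_args(argv=None):
    """Parse the three flags passed through the run config's arguments list."""
    parser = argparse.ArgumentParser(description="Hypothetical training-script arguments")
    parser.add_argument("--model_file_name", type=str, required=True,
                        help="File name for the serialized model")
    parser.add_argument("--table_name", type=str, required=True,
                        help="Name of the DWC view or table to read")
    parser.add_argument("--table_size", type=str, default="1",
                        help="Interpreted by the training script; kept as a string here")
    return parser.parse_args(argv)


# Same flags as in the run config above, passed explicitly for illustration.
args = parse_args(["--model_file_name", "regression.pkl",
                   "--table_name", "MALL_CUSTOMERS_VIEW",
                   "--table_size", "1"])
print(args.model_file_name, args.table_name, args.table_size)
```

Passing `argv=None` makes argparse fall back to `sys.argv`, which is what happens when the script is launched as a job.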


### Submitting the job for training

In [5]:
#submitting the training run
run=training.submit_run(src)

Submitting training run
RunId: test-2_1633630168_0e428ff1
Web View: https://ml.azure.com/runs/test-2_1633630168_0e428ff1?wsid=/subscriptions/cb97564e-cea8-45a4-9c5c-a3357e8f7ee4/resourcegroups/Sample2_AzureML_Resource/workspaces/Sample2_AzureML_Worskpace&tid=42f7676c-f455-423c-82f6-dc2d99791af7

Streaming azureml-logs/20_image_build_log.txt

2021/10/07 18:05:23 Downloading source code...
2021/10/07 18:05:25 Finished downloading source code
2021/10/07 18:05:25 Creating Docker network: acb_default_network, driver: 'bridge'
2021/10/07 18:05:25 Successfully set up Docker network: acb_default_network
2021/10/07 18:05:25 Setting up Docker configuration...
2021/10/07 18:05:26 Successfully set up Docker configuration
2021/10/07 18:05:26 Logging in to registry: f29b6b92835b480fb06a79cd942c7451.azurecr.io
2021/10/07 18:05:26 Successfully logged into f29b6b92835b480fb06a79cd942c7451.azurecr.io
2021/10/07 18:05:26 Executing step ID: acb_step_0. Timeout(sec): 5400, Working directory: '', Network: '

## Register the model for deployment

In [6]:
model = training.register_model(run=run,
                                model_args={'model_name': 'sklearn_hcluster_model',
                                            'model_path': 'outputs/regression.pkl'},
                                resource_config_args={"cpu": 1, "memory_in_gb": 0.5},
                                is_sklearn_model=True)

print('Name:', model.name)
print('Version:', model.version)

Registering the model
Configuring parameters for sklearn model
Name: sklearn_hcluster_model
Version: 2
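
The model_path above points into the run's outputs folder, which Azure Machine Learning uploads to the run record, and its file name matches the `--model_file_name` argument passed to the training script. A sketch of how a training script might write the file is shown below. It assumes pickle serialization, and `save_model` is a hypothetical helper; the actual script may serialize the model differently.

```python
import os
import pickle


def save_model(model, model_file_name, output_dir="outputs"):
    """Serialize the model into output_dir and return the path written."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, model_file_name)
    with open(path, "wb") as handle:
        pickle.dump(model, handle)
    return path
```

With the defaults, `save_model(fitted_model, 'regression.pkl')` would write `outputs/regression.pkl`, the path used in model_args.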
