## Scikit-Learn PCA and Logistic Regression Pipeline
### Using BREASTCANCER_VIEW from DWC. This view has 569 records

## Install fedml_azure package

In [1]:
pip install fedml_azure-1.0.0-py3-none-any.whl --force-reinstall

Processing ./fedml_azure-1.0.0-py3-none-any.whl
Collecting hdbcli
  Using cached hdbcli-2.10.13-cp34-abi3-manylinux1_x86_64.whl (11.7 MB)
Installing collected packages: hdbcli, fedml-azure
  Attempting uninstall: hdbcli
    Found existing installation: hdbcli 2.10.13
    Uninstalling hdbcli-2.10.13:
      Successfully uninstalled hdbcli-2.10.13
  Attempting uninstall: fedml-azure
    Found existing installation: fedml-azure 1.0.0
    Uninstalling fedml-azure-1.0.0:
      Successfully uninstalled fedml-azure-1.0.0
Successfully installed fedml-azure-1.0.0 hdbcli-2.10.13
Note: you may need to restart the kernel to use updated packages.


## Import the libraries needed in this notebook

In [2]:
from fedml_azure import DwcAzureTrain

## Set up
### Creating a Training object and setting the workspace, compute target, and environment.

Before running the cell below, make sure you have an Azure Machine Learning workspace, and replace subscription_id, resource_group, and workspace_name with your own values.

The whl file for the fedml_azure library must be passed to the pip_wheel_files key in environment_args. To use scikit-learn, you must also pass its name to conda_packages.

In [3]:
# Create the training object. The constructor connects to the workspace and
# creates the experiment, compute target, and environment.
training = DwcAzureTrain(
    workspace_args={
        "subscription_id": "cb97564e-cea8-45a4-9c5c-a3357e8f7ee4",
        "resource_group": "Sample2_AzureML_Resource",
        "workspace_name": "Sample2_AzureML_Worskpace",
    },
    experiment_args={"name": "test-2"},
    environment_type="CondaPackageEnvironment",
    environment_args={
        "name": "test-env-2",
        "conda_packages": ["scikit-learn"],
        "pip_wheel_files": ["fedml_azure-1.0.0-py3-none-any.whl"],
    },
    compute_type="AmlComputeCluster",
    compute_args={
        "vm_size": "Standard_D12_v2",
        "vm_priority": "lowpriority",
        "compute_name": "cpu-clu-pcap",
        "min_nodes": 0,
        "max_nodes": 1,
        "idle_seconds_before_scaledown": 1700,
    },
)


Getting existing Workspace
Creating Experiment
Creating Compute_target
Found compute target. just use it. cpu-clu-pcap
Creating Environment


### Generating the run config

The run config packages the configuration specified above so that we can submit a job for training.

Before running the following cell, you should have a config.json file containing the values needed to access DWC. Pass the path of this file to config_file_path in the cell below.

You should also have the view BREASTCANCER_VIEW created in your DWC. To obtain the data, please refer to https://www.kaggle.com/uciml/breast-cancer-wisconsin-data

For background on script run configs, see the ScriptRunConfig documentation: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.scriptrunconfig?view=azure-ml-py
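
Before generating the run config, it can help to confirm that the config file exists and parses as JSON. The following is only a minimal sketch: the helper name `load_dwc_config` is hypothetical and not part of fedml_azure, and it deliberately does not check for specific keys, since the required keys are defined by the library rather than by this notebook.

```python
import json
from pathlib import Path


def load_dwc_config(config_file_path):
    """Return the parsed config, or raise a clear error if it is missing or invalid.

    Hypothetical helper for illustration; not part of fedml_azure.
    """
    path = Path(config_file_path)
    if not path.is_file():
        raise FileNotFoundError(f"DWC config file not found: {path}")
    with path.open(encoding="utf-8") as handle:
        # json.load raises json.JSONDecodeError if the file is malformed.
        config = json.load(handle)
    if not isinstance(config, dict):
        raise ValueError(
            f"Expected a JSON object in {path}, got {type(config).__name__}"
        )
    return config
```

Calling `load_dwc_config('dwc_configs/config.json')` before the next cell would surface a missing or malformed file locally instead of after the job has been submitted.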

In [4]:
# Generate the run config
src = training.generate_run_config(
    config_file_path="dwc_configs/config.json",
    config_args={
        "source_directory": "Scikit-Learn-PCAPipeline",
        "script": "pca_script.py",
        "arguments": [
            "--model_file_name", "regression.pkl",
            "--table_name", "BREASTCANCER_VIEW",
            "--n_components", "3",
        ],
    },
)

Generating script run config
Config file already exists in the script_directory Scikit-Learn-PCAPipeline
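
The `arguments` list in the cell above is passed to `pca_script.py` as command-line arguments. The script itself is not shown in this notebook, so the following is only a sketch of how it might parse them with `argparse`. The argument names come from the cell above; the types, the default, and the `parse_args` helper are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments listed in config_args['arguments'] (types are assumed)."""
    parser = argparse.ArgumentParser(
        description="Train a PCA + logistic regression pipeline"
    )
    parser.add_argument("--model_file_name", type=str, required=True,
                        help="File name for the saved model, e.g. regression.pkl")
    parser.add_argument("--table_name", type=str, required=True,
                        help="Name of the DWC view holding the training data")
    parser.add_argument("--n_components", type=int, default=3,
                        help="Number of PCA components to keep")
    return parser.parse_args(argv)


# Parse the same values that the run config passes to the script.
args = parse_args([
    "--model_file_name", "regression.pkl",
    "--table_name", "BREASTCANCER_VIEW",
    "--n_components", "3",
])
print(args.n_components)  # prints 3 (converted to int by argparse)
```

Note that every entry in `arguments` is a string, so a numeric option such as `--n_components` has to be converted inside the script, here via `type=int`.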


### Submitting the job for training

In [5]:
# Submit the training run
run = training.submit_run(src)

Submitting training run
RunId: test-2_1633631602_e6f40d7f
Web View: https://ml.azure.com/runs/test-2_1633631602_e6f40d7f?wsid=/subscriptions/cb97564e-cea8-45a4-9c5c-a3357e8f7ee4/resourcegroups/Sample2_AzureML_Resource/workspaces/Sample2_AzureML_Worskpace&tid=42f7676c-f455-423c-82f6-dc2d99791af7

Streaming azureml-logs/55_azureml-execution-tvmps_d5e0b89c818c57034db183db9d2f3ab8b86ea23b2750ef49f9bab0350a8e59be_p.txt

2021-10-07T18:35:14Z Successfully mounted a/an Blobfuse File System at /mnt/batch/tasks/shared/LS_root/jobs/sample2_azureml_worskpace/azureml/test-2_1633631602_e6f40d7f/mounts/workspaceblobstore
2021-10-07T18:35:14Z The vmsize standard_d12_v2 is not a GPU VM, skipping get GPU count by running nvidia-smi command.
2021-10-07T18:35:14Z Starting output-watcher...
2021-10-07T18:35:14Z IsDedicatedCompute == False, starting polling for Low-Pri Preemption
2021-10-07T18:35:15Z Executing 'Copy ACR Details file' on 10.0.0.5
2021-10-07T18:35:15Z Copy ACR Details file succeeded on 10.0.0
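
The logs above come from `pca_script.py` running on the compute cluster; the script itself is not reproduced in this notebook. As a rough local illustration of a PCA and logistic regression pipeline of the kind named in the title, the sketch below uses scikit-learn's bundled copy of the Wisconsin breast cancer dataset (also 569 records) as a stand-in for BREASTCANCER_VIEW. The scaling step, the train/test split, and every hyperparameter other than `n_components=3` are assumptions, not the author's settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data: scikit-learn's bundled dataset, not the DWC view.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scale the features, project onto 3 principal components, then classify.
pipeline = Pipeline(steps=[
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=3)),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_train, y_train)

accuracy = pipeline.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Wrapping the steps in a single `Pipeline` means the fitted scaler and PCA projection are saved together with the classifier, so the registered model can accept raw feature rows at inference time.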

## Register the model for deployment

In [6]:
# Register the trained model from the run's outputs
model = training.register_model(
    run=run,
    model_args={
        "model_name": "sklearn_pcapipeline_model",
        "model_path": "outputs/regression.pkl",
    },
    resource_config_args={"cpu": 1, "memory_in_gb": 0.5},
    is_sklearn_model=True,
)
print("Name:", model.name)
print("Version:", model.version)

Registering the model
Configuring parameters for sklearn model
Name: sklearn_pcapipeline_model
Version: 3
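
The `model_path` above points to `outputs/regression.pkl`. Azure Machine Learning uploads files that a run writes to its `./outputs` directory, so the training script needs to save the model there under the name given by `--model_file_name`. Below is a minimal sketch of that save step, assuming the script serializes with `pickle` (it may equally use `joblib`). The `save_model` helper and the placeholder object are hypothetical, and the demo writes to a temporary directory rather than a real run folder.

```python
import os
import pickle
import tempfile


def save_model(model, model_file_name, output_dir="outputs"):
    """Pickle `model` to <output_dir>/<model_file_name> and return the path."""
    os.makedirs(output_dir, exist_ok=True)
    model_path = os.path.join(output_dir, model_file_name)
    with open(model_path, "wb") as handle:
        pickle.dump(model, handle)
    return model_path


# Placeholder object standing in for a fitted scikit-learn pipeline.
placeholder_model = {"steps": ["pca", "logistic_regression"]}

# Use a scratch directory so the demo leaves no files behind.
with tempfile.TemporaryDirectory() as workdir:
    saved_path = save_model(
        placeholder_model, "regression.pkl",
        output_dir=os.path.join(workdir, "outputs"),
    )
    with open(saved_path, "rb") as handle:
        restored = pickle.load(handle)

print(restored == placeholder_model)  # prints True
```

Inside the actual training script, the default `output_dir="outputs"` would be used so that the file lands where `register_model` expects it.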
