## Scikit-Learn PCA
### Using BREASTCANCER_VIEW from SAP Data Warehouse Cloud (DWC). This view has 569 records.

## Install the fedml-azure package

In [1]:
pip install fedml-azure --force-reinstall

Processing ./fedml_azure-1.0.0-py3-none-any.whl
Collecting hdbcli
  Using cached hdbcli-2.10.13-cp34-abi3-manylinux1_x86_64.whl (11.7 MB)
Installing collected packages: hdbcli, fedml-azure
  Attempting uninstall: hdbcli
    Found existing installation: hdbcli 2.10.13
    Uninstalling hdbcli-2.10.13:
      Successfully uninstalled hdbcli-2.10.13
  Attempting uninstall: fedml-azure
    Found existing installation: fedml-azure 1.0.0
    Uninstalling fedml-azure-1.0.0:
      Successfully uninstalled fedml-azure-1.0.0
Successfully installed fedml-azure-1.0.0 hdbcli-2.10.13
Note: you may need to restart the kernel to use updated packages.


## Import the libraries needed in this notebook

In [2]:
from fedml_azure import DwcAzureTrain

## Set up
### Creating a Training object and setting the workspace, compute target, and environment.

Before running the cell below, make sure you have an Azure Machine Learning workspace, and replace the subscription_id, resource_group, and workspace_name placeholders with your own values.

The fedml-azure package must be listed under the pip_packages key of environment_args. To use scikit-learn, also list it under conda_packages.

In [3]:
# Create the training object. The constructor retrieves the workspace and sets up
# the experiment, compute target, and environment.
training = DwcAzureTrain(
    workspace_args={
        'subscription_id': '<subscription_id>',
        'resource_group': '<resource_group>',
        'workspace_name': '<workspace_name>',
    },
    environment_type='CondaPackageEnvironment',
    environment_args={
        'name': 'test-env-pca',
        'conda_packages': ['scikit-learn'],
        'pip_packages': ['fedml-azure'],
    },
    experiment_args={'name': 'test-2'},
    compute_type='AmlComputeCluster',
    compute_args={
        'vm_size': 'Standard_D12_v2',
        'vm_priority': 'lowpriority',
        'compute_name': 'cpu-clu-pca',
        'min_nodes': 0,
        'max_nodes': 1,
        'idle_seconds_before_scaledown': 1700,
    },
)

Getting existing Workspace
Creating Experiment
Creating Compute_target
Found compute target. just use it. cpu-clu-pca
Creating Environment


### Update the experiment for this training job. This step is optional.

In [4]:
# Optional: switch this training job to a different experiment.
training.update_experiment(experiment_args={'name': 'test-1'})

Updating experiment
Creating Experiment


### Next, generate the run config. It packages the configuration specified above so that a job can be submitted for training.

Before running the following cell, make sure you have a config.json file containing the values needed to access DWC, and pass its path as config_file_path in the cell below.

You should also have the view BREASTCANCER_VIEW created in your DWC. To obtain the data, see https://www.kaggle.com/uciml/breast-cancer-wisconsin-data

For background on the run configuration, see the ScriptRunConfig documentation: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.scriptrunconfig?view=azure-ml-py

In [5]:
# Generate the run config for the training script.
src = training.generate_run_config(
    config_file_path='dwc_configs/config.json',
    config_args={
        'source_directory': 'Scikit-Learn-Dimensionality-Reduction',
        'script': 'pca_script.py',
        'arguments': [
            '--model_file_name', 'regression.pkl',
            '--table_name', 'BREASTCANCER_VIEW',
            '--num_components', '3',
        ],
    },
)

Generating script run config
Config file already exists in the script_directory Scikit-Learn-Dimensionality-Reduction
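
The `arguments` list above is passed to `pca_script.py` as command-line arguments. The script itself is not shown in this notebook, so the following is only a minimal sketch of how such a script might read them, assuming it uses `argparse`. The function name `parse_args`, the argument types, and the default for `--num_components` are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments that the run config passes to the training script."""
    parser = argparse.ArgumentParser(description="PCA training script (sketch)")
    # File name under which the fitted model would be saved.
    parser.add_argument("--model_file_name", type=str, required=True)
    # Name of the DWC view or table to read the training data from.
    parser.add_argument("--table_name", type=str, required=True)
    # Number of principal components to keep (the default is an assumption).
    parser.add_argument("--num_components", type=int, default=2)
    return parser.parse_args(argv)


# Same values as in the 'arguments' list of the run config above.
args = parse_args([
    "--model_file_name", "regression.pkl",
    "--table_name", "BREASTCANCER_VIEW",
    "--num_components", "3",
])
print(args.model_file_name, args.table_name, args.num_components)
```

Because the arguments arrive as strings, converting `--num_components` with `type=int` lets the value be handed straight to scikit-learn.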


### Submitting the job for training

In [None]:
#submitting the training run
run=training.submit_run(src)
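
To get a feel for the kind of computation the submitted job performs, the reduction to three components can be tried locally. This is a rough sketch, not the contents of `pca_script.py`: it assumes scikit-learn's bundled copy of the Wisconsin breast cancer data (also 569 records) as a stand-in for BREASTCANCER_VIEW, and the standardization step is an added assumption.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in for BREASTCANCER_VIEW: the copy of the data bundled with scikit-learn.
# The columns of the actual DWC view may differ.
X, _ = load_breast_cancer(return_X_y=True)

# Assumption: standardize the features so that large-scale columns
# do not dominate the principal components.
X_scaled = StandardScaler().fit_transform(X)

# Keep three components, matching '--num_components 3' in the run config.
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)
print(pca.explained_variance_ratio_)
```

The `explained_variance_ratio_` attribute shows how much of the total variance each retained component captures, which can help when choosing the number of components.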

## Register the model for deployment

In [7]:
# Register the trained model so that it can be deployed.
model = training.register_model(
    run=run,
    model_args={
        'model_name': 'sklearn_pca_model',
        'model_path': 'outputs/regression.pkl',
    },
    resource_config_args={'cpu': 1, 'memory_in_gb': 0.5},
    is_sklearn_model=True,
)

print('Name:', model.name)
print('Version:', model.version)

Registering the model
Configuring parameters for sklearn model
Name: sklearn_pca_model
Version: 8
