## Scikit-Learn Hyperparameter Tuning
### Using local data (produced by the Data Preprocessor example)

## Install fedml-azure package

In [1]:
%pip install fedml-azure --force-reinstall

Processing ./fedml_azure-1.0.0-py3-none-any.whl
Collecting hdbcli
  Using cached hdbcli-2.10.13-cp34-abi3-manylinux1_x86_64.whl (11.7 MB)
Installing collected packages: hdbcli, fedml-azure
  Attempting uninstall: hdbcli
    Found existing installation: hdbcli 2.10.13
    Uninstalling hdbcli-2.10.13:
      Successfully uninstalled hdbcli-2.10.13
  Attempting uninstall: fedml-azure
    Found existing installation: fedml-azure 1.0.0
    Uninstalling fedml-azure-1.0.0:
      Successfully uninstalled fedml-azure-1.0.0
Successfully installed fedml-azure-1.0.0 hdbcli-2.10.13
Note: you may need to restart the kernel to use updated packages.


## Import the libraries needed in this notebook

In [2]:
from fedml_azure import DwcAzureTrain

## Set up
### Creating a training object and setting up the workspace, compute target, and environment

Before running the cell below, make sure you have an Azure Machine Learning workspace, and replace the subscription_id, resource_group, and workspace_name values with your own.
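If you prefer not to paste subscription details into the notebook, you could read them from environment variables instead. The sketch below is a hypothetical helper; the variable names `AZURE_SUBSCRIPTION_ID`, `AZURE_RESOURCE_GROUP`, and `AML_WORKSPACE_NAME` are assumptions that you would set yourself. It returns a dictionary with the same keys as the `workspace_args` passed to `DwcAzureTrain` in the cell below.

```python
import os

# Hypothetical environment variable names (assumption): set them before starting Jupyter.
ENV_KEYS = {
    "subscription_id": "AZURE_SUBSCRIPTION_ID",
    "resource_group": "AZURE_RESOURCE_GROUP",
    "workspace_name": "AML_WORKSPACE_NAME",
}


def workspace_args_from_env(env=os.environ):
    """Build a workspace_args dict from environment variables, failing loudly if any are unset."""
    missing = [var for var in ENV_KEYS.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in ENV_KEYS.items()}
```

You would then pass `workspace_args=workspace_args_from_env()` to the constructor instead of the literal dictionary.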

The fedml-azure package must be listed under the pip_packages key of environment_args. To use scikit-learn, you must also list it under conda_packages.
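To catch a misconfigured environment before submitting a job, you could check environment_args against the two requirements above. This is a hypothetical helper, not part of fedml-azure; it ignores version pins such as `scikit-learn==1.0.2` when matching package names.

```python
import re


def _package_name(spec):
    """Strip version pins and extras, e.g. 'scikit-learn==1.0.2' -> 'scikit-learn'."""
    return re.split(r"[<>=!~\[;\s]", spec, maxsplit=1)[0].strip().lower()


def check_environment_args(environment_args):
    """Return a list of problems found in an environment_args dict (empty if none)."""
    pip_names = {_package_name(p) for p in environment_args.get("pip_packages", [])}
    conda_names = {_package_name(p) for p in environment_args.get("conda_packages", [])}
    problems = []
    if "fedml-azure" not in pip_names:
        problems.append("add 'fedml-azure' to pip_packages")
    if "scikit-learn" not in conda_names:
        problems.append("add 'scikit-learn' to conda_packages")
    return problems


# Same values as the environment_args in the next cell.
print(check_environment_args({'name': 'test-env-hyper',
                              'conda_packages': ['scikit-learn'],
                              'pip_packages': ['fedml-azure']}))  # []
```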

In [3]:
# Create the training object. The constructor gets an existing workspace (or creates one)
# and sets up the experiment, compute target, and environment.
training = DwcAzureTrain(
    workspace_args={'subscription_id': '<subscription_id>',
                    'resource_group': '<resource_group>',
                    'workspace_name': '<workspace_name>'},
    experiment_args={'name': 'test-2'},
    environment_type='CondaPackageEnvironment',
    environment_args={'name': 'test-env-hyper',
                      'conda_packages': ['scikit-learn'],
                      'pip_packages': ['fedml-azure']},
    compute_type='AmlComputeCluster',
    compute_args={'vm_size': 'Standard_D12_v2',
                  'vm_priority': 'lowpriority',
                  'compute_name': 'cpu-clu-hyper',
                  'min_nodes': 0,
                  'max_nodes': 1,
                  'idle_seconds_before_scaledown': 1700})


Getting existing Workspace
Creating Experiment
Creating Compute_target
Found compute target. just use it. cpu-clu-hyper
Creating Environment


### Generating the run config

This step packages the specified configuration (source directory, training script, and script arguments) so that a training job can be submitted.

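If your training script reads data from SAP Data Warehouse Cloud (DWC), you need a config.json file with the values required to access DWC, and you provide its path to config_file_path. This example uses local data, so the cell below sets is_dwc_connection_required=False and does not need config.json.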

In this example, we use local data for training. Before running the next cell, make sure you have run the Data Preprocessor example. That example downloads an output directory containing the preprocessed_data.csv and labels.csv files used by this model.

Before running the next cell, also set the preprocessed_file_name and labels_file_name arguments so that they point to these files in the correct output directory.
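On a remote compute target, Azure ML runs the script from an uploaded snapshot of source_directory, so the file paths passed to the script presumably need to resolve inside that directory. This is an assumption about how tuning_script.py opens its inputs, since the script is not shown here. The hypothetical helper below checks locally that the files are present before you submit.

```python
from pathlib import Path


def missing_input_files(source_directory, file_names):
    """Return the entries of file_names that are not files inside source_directory."""
    root = Path(source_directory)
    return [name for name in file_names if not (root / name).is_file()]


# Directory and file names taken from the run config in the next cell.
missing = missing_input_files(
    "Scikit-Learn-Hyperparameter-Tuning",
    ["preprocessed_data.csv", "labels.csv"],
)
if missing:
    print("Copy these files into the source directory before submitting:", missing)
```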

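The keys in config_args (source_directory, script, and arguments) correspond to parameters of the Azure ML ScriptRunConfig class; for details, see https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.scriptrunconfig?view=azure-ml-py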

In [4]:
# Generate the run config.
import json

# Candidate values for each hyperparameter.
hyperparameters = {
    'max_depth': [2, 4, 6],
    'n_estimators': [100, 250, 300],
    'max_features': [4, 5, 6, 'sqrt'],
    'min_samples_leaf': [25, 30]
}
# Script arguments are passed on the command line, so serialize the grid to a JSON string.
hyperparameters = json.dumps(hyperparameters)

src = training.generate_run_config(
    is_dwc_connection_required=False,  # local data: no DWC connection needed
    config_args={
        'source_directory': 'Scikit-Learn-Hyperparameter-Tuning',
        'script': 'tuning_script.py',
        'arguments': [
            '--model_file_name', 'tuning.pkl',
            '--preprocessed_file_name', 'preprocessed_data.csv',
            '--labels_file_name', 'labels.csv',
            '--hyperparameters', hyperparameters,
            '--n_jobs', 24]
    })
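Each entry in the arguments list reaches tuning_script.py as a command-line string, which is why the grid is serialized with json.dumps first. The script itself is not shown in this notebook, so the sketch below is only an assumption about how it might parse these arguments. It also counts the candidates that an exhaustive grid search would evaluate (every combination of the listed values).

```python
import argparse
import json
import math


def parse_args(argv=None):
    """Parse the arguments passed through the run config (mirrors the list above)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_file_name", type=str)
    parser.add_argument("--preprocessed_file_name", type=str)
    parser.add_argument("--labels_file_name", type=str)
    # The grid arrives as a JSON string and is decoded back into a dict of lists.
    parser.add_argument("--hyperparameters", type=json.loads)
    # Command-line values are strings, so convert n_jobs back to an int.
    parser.add_argument("--n_jobs", type=int, default=1)
    return parser.parse_args(argv)


def grid_size(param_grid):
    """Number of parameter combinations in an exhaustive grid search."""
    return math.prod(len(values) for values in param_grid.values())


grid = {'max_depth': [2, 4, 6], 'n_estimators': [100, 250, 300],
        'max_features': [4, 5, 6, 'sqrt'], 'min_samples_leaf': [25, 30]}
args = parse_args(['--model_file_name', 'tuning.pkl',
                   '--preprocessed_file_name', 'preprocessed_data.csv',
                   '--labels_file_name', 'labels.csv',
                   '--hyperparameters', json.dumps(grid),
                   '--n_jobs', '24'])
print(grid_size(args.hyperparameters))  # 72 combinations (3 * 3 * 4 * 2)
```

If the script uses k-fold cross-validation (as scikit-learn's GridSearchCV does), the number of model fits is the grid size times k; with 5 folds, that would be 360 fits.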

Generating script run config
Skipping the copy of db connection config 'config.json' to 'source_directory'


### Submitting the job for training

In [None]:
#submitting the training run
run=training.submit_run(src)

## Register the model for deployment

In [6]:
# Register the trained model saved by the run under outputs/.
model = training.register_model(run=run,
                                model_args={'model_name': 'sklearn_tuning_model',
                                            'model_path': 'outputs/tuning.pkl'},
                                resource_config_args={'cpu': 1, 'memory_in_gb': 0.5},
                                is_sklearn_model=True)

print('Name:', model.name)
print('Version:', model.version)

Registering the model
Configuring parameters for sklearn model
Name: sklearn_tuning_model
Version: 4
