## Scikit-Learn Hyperparameter Tuning
### Using local data (created by the Data Preprocessor script)

## Install fedml_gcp package

In [1]:
pip install fedml_gcp-1.0.0-py3-none-any.whl --force-reinstall

Processing ./fedml_gcp-1.0.0-py3-none-any.whl
Collecting google
  Using cached google-3.0.0-py2.py3-none-any.whl (45 kB)
Collecting hdbcli
  Using cached hdbcli-2.10.13-cp34-abi3-manylinux1_x86_64.whl (11.7 MB)
Collecting beautifulsoup4
  Using cached beautifulsoup4-4.10.0-py3-none-any.whl (97 kB)
Collecting soupsieve>1.2
  Using cached soupsieve-2.2.1-py3-none-any.whl (33 kB)
Installing collected packages: soupsieve, beautifulsoup4, hdbcli, google, fedml-gcp
  Attempting uninstall: soupsieve
    Found existing installation: soupsieve 2.2.1
    Uninstalling soupsieve-2.2.1:
      Successfully uninstalled soupsieve-2.2.1
  Attempting uninstall: beautifulsoup4
    Found existing installation: beautifulsoup4 4.10.0
    Uninstalling beautifulsoup4-4.10.0:
      Successfully uninstalled beautifulsoup4-4.10.0
  Attempting uninstall: hdbcli
    Found existing installation: hdbcli 2.10.13
    Uninstalling hdbcli-2.10.13:
      Successfully uninstalled hdbcli-2.10.13
  Attempting uninstall: goo

## Import Libraries

In [2]:
from fedml_gcp import DwcGCP
import numpy as np
import pandas as pd
import json

## Create DwcGCP Instance to access class methods and train model

It is expected that the bucket name passed here already exists in Cloud Storage.

In [3]:
dwc = DwcGCP(project_name='fed-ml',
             bucket_name='fedml-bucket')

## Data setup

In this example, we are using local data for training.
Before running this cell, make sure you have run the Data Preprocessor model example. That model writes to an output directory in the bucket specified in its arguments. The output directory contains the preprocessed_data.csv and y_train.csv files used for this model. The cell below downloads both files into the Hyperparameter Tuning script package folder and saves y_train.csv locally as labels.csv.

In [4]:
dwc.download_blob('fedml-bucket', 'datapreprocessor/output/preprocessed_data.csv',
                  'HyperparameterTuning/trainer/preprocessed_data.csv')
dwc.download_blob('fedml-bucket', 'datapreprocessor/output/y_train.csv',
                  'HyperparameterTuning/trainer/labels.csv')

Downloaded storage object datapreprocessor/output/preprocessed_data.csv from bucket fedml-bucket to local file HyperparameterTuning/trainer/preprocessed_data.csv.
Downloaded storage object datapreprocessor/output/y_train.csv from bucket fedml-bucket to local file HyperparameterTuning/trainer/labels.csv.
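Since the features and labels arrive as two separate files, it can be worth confirming that they line up before bundling them. The following is a minimal sketch using only the standard library. The helper names `count_rows` and `check_alignment` are hypothetical (they are not part of fedml_gcp), and the sketch assumes each CSV file starts with a single header line, which may not match what the preprocessor actually writes.

```python
import csv


def count_rows(path):
    """Return the number of data rows in a CSV file, excluding the header line."""
    with open(path, newline='') as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header line (assumed to be present)
        return sum(1 for _ in reader)


def check_alignment(features_path, labels_path):
    """Raise ValueError if the two files disagree on the number of data rows."""
    n_features = count_rows(features_path)
    n_labels = count_rows(labels_path)
    if n_features != n_labels:
        raise ValueError(f'{features_path} has {n_features} rows, '
                         f'{labels_path} has {n_labels}')
    return n_features
```

With the files downloaded above, the check would be called as `check_alignment('HyperparameterTuning/trainer/preprocessed_data.csv', 'HyperparameterTuning/trainer/labels.csv')`.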


### Create tar bundle of script folder so GCP can use it for training

Before running this cell, please ensure that the script package has all the necessary files for a training job.
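One way to run that check is sketched below. The trainer/task.py entry and the two CSV files come from this notebook. The setup.py and trainer/__init__.py entries are assumptions about a typical Python training package layout and may not apply to your package. The helper `missing_files` is hypothetical and not part of fedml_gcp.

```python
from pathlib import Path

# Files this notebook relies on, plus two assumed packaging files
# (setup.py and trainer/__init__.py). Adjust the list to your own package.
EXPECTED_FILES = [
    'setup.py',                       # assumption: standard packaging file
    'trainer/__init__.py',            # assumption: marks trainer as a package
    'trainer/task.py',                # referenced by pythonModule 'trainer.task'
    'trainer/preprocessed_data.csv',  # downloaded in the data setup step
    'trainer/labels.csv',             # downloaded in the data setup step
]


def missing_files(package_dir, expected=EXPECTED_FILES):
    """Return the expected relative paths that do not exist under package_dir."""
    root = Path(package_dir)
    return [name for name in expected if not (root / name).is_file()]
```

Calling `missing_files('HyperparameterTuning')` returns an empty list when every expected file is in place.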

In [5]:
dwc.make_tar_bundle('HyperparameterTuning.tar.gz', 'HyperparameterTuning', 'h_tuning/train/HyperparameterTuning.tar.gz')

File HyperparameterTuning.tar.gz uploaded to h_tuning/train/HyperparameterTuning.tar.gz.


### Train Model

GCP takes in training inputs that are specific to the training job and the environment needed.

In the training inputs, we specify the Python module (pythonModule). This is the module that the training job runs: trainer.task refers to the task.py file inside the trainer folder of the script package.
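The contents of task.py are not shown in this notebook. As a rough illustration of how such an entry point could receive the values passed in the args list of the training inputs, here is a minimal sketch using argparse. The flag names are taken from this notebook's args list. The function name `parse_args`, the types, and the default value are assumptions, not the actual task.py.

```python
import argparse
import json


def parse_args(argv=None):
    """Parse the command-line flags that the training job passes to the module."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--preprocessed_file_name', type=str)
    parser.add_argument('--labels_file_name', type=str)
    # The grid arrives as a JSON string, so decode it back into a dict.
    parser.add_argument('--hyperparameters', type=json.loads)
    parser.add_argument('--n_jobs', type=int, default=1)  # default is an assumption
    parser.add_argument('--bucket_name', type=str)
    return parser.parse_args(argv)


example = parse_args([
    '--preprocessed_file_name', 'preprocessed_data.csv',
    '--labels_file_name', 'labels.csv',
    '--hyperparameters', json.dumps({'max_depth': [2, 4, 6]}),
    '--n_jobs', '24',
    '--bucket_name', 'fedml-bucket',
])
print(example.hyperparameters['max_depth'])  # [2, 4, 6]
```

Using `json.loads` as the argparse type means a malformed grid string is rejected when the arguments are parsed, before any training starts.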


In [6]:
hyperparameters = {
    'max_depth': [2, 4, 6],
    'n_estimators': [100, 250, 300],
    'max_features': [4, 5, 6, 'sqrt'],
    'min_samples_leaf': [25, 30]
}
hyperparameters = json.dumps(hyperparameters)
training_inputs = {
    'scaleTier': 'BASIC',
    'packageUris': ['gs://fedml-bucket/h_tuning/train/HyperparameterTuning.tar.gz', "gs://fedml-bucket/fedml_gcp-1.0.0-py3-none-any.whl"],
    'pythonModule': 'trainer.task',
    'args': ['--preprocessed_file_name', 'preprocessed_data.csv',
             '--labels_file_name', 'labels.csv',
             '--hyperparameters', hyperparameters,
             '--n_jobs', '24',
             '--bucket_name', 'fedml-bucket'],
    'region': 'us-east1',
    'jobDir': 'gs://fedml-bucket',
    'runtimeVersion': '2.5',
    'pythonVersion': '3.7',
    'scheduling': {'maxWaitTime': '3600s', 'maxRunningTime': '7200s'}
}
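To get a feel for the size of this search space, the sketch below counts the combinations in the grid. It assumes the training script tries every combination exhaustively, as a grid search would. The notebook does not show how task.py actually uses the grid, so treat the count as an upper-bound estimate. If cross-validation is used, the number of model fits is this count multiplied by the number of folds.

```python
import itertools
import json

hyperparameters = {
    'max_depth': [2, 4, 6],
    'n_estimators': [100, 250, 300],
    'max_features': [4, 5, 6, 'sqrt'],
    'min_samples_leaf': [25, 30],
}

# One value per key, in every possible combination (3 * 3 * 4 * 2).
combinations = list(itertools.product(*hyperparameters.values()))
print(len(combinations))  # 72

# The JSON string passed via --hyperparameters decodes back to the same dict.
encoded = json.dumps(hyperparameters)
decoded = json.loads(encoded)
print(decoded == hyperparameters)  # True
```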

In [7]:
dwc.train_model('h_tuning_final_train1', training_inputs)

Training Job Submitted Succesfully
Job status for fed-ml.h_tuning_final_train1:
    state : QUEUED


### Deploy model

In [8]:
dwc.deploy(model_name='h_tuning_final_deploy1', model_location='/h_tuning/model', version='v1', region='us-east1')

{'name': 'projects/fed-ml/models/h_tuning_final_deploy1', 'regions': ['us-east1'], 'etag': '070g99Zt5C8='}
{'name': 'projects/fed-ml/operations/create_h_tuning_final_deploy1_version-1633569633431', 'metadata': {'@type': 'type.googleapis.com/google.cloud.ml.v1.OperationMetadata', 'createTime': '2021-10-07T01:20:33Z', 'operationType': 'CREATE_VERSION', 'modelName': 'projects/fed-ml/models/h_tuning_final_deploy1', 'version': {'name': 'projects/fed-ml/models/h_tuning_final_deploy1/versions/version', 'deploymentUri': 'gs://fedml-bucket/h_tuning/model', 'createTime': '2021-10-07T01:20:33Z', 'runtimeVersion': '2.5', 'etag': 'tJSihQU6IYA=', 'framework': 'SCIKIT_LEARN', 'machineType': 'mls1-c1-m2', 'pythonVersion': '3.7'}}}
