## Scikit-Learn Hyperparameter Tuning
### Using local data (data was created from preprocessor script)

## Install fedml_gcp package

In [None]:
%pip install fedml_gcp

## Import Libraries

In [None]:
from fedml_gcp import DwcGCP
import numpy as np
import pandas as pd
import json

## Create DwcGCP Instance to access class methods and train model

It is expected that the bucket name passed here already exists in Cloud Storage.

In [None]:
dwc = DwcGCP(project_name='example-project',
             bucket_name='<bucket-name>')

## Data setup

In this example, we are using local data for training.
Before running this cell, make sure you have run the Data Preprocessor model example. That model writes to an output directory in the bucket specified in that model's arguments. The output directory contains the preprocessed data and label files used for this model. The cell below downloads those files and writes them to the Hyperparameter Tuning script package folder as preprocessed_data.csv and labels.csv.

In [None]:
dwc.download_blob('<bucket-name>', 'datapreprocessor/output/preprocessed_data.csv',
                  'HyperparameterTuning/trainer/preprocessed_data.csv')
dwc.download_blob('<bucket-name>', 'datapreprocessor/output/y_train.csv',
                  'HyperparameterTuning/trainer/labels.csv')
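
After the download, it can be worth confirming that the features file and the labels file describe the same number of samples before they are bundled. The following is a minimal sketch, not part of fedml_gcp; the helper names `count_data_rows` and `check_row_alignment` are hypothetical, and the sketch assumes each CSV file starts with a single header row.

```python
import csv


def count_data_rows(csv_path):
    """Return the number of rows that follow the header row of a CSV file."""
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # assumption: the first row is a header
        return sum(1 for _ in reader)


def check_row_alignment(features_path, labels_path):
    """Return the shared row count, or raise ValueError if the counts differ."""
    n_features = count_data_rows(features_path)
    n_labels = count_data_rows(labels_path)
    if n_features != n_labels:
        raise ValueError(
            f"{features_path} has {n_features} rows but "
            f"{labels_path} has {n_labels} rows"
        )
    return n_features
```

With the paths used in this notebook, the call would be `check_row_alignment('HyperparameterTuning/trainer/preprocessed_data.csv', 'HyperparameterTuning/trainer/labels.csv')`.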

### Create tar bundle of script folder so GCP can use it for training

Before running this cell, please ensure that the script package has all the necessary files for a training job.

In [None]:
dwc.make_tar_bundle('HyperparameterTuning.tar.gz', 'HyperparameterTuning', 'h_tuning/train/HyperparameterTuning.tar.gz')
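
One way to check the script package before bundling it is to list the files you expect and report any that are missing. This is a sketch under assumptions: `missing_package_files` is a hypothetical helper, the two data files come from the download step in this notebook, and `setup.py` plus `trainer/__init__.py` are assumed from the usual layout of a Python training package whose entry point is `trainer.task`. Adjust the list to match your own package.

```python
from pathlib import Path

# Assumed layout; only the two CSV paths and trainer/task.py are taken
# from this notebook. Edit the tuple to match your script package.
EXPECTED_FILES = (
    "setup.py",
    "trainer/__init__.py",
    "trainer/task.py",
    "trainer/preprocessed_data.csv",
    "trainer/labels.csv",
)


def missing_package_files(package_dir, expected=EXPECTED_FILES):
    """Return the expected relative paths that are not files under package_dir."""
    root = Path(package_dir)
    return [rel for rel in expected if not (root / rel).is_file()]
```

Calling `missing_package_files('HyperparameterTuning')` returns an empty list when every expected file is present.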

### Train Model

GCP takes in training inputs that are specific to the training job and the environment needed.

In the training inputs, we specify the Python module. This is the module inside your script package that the training job runs; here, trainer.task refers to the task.py file inside the trainer folder of the script package.
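
To illustrate how the entry-point module receives the training arguments, here is a hypothetical sketch of the argument parsing a task.py might contain. The flag names mirror the `args` list used in this notebook, but the actual script in your package may parse them differently; `parse_args` is an illustrative name, not part of fedml_gcp.

```python
import argparse
import json


def parse_args(argv=None):
    """Parse the flags that this notebook passes in training_inputs['args']."""
    parser = argparse.ArgumentParser(description="Hypothetical trainer entry point")
    parser.add_argument("--preprocessed_file_name", required=True)
    parser.add_argument("--labels_file_name", required=True)
    # The grid is sent as one JSON string, so decode it back into a dict.
    parser.add_argument("--hyperparameters", type=json.loads, required=True)
    # Command-line values are strings; convert the job count to an int.
    parser.add_argument("--n_jobs", type=int, default=1)
    parser.add_argument("--bucket_name", required=True)
    # parse_known_args tolerates extra flags the service may append,
    # for example a job directory flag (an assumption about the runtime).
    args, _unknown = parser.parse_known_args(argv)
    return args
```

Because the hyperparameter grid travels as a JSON string, `args.hyperparameters` is a plain dict again once parsing finishes, which is why the notebook serializes it with `json.dumps` first.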


In [None]:
hyperparameters = {
    'max_depth': [2, 4, 6],
    'n_estimators': [100, 250, 300],
    'max_features': [4, 5, 6, 'sqrt'],
    'min_samples_leaf': [25, 30]
    }
hyperparameters = json.dumps(hyperparameters)
training_inputs = {
    'scaleTier': 'BASIC',
    'packageUris': ['gs://<bucket-name>/h_tuning/train/HyperparameterTuning.tar.gz'],
    'pythonModule': 'trainer.task',
    'args': ['--preprocessed_file_name', 'preprocessed_data.csv',
             '--labels_file_name', 'labels.csv',
             '--hyperparameters', hyperparameters,
             '--n_jobs', '24',
             '--bucket_name', '<bucket-name>'],
    'region': 'us-east1',
    'jobDir': 'gs://<bucket-name>',
    'runtimeVersion': '2.5',
    'pythonVersion': '3.7',
    'scheduling': {'maxWaitTime': '3600s', 'maxRunningTime': '7200s'}
}
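
Before submitting the job, it may help to estimate how many candidate configurations the grid above describes, since that drives the running time. The sketch below assumes the trainer performs an exhaustive grid search, which this notebook does not confirm; `param_grid`, `grid_size`, and `iter_candidates` are illustrative names, and the grid is repeated here under a separate name because `hyperparameters` has already been converted to a JSON string.

```python
from itertools import product

# Same grid as in the cell above, kept as a dict for inspection.
param_grid = {
    'max_depth': [2, 4, 6],
    'n_estimators': [100, 250, 300],
    'max_features': [4, 5, 6, 'sqrt'],
    'min_samples_leaf': [25, 30],
}


def grid_size(grid):
    """Return the number of combinations in an exhaustive grid."""
    size = 1
    for values in grid.values():
        size *= len(values)
    return size


def iter_candidates(grid):
    """Yield each combination of the grid as a dict of parameter values."""
    keys = list(grid)
    for combo in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, combo))


print(grid_size(param_grid))  # 3 * 3 * 4 * 2 = 72
```

If the trainer also cross-validates each candidate, the number of model fits is this count multiplied by the number of folds, which is worth comparing against the `maxRunningTime` limit.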

In [None]:
dwc.train_model('h_tuning_final_train1', training_inputs)

### Deploy model

In [None]:
dwc.deploy(model_name='<example-model>', model_location='/h_tuning/model', version='v1', region='us-east1')