## AWS SageMaker XGBoost Classification: Training Models in a Cross-Validation Experiment (Feature and Parameter Research)

Purpose: train several models from the configured lists of features and/or parameters and compare the results. Each model is trained as a set of cross-validation folds.

The idea is to do a **deep** model comparison using the same method (XGBoost classification), target variable, and dataset, but different sets of features and/or parameters.

The models can be trained in parallel, so even with a large dataset the results are available relatively quickly compared to a sequential run on a local server.

The output is not just a model and a score (ROC-AUC) but also feature importance, the test dataset evaluation score, and training/validation errors for analyzing overfitting.

Open-source XGBoost is used in the script, but it can easily be replaced with the SageMaker built-in XGBoost, except for returning feature importance, test dataset evaluation scores, and training/validation errors for analyzing overfitting, which the built-in algorithm does not provide.

The advantage of this notebook over the next one (03.AWS... native XGBoost CV) is greater parallelism: each fold is trained in parallel. This may be better for larger datasets, but it requires more simultaneously running training jobs, and AWS limits how many large instances can run in parallel.

The notebook uses the same approach and scripts for model training as 01.AWS...; the only difference is the post-processing of the experiment results.


#### Custom evaluation metrics

Custom evaluation metrics can be defined and used for model training only with open-source XGBoost, directly in the training script. However, the AWS SageMaker monitoring system, charts, and experiments do not see them unless they are configured in metric_definitions.

As of March 2021, open-source SageMaker XGBoost (version 1.2-1) is incorrectly recognized as a built-in algorithm, and specifying metric definitions raises the error:

`An error occurred (ValidationException) when calling the CreateTrainingJob operation: You can't override the metric definitions for Amazon SageMaker algorithms. Please retry the request without specifying metric definitions.`

There are two approaches to making AWS SageMaker fully functional.

**The first one is just to name the custom function as a standard one.**

The trick is:
1. Create a function to calculate the custom metric (gini). If it is named gini, it is used in training but is NOT visible to experiments or hyperparameter tuning.
2. Give it the name of a standard score (auc_xgb instead of gini_xgb).
3. Disable the standard score in the parameters with 'disable_default_eval_metric': '1', and do not add eval_metric at all.

This can easily be done in the training script alone.
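A minimal sketch of the first approach, assuming the training script passes the custom metric to `xgboost.train(..., feval=...)`, the custom-metric hook of the XGBoost 1.2 line. Gini is computed here as 2 * AUC - 1 (the normalized Gini for a binary target), with AUC computed in pure Python from ranks. `roc_auc` and `gini` are hypothetical helpers, not the author's script; only the returned name `'auc'` reflects the renaming trick described above.

```python
def roc_auc(labels, preds):
    """ROC-AUC via the Mann-Whitney U statistic; tied predictions get average ranks."""
    pairs = sorted(zip(preds, labels), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied predictions starting at position i
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


def gini(labels, preds):
    """Normalized Gini coefficient for a binary target: 2 * AUC - 1."""
    return 2.0 * roc_auc(labels, preds) - 1.0


def auc_xgb(preds, dtrain):
    """feval-style metric: computes Gini but reports it under the standard name 'auc'."""
    labels = dtrain.get_label()
    return 'auc', gini(labels, preds)


# Hypothetical usage inside the training script (xgboost is not imported here):
# params['disable_default_eval_metric'] = 1   # and no 'eval_metric' key at all
# booster = xgb.train(params, dtrain, num_boost_round=num_round,
#                     evals=[(dtrain, 'train'), (dvalid, 'validation')],
#                     feval=auc_xgb, maximize=True)
```

With this naming, the training log shows lines such as `validation-auc:...` that carry the Gini value, so the standard AUC metric slot is reused without custom metric definitions.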

**The second approach is more complex and requires more effort.**

The workaround is to build your own container from the official AWS SageMaker open-source XGBoost GitHub repository, host it in your own ECR repository, and use this image from the Python SDK.

Steps to create your own ECR repository:

1. Install and configure aws-cli (https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html and https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html)

2. Install Docker if needed (https://docs.docker.com/engine/install/ubuntu/) and add your user to the docker group (some of the steps below do not work when Docker is run via sudo):
- sudo usermod -aG docker kate
Activate the group change:
- newgrp docker
Verify that you can run docker commands without sudo:
- docker run hello-world
3. Build container
- git clone https://github.com/aws/sagemaker-xgboost-container
- cd sagemaker-xgboost-container
- docker build -t xgboost-container-base:1.2-1-cpu-py3 -f docker/1.0-1/base/Dockerfile.cpu .
- python setup.py bdist_wheel --universal
- docker build -t preprod-xgboost-container:1.2-1-cpu-py3 -f docker/1.2-1/final/Dockerfile.cpu .
4. Create ECR repository and push the above image.
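- eval $(aws ecr get-login --region us-west-2 --no-include-email | sed 's|https://||') (AWS CLI v1; AWS CLI v2 removed get-login, so use: aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin XYZ.dkr.ecr.us-west-2.amazonaws.com)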
- aws ecr create-repository --repository-name sagemaker-xgboost --region us-west-2
- docker tag preprod-xgboost-container:1.2-1-cpu-py3 XYZ.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.2-1-cpu-py3
- docker push XYZ.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.2-1-cpu-py3
5. Pass image_uri=XYZ.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.2-1-cpu-py3 to the XGBoost() estimator.
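A hedged sketch of how the custom image could be combined with metric definitions, assuming the training script prints XGBoost's standard tab-separated evaluation lines such as `[12] train-gini:0.41 validation-gini:0.38`. `gini_metric_definitions`, `parse_metrics`, and `make_estimator` are hypothetical names; `parse_metrics` only checks the regular expressions locally, roughly the way SageMaker extracts the first capture group from the training log.

```python
import re

# A signed decimal number, optionally in scientific notation
_NUMBER = r'(-?[0-9]+(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?)'


def gini_metric_definitions(metric_name='gini'):
    """SageMaker metric definitions for the train/validation values in the XGBoost log."""
    return [
        {'Name': 'train:%s' % metric_name, 'Regex': r'train-%s:%s' % (metric_name, _NUMBER)},
        {'Name': 'validation:%s' % metric_name, 'Regex': r'validation-%s:%s' % (metric_name, _NUMBER)},
    ]


def parse_metrics(log_line, definitions):
    """Apply the regexes to one log line and return {metric name: value} (local check only)."""
    found = {}
    for definition in definitions:
        match = re.search(definition['Regex'], log_line)
        if match:
            found[definition['Name']] = float(match.group(1))
    return found


def make_estimator(entry_point, image_uri, role, instance_type, output_path, hyperparameters):
    """Hypothetical factory for an open-source XGBoost estimator running on the custom image."""
    # Imported lazily so the helpers above work without the SageMaker SDK installed
    from sagemaker.xgboost.estimator import XGBoost
    return XGBoost(entry_point=entry_point,
                   framework_version='1.2-1',
                   image_uri=image_uri,
                   role=role,
                   instance_count=1,
                   instance_type=instance_type,
                   output_path=output_path,
                   hyperparameters=hyperparameters,
                   metric_definitions=gini_metric_definitions())
```

Checking the regexes against a sample log line before launching jobs is cheap and catches the most common reason metrics silently fail to appear in SageMaker charts.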

#### Notebook Main steps:

1. Experiment configuration. Instead of hardcoding the data file name, target variable, feature sets, and parameter sets in the code, I use an Excel file. Each tab with a predefined name contains the feature sets for each model, the parameter sets, etc. At the end, the code records the results back into the same Excel file. Excel serves as a UI for easily configuring an experiment.

AWS SageMaker Experiments is also used, but I did not find it very useful for tracking the feature sets or for processing and visualizing the results (available in a SageMaker notebook). I need to average the results before comparing models and take the standard error of the mean (SEM) into account.

2. Preparing training and validation datasets for the configured number of folds (num_folds), i.e. data preprocessing, in S3 in a format suitable for AWS SageMaker. SKLearnProcessor and a processing job are used to create all datasets for all models in one process, but the same can be done directly in the script and only the result moved to S3. If the datasets can be reused from a previous experiment, only their S3 locations need to be configured.
Usually, testing different feature sets requires creating an individual dataset per model, while different parameters can be tested on the same dataset.


3. Each model is trained in parallel. The number of simultaneously running training jobs is controlled by a parameter (MaxNumOfRunningModels).

4. Extracting the results, visualizing them, performing a t-test, and saving them to the experiment log file. This is done on the results averaged over all folds of each model.
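Step 4 can be sketched in pure Python, assuming one score per fold for each model and the same folds for every model, so the comparison is paired (the preprocessing script below uses the same StratifiedKFold random_state and target for every model, so the splits line up). The corrected statistic is one common form of the Nadeau and Bengio corrected resampled t-test, where the variance of the fold differences is scaled by 1/k + n2/n1 instead of 1/k. `mean_sem` and `corrected_paired_t` are hypothetical names, and the notebook's own implementation may differ.

```python
import math
import statistics


def mean_sem(scores):
    """Mean and standard error of the mean (SEM) of per-fold scores."""
    if len(scores) < 2:
        raise ValueError("need at least two fold scores")
    return statistics.mean(scores), statistics.stdev(scores) / math.sqrt(len(scores))


def corrected_paired_t(scores_a, scores_b, test_train_ratio):
    """Corrected resampled t statistic for paired per-fold scores of two models.

    test_train_ratio is n2/n1 (validation size / training size), e.g. 1/9 for
    10 folds. With test_train_ratio=0 the correction vanishes and the statistic
    is the ordinary paired t statistic. Returns (t, degrees of freedom).
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least two fold scores")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    var_diff = statistics.variance(diffs)  # sample variance, k - 1 in the denominator
    if var_diff == 0:
        raise ValueError("fold differences have zero variance")
    t = statistics.mean(diffs) / math.sqrt((1.0 / k + test_train_ratio) * var_diff)
    return t, k - 1
```

A two-sided p-value can then be obtained with scipy (imported as `stats` in the notebook), e.g. `2 * stats.t.sf(abs(t), df)` with `t, df = corrected_paired_t(scores_a, scores_b, n2 / n1)`, and compared with `alpha`.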


#### Known issues:
1. Warnings after upgrading SageMaker to version 2:
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
These look like warnings from the open-source XGBoost estimator; there is no clear information about the parameters.
2. All model artifacts are saved into the output_path provided to the estimator (where only model.tar.gz is expected), except source/sourcedir.tar.gz, which is saved into the root of the output_path bucket. Previously, everything except model.tar.gz was saved into the default bucket.
3. As of March 2021, open-source, script-mode AWS SageMaker XGBoost is still recognized as a standard, built-in algorithm, which prevents us from using custom metric definitions.
See https://github.com/aws/sagemaker-xgboost-container/issues/121

In [None]:
temp_folder='/home/kate/Research/Property/Notebooks/Experiments/tmp/'
#Experiment_name must NOT contain underscore (_)
Experiment_name='FeatureSet'
#Experiments log file
Experiments_file='/home/kate/Research/Property/Notebooks/Experiments/Logs/Set1-Classification.xlsx'
#AllExperiments_tab is a table with a list of all experiments included in the log
#Mandatory columns: Experiment (Experiment_name), Dataset(data file name), Target(target column name from Dataset)
#The rest of the columns are not used in the code below. I usually add, in free form: objective, status, result, and the notebook name used to conduct the experiment
AllExperiments_tab='Experiments'
#Experiment configuration:
#1. Experiment_Features_tab: different datasets to try
#each line in the tab contains a model name and a set of features to build a dataset for SageMaker
#a feature can be an exact column name from the file in the Dataset column of AllExperiments_tab or a calculation based on exact column names (pandas eval)
#if the experiment objective is to try different parameter sets, all models (if more than 1) can have the same feature sets.
Experiment_Features_tab='%s Features'%Experiment_name
#2. Alternatively a set of data files with preprocessed data in S3 can be provided in a form:
#Model,Training_data,Validation_data[, Testing_data, Testing_labels]
Experiment_InputData_tab='%s InputData'%Experiment_name
#3. Experiment_Params_tab: each line in the tab contains a model name and a set of XGBoost parameters to apply to the model
#the set of models must be consistent between Experiment_Features_tab and Experiment_Params_tab
#parameters can be the same for all models or specific to each model
Experiment_Params_tab='%s Params'%Experiment_name

#Trial names in the AWS SageMaker experiment
Trial_name_preprocessing='%s-PreparingTrainValidData'%Experiment_name
Trial_name_training='%s-TrainingModels'%Experiment_name

#everything stored in
bucket='kdproperty'

path_to_data='Data'
path_to_training_data='Data/Experiments/%s/training'%Experiment_name
path_to_validation_data='Data/Experiments/%s/validation'%Experiment_name
path_to_testing_data='Data/Experiments/%s/testing'%Experiment_name
path_to_testing_labels='Data/Experiments/%s/labels'%Experiment_name
path_to_configuration='Config'
path_to_models='Models/Experiments/%s'%Experiment_name

#preprocessing parameters
split_year='2019'

#number of folds for CV
num_folds='10'

#entry_point defines the script to be run for model training.
#The scripts produce different output, so metric definitions should be adjusted accordingly:
# ModelTraining.py uses a standard XGBoost metric (auc); no custom image_uri in XGBoost() or metric definitions are needed.
# ModelTraining_Gini_named_AUC_EvalMetric.py trains with a custom evaluation metric (gini), but the custom function inside the script is named auc;
#   no custom image_uri in XGBoost() or metric definitions are needed.
# ModelTraining_Gini_EvalMetric.py trains with a custom evaluation metric (gini); use custom_image_uri and metric definitions.
entry_point='ModelTraining_Gini_EvalMetric.py'


#Just a placeholder used with standard metrics: the image_uri parameter can be commented out in XGBoost() or left as an empty string.
stnadard_image_uri=''
#Custom image: a workaround to make custom metrics and script output visible in AWS SageMaker monitoring, charts, and experiments
custom_image_uri='757107622481.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.2-1-cpu-py3'

#Level of detail returned from CV
#if any flag below is Y, the model from the best iteration is returned
#Feature importance Y/N
GetFIFlg='Y'
#Scores for Test data (should be provided in fit "test" input) Y/N
GetTestScoreFlg='Y'
#Prediction of Test data (should be provided in fit "test" input) Y/N
GetTestPredFlg='Y'  


#Significance level for t-test
alpha=0.05

#n2/n1 (validation/training) ratio for the corrected t-test; if n2=n1 (n2/n1 = 1), it's just the usual Student t-test without correction
#10 folds means a 1/9 validation/training ratio
n2=1
n1=9

preprocessing_instance_type='ml.t3.large'
preprocessing_instance_count=1

#Training parameters
training_instance_type='ml.c5.xlarge'
training_instance_count=1

#How many simultaneously running training jobs we want in the system
MaxNumOfRunningModels = 30
#when a job completes, fails, or is stopped, a new one can be started; job status is checked periodically
check_training_job_every_sec=10

#What to do with the experiment (the rest of the running jobs) if a training job fails or is stopped
StopOnFailedModel = True
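The throttling controlled by MaxNumOfRunningModels, check_training_job_every_sec, and StopOnFailedModel can be sketched as a simple polling loop. `run_throttled` is a hypothetical helper, not the notebook's training code: launching, status, and stop functions are passed in, so the loop can be exercised without AWS.

```python
import time

FINISHED = ('Completed', 'Failed', 'Stopped')
UNSUCCESSFUL = ('Failed', 'Stopped')


def run_throttled(job_names, start_job, get_status, stop_job,
                  max_running, poll_seconds, stop_on_failure=True, sleep=time.sleep):
    """Run jobs with at most max_running in progress at the same time.

    start_job(name) launches a job, get_status(name) returns a SageMaker-style
    status string, stop_job(name) asks a running job to stop.
    Returns {job name: last known status}.
    """
    if max_running < 1:
        raise ValueError("max_running must be at least 1")
    pending = list(job_names)
    running = []
    final = {}
    while pending or running:
        # Fill the free slots with jobs that have not been started yet
        while pending and len(running) < max_running:
            name = pending.pop(0)
            start_job(name)
            running.append(name)
        sleep(poll_seconds)
        # Poll every running job and keep only the unfinished ones
        still_running = []
        for name in running:
            status = get_status(name)
            if status in FINISHED:
                final[name] = status
            else:
                still_running.append(name)
        running = still_running
        if stop_on_failure and any(s in UNSUCCESSFUL for s in final.values()):
            # Stop the rest of the experiment: stop running jobs, skip pending ones
            for name in running:
                stop_job(name)
                final[name] = 'Stopping'
            for name in pending:
                final[name] = 'NotStarted'
            return final
    return final
```

With SageMaker, `start_job` could call `estimator.fit(inputs, job_name=name, wait=False)`, `get_status` could return `sagemaker_session.describe_training_job(name)['TrainingJobStatus']`, and `stop_job` could call `boto3.client('sagemaker').stop_training_job(TrainingJobName=name)`; the call would then be `run_throttled(names, ..., MaxNumOfRunningModels, check_training_job_every_sec, StopOnFailedModel)`.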

In [2]:
import sys
import time
import os

import re

import pandas as pd
import numpy as np

import boto3

import s3fs
import tarfile

from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput

import sagemaker
from sagemaker.session import TrainingInput
from sagemaker.xgboost.estimator import XGBoost

from sagemaker.analytics import ExperimentAnalytics

import matplotlib.pyplot as plt
import scipy.stats as stats

import warnings
warnings.filterwarnings('ignore')

In [3]:
#should be run as a first step
#role arn is used when run from a local machine
sagemaker_execution_role = 'arn:aws:iam::757107622481:role/service-role/AmazonSageMaker-ExecutionRole-20200819T131882'

region = boto3.session.Session().region_name
sagemaker_session = sagemaker.session.Session()
s3 = s3fs.S3FileSystem()

## Experiment
The experiment is configured in an experiment log file (an Excel file, in my case, with different tabs).

1. Reading the experiment configuration (Experiment_name) from the experiment log file (Experiments_file). The Dataset and Target columns in AllExperiments_tab contain the data file name and the target column.

In [4]:
experiments = pd.read_excel(Experiments_file, sheet_name=AllExperiments_tab)

In [5]:
target=experiments[experiments['Experiment']==Experiment_name]['Target'].values[0]
print('Target of models in %s experiment is %s'%(Experiment_name,target))
data_file=experiments[experiments['Experiment']==Experiment_name]['Dataset'].values[0]
print('Datafile used in %s experiment is %s'%(Experiment_name,data_file))

Target of models in FeatureSet experiment is hasclaim
Datafile used in FeatureSet experiment is property_water_claims_non_cat_fs_v5.csv


2. Models based on individual datasets to be created, trained, and compared in the experiment are configured in Experiment_Features_tab: a table whose first column is the model name (must be unique) and whose next columns ([1:51]) are the features used to train the model. A feature is an exact column name from the dataset or a calculation based on exact column names, evaluated with the pandas eval function.

This configuration is used to preprocess the data and, if we use an AWS SKLearnProcessor/job/instances, it also needs to be moved to S3 in CSV format so the preprocessing script can read it easily.

In [6]:
model_features = pd.read_excel(Experiments_file, sheet_name=Experiment_Features_tab)
model_features  

Unnamed: 0,Model,F1,F2,F3,F4,F5,F6,F7,F8,F9,F10,F11,F12
0,BaseModel,cal_year-yearbuilt,cova_deductible,sqft,customer_cnt_active_policies,usagetype_encd,replacementcostdwellingind,pipe_froze_3_blk,landlordind,water_risk_3_blk,poolind,cova_limit,plumb_leak_3_blk
1,NoPlumbLeak,cal_year-yearbuilt,cova_deductible,sqft,customer_cnt_active_policies,usagetype_encd,replacementcostdwellingind,pipe_froze_3_blk,landlordind,water_risk_3_blk,poolind,cova_limit,
2,Nocovalimit,cal_year-yearbuilt,cova_deductible,sqft,customer_cnt_active_policies,usagetype_encd,replacementcostdwellingind,pipe_froze_3_blk,landlordind,water_risk_3_blk,poolind,,
3,Nopool,cal_year-yearbuilt,cova_deductible,sqft,customer_cnt_active_policies,usagetype_encd,replacementcostdwellingind,pipe_froze_3_blk,landlordind,water_risk_3_blk,,,
4,NoWaterRisk,cal_year-yearbuilt,cova_deductible,sqft,customer_cnt_active_policies,usagetype_encd,replacementcostdwellingind,pipe_froze_3_blk,landlordind,,,,
5,Nolandlord,cal_year-yearbuilt,cova_deductible,sqft,customer_cnt_active_policies,usagetype_encd,replacementcostdwellingind,pipe_froze_3_blk,,,,,
6,NoPipeFroze,cal_year-yearbuilt,cova_deductible,sqft,customer_cnt_active_policies,usagetype_encd,replacementcostdwellingind,,,,,,
7,Norplcostdwel,cal_year-yearbuilt,cova_deductible,sqft,customer_cnt_active_policies,usagetype_encd,,,,,,,
8,Nousagetype,cal_year-yearbuilt,cova_deductible,sqft,customer_cnt_active_policies,,,,,,,,
9,Model1,cal_year-yearbuilt,cova_deductible,sqft,customer_cnt_active_policies,usagetype_encd,water_risk_3_blk,,,,,,


In [7]:
#Map XGBoost's default feature names (f0...fN) in the feature importance output back to the configured features of a model
#Empty cells are skipped, as in the preprocessing script, so f0...fN follow the dataset column order
def GetMap(model):
    row=model_features[model_features['Model']==model].loc[:, model_features.columns != 'Model']
    features=[x for x in row.values[0].tolist() if str(x) != 'nan']
    return {'f%s'%i: f for i,f in enumerate(features)}
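Feature importance from a model trained on headerless CSV data is keyed by XGBoost's default names f0, f1, .... A small hypothetical helper can apply a mapping like the one GetMap returns, assuming the importance comes as a plain dict, such as the one `Booster.get_score(importance_type='gain')` returns.

```python
def rename_importance(importance, feature_map):
    """Map f0, f1, ... keys to configured feature names and sort by score (descending).

    Keys missing from feature_map are kept unchanged.
    """
    renamed = {feature_map.get(key, key): score for key, score in importance.items()}
    return sorted(renamed.items(), key=lambda item: item[1], reverse=True)
```

For example, `rename_importance(booster.get_score(importance_type='gain'), GetMap('BaseModel'))` would list the BaseModel features from most to least important.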

2a. Preprocessed data may already exist in S3. The experiment configuration can provide the list of files per model (Experiment_InputData_tab). In this case (len(preprocessed_data)>0), the code skips all data preprocessing steps.

In [8]:
try:
    preprocessed_data = pd.read_excel(Experiments_file, sheet_name=Experiment_InputData_tab)
except Exception:
    #no InputData tab: the data will be preprocessed below
    preprocessed_data = pd.DataFrame()

In [9]:
preprocessed_data

Unnamed: 0,Model,fold,Training_data,Validation_data,Testing_data,F1,F2,F3,F4,F5,F6,F7,F8,F9,F10,F11,F12
0,BaseModel,0,s3://kdproperty/Data/Experiments/FeatureSet/tr...,s3://kdproperty/Data/Experiments/FeatureSet/va...,s3://kdproperty/Data/Experiments/FeatureSet/te...,cal_year-yearbuilt,cova_deductible,sqft,customer_cnt_active_policies,usagetype_encd,replacementcostdwellingind,pipe_froze_3_blk,landlordind,water_risk_3_blk,poolind,cova_limit,plumb_leak_3_blk
1,BaseModel,1,s3://kdproperty/Data/Experiments/FeatureSet/tr...,s3://kdproperty/Data/Experiments/FeatureSet/va...,s3://kdproperty/Data/Experiments/FeatureSet/te...,cal_year-yearbuilt,cova_deductible,sqft,customer_cnt_active_policies,usagetype_encd,replacementcostdwellingind,pipe_froze_3_blk,landlordind,water_risk_3_blk,poolind,cova_limit,plumb_leak_3_blk
2,BaseModel,2,s3://kdproperty/Data/Experiments/FeatureSet/tr...,s3://kdproperty/Data/Experiments/FeatureSet/va...,s3://kdproperty/Data/Experiments/FeatureSet/te...,cal_year-yearbuilt,cova_deductible,sqft,customer_cnt_active_policies,usagetype_encd,replacementcostdwellingind,pipe_froze_3_blk,landlordind,water_risk_3_blk,poolind,cova_limit,plumb_leak_3_blk
3,BaseModel,3,s3://kdproperty/Data/Experiments/FeatureSet/tr...,s3://kdproperty/Data/Experiments/FeatureSet/va...,s3://kdproperty/Data/Experiments/FeatureSet/te...,cal_year-yearbuilt,cova_deductible,sqft,customer_cnt_active_policies,usagetype_encd,replacementcostdwellingind,pipe_froze_3_blk,landlordind,water_risk_3_blk,poolind,cova_limit,plumb_leak_3_blk
4,BaseModel,4,s3://kdproperty/Data/Experiments/FeatureSet/tr...,s3://kdproperty/Data/Experiments/FeatureSet/va...,s3://kdproperty/Data/Experiments/FeatureSet/te...,cal_year-yearbuilt,cova_deductible,sqft,customer_cnt_active_policies,usagetype_encd,replacementcostdwellingind,pipe_froze_3_blk,landlordind,water_risk_3_blk,poolind,cova_limit,plumb_leak_3_blk
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
115,Model3,5,s3://kdproperty/Data/Experiments/FeatureSet/tr...,s3://kdproperty/Data/Experiments/FeatureSet/va...,s3://kdproperty/Data/Experiments/FeatureSet/te...,cal_year-yearbuilt,cova_deductible,sqft,usagetype_encd,water_risk_3_blk,,,,,,,
116,Model3,6,s3://kdproperty/Data/Experiments/FeatureSet/tr...,s3://kdproperty/Data/Experiments/FeatureSet/va...,s3://kdproperty/Data/Experiments/FeatureSet/te...,cal_year-yearbuilt,cova_deductible,sqft,usagetype_encd,water_risk_3_blk,,,,,,,
117,Model3,7,s3://kdproperty/Data/Experiments/FeatureSet/tr...,s3://kdproperty/Data/Experiments/FeatureSet/va...,s3://kdproperty/Data/Experiments/FeatureSet/te...,cal_year-yearbuilt,cova_deductible,sqft,usagetype_encd,water_risk_3_blk,,,,,,,
118,Model3,8,s3://kdproperty/Data/Experiments/FeatureSet/tr...,s3://kdproperty/Data/Experiments/FeatureSet/va...,s3://kdproperty/Data/Experiments/FeatureSet/te...,cal_year-yearbuilt,cova_deductible,sqft,usagetype_encd,water_risk_3_blk,,,,,,,


2b. Saving the model configurations (feature sets) to S3 for use in data preprocessing

In [10]:
if len(preprocessed_data)==0:
    Model_Config_file='%s.csv'%Experiment_name
    Models_Config_path = os.path.join(temp_folder, Model_Config_file) 

    model_features.to_csv(Models_Config_path, header=True, index=False)


    input_code = sagemaker_session.upload_data(
        Models_Config_path,
        bucket=bucket,
        key_prefix=path_to_configuration
    )

3. Model parameters used in training are configured in a table whose first column is the model name (must be unique and correspond to the models in Experiment_Features_tab) and whose next columns are XGBoost parameters.
In the general case, all models can have the same parameters.

In [11]:
model_params = pd.read_excel(Experiments_file, sheet_name=Experiment_Params_tab)
model_params

Unnamed: 0,Model,objective,eval_metric,booster,scale_pos_weight,colsample_bylevel,colsample_bytree,eta,subsample,max_depth,num_round
0,BaseModel,binary:logistic,auc,gbtree,0.3,0.8,0.8,0.04,0.6,6,5000
1,NoPlumbLeak,binary:logistic,auc,gbtree,0.3,0.8,0.8,0.04,0.6,6,5000
2,Nocovalimit,binary:logistic,auc,gbtree,0.3,0.8,0.8,0.04,0.6,6,5000
3,Nopool,binary:logistic,auc,gbtree,0.3,0.8,0.8,0.04,0.6,6,5000
4,NoWaterRisk,binary:logistic,auc,gbtree,0.3,0.8,0.8,0.04,0.6,6,5000
5,Nolandlord,binary:logistic,auc,gbtree,0.3,0.8,0.8,0.04,0.6,6,5000
6,NoPipeFroze,binary:logistic,auc,gbtree,0.3,0.8,0.8,0.04,0.6,6,5000
7,Norplcostdwel,binary:logistic,auc,gbtree,0.3,0.8,0.8,0.04,0.6,6,5000
8,Nousagetype,binary:logistic,auc,gbtree,0.3,0.8,0.8,0.04,0.6,6,5000
9,Model1,binary:logistic,auc,gbtree,0.3,0.8,0.8,0.04,0.6,6,5000


4. Verifying that both configurations contain the same set of models

In [12]:
models_from_model_features=set(model_features['Model'])
models_from_model_params=set(model_params['Model'])
#both tabs must list exactly the same models (checked in both directions)
if models_from_model_features!=models_from_model_params:
    raise Exception('Different set of models in featuresets and parametersets!')
if len(preprocessed_data)>0:
    #every model with preprocessed input data must also have parameters
    models_from_preprocessed_data=set(preprocessed_data['Model'])
    if not models_from_preprocessed_data.issubset(models_from_model_params):
        raise Exception('Different set of models in input data and parametersets!')

5. Creating experiments and trials in SageMaker

In [13]:
#sys.path.append('/home/kate/Research/YearBuilt/Notebooks/Experiments')
import ExperimentsUtils as eu

In [14]:
eu.cleanup_experiment(Experiment_name)
eu.create_experiment(Experiment_name)
eu.create_trial(Experiment_name,Trial_name_preprocessing)
eu.create_trial(Experiment_name,Trial_name_training)

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials

## Data preprocessing

1. We may not need an AWS SKLearnProcessor/job/instances for small and medium datasets (unless the data are already in S3 and downloading them takes time and money).
What's important is to save the prepared datasets in a predefined S3 location to be used in training.
For really huge datasets and intensive, time-consuming preprocessing, a separate SKLearnProcessor/job with more than one powerful instance can be created for each model.

The preprocessing script below reads the input dataset and the model configurations (desired feature sets), separates the split_year data (2019 in this configuration) as a test dataset, and splits the earlier years into training and validation folds. The test year is not used in cross-validation because its data may not be fully developed yet.

Training and validation datasets are saved as CSV in the AWS SageMaker format (the first column is the target, no header). The locations and file names are based on model names: the folder name is the model name, and the file names also contain the model name.
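The file layout can be illustrated with two hypothetical pure-Python helpers: `fold_paths` builds the S3 URIs of one fold following the naming pattern used by the preprocessing script and the later cells, and `to_sagemaker_csv` serializes rows with the target first and no header. The notebook itself writes these files with pandas `to_csv(header=False, index=False)`.

```python
import csv
import io


def fold_paths(bucket, training_prefix, validation_prefix, model, fold):
    """S3 URIs of one model's training and validation files for a given fold."""
    training = 's3://%s/%s/%s/fold_%s_training_%s.csv' % (bucket, training_prefix, model, model, fold)
    validation = 's3://%s/%s/%s/fold_%s_validation_%s.csv' % (bucket, validation_prefix, model, model, fold)
    return training, validation


def to_sagemaker_csv(target, features):
    """Serialize rows as SageMaker XGBoost CSV: target in the first column, no header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    for label, row in zip(target, features):
        writer.writerow([label] + list(row))
    return buffer.getvalue()
```

Keeping the naming in one function makes it easier to keep the preprocessing output and the training inputs in sync.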

In [15]:
%%writefile preprocessingStratifiedKFold_for_all_models.py

#Training and validation datasets for SageMaker have the same structure: no header, the first column is the target, and the rest are features


import argparse
import os
import pandas as pd
import numpy as np
from sklearn.model_selection import StratifiedKFold

if __name__=='__main__':
    
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_file', type=str)
    parser.add_argument('--split_year', type=int)   
    parser.add_argument('--num_folds', type=int)      
    parser.add_argument('--target', type=str)      
    parser.add_argument('--config_file', type=str)     
    args, _ = parser.parse_known_args()    
    print('Received arguments {}'.format(args))
    
    target_column=args.target
    input_data_path = os.path.join('/opt/ml/processing/input', args.data_file)
    config_data_path = os.path.join('/opt/ml/processing/config', args.config_file)
    


    
   
    
    print('Reading input data from {}'.format(input_data_path))
    dataset = pd.read_csv(input_data_path, error_bad_lines=False, index_col=False)
    dataset_test=dataset[(dataset.cal_year == args.split_year)]
    dataset=dataset[(dataset.cal_year < args.split_year)]    
    

    print('Reading config data from {}'.format(config_data_path))
    models = pd.read_csv(config_data_path, error_bad_lines=False, index_col=False) 
    
           
    #StratifiedKFold
    kfold =args.num_folds 
    skf = StratifiedKFold(n_splits=kfold, random_state=42, shuffle=True)
    
    #iterating through the config file with models and feature sets
    for index, row in models.iterrows():
        model=row['Model']
        print (index, ': Creating datasets for model %s'%model)
        #positional slice: column 0 is Model, the next columns are features; empty cells are skipped
        featureset=row.iloc[1:51].tolist()
        featureset=[x for x in featureset if str(x) != 'nan']
        print(','.join(featureset))
        
        #creating dataset for a model according to configured dataset
        X = pd.DataFrame()
        X_test = pd.DataFrame()  
        for f in featureset:
            X[f]=dataset.eval(f)
            X_test[f]=dataset_test.eval(f)             
        y=dataset.eval(target_column)    
        y_test=dataset_test.eval(target_column) 

        #Testing data starts with y_test because the XGBoost training script reads it into a DMatrix and separates the first column anyway
        #Without this column the script cannot predict
        print('Testing data...')
        test_data_output_path = '/opt/ml/processing/output/testing_data/%s/'%model              
        if not os.path.exists(test_data_output_path):
            os.makedirs(test_data_output_path)       
        test_data_output_path = os.path.join(test_data_output_path,  'testing_%s.csv'%(model))  
        test_dataset=pd.DataFrame({target_column:y_test}).join(X_test)
        test_dataset.to_csv(test_data_output_path, header=False, index=False)
        
        #The rest of the data is used in CV as a whole and separated into training/validation folds inside CV
        for i, (train_index, test_index) in enumerate(skf.split(X, y)):
            print(' fold: {}  of  {} : '.format(i+1, kfold))
            X_train, X_valid = X.iloc[train_index], X.iloc[test_index]
            y_train, y_valid = y.iloc[train_index], y.iloc[test_index] 

            train_data_output_path = '/opt/ml/processing/output/training_data/%s/'%model              
            if not os.path.exists(train_data_output_path):
                os.makedirs(train_data_output_path)
            train_data_output_path_fold = os.path.join(train_data_output_path,  'fold_%s_training_%s.csv'%(model,i))
            
            validation_data_output_path = '/opt/ml/processing/output/validation_data/%s/'%model  
            if not os.path.exists(validation_data_output_path):
                os.makedirs(validation_data_output_path)            
            validation_data_output_path_fold = os.path.join(validation_data_output_path, 'fold_%s_validation_%s.csv'%(model,i))       
        
            training_dataset=pd.DataFrame({target_column:y_train}).join(X_train)
            training_dataset.to_csv(train_data_output_path_fold, header=False, index=False)
                                                   
            validation_dataset=pd.DataFrame({target_column:y_valid}).join(X_valid)   
            validation_dataset.to_csv(validation_data_output_path_fold, header=False, index=False)    

Overwriting preprocessingStratifiedKFold_for_all_models.py


In [16]:
if len(preprocessed_data)==0:
    data_processor = SKLearnProcessor(framework_version='0.20.0',
                                      role=sagemaker_execution_role,
                                      instance_type=preprocessing_instance_type,
                                      instance_count=preprocessing_instance_count)
    data_processor.run(code='preprocessingStratifiedKFold_for_all_models.py',
                       inputs=[ProcessingInput(input_name='data',
                                               source='s3://%s/%s/%s'%(bucket,path_to_data,data_file),
                                               destination='/opt/ml/processing/input'),
                               ProcessingInput(input_name='config',
                                               source='s3://%s/%s/%s'%(bucket,path_to_configuration,Model_Config_file),
                                               destination='/opt/ml/processing/config')],
                       outputs=[ProcessingOutput(output_name='training_data',
                                                 source='/opt/ml/processing/output/training_data',
                                                 destination='s3://%s/%s/'%(bucket,path_to_training_data)),
                                ProcessingOutput(output_name='validation_data',
                                                 source='/opt/ml/processing/output/validation_data',
                                                 destination='s3://%s/%s/'%(bucket,path_to_validation_data)),
                                ProcessingOutput(output_name='testing_data',
                                                 source='/opt/ml/processing/output/testing_data',
                                                 destination='s3://%s/%s/'%(bucket,path_to_testing_data))],
                       #container arguments are passed as strings (split_year and num_folds are configured as strings)
                       arguments=['--data_file',data_file,
                                  '--split_year',split_year,
                                  '--num_folds',num_folds,
                                  '--target',target,
                                  '--config_file',Model_Config_file],
                       experiment_config={
                           'ExperimentName': Experiment_name,
                           'TrialName': Trial_name_preprocessing,
                           'TrialComponentDisplayName': '%s-%s'%(Trial_name_preprocessing,time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime()))},
                       wait=True)
else:
    print('Data already preprocessed in S3')

Data already preprocessed in S3


In [17]:
#In case the previous step ran preprocessing (nothing provided in InputData), stop the execution if there was an issue creating the input data for the models
if len(preprocessed_data)==0:
    job_name=data_processor.jobs[-1].describe()['ProcessingJobName']
    if not(sagemaker_session.was_processing_job_successful(job_name)):
        raise Exception('Preprocessing job Failed!')    

2. The preprocessing output (training and validation datasets) is saved separately for each model and fold in a folder named after the model configured in the experiment.

In [18]:
#was not tested with different feature sets (more than 1 row in model_features)
#preprocessed_data=preprocessed_data[0:0]
if len(preprocessed_data)==0:
    j=0
    for index, row in model_features.iterrows():
        model=row['Model']
        folds_preprocessed_data = pd.DataFrame(columns=['Model','fold', 'Training_data', 'Validation_data', 'Testing_data'])
        for i in range(0,int(num_folds),1):
            train_input = 's3://%s/%s/%s/fold_%s_training_%s.csv'%(bucket,path_to_training_data,model,model,i)
            validation_input = 's3://%s/%s/%s/fold_%s_validation_%s.csv'%(bucket,path_to_validation_data,model,model,i)
            test_data = 's3://%s/%s/%s/testing_%s.csv'%(bucket,path_to_testing_data,model,model) 
            folds_preprocessed_data.loc[j]=[model, i,  train_input,validation_input,test_data]
            j=j+1
        folds_preprocessed_data = pd.merge(folds_preprocessed_data,model_features,on='Model', how='inner')
        preprocessed_data = pd.concat([preprocessed_data, folds_preprocessed_data], axis=0)
    #Saving into the Experiment log file names of created training and validation datasets in S3
    eu.SaveToExperimentLog(Experiments_file, '%s InputData'%Experiment_name, preprocessed_data)
preprocessed_data

| | Model | fold | Training_data | Validation_data | Testing_data | F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BaseModel | 0 | s3://kdproperty/Data/Experiments/FeatureSet/tr... | s3://kdproperty/Data/Experiments/FeatureSet/va... | s3://kdproperty/Data/Experiments/FeatureSet/te... | cal_year-yearbuilt | cova_deductible | sqft | customer_cnt_active_policies | usagetype_encd | replacementcostdwellingind | pipe_froze_3_blk | landlordind | water_risk_3_blk | poolind | cova_limit | plumb_leak_3_blk |
| 1 | BaseModel | 1 | s3://kdproperty/Data/Experiments/FeatureSet/tr... | s3://kdproperty/Data/Experiments/FeatureSet/va... | s3://kdproperty/Data/Experiments/FeatureSet/te... | cal_year-yearbuilt | cova_deductible | sqft | customer_cnt_active_policies | usagetype_encd | replacementcostdwellingind | pipe_froze_3_blk | landlordind | water_risk_3_blk | poolind | cova_limit | plumb_leak_3_blk |
| 2 | BaseModel | 2 | s3://kdproperty/Data/Experiments/FeatureSet/tr... | s3://kdproperty/Data/Experiments/FeatureSet/va... | s3://kdproperty/Data/Experiments/FeatureSet/te... | cal_year-yearbuilt | cova_deductible | sqft | customer_cnt_active_policies | usagetype_encd | replacementcostdwellingind | pipe_froze_3_blk | landlordind | water_risk_3_blk | poolind | cova_limit | plumb_leak_3_blk |
| 3 | BaseModel | 3 | s3://kdproperty/Data/Experiments/FeatureSet/tr... | s3://kdproperty/Data/Experiments/FeatureSet/va... | s3://kdproperty/Data/Experiments/FeatureSet/te... | cal_year-yearbuilt | cova_deductible | sqft | customer_cnt_active_policies | usagetype_encd | replacementcostdwellingind | pipe_froze_3_blk | landlordind | water_risk_3_blk | poolind | cova_limit | plumb_leak_3_blk |
| 4 | BaseModel | 4 | s3://kdproperty/Data/Experiments/FeatureSet/tr... | s3://kdproperty/Data/Experiments/FeatureSet/va... | s3://kdproperty/Data/Experiments/FeatureSet/te... | cal_year-yearbuilt | cova_deductible | sqft | customer_cnt_active_policies | usagetype_encd | replacementcostdwellingind | pipe_froze_3_blk | landlordind | water_risk_3_blk | poolind | cova_limit | plumb_leak_3_blk |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 115 | Model3 | 5 | s3://kdproperty/Data/Experiments/FeatureSet/tr... | s3://kdproperty/Data/Experiments/FeatureSet/va... | s3://kdproperty/Data/Experiments/FeatureSet/te... | cal_year-yearbuilt | cova_deductible | sqft | usagetype_encd | water_risk_3_blk | | | | | | | |
| 116 | Model3 | 6 | s3://kdproperty/Data/Experiments/FeatureSet/tr... | s3://kdproperty/Data/Experiments/FeatureSet/va... | s3://kdproperty/Data/Experiments/FeatureSet/te... | cal_year-yearbuilt | cova_deductible | sqft | usagetype_encd | water_risk_3_blk | | | | | | | |
| 117 | Model3 | 7 | s3://kdproperty/Data/Experiments/FeatureSet/tr... | s3://kdproperty/Data/Experiments/FeatureSet/va... | s3://kdproperty/Data/Experiments/FeatureSet/te... | cal_year-yearbuilt | cova_deductible | sqft | usagetype_encd | water_risk_3_blk | | | | | | | |
| 118 | Model3 | 8 | s3://kdproperty/Data/Experiments/FeatureSet/tr... | s3://kdproperty/Data/Experiments/FeatureSet/va... | s3://kdproperty/Data/Experiments/FeatureSet/te... | cal_year-yearbuilt | cova_deductible | sqft | usagetype_encd | water_risk_3_blk | | | | | | | |
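Because every model must have exactly one row per fold, a quick sanity check on preprocessed_data can catch a missing or duplicated fold before any training job is launched. This is a minimal sketch that relies only on the Model and fold columns shown above; `check_folds` is a hypothetical helper, not part of the notebook's experiment utilities, and the demo frame uses made-up rows.

```python
import pandas as pd


def check_folds(preprocessed_data, num_folds):
    """Return the models whose rows do not cover folds 0..num_folds-1 exactly once."""
    expected = list(range(num_folds))
    incomplete = []
    for model, folds in preprocessed_data.groupby("Model")["fold"]:
        # astype(int) in case fold was stored as text
        if sorted(folds.astype(int)) != expected:
            incomplete.append(model)
    return incomplete


# Made-up example: Model3 is missing fold 4.
preprocessed = pd.DataFrame({
    "Model": ["BaseModel"] * 5 + ["Model3"] * 4,
    "fold": list(range(5)) + list(range(4)),
})
print(check_folds(preprocessed, num_folds=5))  # ['Model3']
```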


## Model training

1. Custom scripts to train a model. A script is required by the open-source SageMaker XGBoost container used later in the notebook. Besides the model, the script returns additional information from training (feature importance, test dataset scores and train/validation evaluation metrics per boosting round) for custom processing. The custom output is saved in output.tar.gz.
 - ModelTraining.py uses the standard evaluation metric (AUC); no metric definitions or custom image are needed.
 - ModelTraining_Gini_named_AUC_EvalMetric.py trains XGBoost with a custom evaluation metric (normalized Gini), but the custom function inside the script and the metric name it reports are both auc,
   so neither a custom image_uri nor metric definitions are needed.
 - ModelTraining_Gini_EvalMetric.py trains XGBoost with a custom evaluation metric (normalized Gini) reported as gini. It requires a custom image_uri and metric definitions.
Only one script is needed: the one configured in entry_point above and used in the XGBoost estimator below.
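Reporting normalized Gini under the name auc is consistent because, for binary 0/1 targets and predictions without ties, the normalized Gini computed by these scripts equals 2*AUC - 1. It is a monotonic transformation of AUC, so maximizing one is essentially equivalent to maximizing the other, but the number shown as auc is on a -1..1 scale rather than 0..1. The check below reuses the scripts' Gini formula (with `np.float64`); `pairwise_auc` is a helper written only for this comparison, and sklearn's `roc_auc_score` should give the same AUC.

```python
import numpy as np


def gini(y, pred):
    # Same computation as gini()/auc() in the training scripts.
    g = np.asarray(np.c_[y, pred, np.arange(len(y))], dtype=np.float64)
    # Sort by prediction descending; ties keep their original row order.
    g = g[np.lexsort((g[:, 2], -1 * g[:, 1]))]
    gs = g[:, 0].cumsum().sum() / g[:, 0].sum()
    gs -= (len(y) + 1) / 2.0
    return gs / len(y)


def normalized_gini(y, pred):
    # The value returned by gini_xgb / auc_xgb: model Gini divided by the Gini of a perfect model.
    return gini(y, pred) / gini(y, y)


def pairwise_auc(y, pred):
    # ROC-AUC as the share of (positive, negative) pairs ranked correctly; ties count 1/2.
    y = np.asarray(y)
    pred = np.asarray(pred, dtype=np.float64)
    pos, neg = pred[y == 1], pred[y == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
pred = rng.random(500)  # continuous scores, so no ties
print(normalized_gini(y, pred), 2 * pairwise_auc(y, pred) - 1)  # the two values agree
```

With tied predictions the two can differ slightly, because the Gini formula breaks ties by row order while AUC counts a tie as one half.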

2. For each parameter set we create an estimator and train it on the training and validation datasets created in the previous step and saved in a predefined location based on the model name.

The training and validation file locations, created for each fold, are stored in the preprocessed_data dataframe.

Since the training jobs are built from the preconfigured parameters and the train/validation locations, the two configurations must be consistent (use the same model names).
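A minimal sketch of that consistency check and of the join that produces one row per feature set, parameter set and fold. It assumes the parameter configuration is a DataFrame with a Model column plus one column per hyperparameter; that layout, `build_training_table` and the demo values are hypothetical, and the notebook's actual configuration may differ. The join itself uses `pandas.merge`, as in the preprocessing cell.

```python
import pandas as pd


def build_training_table(preprocessed_data, model_params):
    """Pair every fold of every model with every parameter set configured for that model.

    Raises ValueError if the two configurations do not use the same model names.
    """
    data_models = set(preprocessed_data["Model"])
    param_models = set(model_params["Model"])
    no_data = sorted(param_models - data_models)
    no_params = sorted(data_models - param_models)
    if no_data or no_params:
        raise ValueError(
            f"Model names differ: no data for {no_data}, no parameters for {no_params}"
        )
    # One output row per (parameter set, fold) of the same model.
    return pd.merge(model_params, preprocessed_data, on="Model", how="inner")


preprocessed = pd.DataFrame({
    "Model": ["BaseModel"] * 3 + ["Model3"] * 3,
    "fold": [0, 1, 2] * 2,
})
params = pd.DataFrame({
    "Model": ["BaseModel", "BaseModel", "Model3"],
    "max_depth": [3, 6, 4],
    "eta": [0.1, 0.05, 0.1],
})
data_for_training = build_training_table(preprocessed, params)
print(len(data_for_training))  # 2 parameter sets * 3 folds + 1 * 3 = 9 training jobs
```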

Only a configured number of models (MaxNumOfRunningModels) run at the same time. The process starts the first MaxNumOfRunningModels training jobs and waits until one of them is Completed, Failed or Stopped (Failed or Stopped only when StopOnFailedModel=False) before starting the next one.
If a training job fails or is stopped and StopOnFailedModel is True, the whole process is aborted.
Since the training and validation data are created for each fold, the resulting table (data_for_training) contains data for each configured feature set, parameter set and fold.
The total number of training jobs is the number of feature sets × the number of folds × the number of parameter sets, which can be very large.
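The throttling described above can be sketched as a polling loop. `run_throttled`, `start_job` and `get_status` are hypothetical names: in the notebook they would correspond to starting an estimator without waiting and reading the job status (for example the TrainingJobStatus field returned by boto3's `describe_training_job`, whose values are InProgress, Completed, Failed, Stopping and Stopped). The dry run uses fake hooks so the scheduling logic can be checked without AWS, and it also shows how quickly the job count grows.

```python
import itertools
from collections import deque

# Statuses that mean a training job has finished.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def run_throttled(jobs, start_job, get_status, max_running,
                  stop_on_failed=True, wait=lambda: None):
    """Run jobs with at most max_running in progress at the same time.

    start_job(job) launches a job; get_status(job) returns its status string.
    Returns a dict mapping each job to its final status.
    """
    pending = deque(jobs)
    running = []
    final = {}
    while pending or running:
        # Fill every free slot before polling.
        while pending and len(running) < max_running:
            job = pending.popleft()
            start_job(job)
            running.append(job)
        wait()  # e.g. time.sleep(60) between polls in a real run
        still_running = []
        for job in running:
            status = get_status(job)
            if status not in TERMINAL_STATUSES:  # InProgress or Stopping
                still_running.append(job)
                continue
            final[job] = status
            if status != "Completed" and stop_on_failed:
                raise RuntimeError(f"Job {job} finished with status {status}")
        running = still_running
    return final


# Hypothetical experiment: every feature set x parameter set x fold is one training job.
feature_sets = ["BaseModel", "Model3"]
param_sets = ["params_a", "params_b", "params_c"]
num_folds = 5
jobs = list(itertools.product(feature_sets, param_sets, range(num_folds)))
print(len(jobs))  # 2 * 3 * 5 = 30 training jobs

# Fake hooks for a dry run: each job reports Completed on its second status check.
checks, in_flight, peak = {}, set(), [0]


def fake_start(job):
    checks[job] = 0
    in_flight.add(job)
    peak[0] = max(peak[0], len(in_flight))


def fake_status(job):
    checks[job] += 1
    if checks[job] >= 2:
        in_flight.discard(job)
        return "Completed"
    return "InProgress"


result = run_throttled(jobs, fake_start, fake_status, max_running=4)
print(len(result), peak[0])  # 30 4
```

With stop_on_failed=False a Failed or Stopped job is only recorded in the returned dictionary, which mirrors StopOnFailedModel=False.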

In [19]:
%%writefile ModelTraining.py
# ModelTraining.py uses the standard XGBoost evaluation metric (AUC); no metric definitions or custom image are needed


#  Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
#  Licensed under the Apache License, Version 2.0 (the "License").
#  You may not use this file except in compliance with the License.
#  A copy of the License is located at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
#  or in the "license" file accompanying this file. This file is distributed
#  on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
#  express or implied. See the License for the specific language governing
#  permissions and limitations under the License.
from __future__ import print_function

import argparse
import json
import logging
import os
import pandas as pd
import pickle as pkl

from sagemaker_containers import entry_point
from sagemaker_xgboost_container.data_utils import get_dmatrix
from sagemaker_xgboost_container import distributed

import xgboost as xgb

import numpy as np

from sklearn.metrics import roc_auc_score

def _xgb_train(params, dtrain, evals, num_boost_round, early_stopping_rounds, model_dir, output_data_dir, GetFIFlg,GetTestScoreFlg,GetTestPredFlg, is_master):
    """Run xgb.train() with the given arguments and save the model and custom outputs.

    This is the rabit execution function: in distributed training it is called
    by distributed.rabit_run, which adds the is_master argument.

    :param is_master: True if the current node is the master host in distributed
                      training, or if this is a single-node training job.
    The test DMatrix (dtest) is read from the module-level variable set in __main__.
    """
    progress = dict()
    booster = xgb.train(params=params,
                        dtrain=dtrain,
                        evals=evals,
                        maximize=True,
                        num_boost_round=num_boost_round,
                        early_stopping_rounds=early_stopping_rounds,
                        evals_result=progress,
                        verbose_eval=100)
    
    print('Eval results')    
    train_error=progress['train']['auc']
    eval_error=progress['validation']['auc']
    results_pd=pd.DataFrame({'train':train_error,'valid':eval_error},columns=['train','valid'])
    
    
    #feature importance
    if GetFIFlg=='Y':
        fi_weight =booster.get_score(importance_type='weight')
        fi_gain = booster.get_score(importance_type='gain')
        fi_cover= booster.get_score(importance_type='cover')
        fi_weight_pd = pd.DataFrame(fi_weight.items(),columns=['feature','weight'])
        fi_gain_pd = pd.DataFrame(fi_gain.items(),columns=['feature','gain'])
        fi_cover_pd = pd.DataFrame(fi_cover.items(),columns=['feature','cover'])
        fi_pd=pd.merge(fi_gain_pd, fi_weight_pd, on='feature', how='inner')
        fi_pd=pd.merge(fi_pd, fi_cover_pd, on='feature', how='inner')

    #Prediction on test data ...
    if 'Y' in (GetTestScoreFlg,GetTestPredFlg):
        df_prediction=pd.DataFrame()
        df_prediction['actual']=dtest.get_label()
        df_prediction['pred']=booster.predict(dtest)
   
        #Test scores from test prediction   
        df_score = pd.DataFrame()
        df_score['roc-auc-test']=[roc_auc_score(df_prediction['actual'], df_prediction['pred'])]
    
    if is_master:
        model_location = os.path.join(model_dir, 'xgboost-model')
        with open(model_location, 'wb') as f:
            pkl.dump(booster, f)
        logging.info("Stored trained model at {}".format(model_location))
        
        if not os.path.exists(output_data_dir):
            os.makedirs(output_data_dir)

        result_location = os.path.join(output_data_dir, 'eval_results.csv')
        print('Saving eval results at {}'.format(result_location))
        logging.info('Saving eval results at {}'.format(result_location))
        results_pd.to_csv(result_location, header=True, index=False)
        
        if GetFIFlg=='Y':
            fi_location = os.path.join(output_data_dir, 'fi.csv')
            print('Saving feature importance at {}'.format(fi_location))
            logging.info('Saving feature importance at {}'.format(fi_location))
            fi_pd.to_csv(fi_location, header=True, index=False)
        
        if GetTestPredFlg=='Y':
            predictions_location = os.path.join(output_data_dir, 'test_predictions.csv')
            print('Saving test predictions at {}'.format(predictions_location))
            logging.info('Saving test predictions at {}'.format(predictions_location))
            df_prediction.to_csv(predictions_location, header=True, index=False)
            
        if GetTestScoreFlg=='Y':        
            test_score_location = os.path.join(output_data_dir, 'test_score.csv')
            print('Saving test score  at {}'.format(test_score_location))
            logging.info('Saving test score  at {}'.format(test_score_location))        
            df_score.to_csv(test_score_location, header=True, index=False)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    # Hyperparameters are described here.
    parser.add_argument('--max_depth', type=int,)
    parser.add_argument('--eta', type=float)
    parser.add_argument('--objective', type=str)
    parser.add_argument('--num_round', type=int)

    parser.add_argument('--early_stopping_rounds', type=int)
    parser.add_argument('--booster', type=str)
    parser.add_argument('--eval_metric', type=str)
    parser.add_argument('--seed', type=int, default=42)
    parser.add_argument('--scale_pos_weight', type=float)
    parser.add_argument('--colsample_bylevel', type=float)
    parser.add_argument('--colsample_bytree', type=float)
    parser.add_argument('--subsample', type=float)
    parser.add_argument('--max_delta_step', type=int)
    
    
    parser.add_argument('--GetFIFlg', type=str, default='N')
    parser.add_argument('--GetTestScoreFlg', type=str, default='N')
    parser.add_argument('--GetTestPredFlg', type=str, default='N')            
            

    # Sagemaker specific arguments. Defaults are set in the environment variables.
    
    parser.add_argument('--output_data_dir', type=str, default=os.environ.get('SM_OUTPUT_DATA_DIR'))
    parser.add_argument('--model_dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
    parser.add_argument('--validation', type=str, default=os.environ.get('SM_CHANNEL_VALIDATION'))
    parser.add_argument('--test', type=str, default=os.environ.get('SM_CHANNEL_TEST'))
    parser.add_argument('--sm_hosts', type=str, default=os.environ.get('SM_HOSTS'))
    parser.add_argument('--sm_current_host', type=str, default=os.environ.get('SM_CURRENT_HOST'))

    args, _ = parser.parse_known_args()

    # Get SageMaker host information from runtime environment variables
    sm_hosts = json.loads(args.sm_hosts)
    sm_current_host = args.sm_current_host

    dtrain = get_dmatrix(args.train, 'csv')
    dval = get_dmatrix(args.validation, 'csv')
    watchlist = [(dtrain, 'train'), (dval, 'validation')] if dval is not None else [(dtrain, 'train')]

      
    dtest = get_dmatrix(args.test, 'csv')
    if not(dtest):
        if ((args.GetTestScoreFlg=='Y') | (args.GetTestPredFlg=='Y')):
            raise Exception('Please provide test data in a test channel for prediction and scores or set GetTestScoreFlg and GetTestPredFlg to N')
            
    train_hp = {
        'max_depth': args.max_depth,
        'eta': args.eta,
        'objective': args.objective,
        'booster': args.booster,
        'seed': args.seed,
        'eval_metric':args.eval_metric,
        'scale_pos_weight':args.scale_pos_weight,
        'colsample_bylevel': args.colsample_bylevel,
        'colsample_bytree': args.colsample_bytree,
        'subsample': args.subsample,
        'max_delta_step':args.max_delta_step
        }

    xgb_train_args = dict(
        params=train_hp,
        dtrain=dtrain,
        evals=watchlist,
        num_boost_round=args.num_round,
        early_stopping_rounds=args.early_stopping_rounds,
        model_dir=args.model_dir,
        output_data_dir=args.output_data_dir,
        GetFIFlg=args.GetFIFlg,
        GetTestScoreFlg=args.GetTestScoreFlg,
        GetTestPredFlg=args.GetTestPredFlg)

    if len(sm_hosts) > 1:
        # Wait until all hosts are able to find each other
        entry_point._wait_hostname_resolution()

        # Execute training function after initializing rabit.
        distributed.rabit_run(
            exec_fun=_xgb_train,
            args=xgb_train_args,
            include_in_training=(dtrain is not None),
            hosts=sm_hosts,
            current_host=sm_current_host,
            update_rabit_args=True
        )
    else:
        # If single node training, call training method directly.
        if dtrain:
            xgb_train_args['is_master'] = True
            _xgb_train(**xgb_train_args)
        else:
            raise ValueError("Training channel must have data to train model.")

# model_fn is used by the SageMaker XGBoost serving container to load the trained model
# at inference time; it is not called during training (single- or multi-node).
def model_fn(model_dir):
    """Deserialize and return fitted model.

    Note that the file name must match the one used to serialize the model in _xgb_train.
    """
    model_file = 'xgboost-model'
    with open(os.path.join(model_dir, model_file), 'rb') as f:
        booster = pkl.load(f)
    return booster

Overwriting ModelTraining.py
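SageMaker packs everything the script writes to output_data_dir (SM_OUTPUT_DATA_DIR) into output.tar.gz in the training job's S3 output location. Once that archive is downloaded, the custom CSV files can be loaded as sketched below. `read_custom_output` is a hypothetical helper, and the demo builds a small archive locally with made-up values instead of downloading a real one.

```python
import io
import os
import tarfile
import tempfile

import pandas as pd


def read_custom_output(tar_path):
    """Load every CSV file from a downloaded output.tar.gz into a DataFrame.

    Returns a dict keyed by file name, e.g. 'eval_results.csv', 'fi.csv',
    'test_score.csv', 'test_predictions.csv'. Files that were not produced
    (because the corresponding flag was 'N') are simply absent.
    """
    frames = {}
    with tarfile.open(tar_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".csv"):
                # Keep only the base name, whatever directory prefix the archive uses.
                name = os.path.basename(member.name)
                frames[name] = pd.read_csv(tar.extractfile(member))
    return frames


def _add_csv(tar, name, df):
    # Demo helper only: write a DataFrame into the archive as a CSV member.
    data = df.to_csv(index=False).encode("utf-8")
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


# Demo with a locally built archive and made-up values.
tar_path = os.path.join(tempfile.mkdtemp(), "output.tar.gz")
with tarfile.open(tar_path, "w:gz") as tar:
    _add_csv(tar, "eval_results.csv", pd.DataFrame({"train": [0.70, 0.75], "valid": [0.68, 0.69]}))
    _add_csv(tar, "fi.csv", pd.DataFrame({"feature": ["f0", "f1"], "gain": [5.0, 2.0],
                                          "weight": [10, 4], "cover": [30.0, 12.0]}))

outputs = read_custom_output(tar_path)
print(sorted(outputs))  # ['eval_results.csv', 'fi.csv']
```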


In [20]:
%%writefile ModelTraining_Gini_named_AUC_EvalMetric.py
# ModelTraining_Gini_named_AUC_EvalMetric.py trains XGBoost with a custom evaluation metric (normalized Gini) that is reported under the name auc,
# so neither a custom image_uri nor metric definitions are needed.


#  Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
#  Licensed under the Apache License, Version 2.0 (the "License").
#  You may not use this file except in compliance with the License.
#  A copy of the License is located at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
#  or in the "license" file accompanying this file. This file is distributed
#  on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
#  express or implied. See the License for the specific language governing
#  permissions and limitations under the License.
from __future__ import print_function

import argparse
import json
import logging
import os
import pandas as pd
import pickle as pkl

from sagemaker_containers import entry_point
from sagemaker_xgboost_container.data_utils import get_dmatrix
from sagemaker_xgboost_container import distributed

import xgboost as xgb

import numpy as np

from sklearn.metrics import roc_auc_score

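def auc(y, pred):
    # Gini coefficient of pred (not normalized); named auc so the reported metric matches the built-in name.
    # np.float64 replaces the np.float alias, which was removed in NumPy 1.24.
    g = np.asarray(np.c_[y, pred, np.arange(len(y)) ], dtype=np.float64)
    # Sort by prediction descending; ties keep their original row order.
    g = g[np.lexsort((g[:,2], -1*g[:,1]))]
    gs = g[:,0].cumsum().sum() / g[:,0].sum()
    gs -= (len(y) + 1) / 2.
    return gs / len(y)

def auc_xgb(pred, y):
    # Custom evaluation metric for xgb.train(feval=...): returns (metric name, normalized Gini).
    y = y.get_label()
    return 'auc', auc(y, pred) / auc(y, y)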

def _xgb_train(params, dtrain, evals, num_boost_round, early_stopping_rounds, model_dir, output_data_dir, GetFIFlg,GetTestScoreFlg,GetTestPredFlg, is_master):
    """Run xgb.train() with the given arguments and save the model and custom outputs.

    This is the rabit execution function: in distributed training it is called
    by distributed.rabit_run, which adds the is_master argument.

    :param is_master: True if the current node is the master host in distributed
                      training, or if this is a single-node training job.
    The test DMatrix (dtest) is read from the module-level variable set in __main__.
    """
    progress = dict()
    booster = xgb.train(params=params,
                        dtrain=dtrain,
                        evals=evals,
                        feval=auc_xgb,
                        maximize=True,
                        num_boost_round=num_boost_round,
                        early_stopping_rounds=early_stopping_rounds,
                        evals_result=progress,
                        verbose_eval=100)
    
    print('Eval results')    
    train_error=progress['train']['auc']
    eval_error=progress['validation']['auc']
    results_pd=pd.DataFrame({'train':train_error,'valid':eval_error},columns=['train','valid'])
    
    
    #feature importance
    if GetFIFlg=='Y':
        fi_weight =booster.get_score(importance_type='weight')
        fi_gain = booster.get_score(importance_type='gain')
        fi_cover= booster.get_score(importance_type='cover')
        fi_weight_pd = pd.DataFrame(fi_weight.items(),columns=['feature','weight'])
        fi_gain_pd = pd.DataFrame(fi_gain.items(),columns=['feature','gain'])
        fi_cover_pd = pd.DataFrame(fi_cover.items(),columns=['feature','cover'])
        fi_pd=pd.merge(fi_gain_pd, fi_weight_pd, on='feature', how='inner')
        fi_pd=pd.merge(fi_pd, fi_cover_pd, on='feature', how='inner')

    #Prediction on test data ...
    if 'Y' in (GetTestScoreFlg,GetTestPredFlg):
        df_prediction=pd.DataFrame()
        df_prediction['actual']=dtest.get_label()
        df_prediction['pred']=booster.predict(dtest)
   
        # Test score from the test predictions. This is a custom output, so it does not need to be named auc
        df_score = pd.DataFrame()
        df_score['gini-test']=[auc(df_prediction['actual'], df_prediction['pred'])/auc(df_prediction['actual'], df_prediction['actual'])]
    
    if is_master:
        model_location = os.path.join(model_dir, 'xgboost-model')
        with open(model_location, 'wb') as f:
            pkl.dump(booster, f)
        logging.info("Stored trained model at {}".format(model_location))
        
        if not os.path.exists(output_data_dir):
            os.makedirs(output_data_dir)

        result_location = os.path.join(output_data_dir, 'eval_results.csv')
        print('Saving eval results at {}'.format(result_location))
        logging.info('Saving eval results at {}'.format(result_location))
        results_pd.to_csv(result_location, header=True, index=False)
        
        if GetFIFlg=='Y':
            fi_location = os.path.join(output_data_dir, 'fi.csv')
            print('Saving feature importance at {}'.format(fi_location))
            logging.info('Saving feature importance at {}'.format(fi_location))
            fi_pd.to_csv(fi_location, header=True, index=False)
        
        if GetTestPredFlg=='Y':
            predictions_location = os.path.join(output_data_dir, 'test_predictions.csv')
            print('Saving test predictions at {}'.format(predictions_location))
            logging.info('Saving test predictions at {}'.format(predictions_location))
            df_prediction.to_csv(predictions_location, header=True, index=False)
            
        if GetTestScoreFlg=='Y':        
            test_score_location = os.path.join(output_data_dir, 'test_score.csv')
            print('Saving test score  at {}'.format(test_score_location))
            logging.info('Saving test score  at {}'.format(test_score_location))        
            df_score.to_csv(test_score_location, header=True, index=False)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    # Hyperparameters are described here.
    parser.add_argument('--max_depth', type=int,)
    parser.add_argument('--eta', type=float)
    parser.add_argument('--objective', type=str)
    parser.add_argument('--num_round', type=int)

    parser.add_argument('--early_stopping_rounds', type=int)
    parser.add_argument('--booster', type=str)
    #parser.add_argument('--eval_metric', type=str)
    parser.add_argument('--seed', type=int, default=42)
    parser.add_argument('--scale_pos_weight', type=float)
    parser.add_argument('--colsample_bylevel', type=float)
    parser.add_argument('--colsample_bytree', type=float)
    parser.add_argument('--subsample', type=float)
    parser.add_argument('--max_delta_step', type=int)
    
    
    parser.add_argument('--GetFIFlg', type=str, default='N')
    parser.add_argument('--GetTestScoreFlg', type=str, default='N')
    parser.add_argument('--GetTestPredFlg', type=str, default='N')            
            

    # Sagemaker specific arguments. Defaults are set in the environment variables.
    
    parser.add_argument('--output_data_dir', type=str, default=os.environ.get('SM_OUTPUT_DATA_DIR'))
    parser.add_argument('--model_dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
    parser.add_argument('--validation', type=str, default=os.environ.get('SM_CHANNEL_VALIDATION'))
    parser.add_argument('--test', type=str, default=os.environ.get('SM_CHANNEL_TEST'))
    parser.add_argument('--sm_hosts', type=str, default=os.environ.get('SM_HOSTS'))
    parser.add_argument('--sm_current_host', type=str, default=os.environ.get('SM_CURRENT_HOST'))

    args, _ = parser.parse_known_args()

    # Get SageMaker host information from runtime environment variables
    sm_hosts = json.loads(args.sm_hosts)
    sm_current_host = args.sm_current_host

    dtrain = get_dmatrix(args.train, 'csv')
    dval = get_dmatrix(args.validation, 'csv')
    watchlist = [(dtrain, 'train'), (dval, 'validation')] if dval is not None else [(dtrain, 'train')]

      
    dtest = get_dmatrix(args.test, 'csv')
    if not(dtest):
        if ((args.GetTestScoreFlg=='Y') | (args.GetTestPredFlg=='Y')):
            raise Exception('Please provide test data in a test channel for prediction and scores or set GetTestScoreFlg and GetTestPredFlg to N')
            
    train_hp = {
        'max_depth': args.max_depth,
        'eta': args.eta,
        'objective': args.objective,
        'booster': args.booster,
        'seed': args.seed,
        #'eval_metric':args.eval_metric,
        'disable_default_eval_metric': '1',
        'scale_pos_weight':args.scale_pos_weight,
        'colsample_bylevel': args.colsample_bylevel,
        'colsample_bytree': args.colsample_bytree,
        'subsample': args.subsample,
        'max_delta_step':args.max_delta_step
        }

    xgb_train_args = dict(
        params=train_hp,
        dtrain=dtrain,
        evals=watchlist,
        num_boost_round=args.num_round,
        early_stopping_rounds=args.early_stopping_rounds,
        model_dir=args.model_dir,
        output_data_dir=args.output_data_dir,
        GetFIFlg=args.GetFIFlg,
        GetTestScoreFlg=args.GetTestScoreFlg,
        GetTestPredFlg=args.GetTestPredFlg)

    if len(sm_hosts) > 1:
        # Wait until all hosts are able to find each other
        entry_point._wait_hostname_resolution()

        # Execute training function after initializing rabit.
        distributed.rabit_run(
            exec_fun=_xgb_train,
            args=xgb_train_args,
            include_in_training=(dtrain is not None),
            hosts=sm_hosts,
            current_host=sm_current_host,
            update_rabit_args=True
        )
    else:
        # If single node training, call training method directly.
        if dtrain:
            xgb_train_args['is_master'] = True
            _xgb_train(**xgb_train_args)
        else:
            raise ValueError("Training channel must have data to train model.")

# model_fn is used by the SageMaker XGBoost serving container to load the trained model
# at inference time; it is not called during training (single- or multi-node).
def model_fn(model_dir):
    """Deserialize and return fitted model.

    Note that the file name must match the one used to serialize the model in _xgb_train.
    """
    model_file = 'xgboost-model'
    with open(os.path.join(model_dir, model_file), 'rb') as f:
        booster = pkl.load(f)
    return booster

Overwriting ModelTraining_Gini_named_AUC_EvalMetric.py
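eval_results.csv holds one row per boosting round with the train and valid metric (AUC, or normalized Gini for the Gini scripts; higher is better). A sketch of how the folds of one model could be summarized: take the round with the best validation score in each fold, measure the train-minus-validation gap there as an overfitting indicator, then average across folds. The function names and the toy curves are illustrative only, not the notebook's post-processing code.

```python
import pandas as pd


def summarize_fold(eval_results):
    """Best validation score of one fold and the train score at the same boosting round.

    eval_results has the 'train' and 'valid' columns written to eval_results.csv;
    higher values are better (AUC or normalized Gini).
    """
    best_round = int(eval_results["valid"].idxmax())
    train = float(eval_results.loc[best_round, "train"])
    valid = float(eval_results.loc[best_round, "valid"])
    # A large train - valid gap at the best round suggests overfitting.
    return {"best_round": best_round, "train": train, "valid": valid, "gap": train - valid}


def summarize_cv(fold_results):
    """fold_results maps (model, fold) -> eval_results DataFrame.

    Returns per-fold rows and the per-model mean/std of the validation score and gap.
    """
    rows = [{"Model": model, "fold": fold, **summarize_fold(df)}
            for (model, fold), df in fold_results.items()]
    per_fold = pd.DataFrame(rows)
    per_model = per_fold.groupby("Model")[["valid", "gap"]].agg(["mean", "std"])
    return per_fold, per_model


# Toy curves with made-up values for two folds of one model.
fold_results = {
    ("BaseModel", 0): pd.DataFrame({"train": [0.70, 0.80, 0.90], "valid": [0.68, 0.72, 0.71]}),
    ("BaseModel", 1): pd.DataFrame({"train": [0.71, 0.79, 0.88], "valid": [0.69, 0.70, 0.74]}),
}
per_fold, per_model = summarize_cv(fold_results)
print(per_fold)
print(per_model)
```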


In [21]:
%%writefile ModelTraining_Gini_EvalMetric.py
# ModelTraining_Gini_EvalMetric.py trains XGBoost with a custom evaluation metric (normalized Gini) reported as gini; it requires a custom image_uri and metric definitions.

#  Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
#  Licensed under the Apache License, Version 2.0 (the "License").
#  You may not use this file except in compliance with the License.
#  A copy of the License is located at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
#  or in the "license" file accompanying this file. This file is distributed
#  on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
#  express or implied. See the License for the specific language governing
#  permissions and limitations under the License.
from __future__ import print_function

import argparse
import json
import logging
import os
import pandas as pd
import pickle as pkl

from sagemaker_containers import entry_point
from sagemaker_xgboost_container.data_utils import get_dmatrix
from sagemaker_xgboost_container import distributed

import xgboost as xgb

import numpy as np

from sklearn.metrics import roc_auc_score

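def gini(y, pred):
    # Gini coefficient of pred (not normalized); gini_xgb divides it by the Gini of a perfect model.
    # np.float64 replaces the np.float alias, which was removed in NumPy 1.24.
    g = np.asarray(np.c_[y, pred, np.arange(len(y)) ], dtype=np.float64)
    # Sort by prediction descending; ties keep their original row order.
    g = g[np.lexsort((g[:,2], -1*g[:,1]))]
    gs = g[:,0].cumsum().sum() / g[:,0].sum()
    gs -= (len(y) + 1) / 2.
    return gs / len(y)

def gini_xgb(pred, y):
    # Custom evaluation metric for xgb.train(feval=...): returns (metric name, normalized Gini).
    y = y.get_label()
    return 'gini', gini(y, pred) / gini(y, y)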

def _xgb_train(params, dtrain, evals, num_boost_round, early_stopping_rounds, model_dir, output_data_dir, GetFIFlg,GetTestScoreFlg,GetTestPredFlg, is_master):
    """Run xgb.train() with the given arguments and save the model and custom outputs.

    This is the rabit execution function: in distributed training it is called
    by distributed.rabit_run, which adds the is_master argument.

    :param is_master: True if the current node is the master host in distributed
                      training, or if this is a single-node training job.
    The test DMatrix (dtest) is read from the module-level variable set in __main__.
    """
    progress = dict()
    booster = xgb.train(params=params,
                        dtrain=dtrain,
                        evals=evals,
                        feval=gini_xgb,
                        maximize=True,
                        num_boost_round=num_boost_round,
                        early_stopping_rounds=early_stopping_rounds,
                        evals_result=progress,
                        verbose_eval=100)
    
    print('Eval results')    
    train_error=progress['train']['gini']
    eval_error=progress['validation']['gini']
    results_pd=pd.DataFrame({'train':train_error,'valid':eval_error},columns=['train','valid'])
    
    
    #feature importance
    if GetFIFlg=='Y':
        fi_weight =booster.get_score(importance_type='weight')
        fi_gain = booster.get_score(importance_type='gain')
        fi_cover= booster.get_score(importance_type='cover')
        fi_weight_pd = pd.DataFrame(fi_weight.items(),columns=['feature','weight'])
        fi_gain_pd = pd.DataFrame(fi_gain.items(),columns=['feature','gain'])
        fi_cover_pd = pd.DataFrame(fi_cover.items(),columns=['feature','cover'])
        fi_pd=pd.merge(fi_gain_pd, fi_weight_pd, on='feature', how='inner')
        fi_pd=pd.merge(fi_pd, fi_cover_pd, on='feature', how='inner')

    #Prediction on test data ...
    if 'Y' in (GetTestScoreFlg,GetTestPredFlg):
        df_prediction=pd.DataFrame()
        df_prediction['actual']=dtest.get_label()
        df_prediction['pred']=booster.predict(dtest)
   
        #Test scores from test prediction   
        df_score = pd.DataFrame()
        df_score['gini-test']=[gini(df_prediction['actual'], df_prediction['pred'])/gini(df_prediction['actual'],df_prediction['actual'])]
        
    
    if is_master:
        model_location = os.path.join(model_dir, 'xgboost-model')
        with open(model_location, 'wb') as f:
            pkl.dump(booster, f)
        logging.info("Stored trained model at {}".format(model_location))
        
        if not os.path.exists(output_data_dir):
            os.makedirs(output_data_dir)

        result_location = os.path.join(output_data_dir, 'eval_results.csv')
        print('Saving eval results at {}'.format(result_location))
        logging.info('Saving eval results at {}'.format(result_location))
        results_pd.to_csv(result_location, header=True, index=False)
        
        if GetFIFlg=='Y':
            fi_location = os.path.join(output_data_dir, 'fi.csv')
            print('Saving feature importance at {}'.format(fi_location))
            logging.info('Saving feature importance at {}'.format(fi_location))
            fi_pd.to_csv(fi_location, header=True, index=False)
        
        if GetTestPredFlg=='Y':
            predictions_location = os.path.join(output_data_dir, 'test_predictions.csv')
            print('Saving test predictions at {}'.format(predictions_location))
            logging.info('Saving test predictions at {}'.format(predictions_location))
            df_prediction.to_csv(predictions_location, header=True, index=False)
            
        if GetTestScoreFlg=='Y':        
            test_score_location = os.path.join(output_data_dir, 'test_score.csv')
            print('Saving test score  at {}'.format(test_score_location))
            logging.info('Saving test score  at {}'.format(test_score_location))        
            df_score.to_csv(test_score_location, header=True, index=False)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    # Hyperparameters are described here.
    parser.add_argument('--max_depth', type=int,)
    parser.add_argument('--eta', type=float)
    parser.add_argument('--objective', type=str)
    parser.add_argument('--num_round', type=int)

    parser.add_argument('--early_stopping_rounds', type=int)
    parser.add_argument('--booster', type=str)
    #parser.add_argument('--eval_metric', type=str)
    parser.add_argument('--seed', type=int, default=42)
    parser.add_argument('--scale_pos_weight', type=float)
    parser.add_argument('--colsample_bylevel', type=float)
    parser.add_argument('--colsample_bytree', type=float)
    parser.add_argument('--subsample', type=float)
    parser.add_argument('--max_delta_step', type=int)
    
    
    parser.add_argument('--GetFIFlg', type=str, default='N')
    parser.add_argument('--GetTestScoreFlg', type=str, default='N')
    parser.add_argument('--GetTestPredFlg', type=str, default='N')            
            

    # Sagemaker specific arguments. Defaults are set in the environment variables.
    
    parser.add_argument('--output_data_dir', type=str, default=os.environ.get('SM_OUTPUT_DATA_DIR'))
    parser.add_argument('--model_dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
    parser.add_argument('--validation', type=str, default=os.environ.get('SM_CHANNEL_VALIDATION'))
    parser.add_argument('--test', type=str, default=os.environ.get('SM_CHANNEL_TEST'))
    parser.add_argument('--sm_hosts', type=str, default=os.environ.get('SM_HOSTS'))
    parser.add_argument('--sm_current_host', type=str, default=os.environ.get('SM_CURRENT_HOST'))

    args, _ = parser.parse_known_args()

    # Get SageMaker host information from runtime environment variables
    sm_hosts = json.loads(args.sm_hosts)
    sm_current_host = args.sm_current_host

    # A channel that was not provided has no SM_CHANNEL_* environment variable, so its path is None
    dtrain = get_dmatrix(args.train, 'csv')
    dval = get_dmatrix(args.validation, 'csv') if args.validation else None
    watchlist = [(dtrain, 'train'), (dval, 'validation')] if dval is not None else [(dtrain, 'train')]

    dtest = get_dmatrix(args.test, 'csv') if args.test else None
    if dtest is None:
        if args.GetTestScoreFlg == 'Y' or args.GetTestPredFlg == 'Y':
            raise Exception('Please provide test data in a test channel for prediction and scores or set GetTestScoreFlg and GetTestPredFlg to N')

    train_hp = {
        'max_depth': args.max_depth,
        'eta': args.eta,
        'objective': args.objective,
        'booster': args.booster,
        'seed': args.seed,
        #'eval_metric':args.eval_metric,
        'disable_default_eval_metric': '1',
        'scale_pos_weight':args.scale_pos_weight,
        'colsample_bylevel': args.colsample_bylevel,
        'colsample_bytree': args.colsample_bytree,
        'subsample': args.subsample,
        'max_delta_step':args.max_delta_step
        }

    xgb_train_args = dict(
        params=train_hp,
        dtrain=dtrain,
        evals=watchlist,
        num_boost_round=args.num_round,
        early_stopping_rounds=args.early_stopping_rounds,
        model_dir=args.model_dir,
        output_data_dir=args.output_data_dir,
        GetFIFlg=args.GetFIFlg,
        GetTestScoreFlg=args.GetTestScoreFlg,
        GetTestPredFlg=args.GetTestPredFlg)

    if len(sm_hosts) > 1:
        # Wait until all hosts are able to find each other
        entry_point._wait_hostname_resolution()

        # Execute training function after initializing rabit.
        distributed.rabit_run(
            exec_fun=_xgb_train,
            args=xgb_train_args,
            include_in_training=(dtrain is not None),
            hosts=sm_hosts,
            current_host=sm_current_host,
            update_rabit_args=True
        )
    else:
        # If single node training, call training method directly.
        if dtrain:
            xgb_train_args['is_master'] = True
            _xgb_train(**xgb_train_args)
        else:
            raise ValueError("Training channel must have data to train model.")

# model_fn is not used during training (single- or multi-node): the SageMaker XGBoost serving
# container calls it to load the trained model when the model is deployed for inference.
def model_fn(model_dir):
    """Deserialize and return fitted model.

    Note that this should have the same name as the serialized model in the _xgb_train method
    """
    model_file = 'xgboost-model'
    with open(os.path.join(model_dir, model_file), 'rb') as f:
        booster = pkl.load(f)
    return booster

Overwriting ModelTraining_Gini_EvalMetric.py
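SageMaker script mode hands the estimator's hyperparameters to the entry point as command-line arguments (--name value). Because the script reads them with parse_known_args, keys it does not declare, such as nfold, fold and eval_metric in the hyperparameter dicts printed further down, are returned as leftovers instead of raising an error. In particular, eval_metric from the parameter sets is not used by this script, which sets disable_default_eval_metric. The minimal sketch below reproduces that behavior with a subset of the script's arguments and a hypothetical argument list.

```python
import argparse

# A subset of the arguments declared in ModelTraining_Gini_EvalMetric.py
parser = argparse.ArgumentParser()
parser.add_argument('--max_depth', type=int)
parser.add_argument('--eta', type=float)
parser.add_argument('--num_round', type=int)
parser.add_argument('--GetFIFlg', type=str, default='N')

# Hypothetical command line, shaped like the hyperparameters sent by the training loop
argv = ['--max_depth', '6', '--eta', '0.04', '--num_round', '5000',
        '--eval_metric', 'auc', '--nfold', '10', '--fold', '0']

# parse_known_args returns (namespace, leftover_args) instead of failing on undeclared flags
args, unknown = parser.parse_known_args(argv)
print(args.max_depth, args.eta, args.num_round, args.GetFIFlg)  # 6 0.04 5000 N
print(unknown)  # the undeclared flags and their values
```

With parse_args instead of parse_known_args, the same command line would stop the training job with an "unrecognized arguments" error.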


In [22]:
models_from_preprocessed_data = preprocessed_data['Model'].tolist()
models_from_model_params = model_params['Model'].tolist()
# Every model in the preprocessed data must have a parameter set in model_params
if set(models_from_preprocessed_data) - set(models_from_model_params):
    raise Exception('Different set of models in preprocessed_data and model_params!')
# merge because, in general, each dataframe can have a different number of rows: folds in the data and different parameter sets
data_for_training = pd.merge(model_params, preprocessed_data, on='Model', how='inner')
data_for_training['Model'] = data_for_training['Model'] + '-' + data_for_training['fold'].astype(str)
data_for_training

INFO:numexpr.utils:NumExpr defaulting to 8 threads.


| | Model | objective | eval_metric | booster | scale_pos_weight | colsample_bylevel | colsample_bytree | eta | subsample | max_depth | ... | F3 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BaseModel-0 | binary:logistic | auc | gbtree | 0.3 | 0.8 | 0.8 | 0.04 | 0.6 | 6 | ... | sqft | customer_cnt_active_policies | usagetype_encd | replacementcostdwellingind | pipe_froze_3_blk | landlordind | water_risk_3_blk | poolind | cova_limit | plumb_leak_3_blk |
| 1 | BaseModel-1 | binary:logistic | auc | gbtree | 0.3 | 0.8 | 0.8 | 0.04 | 0.6 | 6 | ... | sqft | customer_cnt_active_policies | usagetype_encd | replacementcostdwellingind | pipe_froze_3_blk | landlordind | water_risk_3_blk | poolind | cova_limit | plumb_leak_3_blk |
| 2 | BaseModel-2 | binary:logistic | auc | gbtree | 0.3 | 0.8 | 0.8 | 0.04 | 0.6 | 6 | ... | sqft | customer_cnt_active_policies | usagetype_encd | replacementcostdwellingind | pipe_froze_3_blk | landlordind | water_risk_3_blk | poolind | cova_limit | plumb_leak_3_blk |
| 3 | BaseModel-3 | binary:logistic | auc | gbtree | 0.3 | 0.8 | 0.8 | 0.04 | 0.6 | 6 | ... | sqft | customer_cnt_active_policies | usagetype_encd | replacementcostdwellingind | pipe_froze_3_blk | landlordind | water_risk_3_blk | poolind | cova_limit | plumb_leak_3_blk |
| 4 | BaseModel-4 | binary:logistic | auc | gbtree | 0.3 | 0.8 | 0.8 | 0.04 | 0.6 | 6 | ... | sqft | customer_cnt_active_policies | usagetype_encd | replacementcostdwellingind | pipe_froze_3_blk | landlordind | water_risk_3_blk | poolind | cova_limit | plumb_leak_3_blk |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 115 | Model3-5 | binary:logistic | auc | gbtree | 0.3 | 0.8 | 0.8 | 0.04 | 0.6 | 6 | ... | sqft | usagetype_encd | water_risk_3_blk | | | | | | | |
| 116 | Model3-6 | binary:logistic | auc | gbtree | 0.3 | 0.8 | 0.8 | 0.04 | 0.6 | 6 | ... | sqft | usagetype_encd | water_risk_3_blk | | | | | | | |
| 117 | Model3-7 | binary:logistic | auc | gbtree | 0.3 | 0.8 | 0.8 | 0.04 | 0.6 | 6 | ... | sqft | usagetype_encd | water_risk_3_blk | | | | | | | |
| 118 | Model3-8 | binary:logistic | auc | gbtree | 0.3 | 0.8 | 0.8 | 0.04 | 0.6 | 6 | ... | sqft | usagetype_encd | water_risk_3_blk | | | | | | | |

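The table above mixes three kinds of columns: parameters (objective, eta, max_depth, ...), dataset locations, and feature names (F1..F12 here). When jobs are submitted, only the parameter columns should become hyperparameters. The sketch below shows that filtering on a hypothetical, shortened column list; it assumes feature columns are always named F followed by digits, and anchors the pattern at the end so that names like fold or F1_extra are not mistaken for features.

```python
import re

# Hypothetical column list modeled on data_for_training (feature columns shortened to F1..F3)
columns = ['Model', 'objective', 'eval_metric', 'booster', 'eta', 'max_depth', 'num_round',
           'fold', 'Training_data', 'Validation_data', 'Testing_data', 'F1', 'F2', 'F3']

# Columns holding the model name or dataset locations, not hyperparameters
NON_PARAM_COLUMNS = {'Model', 'Training_data', 'Validation_data', 'Testing_data', 'Testing_labels'}
# Feature columns: "F" followed only by digits (re.match anchors the start, "$" the end)
FEATURE_COLUMN = re.compile(r'F[0-9]+$')


def hyperparameter_columns(columns):
    """Return the columns that should be passed to the estimator as hyperparameters."""
    return [c for c in columns
            if c not in NON_PARAM_COLUMNS and not FEATURE_COLUMN.match(c)]


print(hyperparameter_columns(columns))
```

The result matches the keys that appear, besides the fixed ones, in the hyperparameter dicts printed by the training loop (objective, eval_metric, booster, ..., num_round, fold).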

If you want to use the AWS SageMaker monitoring system, charts and AWS SageMaker Experiments, the custom metrics must be registered in metric_definitions. The metrics work only with a custom XGBoost image; see the top of the notebook for how to create it.
If you are OK with the custom script's own log output and do not need the AWS SageMaker monitoring features, comment out metric_definitions and image_uri in the training cell below (look for xgb_script_mode_estimator = XGBoost).

In [23]:
if entry_point == 'ModelTraining_Gini_EvalMetric.py':
    # Example log line the regexes must match ("#011" is an encoded tab):
    # [0]#011train-gini:0.00612#011validation-gini:0.01274
    metric_definitions = [
        {
            'Name': 'train-gini',
            'Regex': '.*\\[[0-9]+\\].*#011train-gini:([-+]?[0-9]*\\.?[0-9]+(?:[eE][-+]?[0-9]+)?).*'
        },
        {
            'Name': 'validation-gini',
            'Regex': '.*\\[[0-9]+\\].*#011validation-gini:([-+]?[0-9]*\\.?[0-9]+(?:[eE][-+]?[0-9]+)?).*'
        }
    ]
    image_uri = custom_image_uri
    score = 'gini'
else:
    metric_definitions = []
    image_uri = None  # None lets the SageMaker SDK select the default XGBoost image
    score = 'auc'
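Before launching many jobs, it can help to check the metric_definitions regexes against the sample log line from the comment above. The sketch below does that with Python's re module; it is only an approximation, since SageMaker applies the regexes to the job log on its side. extract_metrics is a hypothetical helper, and the patterns are the same as above, written as raw strings (so single backslashes).

```python
import re

# Sample log line from the comment in the cell above ("#011" is an encoded tab)
sample_line = "[0]#011train-gini:0.00612#011validation-gini:0.01274"

# Same patterns as in metric_definitions
patterns = {
    "train-gini": r".*\[[0-9]+\].*#011train-gini:([-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?).*",
    "validation-gini": r".*\[[0-9]+\].*#011validation-gini:([-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?).*",
}


def extract_metrics(line):
    """Return {metric_name: value} for every pattern that matches the log line."""
    found = {}
    for name, pattern in patterns.items():
        m = re.match(pattern, line)
        if m:
            found[name] = float(m.group(1))
    return found


print(extract_metrics(sample_line))  # {'train-gini': 0.00612, 'validation-gini': 0.01274}
print(extract_metrics("Saving model"))  # {} (no iteration marker, no metric)
```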

In [24]:
CntRunningInst = 0

# estimators of the currently running training jobs
processors = list()

# regular expression to exclude the feature columns (F1..F25) from the list of parameters
regex = re.compile(r'F[0-9]+$')

for index, row in data_for_training.iterrows():
    model='%s-%s'%(row['Model'],index)
    print(model)
    #if the run was stopped because of a failed model, the data in the AWS SageMaker Experiment allows filtering out what was already run and what was not
    #if len(trial_ds[trial_ds['DisplayName']=='bf-TrainingModels-%s'%(model)])>=1:
    #    continue
    #1. Verify that training and validation data exist (created in the preprocessing step or moved manually to a predefined location)

    if not (s3.exists(row['Training_data']) and s3.exists(row['Validation_data'])):
        print('Training and Validation data do not exist. Skipping model %s'%(model))
        print('Check Training data in %s'%row['Training_data'])
        print('Check Validation data in %s'%row['Validation_data'])
        continue
    #2. Is there a free instance slot to run the model? It depends on the number of simultaneously running instances of this type allowed in the account,
    #the number of instances configured per model and the configured number of models running in parallel
    if CntRunningInst >= MaxNumOfRunningModels * training_instance_count: #not enough slots to add a model
        print('There is no slot to train  %s model. Waiting...'%model)
        #Wait until a training job completes
        while CntRunningInst >= MaxNumOfRunningModels * training_instance_count:
            print('.', end='')
            time.sleep(check_training_job_every_sec)
            for p in processors:
                # describe() is an API call: call it once per job and reuse the result
                job_description = p.jobs[-1].describe()
                name = job_description['TrainingJobName']
                status = job_description['TrainingJobStatus']
                #a slot is free if the job completed, or if it failed/stopped and failed models must not stop the process
                if (status == 'Completed') or (status in ('Failed', 'Stopped') and not StopOnFailedModel):
                    print('')
                    print('Job %s  is %s'%(name,status))
                    print('Continue training...')
                    CntRunningInst = CntRunningInst - training_instance_count
                    processors.remove(p)
                    break
                elif status in ('Failed', 'Stopped') and StopOnFailedModel:
                    raise Exception('Model %s training failed!'%name)
    #3. there is a slot to add a model training job
    print('Creating training job for  %s model'%model)
    #parameters
    #technically early_stopping_rounds is not an XGBoost booster parameter, but passing it as a hyperparameter is an easy way to send it into the training/CV process
    hyperparameters = {
        'early_stopping_rounds':100,
        'seed': 42,
        'nfold': num_folds,
        'GetFIFlg':GetFIFlg,
        'GetTestScoreFlg':GetTestScoreFlg,
        'GetTestPredFlg':GetTestPredFlg 
    } 
    for param in data_for_training.columns:
        #skip the Model name, the dataset location columns and the feature columns
        #if they are not excluded, they are added to the experiment analytics as parameters even though training does not use them
        if (param in ('Model','Training_data','Validation_data','Testing_data','Testing_labels')) or regex.match(param):
            continue
        hyperparameters[param] = row[param]
    print(hyperparameters)
    
    #training, validation and test data from preprocessing
    train_input = TrainingInput(row['Training_data'], content_type='text/csv')
    validation_input = TrainingInput(row['Validation_data'], content_type='text/csv')
    test_input = TrainingInput(row['Testing_data'], content_type='text/csv')

    #Estimator
    training_job_name = model+'-'+time.strftime('%Y-%m-%d-%H-%M-%S', time.gmtime())
    xgb_script_mode_estimator = XGBoost(
        entry_point=entry_point,
        image_uri=image_uri,
        hyperparameters=hyperparameters,
        role=sagemaker_execution_role, 
        instance_count=training_instance_count,
        instance_type=training_instance_type,
        framework_version='1.2-1',
        output_path='s3://%s/%s/'%(bucket,path_to_models),
        metric_definitions=metric_definitions #only works if image_uri is a custom container
        )
    
    #Training (wait=False: submit the job and continue without waiting for it to finish)
    xgb_script_mode_estimator.fit(
        {'train': train_input, 'validation': validation_input, 'test': test_input},
        job_name=training_job_name,
        wait=False,
        experiment_config={
            'ExperimentName': Experiment_name,
            'TrialName': Trial_name_training,
            'TrialComponentDisplayName': '%s-%s'%(Trial_name_training, model.replace('_','-')),
        })
              
    processors.append(xgb_script_mode_estimator)

    CntRunningInst = CntRunningInst + training_instance_count
    # to prevent throttling
    time.sleep(.5)

        
    
    

BaseModel-0-0


INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.


Creating training job for  BaseModel-0-0 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 0}


INFO:sagemaker:Creating training-job with name: BaseModel-0-0-2021-05-30-04-01-57


BaseModel-1-1
Creating training job for  BaseModel-1-1 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 1}


INFO:sagemaker:Creating training-job with name: BaseModel-1-1-2021-05-30-04-01-58


BaseModel-2-2
Creating training job for  BaseModel-2-2 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 2}


INFO:sagemaker:Creating training-job with name: BaseModel-2-2-2021-05-30-04-02-01


BaseModel-3-3
Creating training job for  BaseModel-3-3 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 3}


INFO:sagemaker:Creating training-job with name: BaseModel-3-3-2021-05-30-04-02-03


BaseModel-4-4
Creating training job for  BaseModel-4-4 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 4}


INFO:sagemaker:Creating training-job with name: BaseModel-4-4-2021-05-30-04-02-05


BaseModel-5-5
Creating training job for  BaseModel-5-5 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 5}


INFO:sagemaker:Creating training-job with name: BaseModel-5-5-2021-05-30-04-02-07


BaseModel-6-6


Creating training job for  BaseModel-6-6 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 6}


INFO:sagemaker:Creating training-job with name: BaseModel-6-6-2021-05-30-04-02-09


BaseModel-7-7
Creating training job for  BaseModel-7-7 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 7}


INFO:sagemaker:Creating training-job with name: BaseModel-7-7-2021-05-30-04-02-10


BaseModel-8-8
Creating training job for  BaseModel-8-8 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 8}


INFO:sagemaker:Creating training-job with name: BaseModel-8-8-2021-05-30-04-02-11


BaseModel-9-9
Creating training job for  BaseModel-9-9 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 9}


INFO:sagemaker:Creating training-job with name: BaseModel-9-9-2021-05-30-04-02-13


NoPlumbLeak-0-10
Creating training job for  NoPlumbLeak-0-10 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 0}


INFO:sagemaker:Creating training-job with name: NoPlumbLeak-0-10-2021-05-30-04-02-15


NoPlumbLeak-1-11
Creating training job for  NoPlumbLeak-1-11 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 1}


INFO:sagemaker:Creating training-job with name: NoPlumbLeak-1-11-2021-05-30-04-02-17


NoPlumbLeak-2-12
Creating training job for  NoPlumbLeak-2-12 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 2}


INFO:sagemaker:Creating training-job with name: NoPlumbLeak-2-12-2021-05-30-04-02-18


NoPlumbLeak-3-13
Creating training job for  NoPlumbLeak-3-13 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 3}


INFO:sagemaker:Creating training-job with name: NoPlumbLeak-3-13-2021-05-30-04-02-21


NoPlumbLeak-4-14
Creating training job for  NoPlumbLeak-4-14 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 4}


INFO:sagemaker:Creating training-job with name: NoPlumbLeak-4-14-2021-05-30-04-02-23


NoPlumbLeak-5-15
Creating training job for  NoPlumbLeak-5-15 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 5}


INFO:sagemaker:Creating training-job with name: NoPlumbLeak-5-15-2021-05-30-04-02-26


NoPlumbLeak-6-16
Creating training job for  NoPlumbLeak-6-16 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 6}


INFO:sagemaker:Creating training-job with name: NoPlumbLeak-6-16-2021-05-30-04-02-27


NoPlumbLeak-7-17
Creating training job for  NoPlumbLeak-7-17 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 7}


INFO:sagemaker:Creating training-job with name: NoPlumbLeak-7-17-2021-05-30-04-02-28


NoPlumbLeak-8-18
Creating training job for  NoPlumbLeak-8-18 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 8}


INFO:sagemaker:Creating training-job with name: NoPlumbLeak-8-18-2021-05-30-04-02-30


NoPlumbLeak-9-19
Creating training job for  NoPlumbLeak-9-19 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 9}


INFO:sagemaker:Creating training-job with name: NoPlumbLeak-9-19-2021-05-30-04-02-32


Nocovalimit-0-20
Creating training job for  Nocovalimit-0-20 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 0}


INFO:sagemaker:Creating training-job with name: Nocovalimit-0-20-2021-05-30-04-02-36


Nocovalimit-1-21
Creating training job for  Nocovalimit-1-21 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 1}


INFO:sagemaker:Creating training-job with name: Nocovalimit-1-21-2021-05-30-04-02-37


Nocovalimit-2-22
Creating training job for  Nocovalimit-2-22 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 2}


INFO:sagemaker:Creating training-job with name: Nocovalimit-2-22-2021-05-30-04-02-40


Nocovalimit-3-23
Creating training job for  Nocovalimit-3-23 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 3}


INFO:sagemaker:Creating training-job with name: Nocovalimit-3-23-2021-05-30-04-02-42


Nocovalimit-4-24
Creating training job for  Nocovalimit-4-24 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 4}


INFO:sagemaker:Creating training-job with name: Nocovalimit-4-24-2021-05-30-04-02-43


Nocovalimit-5-25
Creating training job for  Nocovalimit-5-25 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 5}


INFO:sagemaker:Creating training-job with name: Nocovalimit-5-25-2021-05-30-04-02-45


Nocovalimit-6-26
Creating training job for  Nocovalimit-6-26 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 6}


INFO:sagemaker:Creating training-job with name: Nocovalimit-6-26-2021-05-30-04-02-48


Nocovalimit-7-27
Creating training job for  Nocovalimit-7-27 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 7}


INFO:sagemaker:Creating training-job with name: Nocovalimit-7-27-2021-05-30-04-02-50
INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.


Nocovalimit-8-28
Creating training job for  Nocovalimit-8-28 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 8}


INFO:sagemaker:Creating training-job with name: Nocovalimit-8-28-2021-05-30-04-02-54
INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.


Nocovalimit-9-29
Creating training job for  Nocovalimit-9-29 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 9}


INFO:sagemaker:Creating training-job with name: Nocovalimit-9-29-2021-05-30-04-02-55


Nopool-0-30
There is no slot to train  Nopool-0-30 model. Waiting...
.......................

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job NoPlumbLeak-8-18-2021-05-30-04-02-30  is Completed
Continue training...
Creating training job for  Nopool-0-30 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 0}


INFO:sagemaker:Creating training-job with name: Nopool-0-30-2021-05-30-04-11-39
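The log above shows the throttling pattern. Once the number of running fold jobs reaches the parallel limit, the next job waits ("There is no slot to train ... Waiting..."). It is submitted as soon as any earlier job reports completion, in whatever order the jobs finish. The notebook's own scheduling code is not shown in this section. The following is only a minimal polling sketch of the same idea. `run_with_slots`, `submit`, `get_status` and `max_parallel` are hypothetical names. In a real setup, `get_status` could wrap the boto3 SageMaker client call `describe_training_job(TrainingJobName=...)["TrainingJobStatus"]`, and `submit` could start an estimator with `fit(wait=False)`.

```python
import time
from typing import Callable, Dict, List


def run_with_slots(
    jobs: List[str],
    submit: Callable[[str], str],
    get_status: Callable[[str], str],
    max_parallel: int,
    poll_seconds: float = 30.0,
) -> List[str]:
    """Submit jobs in order without ever running more than max_parallel at once.

    Hypothetical helper: submit(name) starts a training job and returns its
    full job name; get_status(job_name) returns a SageMaker-style status such
    as 'InProgress', 'Completed', 'Failed' or 'Stopped'.
    Returns the full job names in submission order.
    """
    # Failed/Stopped jobs also free a slot, otherwise the loop could hang forever
    finished_states = {"Completed", "Failed", "Stopped"}
    running: List[str] = []
    submitted: List[str] = []
    for name in jobs:
        if len(running) >= max_parallel:
            print(f"There is no slot to train {name} model. Waiting...")
        # Poll until at least one running job has finished
        while len(running) >= max_parallel:
            statuses: Dict[str, str] = {j: get_status(j) for j in running}
            done = [j for j, s in statuses.items() if s in finished_states]
            if done:
                for j in done:
                    print(f"Job {j} is {statuses[j]}")
                    running.remove(j)
                print("Continue training...")
            else:
                print(".", end="", flush=True)
                time.sleep(poll_seconds)
        print(f"Creating training job for {name} model")
        job_name = submit(name)
        running.append(job_name)
        submitted.append(job_name)
    return submitted
```

Polling keeps the sketch simple and makes few API calls at a 30-second interval. Releasing a slot as soon as *any* job finishes matches the out-of-order completions seen in the log. For example, `NoPlumbLeak-8-18` frees the first slot before `BaseModel-0-0` finishes.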


Nopool-1-31
There is no slot to train  Nopool-1-31 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job BaseModel-3-3-2021-05-30-04-02-03  is Completed
Continue training...
Creating training job for  Nopool-1-31 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 1}


INFO:sagemaker:Creating training-job with name: Nopool-1-31-2021-05-30-04-11-52


Nopool-2-32
There is no slot to train  Nopool-2-32 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job NoPlumbLeak-3-13-2021-05-30-04-02-21  is Completed
Continue training...
Creating training job for  Nopool-2-32 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 2}


INFO:sagemaker:Creating training-job with name: Nopool-2-32-2021-05-30-04-12-07


Nopool-3-33
There is no slot to train  Nopool-3-33 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job BaseModel-8-8-2021-05-30-04-02-11  is Completed
Continue training...
Creating training job for  Nopool-3-33 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 3}


INFO:sagemaker:Creating training-job with name: Nopool-3-33-2021-05-30-04-12-20


Nopool-4-34
There is no slot to train  Nopool-4-34 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job BaseModel-7-7-2021-05-30-04-02-10  is Completed
Continue training...
Creating training job for  Nopool-4-34 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 4}


INFO:sagemaker:Creating training-job with name: Nopool-4-34-2021-05-30-04-12-34


Nopool-5-35
There is no slot to train  Nopool-5-35 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job BaseModel-5-5-2021-05-30-04-02-07  is Completed
Continue training...
Creating training job for  Nopool-5-35 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 5}


INFO:sagemaker:Creating training-job with name: Nopool-5-35-2021-05-30-04-12-47


Nopool-6-36
There is no slot to train  Nopool-6-36 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job BaseModel-9-9-2021-05-30-04-02-13  is Completed
Continue training...
Creating training job for  Nopool-6-36 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 6}


INFO:sagemaker:Creating training-job with name: Nopool-6-36-2021-05-30-04-13-00


Nopool-7-37
There is no slot to train  Nopool-7-37 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job NoPlumbLeak-0-10-2021-05-30-04-02-15  is Completed
Continue training...
Creating training job for  Nopool-7-37 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 7}


INFO:sagemaker:Creating training-job with name: Nopool-7-37-2021-05-30-04-13-13


Nopool-8-38
There is no slot to train  Nopool-8-38 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job NoPlumbLeak-5-15-2021-05-30-04-02-26  is Completed
Continue training...
Creating training job for  Nopool-8-38 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 8}


INFO:sagemaker:Creating training-job with name: Nopool-8-38-2021-05-30-04-13-27


Nopool-9-39
There is no slot to train  Nopool-9-39 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job NoPlumbLeak-4-14-2021-05-30-04-02-23  is Completed
Continue training...
Creating training job for  Nopool-9-39 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 9}


INFO:sagemaker:Creating training-job with name: Nopool-9-39-2021-05-30-04-13-40


NoWaterRisk-0-40
There is no slot to train  NoWaterRisk-0-40 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job BaseModel-1-1-2021-05-30-04-01-58  is Completed
Continue training...
Creating training job for  NoWaterRisk-0-40 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 0}


INFO:sagemaker:Creating training-job with name: NoWaterRisk-0-40-2021-05-30-04-13-53


NoWaterRisk-1-41
There is no slot to train  NoWaterRisk-1-41 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job NoPlumbLeak-7-17-2021-05-30-04-02-28  is Completed
Continue training...
Creating training job for  NoWaterRisk-1-41 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 1}


INFO:sagemaker:Creating training-job with name: NoWaterRisk-1-41-2021-05-30-04-14-06


NoWaterRisk-2-42
There is no slot to train  NoWaterRisk-2-42 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job BaseModel-0-0-2021-05-30-04-01-57  is Completed
Continue training...
Creating training job for  NoWaterRisk-2-42 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 2}


INFO:sagemaker:Creating training-job with name: NoWaterRisk-2-42-2021-05-30-04-14-18


NoWaterRisk-3-43
There is no slot to train  NoWaterRisk-3-43 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job NoPlumbLeak-9-19-2021-05-30-04-02-32  is Completed
Continue training...
Creating training job for  NoWaterRisk-3-43 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 3}


INFO:sagemaker:Creating training-job with name: NoWaterRisk-3-43-2021-05-30-04-14-32


NoWaterRisk-4-44
There is no slot to train  NoWaterRisk-4-44 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nocovalimit-1-21-2021-05-30-04-02-37  is Completed
Continue training...
Creating training job for  NoWaterRisk-4-44 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 4}


INFO:sagemaker:Creating training-job with name: NoWaterRisk-4-44-2021-05-30-04-14-45


NoWaterRisk-5-45
There is no slot to train  NoWaterRisk-5-45 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nocovalimit-2-22-2021-05-30-04-02-40  is Completed
Continue training...
Creating training job for  NoWaterRisk-5-45 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 5}


INFO:sagemaker:Creating training-job with name: NoWaterRisk-5-45-2021-05-30-04-14-59


NoWaterRisk-6-46
There is no slot to train  NoWaterRisk-6-46 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job BaseModel-6-6-2021-05-30-04-02-09  is Completed
Continue training...
Creating training job for  NoWaterRisk-6-46 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 6}


INFO:sagemaker:Creating training-job with name: NoWaterRisk-6-46-2021-05-30-04-15-11


NoWaterRisk-7-47
There is no slot to train  NoWaterRisk-7-47 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job NoPlumbLeak-1-11-2021-05-30-04-02-17  is Completed
Continue training...
Creating training job for  NoWaterRisk-7-47 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 7}


INFO:sagemaker:Creating training-job with name: NoWaterRisk-7-47-2021-05-30-04-15-23


NoWaterRisk-8-48
There is no slot to train  NoWaterRisk-8-48 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nocovalimit-3-23-2021-05-30-04-02-42  is Completed
Continue training...
Creating training job for  NoWaterRisk-8-48 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 8}


INFO:sagemaker:Creating training-job with name: NoWaterRisk-8-48-2021-05-30-04-15-36


NoWaterRisk-9-49
There is no slot to train  NoWaterRisk-9-49 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials



Job Nocovalimit-4-24-2021-05-30-04-02-43  is Completed
Continue training...
Creating training job for  NoWaterRisk-9-49 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 9}


INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker:Creating training-job with name: NoWaterRisk-9-49-2021-05-30-04-15-49


Nolandlord-0-50
There is no slot to train  Nolandlord-0-50 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job BaseModel-2-2-2021-05-30-04-02-01  is Completed
Continue training...
Creating training job for  Nolandlord-0-50 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 0}


INFO:sagemaker:Creating training-job with name: Nolandlord-0-50-2021-05-30-04-16-02


Nolandlord-1-51
There is no slot to train  Nolandlord-1-51 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job BaseModel-4-4-2021-05-30-04-02-05  is Completed
Continue training...
Creating training job for  Nolandlord-1-51 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 1}


INFO:sagemaker:Creating training-job with name: Nolandlord-1-51-2021-05-30-04-16-14


Nolandlord-2-52
There is no slot to train  Nolandlord-2-52 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job NoPlumbLeak-2-12-2021-05-30-04-02-18  is Completed
Continue training...
Creating training job for  Nolandlord-2-52 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 2}


INFO:sagemaker:Creating training-job with name: Nolandlord-2-52-2021-05-30-04-16-25


Nolandlord-3-53
There is no slot to train  Nolandlord-3-53 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nocovalimit-0-20-2021-05-30-04-02-36  is Completed
Continue training...
Creating training job for  Nolandlord-3-53 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 3}


INFO:sagemaker:Creating training-job with name: Nolandlord-3-53-2021-05-30-04-16-38


Nolandlord-4-54
There is no slot to train  Nolandlord-4-54 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nocovalimit-5-25-2021-05-30-04-02-45  is Completed
Continue training...
Creating training job for  Nolandlord-4-54 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 4}


INFO:sagemaker:Creating training-job with name: Nolandlord-4-54-2021-05-30-04-16-50


Nolandlord-5-55
There is no slot to train  Nolandlord-5-55 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job NoPlumbLeak-6-16-2021-05-30-04-02-27  is Completed
Continue training...
Creating training job for  Nolandlord-5-55 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 5}


INFO:sagemaker:Creating training-job with name: Nolandlord-5-55-2021-05-30-04-17-02


Nolandlord-6-56
There is no slot to train  Nolandlord-6-56 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nocovalimit-6-26-2021-05-30-04-02-48  is Completed
Continue training...
Creating training job for  Nolandlord-6-56 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 6}


INFO:sagemaker:Creating training-job with name: Nolandlord-6-56-2021-05-30-04-17-14


Nolandlord-7-57
There is no slot to train  Nolandlord-7-57 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nocovalimit-7-27-2021-05-30-04-02-50  is Completed
Continue training...
Creating training job for  Nolandlord-7-57 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 7}


INFO:sagemaker:Creating training-job with name: Nolandlord-7-57-2021-05-30-04-17-26


Nolandlord-8-58
There is no slot to train  Nolandlord-8-58 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials



Job Nocovalimit-8-28-2021-05-30-04-02-54  is Completed
Continue training...
Creating training job for  Nolandlord-8-58 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 8}


INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker:Creating training-job with name: Nolandlord-8-58-2021-05-30-04-17-39


Nolandlord-9-59
There is no slot to train  Nolandlord-9-59 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nocovalimit-9-29-2021-05-30-04-02-55  is Completed
Continue training...
Creating training job for  Nolandlord-9-59 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 9}


INFO:sagemaker:Creating training-job with name: Nolandlord-9-59-2021-05-30-04-17-51


NoPipeFroze-0-60
There is no slot to train  NoPipeFroze-0-60 model. Waiting...
.........

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nopool-1-31-2021-05-30-04-11-52  is Completed
Continue training...
Creating training job for  NoPipeFroze-0-60 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 0}


INFO:sagemaker:Creating training-job with name: NoPipeFroze-0-60-2021-05-30-04-20-52


NoPipeFroze-1-61
There is no slot to train  NoPipeFroze-1-61 model. Waiting...
..

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nopool-0-30-2021-05-30-04-11-39  is Completed
Continue training...
Creating training job for  NoPipeFroze-1-61 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 1}


INFO:sagemaker:Creating training-job with name: NoPipeFroze-1-61-2021-05-30-04-21-22


NoPipeFroze-2-62
There is no slot to train  NoPipeFroze-2-62 model. Waiting...
...

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nopool-3-33-2021-05-30-04-12-20  is Completed
Continue training...
Creating training job for  NoPipeFroze-2-62 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 2}


INFO:sagemaker:Creating training-job with name: NoPipeFroze-2-62-2021-05-30-04-22-09


NoPipeFroze-3-63
There is no slot to train  NoPipeFroze-3-63 model. Waiting...
..

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nopool-2-32-2021-05-30-04-12-07  is Completed
Continue training...
Creating training job for  NoPipeFroze-3-63 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 3}


INFO:sagemaker:Creating training-job with name: NoPipeFroze-3-63-2021-05-30-04-22-39


NoPipeFroze-4-64
There is no slot to train  NoPipeFroze-4-64 model. Waiting...
..

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nopool-8-38-2021-05-30-04-13-27  is Completed
Continue training...
Creating training job for  NoPipeFroze-4-64 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 4}


INFO:sagemaker:Creating training-job with name: NoPipeFroze-4-64-2021-05-30-04-23-09


NoPipeFroze-5-65
There is no slot to train  NoPipeFroze-5-65 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nopool-9-39-2021-05-30-04-13-40  is Completed
Continue training...
Creating training job for  NoPipeFroze-5-65 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 5}


INFO:sagemaker:Creating training-job with name: NoPipeFroze-5-65-2021-05-30-04-23-22


NoPipeFroze-6-66
There is no slot to train  NoPipeFroze-6-66 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials



Job Nopool-4-34-2021-05-30-04-12-34  is Completed
Continue training...
Creating training job for  NoPipeFroze-6-66 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 6}


INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker:Creating training-job with name: NoPipeFroze-6-66-2021-05-30-04-23-34


NoPipeFroze-7-67
There is no slot to train  NoPipeFroze-7-67 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nopool-5-35-2021-05-30-04-12-47  is Completed
Continue training...
Creating training job for  NoPipeFroze-7-67 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 7}


INFO:sagemaker:Creating training-job with name: NoPipeFroze-7-67-2021-05-30-04-23-46


NoPipeFroze-8-68
There is no slot to train  NoPipeFroze-8-68 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job NoWaterRisk-0-40-2021-05-30-04-13-53  is Completed
Continue training...
Creating training job for  NoPipeFroze-8-68 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 8}


INFO:sagemaker:Creating training-job with name: NoPipeFroze-8-68-2021-05-30-04-23-59


NoPipeFroze-9-69
There is no slot to train  NoPipeFroze-9-69 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job NoWaterRisk-1-41-2021-05-30-04-14-06  is Completed
Continue training...
Creating training job for  NoPipeFroze-9-69 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 9}


INFO:sagemaker:Creating training-job with name: NoPipeFroze-9-69-2021-05-30-04-24-11


Norplcostdwel-0-70
There is no slot to train  Norplcostdwel-0-70 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job NoWaterRisk-2-42-2021-05-30-04-14-18  is Completed
Continue training...
Creating training job for  Norplcostdwel-0-70 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 0}


INFO:sagemaker:Creating training-job with name: Norplcostdwel-0-70-2021-05-30-04-24-23


Norplcostdwel-1-71
There is no slot to train  Norplcostdwel-1-71 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job NoWaterRisk-3-43-2021-05-30-04-14-32  is Completed
Continue training...
Creating training job for  Norplcostdwel-1-71 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 1}


INFO:sagemaker:Creating training-job with name: Norplcostdwel-1-71-2021-05-30-04-24-35


Norplcostdwel-2-72
There is no slot to train  Norplcostdwel-2-72 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job NoWaterRisk-5-45-2021-05-30-04-14-59  is Completed
Continue training...
Creating training job for  Norplcostdwel-2-72 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 2}


INFO:sagemaker:Creating training-job with name: Norplcostdwel-2-72-2021-05-30-04-24-48


Norplcostdwel-3-73
There is no slot to train  Norplcostdwel-3-73 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nopool-7-37-2021-05-30-04-13-13  is Completed
Continue training...
Creating training job for  Norplcostdwel-3-73 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 3}


INFO:sagemaker:Creating training-job with name: Norplcostdwel-3-73-2021-05-30-04-25-00


Norplcostdwel-4-74
There is no slot to train  Norplcostdwel-4-74 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.



Job NoWaterRisk-4-44-2021-05-30-04-14-45  is Completed
Continue training...
Creating training job for  Norplcostdwel-4-74 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 4}


INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker:Creating training-job with name: Norplcostdwel-4-74-2021-05-30-04-25-12


Norplcostdwel-5-75
There is no slot to train  Norplcostdwel-5-75 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job NoWaterRisk-6-46-2021-05-30-04-15-11  is Completed
Continue training...
Creating training job for  Norplcostdwel-5-75 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 5}


INFO:sagemaker:Creating training-job with name: Norplcostdwel-5-75-2021-05-30-04-25-25


Norplcostdwel-6-76
There is no slot to train  Norplcostdwel-6-76 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job NoWaterRisk-7-47-2021-05-30-04-15-23  is Completed
Continue training...
Creating training job for  Norplcostdwel-6-76 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 6}


INFO:sagemaker:Creating training-job with name: Norplcostdwel-6-76-2021-05-30-04-25-37


Norplcostdwel-7-77
There is no slot to train  Norplcostdwel-7-77 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job NoWaterRisk-8-48-2021-05-30-04-15-36  is Completed
Continue training...
Creating training job for  Norplcostdwel-7-77 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 7}


INFO:sagemaker:Creating training-job with name: Norplcostdwel-7-77-2021-05-30-04-25-49


Norplcostdwel-8-78
There is no slot to train  Norplcostdwel-8-78 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job NoWaterRisk-9-49-2021-05-30-04-15-49  is Completed
Continue training...
Creating training job for  Norplcostdwel-8-78 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 8}


INFO:sagemaker:Creating training-job with name: Norplcostdwel-8-78-2021-05-30-04-26-01


Norplcostdwel-9-79
There is no slot to train  Norplcostdwel-9-79 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nolandlord-0-50-2021-05-30-04-16-02  is Completed
Continue training...
Creating training job for  Norplcostdwel-9-79 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 9}


INFO:sagemaker:Creating training-job with name: Norplcostdwel-9-79-2021-05-30-04-26-13


Nousagetype-0-80
There is no slot to train  Nousagetype-0-80 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nolandlord-1-51-2021-05-30-04-16-14  is Completed
Continue training...
Creating training job for  Nousagetype-0-80 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 0}


INFO:sagemaker:Creating training-job with name: Nousagetype-0-80-2021-05-30-04-26-25


Nousagetype-1-81
There is no slot to train  Nousagetype-1-81 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nolandlord-2-52-2021-05-30-04-16-25  is Completed
Continue training...
Creating training job for  Nousagetype-1-81 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 1}


INFO:sagemaker:Creating training-job with name: Nousagetype-1-81-2021-05-30-04-26-37


Nousagetype-2-82
There is no slot to train  Nousagetype-2-82 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nolandlord-3-53-2021-05-30-04-16-38  is Completed
Continue training...
Creating training job for  Nousagetype-2-82 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 2}


INFO:sagemaker:Creating training-job with name: Nousagetype-2-82-2021-05-30-04-26-49


Nousagetype-3-83
There is no slot to train  Nousagetype-3-83 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.



Job Nolandlord-4-54-2021-05-30-04-16-50  is Completed
Continue training...
Creating training job for  Nousagetype-3-83 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 3}


INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker:Creating training-job with name: Nousagetype-3-83-2021-05-30-04-27-01


Nousagetype-4-84
There is no slot to train  Nousagetype-4-84 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nopool-6-36-2021-05-30-04-13-00  is Completed
Continue training...
Creating training job for  Nousagetype-4-84 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 4}


INFO:sagemaker:Creating training-job with name: Nousagetype-4-84-2021-05-30-04-27-13


Nousagetype-5-85
There is no slot to train  Nousagetype-5-85 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nolandlord-5-55-2021-05-30-04-17-02  is Completed
Continue training...
Creating training job for  Nousagetype-5-85 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 5}


INFO:sagemaker:Creating training-job with name: Nousagetype-5-85-2021-05-30-04-27-25


Nousagetype-6-86
There is no slot to train  Nousagetype-6-86 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nolandlord-6-56-2021-05-30-04-17-14  is Completed
Continue training...
Creating training job for  Nousagetype-6-86 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 6}


INFO:sagemaker:Creating training-job with name: Nousagetype-6-86-2021-05-30-04-27-37


Nousagetype-7-87
There is no slot to train  Nousagetype-7-87 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nolandlord-7-57-2021-05-30-04-17-26  is Completed
Continue training...
Creating training job for  Nousagetype-7-87 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 7}


INFO:sagemaker:Creating training-job with name: Nousagetype-7-87-2021-05-30-04-27-49


Nousagetype-8-88
There is no slot to train  Nousagetype-8-88 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nolandlord-8-58-2021-05-30-04-17-39  is Completed
Continue training...
Creating training job for  Nousagetype-8-88 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 8}


INFO:sagemaker:Creating training-job with name: Nousagetype-8-88-2021-05-30-04-28-01


Nousagetype-9-89
There is no slot to train  Nousagetype-9-89 model. Waiting...
.

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.



Job Nolandlord-9-59-2021-05-30-04-17-51  is Completed
Continue training...
Creating training job for  Nousagetype-9-89 model
{'early_stopping_rounds': 100, 'seed': 42, 'nfold': '10', 'GetFIFlg': 'Y', 'GetTestScoreFlg': 'Y', 'GetTestPredFlg': 'Y', 'objective': 'binary:logistic', 'eval_metric': 'auc', 'booster': 'gbtree', 'scale_pos_weight': 0.3, 'colsample_bylevel': 0.8, 'colsample_bytree': 0.8, 'eta': 0.04, 'subsample': 0.6, 'max_depth': 6, 'num_round': 5000, 'fold': 9}


INFO:sagemaker:Creating training-job with name: Nousagetype-9-89-2021-05-30-04-28-13


Model1-0-90
There is no slot to train  Model1-0-90 model. Waiting...
.......

ClientError: An error occurred (ThrottlingException) when calling the DescribeTrainingJob operation (reached max retries: 4): Rate exceeded
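The cell above stopped with a `ThrottlingException` from `DescribeTrainingJob` after botocore's retries ran out ("reached max retries: 4"), before `Model1-0-90` was submitted. One way to make status polling more tolerant is to back off exponentially when a call is throttled. The sketch below is a hypothetical helper, not part of the notebook's `eu` utilities. `wait_for_job` takes any `describe` callable (for example a lambda around boto3's `describe_training_job`) and a predicate that recognizes throttling errors, so it can be exercised without AWS. As an alternative, botocore's retry configuration (`botocore.config.Config(retries={'max_attempts': ..., 'mode': 'adaptive'})` passed to the client) can raise the client's own retry budget.

```python
import random
import time
from typing import Callable, Dict


def wait_for_job(describe: Callable[[], Dict],
                 is_throttle: Callable[[Exception], bool],
                 base_delay: float = 30.0,
                 max_delay: float = 300.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll describe() until the training job reaches a terminal status.

    describe() must return a dict with a 'TrainingJobStatus' key, like
    boto3's SageMaker describe_training_job response. Throttling errors
    are retried with exponential backoff plus random jitter; any other
    exception is re-raised.
    """
    delay = base_delay
    while True:
        try:
            status = describe()['TrainingJobStatus']
        except Exception as exc:
            if not is_throttle(exc):
                raise
            # Random jitter keeps many pollers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay))
            delay = min(delay * 2, max_delay)
            continue
        if status not in ('InProgress', 'Stopping'):
            return status  # 'Completed', 'Failed' or 'Stopped'
        delay = base_delay  # a successful call resets the backoff
        sleep(base_delay)


# Example wiring (requires AWS credentials; job_name is hypothetical):
# import boto3
# from botocore.exceptions import ClientError
# sm = boto3.client('sagemaker')
# status = wait_for_job(
#     lambda: sm.describe_training_job(TrainingJobName=job_name),
#     lambda e: isinstance(e, ClientError)
#     and e.response['Error']['Code'] == 'ThrottlingException')
```

Injecting `sleep` keeps the helper testable; in the notebook the default `time.sleep` would be used.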

In [25]:
#Wait until the remaining training jobs are complete
eu.wait_training_jobs(processors=processors,check_every_sec=check_training_job_every_sec,print_every_n_output=12,wait_min=60)

All Training Jobs are Completed


## Experiment results

Read the results from the AWS SageMaker experiment, save them to an experiment log file and visualize them. The most complex part of the script below aggregates the results of all models trained on different folds but with the same feature set and parameter set.

In [26]:
#Wait until the data are updated in the AWS experiment
time.sleep(10)

In [27]:
#Training and validation metrics of the models from the experiment
trial_component_analytics = ExperimentAnalytics(
    experiment_name=Experiment_name   
)
trial_comp_ds = trial_component_analytics.dataframe()
trial_ds=trial_comp_ds[trial_comp_ds['DisplayName'].str.contains(Trial_name_training)].copy()

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials


In [32]:
trial_ds.tail()

| | TrialComponentName | DisplayName | SourceArn | GetFIFlg | GetTestPredFlg | GetTestScoreFlg | SageMaker.ImageUri | SageMaker.InstanceCount | SageMaker.InstanceType | SageMaker.VolumeSizeInGB | ... | validation - MediaType | validation - Value | SageMaker.DebugHookOutput - MediaType | SageMaker.DebugHookOutput - Value | SageMaker.ModelArtifact - MediaType | SageMaker.ModelArtifact - Value | Trials | Experiments | Model | ind |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 85 | BaseModel-8-8-2021-05-30-04-02-11-aws-training... | FeatureSet-TrainingModels-BaseModel-8-8 | arn:aws:sagemaker:us-west-2:757107622481:train... | "Y" | "Y" | "Y" | 757107622481.dkr.ecr.us-west-2.amazonaws.com/s... | 1.0 | ml.c5.xlarge | 30.0 | ... | text/csv | s3://kdproperty/Data/Experiments/FeatureSet/va... | NaN | s3://kdproperty/Models/Experiments/FeatureSet/ | NaN | s3://kdproperty/Models/Experiments/FeatureSet/... | [FeatureSet-TrainingModels] | [FeatureSet] | BaseModel | 8 |
| 86 | BaseModel-3-3-2021-05-30-04-02-03-aws-training... | FeatureSet-TrainingModels-BaseModel-3-3 | arn:aws:sagemaker:us-west-2:757107622481:train... | "Y" | "Y" | "Y" | 757107622481.dkr.ecr.us-west-2.amazonaws.com/s... | 1.0 | ml.c5.xlarge | 30.0 | ... | text/csv | s3://kdproperty/Data/Experiments/FeatureSet/va... | NaN | s3://kdproperty/Models/Experiments/FeatureSet/ | NaN | s3://kdproperty/Models/Experiments/FeatureSet/... | [FeatureSet-TrainingModels] | [FeatureSet] | BaseModel | 3 |
| 87 | NoPlumbLeak-8-18-2021-05-30-04-02-30-aws-train... | FeatureSet-TrainingModels-NoPlumbLeak-8-18 | arn:aws:sagemaker:us-west-2:757107622481:train... | "Y" | "Y" | "Y" | 757107622481.dkr.ecr.us-west-2.amazonaws.com/s... | 1.0 | ml.c5.xlarge | 30.0 | ... | text/csv | s3://kdproperty/Data/Experiments/FeatureSet/va... | NaN | s3://kdproperty/Models/Experiments/FeatureSet/ | NaN | s3://kdproperty/Models/Experiments/FeatureSet/... | [FeatureSet-TrainingModels] | [FeatureSet] | NoPlumbLeak | 18 |
| 88 | NoPlumbLeak-3-13-2021-05-30-04-02-21-aws-train... | FeatureSet-TrainingModels-NoPlumbLeak-3-13 | arn:aws:sagemaker:us-west-2:757107622481:train... | "Y" | "Y" | "Y" | 757107622481.dkr.ecr.us-west-2.amazonaws.com/s... | 1.0 | ml.c5.xlarge | 30.0 | ... | text/csv | s3://kdproperty/Data/Experiments/FeatureSet/va... | NaN | s3://kdproperty/Models/Experiments/FeatureSet/ | NaN | s3://kdproperty/Models/Experiments/FeatureSet/... | [FeatureSet-TrainingModels] | [FeatureSet] | NoPlumbLeak | 13 |
| 89 | Nocovalimit-3-23-2021-05-30-04-02-42-aws-train... | FeatureSet-TrainingModels-Nocovalimit-3-23 | arn:aws:sagemaker:us-west-2:757107622481:train... | "Y" | "Y" | "Y" | 757107622481.dkr.ecr.us-west-2.amazonaws.com/s... | 1.0 | ml.c5.xlarge | 30.0 | ... | text/csv | s3://kdproperty/Data/Experiments/FeatureSet/va... | NaN | s3://kdproperty/Models/Experiments/FeatureSet/ | NaN | s3://kdproperty/Models/Experiments/FeatureSet/... | [FeatureSet-TrainingModels] | [FeatureSet] | Nocovalimit | 23 |


In [29]:
#The number of models in the experiment is a combination of:
# 1. feature sets (defined by Model name)
# 2. folds (num_folds parameter)
# 3. parameter sets (number of rows in model_params)
#DisplayName in the experiment analytics is built from the Model name, the fold and the line number (index) in data_for_training,
#which combines the folds input data for specific datasets and the parameter sets:
# the index is the value after the last dash,
# the fold is the value between the second-to-last and the last dash.
#The code below puts the Model name (as in model_features and model_params), the index of model_params and the fold into separate columns for each combination.
#The index in data_for_training can be converted to the index in model_params via binning.
#We assume the same number of folds for each set of features and parameters.
trial_ds['Model']=trial_ds['DisplayName'].str.replace(Trial_name_training+'-','',regex=False)
trial_ds['ind']=pd.to_numeric(trial_ds['Model'].apply(lambda x: x.rpartition('-')[2]))
trial_ds['Model']=trial_ds['Model'].apply(lambda x: x.rpartition('-')[0])
trial_ds['fold']=pd.to_numeric(trial_ds['Model'].apply(lambda x: x.rpartition('-')[2]))
trial_ds['Model']=trial_ds['Model'].apply(lambda x: x.rpartition('-')[0])
#one label per parameter set (row of model_params): '0', '1', ...
bins=len(model_params)
bin_labels=[str(i) for i in range(bins)]
#fold score column names depend on the number of folds (num_folds), so they are generated instead of hard-coded
folds_columns=[]
folds_train_columns=[]
folds_test_columns=[]
folds_valid_columns=[]
folds_gain_columns=[]
folds_weight_columns=[]
folds_cover_columns=[]
for i in range(int(num_folds)):
    folds_columns.append(str(i))
    folds_train_columns.append('train-%s-fold'%i)
    folds_valid_columns.append('valid-%s-fold'%i)
    folds_test_columns.append('test-%s-fold'%i)
    folds_gain_columns.append('gain-%s'%i)
    folds_weight_columns.append('weight-%s'%i)
    folds_cover_columns.append('cover-%s'%i)
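The `rpartition` chain above peels `ind` and `fold` off the right end of DisplayName one dash at a time. The same split can be done in one step with `str.rsplit('-', 2)`, which splits only at the last two dashes and therefore keeps a model name containing dashes intact. A minimal pure-Python sketch, assuming DisplayName always has the form `<Trial_name_training>-<Model>-<fold>-<ind>` and that `Trial_name_training` is `FeatureSet-TrainingModels`, as the `trial_ds.tail()` output suggests; `parse_display_name` is a hypothetical helper name.

```python
def parse_display_name(display_name: str, prefix: str) -> tuple:
    """Split '<prefix>-<Model>-<fold>-<ind>' into (Model, fold, ind).

    rsplit('-', 2) splits only at the last two dashes, so a model name
    that itself contains dashes stays intact.
    """
    head = prefix + '-'
    if not display_name.startswith(head):
        raise ValueError('%r does not start with %r' % (display_name, head))
    model, fold, ind = display_name[len(head):].rsplit('-', 2)
    return model, int(fold), int(ind)


# DisplayName taken from the trial_ds.tail() output above.
print(parse_display_name('FeatureSet-TrainingModels-NoPlumbLeak-8-18',
                         'FeatureSet-TrainingModels'))
# prints ('NoPlumbLeak', 8, 18)
```

Unlike `str.replace`, which removes the prefix wherever it occurs, slicing removes it only from the start. A vectorized pandas equivalent would be `trial_ds['DisplayName'].str.slice(len(Trial_name_training) + 1).str.rsplit('-', n=2, expand=True)`.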

1. Best scores aggregated from folds

In [30]:
#Validation scores
valid_ModelsResults = trial_ds[['Model','ind','fold','validation%s - Last'%('-'+score if score=='gini' else ':'+score)]].copy()
valid_ModelsResults.columns=['Model','ind','fold','validation:%s'%score]
valid_ModelsResults=valid_ModelsResults[['Model','fold','ind','validation:%s'%score]]
valid_ModelsResults=valid_ModelsResults.sort_values(['Model','ind','fold'], ascending=[False,True,True])
valid_ModelsResults['ind']=pd.cut(valid_ModelsResults['ind'],bins=bins,labels=bin_labels)
valid_ModelsResults = pd.pivot_table(valid_ModelsResults, index=['Model','ind'], columns=['fold'])
valid_ModelsResults.reset_index( drop=False, inplace=True )
valid_ModelsResults.columns = valid_ModelsResults.columns.droplevel(0)
valid_ModelsResults.columns =['Model','ind']+folds_valid_columns
valid_ModelsResults['valid-%s-mean'%score]=valid_ModelsResults[folds_valid_columns].mean(axis=1)
valid_ModelsResults['valid-%s-std'%score]=valid_ModelsResults[folds_valid_columns].std(axis=1)
valid_ModelsResults['valid-%s-sem'%score]=valid_ModelsResults[folds_valid_columns].sem(axis=1)
valid_ModelsResults

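| | Model | ind | valid-0-fold | valid-1-fold | valid-2-fold | valid-3-fold | valid-4-fold | valid-5-fold | valid-6-fold | valid-7-fold | valid-8-fold | valid-9-fold | valid-gini-mean | valid-gini-std | valid-gini-sem |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BaseModel | 0 | 0.40443 | 0.35781 | 0.37166 | 0.4017 | 0.40248 | 0.38515 | 0.37313 | 0.37354 | NaN | NaN | 0.383737 | 0.017479 | 0.00618 |
| 1 | BaseModel | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.35992 | 0.39159 | 0.375755 | 0.022394 | 0.015835 |
| 2 | NoPipeFroze | 8 | 0.37702 | 0.32724 | 0.34381 | 0.38137 | 0.37668 | 0.35745 | 0.34482 | NaN | NaN | NaN | 0.358341 | 0.020727 | 0.007834 |
| 3 | NoPipeFroze | 9 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.32885 | 0.33897 | 0.37352 | 0.347113 | 0.023422 | 0.013523 |
| 4 | NoPlumbLeak | 1 | 0.40339 | 0.3552 | 0.37384 | 0.40365 | 0.4049 | NaN | NaN | NaN | NaN | NaN | 0.388196 | 0.022603 | 0.010108 |
| 5 | NoPlumbLeak | 2 | NaN | NaN | NaN | NaN | NaN | 0.38139 | 0.37372 | 0.37049 | 0.35745 | 0.3927 | 0.37515 | 0.013077 | 0.005848 |
| 6 | NoWaterRisk | 5 | 0.38467 | 0.33901 | 0.3592 | 0.39104 | 0.38096 | NaN | NaN | NaN | NaN | NaN | 0.370976 | 0.021507 | 0.009618 |
| 7 | NoWaterRisk | 6 | NaN | NaN | NaN | NaN | NaN | 0.37432 | 0.35142 | 0.34369 | 0.34637 | 0.37934 | 0.359028 | 0.016581 | 0.007415 |
| 8 | Nocovalimit | 2 | 0.40125 | 0.34987 | 0.36678 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.372633 | 0.026185 | 0.015118 |
| 9 | Nocovalimit | 3 | NaN | NaN | NaN | 0.40223 | 0.40114 | 0.37663 | 0.37322 | 0.36463 | 0.35659 | 0.38902 | 0.380494 | 0.017618 | 0.006659 |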


In [33]:
len(valid_ModelsResults)

18
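The `pd.cut(..., bins=bins, labels=bin_labels)` step maps the running job index `ind` of each fold back to the index of the model configuration it belongs to. `bins` and `bin_labels` are defined earlier in the notebook and are not shown here. The sketch below assumes, hypothetically, that each model runs `n_folds` consecutive jobs numbered from 0, and builds one bin per model. If the bin edges are shifted relative to the fold groups, the folds of one model land under two different labels, which is one way partially empty rows like those in the validation table above can arise. `n_models`, `n_folds`, `aligned_bins`, `misaligned_bins` and `count_model_rows` are illustrative names only.

```python
import numpy as np
import pandas as pd

# Hypothetical layout: 3 models x 4 folds, jobs numbered 0..11 in model order
n_models, n_folds = 3, 4
jobs = pd.DataFrame({
    "Model": np.repeat(["A", "B", "C"], n_folds),
    "fold": np.tile(np.arange(n_folds), n_models),
    "ind": np.arange(n_models * n_folds),
})
bin_labels = list(range(n_models))

# Aligned edges: one right-closed interval per model, (-1, 3], (3, 7], (7, 11]
aligned_bins = [-1] + [(m + 1) * n_folds - 1 for m in range(n_models)]
aligned = jobs.assign(ind=pd.cut(jobs["ind"], bins=aligned_bins, labels=bin_labels))

# Misaligned edges (-1, 2], (2, 6], (6, 11]: the last fold of A and of B moves to the next label
misaligned_bins = [-1, 2, 6, 11]
misaligned = jobs.assign(ind=pd.cut(jobs["ind"], bins=misaligned_bins, labels=bin_labels))


def count_model_rows(df):
    """Distinct (Model, ind) pairs, i.e. the number of rows a pivot on Model/ind produces."""
    return len(df[["Model", "ind"]].drop_duplicates())


print(count_model_rows(aligned))     # 3: one row per model
print(count_model_rows(misaligned))  # 5: A and B are each split across two labels
```

Checking the number of distinct `(Model, ind)` pairs against the number of configured models is a cheap sanity check before pivoting.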

In [None]:
#Training scores
train_ModelsResults = trial_ds[['Model','ind','fold','train%s - Last'%('-'+score if score=='gini' else ':'+score)]].copy()
train_ModelsResults.columns=['Model','ind','fold','train:%s'%score]
train_ModelsResults=train_ModelsResults[['Model','fold','ind','train:%s'%score]]
train_ModelsResults=train_ModelsResults.sort_values(['Model','ind','fold'], ascending=[False,True,True])
train_ModelsResults['ind']=pd.cut(train_ModelsResults['ind'],bins=bins,labels=bin_labels)
train_ModelsResults = pd.pivot_table(train_ModelsResults, index=['Model','ind'], columns=['fold'])
train_ModelsResults.reset_index( drop=False, inplace=True )
train_ModelsResults.columns = train_ModelsResults.columns.droplevel(0)
train_ModelsResults.columns =['Model','ind']+folds_train_columns
train_ModelsResults['train-%s-mean'%score]=train_ModelsResults[folds_train_columns].mean(axis=1)
train_ModelsResults['train-%s-std'%score]=train_ModelsResults[folds_train_columns].std(axis=1)
train_ModelsResults['train-%s-sem'%score]=train_ModelsResults[folds_train_columns].sem(axis=1)
train_ModelsResults

In [None]:
#All together
BestResults = pd.merge(train_ModelsResults, valid_ModelsResults, on=['Model','ind'], how='inner')
BestResults

In [None]:
#Additional info not registered in the AWS SageMaker experiment
#but saved as CSV files in each training job's output (output.tar.gz)
#1.Feature Importance
#2.Test Scores
#3.Prediction
#4.Training output (evaluation results)
ModelEvalResults=pd.DataFrame()
FI = pd.DataFrame()
ModelsTestScores = pd.DataFrame()
for index, row in trial_ds.iterrows():
    model=row['Model']
    fold=int(row['fold'])
    ind=int(row['ind'])
    print('%s-%s-%s'%(model,fold,index))
    eval_results_file=row['SageMaker.ModelArtifact - Value'].replace('model.tar.gz','output.tar.gz').replace('s3://%s/'%bucket,'')
    print(eval_results_file)
    if s3.exists(row['SageMaker.ModelArtifact - Value'].replace('model.tar.gz','output.tar.gz')):
        sagemaker_session.download_data(path=temp_folder, bucket=bucket, key_prefix=eval_results_file)
        print('Processing...')
        eval_results_file=os.path.join(temp_folder, 'output.tar.gz')
        with tarfile.open(eval_results_file) as tar:
            tar.extractall(path=temp_folder)
        eval_results_file=os.path.join(temp_folder, 'eval_results.csv')
        #on_bad_lines='skip' needs pandas >= 1.3; error_bad_lines=False was removed in pandas 2.0
        eval_results=pd.read_csv(eval_results_file, on_bad_lines='skip', index_col=False)
        eval_results['Model']=model
        eval_results['fold']=fold
        eval_results['ind']=ind
        ModelEvalResults = pd.concat([ModelEvalResults,eval_results])
        #FI
        if (GetFIFlg=='Y'):
            fi_model_file=os.path.join(temp_folder, 'fi.csv')
            if os.path.isfile(fi_model_file):
                fi_model=pd.read_csv(fi_model_file, on_bad_lines='skip', index_col=False)
                fi_model['feature']=fi_model['feature'].map(GetMap(model))
                fi_model['Model']=model
                fi_model['fold']=fold
                fi_model['ind']=ind
                FI = pd.concat([FI,fi_model])
        #Test Scores 
        if (GetTestScoreFlg=='Y'):
            test_score_file=os.path.join(temp_folder, 'test_score.csv')
            if (os.path.isfile(test_score_file)):
                test_score=pd.read_csv(test_score_file, on_bad_lines='skip', index_col=False)
                test_score['Model']=model
                test_score['fold']=fold
                test_score['ind']=ind 
                ModelsTestScores = pd.concat([ModelsTestScores,test_score])       
    else:
        print('File does not exist')       
ModelEvalResults=ModelEvalResults[['Model','fold','ind','train','valid']]  
if (GetFIFlg=='Y'):
    FI=FI[['Model','fold','ind','feature','gain','weight','cover']]
if (GetTestScoreFlg=='Y'):
    ModelsTestScores=ModelsTestScores[['Model','fold','ind','roc-auc-test' if score=='auc' and entry_point!='ModelTraining_Gini_named_AUC_EvalMetric.py' else 'gini-test']]
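Each training job's `output.tar.gz` holds the CSV files written by the training script (`eval_results.csv`, and optionally `fi.csv` and `test_score.csv`). Because the loop above extracts every archive into the same `temp_folder`, a file left over from a previous job can be read again if the current archive does not contain it. A minimal sketch of the alternative, extracting each archive into its own temporary directory, is shown below. It builds a small archive locally instead of downloading one with `sagemaker_session.download_data`; the helper name `read_job_outputs` and the CSV content are hypothetical.

```python
import io
import os
import tarfile
import tempfile

import pandas as pd


def read_job_outputs(archive_path, model, fold, ind):
    """Extract one job's output.tar.gz into a fresh directory and read its CSV files.

    Returns {file name: DataFrame tagged with Model/fold/ind}. Files missing from this
    archive are simply absent from the result, never picked up from another job.
    """
    results = {}
    with tempfile.TemporaryDirectory() as job_dir:
        with tarfile.open(archive_path) as tar:
            tar.extractall(path=job_dir)
        for name in ("eval_results.csv", "fi.csv", "test_score.csv"):
            path = os.path.join(job_dir, name)
            if os.path.isfile(path):
                df = pd.read_csv(path, index_col=False)
                df["Model"], df["fold"], df["ind"] = model, fold, ind
                results[name] = df
    return results


# Build a tiny archive that contains only eval_results.csv (made-up content)
work_dir = tempfile.mkdtemp()
archive = os.path.join(work_dir, "output.tar.gz")
csv_bytes = b"train,valid\n0.30,0.28\n0.35,0.31\n"
with tarfile.open(archive, "w:gz") as tar:
    info = tarfile.TarInfo(name="eval_results.csv")
    info.size = len(csv_bytes)
    tar.addfile(info, io.BytesIO(csv_bytes))

outputs = read_job_outputs(archive, model="BaseModel", fold=0, ind=0)
print(sorted(outputs))                    # ['eval_results.csv']
print(outputs["eval_results.csv"].shape)  # (2, 5)
```

The temporary directory is deleted when the `with` block ends; the DataFrames are already in memory by then.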

2. Test dataset scores

In [None]:
#Pivoting Test Scores (original structure has each fold in a row) and averaging
if len(ModelsTestScores)>0:
    ModelsTestScores=ModelsTestScores.sort_values(['Model','ind','fold'], ascending=[False,True,True])
    ModelsTestScores['ind']=pd.cut(ModelsTestScores['ind'],bins=bins,labels=bin_labels)
    ModelsTestScores['ind']=ModelsTestScores['ind'].astype(int)
    ModelsTestScores = pd.pivot_table(ModelsTestScores, index=['Model','ind'], columns=['fold'])
    ModelsTestScores.reset_index( drop=False, inplace=True )
    ModelsTestScores.columns = ModelsTestScores.columns.droplevel(0)
    ModelsTestScores.columns =['Model','ind']+folds_test_columns
    ModelsTestScores['test-%s-mean'%score]=ModelsTestScores[folds_test_columns].mean(axis=1)
    ModelsTestScores['test-%s-std'%score]=ModelsTestScores[folds_test_columns].std(axis=1)
    ModelsTestScores['test-%s-sem'%score]=ModelsTestScores[folds_test_columns].sem(axis=1)
    #Saving into the Experiment log file models results
    eu.SaveToExperimentLog(Experiments_file, '%s TestScores'%Experiment_name, ModelsTestScores)
    print(ModelsTestScores)
    

3. Feature Importance

In [None]:
#Pivoting Feature Importance (original structure has each fold in a row) and averaging
if len(FI)>0:
    FI_gain=FI[['Model','ind','fold','feature','gain']].copy()
    FI_gain=FI_gain.sort_values(['Model','ind','fold'], ascending=[False,True,True])
    FI_gain['ind']=pd.cut(pd.to_numeric(FI_gain['ind']),bins=bins,labels=bin_labels)
    FI_gain = pd.pivot_table(FI_gain, index=['Model','ind','feature'], columns=['fold'])
    FI_gain.reset_index( drop=False, inplace=True )
    FI_gain.columns = FI_gain.columns.droplevel(0)
    FI_gain.columns =['Model','ind','feature']+folds_gain_columns
    FI_gain['gain-mean']=FI_gain[folds_gain_columns].mean(axis=1)
    FI_gain['gain-std']=FI_gain[folds_gain_columns].std(axis=1)
    FI_gain['gain-sem']=FI_gain[folds_gain_columns].sem(axis=1)
    #
    FI_weight=FI[['Model','ind','fold','feature','weight']].copy()
    FI_weight=FI_weight.sort_values(['Model','ind','fold'], ascending=[False,True,True])
    FI_weight['ind']=pd.cut(pd.to_numeric(FI_weight['ind']),bins=bins,labels=bin_labels)
    FI_weight = pd.pivot_table(FI_weight, index=['Model','ind','feature'], columns=['fold'])
    FI_weight.reset_index( drop=False, inplace=True )
    FI_weight.columns = FI_weight.columns.droplevel(0)
    FI_weight.columns =['Model','ind','feature']+folds_weight_columns
    FI_weight['weight-mean']=FI_weight[folds_weight_columns].mean(axis=1)
    FI_weight['weight-std']=FI_weight[folds_weight_columns].std(axis=1)
    FI_weight['weight-sem']=FI_weight[folds_weight_columns].sem(axis=1)
    #
    FI_cover=FI[['Model','ind','fold','feature','cover']].copy()
    FI_cover=FI_cover.sort_values(['Model','ind','fold'], ascending=[False,True,True])
    FI_cover['ind']=pd.cut(pd.to_numeric(FI_cover['ind']),bins=bins,labels=bin_labels)
    FI_cover = pd.pivot_table(FI_cover, index=['Model','ind','feature'], columns=['fold'])
    FI_cover.reset_index( drop=False, inplace=True )
    FI_cover.columns = FI_cover.columns.droplevel(0)
    FI_cover.columns =['Model','ind','feature']+folds_cover_columns
    FI_cover['cover-mean']=FI_cover[folds_cover_columns].mean(axis=1)
    FI_cover['cover-std']=FI_cover[folds_cover_columns].std(axis=1)
    FI_cover['cover-sem']=FI_cover[folds_cover_columns].sem(axis=1) 
    FI=pd.merge(FI_gain, FI_weight, on=['Model','ind','feature'], how='inner')
    FI=pd.merge(FI, FI_cover, on=['Model','ind','feature'], how='inner')
    FI['ind']=pd.to_numeric(FI['ind'])
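The three blocks in the cell above repeat the same pivot-and-aggregate steps for `gain`, `weight` and `cover`. A sketch of a small helper that does this for any metric column and then merges the results is below. `pivot_folds` and the fold column names it generates are hypothetical, the feature importance numbers are made up, and the notebook's `pd.cut` re-binning of `ind` is left out to keep the example self-contained.

```python
from functools import reduce

import pandas as pd


def pivot_folds(df, keys, metric):
    """Spread one metric over fold columns and add mean/std/sem across folds."""
    # values=<single column> keeps the column index flat, so no droplevel is needed
    wide = df.pivot_table(index=keys, columns="fold", values=metric)
    fold_cols = [f"{metric}-{f}-fold" for f in wide.columns]
    wide.columns = fold_cols
    wide[f"{metric}-mean"] = wide[fold_cols].mean(axis=1)
    wide[f"{metric}-std"] = wide[fold_cols].std(axis=1)
    wide[f"{metric}-sem"] = wide[fold_cols].sem(axis=1)
    return wide.reset_index()


# Made-up feature importance for one model, two features, three folds
fi = pd.DataFrame({
    "Model": ["BaseModel"] * 6,
    "ind": [0] * 6,
    "fold": [0, 1, 2, 0, 1, 2],
    "feature": ["f1", "f1", "f1", "f2", "f2", "f2"],
    "gain": [10.0, 12.0, 14.0, 3.0, 3.0, 3.0],
    "weight": [5, 7, 6, 2, 2, 2],
    "cover": [1.0, 2.0, 3.0, 4.0, 4.0, 4.0],
})

keys = ["Model", "ind", "feature"]
summary = reduce(
    lambda left, right: pd.merge(left, right, on=keys, how="inner"),
    [pivot_folds(fi, keys, m) for m in ("gain", "weight", "cover")],
)
print(summary[["feature", "gain-mean", "gain-std", "weight-mean"]])
```

Generating the std column name from the metric name in one place also avoids hand-typed column names drifting apart between the three blocks.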

3a. Visualization of feature importance from cross-validation folds

In [None]:
if len(FI):
    lst_chart_filenames = list()    
    for index, row in model_params.iterrows():
        if len(FI[( (FI['Model']==row['Model']) & (FI['ind']==index))])>0:
            data=FI[( (FI['Model']==row['Model']) & (FI['ind']==index))].sort_values('gain-mean',ascending=False)
            fig, axs = plt.subplots(nrows=1, ncols=3,figsize=(20,5)) 
            fig.suptitle('%s %s'%(row['Model'],index))
            fig.subplots_adjust(bottom=0.5)
            
            ax = axs[0]
            ax.errorbar(data['feature'], data['gain-mean'], color = 'blue',  ecolor='lightgray', elinewidth=3, capsize=0,yerr=data['gain-sem'], fmt='o')
            ax.set_title('Gain')
            ax.set_xticklabels(data['feature'].values,rotation=90)
            ax.grid(axis='both')
 
            data=data.sort_values('weight-mean',ascending=False)
            ax = axs[1]
            ax.errorbar(data['feature'], data['weight-mean'], color = 'blue',  ecolor='lightgray', elinewidth=3, capsize=0,yerr=data['weight-sem'], fmt='o')
            ax.set_title('Weight')
            ax.set_xticklabels(data['feature'].values,rotation=90)
            ax.grid(axis='both')
                         
            data=data.sort_values('cover-mean',ascending=False)
            ax = axs[2]
            ax.errorbar(data['feature'], data['cover-mean'], color = 'blue',  ecolor='lightgray', elinewidth=3, capsize=0,yerr=data['cover-sem'], fmt='o')
            ax.set_title('Cover')
            ax.set_xticklabels(data['feature'].values,rotation=90)
            ax.grid(axis='both')
            
            chart_filename=temp_folder+'%s %s.png'%(row['Model'],index)
            lst_chart_filenames.append(chart_filename)
            fig.savefig(chart_filename,format='png')

In [None]:
if len(FI): 
    #Saving into the Experiment log file models results
    eu.SaveToExperimentLog(Experiments_file, '%s FI'%Experiment_name, FI)
    eu.SaveChartToExperimentLog(Experiments_file, '%s FI'%Experiment_name, len(FI), 20, lst_chart_filenames)

4. Training and validation errors from folds: averaging

In [None]:
#Pivoting and averaging evaluation results by fold
CVResults=ModelEvalResults.copy()
CVResults['ind']=pd.to_numeric(pd.cut(CVResults['ind'],bins=bins,labels=bin_labels))
CVResults['index'] = CVResults.index
CVResults = pd.pivot_table(CVResults, index=['index','Model','ind'], columns=['fold'])
#folds with the same parameters and features can train for a different number of boosting rounds;
#dropna keeps only the rounds completed by all folds
CVResults = CVResults.dropna()
CVResults.reset_index( drop=False, inplace=True )
CVResults.columns = CVResults.columns.droplevel(0)
CVResults.columns =['index','Model','ind']+folds_train_columns+folds_valid_columns
CVResults = CVResults.drop('index', axis=1)
CVResults['train-%s-mean'%score]=CVResults[folds_train_columns].mean(axis=1)
CVResults['train-%s-std'%score]=CVResults[folds_train_columns].std(axis=1)
CVResults['train-%s-sem'%score]=CVResults[folds_train_columns].sem(axis=1)

CVResults['valid-%s-mean'%score]=CVResults[folds_valid_columns].mean(axis=1)
CVResults['valid-%s-std'%score]=CVResults[folds_valid_columns].std(axis=1)
CVResults['valid-%s-sem'%score]=CVResults[folds_valid_columns].sem(axis=1)
CVResults = CVResults.drop(folds_valid_columns, axis=1)
CVResults = CVResults.drop(folds_train_columns, axis=1)
CVResults.tail()
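Per-round evaluation logs from folds of the same model can have different lengths, for example when early stopping halts some folds sooner. Pivoting rounds into rows and folds into columns leaves NaNs for the rounds a fold never reached, and `dropna()` keeps only the rounds every fold completed, so the mean and SEM at each round are always taken over the same set of folds. A small sketch with two folds and made-up scores (the `round` column stands in for the notebook's `index` column):

```python
import pandas as pd

# Made-up per-round validation scores; fold 1 stopped one round earlier than fold 0
log = pd.DataFrame({
    "round": [0, 1, 2, 0, 1],
    "fold": [0, 0, 0, 1, 1],
    "valid": [0.30, 0.34, 0.35, 0.28, 0.33],
})

curves = log.pivot_table(index="round", columns="fold", values="valid")
print(curves.isna().sum().sum())  # 1: round 2 is missing for fold 1

curves = curves.dropna()  # keep only the rounds that every fold reached
curves["valid-mean"] = curves[[0, 1]].mean(axis=1)
curves["valid-sem"] = curves[[0, 1]].sem(axis=1)
print(curves.index.tolist())      # [0, 1]
```

The trade-off is that the tail of the longer folds is discarded; the averaged curve ends at the shortest fold.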

5. Visualization of model scores aggregated across folds

In [None]:
#Excluding from chart models which did not learn anything (0.5 is random guessing for ROC-AUC; for Gini it is 0)
#data = BestResults[BestResults['valid-%s-mean'%score]>0.5].copy()
#if len(ModelsTestScores)>0:
#    data_test = ModelsTestScores[ModelsTestScores['test-%s-mean'%score]>0.5].copy()

data = BestResults.copy()
#data_test stays empty when there are no test scores, so the check before the third subplot is safe
data_test = pd.DataFrame()
if len(ModelsTestScores)>0:
    data_test = ModelsTestScores.copy()
    
#list of models for xticks
data['xticks']=data['Model']+' '+data['ind'].astype(str) 
xticks=data['xticks'].unique().tolist()


# The x position 
r1 = np.arange(len(data))
if len(ModelsTestScores)>0:
    fig, axs = plt.subplots(nrows=3, ncols=1, sharex=True,figsize=(20,10))
else:
    fig, axs = plt.subplots(nrows=2, ncols=1, sharex=True,figsize=(20,10))
ax = axs[0]
ax.errorbar(r1, data['valid-%s-mean'%score], color = 'cyan',  ecolor='lightgray', elinewidth=3, capsize=0,yerr=data['valid-%s-sem'%score], fmt='o')
ax.set_title('valid-%s-mean'%score)
ax.grid(axis='both')

ax = axs[1]
ax.errorbar(r1, data['train-%s-mean'%score],  color = 'blue',  ecolor='lightgray', elinewidth=3,capsize=0, yerr=data['train-%s-sem'%score],  fmt='o')
ax.set_title('train-%s-mean'%score)
ax.set_xticks([r  for r in range(len(data))])
ax.set_xticklabels(xticks,rotation=90)
ax.grid(axis='both')
fig.suptitle('Means of %s with standard error of the mean'%score)
if len(data_test)>0:
    ax = axs[2]
    ax.errorbar(r1, data_test['test-%s-mean'%score],  color = 'green',  ecolor='lightgray', elinewidth=3,capsize=0, yerr=data_test['test-%s-sem'%score],  fmt='o')
    ax.set_title('test-%s-mean'%score)
    ax.set_xticks([r  for r in range(len(data_test))])
    ax.set_xticklabels(xticks,rotation=90)
    ax.grid(axis='both')
    
lst_model_scores_chart_filenames=list()
chart_filename=temp_folder+'Models Scores.png'
lst_model_scores_chart_filenames.append(chart_filename)
fig.savefig(chart_filename,format='png')    

6. Corrected t-test compares the VALIDATION scores of individual folds of a chosen model to the folds of the other models

In [None]:
#Set a specific base model name and index, or select the model with the min or max score (commented lines below)
#All other models are compared to BaseModel/BaseInd
BaseModel='BaseModel'
BaseInd='0'
#BaseModel=BestResults[BestResults['valid-%s-mean'%score]==BestResults['valid-%s-mean'%score].max()]['Model'].values[0]
#BaseInd=BestResults[BestResults['valid-%s-mean'%score]==BestResults['valid-%s-mean'%score].max()]['ind'].values[0]
BaseModelResults=BestResults[((BestResults['Model']==BaseModel) & (BestResults['ind']==BaseInd))][folds_valid_columns].values[0].tolist()

In [None]:
#corrected t-test for each record in BestResults
for index, model in BestResults.iterrows():
    if ((model['Model']!=BaseModel) | (model['ind']!=BaseInd)):
        AnalyzedModelResults=model[folds_valid_columns].values.tolist()
        (t, critical_value, pvalue) = eu.corrected_paired_ttest(BaseModelResults,AnalyzedModelResults, n1, n2, alpha)
        BestResults.at[index,'corrected t-statistic']= t
        BestResults.at[index,'corrected pvalue'] = pvalue 
        if pvalue>=alpha:
            BestResults.at[index,'corrected t-test Comment'] = 'No significant difference from %s at the %s significance level'%(BaseModel,alpha)
        else:
            BestResults.at[index,'corrected t-test Comment'] = 'Significant difference from %s at the %s significance level'%(BaseModel,alpha)
BestResults[['Model','ind','valid-%s-mean'%score,'corrected t-statistic','corrected pvalue','corrected t-test Comment']]
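The `eu.corrected_paired_ttest` and `eu.corrected_confidence_interval` helpers are defined outside this section and are not shown. A common formulation of a corrected test for cross-validation scores is Nadeau and Bengio's corrected resampled t-test: fold scores are not independent because the training sets overlap, so the variance of the mean difference is inflated from s²/k to (1/k + n_test/n_train)·s². The sketch below implements that formulation with SciPy. It is an assumption that `eu` does the same and that the notebook's `n1` and `n2` are the training and validation set sizes; the function bodies here are illustrative and the scores and sizes are made up.

```python
import numpy as np
from scipy import stats


def corrected_paired_ttest(scores_a, scores_b, n_train, n_test, alpha=0.05):
    """Nadeau-Bengio corrected resampled t-test on paired per-fold scores.

    Returns (t statistic, two-sided critical value, two-sided p-value).
    """
    diff = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    k = len(diff)
    # Inflate the variance: training sets of different folds overlap, so folds are not independent
    corrected_var = (1.0 / k + n_test / n_train) * np.var(diff, ddof=1)
    t = np.mean(diff) / np.sqrt(corrected_var)
    critical_value = stats.t.ppf(1 - alpha / 2, k - 1)
    p_value = 2 * stats.t.sf(np.abs(t), k - 1)
    return t, critical_value, p_value


def corrected_confidence_interval(scores_a, scores_b, n_train, n_test, confidence=0.95):
    """t-based interval for the mean difference, using the same corrected variance."""
    diff = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    k = len(diff)
    scale = np.sqrt((1.0 / k + n_test / n_train) * np.var(diff, ddof=1))
    return stats.t.interval(confidence, k - 1, loc=np.mean(diff), scale=scale)


# Made-up 5-fold validation scores; sizes assume an 80% train / 20% validation split
base = [0.404, 0.358, 0.372, 0.402, 0.402]
other = [0.384, 0.339, 0.359, 0.391, 0.381]
t, crit, p = corrected_paired_ttest(base, other, n_train=80_000, n_test=20_000)
lower, upper = corrected_confidence_interval(base, other, n_train=80_000, n_test=20_000)
print(t, crit, p, (lower, upper))
```

With `n_test=0` the correction vanishes and the statistic equals the ordinary paired t-test (`stats.ttest_rel`); with a realistic test/train ratio the statistic shrinks, which makes the test more conservative.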

In [None]:
#joining the results of the experiment with the experiment configuration
BestResults = pd.concat([BestResults, model_params.drop('Model',axis=1)], axis=1)
BestResults = pd.merge(BestResults, model_features, on=['Model'], how='inner')
BestResults

7. Corrected Confidence interval of the difference between model VALIDATION scores

In [None]:
CI_name = list()
CI_mean = list()
CI_lower = list()
CI_upper = list()
for index, model in BestResults.iterrows():
    if ((model['Model']!=BaseModel) | (model['ind']!=BaseInd)):
        AnalyzedModelResults=model[folds_valid_columns].values.tolist()
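        #signed difference (BaseModel minus analyzed model), assuming eu.corrected_confidence_interval uses the same sign convention
        diff=[y - x for y, x in zip(BaseModelResults,AnalyzedModelResults)]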
        CI=eu.corrected_confidence_interval(BaseModelResults,AnalyzedModelResults, n1, n2, 1-alpha)
        CI_name.append(model['Model']+' '+str(model['ind']))
        CI_mean.append(np.mean(diff))
        CI_lower.append(CI[0])
        CI_upper.append(CI[1])          
        BestResults.at[index,'BaseModel Diff mean'] = np.mean(diff)
        BestResults.at[index,'BaseModel Corrected CI lower'] = CI[0]
        BestResults.at[index,'BaseModel Corrected CI upper'] = CI[1]
CI_df = pd.DataFrame(list(zip(CI_name, CI_mean, CI_lower, CI_upper)), columns=['Model','mean','lower','upper'])

In [None]:
top=CI_df['upper'].max()
plt.figure(figsize=(20,10))
for lower,mean,upper,x in zip(CI_df['lower'],CI_df['mean'],CI_df['upper'],range(len(CI_df))):
    #vertical bar from lower to upper bound; the color comes from the keyword only, not the format string
    plt.plot((x,x),(lower,upper),'_-',markersize=20,color='blue')
    plt.plot(x,mean,'o',color='red')
#reference line: an interval crossing it is consistent with no difference
plt.axhline(0,color='black',linewidth=1)
plt.xticks(range(len(CI_df)),list(CI_df['Model']),rotation=90)
if top>0:
    plt.yticks(np.arange(0,top+top/10,top/10))
plt.grid(axis='both')

#plt.margins(x=2)
_=plt.title('Corrected Confidence Interval of validation scores differences')
lst_chart_filenames=list()
chart_filename=temp_folder+'Corrected Confidence Interval of validation scores differences.png'
lst_chart_filenames.append(chart_filename)

plt.savefig(chart_filename,format='png')

8. Student's paired t-test compares the VALIDATION scores of individual folds of a chosen model to the folds of the other models

In [None]:
#t-test for each record in BestResults
for index, model in BestResults.iterrows():
    if ((model['Model']!=BaseModel) | (model['ind']!=BaseInd)):
        AnalyzedModelResults=model[folds_valid_columns].values.tolist()
        t=stats.ttest_rel(BaseModelResults,AnalyzedModelResults)
        BestResults.at[index,'t-statistic']= t.statistic
        BestResults.at[index,'pvalue'] = t.pvalue 
        if t.pvalue>=alpha:
            BestResults.at[index,'t-test Comment'] = 'No significant difference from %s at alpha=%s'%(BaseModel,alpha)
        else:
            BestResults.at[index,'t-test Comment'] = 'Significant difference from %s at alpha=%s'%(BaseModel,alpha)
BestResults[['Model','ind','valid-%s-mean'%score,'t-statistic','pvalue','t-test Comment']]
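`stats.ttest_rel` is a paired test: it is equivalent to a one-sample t-test of the per-fold differences against zero, which is why both models must be scored on the same folds in the same order. A quick check with made-up fold scores; reversing one model's folds breaks the pairing and changes the statistic even though the mean difference stays the same.

```python
import numpy as np
from scipy import stats

# Made-up per-fold validation scores of the base model and one alternative
base = np.array([0.404, 0.358, 0.372, 0.402, 0.402, 0.385])
other = np.array([0.403, 0.355, 0.374, 0.404, 0.405, 0.381])

paired = stats.ttest_rel(base, other)
one_sample = stats.ttest_1samp(base - other, popmean=0.0)

# Mismatched folds: same mean difference, much larger spread of differences
shuffled = stats.ttest_rel(base, other[::-1])
print(paired.statistic, one_sample.statistic, shuffled.statistic)
```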

9. Confidence interval of the difference between model VALIDATION scores

In [None]:
CI_name = list()
CI_mean = list()
CI_lower = list()
CI_upper = list()
for index, model in BestResults.iterrows():
    if ((model['Model']!=BaseModel) | (model['ind']!=BaseInd)):
        AnalyzedModelResults=model[folds_valid_columns].values.tolist()
        #signed difference (BaseModel minus analyzed model) so that a CI containing 0 means no significant difference
        diff=[y - x for y, x in zip(BaseModelResults,AnalyzedModelResults)]
        CI=stats.t.interval(1-alpha, len(diff)-1, loc=np.mean(diff), scale=stats.sem(diff))
        CI_name.append(model['Model']+' '+str(model['ind']))
        CI_mean.append(np.mean(diff))
        CI_lower.append(CI[0])
        CI_upper.append(CI[1])
        #uncorrected CI is stored in its own columns so it does not overwrite the corrected CI from section 7
        BestResults.at[index,'BaseModel Diff mean'] = np.mean(diff)
        BestResults.at[index,'BaseModel CI lower'] = CI[0]
        BestResults.at[index,'BaseModel CI upper'] = CI[1]
CI_df = pd.DataFrame(list(zip(CI_name, CI_mean, CI_lower, CI_upper)), columns=['Model','mean','lower','upper'])

In [None]:
top=CI_df['upper'].max()
plt.figure(figsize=(20,10))
for lower,mean,upper,x in zip(CI_df['lower'],CI_df['mean'],CI_df['upper'],range(len(CI_df))):
    plt.plot((x,x),(lower,upper),'_-',markersize=20,color='blue')
    plt.plot(x,mean,'o',color='red')
#reference line: an interval crossing it is consistent with no difference
plt.axhline(0,color='black',linewidth=1)
plt.xticks(range(len(CI_df)),list(CI_df['Model']),rotation=90)
if top>0:
    plt.yticks(np.arange(0,top+top/10,top/10))
plt.grid(axis='both')

#plt.margins(x=2)
_=plt.title('Confidence Interval of validation scores differences')
chart_filename=temp_folder+'Confidence Interval of validation scores differences.png'
lst_chart_filenames.append(chart_filename)

plt.savefig(chart_filename,format='png')

In [None]:
#Saving into the Experiment log file models results
eu.SaveToExperimentLog(Experiments_file, '%s BestResults'%Experiment_name, BestResults)
eu.SaveChartToExperimentLog(Experiments_file, '%s BestResults'%Experiment_name, len(BestResults), 40, lst_chart_filenames)
eu.SaveChartToExperimentLog(Experiments_file, '%s BestResults'%Experiment_name, len(BestResults)+100, 40, lst_model_scores_chart_filenames) 

10. Paired t-test compares the TEST scores of individual folds of a chosen model to the folds of the other models

In [None]:
#Set a specific base model name and index, or select the model with the min or max mean test score (commented lines below)
#All other models are compared to BaseModel/BaseInd
if len(ModelsTestScores)>0:
    #BaseModel=ModelsTestScores[ModelsTestScores['test-%s-mean'%score]==ModelsTestScores['test-%s-mean'%score].max()]['Model'].values[0]
    #BaseInd=ModelsTestScores[ModelsTestScores['test-%s-mean'%score]==ModelsTestScores['test-%s-mean'%score].max()]['ind'].values[0]
    BaseModel='BaseModel'
    BaseInd=0
    BaseModelResults=ModelsTestScores[((ModelsTestScores['Model']==BaseModel) & (ModelsTestScores['ind']==BaseInd))][folds_test_columns].values[0].tolist()

In [None]:
#paired t-test for each record in ModelsTestScores
if len(ModelsTestScores)>0:
    for index, model in ModelsTestScores.iterrows():
        if ((model['Model']!=BaseModel) | (model['ind']!=BaseInd)):
            AnalyzedModelResults=model[folds_test_columns].values.tolist()
            t=stats.ttest_rel(BaseModelResults,AnalyzedModelResults)
            ModelsTestScores.at[index,'t-statistic']= t.statistic
            ModelsTestScores.at[index,'pvalue'] = t.pvalue 
            if t.pvalue>=alpha:
                ModelsTestScores.at[index,'Comment'] = 'No significant difference from %s at the %s significance level'%(BaseModel,alpha)
            else:
                ModelsTestScores.at[index,'Comment'] = 'Significant difference from %s at the %s significance level'%(BaseModel,alpha)
    print(ModelsTestScores[['Model','ind','test-%s-mean'%score,'t-statistic','pvalue','Comment']])

11. Confidence interval of the difference between model TEST scores

In [None]:
if len(ModelsTestScores)>0:
    CI_name = list()
    CI_mean = list()
    CI_lower = list()
    CI_upper = list()
    for index, model in ModelsTestScores.iterrows():
        if ((model['Model']!=BaseModel) | (model['ind']!=BaseInd)):
            AnalyzedModelResults=model[folds_test_columns].values.tolist()
            #signed difference (BaseModel minus analyzed model) so that a CI containing 0 means no significant difference
            diff=[y - x for y, x in zip(BaseModelResults,AnalyzedModelResults)]
            CI=stats.t.interval(1-alpha, len(diff)-1, loc=np.mean(diff), scale=stats.sem(diff))
            CI_name.append(model['Model']+' '+str(model['ind']) )
            CI_mean.append(np.mean(diff))
            CI_lower.append(CI[0])
            CI_upper.append(CI[1])
            #this CI is not corrected, so the columns are named accordingly
            ModelsTestScores.at[index,'BaseModel Diff mean'] = np.mean(diff)
            ModelsTestScores.at[index,'BaseModel CI lower'] = CI[0]
            ModelsTestScores.at[index,'BaseModel CI upper'] = CI[1]
    CI_df = pd.DataFrame(list(zip(CI_name, CI_mean, CI_lower, CI_upper)), columns=['Model','mean','lower','upper'])

In [None]:
if len(ModelsTestScores)>0:
    top=CI_df['upper'].max()
    plt.figure(figsize=(20,10))
    for lower,mean,upper,x in zip(CI_df['lower'],CI_df['mean'],CI_df['upper'],range(len(CI_df))):
        plt.plot((x,x),(lower,upper),'_-',markersize=20,color='blue')
        plt.plot(x,mean,'o',color='red')
    #reference line: an interval crossing it is consistent with no difference
    plt.axhline(0,color='black',linewidth=1)
    plt.xticks(range(len(CI_df)),list(CI_df['Model']),rotation=90)
    if top>0:
        plt.yticks(np.arange(0,top+top/10,top/10))
    plt.grid(axis='both')
    #plt.margins(x=2)
    _=plt.title('Confidence Interval of test scores differences')
    lst_chart_filenames=list()
    chart_filename=temp_folder+'Confidence Interval of test scores differences.png'
    lst_chart_filenames.append(chart_filename)
    plt.savefig(chart_filename,format='png')

The confidence interval gives the range of plausible values for the population mean difference between the base model's scores and another model's scores. If zero (0) lies inside the interval, the data are consistent with no difference; if zero is NOT inside the interval, the difference is statistically significant at the chosen significance level (alpha).
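The zero rule applies to the signed difference of scores (base minus other), not to its absolute value: the mean of absolute differences is positive whenever the models differ on any fold, so its interval says little about whether zero is plausible. A sketch with made-up per-fold scores, computing a t-based interval with `stats.t.interval` as the cells above do and checking that "zero outside the interval" agrees with the paired t-test at the same `alpha`; the candidate names are illustrative.

```python
import numpy as np
from scipy import stats

alpha = 0.05
# Made-up per-fold test scores of the base model and two alternatives
base = np.array([0.404, 0.358, 0.372, 0.402, 0.402, 0.385])
candidates = {
    "similar": np.array([0.403, 0.355, 0.374, 0.404, 0.405, 0.381]),
    "worse": np.array([0.385, 0.339, 0.359, 0.391, 0.381, 0.374]),
}

results = {}
for name, other in candidates.items():
    diff = base - other  # signed: positive means the base model scored higher
    lower, upper = stats.t.interval(1 - alpha, len(diff) - 1,
                                    loc=diff.mean(), scale=stats.sem(diff))
    significant = not (lower <= 0 <= upper)
    pvalue = stats.ttest_rel(base, other).pvalue
    results[name] = (lower, upper, significant, pvalue)
    print(f"{name}: CI=({lower:.4f}, {upper:.4f}) significant={significant} p={pvalue:.4f}")
```

Both checks use the same standard error and degrees of freedom, so they always agree; the interval adds the size and direction of the difference.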

In [None]:
if len(ModelsTestScores)>0:
    #Saving into the Experiment log file models results
    eu.SaveToExperimentLog(Experiments_file, '%s TestScores'%Experiment_name, ModelsTestScores)
    eu.SaveChartToExperimentLog(Experiments_file, '%s TestScores'%Experiment_name, len(ModelsTestScores), 20, lst_chart_filenames)

12. Training and validation errors (output from the model) to estimate overfitting

In [None]:
lst_chart_filenames=list()
for index, row in model_params.iterrows():
    if len(CVResults[( (CVResults['Model']==row['Model']) & (CVResults['ind']==index))])>0:
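        #reset_index so the x axis is the boosting round of this model, not the row position in CVResults
        data=CVResults[( (CVResults['Model']==row['Model']) & (CVResults['ind']==index))].reset_index(drop=True)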
        ax=data[['train-%s-mean'%score,'valid-%s-mean'%score]].plot(title=row['Model']+'-'+str(index))
        ax.fill_between(data.index.values, (data['train-%s-mean'%score].values-data['train-%s-sem'%score].values), (data['train-%s-mean'%score].values + data['train-%s-sem'%score].values), color='b', alpha=.1)
        ax.fill_between(data.index.values, (data['valid-%s-mean'%score].values-data['valid-%s-sem'%score].values), (data['valid-%s-mean'%score].values + data['valid-%s-sem'%score].values), color='r', alpha=.1)
        chart_filename=temp_folder+'train-valid scores %s-%s.png'%(row['Model'],index)
        lst_chart_filenames.append(chart_filename)        
        ax.figure.savefig(chart_filename,format='png')
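The train/validation curves can also be summarized numerically. A sketch, assuming a per-round frame for one model with the same `train-<score>-mean` / `valid-<score>-mean` column naming as `CVResults` (score taken as `gini` here, numbers made up): it finds the round with the best mean validation score and the train-minus-validation gap at that round and at the last round. A gap that keeps growing while the validation score stalls or falls is the usual sign of overfitting.

```python
import pandas as pd

score = "gini"
# Made-up per-round means for one model (index = boosting round)
curves = pd.DataFrame({
    f"train-{score}-mean": [0.30, 0.36, 0.41, 0.45, 0.49, 0.52],
    f"valid-{score}-mean": [0.29, 0.33, 0.35, 0.36, 0.355, 0.35],
})

gap = curves[f"train-{score}-mean"] - curves[f"valid-{score}-mean"]
# Gini/AUC: higher is better, so the best round is the argmax of the validation mean
best_round = int(curves[f"valid-{score}-mean"].idxmax())
summary = {
    "best_round": best_round,
    "best_valid": curves.loc[best_round, f"valid-{score}-mean"],
    "gap_at_best": gap[best_round],
    "gap_at_last": gap.iloc[-1],
}
print(summary)
```

For error-type metrics such as logloss, where lower is better, `idxmin` would replace `idxmax`.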
  

In [None]:
#Saving into the Experiment log file models results
eu.SaveToExperimentLog(Experiments_file, '%s CVResults'%Experiment_name, CVResults.tail(10))
eu.SaveChartToExperimentLog(Experiments_file, '%s CVResults'%Experiment_name, 10, 20, lst_chart_filenames)

In [None]:
#Saving models artifacts into the Experiment Log file
ModelsFiles=trial_ds[['Model','fold','ind','SageMaker.ModelArtifact - Value']]
ModelsFiles.columns=['Model','fold','ind','ModelFile']
ModelsFiles=ModelsFiles.sort_values(['Model','ind','fold'], ascending=[False,True,True])
ModelsFiles['ind']=pd.cut(ModelsFiles['ind'],bins=bins,labels=bin_labels)
eu.SaveToExperimentLog(Experiments_file, '%s ModelFiles'%Experiment_name, ModelsFiles)