### AWS SageMaker Experiments Cost
Purpose: calculate the cost of experiments based on the time spent on processing, training, and inference.
The code loops through the configured list of experiments, reads the names of the related jobs, and extracts the run time, instance type, and number of instances for each.
The total cost of the experiments is then calculated from the configured price per hour for each instance type.
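
As a quick illustration of the arithmetic, here is a minimal sketch of the per-job cost: instance count × run time in hours × hourly price. The helper name `job_cost` is made up for this example. The numbers are those of the first training job in the summary table further down (2168 billable seconds on one `ml.c5.xlarge` at 0.238 per hour).

```python
def job_cost(instance_count, seconds, price_per_hour):
    """Cost of one job: instance-hours times the hourly price (hypothetical helper)."""
    hours = instance_count * seconds / 3600
    return hours * price_per_hour

# First training job from the summary table: 1 instance, 2168 s, 0.238 per hour
cost = job_cost(instance_count=1, seconds=2168, price_per_hour=0.238)
print(round(cost, 6))  # 0.143329
```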

In [31]:
Experiments_file='/home/kate/Research/YearBuilt/Experiments/DevExperiments.xlsx'
#Experiments_tab columns: #, Experiment, plus other columns not related to this specific process
Experiments_tab='Experiments'
#https://aws.amazon.com/sagemaker/pricing/
cost_map = {'ml.c5.xlarge': 0.238, 'ml.t3.large': 0.1165,'ml.t3.2xlarge':0.4659,'ml.m5.xlarge':0.269}

In [32]:
import sys
import boto3
import time
import pandas as pd
import numpy as np
import sagemaker
from smexperiments.experiment import Experiment

from smexperiments.trial import Trial
from smexperiments.trial_component import TrialComponent

In [33]:
#sys.path.append('/home/kate/Research/YearBuilt/Notebooks/Experiments')
import ExperimentsUtils as eu

1. Reading experiments from an Excel file

In [34]:
#read_excel accepts the path directly, so no file handle is left open
Experiments=pd.read_excel(Experiments_file, sheet_name=Experiments_tab)

In [35]:
Experiments   

Unnamed: 0,#,Experiment,Objective,Status,Result,Dataset,Target
0,1,eta,eta research,Done,0.018 is good enough but 0.02 is better,dwelling_basedata_v4.csv,hasclaim_water
1,2,max-depth,max_depth research,Done,max_depth 3 reduces overfitting visible,dwelling_basedata_v4.csv,hasclaim_water
2,3,bytree,colsample_bytree research,Done,0.8 has minimum overfitting,dwelling_basedata_v4.csv,hasclaim_water
3,4,weight,scale_pos_weight research,Done,Original 0.3 is the best,dwelling_basedata_v4.csv,hasclaim_water
4,5,bylevel,colsample_bylevel research,Done,0.6,dwelling_basedata_v4.csv,hasclaim_water
5,6,subsample,,Done,0.6,dwelling_basedata_v4.csv,hasclaim_water
6,7,weight2,extream vaalues of scale_pos_weight research,Done,Only overfoitting with large numbers of the pa...,dwelling_basedata_v4.csv,hasclaim_water
7,8,delta,max_delta_step,Done,1 - not enough and 10 is too much,dwelling_basedata_v4.csv,hasclaim_water
8,9,delta2,Need to check values between 1 and 10,Done,"no difference between 2 and 10. Overall, scale...",dwelling_basedata_v4.csv,hasclaim_water
9,10,bf,Best Features round 1,Done,"WaterScore, OtherPolicies, ProtectionClass and...",dwelling_basedata_v4.csv,hasclaim_water


In [36]:
#should be run as a first step
#role arn is used when run from a local machine
role = 'arn:aws:iam::757107622481:role/service-role/AmazonSageMaker-ExecutionRole-20200819T131882'
region = 'us-west-2'
sm_sess = sagemaker.session.Session()
sm = boto3.Session().client('sagemaker')

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials


In [37]:
#Experiments summaries
from sagemaker.analytics import ExperimentAnalytics

2. Extracting run time for each job registered in the experiments
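
The job type and job name used in this step both come from the `SourceArn` of each trial component. A minimal sketch of that parsing, assuming the usual ARN layout `arn:aws:sagemaker:<region>:<account>:<job-type>/<job-name>`; the helper name `parse_source_arn` and the account id in the example ARN are placeholders, not values from this project.

```python
def parse_source_arn(source_arn):
    """Split a SageMaker job ARN into (job_type, job_name) (hypothetical helper)."""
    # The resource part follows the fifth colon: '<job-type>/<job-name>'
    resource = source_arn.split(':', 5)[5]
    job_type, _, job_name = resource.partition('/')
    return job_type, job_name

# Example ARN with a placeholder account id
arn = 'arn:aws:sagemaker:us-west-2:123456789012:training-job/basemodel-0-2021-02-02-23-57-21'
job_type, job_name = parse_source_arn(arn)
print(job_type, job_name)
```

The returned job type (`training-job`, `processing-job` or `transform-job`) decides which `describe_*` call is needed for the job.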

In [38]:
experiments_l = list()
job_l = list()
job_type_l = list()
TransformingTimeInSeconds_l = list()
ProcessingTimeInSeconds_l = list()
BillableTimeInSeconds_l = list()
TrainingTimeInSeconds_l = list()
InstanceType_l = list()
InstanceCount_l = list()
for e in Experiments['Experiment']:
    print('Processing experiment: %s'%e)
    analytics = ExperimentAnalytics(experiment_name=e)
    analytics_ds = analytics.dataframe()
    if analytics_ds.empty:
        continue
    for j in analytics_ds['SourceArn'].values:
        experiments_l.append(e)
        job_name=j[j.index('/')+1:len(j)]
        job_l.append(job_name)
        if 'training-job' in j:
            job_type_l.append('training-job')
            job_desc=sm.describe_training_job(TrainingJobName=job_name)
            try:
                BillableTimeInSeconds_l.append(job_desc['BillableTimeInSeconds'])
            except:
                BillableTimeInSeconds_l.append(0)
            try:
                TrainingTimeInSeconds_l.append(job_desc['TrainingTimeInSeconds'])
            except:
                TrainingTimeInSeconds_l.append(0)
            try:
                InstanceType_l.append(job_desc['ResourceConfig']['InstanceType'])
            except:
                InstanceType_l.append(0)
            try:
                InstanceCount_l.append(job_desc['ResourceConfig']['InstanceCount'])
            except:
                InstanceCount_l.append(0)
            ProcessingTimeInSeconds_l.append(0)
            TransformingTimeInSeconds_l.append(0)            
        elif 'processing-job' in j:
            job_type_l.append('processing-job')        
            job_desc=sm.describe_processing_job(ProcessingJobName=job_name)
            try:
                duration = job_desc['ProcessingEndTime'] - job_desc['ProcessingStartTime']
                #total_seconds() keeps days and small microsecond values correct
                ProcessingTimeInSeconds_l.append(duration.total_seconds())
            except KeyError:
                #start or end time is missing, e.g. the job has not finished
                ProcessingTimeInSeconds_l.append(0)
            try:
                InstanceType_l.append(job_desc['ProcessingResources']['ClusterConfig']['InstanceType'])
            except:
                InstanceType_l.append(0)
            try:
                InstanceCount_l.append(job_desc['ProcessingResources']['ClusterConfig']['InstanceCount'])
            except:
                InstanceCount_l.append(0)
            BillableTimeInSeconds_l.append(0)
            TrainingTimeInSeconds_l.append(0)      
            TransformingTimeInSeconds_l.append(0)
        elif 'transform-job' in j:
            job_type_l.append('transform-job')        
            job_desc=sm.describe_transform_job(TransformJobName=job_name)
            try:
                duration = job_desc['TransformEndTime'] - job_desc['TransformStartTime']
                #total_seconds() keeps days and small microsecond values correct
                TransformingTimeInSeconds_l.append(duration.total_seconds())
            except KeyError:
                #start or end time is missing, e.g. the job has not finished
                TransformingTimeInSeconds_l.append(0)
            try:
                InstanceType_l.append(job_desc['TransformResources']['InstanceType'])
            except:
                InstanceType_l.append(0)
            try:
                InstanceCount_l.append(job_desc['TransformResources']['InstanceCount'])
            except:
                InstanceCount_l.append(0)
            BillableTimeInSeconds_l.append(0)
            TrainingTimeInSeconds_l.append(0)  
            ProcessingTimeInSeconds_l.append(0)
JobsSummary = pd.DataFrame(list(zip(experiments_l, job_l, job_type_l, ProcessingTimeInSeconds_l, BillableTimeInSeconds_l, TrainingTimeInSeconds_l, TransformingTimeInSeconds_l, InstanceType_l, InstanceCount_l)), 
columns =['Experiment','Job Name', 'Job Type', 'ProcessingTimeInSeconds', 'BillableTimeInSeconds', 'TrainingTimeInSeconds', 'TransformingTimeInSeconds','InstanceType', 'InstanceCount'])
JobsSummary

INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials

Processing experiment: eta
Processing experiment: max-depth
Processing experiment: bytree
Processing experiment: weight
Processing experiment: bylevel
Processing experiment: subsample
Processing experiment: weight2
Processing experiment: delta
Processing experiment: delta2
Processing experiment: bf
Processing experiment: bf1
Processing experiment: bf2
Processing experiment: Fold1Eval
Processing experiment: AllFoldsEval
Processing experiment: pd
Processing experiment: prediction


Unnamed: 0,Experiment,Job Name,Job Type,ProcessingTimeInSeconds,BillableTimeInSeconds,TrainingTimeInSeconds,TransformingTimeInSeconds,InstanceType,InstanceCount
0,max-depth,basemodel-0-2021-02-02-23-57-21,training-job,0.000,2168,2168,0.000,ml.c5.xlarge,1
1,max-depth,basemodel-1-2021-02-02-23-57-23,training-job,0.000,1890,1890,0.000,ml.c5.xlarge,1
2,max-depth,basemodel-7-2021-02-02-23-57-36,training-job,0.000,1707,1707,0.000,ml.c5.xlarge,1
3,max-depth,basemodel-3-2021-02-02-23-57-27,training-job,0.000,1740,1740,0.000,ml.c5.xlarge,1
4,max-depth,basemodel-5-2021-02-02-23-57-32,training-job,0.000,1590,1590,0.000,ml.c5.xlarge,1
...,...,...,...,...,...,...,...,...,...
166,pd,sagemaker-scikit-learn-2021-02-28-19-03-47-082,processing-job,183.260,0,0,0.000,ml.t3.2xlarge,1
167,pd,propertyagefold0-2021-02-28-18-56-39-119,transform-job,0.000,0,0,231.442,ml.m5.xlarge,16
168,pd,sagemaker-scikit-learn-2021-02-28-18-40-11-992,processing-job,747.462,0,0,0.000,ml.t3.2xlarge,1
169,prediction,propertyagefold0-2021-02-28-19-18-25-542,transform-job,0.000,0,0,121.821,ml.m5.xlarge,5
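
A note on turning a `timedelta` (end time minus start time) into seconds: joining `seconds` and `microseconds` as strings misreads microsecond values below 100000, because the leading zeros are lost, and it also ignores the `days` part. `timedelta.total_seconds()` avoids both problems. A minimal sketch with made-up durations:

```python
from datetime import timedelta

# Made-up duration: 183 seconds and 82 milliseconds
duration = timedelta(seconds=183, microseconds=82000)

# String joining drops the leading zero of the microseconds: '183' + '.' + '82000'
joined = float(str(duration.seconds) + '.' + str(duration.microseconds))
exact = duration.total_seconds()
print(joined, exact)  # 183.82 183.082

# A job longer than a day: .seconds alone loses the full day
long_run = timedelta(days=1, seconds=60)
print(long_run.seconds, long_run.total_seconds())  # 60 86460.0
```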


3. Adding the hourly instance price and calculating the total experiments cost. Saving to the log file

In [44]:
#Instance-hours per job: only one of the three time columns is non-zero for any job.
#TrainingTimeInSeconds is not added because BillableTimeInSeconds already covers training.
JobsSummary['TotalTimeHrs']=JobsSummary['InstanceCount']*(JobsSummary['ProcessingTimeInSeconds']+JobsSummary['BillableTimeInSeconds']+JobsSummary['TransformingTimeInSeconds'])/3600
JobsSummary['PricePerHour']=JobsSummary['InstanceType'].map(cost_map)
JobsSummary['TotalPrice']=JobsSummary['TotalTimeHrs']*JobsSummary['PricePerHour']
JobsSummary.head()

Unnamed: 0,Experiment,Job Name,Job Type,ProcessingTimeInSeconds,BillableTimeInSeconds,TrainingTimeInSeconds,TransformingTimeInSeconds,InstanceType,InstanceCount,TotalTimeHrs,PricePerHour,TotalPrice
0,max-depth,basemodel-0-2021-02-02-23-57-21,training-job,0.0,2168,2168,0.0,ml.c5.xlarge,1,0.602222,0.238,0.143329
1,max-depth,basemodel-1-2021-02-02-23-57-23,training-job,0.0,1890,1890,0.0,ml.c5.xlarge,1,0.525,0.238,0.12495
2,max-depth,basemodel-7-2021-02-02-23-57-36,training-job,0.0,1707,1707,0.0,ml.c5.xlarge,1,0.474167,0.238,0.112852
3,max-depth,basemodel-3-2021-02-02-23-57-27,training-job,0.0,1740,1740,0.0,ml.c5.xlarge,1,0.483333,0.238,0.115033
4,max-depth,basemodel-5-2021-02-02-23-57-32,training-job,0.0,1590,1590,0.0,ml.c5.xlarge,1,0.441667,0.238,0.105117
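
One thing to watch in this step: `Series.map` with a dict returns NaN for any value that is not a key of the dict, and `sum()` skips NaN by default, so a job on an instance type missing from `cost_map` would silently drop out of the total. A minimal sketch of a check, using made-up rows; the `ml.p3.2xlarge` row is illustrative only, and `0` stands for the placeholder the extraction loop appends when the instance type cannot be read.

```python
import pandas as pd

# Shortened price map, for the example only
cost_map = {'ml.c5.xlarge': 0.238, 'ml.m5.xlarge': 0.269}

# Made-up rows: one priced type, one unpriced type, one placeholder 0
jobs = pd.DataFrame({
    'InstanceType': ['ml.c5.xlarge', 'ml.p3.2xlarge', 0],
    'TotalTimeHrs': [0.5, 1.0, 0.0],
})
jobs['PricePerHour'] = jobs['InstanceType'].map(cost_map)

# Instance types that received no price and would be ignored by sum()
unpriced_types = jobs.loc[jobs['PricePerHour'].isna(), 'InstanceType'].unique().tolist()
print(unpriced_types)
```

If the list is not empty, the missing prices can be added to `cost_map` before the totals are computed.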


In [47]:
#Saving into the Experiment log 
eu.SaveToExperimentLog(Experiments_file, 'Experiments Cost', JobsSummary)

## Total experiments time if the instances were run sequentially, not in parallel

In [46]:
JobsSummary['TotalTimeHrs'].sum()

38.48246888888889

## Total experiments cost

In [45]:
JobsSummary['TotalPrice'].sum()

9.206323516666666