# Predicting CPU performance (Multimodel use case)

### Description
This notebook shows how to build a set of models from a single query. Each model corresponds to one unique value of a chosen column.

The data comes from the following query against the table box.stat.unix.dstatLt1:

response = serrea_api.query(query="from box.stat.unix.dstatLt1 select ifthenelse(swapUsed>0,swapUsed/(swapUsed+swapFree)*100, 0.0)"
                           " as _swap select memUsed / (memUsed + memBuff + memCach + memFree) * 100 as _mem group every "
                           "1m by machine every 1h select avg(100 - cpuIdl) as cpu select avg(_mem) as mem select avg(_swap) "
                           "as swap select avg(cpuUsr) as user select avg(cpuSys) as system select avg(dskRead) as dskRead "
                           "select avg(dskWrit) as dskWrit select avg(netRecv) as netRecv select avg(netSend) as netSend "
                           "select avg(loadOne) as loadOne select avg(loadFiv) as loadFiv select avg(loadFif) as loadFif "
                           "where machine = 'aws-api-euw1-54-155-137-86' or machine = 'aws-api-euw1-54-155-154-74' "
                           "or machine = 'aws-api-euw1-54-155-156-119' or machine = 'aws-apiodata-euw1-52-18-211-116' "
                           "or machine = 'aws-apiodata-euw1-52-49-216-97' or machine = 'aws-asilo-euw1-54-74-172-17' "
                           "or machine = 'aws-batracentral-euw1-54-155-147-61' "
                           "or machine = 'aws-batracentral-euw1-54-228-243-123' or machine = 'aws-batracentral-euw1-54-73-163-217' "
                           "or machine = 'aws-datanode-euw1-54-155-73-49' or machine = 'aws-datanode-euw1-54-170-191-194' "
                           "or machine = 'aws-datanode-euw1-54-170-249-16' or machine = 'aws-datanode-euw1-54-170-33-138' "
                           "or machine = 'aws-datanode-euw1-54-170-54-24' or machine = 'aws-datanode-euw1-54-170-61-12' "
                           "or machine = 'aws-datanode-euw1-54-216-128-44' or machine = 'aws-datanode-euw1-54-216-136-88' "
                           "or machine = 'aws-datanode-euw1-54-216-160-50' or machine = 'aws-datanode-euw1-54-216-180-152' "
                           "or machine = 'aws-datanode-euw1-54-216-189-12'",
                     dates= {'from': "2017-02-01 00:00:00", 'to': "2017-05-01 00:00:00"},
                     response="csv",
                     stream=False)
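
The machine filter in the query above is a long hand-written chain of `or machine = '...'` conditions, which is easy to mistype. As a minimal sketch, the clause could be generated from a Python list instead. The helper name `build_machine_filter` is hypothetical, and the sketch assumes that machine names never contain a single quote.

```python
def build_machine_filter(machines):
    """Return a 'where' clause that matches any of the given machine names.

    Hypothetical helper: assumes names contain no single quotes,
    so no escaping is attempted.
    """
    if not machines:
        raise ValueError("at least one machine name is required")
    conditions = ["machine = '{}'".format(name) for name in machines]
    return "where " + " or ".join(conditions)


# Two of the machine names used in the query above
machines = [
    "aws-api-euw1-54-155-137-86",
    "aws-api-euw1-54-155-154-74",
]
where_clause = build_machine_filter(machines)
print(where_clause)
```

The resulting string could then be appended to the rest of the query text before it is passed as the `query` argument.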
                     
We will build one model per machine to predict that machine's CPU usage (the cpu column).
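
The "one model per unique column value" pattern can be illustrated without H2O. The sketch below is only an assumption-laden stand-in: the "model" for each machine is the mean of its `cpu` values, the host names are made up, and `fit_per_group` is a hypothetical helper. The notebook itself trains a gradient boosting model per machine instead.

```python
from collections import defaultdict
from statistics import mean


def fit_per_group(rows, key, target):
    """Group rows by `key` and fit one stand-in model per group.

    The stand-in model is simply the mean of `target` within the group.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[target])
    return {name: mean(values) for name, values in groups.items()}


# Made-up rows for illustration only
rows = [
    {"machine": "host-a", "cpu": 10.0},
    {"machine": "host-a", "cpu": 30.0},
    {"machine": "host-b", "cpu": 50.0},
]
models = fit_per_group(rows, key="machine", target="cpu")
print(models)
```

The result is a dictionary keyed by machine name, which mirrors how the notebook stores one model file per machine.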


# H2O.ai

In [None]:
import os
ARTIFACTS_PATH = '../../artifacts/h2o/'
os.makedirs(ARTIFACTS_PATH, exist_ok=True) # Create path if not exists

In [None]:
import h2o
h2o.init()

In [None]:
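modelsdir = "multimodel/"
os.makedirs(modelsdir, exist_ok=True) # The models are moved here later; create the folder if it is missing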

import pandas as pd
df = pd.read_csv("../../data/cpu/training.csv")

y = 'cpu'
x = ['mem', 'swap', 'system', 'dskRead', 'dskWrit', 'loadOne', 'loadFiv', 'loadFif', 'netRecv', 'netSend']


## Creating Models
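
The training cell below splits each machine's rows by position: the first 60% for training, the next 20% for validation, and the remainder for testing. The index arithmetic can be sketched in plain Python as follows. The helper name `split_bounds` is hypothetical, and the split only preserves time order if the rows in the CSV are already sorted by time, which is an assumption here.

```python
def split_bounds(n_rows, train_pct=60, valid_pct=20):
    """Return (train, validation, test) row ranges for a positional split.

    Uses integer arithmetic, so the three ranges always cover
    every row exactly once.
    """
    if not 0 < train_pct + valid_pct < 100:
        raise ValueError("train_pct + valid_pct must be between 0 and 100")
    train_end = n_rows * train_pct // 100
    valid_end = n_rows * (train_pct + valid_pct) // 100
    return range(0, train_end), range(train_end, valid_end), range(valid_end, n_rows)


train_idx, valid_idx, test_idx = split_bounds(1000)
print(len(train_idx), len(valid_idx), len(test_idx))
```

Because the boundaries are computed with floor division, small frames can yield uneven parts; for example, 7 rows split into 4, 1, and 2.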

In [None]:

# First, remove any existing files from the models folder
import glob
import os
print('Erase existing files')
files = glob.glob(os.path.join(modelsdir, '*'))
for f in files:
    os.remove(f)

from h2o.estimators import H2OGradientBoostingEstimator

k = 0
mse = []
maxNumberMachines = 20
trainPercentage = 60
validationPercentage = 20
for i in df['machine'].unique()[0:maxNumberMachines]:
    print('Machine number: ' + str(k))
    k += 1
    print('Training model for machine ' + i + ':')
    reduced_df = df[df['machine'] == i]
    hf = h2o.H2OFrame(reduced_df)

    # Positional 60/20/20 split: rows are taken in file order, not shuffled
    n_rows = hf.shape[0]
    train_end = n_rows * trainPercentage // 100
    validation_end = n_rows * (trainPercentage + validationPercentage) // 100
    train = hf[range(0, train_end), :]
    validation = hf[range(train_end, validation_end), :]
    test = hf[range(validation_end, n_rows), :]  # held out; not used in this cell

    gbm_model = H2OGradientBoostingEstimator(distribution="gaussian", seed=42, nfolds=5, ntrees=100, learn_rate=0.1,
                                             sample_rate=0.6, col_sample_rate=0.7, ignore_const_cols=False)

    # One model per machine; the machine name doubles as the model id
    gbm_model.train(x=x, y=y, training_frame=train, validation_frame=validation, model_id=i)

    # Download the model as a MOJO, then move it into the models folder under the machine's name
    model_file = gbm_model.download_mojo(path=ARTIFACTS_PATH, get_genmodel_jar=False)
    os.rename(model_file, os.path.join(modelsdir, i + ".zip"))

Erase existing files
Machine number: 0
Training model for machine aws-api-euw1-54-155-137-86:
Parse progress: |█████████████████████████████████████████████████████████| 100%
gbm Model Build progress: |███████████████████████████████████████████████| 100%
Machine number: 1
Training model for machine aws-api-euw1-54-155-154-74:
Parse progress: |█████████████████████████████████████████████████████████| 100%
gbm Model Build progress: |███████████████████████████████████████████████| 100%
Machine number: 2
Training model for machine aws-api-euw1-54-155-156-119:
Parse progress: |█████████████████████████████████████████████████████████| 100%
gbm Model Build progress: |███████████████████████████████████████████████| 100%
Machine number: 3
Training model for machine aws-apiodata-euw1-52-18-211-116:
Parse progress: |█████████████████████████████████████████████████████████| 100%
gbm Model Build progress: |███████████████████████████████████████████████| 100%
Machine number: 4
Training model 