# Applied Machine Learning
## Inference
- Author: Lorien Pratt
- Copyright: Quantellia LLC 2019.  All Rights Reserved

This notebook runs machine learning inference on an already-trained model, using input data obtained from interactive front-end widgets.

### Setup

Load the interact library, which is used to create interactive widgets.

In [1]:
from __future__ import print_function
from ipywidgets import interact, interactive, fixed, interact_manual, Label, HBox, SliderStyle
import ipywidgets as widgets

Enable integration between Python and R using the [__`RPy2`__](https://rpy2.readthedocs.io/en/version_2.8.x/) Python package developed by Laurent Gautier and the rest of the rpy2 team.

In [2]:
%load_ext rpy2.ipython

In [3]:
import rpy2.rinterface 

In [4]:
%R my_initials<-"jing" # Set my initials to use for model and data files

array(['jing'], 
      dtype='|S4')

Start up R and H2O

In [5]:
# This is only needed the first time you run on this server
#%%R
#install.packages("h2o")

In [6]:
%R require(h2o) # This prints several startup messages; they are informational, not errors


----------------------------------------------------------------------

Your next step is to start H2O:
    > h2o.init()

For H2O package documentation, ask for help:
    > ??h2o

After starting H2O, you can use the Web UI at http://localhost:54321
For more information visit http://docs.h2o.ai

----------------------------------------------------------------------


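Attaching package: ‘h2o’

The following objects are masked from ‘package:stats’:

    cor, sd, var

The following objects are masked from ‘package:base’:

    &&, %*%, %in%, ||, apply, as.factor, as.numeric, colnames,
    colnames<-, ifelse, is.character, is.factor, is.numeric, log,
    log10, log1p, log2, round, signif, trunc
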
array([1], dtype=int32)

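Optionally, create a stub R model inference function for testing without a trained model (left commented out; the real function is defined further down)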

In [7]:
#%%R
## stub function for testing
## This version of the MLmodel uses a vector of inputs instead of individual arguments
#MLmodelV<-function(a){return(a[1] + a[2])} 
#MLmodelV(c(3,2))

### Start H2O

In [8]:
# Note this will generate several messages; they are not errors
%R h2o.init()


H2O is not running yet, starting it now...

Note:  In case of errors look at the following log files:
    /tmp/RtmpTx4gKx/h2o_jupyter_started_from_r.out
    /tmp/RtmpTx4gKx/h2o_jupyter_started_from_r.err


Starting H2O JVM and connecting: . Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         2 seconds 818 milliseconds 
    H2O cluster timezone:       Etc/UTC 
    H2O data parsing timezone:  UTC 
    H2O cluster version:        3.26.0.10 
    H2O cluster version age:    5 days  
    H2O cluster name:           H2O_started_from_R_jupyter_mcy252 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   0.39 GB 
    H2O cluster total cores:    1 
    H2O cluster allowed cores:  1 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    H2O Internal Security:      FALSE 
    H2O API Extensions:         Amazon S3, XGBoost, Algos, AutoML, C

### Read in the model file from disk

In [9]:
%%R
# Build the model file path from my initials.
# Note that h2o.loadModel needs the path to the model file itself (the file named after
# the model ID inside the directory passed to h2o.saveModel), not just that directory.
model_filename <- paste0("models/", my_initials, "_auto_model/model_1")
model <- h2o.loadModel(model_filename)
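As I understand it, `h2o.saveModel(model, path)` writes the model into `path` as a file named after the model ID, which is why the load step needs the extra `model_1` component. The sketch below only shows how that path string is assembled from the initials; `model_file_path` and its default `model_id` are hypothetical names that mirror the R `paste0` call above, not part of H2O.

```python
def model_file_path(initials, model_id="model_1"):
    """Mirror paste0("models/", my_initials, "_auto_model/model_1") from the R cell.

    Hypothetical helper: the directory is the one that was passed to
    h2o.saveModel, and model_id is the file written inside it.
    """
    directory = "models/" + initials + "_auto_model"
    # Join with "/" explicitly so the result matches the R string on any OS
    return directory + "/" + model_id


print(model_file_path("jing"))  # models/jing_auto_model/model_1
```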

Define the `MLmodelV` function, which puts the input vector into a one-row data frame and converts that to an H2OFrame (`_hex`), the format `h2o.predict` needs to run a prediction (inference)

In [10]:
%%R
MLmodelV <- function(a){
    # Convert the input vector to a data frame, which is needed for conversion to an
    # H2OFrame (hex), which in turn is needed to make predictions.
    # The transpose (t) gives one row instead of one column in the data frame.
    test_data_df <- data.frame(t(matrix(a)))
    # Give the columns the same names as the training data. The model
    # object kindly stores these for us (see str(model) to inspect this).
    names(test_data_df) <- model@parameters$x
    # Convert from a data frame to the H2OFrame format needed for h2o prediction.
    # capture.output() swallows the progress bars that h2o normally prints
    # (without writing them to a stray file); invisible() keeps them from being echoed.
    invisible(capture.output(test_data_hex <- as.h2o(test_data_df)))
    # Run machine learning inference: what does this model predict for this data?
    invisible(capture.output(result <- h2o.predict(model, test_data_hex)))
    return(as.numeric(result[1, 1]))
}

# Testing code:
#str(model@parameters$x)
#model@parameters$x
##actual_column <- as.logical(as.vector(as.numeric(test_hex[ ,ncol(test_hex)])))
#actual_column <- as.vector(as.numeric(test_hex[targetcol]))
#predict_column <- as.vector(predictions[ ,'predict'])
#MLmodelV<-function(a){
#    return(a[1] + a[2])} 
#MLmodelV(c(3,2))
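The reshaping step in `MLmodelV` (`data.frame(t(matrix(a)))` followed by renaming the columns) turns a bare vector of seven numbers into one named row. A minimal Python analogue of that idea, using a plain dict as the "row"; `to_single_row` is a hypothetical helper, and the column names are an assumption taken from the `auto$...` fields mentioned later in this notebook, not read from `model@parameters$x`.

```python
def to_single_row(values, column_names):
    """Pair each input value with its training column name, giving one row.

    Analogue of data.frame(t(matrix(a))) plus names(...) <- model@parameters$x.
    """
    if len(values) != len(column_names):
        raise ValueError("expected %d values, got %d" % (len(column_names), len(values)))
    return dict(zip(column_names, values))


# Hypothetical column names, assumed from the auto dataset fields
columns = ["cylinders", "displacement", "horsepower", "weight",
           "acceleration", "model.year", "origin"]
row = to_single_row([4, 120, 90, 2500, 15.0, 76, 1], columns)
print(row["weight"])  # 2500
```

The length check matters because the R version silently relies on the input having exactly as many entries as there are training columns.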

In [11]:
## Only for testing
# Make a test data set to send to the model file, so we don't have to test with the UI
#test_data=[1,2,3,4,5,6,7]
#%Rpush test_data
#%R result<-MLmodelV(test_data);
#%R -o result 
#print(result)

In [12]:
## Only for testing
#%%R
#x<-h2o.predict(model, test_data_hex)

Create the Python function that interfaces with R: it packs the seven input values into a vector, pushes it to R, calls `MLmodelV`, and prints the estimated MPG

In [13]:
def f2(a,b,c,d,e,f,g):
    q=[a,b,c,d,e,f,g] # Create a vector from the individual arguments before sending to R
    %Rpush q
    %R result <- MLmodelV(q)
    %R -o result 
    print('Estimated mpg: ', result)
f2(1,2,4,5,6,7,8)

Estimated mpg:  [ 32.50120676]
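Because `f2` depends on R and a running H2O cluster, a pure-Python stand-in can be handy when working on the UI alone. This sketch ports the commented-out R stub `MLmodelV` from earlier (which just adds the first two inputs; R's `a[1] + a[2]` becomes `a[0] + a[1]` with Python's 0-based indexing). `mlmodel_stub` and `f2_stub` are hypothetical names, and the number they return is not an MPG estimate.

```python
from __future__ import print_function


def mlmodel_stub(a):
    """Stand-in for the R stub MLmodelV: returns the sum of the first two inputs."""
    return a[0] + a[1]


def f2_stub(a, b, c, d, e, f, g):
    q = [a, b, c, d, e, f, g]  # same packing as f2
    result = mlmodel_stub(q)
    print('Estimated mpg (stub): ', result)
    return result


f2_stub(1, 2, 4, 5, 6, 7, 8)  # prints "Estimated mpg (stub):  3"
```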


### Set up the UI to access the ML model

Create styles to be used for the layout

In [14]:
# Keep the description labels from being truncated
# See http://www.blog.pythonlibrary.org/2018/10/24/working-with-jupyter-notebook-widgets/
# (style = {'description_width': 'initial'} would size each label to fit its text instead)
style = {'description_width': '300px'} # width of the box for the description
layout=widgets.Layout(width='600px')   # Size of the entire box for the widget

Create widgets (title and sliders) for the UI

In [15]:
# Here's the result of analyzing each field to see what its range is. We'll use this to determine
# what the right range for inference sliders is
#print(summary(auto$cylinders)) # we'll call this 3-8 on an integer range
#print(summary(auto$displacement)) # we'll call this 0 - 500 on an integer range
#print(summary(auto$horsepower)) # we'll call this 0 - 500 on an integer range
#print(summary(auto$weight)) # we'll call this 1000 - 10000 on an integer range
#print(summary(auto$acceleration)) # we'll call this 0-100 on a float range
#print(summary(auto$model.year)) # we'll call this 60-90 on an integer range
#print(summary(auto$origin)) # we'll call this 1,2,3 on an integer range
auto_title_widget=widgets.HTML('<h2>Use a machine learning model to estimate car miles per gallon (MPG)</h2>',layout=layout)
auto_cylinders_widget=widgets.IntSlider(min=3,max=8, description='cylinders',style=style, 
                                        layout=layout)
auto_displacement_widget=widgets.IntSlider(min=0,max=500,description='engine displacement (cu inches)',style=style,
                                          layout=layout)
auto_horsepower_widget=widgets.IntSlider(min=0,max=500,description='horsepower',style=style,
                                          layout=layout)
auto_weight_widget=widgets.IntSlider(min=1000,max=10000,description='weight (in pounds)',style=style,
                                          layout=layout)
auto_acceleration_widget=widgets.FloatSlider(min=0,max=100,description='acceleration (time (sec) from 0-60)',style=style,
                                          layout=layout)
auto_modelyear_widget=widgets.IntSlider(min=60,max=90,description='model year',style=style,
                                          layout=layout)
auto_origin_widget=widgets.IntSlider(min=1,max=3,description='origin (1=American, 2=European, 3=Japanese)',style=style,
                                          layout=layout)
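The slider ranges chosen in the comments above can also be kept in one table, which makes it easy to check that inputs sent to the model (for example, when calling `f2` directly instead of through the sliders) stay inside the ranges the UI allows. A minimal sketch; `FEATURE_RANGES` and `out_of_range` are hypothetical names, and the ranges are copied from the comments in the cell above.

```python
# (name, min, max) for each input, in the order f2 expects them.
# Hypothetical table; ranges copied from the slider comments above.
FEATURE_RANGES = [
    ("cylinders", 3, 8),
    ("displacement", 0, 500),
    ("horsepower", 0, 500),
    ("weight", 1000, 10000),
    ("acceleration", 0, 100),
    ("model_year", 60, 90),
    ("origin", 1, 3),
]


def out_of_range(values):
    """Return the names of inputs that fall outside their slider range."""
    if len(values) != len(FEATURE_RANGES):
        raise ValueError("expected %d values, got %d" % (len(FEATURE_RANGES), len(values)))
    bad = []
    for (name, lo, hi), v in zip(FEATURE_RANGES, values):
        if not (lo <= v <= hi):
            bad.append(name)
    return bad


print(out_of_range([4, 120, 90, 2500, 15.0, 76, 1]))  # []
print(out_of_range([12, 120, 90, 500, 15.0, 76, 1]))  # ['cylinders', 'weight']
```

If the sliders were built from the same table, their limits and this check could not drift apart.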

Assemble the widgets into a UI

In [16]:
# Create a list of widgets to be passed to the UI below
widget_list = [ auto_title_widget,
                   auto_cylinders_widget,
                   auto_displacement_widget,
                   auto_horsepower_widget,
                   auto_weight_widget,
                   auto_acceleration_widget,
                   auto_modelyear_widget,
                   auto_origin_widget
                  ]

# Organize the widgets into a box
# Make the layout a bit wider than the widgets to avoid a horizontal scroll bar
ui = widgets.GridBox( widget_list,layout=widgets.Layout(width='800px') ) 

out = widgets.interactive_output(f2, {'a': auto_cylinders_widget, 
                                      'b': auto_displacement_widget,
                                      'c': auto_horsepower_widget,
                                      'd': auto_weight_widget,
                                      'e': auto_acceleration_widget,
                                      'f': auto_modelyear_widget,
                                      'g': auto_origin_widget
                                     })
# Other arguments I might think about using for the grid box
#widgets.GridBox(items, layout=widgets.Layout(grid_template_columns="repeat(3, 100px)"))
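`widgets.interactive_output(f2, controls)` calls `f2` with keyword arguments built from the `value` of each widget in the dictionary (so `'a'` receives the cylinders slider's value), runs it once immediately, and runs it again whenever any of those values changes, showing the printed result in `out`. The sketch below imitates only a single call with plain objects, so the argument mapping can be checked without a browser. `FakeWidget`, `call_with_widget_values`, and `f2_args` are hypothetical helpers, not part of ipywidgets.

```python
class FakeWidget(object):
    """Minimal stand-in exposing only the .value attribute a slider would have."""
    def __init__(self, value):
        self.value = value


def call_with_widget_values(f, controls):
    # Build keyword arguments from each widget's current value, then call f once
    kwargs = dict((name, w.value) for name, w in controls.items())
    return f(**kwargs)


def f2_args(a, b, c, d, e, f, g):
    return [a, b, c, d, e, f, g]  # same packing as f2, without calling R


controls = {'a': FakeWidget(4), 'b': FakeWidget(120), 'c': FakeWidget(90),
            'd': FakeWidget(2500), 'e': FakeWidget(15.0), 'f': FakeWidget(76),
            'g': FakeWidget(1)}
print(call_with_widget_values(f2_args, controls))  # [4, 120, 90, 2500, 15.0, 76, 1]
```

Because the arguments are passed by keyword, the order of entries in the dictionary does not matter; only the keys must match `f2`'s parameter names.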

Turn off progress bars in H2O, so they won't show up during inference

In [17]:
%%R
h2o.no_progress()

Invoke the UI, which combines all of the above elements

In [18]:
display(ui, out)



Estimated mpg:  [ 31.56969041]
