# Fundamentals of SmartSim and Online Inferencing of Machine Learning Models

In this lab we learn the fundamental concepts of SmartSim and how to run online inference with a machine learning model.

We start by importing the Experiment class from SmartSim.

In [8]:
from smartsim import Experiment

# Experiment
The Experiment acts both as a factory class for constructing the stages of an experiment (Model, Ensemble, Orchestrator, etc.) and as an interface for interacting with the entities created by the experiment.

In [9]:
# Init Experiment and specify to launch locally
exp = Experiment(name="first-experiment", launcher="local")

# Model

Models are subclasses of SmartSimEntity and are created through the Experiment API. A Model can represent any computational kernel. Models are flexible enough to support many different applications; however, to be used with the SmartSim clients (SmartRedis), the application must be written in Python, C, C++, or Fortran.


## Run settings
Models are given RunSettings objects that specify how a kernel should be executed with regard to the workload manager (e.g. Slurm) and the available compute resources on the system.
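As a rough mental model only (this is not SmartSim's implementation), run settings with no run command on the local launcher amount to running the executable directly with its arguments. The sketch below uses only the Python standard library. The helper names build_command and run_locally are hypothetical, and it assumes an echo executable is available on the PATH.

```python
import subprocess


def build_command(exe, exe_args=None, run_command=None):
    """Assemble an argument list: [run_command] + [exe] + exe_args (hypothetical helper)."""
    if isinstance(exe_args, str):
        exe_args = exe_args.split()
    command = [exe] + list(exe_args or [])
    if run_command:
        # A workload-manager launcher (for example "srun") would be prepended here.
        command = [run_command] + command
    return command


def run_locally(command):
    """Run the command as a child process and return its captured standard output."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


command = build_command("echo", "hello!")
output = run_locally(command)
print(command)
print(output, end="")
```

SmartSim additionally tracks the job, writes its output to files, and integrates with the workload manager, none of which this sketch attempts.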

In [10]:
# settings to execute the command "echo hello!"
settings = exp.create_run_settings(exe="echo", exe_args="hello!", run_command=None)

# create the simple model instance so we can run it.
M1 = exp.create_model(name="tutorial-model", run_settings=settings)

Start the model in our experiment:

In [11]:
exp.start(M1, block=True, summary=True)

05:57:17 vm1 SmartSim[2492] INFO 

=== Launch Summary ===
Experiment: first-experiment
Experiment Path: /home/azureuser/first-experiment
Launcher: local
Models: 1
Database Status: inactive

=== Models ===
tutorial-model
Executable: /usr/bin/echo
Executable Arguments: hello!

05:57:29 vm1 SmartSim[2492] INFO tutorial-model(2698): Completed


Read the output and error files written by the model:

In [12]:
outputfile = './tutorial-model.out'
errorfile = './tutorial-model.err'

print("Content of tutorial-model.out:")
with open(outputfile, 'r') as fin:
    print(fin.read())
print("Content of tutorial-model.err:")
with open(errorfile, 'r') as fin:
    print(fin.read())

Content of tutorial-model.out:
hello!

Content of tutorial-model.err:
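The same read pattern can be wrapped in a small helper so it can be reused for every model in the lab. This is a minimal sketch: read_entity_logs is a hypothetical name, and it assumes the files are called name.out and name.err as in the cell above. The demonstration writes a throwaway file into a temporary directory so that it runs without SmartSim.

```python
import tempfile
from pathlib import Path


def read_entity_logs(name, directory="."):
    """Return (stdout_text, stderr_text) for <name>.out and <name>.err.

    A missing file is reported as an empty string instead of raising.
    """
    base = Path(directory)
    texts = []
    for suffix in (".out", ".err"):
        path = base / f"{name}{suffix}"
        texts.append(path.read_text() if path.is_file() else "")
    return tuple(texts)


# Demonstration with a temporary directory standing in for the experiment folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo-model.out").write_text("hello!\n")
    out_text, err_text = read_entity_logs("demo-model", tmp)

print(repr(out_text), repr(err_text))
```

In the notebook, read_entity_logs("tutorial-model") would read the two files shown above, provided they are in the current working directory.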



Run two models concurrently in the same experiment:

In [13]:
run_settings_1 = exp.create_run_settings(exe="echo", exe_args="hello!", run_command=None)
run_settings_2 = exp.create_run_settings(exe="sleep", exe_args="5", run_command=None)
model_1 = exp.create_model("tutorial-model-1", run_settings_1)
model_2 = exp.create_model("tutorial-model-2", run_settings_2)
exp.start(model_1, model_2)

05:57:43 vm1 SmartSim[2492] INFO tutorial-model-1(2703): Completed
05:57:46 vm1 SmartSim[2492] INFO tutorial-model-2(2704): Running
05:57:47 vm1 SmartSim[2492] INFO tutorial-model-2(2704): Completed


In [14]:
outputfilenew = './tutorial-model-1.out'
outputfilenew1 = './tutorial-model-2.out'

print("Content of tutorial-model-1.out:")
with open(outputfilenew, 'r') as fin:
    print(fin.read())
print("Content of tutorial-model-2.out:")
with open(outputfilenew1, 'r') as fin:
    print(fin.read())

Content of tutorial-model-1.out:
hello!

Content of tutorial-model-2.out:



# Ensembles

SmartSim can launch an Ensemble, a group of Model applications, simultaneously.
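To make "simultaneously" concrete, the following standard-library sketch starts several copies of the same command before waiting on any of them, so the total wall time is close to that of a single copy. It is an analogy only and not how SmartSim launches ensembles; launch_replicas is a hypothetical helper, and a sleep executable is assumed to be on the PATH.

```python
import subprocess
import time


def launch_replicas(command, replicas):
    """Start every copy first, then wait for all; return (exit codes, elapsed seconds)."""
    start = time.perf_counter()
    processes = [subprocess.Popen(command) for _ in range(replicas)]
    exit_codes = [process.wait() for process in processes]
    return exit_codes, time.perf_counter() - start


# Four one-second sleeps run side by side, so this takes about one second, not four.
exit_codes, elapsed = launch_replicas(["sleep", "1"], replicas=4)
print(exit_codes, f"{elapsed:.2f}s")
```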

In [15]:
# define how we want each ensemble member to execute
# in this case we create settings to execute "sleep 3"
ens_settings = exp.create_run_settings(exe="sleep", exe_args="3")

In [16]:
ensemble = exp.create_ensemble("ensemble-replica",
                               replicas=4,
                               run_settings=ens_settings)

exp.start(ensemble, summary=True)

05:58:03 vm1 SmartSim[2492] INFO 

=== Launch Summary ===
Experiment: first-experiment
Experiment Path: /home/azureuser/first-experiment
Launcher: local
Ensembles: 1
Database Status: inactive

=== Ensembles ===
ensemble-replica
Members: 4
Batch Launch: False

05:58:18 vm1 SmartSim[2492] INFO ensemble-replica_0(2710): Completed
05:58:18 vm1 SmartSim[2492] INFO ensemble-replica_2(2712): Completed
05:58:19 vm1 SmartSim[2492] INFO ensemble-replica_1(2711): Completed
05:58:19 vm1 SmartSim[2492] INFO ensemble-replica_3(2713): Completed
05:58:20 vm1 SmartSim[2492] INFO ensemble-replica_1(2711): Completed
05:58:20 vm1 SmartSim[2492] INFO ensemble-replica_3(2713): Completed


In [17]:
ens_settings1 = exp.create_run_settings(exe="echo", exe_args="hello-world")


ensemble1 = exp.create_ensemble("ensemble-replica",
                               replicas=4,
                               run_settings=ens_settings1)

exp.start(ensemble1, summary=True)

05:58:38 vm1 SmartSim[2492] INFO 

=== Launch Summary ===
Experiment: first-experiment
Experiment Path: /home/azureuser/first-experiment
Launcher: local
Ensembles: 1
Database Status: inactive

=== Ensembles ===
ensemble-replica
Members: 4
Batch Launch: False

05:58:51 vm1 SmartSim[2492] INFO ensemble-replica_0(2719): Completed
05:58:51 vm1 SmartSim[2492] INFO ensemble-replica_1(2720): Completed
05:58:51 vm1 SmartSim[2492] INFO ensemble-replica_2(2721): Completed
05:58:53 vm1 SmartSim[2492] INFO ensemble-replica_3(2722): Completed


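The next section uses the SmartRedis client. If it is not already installed, install it from a shell first (the quotes keep the shell from interpreting the square brackets):

pip install "smartredis[dev]"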

# Orchestrator

The Orchestrator is an in-memory database that is launched prior to all other entities within an Experiment. The Orchestrator can be used to store and retrieve data during the course of an experiment and across multiple entities. In order to stream data into or receive data from the Orchestrator, one of the SmartSim clients (SmartRedis) has to be used within a Model.

In [18]:
from smartredis import Client
import numpy as np

REDIS_PORT=6899

In [19]:
# start a new Experiment for this section
exp = Experiment("tutorial-smartredis", launcher="local")

# create and start an instance of the Orchestrator database
db = exp.create_database(db_nodes=1,
                         port=REDIS_PORT,
                         interface="lo")
# create an output directory for the database log files
exp.generate(db)

# start the database
exp.start(db)

05:59:12 vm1 SmartSim[2492] INFO Working in previously created experiment


In [20]:
# connect a SmartRedis client at the address supplied by the launched
# Orchestrator instance.
# cluster=False because the Orchestrator was deployed on a single compute host (local)
client = Client(address=db.get_address()[0], cluster=False)

In [21]:
send_tensor = np.ones((4,3,3))

client.put_tensor("tutorial_tensor_1", send_tensor)

receive_tensor = client.get_tensor("tutorial_tensor_1")

print('Receive tensor:\n\n', receive_tensor)

Receive tensor:

 [[[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]]
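Rather than comparing the printed values by eye, the round trip can be checked programmatically. The sketch below is written against any object that offers put_tensor and get_tensor, which the SmartRedis Client used above does. InMemoryClient is a hypothetical stand-in, included only so that the example runs without a database, and round_trip_ok is likewise an illustrative name.

```python
import numpy as np


class InMemoryClient:
    """Hypothetical stand-in with put_tensor/get_tensor, backed by a plain dict."""

    def __init__(self):
        self._store = {}

    def put_tensor(self, name, data):
        self._store[name] = np.array(data, copy=True)

    def get_tensor(self, name):
        return self._store[name].copy()


def round_trip_ok(client, key, tensor):
    """Store a tensor, fetch it back, and compare shape, dtype, and values."""
    client.put_tensor(key, tensor)
    received = client.get_tensor(key)
    return bool(
        received.shape == tensor.shape
        and received.dtype == tensor.dtype
        and np.array_equal(received, tensor)
    )


demo_client = InMemoryClient()
ok = round_trip_ok(demo_client, "tutorial_tensor_1", np.ones((4, 3, 3)))
print(ok)
```

With the Orchestrator still running, passing the notebook's own client in place of demo_client should exercise the real database in the same way.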


In [22]:
exp.stop(db)

05:59:55 vm1 SmartSim[2492] INFO Stopping model orchestrator_0 with job name orchestrator_0-CNELZGS4HLQT
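If a cell fails between exp.start(db) and exp.stop(db), the database keeps running. One way to guard against that is a small context manager that always stops what it started. This is a sketch of a common Python pattern, not a SmartSim feature: running is a hypothetical helper that relies only on the start and stop methods used above, and RecordingExperiment is a stand-in so the example runs without SmartSim.

```python
from contextlib import contextmanager


@contextmanager
def running(exp, *entities):
    """Start the entities, hand them to the with-block, and always stop them afterwards."""
    exp.start(*entities)
    try:
        yield entities
    finally:
        exp.stop(*entities)


class RecordingExperiment:
    """Hypothetical stand-in that only records which calls were made."""

    def __init__(self):
        self.calls = []

    def start(self, *entities):
        self.calls.append(("start", entities))

    def stop(self, *entities):
        self.calls.append(("stop", entities))


demo_exp = RecordingExperiment()
try:
    with running(demo_exp, "db"):
        raise RuntimeError("simulated failure while the database is up")
except RuntimeError:
    pass

# stop is recorded even though the body of the with-block raised.
print(demo_exp.calls)
```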


# ML Model Inference Using SmartSim

Combined with the SmartRedis clients, the Orchestrator is capable of hosting and executing AI models written in Python on CPU or GPU. The Orchestrator supports models written with TensorFlow, PyTorch, TensorFlow-Lite, or models saved in the ONNX format (e.g. scikit-learn). Here we use TensorFlow.

In [1]:
# some helper libraries for the tutorial
import io
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import logging
import numpy as np

# import smartsim and smartredis
from smartredis import Client
from smartsim import Experiment

In [2]:
exp = Experiment("Inference-Tutorial", launcher="local")

In [3]:
db = exp.create_database(port=6780, interface="lo")
exp.start(db)

Now we can create a very simple fully connected network with the TensorFlow Keras API.

In [4]:
## TensorFlow and Keras
import tensorflow as tf
from tensorflow import keras
tf.get_logger().setLevel(logging.ERROR)

# create a simple Fully connected network in Keras
model = keras.Sequential(
    layers=[
        keras.layers.InputLayer(input_shape=(28, 28), name="input"),
        keras.layers.Flatten(input_shape=(28, 28), name="flatten"),
        keras.layers.Dense(128, activation="relu", name="dense"),
        keras.layers.Dense(10, activation="softmax", name="output"),
    ],
    name="FCN",
)

# Compile model with optimizer
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

After a model is created (trained or not), the graph of the model is frozen and saved to file so the client method client.set_model_from_file can load it into the database.

SmartSim includes a utility to freeze the graph of a TensorFlow or Keras model in smartsim.ml.tf. To use TensorFlow or Keras in SmartSim, specify TF as the argument for backend in the call to client.set_model or client.set_model_from_file.

In [5]:
client = Client(address=db.get_address()[0], cluster=False)

from smartsim.ml.tf import freeze_model

# SmartSim utility for Freezing the model and saving it to a file.
model_path, inputs, outputs = freeze_model(model, os.getcwd(), "fcn.pb")

# use the client created above to set the TensorFlow model;
# here the method for setting a model from a saved file is shown.
# The TensorFlow backend requires named inputs and outputs on the graph;
# this differs from PyTorch and ONNX.
client.set_model_from_file(
    "keras_fcn", model_path, "TF", device="CPU", inputs=inputs, outputs=outputs
)

# put a random input tensor into the database
input_data = np.random.rand(1, 28, 28).astype(np.float32)
client.put_tensor("input", input_data)

# run the Fully Connected Network model on the tensor we just put
# in and store the result of the inference at the "output" key
client.run_model("keras_fcn", "input", "output")

# get the result of the inference
pred = client.get_tensor("output")
print(pred)

[[0.13783866 0.08369803 0.10155473 0.08805773 0.0864891  0.16028616
  0.11356978 0.08432745 0.03862334 0.10555508]]
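The returned array holds one row of ten softmax outputs, so it can be read as a probability distribution over the ten classes. The sketch below copies the values printed above into a NumPy array (so it runs without a database) and extracts the most likely class. Because the network was never trained and the input is random, the probabilities are all fairly close to one another and the chosen class carries no meaning.

```python
import numpy as np

# Values copied from the output printed above.
pred = np.array([[0.13783866, 0.08369803, 0.10155473, 0.08805773, 0.0864891,
                  0.16028616, 0.11356978, 0.08432745, 0.03862334, 0.10555508]])

total = float(pred.sum())                          # softmax outputs sum to about 1
predicted_class = int(np.argmax(pred, axis=1)[0])  # index of the largest probability
confidence = float(pred[0, predicted_class])

print(f"sum={total:.4f} class={predicted_class} confidence={confidence:.4f}")
```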
