# MDML Tutorial
The Manufacturing Data and Machine Learning platform (MDML) is a set of open-source tools that help researchers stream, visualize, and analyze data in near real time. It provides infrastructure for common tasks involving sensor and experiment data.

## MDML Client
You interact with an MDML instance through the Python client, which is installed with `pip install mdml_client`. The client uses an `Experiment` object for all communication with the MDML instance. A few [helper functions](#helper_functions) outside of this class help with certain tasks.

In [21]:
# Importing libraries used in this tutorial
import time
import random
import mdml_client as mdml
from funcx.sdk.client import FuncXClient

## Connecting to an MDML instance
The first step in using the MDML is connecting to it, which is done by creating an `Experiment` object. Below we create this object using credentials already configured on the MDML instance. The username and password are both `tutorial`. The experiment ID is `TUTORIAL`; this ID is unique and is what separates data from different experiments. We build a dictionary containing all of the connection information because it will come in handy later when we run an analysis.

In [22]:
# Connection parameters
params = {
    "EXPERIMENT_ID": "TUTORIAL", 
    "USERNAME": "tutorial",
    "PASSWORD": "tutorial",
    "MDML_HOST": "52.4.135.44"
}

# Connecting to MDML
exp = mdml.experiment(params["EXPERIMENT_ID"], params["USERNAME"], params["PASSWORD"], params["MDML_HOST"])
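
Hard-coding credentials is fine for the shared tutorial account, but for a real instance you may prefer to read them from the environment. The sketch below is one possible way to build the same `params` dictionary. The `load_params` helper and the `MDML_*` environment variable names are hypothetical and are not part of `mdml_client`; unset variables fall back to the tutorial values shown above.

```python
import os

# Tutorial defaults, copied from the connection parameters above.
DEFAULTS = {
    "EXPERIMENT_ID": "TUTORIAL",
    "USERNAME": "tutorial",
    "PASSWORD": "tutorial",
    "MDML_HOST": "52.4.135.44",
}

# Hypothetical environment variable for each key (an assumption of this sketch).
ENV_NAMES = {
    "EXPERIMENT_ID": "MDML_EXPERIMENT_ID",
    "USERNAME": "MDML_USERNAME",
    "PASSWORD": "MDML_PASSWORD",
    "MDML_HOST": "MDML_HOST",
}

def load_params(env=None):
    """Build the connection dictionary, preferring environment values."""
    env = os.environ if env is None else env
    return {key: env.get(ENV_NAMES[key], default) for key, default in DEFAULTS.items()}

params = load_params()
print(params["EXPERIMENT_ID"])
```

The resulting dictionary has the same keys as the hand-written one, so it can be passed to the same calls.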

## Configuring an experiment
The MDML knows what to do with your streamed data because of the experiment's configuration file. This file can be created manually. However, for the purposes of this tutorial, we will let the MDML create the configuration for us and ignore its syntax. In either case, adding your configuration is done in two steps. The first step adds the config locally and runs some checks on the syntax. The second step actually sends the configuration to the MDML instance. Any data streamed before the experiment's configuration has been ingested by the MDML will be ignored.    

In [23]:
# Add experiment configuration file locally. The second parameter is the experiment run ID. 
exp.add_config(experiment_run_id="notebook_tutorial", auto=True)

# Sending config to MDML
exp.send_config()

True
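
Because data sent before the configuration is ingested is ignored, it can be worth checking the result of `exp.send_config()` before streaming. The cell above printed `True`, which suggests the call returns a boolean on success; that is an assumption here, as is the hypothetical `retry_until_true` helper. A stand-in function replaces the real call so the sketch runs without an MDML instance.

```python
import time

def retry_until_true(action, attempts=3, delay=1.0):
    """Call `action` until it returns True; return the attempt number that succeeded."""
    for attempt in range(1, attempts + 1):
        if action() is True:
            return attempt
        if attempt < attempts:
            time.sleep(delay)  # wait before trying again
    raise RuntimeError(f"action did not succeed after {attempts} attempts")

# Stand-in for exp.send_config: fails once, then succeeds.
_results = iter([False, True])

def fake_send_config():
    return next(_results)

print(retry_until_true(fake_send_config, delay=0.0))  # 2
```

If `send_config` does behave this way, the call against a live connection would be `retry_until_true(exp.send_config)`.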

## Streaming Data

Now that the MDML has a configuration, we can start sending data. This example publishes three values, `time`, `val1`, and `val2`, under a device named `DEVICE1`. `val1` and `val2` are random integers in the ranges 0 to 50 and 51 to 100, respectively. The `time` value is used by the MDML's timeseries database, InfluxDB, which organizes data by Unix time in nanoseconds. It is strongly recommended to send a timestamp with each data message; otherwise, InfluxDB uses the time at which the data was inserted. See [Helper Functions](#helper_functions) for a description of `mdml.unix_time()`.

In [25]:
# Create random data
dat = {
    "time": mdml.unix_time(ret_int=True),
    "val1": random.randint(0,50), 
    "val2": random.randint(51,100)
}
# Send data to the MDML
exp.publish_data("DEVICE1", dat, add_device=True)
# The add_device option is added here since we want the MDML to automatically create/build the configuration file.
# Including it in subsequent calls is okay but not needed.

# Spreading out points to visualize
time.sleep(3)

# Send a second set of points
dat = [mdml.unix_time(ret_int=True), random.randint(0,50), random.randint(51,100)]
exp.publish_data("DEVICE1", dat)

# Spreading out points to visualize
time.sleep(3)

# Send a third set of points
dat = [mdml.unix_time(ret_int=True), random.randint(0,50), random.randint(51,100)]
exp.publish_data("DEVICE1", dat)
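
The three publish calls above follow the same pattern, so they can be wrapped in a loop. The sketch below separates building a message from sending it. `make_message` and `stream` are hypothetical helpers, `time.time_ns()` from the standard library stands in for `mdml.unix_time(ret_int=True)`, and `publish` is any callable shaped like `exp.publish_data(device_id, data)`. Unlike the cell above, it sends a dictionary every time.

```python
import random
import time

def make_message():
    """Build one data point with the fields used in this tutorial."""
    return {
        "time": time.time_ns(),  # stand-in for mdml.unix_time(ret_int=True)
        "val1": random.randint(0, 50),
        "val2": random.randint(51, 100),
    }

def stream(publish, device_id="DEVICE1", count=3, delay=3.0):
    """Send `count` messages through `publish`, pausing `delay` seconds between them."""
    sent = []
    for i in range(count):
        message = make_message()
        publish(device_id, message)
        sent.append(message)
        if i < count - 1:
            time.sleep(delay)  # spread points out so they are easy to see
    return sent

# Dry run: print each message instead of publishing it, with no delay.
stream(lambda device_id, data: print(device_id, data), delay=0.0)
```

Passing the publish function as an argument makes it possible to dry-run the loop before pointing it at a live experiment.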

## Real-time analysis with FuncX

FuncX is a function-as-a-service platform that runs registered Python functions on remote endpoints. The MDML uses it to run analyses on streamed data.


## Login with FuncX
FuncX requires users to log in so that no one can run functions on endpoints they are not permitted to use. This access control also extends to FuncX functions. Because the MDML handles the invocation of functions and the retrieval of their return values, it needs your FuncX authentication token.

In [10]:
# First login with FuncX to retrieve token
exp.globus_login()

## Creating a function for FuncX

In [11]:
# Defining a function that adds two numbers streamed to the MDML
def sum_vars(params):
    import mdml_client as mdml
    query = [{
        "device": "DEVICE1",
        "variables": [],
        "last" : 1
    }]

    # Connect to MDML
    exp = mdml.experiment(params["EXPERIMENT_ID"], params["USERNAME"], params["PASSWORD"], params["MDML_HOST"])
    # Query for data
    dat = exp.query(query, verify_cert=False)
    # Pull out the first (only) row of the data
    row = dat['DEVICE1'][0]
    # Sum together
    var_sum = int(row['val1']) + int(row['val2'])
    # Return MDML-friendly data structure
    return {"sum": var_sum}
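
Because the function runs on a remote endpoint, it helps to check the summing logic locally first. The sketch below isolates that step in a hypothetical `sum_row` helper and feeds it a hand-made result. The result shape (a dictionary keyed by device ID that holds a list of rows) is inferred from how `sum_vars` indexes the query result, not taken from the MDML documentation.

```python
def sum_row(query_result, device="DEVICE1"):
    """Sum val1 and val2 from the first row returned for `device`."""
    row = query_result[device][0]
    # int() mirrors the conversion in sum_vars, so string values also work
    return {"sum": int(row["val1"]) + int(row["val2"])}

# Hand-made stand-in for a query result (assumed shape, made-up values).
fake_result = {"DEVICE1": [{"time": "0", "val1": "12", "val2": "70"}]}
print(sum_row(fake_result))  # {'sum': 82}
```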

## Registering the function with FuncX

In [12]:
# Registering the function with FuncX 
fxc = FuncXClient()
funcx_func_uuid = fxc.register_function(sum_vars,
    description="Sum 2 variables")

## Running the function with MDML
Now, all we need to run an analysis are the FuncX UUIDs for our function and endpoint, plus any parameters. We are using a public FuncX endpoint created for this tutorial. The dictionary with our MDML connection information is used as the parameters here.

In [14]:
# Using FuncX's tutorial endpoint
funcx_endp_uuid = "cbdf256e-4a5a-4c56-84c0-d3a051a8eaa9" # public tutorial endpoint

# Send message to start analysis
exp.publish_analysis("ANALYSIS", funcx_func_uuid, funcx_endp_uuid, params)
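
A mistyped UUID is an easy way for an analysis request to fail, so a quick format check before publishing can save a round trip. The sketch below uses only the standard library `uuid` module; `is_uuid` is a hypothetical helper, and it checks the format only, not whether the function or endpoint actually exists.

```python
import uuid

def is_uuid(value):
    """Return True if `value` parses as a UUID string."""
    try:
        uuid.UUID(str(value))
    except ValueError:
        return False
    return True

funcx_endp_uuid = "cbdf256e-4a5a-4c56-84c0-d3a051a8eaa9"  # tutorial endpoint from above
print(is_uuid(funcx_endp_uuid))  # True
print(is_uuid("not-a-uuid"))     # False
```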

## Ending an MDML experiment

Lastly, a reset message is sent to the MDML to end the experiment. This archives all of the data into a tar file, which you can access through the MDML's object store. By default, resetting an experiment also removes its data from the timeseries database, so the data no longer appears in dashboards and can no longer be queried for analyses.

In [20]:
exp.reset()
exp.disconnect()

Disconnected from MDML.
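
If an error interrupts a script before it reaches these two calls, the experiment is left connected and not reset. One way to guard against that is a context manager that always cleans up. The sketch below is an illustration only: `closing_experiment` is a hypothetical helper and `FakeExperiment` is a stand-in that records calls, so the example runs without an MDML instance.

```python
from contextlib import contextmanager

@contextmanager
def closing_experiment(exp, reset=True):
    """Yield `exp`, then reset (optionally) and disconnect even if the body raises."""
    try:
        yield exp
    finally:
        try:
            if reset:
                exp.reset()
        finally:
            exp.disconnect()

class FakeExperiment:
    """Stand-in that records calls; it is not the mdml_client class."""
    def __init__(self):
        self.calls = []

    def reset(self):
        self.calls.append("reset")

    def disconnect(self):
        self.calls.append("disconnect")

fake = FakeExperiment()
with closing_experiment(fake):
    pass  # streaming and analysis would go here
print(fake.calls)  # ['reset', 'disconnect']
```

Since a reset removes data from the timeseries database by default, pass `reset=False` if you want to keep the data queryable after an error.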


<a id='helper_functions'></a>

## Helper Functions
* __mdml.unix_time()__: Returns the Unix time in nanoseconds (the time format used in the MDML) as a string. Pass `ret_int=True` to get an integer instead.
* __mdml.read_image()__: Transforms an image file into the byte string needed by `Experiment.publish_image()`.
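
To make the timestamp format concrete, the sketch below produces a value in the same format using only the standard library. `unix_time_ns` is a hypothetical stand-in that mirrors the behavior described above; it is not the implementation used by `mdml_client`.

```python
import time

def unix_time_ns(ret_int=False):
    """Current Unix time in nanoseconds, as a string unless ret_int is True."""
    now = time.time_ns()
    return now if ret_int else str(now)

print(unix_time_ns())              # a 19-digit string
print(unix_time_ns(ret_int=True))  # the same kind of value as an int
```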