# TensorFlow Basic Demo

This notebook contains a lightweight demonstration of the current Dioptra demo setup with MLflow.

## Setup

Below we import the necessary Python modules and set the environment variables that the remaining code blocks depend on.

In [None]:
# Import packages from the Python standard library
import os
import pprint
import time
import warnings
from pathlib import Path
from typing import Tuple

# Filter out warning messages
warnings.filterwarnings("ignore")

# Please enter custom username here.
USERNAME = "howard"

# Experiment name (note the username_ prefix convention)
EXPERIMENT_NAME = f"{USERNAME}_basic"

# Address for connecting the docker container to exposed ports on the host device
HOST_DOCKER_INTERNAL = "host.docker.internal"
# HOST_DOCKER_INTERNAL = "172.17.0.1"

# Dioptra API ports
RESTAPI_PORT = "30080"
MLFLOW_TRACKING_PORT = "35000"

# Default address for accessing the RESTful API service
RESTAPI_ADDRESS = (
    f"http://{HOST_DOCKER_INTERNAL}:{RESTAPI_PORT}"
    if os.getenv("IS_JUPYTER_SERVICE")
    else f"http://localhost:{RESTAPI_PORT}"
)

# Override the DIOPTRA_RESTAPI_URI variable, used to connect to RESTful API service
os.environ["DIOPTRA_RESTAPI_URI"] = RESTAPI_ADDRESS

# Default address for accessing the MLflow Tracking server
MLFLOW_TRACKING_URI = (
    f"http://{HOST_DOCKER_INTERNAL}:{MLFLOW_TRACKING_PORT}"
    if os.getenv("IS_JUPYTER_SERVICE")
    else f"http://localhost:{MLFLOW_TRACKING_PORT}"
)

# Override the MLFLOW_TRACKING_URI variable, used to connect to the MLflow Tracking service
os.environ["MLFLOW_TRACKING_URI"] = MLFLOW_TRACKING_URI

# Base API address
RESTAPI_API_BASE = f"{RESTAPI_ADDRESS}/api"

# Path to workflows archive
WORKFLOWS_TAR_GZ = Path("workflows.tar.gz")

# Import third-party Python packages
import numpy as np
import requests
from mlflow.tracking import MlflowClient

# Import utils.py file
import utils

# Create random number generator
rng = np.random.default_rng(54399264723942495723666216079516778448)

## Submit and run jobs

The entrypoints that we will be running in this example are implemented in the Python source files under `src/` and the `MLproject` file.
To run these entrypoints within Dioptra's architecture, we need to package those files up into an archive and submit it to the Dioptra RESTful API to create a new job.
For convenience, the `Makefile` provides a rule for creating the archive file for this example: just run `make workflows`.

In [None]:
%%bash

# Create the workflows.tar.gz file
make workflows
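
The packaging step can also be sketched in Python for environments where `make` is unavailable. The contents of the `Makefile` rule are not shown in this notebook, so the file list below (`src/` and `MLproject`) is an assumption based on the description above, and `build_workflows_archive` is a hypothetical helper rather than part of Dioptra.

```python
import tarfile
from pathlib import Path


def build_workflows_archive(root: Path, archive: Path) -> Path:
    """Package src/ and MLproject from `root` into a gzipped tarball."""
    with tarfile.open(archive, "w:gz") as tar:
        # Assumed archive contents; the real Makefile rule may include more.
        for name in ("src", "MLproject"):
            tar.add(root / name, arcname=name)
    return archive


# Example (run from the directory that contains src/ and MLproject):
# build_workflows_archive(Path("."), Path("workflows.tar.gz"))
```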

To connect to the endpoint, we use a client class defined in `utils.py` that communicates with the Dioptra RESTful API over HTTP.
The client below reads the environment variable `DIOPTRA_RESTAPI_URI` to determine the address of the Dioptra RESTful API.

In [None]:
restapi_client = utils.DioptraClient()

We need to register an experiment under which to collect our job runs.
The code below checks if the relevant experiment exists.
If it does, the code returns information about the experiment; if it doesn't, the code registers the new experiment.

In [None]:
response_experiment = restapi_client.get_experiment_by_name(name=EXPERIMENT_NAME)

if response_experiment is None or "Not Found" in response_experiment.get("message", []):
    response_experiment = restapi_client.register_experiment(name=EXPERIMENT_NAME)

response_experiment
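
The check-then-register pattern in the cell above can be factored into a small helper. This is a minimal sketch: `get_or_register` is a hypothetical function, not part of `utils.py`, and it assumes the lookup returns either `None` or a dictionary whose optional `"message"` field is a string.

```python
from typing import Callable, Optional


def get_or_register(
    name: str,
    get: Callable[..., Optional[dict]],
    register: Callable[..., dict],
) -> dict:
    """Return the existing resource named `name`, registering it if missing."""
    response = get(name=name)
    # Treat a missing response or a "Not Found" message as "does not exist".
    if response is None or "Not Found" in response.get("message", ""):
        response = register(name=name)
    return response


# Example with the client from this notebook:
# get_or_register(
#     EXPERIMENT_NAME,
#     restapi_client.get_experiment_by_name,
#     restapi_client.register_experiment,
# )
```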

We also need to register the name of the queue that is being watched for our jobs.
The code below checks if the relevant queue named `"tensorflow_cpu"` exists.
If it does, the code returns information about the queue; if it doesn't, the code registers the new queue.

In [None]:
response_queue = restapi_client.get_queue_by_name(name="tensorflow_cpu")

if response_queue is None or "Not Found" in response_queue.get("message", []):
    response_queue = restapi_client.register_queue(name="tensorflow_cpu")

response_queue

## Baseline demo: defining job parameters

Here we submit a basic job that runs the `hello_world` entry point without any parameter overrides; the run is tracked in MLflow.

In [None]:
# Helper function: True while the job has no MLflow run ID and has not yet failed or finished
def mlflow_run_id_is_not_known(response):
    return response["mlflowRunId"] is None and response["status"] not in [
        "failed",
        "finished",
    ]


# Submit baseline job
basic_job = restapi_client.submit_job(
    workflows_file=WORKFLOWS_TAR_GZ,
    experiment_name=EXPERIMENT_NAME,
    entry_point="hello_world",
    entry_point_kwargs=" ".join([]),  # no parameter overrides
)

print("Basic job submitted.")
print("")
pprint.pprint(basic_job)

# Poll the job once per second until its MLflow run ID is known
while mlflow_run_id_is_not_known(basic_job):
    time.sleep(1)
    basic_job = restapi_client.get_job_by_id(basic_job["jobId"])
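
The polling loop above runs until the job reports a run ID or reaches a terminal state, so it can wait indefinitely if the job never leaves the queue. A bounded variant might look like the sketch below. `wait_for_mlflow_run_id` and its `timeout` and `poll_interval` parameters are hypothetical and not part of `utils.py`; the `"failed"` and `"finished"` statuses are taken from the helper above.

```python
import time
from typing import Callable


def wait_for_mlflow_run_id(
    job: dict,
    get_job: Callable[[str], dict],
    timeout: float = 300.0,
    poll_interval: float = 1.0,
) -> dict:
    """Poll `get_job` until the job has a run ID, ends, or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while job["mlflowRunId"] is None and job["status"] not in ("failed", "finished"):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"No MLflow run ID for job {job['jobId']}")
        time.sleep(poll_interval)
        job = get_job(job["jobId"])
    return job


# Example with the client from this notebook:
# basic_job = wait_for_mlflow_run_id(basic_job, restapi_client.get_job_by_id)
```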

Now we can query the job to view its output:

In [None]:
# Query the MLflow Tracking server for the run's parameters and tags
mlflow_client = MlflowClient()
basic_job_query = mlflow_client.get_run(basic_job["mlflowRunId"])

pprint.pprint(basic_job_query.data.params)
pprint.pprint(basic_job_query.data.tags)

To customize job parameters, add `"-P job_property=<job_value>"` to the `entry_point_kwargs` argument of the `submit_job` call:

In [None]:
# Submit job with a custom parameter
basic_job = restapi_client.submit_job(
    workflows_file=WORKFLOWS_TAR_GZ,
    experiment_name=EXPERIMENT_NAME,
    entry_point="hello_world",
    entry_point_kwargs=' '.join([
        '-P output_log_string="Hello_again!"'
    ]),
)

print("Basic job submitted.")
print("")
pprint.pprint(basic_job)


# Poll the job once per second until its MLflow run ID is known
while mlflow_run_id_is_not_known(basic_job):
    time.sleep(1)
    basic_job = restapi_client.get_job_by_id(basic_job["jobId"])

Next we query the output of the new job.
The output has changed because of the user-supplied parameter.

In [None]:
mlflow_client = MlflowClient()
basic_job_query = mlflow_client.get_run(basic_job["mlflowRunId"])

pprint.pprint(basic_job_query.data.params)
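
When several parameters are overridden, the `-P name=value` strings passed to `entry_point_kwargs` can be generated from a dictionary. The helper below is a hypothetical sketch, not part of `utils.py`. It adds no quoting, whereas the example above wraps the value in double quotes; how the string is parsed on the server side is not shown here, so values containing spaces may need to be quoted by the caller.

```python
from typing import Any, Mapping


def format_entry_point_kwargs(params: Mapping[str, Any]) -> str:
    """Render a parameter mapping as space-separated `-P name=value` items."""
    return " ".join(f"-P {name}={value}" for name, value in params.items())


# Example:
# entry_point_kwargs=format_entry_point_kwargs({"output_log_string": "Hello_again!"})
```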