# TensorFlow Road Signs YOLO Demo

>⚠️ **Warning:** This demo assumes that you have access to an on-prem deployment of Dioptra that provides a copy of the Road Signs dataset and a CUDA-compatible GPU.
> This demo cannot be run on a typical personal computer.

The demo provided in the Jupyter notebook `demo.ipynb` contains an example of how to set up and train a model based on the YOLO v1 architecture and use it to perform object detection on the Road Signs dataset.

## Setup

In [None]:
# Import packages from the Python standard library
import os
import pprint
import warnings
from pathlib import Path

# Please enter custom username here.
USERNAME = "dioptra_user"

# Filter out warning messages
warnings.filterwarnings("ignore")

# Address for connecting the docker container to exposed ports on the host device
HOST_DOCKER_INTERNAL = "host.docker.internal"
# HOST_DOCKER_INTERNAL = "172.17.0.1"

# Testbed API ports
RESTAPI_PORT = "20080"
MLFLOW_TRACKING_PORT = "25000"

# Default address for accessing the RESTful API service
if os.getenv("DIOPTRA_RESTAPI_URI") is None:
    RESTAPI_ADDRESS = (
        f"http://{HOST_DOCKER_INTERNAL}:{RESTAPI_PORT}"
        if os.getenv("IS_JUPYTER_SERVICE")
        else f"http://localhost:{RESTAPI_PORT}"
    )

    # Override the DIOPTRA_RESTAPI_URI variable, used to connect to RESTful API service
    os.environ["DIOPTRA_RESTAPI_URI"] = RESTAPI_ADDRESS

else:
    RESTAPI_ADDRESS = os.environ["DIOPTRA_RESTAPI_URI"]

# Default address for accessing the MLFlow Tracking server
if os.getenv("MLFLOW_TRACKING_URI") is None:
    MLFLOW_TRACKING_URI = (
        f"http://{HOST_DOCKER_INTERNAL}:{MLFLOW_TRACKING_PORT}"
        if os.getenv("IS_JUPYTER_SERVICE")
        else f"http://localhost:{MLFLOW_TRACKING_PORT}"
    )

    # Override the MLFLOW_TRACKING_URI variable, used to connect to MLFlow Tracking service
    os.environ["MLFLOW_TRACKING_URI"] = MLFLOW_TRACKING_URI

else:
    MLFLOW_TRACKING_URI = os.environ["MLFLOW_TRACKING_URI"]

# Path to custom task plugins archives
CUSTOM_PLUGINS_BACKEND_CONFIGS_TAR_GZ = Path("custom-plugins-backend-configs.tar.gz")
CUSTOM_PLUGINS_EVALUATION_TAR_GZ = Path("custom-plugins-evaluation.tar.gz")
CUSTOM_PLUGINS_ROADSIGNS_YOLO_TAR_GZ = Path("custom-plugins-roadsigns-yolo.tar.gz")
CUSTOM_PLUGINS_TRACKING_TAR_GZ = Path("custom-plugins-tracking.tar.gz")

# Base API address
RESTAPI_API_BASE = f"{RESTAPI_ADDRESS}/api"

# Path to workflows archive
WORKFLOWS_TAR_GZ = Path("workflows.tar.gz")

# Experiment name (note the username_ prefix convention)
EXPERIMENT_NAME = f"{USERNAME}_roadsigns_yolo"

# Path to read-only datasets directory (inside container)
DATASET_DIR = "/nfs/data"

# Import third-party Python packages
import numpy as np

# Import utils.py file
import utils

# Create random number generator
rng = np.random.default_rng(54399264723942495723666216079516778448)
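The setup cell applies the same fallback rule to both the REST API and the MLFlow Tracking address: keep an existing environment variable, otherwise build a URI from `host.docker.internal` (inside the Jupyter service) or `localhost`, and export it. A minimal sketch of that rule as one reusable function is shown below; `resolve_service_uri` is a hypothetical helper, not part of `utils.py`.

```python
import os


def resolve_service_uri(env_var, port, docker_host="host.docker.internal"):
    """Return the URI stored in env_var, or build and export a default one."""
    existing = os.getenv(env_var)
    if existing is not None:
        # Respect a URI that was already configured for this service
        return existing

    # Same truthiness check as the setup cell: any non-empty IS_JUPYTER_SERVICE counts
    host = docker_host if os.getenv("IS_JUPYTER_SERVICE") else "localhost"
    uri = f"http://{host}:{port}"
    os.environ[env_var] = uri  # export so later clients pick it up
    return uri
```

With this helper, the two `if`/`else` blocks reduce to `resolve_service_uri("DIOPTRA_RESTAPI_URI", RESTAPI_PORT)` and `resolve_service_uri("MLFLOW_TRACKING_URI", MLFLOW_TRACKING_PORT)`.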

## Dataset

This demo provides updated labels and a predefined train/test split for the Road Signs dataset (https://makeml.app/datasets/road-signs).
Use the script `data/download_roadsigns_data.py` to download the original dataset and organize the unpacked files into the predefined train/test split.
The script also prepends a 5-digit prefix (`00001` through `00136`) to each filename, which groups the images into "tracks".
The `00000` prefix is reserved for images that do not belong to a track.

A track is a sequence of correlated images sampled from a video clip of the same physical, real-world object.
Because the images within each track are highly correlated, each track must be placed as a whole in either the training set or the testing set.
Splitting the images from a single track across the training and testing sets results in data leakage.
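The demo already ships a predefined split, but the grouping rule is easy to get wrong if you ever re-split the data yourself. The sketch below, built around the hypothetical function `split_by_track`, shuffles and splits at the level of track prefixes rather than individual images. It assumes filenames follow the `<prefix>_<name>.png` pattern described above and, as a labeled assumption, treats `00000` images as independent so they are split one by one.

```python
import random
from collections import defaultdict


def split_by_track(filenames, test_fraction=0.2, seed=0):
    """Split filenames so every track lands entirely in train or entirely in test."""
    tracks = defaultdict(list)
    untracked = []
    for name in filenames:
        prefix = name.split("_", 1)[0]  # e.g. "00000" from "00000_road10.png"
        if prefix == "00000":
            untracked.append(name)  # reserved prefix: image belongs to no track
        else:
            tracks[prefix].append(name)

    shuffler = random.Random(seed)

    # Shuffle and split whole tracks, never individual images within a track
    track_ids = sorted(tracks)
    shuffler.shuffle(track_ids)
    n_test_tracks = round(len(track_ids) * test_fraction)
    test_ids = set(track_ids[:n_test_tracks])

    train, test = [], []
    for track_id in track_ids:
        (test if track_id in test_ids else train).extend(tracks[track_id])

    # Assumption: untracked images are mutually independent, so split them individually
    untracked.sort()
    shuffler.shuffle(untracked)
    n_test_untracked = round(len(untracked) * test_fraction)
    test.extend(untracked[:n_test_untracked])
    train.extend(untracked[n_test_untracked:])
    return sorted(train), sorted(test)
```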

Object bounding boxes are provided in the PascalVOC format, which separates the images and annotations into the following folders:

    annotations/   (xml files)
    images/        (png files)

The PascalVOC format uses filenames to associate images with their corresponding annotation.
For example, the image file `images/00000_road10.png` would have an associated annotation file `annotations/00000_road10.xml`.
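A minimal sketch of that filename convention, plus reading the boxes out of an annotation, is shown below. It assumes the standard PascalVOC tags (`object`, `name`, `bndbox`, `xmin`, `ymin`, `xmax`, `ymax`); `annotation_path_for` and `parse_voc_boxes` are hypothetical helpers, not functions shipped with the demo.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def annotation_path_for(image_path):
    """Map images/<stem>.png to the sibling annotations/<stem>.xml file."""
    image_path = Path(image_path)
    return image_path.parent.parent / "annotations" / f"{image_path.stem}.xml"


def parse_voc_boxes(xml_text):
    """Return (label, xmin, ymin, xmax, ymax) tuples from a PascalVOC annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        # Coordinates may be written as "10" or "10.0", so go through float first
        coords = [int(float(bndbox.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), *coords))
    return boxes
```

For instance, `parse_voc_boxes(annotation_path_for("images/00000_road10.png").read_text())` would return the labeled boxes for that image when run from inside a split directory.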

To use the download script, ensure that you have the following packages installed in your current Python environment:

- click
- pandas
- requests
- rich

Then run the following to download the data:

```sh
python ./data/download_roadsigns_data.py --data-dir ./data --clean --upgrade
```

After downloading the dataset using the `data/download_roadsigns_data.py` script, the data will have the following folder structure:

    Road-Sign-Detection-v2
    ├── testing
    │   ├── annotations
    │   └── images
    └── training
        ├── annotations
        └── images
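Before submitting any jobs, it can be worth confirming that the download produced this layout and that every image has a matching annotation. The sketch below assumes exactly the `training`/`testing` and `images`/`annotations` folders shown above, with `.png` images and `.xml` annotations; `find_unpaired_files` is a hypothetical helper.

```python
from pathlib import Path


def find_unpaired_files(dataset_root):
    """Per split, list images without annotations and annotations without images."""
    problems = {}
    for split in ("training", "testing"):
        split_dir = Path(dataset_root) / split
        # PascalVOC pairs files by stem, so compare the sets of stems
        images = {p.stem for p in (split_dir / "images").glob("*.png")}
        annotations = {p.stem for p in (split_dir / "annotations").glob("*.xml")}
        problems[split] = {
            "missing_annotation": sorted(images - annotations),
            "missing_image": sorted(annotations - images),
        }
    return problems
```

Running `find_unpaired_files("./data/Road-Sign-Detection-v2")` after the download should report empty lists for both splits.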

## Submit and run jobs

The entrypoints that we will be running in this example are implemented in the Python source files under `src/` and the `MLproject` file.
To run these entrypoints within the testbed architecture, we need to package those files up into an archive and submit it to the Testbed RESTful API to create a new job.
For convenience, the `Makefile` provides a rule for creating the archive file for this example; just run `make workflows`:

In [None]:
%%bash

# Create the workflows.tar.gz file
make workflows
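The `Makefile` rule itself is not reproduced here. As a rough illustration only, assuming the archive simply holds `MLproject` and the `src/` directory at its root, an equivalent archive could be built with Python's `tarfile` module; `build_workflows_archive` is a hypothetical helper and the real rule may include or lay out files differently.

```python
import tarfile
from pathlib import Path


def build_workflows_archive(project_dir, archive_name="workflows.tar.gz"):
    """Write a gzipped tarball containing MLproject and src/ at the archive root."""
    project_dir = Path(project_dir)
    archive_path = project_dir / archive_name
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(project_dir / "MLproject", arcname="MLproject")
        tar.add(project_dir / "src", arcname="src")  # directories are added recursively
    return archive_path
```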

To connect with the endpoint, we will use a client class defined in the `utils.py` file that is able to connect with the Testbed RESTful API using the HTTP protocol.
We connect using the client below, which uses the environment variable `DIOPTRA_RESTAPI_URI` to determine how to connect to the Testbed RESTful API:

In [None]:
restapi_client = utils.DioptraClient()

We need to register an experiment under which to collect our job runs.
The code below checks if the relevant experiment exists.
If it does, the code returns information about the experiment; if it does not, the code registers a new experiment.

In [None]:
response_experiment = restapi_client.get_experiment_by_name(name=EXPERIMENT_NAME)

if response_experiment is None or "Not Found" in response_experiment.get("message", []):
    response_experiment = restapi_client.register_experiment(name=EXPERIMENT_NAME)

response_experiment
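The same "look up, register on a miss" check appears for experiments, queues, and custom task plugins. A minimal sketch that factors it into one function is shown below; `get_or_register` is a hypothetical helper, and it mirrors the notebook's own test of a `None` response or a `"Not Found"` message.

```python
def get_or_register(lookup, register, name):
    """Return the resource called name, registering it first if the lookup misses."""
    response = lookup(name=name)
    if response is None or "Not Found" in response.get("message", []):
        response = register(name=name)
    return response
```

For example, `get_or_register(restapi_client.get_experiment_by_name, restapi_client.register_experiment, EXPERIMENT_NAME)` is equivalent to the cell above, and passing `restapi_client.get_queue_by_name` and `restapi_client.register_queue` with `"tensorflow_gpu"` reproduces the queue check.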

We should also check which queues are available for running our jobs to make sure that the resources that we need are available.
The code below queries the Testbed API and returns a list of active queues.

In [None]:
restapi_client.list_queues()

The code below can be used to register the `tensorflow_gpu` queue if you have GPU workers but have not yet registered this queue within Dioptra.
If you are using a different queue name, then you should update `name="tensorflow_gpu"` accordingly.

In [None]:
response_queue = restapi_client.get_queue_by_name(name="tensorflow_gpu")

if response_queue is None or "Not Found" in response_queue.get("message", []):
    response_queue = restapi_client.register_queue(name="tensorflow_gpu")

response_queue

This example also makes use of the `backend_configs`, `evaluation`, `roadsigns_yolo`, and `tracking` packages stored locally under the `task-plugins/dioptra_custom` directory.
To register these custom task plugins, we first need to package them up into an archive.
For convenience, the `Makefile` provides a rule for creating the custom task plugins archive file; just run `make custom-plugins`:

In [None]:
%%bash

# Create the custom task plugin archive files
make custom-plugins

Now that the custom task plugins are packaged into archive files, we register them by uploading the files to the REST API.
Note that we need to provide a name for each custom task plugin package, and this name must be unique within the custom task plugins namespace.
To see the full list of registered custom task plugins, use `restapi_client.list_custom_task_plugins()`.

In [None]:
restapi_client.list_custom_task_plugins()

The code below will upload any custom task plugins that are **new** and not already registered.

In [None]:
response_backend_configs_custom_plugins = restapi_client.get_custom_task_plugin(name="backend_configs")

if response_backend_configs_custom_plugins is None or "Not Found" in response_backend_configs_custom_plugins.get("message", []):
    response_backend_configs_custom_plugins = restapi_client.upload_custom_plugin_package(
        custom_plugin_name="backend_configs",
        custom_plugin_file=CUSTOM_PLUGINS_BACKEND_CONFIGS_TAR_GZ,
    )

print(response_backend_configs_custom_plugins)

response_evaluation_custom_plugins = restapi_client.get_custom_task_plugin(name="evaluation")

if response_evaluation_custom_plugins is None or "Not Found" in response_evaluation_custom_plugins.get("message", []):
    response_evaluation_custom_plugins = restapi_client.upload_custom_plugin_package(
        custom_plugin_name="evaluation",
        custom_plugin_file=CUSTOM_PLUGINS_EVALUATION_TAR_GZ,
    )

print(response_evaluation_custom_plugins)

response_roadsigns_custom_plugins = restapi_client.get_custom_task_plugin(name="roadsigns_yolo")

if response_roadsigns_custom_plugins is None or "Not Found" in response_roadsigns_custom_plugins.get("message", []):
    response_roadsigns_custom_plugins = restapi_client.upload_custom_plugin_package(
        custom_plugin_name="roadsigns_yolo",
        custom_plugin_file=CUSTOM_PLUGINS_ROADSIGNS_YOLO_TAR_GZ,
    )

print(response_roadsigns_custom_plugins)

response_tracking_custom_plugins = restapi_client.get_custom_task_plugin(name="tracking")

if response_tracking_custom_plugins is None or "Not Found" in response_tracking_custom_plugins.get("message", []):
    response_tracking_custom_plugins = restapi_client.upload_custom_plugin_package(
        custom_plugin_name="tracking",
        custom_plugin_file=CUSTOM_PLUGINS_TRACKING_TAR_GZ,
    )

print(response_tracking_custom_plugins)
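The four checks above differ only in the plugin name and archive path. A minimal sketch of the same logic as a loop is shown below; `CUSTOM_PLUGIN_ARCHIVES` and `upload_missing_plugins` are hypothetical names, the archive paths copy the ones from the setup cell, and only the client methods already used in this notebook are called.

```python
from pathlib import Path

# Plugin name -> archive path (same values as the setup cell's CUSTOM_PLUGINS_* constants)
CUSTOM_PLUGIN_ARCHIVES = {
    "backend_configs": Path("custom-plugins-backend-configs.tar.gz"),
    "evaluation": Path("custom-plugins-evaluation.tar.gz"),
    "roadsigns_yolo": Path("custom-plugins-roadsigns-yolo.tar.gz"),
    "tracking": Path("custom-plugins-tracking.tar.gz"),
}


def upload_missing_plugins(client, archives):
    """Upload each archive whose plugin name is not registered yet; return all responses."""
    responses = {}
    for name, archive in archives.items():
        response = client.get_custom_task_plugin(name=name)
        if response is None or "Not Found" in response.get("message", []):
            response = client.upload_custom_plugin_package(
                custom_plugin_name=name,
                custom_plugin_file=archive,
            )
        responses[name] = response
    return responses
```

In the notebook this would be called as `pprint.pprint(upload_missing_plugins(restapi_client, CUSTOM_PLUGIN_ARCHIVES))`.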

If at any point you need to update one or more files within the `backend_configs`, `evaluation`, `roadsigns_yolo`, or `tracking` packages, you will first need to unregister (delete) the custom task plugin using the REST API.
This can be done as follows:

```python
# Delete the 'backend_configs' custom task plugin package
restapi_client.delete_custom_task_plugin(name="backend_configs")

# Delete the 'evaluation' custom task plugin package
restapi_client.delete_custom_task_plugin(name="evaluation")

# Delete the 'roadsigns_yolo' custom task plugin package
restapi_client.delete_custom_task_plugin(name="roadsigns_yolo")

# Delete the 'tracking' custom task plugin package
restapi_client.delete_custom_task_plugin(name="tracking")
```

After you have deleted the task plugin in the testbed, re-run the `make custom-plugins` code block to update the package archive, then upload the updated plugin by re-running the `restapi_client.upload_custom_plugin_package` block.
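The delete, rebuild, and re-upload steps can be chained for a single plugin. The sketch below is a hypothetical helper, `refresh_custom_plugin`, that assumes `make` is available where the notebook runs and reuses only the client methods shown above; pass `rebuild=False` if you have already re-run `make custom-plugins`.

```python
import subprocess
from pathlib import Path


def refresh_custom_plugin(client, name, archive, rebuild=True):
    """Delete a registered custom task plugin, optionally rebuild archives, then re-upload."""
    client.delete_custom_task_plugin(name=name)
    if rebuild:
        # Same effect as re-running the `make custom-plugins` cell; raises if make fails
        subprocess.run(["make", "custom-plugins"], check=True)
    return client.upload_custom_plugin_package(
        custom_plugin_name=name,
        custom_plugin_file=Path(archive),
    )
```

For example, `refresh_custom_plugin(restapi_client, "tracking", CUSTOM_PLUGINS_TRACKING_TAR_GZ)` would refresh only the `tracking` package.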

We are now ready to use transfer learning to create a YOLO v1 object detection model for the Road Signs dataset.
Copy the code below into a new code block, and update the parameters as needed.
A full list of adjustable parameters can be found in the `MLproject` file. 

```python
# Submit transfer learning job for the EfficientNet (B1) + two-headed YOLO network architecture
response_efficientnetb1_two_headed_transfer_learn = restapi_client.submit_job(
    workflows_file=WORKFLOWS_TAR_GZ,
    experiment_name=EXPERIMENT_NAME,
    queue="tensorflow_gpu",
    timeout="72h",
    entry_point="transfer_learn",
    entry_point_kwargs=" ".join([
        "-P epochs=400",
        "-P batch_size=128",
        "-P image_size=448,448,3",
        f"-P training_dir={DATASET_DIR}/Road-Sign-Detection-v2/training",
        f"-P validation_dir={DATASET_DIR}/Road-Sign-Detection-v2/testing",
        f"-P register_model_name={EXPERIMENT_NAME}_efficientnetb1_two_headed",
    ]),
)

print("Transfer learning job for EfficientNet (B1) + two-headed YOLO detector submitted")
print("")
pprint.pprint(response_efficientnetb1_two_headed_transfer_learn)
```
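When tuning many parameters, writing the `-P key=value` strings by hand gets error-prone. The sketch below, a hypothetical helper named `format_entry_point_kwargs`, renders a dictionary into the same space-separated string used in the job above, joining tuples or lists with commas as in `image_size=448,448,3`. It does no quoting, so it assumes parameter values contain no spaces; the valid keys are still the ones declared in `MLproject`.

```python
def format_entry_point_kwargs(params):
    """Render {"epochs": 400, ...} as "-P epochs=400 ..."; tuples/lists become comma-separated."""
    parts = []
    for key, value in params.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        parts.append(f"-P {key}={value}")
    return " ".join(parts)
```

For example, `entry_point_kwargs=format_entry_point_kwargs({"epochs": 400, "batch_size": 128, "image_size": (448, 448, 3)})` produces the first three `-P` entries of the job above.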