## Object Detection Pipeline on UCS using Darknet & YOLO

This notebook implements object detection as a Kubeflow pipeline on Cisco UCS using Darknet, an open-source neural network framework, and YOLO (You Only Look Once), a real-time object detection system.

Training runs as a TFJob, a Kubeflow custom resource for running training jobs on Kubernetes, to make training more efficient.

## Clone Cisco Kubeflow starter pack repository

In [1]:
BRANCH_NAME="dev" #Provide git branch "master" or "dev"
! git clone -b $BRANCH_NAME https://github.com/CiscoAI/cisco-kubeflow-starter-pack.git

Cloning into 'cisco-kubeflow-starter-pack'...
remote: Enumerating objects: 218, done.
remote: Counting objects: 100% (218/218), done.
remote: Compressing objects: 100% (117/117), done.
remote: Total 6246 (delta 93), reused 183 (delta 64), pack-reused 6028
Receiving objects: 100% (6246/6246), 42.37 MiB | 49.35 MiB/s, done.
Resolving deltas: 100% (2474/2474), done.


## Install required packages

In [3]:
#!python3 -m pip install --upgrade pip
!pip install kfp pillow --user

Defaulting to user installation because normal site-packages is not writeable
Collecting pip
  Downloading pip-20.2.4-py2.py3-none-any.whl (1.5 MB)
     |████████████████████████████████| 1.5 MB 35.3 MB/s eta 0:00:01
Installing collected packages: pip
Successfully installed pip-20.2.4
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
Collecting kfp
  Using cached kfp-1.1.1.tar.gz (162 kB)
Collecting pillow
  Using cached Pillow-8.0.1-cp36-cp36m-manylinux1_x86_64.whl (2.2 MB)
Collecting requests_toolbelt>=0.8.0
  Using cached requests_toolbelt-0.9.1-py2.py3-none-any.whl (54 kB)
Collecting kfp-server-api<2.0.0,>=0.2.5
  Using cached kfp-server-api-1.0.4.tar.gz (51 kB)
Collecting tabulate
  Using cached tabulate-0.8.7-py3-none-any.whl (24 kB)
Collecting click
  Using cached click-7.1.2-py2.py3-none-any.whl (82 kB)
Collecting Deprecated
  Using cached Deprecated-1.2.10-py2.py3-none-any.whl (8.7 kB)
Collecting strip-hints
  Using cac

Successfully installed Deprecated-1.2.10 click-7.1.2 docstring-parser-0.7.3 kfp-1.1.1 kfp-pipeline-spec-0.1.2 kfp-server-api-1.0.4 pillow-8.0.1 requests-toolbelt-0.9.1 strip-hints-0.1.9 tabulate-0.8.7


## Restart kernel

In [None]:
from IPython.display import display_html
display_html("<script>Jupyter.notebook.kernel.restart()</script>",raw=True)

## Import libraries

In [9]:
import os
import json
import time
import yaml
import calendar
import requests
import logging
import numpy as np

#Kubeflow
import kfp
from kfp.aws import use_aws_secret
import kfp.compiler as compiler

#Kubernetes
from kubernetes import client

#Tensorflow
import tensorflow as tf

## Load pipeline components

Set the path to each pipeline component's YAML definition file, then load each component into a variable for use in the pipeline.

In [10]:
path='cisco-kubeflow-starter-pack/apps/computer-vision/object-detection/onprem/pipeline/components/v2/'
component_root_dwn= path+'download/'
component_root_train= path+'tfjob/'
component_root_convert= path+'tflite-conversion/'
component_root_inference= path+'inference/'

download_op = kfp.components.load_component_from_file(os.path.join(component_root_dwn, 'component.yaml'))
tfjob_create_op = kfp.components.load_component_from_file(os.path.join(component_root_train, 'component.yaml'))
tflite_convert_op = kfp.components.load_component_from_file(os.path.join(component_root_convert, 'component.yaml'))
inference_op = kfp.components.load_component_from_file(os.path.join(component_root_inference, 'component.yaml'))
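
If the clone step failed or a different branch was checked out, the component directories may be missing or laid out differently. As a minimal sketch (the helper name `missing_component_files` and the `COMPONENT_DIRS` list are illustrative and not part of the starter pack), the paths can be checked with the standard library before the components are loaded:

```python
import os

# Subdirectory names taken from the cell above.
COMPONENT_DIRS = ["download", "tfjob", "tflite-conversion", "inference"]

def missing_component_files(root, subdirs, filename="component.yaml"):
    """Return the expected component definition paths under root that do not exist."""
    paths = [os.path.join(root, sub, filename) for sub in subdirs]
    return [p for p in paths if not os.path.isfile(p)]
```

Calling `missing_component_files(path, COMPONENT_DIRS)` with the `path` variable defined above should return an empty list when all four definitions are present.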

## Define volume claim & volume mount for storage during pipeline execution

A persistent volume claim and a volume mount are defined to store artifacts such as the dataset and model files, and to share them between the pipeline components during execution.

In [11]:
nfs_pvc = client.V1PersistentVolumeClaimVolumeSource(claim_name='nfs')
nfs_volume = client.V1Volume(name='nfs', persistent_volume_claim=nfs_pvc)
nfs_volume_mount = client.V1VolumeMount(mount_path='/mnt/', name='nfs')
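
A volume mount is tied to its volume by name, and the claim named `nfs` has to exist in the namespace where the pipeline pods run. The sketch below shows roughly what the three objects above correspond to in a pod spec, written as plain dictionaries with the Kubernetes field names; the helper `unmatched_mounts` is a hypothetical name used here only to illustrate the name matching.

```python
# Plain-dict view of the volume and mount, using Kubernetes pod-spec field names.
nfs_volume_spec = {
    "name": "nfs",
    "persistentVolumeClaim": {"claimName": "nfs"},
}
nfs_volume_mount_spec = {"name": "nfs", "mountPath": "/mnt/"}

def unmatched_mounts(volumes, mounts):
    """Return the names of mounts that do not reference any declared volume."""
    declared = {v["name"] for v in volumes}
    return [m["name"] for m in mounts if m["name"] not in declared]
```

An empty result from `unmatched_mounts([nfs_volume_spec], [nfs_volume_mount_spec])` means every mount refers to a declared volume.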

## Define pipeline function

In [12]:
def object_detection_pipeline(
    s3_path="s3://darknet-datasets",    # AWS S3 bucket URL. Ex: s3://<bucket-name>/
    namespace='kubeflow',               # Namespace in which the trained model is to be deployed for prediction
    timestamp="",                       # Current timestamp
    cfg_data="voc.data",                # Config file naming the train, test and validation datasets
    weights="yolov3-voc_50000.weights", # Weights already pre-trained up to 50000 iterations are used. Therefore,
                                        # training continues from iteration 50000 up to the max_batches limit
                                        # (say 50200) specified in cfg_file.

    classes_file="voc.names",           # File containing the names of the object classes (such as person, bus, car, etc.)
    push_to_s3="True"                   # Pushes the converted tflite model & trained weights to the S3 bucket if set to 'True'
                                        # Proceeds with testing inference using the converted tflite model if set to 'False'
):
    
    # Download component
    dwn_task = download_op(s3_path=s3_path,
                           cfg_data=cfg_data
                          ).apply(use_aws_secret(secret_name='aws-secret', aws_access_key_id_name='AWS_ACCESS_KEY_ID', aws_secret_access_key_name='AWS_SECRET_ACCESS_KEY'))
    dwn_task.add_volume(nfs_volume)
    dwn_task.add_volume_mount(nfs_volume_mount) 
    
    # TF-job component
    
    tfjob_create_task = tfjob_create_op(timestamp=timestamp)
    tfjob_create_task.add_volume(nfs_volume)
    tfjob_create_task.add_volume_mount(nfs_volume_mount)
    tfjob_create_task.after(dwn_task)
    
    # Tflite Conversion component
    
    tflite_convert_task = tflite_convert_op(s3_path=s3_path,
                                            push_to_s3=push_to_s3,
                                            classes_file=classes_file
                                       ).apply(use_aws_secret(secret_name='aws-secret', aws_access_key_id_name='AWS_ACCESS_KEY_ID', aws_secret_access_key_name='AWS_SECRET_ACCESS_KEY'))
    tflite_convert_task.add_volume(nfs_volume)
    tflite_convert_task.add_volume_mount(nfs_volume_mount)
    tflite_convert_task.after(tfjob_create_task)

    
    # Inference component
    
    inference_task = inference_op()
    inference_task.add_volume(nfs_volume)
    inference_task.add_volume_mount(nfs_volume_mount)
    inference_task.after(tflite_convert_task)
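
Each of the four tasks repeats the same three calls: attach the volume, attach the mount, and order the task after its predecessor. One possible way to factor this out is sketched below. The helper `attach_nfs` is a hypothetical name, not part of the notebook or of the KFP SDK, and it relies only on the `add_volume`, `add_volume_mount` and `after` methods already used above.

```python
def attach_nfs(task, volume, volume_mount, after=None):
    """Attach the shared volume and mount to a task; optionally run it after another task."""
    task.add_volume(volume)
    task.add_volume_mount(volume_mount)
    if after is not None:
        task.after(after)
    return task
```

Inside the pipeline function this would read, for example, `attach_nfs(inference_task, nfs_volume, nfs_volume_mount, after=tflite_convert_task)`.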

## Compile pipeline function

Compile the pipeline function into a tarball that can be submitted to Kubeflow Pipelines.

In [13]:
# Compile pipeline
try:
    compiler.Compiler().compile(object_detection_pipeline, 'object-detection.tar.gz')
except RuntimeError as err:
    logging.debug(err)
    logging.info("Argo workflow failed validation check but it can still be used to run experiments.")
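
Because the `except` branch only logs the error, a failed compilation can pass unnoticed. A quick check, sketched here with the standard library only, is to list what the package contains. The helper name `list_package_members` is illustrative, and the exact file name inside the archive depends on the KFP SDK version, so treat any expected name as an assumption.

```python
import tarfile

def list_package_members(package_path):
    """Return the file names stored in a compiled pipeline package (.tar.gz)."""
    with tarfile.open(package_path, "r:gz") as tar:
        return tar.getnames()
```

For the package built above, `list_package_members('object-detection.tar.gz')` should return a non-empty list containing the generated workflow definition.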

## Create pipeline experiment

In [14]:
kp_client = kfp.Client()
EXPERIMENT_NAME = 'Object Detection'
experiment = kp_client.create_experiment(name=EXPERIMENT_NAME)

## Initialize pipeline parameters & run pipeline

In [15]:
#Pipeline parameters
timestamp = str(calendar.timegm(time.gmtime()))

# Execute pipeline
run = kp_client.run_pipeline(experiment.id, 'object-detection', 'object-detection.tar.gz', 
                          params={"timestamp": timestamp})
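
The `timestamp` parameter is the current UTC time as whole seconds since the Unix epoch, passed as a string. How the TFJob component uses it is not shown in this notebook. The sketch below only illustrates how the value is produced and how it can be converted back to a readable time, for example when matching a run to its artifacts. Both helper names are hypothetical.

```python
import calendar
import time
from datetime import datetime, timezone

def utc_epoch_string():
    """Return the current UTC time as a string of whole epoch seconds."""
    return str(calendar.timegm(time.gmtime()))

def epoch_string_to_iso(ts):
    """Convert an epoch-seconds string back to an ISO 8601 UTC timestamp."""
    return datetime.fromtimestamp(int(ts), tz=timezone.utc).isoformat()
```

Passing the `timestamp` value from the cell above to `epoch_string_to_iso` gives the UTC time at which the run was submitted.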