# Computer Vision Object Detection YOLO

This notebook contains example code for onboarding an object detection model with Arthur. The model used is a pretrained YOLO object detection model.

In [None]:
import cv2
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
import pytz
import uuid
import boto3
import zipfile
import sys
import random
import json
%matplotlib inline

# arthur imports
from arthurai import ArthurAI
from arthurai.common.constants import InputType, OutputType, Stage, ValueType, Enrichment

**Notes:** 
- This model is based on an implementation of the YOLO model by @experiencor (https://github.com/experiencor/keras-yolo3).
- Training and validation data are sourced from the VOC2012 database (http://host.robots.ox.ac.uk/pascal/VOC/voc2012/).

## Load Data

Download the images and the trained model, then load the metadata.

### Bounding Box Format

The metadata for the training and validation sets includes bounding boxes that are already formatted the way Arthur expects to receive them.

Arthur expects that bounding boxes are lists with the following elements:
`[class_id, confidence, top_left_x, top_left_y, width, height]`

See the contents of the DataFrames below for examples.
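VOC annotations store each box by its corners (`xmin`, `ymin`, `xmax`, `ymax`), so converting one to the layout above means keeping the top-left corner and subtracting to get the width and height. A minimal sketch, assuming corner-format input; `voc_to_arthur_box` is a hypothetical helper (not part of the Arthur SDK), and using a confidence of 1.0 for ground-truth boxes is an assumption for illustration:

```python
from typing import List


def voc_to_arthur_box(
    class_id: int,
    confidence: float,
    xmin: float,
    ymin: float,
    xmax: float,
    ymax: float,
) -> List[float]:
    """Convert a corner-format box (xmin, ymin, xmax, ymax) into the
    [class_id, confidence, top_left_x, top_left_y, width, height] layout.

    Hypothetical helper for illustration only.
    """
    if xmax < xmin or ymax < ymin:
        raise ValueError("max corner must not lie left of or above the min corner")
    # top-left corner is kept as-is; width/height are the corner differences
    return [class_id, confidence, xmin, ymin, xmax - xmin, ymax - ymin]


# Ground-truth boxes carry no model score, so confidence is set to 1.0 here
# (an assumption, not a documented Arthur requirement).
gt_box = voc_to_arthur_box(14, 1.0, 48, 240, 195, 371)
print(gt_box)  # [14, 1.0, 48, 240, 147, 131]
```

Some pipelines add 1 to the width and height to treat VOC's pixel coordinates as inclusive; check which convention your labels follow.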

In [None]:
# download images and the trained model
# this may take a couple of minutes
import os
from botocore import UNSIGNED
from botocore.client import Config

# make sure the target directories exist before downloading into them
os.makedirs('data', exist_ok=True)
os.makedirs('model', exist_ok=True)

# the bucket is public, so requests are sent unsigned (no AWS credentials needed)
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))
s3.download_file('s3-bucket-arthur-public', 'sandbox/cv_object_detection_yolo/train.zip', 'data/train.zip')
s3.download_file('s3-bucket-arthur-public', 'sandbox/cv_object_detection_yolo/val.zip', 'data/val.zip')
s3.download_file('s3-bucket-arthur-public', 'sandbox/cv_object_detection_yolo/yolo_voc.h5', 'model/yolo_voc.h5')

In [None]:
# extract images
with zipfile.ZipFile('data/train.zip', 'r') as zip_ref:
    zip_ref.extractall('data/')
with zipfile.ZipFile('data/val.zip', 'r') as zip_ref:
    zip_ref.extractall('data/')

In [None]:
# load training metadata
# includes ground truth and predictions
train_df = pd.read_csv('data/train_meta.csv')
train_df['label'] = train_df['label'].apply(json.loads)  # lists load as strings; parse them
train_df['objects_detected'] = train_df['objects_detected'].apply(json.loads)  # lists load as strings; parse them
train_df

In [None]:
# load validation metadata
# includes only ground truth
val_df = pd.read_csv('data/val_meta.csv')
val_df['label'] = val_df['label'].apply(json.loads)  # lists load as strings; parse them
val_df
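Because the box columns are stored as JSON strings in the CSVs, a quick structural check after parsing can catch malformed rows before anything is sent to Arthur. A minimal sketch; `validate_boxes` is a hypothetical helper, and its default 500x375 image size mirrors the `pixel_width`/`pixel_height` set later in this notebook. Not every VOC image has that size (some are portrait), so pass the real dimensions when they differ:

```python
import json
from numbers import Real


def validate_boxes(boxes, image_width=500, image_height=375):
    """Return a list of problems found in a list of boxes laid out as
    [class_id, confidence, top_left_x, top_left_y, width, height].

    Hypothetical helper; an empty list means no problems were found.
    """
    problems = []
    for i, box in enumerate(boxes):
        if len(box) != 6:
            problems.append(f"box {i}: expected 6 elements, got {len(box)}")
            continue
        if not all(isinstance(v, Real) for v in box):
            problems.append(f"box {i}: non-numeric value")
            continue
        class_id, confidence, x, y, w, h = box
        if not 0.0 <= confidence <= 1.0:
            problems.append(f"box {i}: confidence {confidence} outside [0, 1]")
        if w <= 0 or h <= 0:
            problems.append(f"box {i}: non-positive width/height")
        if x < 0 or y < 0 or x + w > image_width or y + h > image_height:
            problems.append(f"box {i}: extends outside {image_width}x{image_height} image")
    return problems


# A label value as it appears in the CSV (illustrative numbers), then parsed.
raw = "[[14, 1.0, 48, 240, 147, 131], [8, 0.9, 400, 300, 150, 100]]"
parsed = json.loads(raw)
print(validate_boxes(parsed))  # ['box 1: extends outside 500x375 image']
```

In the notebook this could be applied as, for example, `val_df['label'].apply(validate_boxes)`.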

## Load Model

A pretrained model is stored under the `model` directory. `model/entrypoint.py` handles loading the model and provides the helpers used below: `predict` for generating predictions, `draw_boxes` for annotating images, and the list of `class_labels`.

In [None]:
# make modules inside model/ importable (entrypoint.py may import its sibling files directly)
sys.path.append('model')
from model.entrypoint import predict, class_labels, draw_boxes

In [None]:
# load sample image
sample = train_df.loc[100]
sample_image = cv2.imread(sample['image'])

# show sample image (OpenCV loads BGR; flip the channel axis to get RGB for matplotlib)
plt.imshow(np.flip(sample_image, 2))
plt.show()
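`cv2.imread` returns pixels in BGR channel order, while `plt.imshow` interprets a 3-channel array as RGB, which is why each image goes through `np.flip(..., 2)` before plotting. The small NumPy example below, using made-up pixel values, shows that flipping axis 2 simply reverses the channel order:

```python
import numpy as np

# A tiny 1x2 "image" in OpenCV's BGR order:
# pixel 0 is pure blue (B=255), pixel 1 is pure red (R=255).
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing axis 2 (the channel axis) turns BGR into RGB.
rgb = np.flip(bgr, 2)

# Blue now sits in the last slot (B of RGB), red in the first slot (R of RGB).
assert rgb[0, 0].tolist() == [0, 0, 255]
assert rgb[0, 1].tolist() == [255, 0, 0]

# The slicing form bgr[..., ::-1] is equivalent; both return views, not copies.
assert np.array_equal(rgb, bgr[..., ::-1])
```

`cv2.cvtColor(image, cv2.COLOR_BGR2RGB)` is an equivalent, more explicit alternative.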

In [None]:
# load ground truth
gt = sample['label']
annot_image = draw_boxes(sample_image, gt)

# show annotated image
plt.imshow(np.flip(annot_image, 2))
plt.show()
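`draw_boxes` comes from `model/entrypoint.py`, and its implementation is not shown in this notebook. To illustrate how the `[class_id, confidence, top_left_x, top_left_y, width, height]` layout maps onto pixel indices, here is a hypothetical NumPy-only sketch, `draw_box_outline`; it is not the entrypoint's implementation. Note that `x` indexes columns and `y` indexes rows:

```python
import numpy as np


def draw_box_outline(image: np.ndarray, box, color=(0, 255, 0), thickness: int = 2) -> np.ndarray:
    """Return a copy of an HxWx3 image with one
    [class_id, confidence, top_left_x, top_left_y, width, height] box outlined.

    Hypothetical helper. The box covers rows y to y+height-1 and columns
    x to x+width-1, clipped to the image bounds.
    """
    out = image.copy()
    h_img, w_img = out.shape[:2]
    _, _, x, y, w, h = (int(round(v)) for v in box)
    # clip the box to the image; x1/y1 are exclusive bounds
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, w_img), min(y + h, h_img)
    if x1 <= x0 or y1 <= y0:
        return out  # box lies entirely outside the image
    t = thickness
    out[y0:min(y0 + t, y1), x0:x1] = color  # top edge
    out[max(y1 - t, y0):y1, x0:x1] = color  # bottom edge
    out[y0:y1, x0:min(x0 + t, x1)] = color  # left edge
    out[y0:y1, max(x1 - t, x0):x1] = color  # right edge
    return out


# Illustrative use on a blank 500x375 canvas (rows=375, cols=500).
canvas = np.zeros((375, 500, 3), dtype=np.uint8)
boxed = draw_box_outline(canvas, [14, 1.0, 48, 240, 147, 131])
```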

In [None]:
# make prediction
predictions = predict(sample_image)
annot_image = draw_boxes(sample_image, predictions)

# show annotated image
plt.imshow(np.flip(annot_image, 2))
plt.show()

In [None]:
# predictions are returned as a list of lists, the format Arthur expects for bounding box data
predictions
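Each prediction's first element is a class ID and its second is a confidence score. A minimal sketch for turning them into readable (label, confidence) pairs; `summarize_predictions` and its 0.5 threshold are hypothetical, the example boxes are illustrative values rather than real model output, and the `VOC_CLASSES` ordering is an assumption. In the notebook, pass the `class_labels` list imported from `model/entrypoint.py` instead:

```python
from typing import List, Sequence, Tuple

# Assumption: class IDs index into the 20 Pascal VOC classes in this order.
# In the notebook, use `class_labels` from model/entrypoint.py instead.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def summarize_predictions(
    predictions: Sequence[Sequence[float]],
    labels: Sequence[str],
    min_confidence: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return (label, confidence) pairs for boxes at or above min_confidence,
    sorted from most to least confident. Hypothetical helper."""
    kept = [
        (labels[int(box[0])], float(box[1]))
        for box in predictions
        if box[1] >= min_confidence
    ]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Illustrative boxes in [class_id, confidence, x, y, width, height] form.
example = [[14, 0.91, 48, 240, 147, 131], [11, 0.35, 10, 20, 60, 40], [6, 0.78, 300, 150, 120, 90]]
print(summarize_predictions(example, VOC_CLASSES))  # [('person', 0.91), ('car', 0.78)]
```

In the notebook this would be called as, for example, `summarize_predictions(predictions, class_labels)`.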

## Onboard Model to Arthur

In [None]:
# credentials are being passed to the client via environment variables
connection = ArthurAI()

In [None]:
# define model metadata
model_meta = {
    "partner_model_id": f"YOLO_ObjectDetection_QS-{datetime.now().strftime('%Y%m%d%H%M%S')}",
    "display_name": "YOLO Object Detection",
    "input_type": InputType.Image,
    "output_type": OutputType.ObjectDetection,
    "pixel_width": 500,
    "pixel_height": 375
}
model = connection.model(**model_meta)

model.add_image_attribute("image")

predicted_attribute_name = "objects_detected"
ground_truth_attribute_name = "label"
model.add_object_detection_output_attributes(
    predicted_attribute_name, 
    ground_truth_attribute_name, 
    class_labels)

model.review()

In [None]:
model_id = model.save()
with open("quickstart_model_id.txt", "w") as f:
    f.write(model_id)

In [None]:
# You can fetch a model by ID. For example, to pull the last-created model:
# with open("quickstart_model_id.txt", "r") as f:
#     model_id = f.read()
# model = connection.get_model(model_id)
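The commented-out cell above reads the saved ID unconditionally. A slightly more defensive sketch, using a hypothetical helper `load_saved_model_id` that strips whitespace and returns `None` when the file written by the save cell is missing or empty:

```python
from pathlib import Path
from typing import Optional


def load_saved_model_id(path: str = "quickstart_model_id.txt") -> Optional[str]:
    """Return the model ID written by the save cell, or None if the file is
    missing or empty. Hypothetical helper; surrounding whitespace is stripped."""
    file = Path(path)
    if not file.is_file():
        return None
    model_id = file.read_text().strip()
    return model_id or None


# Usage in the notebook (kept commented, like the cell above):
# saved_id = load_saved_model_id()
# if saved_id is not None:
#     model = connection.get_model(saved_id)
```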

## Set Reference Data

To calculate data drift, Arthur requires a reference dataset to be uploaded. This is typically the dataset used to train the model.

Any inference logged will then be compared against this reference dataset to determine its drift score.

Reference data should be a DataFrame with a column for each model attribute. In this case those are the single `PIPELINE_INPUT` attribute, `image`, plus the predicted and ground-truth attributes, `objects_detected` and `label`.
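Before uploading, it can help to confirm that the DataFrame contains every attribute column registered on the model. A minimal sketch; `missing_reference_columns` is a hypothetical helper, and the column list simply repeats the attribute names used in this notebook:

```python
import pandas as pd

# Attribute names registered on the model in this notebook.
REQUIRED_COLUMNS = ("image", "objects_detected", "label")


def missing_reference_columns(df: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Return the required attribute columns that df lacks (empty if none).
    Hypothetical helper."""
    return [col for col in required if col not in df.columns]


# Illustrative frame that is missing the prediction column.
incomplete = pd.DataFrame(columns=["image", "label"])
print(missing_reference_columns(incomplete))  # ['objects_detected']
```

Here, `missing_reference_columns(train_df)` should return an empty list before `model.set_reference_data` is called.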

In [None]:
# use the training data as the reference dataset
model.set_reference_data(data=train_df)

## Send Inferences

We will now generate predictions for a sample of the validation set and log them with Arthur.

In [None]:
num_to_send = 10

inference_df = val_df.sample(num_to_send)
inference_df['objects_detected'] = inference_df['image'].apply(lambda x: predict(cv2.imread(x)))
inference_df

In [None]:
# send inferences to Arthur, with one UTC timestamp per inference
model.send_inferences(
    inference_df,
    inference_timestamps=[datetime(2021, 8, 5, tzinfo=pytz.utc) for _ in range(len(inference_df))])
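`inference_timestamps` takes one timezone-aware timestamp per inference row, and the cell above stamps every inference with the same instant. To spread the inferences across a time window instead, a hypothetical helper such as `spread_timestamps` could generate evenly spaced timestamps. This sketch uses the standard library's `timezone.utc`; `pytz.utc` would work the same way:

```python
from datetime import datetime, timedelta, timezone
from typing import List


def spread_timestamps(n: int, start: datetime, end: datetime) -> List[datetime]:
    """Return n timezone-aware timestamps evenly spaced from start to end
    (inclusive). With n == 1 only start is returned. Hypothetical helper."""
    if n <= 0:
        return []
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if n == 1:
        return [start]
    step: timedelta = (end - start) / (n - 1)
    return [start + i * step for i in range(n)]


# Four timestamps spread over six hours on the same day as the cell above.
stamps = spread_timestamps(
    4,
    datetime(2021, 8, 5, tzinfo=timezone.utc),
    datetime(2021, 8, 5, 6, tzinfo=timezone.utc),
)
print([ts.isoformat() for ts in stamps])
```

In the notebook, this could replace the list comprehension, e.g. `inference_timestamps=spread_timestamps(len(inference_df), start, end)` with a `start` and `end` of your choosing.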