# Object Detection with YOLO
Here we present an example from the [Accelerated Computer Vision class offered by Rachel Hu](https://github.com/aws-samples/aws-machine-learning-university-accelerated-cv).

As mentioned previously, there is a constant stream of new models coming to the forefront of machine learning practice. Below, we use `gluoncv`, a Python computer vision library, and the `model_zoo` of pretrained models it includes to try out how one of these newer models compares with the AWS Rekognition API on an image recognition task.

After installing this library, we start by importing the relevant packages.

In [2]:
!pip install -q gluoncv

In [4]:
from gluoncv import model_zoo

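Next we load `yolo3_darknet53_voc`, a YOLOv3 detector with a Darknet-53 backbone trained on the Pascal VOC dataset, as the model we will use. Passing `pretrained=True` downloads the trained weights. For more pretrained models, please refer to the [GluonCV Model Zoo](https://gluon-cv.mxnet.io/model_zoo/index.html).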

In [5]:
net = model_zoo.get_model('yolo3_darknet53_voc', pretrained=True)

Downloading /home/ec2-user/.mxnet/models/yolo3_darknet53_voc-f5ece5ce.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/yolo3_darknet53_voc-f5ece5ce.zip...


To apply the model, we first need to transform the input image into the format the model expects. For that we import the YOLO presets module and use its `load_test` function, which resizes and normalizes the image. It returns two results. The first is an MXNet NDArray with shape (batch_size, RGB_channels, height, width), which can be fed into the model directly. The second is the resized, un-normalized image as a NumPy array, which is handy for display.
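To make those shapes concrete, here is a minimal pure-Python sketch of the kind of preprocessing `load_test` performs. It assumes what we understand to be the function's defaults (`short=416`, `max_size=1024`, and the ImageNet per-channel mean and standard deviation); the helper names `expected_input_shape` and `normalize_pixel` are hypothetical, and the rounding only approximates GluonCV's own resizing code.

```python
# Sketch of YOLO test-time preprocessing under assumed defaults.
# Helper names are hypothetical; this is not GluonCV's implementation.
MEAN = (0.485, 0.456, 0.406)  # assumed ImageNet per-channel mean (RGB)
STD = (0.229, 0.224, 0.225)   # assumed ImageNet per-channel std (RGB)


def expected_input_shape(height, width, short=416, max_size=1024):
    """Return the (batch, channels, height, width) shape after resizing.

    The shorter side is scaled to `short` unless that would push the
    longer side past `max_size`, in which case the longer side is
    scaled to `max_size` instead.
    """
    short_side, long_side = min(height, width), max(height, width)
    scale = short / short_side
    if round(long_side * scale) > max_size:
        scale = max_size / long_side
    new_h, new_w = round(height * scale), round(width * scale)
    # One image, three RGB channels, channels-first layout.
    return (1, 3, new_h, new_w)


def normalize_pixel(rgb):
    """Map an 8-bit RGB pixel to the normalized values fed to the network."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, MEAN, STD))


print(expected_input_shape(480, 640))   # → (1, 3, 416, 555)
print(expected_input_shape(400, 2000))  # → (1, 3, 205, 1024), capped by max_size
print(normalize_pixel((255, 255, 255)))
```

The channels-first layout with a leading batch dimension is why the first returned value can go straight into `net`, while the second (un-normalized, channels-last) image is the one to plot.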

In [8]:
from gluoncv.data.transforms.presets import yolo

In [25]:
x, img = yolo.load_test('jeremyjacobson.png')

Now we apply our model to the input, just as we would call any other Python function.

In [13]:
net(x)

(
 [[[14.]
   [14.]
   [-1.]
   [-1.]
   ...
   [-1.]]]
 <NDArray 1x100x1 @cpu(0)>,
 
 [[[ 0.99293345]
   [ 0.01254434]
   [-1.        ]
   [-1.        ]
   ...

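We see that the output consists of three arrays: the predicted class IDs, the corresponding confidence scores, and the detected bounding boxes. Their shapes are (batch_size, num_bboxes, 1), (batch_size, num_bboxes, 1), and (batch_size, num_bboxes, 4), respectively. Here num_bboxes is 100: the model always returns a fixed number of slots, and slots that do not hold a detection are filled with -1.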
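As a minimal pure-Python sketch of what "analyzing the output" involves, the function below drops the `-1` padding slots and keeps only confident detections for a single image. The class IDs and scores reuse the first values printed above; the box coordinates are made-up placeholders, and the `(xmin, ymin, xmax, ymax)` corner format is an assumption. The names `Detection` and `unpack_detections` are hypothetical.

```python
# Sketch: turning padded detector output into a list of detections.
# Box coordinates below are hypothetical placeholders, not real model output.
from typing import List, NamedTuple, Tuple

Box = Tuple[float, float, float, float]  # assumed (xmin, ymin, xmax, ymax)


class Detection(NamedTuple):
    class_id: int
    score: float
    box: Box


def unpack_detections(class_ids: List[float], scores: List[float],
                      boxes: List[Box], threshold: float = 0.8) -> List[Detection]:
    """Drop -1 padding slots and keep detections scoring at least `threshold`.

    Unlike a loop that stops at the first low score, this does not rely on
    the detections being sorted by score.
    """
    kept = []
    for c, s, b in zip(class_ids, scores, boxes):
        if c < 0:  # padding slot: no detection here
            continue
        if s >= threshold:
            kept.append(Detection(int(c), s, b))
    return kept


# Flattened view of one image's output: two real detections, then padding.
class_ids = [14.0, 14.0, -1.0, -1.0]
scores = [0.99293345, 0.01254434, -1.0, -1.0]
boxes = [(10.0, 20.0, 200.0, 400.0),   # hypothetical coordinates
         (15.0, 25.0, 60.0, 90.0),     # hypothetical coordinates
         (-1.0, -1.0, -1.0, -1.0),
         (-1.0, -1.0, -1.0, -1.0)]

for det in unpack_detections(class_ids, scores, boxes):
    print(det)
```

Once the arrays are stored in variables, as done next, plain lists like these could be obtained with, for example, `class_IDs[0].asnumpy().flatten().tolist()` and the analogous call on the scores array.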

Let's store them as variables and then analyze the output.

In [19]:
class_IDs, scores, bounding_boxes = net(x)

Let's go through the scores and print the detections with a confidence score of at least 0.8 (80%).

In [20]:
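# Detections come back sorted by score (highest first), with -1 padding at the end,
# so we can stop at the first score below the threshold.
for c, s in zip(class_IDs.reshape(-1,), scores.reshape(-1,)):
    if s.asscalar() < 0.8:
        break
    print("Class ID : {}".format(c.asscalar()), "score : {}".format(s.asscalar()))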

Class ID : 14.0 score : 0.992933452129364


We see that only one detection passes the threshold: class 14, which the model predicts with a confidence score of about 0.99. What is class 14?


In [23]:
net.classes[14]

'person'

So the model is all but certain that I am a person.
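Class IDs are simply indices into `net.classes`. For a model trained on Pascal VOC, that list should hold the 20 VOC categories. The sketch below hard-codes them in the order we believe GluonCV uses, so treat the tuple as an assumption and prefer `net.classes` in real code; `class_name` is a hypothetical helper.

```python
# Assumed Pascal VOC class order (check against net.classes in practice).
VOC_CLASSES = (
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
)


def class_name(class_id, classes=VOC_CLASSES):
    """Map a predicted class ID (float or int) to a label; -1 means padding."""
    idx = int(class_id)
    if idx < 0:
        return None  # padding slot, no class
    return classes[idx]


print(class_name(14.0))  # → person
```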