# CPU, GPU, COCO dataset - Video Analytics workshop



### What is this chapter about
In this course we will analyze an image with TensorFlow. TensorFlow is an open source machine learning library for research and production. To understand how to process and analyze a video file or a video stream generated by a camera, we first need to understand how a single image is processed.

We are not going to create a new model from scratch (that requires more time); instead we will use an existing, free model. You can list all [available models based on COCO dataset](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md#coco-trained-models).





### COCO dataset 
[COCO dataset](http://cocodataset.org) is a great object detection dataset with about 330K images, more than 200K of them labeled. Its annotations are formatted in JSON. It is one of the most popular datasets and is used in many research papers. The annotation files contain sections such as “info”, “licenses”, “images”, “annotations”, “categories” or “segment info”.


![alt text](http://cocodataset.org/images/coco-examples.jpg)
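To make the JSON layout more concrete, here is a minimal sketch that reads a tiny annotation file in the COCO style. The file content is made up for illustration (it is not real COCO data) and only the "images", "annotations" and "categories" sections are shown. COCO stores each bounding box as `[x, y, width, height]` in pixels.

```python
import json

# A tiny, made-up annotation file in the COCO layout (not real COCO data).
coco_json = """
{
  "images": [{"id": 1, "file_name": "example.jpg", "width": 640, "height": 480}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [100, 50, 200, 300]},
    {"id": 11, "image_id": 1, "category_id": 18, "bbox": [320, 240, 120, 90]}
  ],
  "categories": [{"id": 1, "name": "person"}, {"id": 18, "name": "dog"}]
}
"""

dataset = json.loads(coco_json)

# Map numeric category ids to readable names.
names = {category["id"]: category["name"] for category in dataset["categories"]}

# Pair every annotation with its category name and its [x, y, width, height] box.
labels = [
    (names[annotation["category_id"]], annotation["bbox"])
    for annotation in dataset["annotations"]
]
print(labels)
```

The same id-to-name lookup is what the `categories()` helper later in this chapter provides for the downloaded model.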


### Hardware acceleration

As you probably already know, there is a huge difference between running video analytics (VA) on a GPU and on a CPU. We will compare both approaches.

*   The implementation of computing tasks in hardware to decrease latency and increase throughput is known as hardware acceleration. (Wikipedia)
*   Advantages of hardware include speedup, reduced power consumption, lower latency, increased parallelism and bandwidth, and better utilization of area ...  (Wikipedia)






## CPU approach
We will start with the CPU setup, so make sure you have the following configuration in your Runtime settings.
Click on "Runtime", then "Change Runtime Type".


![alt text](https://raw.githubusercontent.com/VladoDemcak/accenture-va-workshop/master/images/runtime-setupv1.png =350x)

### Preparing for implementation

As we mentioned at the beginning of the course, TensorFlow needs a model. It's a binary file we need to provide during the implementation phase. So the very first step is to download the model file, together with any additional files that define its structure or categories.

We will download an optimized TensorFlow model.



In [0]:
!echo "Updating required programs..."
!sudo apt-get update
!echo "Installation of required programs has finished!"


!echo "Downloading VA prerequisites..."
!wget https://raw.githubusercontent.com/VladoDemcak/accenture-va-workshop/master/data/optimized_rcnn_inception_graph.pb
!echo "VA prerequisites have been downloaded"

### Implementation

At this point we have all the files required for the course.

Now we can focus on the analyzer itself. Everything up to this point was basically "shell" commands. From now on we will use Python code to implement our logic.

#### We will need to do the following steps:


1.   Import required dependencies
2.   Implement helpers for doing business logic
3.   Create a simple POJO-style class representing a Detection with a label, bounding box and score
4.   Implement analyzer itself



In [0]:
import tensorflow
import numpy
import time
import cv2
from decimal import Decimal
from google.protobuf import text_format
from urllib.request import urlopen
import matplotlib.pyplot as plt
import matplotlib.cm as cm

In [0]:
# helper functions
# don't be afraid :) it's just the default list of categories for the TensorFlow model we have downloaded
def categories():
    return {
        1: 'person',
        2: 'bicycle',
        3: 'car',
        4: 'motorcycle',
        5: 'airplane',
        6: 'bus',
        7: 'train',
        8: 'truck',
        9: 'boat',
        10: 'traffic light',
        11: 'fire hydrant',
        13: 'stop sign',
        14: 'parking meter',
        15: 'bench',
        16: 'bird',
        17: 'cat',
        18: 'dog',
        19: 'horse',
        20: 'sheep',
        21: 'cow',
        22: 'elephant',
        23: 'bear',
        24: 'zebra',
        25: 'giraffe',
        27: 'backpack',
        28: 'umbrella',
        31: 'handbag',
        32: 'tie',
        33: 'suitcase',
        34: 'frisbee',
        35: 'skis',
        36: 'snowboard',
        37: 'sports ball',
        38: 'kite',
        39: 'baseball bat',
        40: 'baseball glove',
        41: 'skateboard',
        42: 'surfboard',
        43: 'tennis racket',
        44: 'bottle',
        46: 'wine glass',
        47: 'cup',
        48: 'fork',
        49: 'knife',
        50: 'spoon',
        51: 'bowl',
        52: 'banana',
        53: 'apple',
        54: 'sandwich',
        55: 'orange',
        56: 'broccoli',
        57: 'carrot',
        58: 'hot dog',
        59: 'pizza',
        60: 'donut',
        61: 'cake',
        62: 'chair',
        63: 'couch',
        64: 'potted plant',
        65: 'bed',
        67: 'dining table',
        70: 'toilet',
        72: 'tv',
        73: 'laptop',
        74: 'mouse',
        75: 'remote',
        76: 'keyboard',
        77: 'cell phone',
        78: 'microwave',
        79: 'oven',
        80: 'toaster',
        81: 'sink',
        82: 'refrigerator',
        84: 'book',
        85: 'clock',
        86: 'vase',
        87: 'scissors',
        88: 'teddy bear',
        89: 'hair drier',
        90: 'toothbrush'
    }


def draw_on_image(image, detections):
    height, width, channels = image.shape

    for detection in detections:
        p1 = (int(detection.bounding_box[0] * width), int(detection.bounding_box[1] * height))
        p2 = (int(detection.bounding_box[2] * width), int(detection.bounding_box[3] * height))
        cv2.putText(
            img=image,
            text="{} {:.2f}".format(detection.classification, detection.confidence),
            org=(p1[0], p1[1] - 5),
            fontFace=cv2.FONT_HERSHEY_SIMPLEX,
            fontScale=1.0,
            thickness=3,
            color=(255, 0, 0)
        )
        cv2.rectangle(img=image, pt1=p1, pt2=p2, color=(255, 0, 0), thickness=2, lineType=1)

    return image
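The coordinate math inside `draw_on_image` can be looked at in isolation. The sketch below uses a hypothetical helper, `to_pixels`, which is not part of the workshop code. It assumes the box is normalized to the range 0..1 and ordered `[xmin, ymin, xmax, ymax]`, as `draw_on_image` expects.

```python
def to_pixels(bounding_box, width, height):
    """Scale a normalized [xmin, ymin, xmax, ymax] box to integer pixel corners."""
    xmin, ymin, xmax, ymax = bounding_box
    top_left = (int(xmin * width), int(ymin * height))
    bottom_right = (int(xmax * width), int(ymax * height))
    return top_left, bottom_right


# A box covering the bottom-center part of a 1280x720 image.
corners = to_pixels([0.25, 0.5, 0.75, 1.0], width=1280, height=720)
print(corners)
```

Because the boxes are normalized, the same detection can be drawn on a resized copy of the image just by passing a different width and height.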


To store the attributes of a detection we need a simple POJO-style object without additional logic.
Hence we need the `Detection` class:

In [0]:
# simple POJO-style object representing a detection
class Detection:

  def __init__(self,
               bounding_box,
               classification,
               confidence):
    self.bounding_box = bounding_box
    self.classification = classification
    self.confidence = confidence

  @staticmethod
  def of(bbox, classification, confidence):
    return Detection(
        bounding_box=[Decimal(coordinate).quantize(Decimal('.0001')) for coordinate in bbox],
        classification=classification,
        confidence=confidence
    )
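The rounding done in `Detection.of` can be tried on its own. This is a small sketch with a hypothetical helper, `round_coordinate`, that applies the same `Decimal(...).quantize(Decimal('.0001'))` call to a few example values. It also shows that the rounded value can still be multiplied by an image width, which is what `draw_on_image` does.

```python
from decimal import Decimal


def round_coordinate(value):
    """Round a float coordinate to four decimal places, as Detection.of does."""
    return Decimal(value).quantize(Decimal('.0001'))


rounded = [round_coordinate(coordinate) for coordinate in [0.25, 0.123456, 1.0]]
print([str(coordinate) for coordinate in rounded])

# A Decimal can be multiplied by an int and converted to a pixel position.
pixel_x = int(rounded[0] * 1280)
print(pixel_x)
```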

To detect objects in an image we need some logic, so we will create a detector class with one method, `detect`, which takes an `image` as input.

This way we can create one instance of the detector and reuse it many times.

In [0]:
class TensorflowDetector:
    model_path = "optimized_rcnn_inception_graph.pb"

    def __init__(self, threshold, categories):
        self.threshold = threshold
        self.categories = categories

        detection_graph = tensorflow.Graph()
        with detection_graph.as_default():
            od_graph_def = tensorflow.GraphDef()
            with tensorflow.gfile.GFile(self.model_path, 'rb') as fid:
                serialized_graph = fid.read()
                od_graph_def.ParseFromString(serialized_graph)

                # force CPU device placement for NMS ops
                for node in od_graph_def.node:
                    if 'BatchMultiClassNonMaxSuppression' in node.name:
                        node.device = '/device:CPU:0'

                tensorflow.import_graph_def(od_graph_def, name='')

            ops = tensorflow.get_default_graph().get_operations()
            all_tensor_names = {output.name for op in ops for output in op.outputs}
            self.tensor_dict = {}
            for key in [
                'num_detections', 'detection_boxes', 'detection_scores',
                'detection_classes', 'detection_masks'
            ]:
                tensor_name = key + ':0'
                if tensor_name in all_tensor_names:
                    self.tensor_dict[key] = tensorflow.get_default_graph().get_tensor_by_name(
                        tensor_name)
            self.image_tensor = tensorflow.get_default_graph().get_tensor_by_name('image_tensor:0')
        self.graph = detection_graph
        with self.graph.as_default():
            config = tensorflow.ConfigProto()
            config.gpu_options.allow_growth = True
            config.allow_soft_placement = True
            config.gpu_options.per_process_gpu_memory_fraction = 1.0
            self.sess = tensorflow.Session(config=config)

    def detect(self, image):

        start = time.time()
        output_dict = self.sess.run(self.tensor_dict,
                                    feed_dict={self.image_tensor: numpy.expand_dims(image[:, :, ::-1], 0)})
        print("inference took {}s".format(time.time() - start))

        # all outputs are float32 numpy arrays, so convert types as appropriate
        output_dict['num_detections'] = int(output_dict['num_detections'][0])
        output_dict['detection_classes'] = output_dict[
            'detection_classes'][0].astype(numpy.uint8)
        output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
        output_dict['detection_scores'] = output_dict['detection_scores'][0]
        if 'detection_masks' in output_dict:
            output_dict['detection_masks'] = output_dict['detection_masks'][0]

        detections = []
        for score, label, bbox in zip(output_dict['detection_scores'], output_dict['detection_classes'],
                                      output_dict['detection_boxes']):

            if score > self.threshold:
                ymin, xmin, ymax, xmax = [coordinate.item() for coordinate in bbox]
                detection = Detection.of(
                    bbox=[xmin, ymin, xmax, ymax],
                    classification=self.categories[label.item()],
                    confidence=score
                )
                detections.append(detection)

        return detections
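The post-processing at the end of `detect` can be sketched without TensorFlow. The function below, `filter_detections`, is a hypothetical stand-alone version that works on plain Python lists: it keeps only results above the threshold and reorders each box from the model's `[ymin, xmin, ymax, xmax]` to `[xmin, ymin, xmax, ymax]`. The scores, class ids and boxes are made-up example values.

```python
def filter_detections(scores, classes, boxes, threshold, categories):
    """Keep detections above the threshold and reorder their boxes."""
    results = []
    for score, label, box in zip(scores, classes, boxes):
        if score > threshold:
            # The model returns [ymin, xmin, ymax, xmax]; drawing code wants x first.
            ymin, xmin, ymax, xmax = box
            results.append({
                "classification": categories[label],
                "confidence": score,
                "bounding_box": [xmin, ymin, xmax, ymax],
            })
    return results


# Made-up model output: one confident 'person' and one weak 'dog'.
kept = filter_detections(
    scores=[0.9, 0.3],
    classes=[1, 18],
    boxes=[[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 1.0, 1.0]],
    threshold=0.4,
    categories={1: 'person', 18: 'dog'},
)
print(kept)
```

Only the first result survives, because the second score is below the threshold of 0.4.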


### Test inference time

**Inference time** is the time required to process one image.

Now we are going to execute VA on an image. We will enter an HTTP link to a `.jpg` image, and the code below will download the image from the Internet and analyze it.

Since we want to avoid anomalies (peaks...), we will run the detection several times (exactly 10 times) to make the experiment more reliable.
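The idea of repeating a measurement can be sketched with the standard library alone. The helpers `time_runs` and `summarize` below are hypothetical and not part of the workshop code, and the workload is just a stand-in for `detector.detect`. The sketch treats the first run as a warm-up and reports the median of the remaining runs, which is less sensitive to a single peak than the mean.

```python
import statistics
import time


def time_runs(function, runs=10):
    """Call function() several times and return each duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        function()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Report the first run separately and summarize the remaining ones."""
    steady = durations[1:]  # drop the first (warm-up) run
    return {
        "first": durations[0],
        "median": statistics.median(steady),
        "max": max(steady),
    }


# Stand-in workload; in the notebook this would be a call to detector.detect(image).
durations = time_runs(lambda: sum(range(100_000)), runs=10)
print(summarize(durations))
```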

In [0]:
%matplotlib inline

def image_url(user_url):
    if not user_url:
        return "http://images.amcnetworks.com/ifccenter.com/wp-content/uploads/2017/06/borat_1280x720.jpg"
    return user_url
  

if __name__ == "__main__":

    detector = TensorflowDetector(
        threshold=0.4,  # show only detections with a confidence above 0.4
        categories=categories()
    )
    
    user_url = input("Insert URL of an image you want to analyze and confirm with ENTER. Leave empty and default image will be used as input. ")
    url = image_url(user_url)
    print("I will use image from: {}".format(url))

    fig = plt.figure(figsize=(80, 20))
    for i in range(0, 10):
        resp = urlopen(url)
        image = numpy.asarray(bytearray(resp.read()), dtype="uint8")
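        image = cv2.imdecode(image, cv2.IMREAD_COLOR)  # OpenCV decodes to BGR
        # detect() reverses the channel order itself (BGR -> RGB), so pass the image as decoded;
        # this assumes the model expects RGB input
        detections = detector.detect(image=image)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # matplotlib expects RGB
        image_with_overlays = draw_on_image(image=image, detections=detections)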
        ax = fig.add_subplot(2, 5, i + 1, xticks=[], yticks=[])  # position of specific graph
        ax.imshow(image_with_overlays)  # display image
        ax.set_title('Iteration #{0}'.format(i))  # label over image

Alright. After the execution you should see output similar to the following:

```
inference took 9.450029611587524s
inference took 1.643486499786377s
inference took 1.6418390274047852s
inference took 1.6554217338562012s
inference took 1.676102876663208s
inference took 1.6452503204345703s
inference took 1.6418006420135498s
inference took 1.6436705589294434s

```
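and of course some images with overlays and bounding boxes.

The first run is much slower than the rest, most likely because it includes one-time initialization. But even the following runs take about 1.6 seconds per image, which is too slow for near real-time processing.

Now let's switch to a hardware accelerator. Again click on "Runtime", then "Change Runtime Type", and select GPU as the hardware accelerator.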



## GPU approach

![alt text](https://raw.githubusercontent.com/VladoDemcak/accenture-va-workshop/master/images/colab_options.png =350x)
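Before repeating the experiment, it can help to confirm that the runtime really has a GPU. The sketch below is one possible check, assuming an NVIDIA GPU whose driver ships the `nvidia-smi` command line tool; the helper name `gpu_summary` is hypothetical. It does not use TensorFlow at all, so it only tells you that a GPU is visible to the machine, not that TensorFlow will use it.

```python
import shutil
import subprocess


def gpu_summary():
    """Return the output of `nvidia-smi -L`, or None if no NVIDIA tooling is visible."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return result.stdout.strip() if result.returncode == 0 else None


summary = gpu_summary()
print(summary or "No NVIDIA GPU detected; the runtime is probably still set to CPU.")
```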


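Changing the runtime type starts a fresh session, so rerun the earlier cells (download, imports, helpers and classes) first. Then run the code from the previous cell again and compare the output with the previous CPU experiment.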

The output should be similar to:

```
inference took 9.897098064422607s
inference took 0.10806679725646973s
inference took 0.10244631767272949s
inference took 0.10579109191894531s
inference took 0.10091924667358398s
inference took 0.10539960861206055s
inference took 0.10690879821777344s
inference took 0.10744500160217285s
inference took 0.10607767105102539s
```
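To put a number on the difference, the two logs above can be compared directly. This sketch copies the timings from both outputs, skips the first run of each (it is dominated by start-up cost in both cases) and divides the mean CPU time by the mean GPU time. The helper name `steady_state_mean` is only illustrative.

```python
import statistics

# Timings copied from the CPU and GPU outputs shown in this chapter (seconds).
cpu_times = [9.450029611587524, 1.643486499786377, 1.6418390274047852,
             1.6554217338562012, 1.676102876663208, 1.6452503204345703,
             1.6418006420135498, 1.6436705589294434]
gpu_times = [9.897098064422607, 0.10806679725646973, 0.10244631767272949,
             0.10579109191894531, 0.10091924667358398, 0.10539960861206055,
             0.10690879821777344, 0.10744500160217285, 0.10607767105102539]


def steady_state_mean(times):
    """Mean of all runs except the first, which includes start-up cost."""
    return statistics.mean(times[1:])


cpu_mean = steady_state_mean(cpu_times)
gpu_mean = steady_state_mean(gpu_times)
speedup = cpu_mean / gpu_mean
print("CPU {:.3f}s, GPU {:.3f}s, speedup {:.1f}x".format(cpu_mean, gpu_mean, speedup))
```

With these particular logs the GPU comes out roughly 15 to 16 times faster per image once it is warmed up; your own numbers will differ depending on the hardware Colab assigns.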


### Further reading: 
*   What’s the Difference Between a CPU and a GPU? - https://blogs.nvidia.com/blog/2009/12/16/whats-the-difference-between-a-cpu-and-a-gpu/
*   TensorFlow performance test: CPU VS GPU - https://medium.com/@andriylazorenko/tensorflow-performance-test-cpu-vs-gpu-79fcd39170c

