# Distant Viewing with Deep Learning: Part 4

We now extend our techniques to moving images.

## Step 15: Load the Distant Viewing Library

The following code loads the distant viewing library. If
you do not have the library installed, the notebook falls back
to precomputed results stored in the cache directory.

In [None]:
%pylab inline

import numpy as np
import scipy as sp
import pandas as pd
import sklearn
from sklearn import linear_model
import importlib.util  # needed for the find_spec check below
import urllib
import pickle

import os
from os.path import join

In [None]:
if importlib.util.find_spec("dvt") is not None:
    import dvt
    dvt_flag = True
else:
    dvt_flag = False
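
As a minimal sketch of how this availability check behaves, the helper below (a hypothetical name, `has_module`, not part of the toolkit) wraps the same `importlib.util.find_spec` call and tries it on one module that always exists and one that should not.

```python
import importlib.util


def has_module(name):
    """Return True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


# `json` ships with Python, so it should always be found.
json_available = has_module("json")

# A made-up module name that we assume is not installed anywhere.
missing_available = has_module("surely_not_an_installed_module_xyz")

print(json_available, missing_available)
```

The check only looks the module up; it does not import it, so it is cheap to run even when the library is large.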

In [None]:
import matplotlib.pyplot as plt
import matplotlib.patches as patches

plt.rcParams["figure.figsize"] = (12,12)

In [None]:
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

## Step 16: Location of objects

The distant viewing toolkit can be used to detect objects within
an image. We start by loading the image metadata and creating an
object detector:

In [None]:
df = pd.read_csv(join("..", "data", "bewitched.csv"))
df.head()

In [None]:
if dvt_flag:
    odrn = dvt.annotate.object.ObjectDetectRetinaNet()

Let's try to find objects in the following image:

In [None]:
img_path = join('..', 'images', 'bewitched', df.filename[200])
img = imread(img_path)
plt.imshow(img)

Here is how we detect objects in the image:

In [None]:
if dvt_flag:
    objs = odrn.detect(img)
else: 
    with open (join('..', 'cache', 'bw_face_ex.pickle'), 'rb') as fp:
        objs = pickle.load(fp)
        
objs

As with the faces, we can show these within the image. We'll add
some labels.

In [None]:
fig,ax = plt.subplots(1,1)
plt.imshow(img)
n, m, d = img.shape
for obj in objs:
    rect = plt.Rectangle((obj['left'], obj['top']),
                          obj['right'] - obj['left'],
                          obj['bottom'] - obj['top'],
                         edgecolor='orange', linewidth=2, facecolor='none')
    ax.add_patch(rect)
    
    plt.text(obj['left'], obj['top'] - 12, obj['class'],
             fontsize=12,
             bbox=dict(facecolor='orange'))
    
plt.axis('off')
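
The plotting loop above turns corner coordinates into the anchor point, width, and height that a rectangle patch expects. Here is a small sketch of that conversion on its own. The detections are made-up values that only reuse the keys seen above ('left', 'top', 'right', 'bottom', 'class'), and `box_to_xywh` is a hypothetical helper name.

```python
# Made-up detections for illustration; real ones come from the detector.
example_objs = [
    {"left": 10, "top": 20, "right": 110, "bottom": 220, "class": "person"},
    {"left": 150, "top": 40, "right": 200, "bottom": 90, "class": "chair"},
]


def box_to_xywh(obj):
    """Convert corner coordinates to (x, y, width, height)."""
    return (
        obj["left"],
        obj["top"],
        obj["right"] - obj["left"],
        obj["bottom"] - obj["top"],
    )


boxes = [box_to_xywh(o) for o in example_objs]

# Box area in pixels, e.g. for ignoring very small detections.
areas = [w * h for (_, _, w, h) in boxes]

for obj, box, area in zip(example_objs, boxes, areas):
    print(obj["class"], box, area)
```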

## Step 17: DVT Demo

The real power of the distant viewing toolkit lies in analyzing moving images.
We are going to look at a very short clip of an episode of Friends. Let's
load in the functions that we will use.

In [None]:
if dvt_flag:
    from dvt.annotate.core import FrameProcessor, FrameInput, ImageInput
    from dvt.annotate.diff import DiffAnnotator
    from dvt.annotate.face import FaceAnnotator, FaceDetectDlib, FaceEmbedVgg2
    from dvt.annotate.meta import MetaAnnotator
    from dvt.annotate.png import PngAnnotator
    from dvt.aggregate.cut import CutAggregator

    import logging
    logging.basicConfig(level='INFO')

Start by constructing a frame input object attached to the video file. The bsize argument indicates that we will work with the video by looking through batches of 128 frames.

In [None]:
if dvt_flag:
    finput = FrameInput(join("..", "video", "bewitched.mp4"), bsize=128)
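
To get a feel for what working in batches means, here is a rough sketch in plain Python. It only illustrates splitting a run of frame indices into consecutive chunks; it is not how the toolkit implements `FrameInput` internally, and `batch_ranges` is a hypothetical helper.

```python
def batch_ranges(n_frames, bsize=128):
    """Yield (start, stop) index pairs, with stop exclusive, for each batch."""
    for start in range(0, n_frames, bsize):
        yield start, min(start + bsize, n_frames)


# A hypothetical clip of 300 frames split into batches of 128 frames.
ranges = list(batch_ranges(300, bsize=128))
print(ranges)
```

The last batch is shorter than the others whenever the number of frames is not a multiple of the batch size.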

Now, create a frame processor and add four annotators: (i) png files, (ii) metadata, (iii) differences between successive frames, and (iv) faces. The quantiles argument to the DiffAnnotator indicates that we want to compute the 40th percentile of the differences between frames. The face detector takes a long time to run when not on a GPU, so we restrict it to running only every 64 frames.

In [None]:
if dvt_flag:
    fpobj = FrameProcessor()
    fpobj.load_annotator(PngAnnotator(output_dir=join("..", "video-clip-frames")))
    fpobj.load_annotator(MetaAnnotator())
    fpobj.load_annotator(DiffAnnotator(quantiles=[40]))
    fpobj.load_annotator(FaceAnnotator(detector=FaceDetectDlib(), freq=64))
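
To build some intuition for why a quantile of the frame differences is useful, here is a simplified sketch with NumPy. It assumes plain absolute pixel differences on synthetic frames, which is not necessarily what the DiffAnnotator computes internally; `diff_quantile` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three synthetic "frames": a base frame, a copy with a small region changed,
# and an unrelated frame standing in for a new shot.
frame_a = rng.integers(0, 256, size=(48, 64, 3), dtype=np.uint8)
frame_b = frame_a.copy()
frame_b[:10] = 0  # alter only the top 10 of 48 rows
frame_c = rng.integers(0, 256, size=(48, 64, 3), dtype=np.uint8)


def diff_quantile(f1, f2, q=40):
    """q-th percentile of the absolute pixel differences between two frames."""
    diff = np.abs(f1.astype(np.int16) - f2.astype(np.int16))
    return float(np.percentile(diff, q))


same = diff_quantile(frame_a, frame_a)   # identical frames
local = diff_quantile(frame_a, frame_b)  # small local change
cut = diff_quantile(frame_a, frame_c)    # completely different frame

print(same, local, cut)
```

In this toy setting the local change touches fewer than 40 percent of the pixels, so the 40th percentile stays at zero, while the unrelated frame produces a large value. This is the kind of behavior that makes a quantile a reasonable signal for cuts.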

Now, we can run the pipeline of annotators over the input object. Because logging was set to the INFO level above, we will see output as Python processes each annotator over a batch of frames. The max_batch argument restricts the number of batches for testing purposes; set it to None (the default) to process the entire video file.

In [None]:
if dvt_flag:
    fpobj.process(finput, max_batch=2)

The output is now stored in the fpobj object. To access it, we call its collect_all method. This method returns a dictionary of custom objects (DictFrame, an extension of an ordered dictionary). Each can be converted to a Pandas data frame for ease of viewing the output or saving as a csv file.

In [None]:
if dvt_flag:
    obj = fpobj.collect_all()
    lobj = dict()

    for k in obj.keys():
        lobj[k] = obj[k].todf()
    
else:  
    with open (join('..', 'cache', 'friends_dvt_ex.pickle'), 'rb') as fp:
        lobj = pickle.load(fp)

In [None]:
with open (join('..', 'cache', 'friends_dvt_ex.pickle'), 'rb') as fp:
    lobj = pickle.load(fp)
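
As mentioned, data frames like these can be saved as csv files. Here is a minimal sketch using a stand-in dictionary. The column names and values are invented for illustration and do not reflect the toolkit's real output, and the temporary output directory is likewise an assumption.

```python
import os
import tempfile

import pandas as pd

# Stand-in for `lobj`: a dictionary mapping annotator names to data frames.
example_lobj = {
    "meta": pd.DataFrame({"fps": [29.97], "frames": [256]}),
    "face": pd.DataFrame({"frame": [0, 64], "top": [10, 12]}),
}

# Write one csv file per annotator into a temporary directory.
out_dir = tempfile.mkdtemp()
for name, df in example_lobj.items():
    df.to_csv(os.path.join(out_dir, name + ".csv"), index=False)

saved = sorted(os.listdir(out_dir))
print(saved)
```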

We will now look at each output type.

### Metadata

The metadata is not very exciting, but is useful for downstream tasks:

In [None]:
lobj['meta']

### Png

The png annotator does not return any data:

In [None]:
lobj['png']

Instead, it is used for its side effects. You will see that individual frames from the video are now saved in the directory "video-clip-frames".

### Difference

The difference annotator indicates the differences between successive frames, as well as information about the average value (brightness) of each frame.

In [None]:
lobj['diff']

### Face

The face annotator detects faces in the frames. We configured it to only run every 64 frames, so there is only output in frames 0, 64, 128, and 192.

In [None]:
lobj['face']

Notice that there are two faces in frames 0, 64, and 192 but four faces detected in frame 128. In fact, all six of the main cast members are in frame 128, but two are too small and obscured to be found by the dlib algorithm.
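
The frame numbers 0, 64, 128, and 192 follow from the settings chosen earlier: two batches of 128 frames each, with the face detector running every 64 frames. A quick check in plain Python, with no toolkit required:

```python
bsize = 128     # frames per batch, as passed to FrameInput
max_batch = 2   # number of batches processed
freq = 64       # the face detector runs every `freq` frames

n_processed = bsize * max_batch
face_frames = list(range(0, n_processed, freq))

print(n_processed)  # 256
print(face_frames)  # [0, 64, 128, 192]
```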

### Detect cuts

We can also aggregate the information to detect cuts in the video file:

In [None]:
if dvt_flag:
    from dvt.aggregate.cut import CutAggregator
    cagg = CutAggregator(cut_vals={'q40': 3})
    cout = cagg.aggregate(obj).todf()
else:
    with open (join('..', 'cache', 'friends_dvt_cut_ex.pickle'), 'rb') as fp:
        cout = pickle.load(fp)

cout

You should see that these correspond to the cuts in the input video file.
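
To make the idea of thresholding concrete, here is a simplified sketch of how a threshold on a difference score could split a clip into shots. This is an illustration under our own assumptions, not the CutAggregator's actual algorithm: we assume that entry i of the list is the difference between frame i and the next frame, and `shots_from_diffs` is a hypothetical helper.

```python
def shots_from_diffs(q40, threshold=3):
    """Split frames into (start, end) shots, with both ends inclusive.

    A shot ends at frame i whenever the difference score at i exceeds
    the threshold; the next shot starts at frame i + 1.
    """
    shots = []
    start = 0
    for i, value in enumerate(q40):
        if value > threshold:
            shots.append((start, i))
            start = i + 1
    # Close the final shot if any frames remain after the last cut.
    if start < len(q40):
        shots.append((start, len(q40) - 1))
    return shots


# Invented difference scores with large jumps at frames 3 and 6.
example_q40 = [0, 1, 0, 9, 0, 0, 12, 1]
shots = shots_from_diffs(example_q40, threshold=3)
print(shots)
```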

In [None]:
if dvt_flag:
    import dvt.annotate.core

In [None]:
if dvt_flag:
    print(dvt.annotate.core.ImageInput)

## Step 18: DVT with Still Images

Finally, it is also possible to use the pipeline API with a collection of
still images.

In [None]:
if dvt_flag:
    finput = ImageInput(join("..", "images", "bewitched", "*"))
    
    fpobj = FrameProcessor()
    fpobj.load_annotator(FaceAnnotator(detector=FaceDetectDlib()))
    
    fpobj.process(finput, max_batch=20)
    
    obj = fpobj.collect_all()
    lobj = dict()

    for k in obj.keys():
        lobj[k] = obj[k].todf()
        
else:  
    with open (join('..', 'cache', 'bw_dvt_ex.pickle'), 'rb') as fp:
        lobj = pickle.load(fp)

In [None]:
lobj['face']