In [1]:
import cv2
import numpy as np
import random
import tensorly as tl
from tensorly.decomposition import tucker

Using numpy backend.


# Data Ingestion

We read both video files and get the number of frames in each. When loading them as tensors, we truncate every video to the length of the shortest one. I tried to record videos of the same length (~11 s), but small discrepancies are bound to exist. For this application, dropping a few trailing frames should not matter much.

In [2]:
# Create VideoCapture objects
parking_lot = cv2.VideoCapture('parking_lot.MOV')
patio = cv2.VideoCapture('patio.MOV')

# Get number of frames in each video
parking_lot_frames = int(parking_lot.get(cv2.CAP_PROP_FRAME_COUNT))
patio_frames = int(patio.get(cv2.CAP_PROP_FRAME_COUNT))

parking_lot_frames, patio_frames

(321, 328)
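
The truncation rule can be sketched on synthetic data. This is only an illustration: the small arrays `clip_a` and `clip_b` stand in for the real videos, and the helper `truncate_to_shortest` is a hypothetical name that does not appear elsewhere in this notebook.

```python
import numpy as np

def truncate_to_shortest(*clips):
    """Cut every clip to the frame count of the shortest one."""
    n_frames = min(len(clip) for clip in clips)
    return [clip[:n_frames] for clip in clips]

# Synthetic stand-ins with layout (frames, height, width, channels);
# the frame counts mirror the two videos, the frame size is tiny on purpose
clip_a = np.zeros((321, 4, 6, 3), dtype=np.uint8)
clip_b = np.zeros((328, 4, 6, 3), dtype=np.uint8)

clip_a_cut, clip_b_cut = truncate_to_shortest(clip_a, clip_b)
print(clip_a_cut.shape, clip_b_cut.shape)
```

After truncation both arrays share the same first dimension, which is what allows an elementwise comparison later on.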

In [3]:
# Get dimensions of each frame
parking_lot_height = int(parking_lot.get(cv2.CAP_PROP_FRAME_HEIGHT))
parking_lot_width = int(parking_lot.get(cv2.CAP_PROP_FRAME_WIDTH))
patio_height = int(patio.get(cv2.CAP_PROP_FRAME_HEIGHT))
patio_width = int(patio.get(cv2.CAP_PROP_FRAME_WIDTH))

print(parking_lot_height, parking_lot_width)
print(patio_height, patio_width)

1080 1920
1080 1920


Based on the frame counts and the frame dimensions, each video fits in a 4D tensor of shape 321x1080x1920x3:
- 321 for the frames (the extra frames of the patio video are truncated)
- 1080x1920 for the height and width of each frame
- 3 for the color channels (OpenCV returns them in BGR order rather than RGB)
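
To get a feel for the size of such a tensor, here is a back-of-the-envelope calculation. It assumes one byte per value for the decoded frames (8-bit video) and eight bytes per value once a tensor is converted to double precision; the variable names are illustrative only.

```python
frames, height, width, channels = 321, 1080, 1920, 3

n_values = frames * height * width * channels
bytes_uint8 = n_values * 1      # assumption: 8-bit values, 1 byte each
bytes_float64 = n_values * 8    # the same tensor stored as doubles

# A 50-frame subset stored as doubles
subset_bytes_float64 = 50 * height * width * channels * 8

print(f"{n_values:,} values per video")
print(f"full video, uint8:     {bytes_uint8 / 1e9:.2f} GB")
print(f"full video, float64:   {bytes_float64 / 1e9:.2f} GB")
print(f"50 frames,  float64:   {subset_bytes_float64 / 1e9:.2f} GB")
```

The gap between the full tensor and a 50-frame subset is one reason to work on a sample of frames rather than on the whole video.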

In [4]:
# Read up to max_frames frames of a video into a list
def read_frames(video_capture, max_frames):
    """
    INPUTS:
    video_capture: an OpenCV VideoCapture object whose frames we want to read
    max_frames: the maximum number of frames we want to read

    OUTPUT:
    list of frames (NumPy arrays of shape height x width x 3), at most max_frames long
    """
    frames_array = []

    # Read frames until the limit is reached or the video runs out
    while video_capture.isOpened() and len(frames_array) < max_frames:
        ret, frame = video_capture.read()
        if not ret:
            break
        frames_array.append(frame)

    # Release the video capture; no display window is opened, so nothing else needs cleaning up
    video_capture.release()

    return frames_array

In [5]:
# Read both videos, truncating each to the length of the shorter one (the parking lot video)
parking_lot_array = read_frames(video_capture=parking_lot, max_frames=parking_lot_frames)
patio_array = read_frames(video_capture=patio, max_frames=parking_lot_frames)

# Data Manipulation

The frames are stored as Python lists of NumPy arrays. We convert them to tensors for use with the TensorLy library.

In [6]:
# Create 4D tensors from the lists of frames
parking_lot_tensor = tl.tensor(parking_lot_array)
patio_tensor = tl.tensor(patio_array)

To speed up the later steps, we work with a random sample of 50 frames, using the same frame indices for both videos.

In [29]:
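# Set the seed for reproducibility
random.seed(42)
# Sample 50 distinct frame indices. Note that range(0, 320) covers indices 0 to 319,
# so the last frame (index 320) is never drawn.
random_frames = random.sample(range(0, 320), 50)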

In [30]:
# Use these random frames to subset the tensors
subset_parking_lot = parking_lot_tensor[random_frames,:,:,:]
subset_patio = patio_tensor[random_frames,:,:,:]
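
The sampling step can be checked in isolation with the standard library alone. The sketch below uses `n_frames = 321`, the frame count of the shorter video; the names `first_draw`, `second_draw` and `ordered` are illustrative.

```python
import random

n_frames = 321    # frame count of the shorter video
n_samples = 50

random.seed(42)
first_draw = random.sample(range(n_frames), n_samples)

random.seed(42)
second_draw = random.sample(range(n_frames), n_samples)

# Same seed, same indices; sampling is without replacement, so no repeats
print(first_draw == second_draw)
print(len(set(first_draw)) == n_samples)

# Optional: sort the indices to keep the selected frames in chronological order
ordered = sorted(first_draw)
```

Because the same index list is applied to both tensors, frame `i` of one subset is always paired with frame `i` of the other. Whether the frames are kept in chronological order does not change the norm of the difference, but it can affect a decomposition along the frame axis.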

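Convert the tensors to double precision. OpenCV typically decodes frames as 8-bit unsigned integers, and subtracting such arrays directly would wrap around instead of producing negative values.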

In [31]:
subset_parking_lot = subset_parking_lot.astype('d')
subset_patio = subset_patio.astype('d')
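
A tiny example shows why the conversion matters, assuming the frames hold 8-bit unsigned integers. The arrays `a` and `b` are made-up pixel values.

```python
import numpy as np

a = np.array([10, 200], dtype=np.uint8)
b = np.array([20, 100], dtype=np.uint8)

# uint8 arithmetic wraps modulo 256: 10 - 20 becomes 246
wrapped = a - b

# Casting to float64 ('d') first gives the signed difference
exact = a.astype('d') - b.astype('d')

print(wrapped)   # [246 100]
print(exact)     # [-10. 100.]
```

Without the cast, a norm computed on the wrapped differences would overstate how far apart dark and bright pixels are.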

# Naive Comparison

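A natural way of comparing two tensors of the same shape is to compute the norm of the difference between them. With its default settings, `tl.norm` returns the Frobenius norm, i.e. the square root of the sum of squared entrywise differences.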

In [10]:
parking_patio_naive_diff = tl.norm(subset_parking_lot - subset_patio)
parking_patio_naive_diff

1831354.197367074
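
To make the quantity concrete, the sketch below computes the same kind of norm with plain NumPy on small random tensors (stand-ins for the video subsets, not the real data). The per-entry root-mean-square value at the end is an addition for illustration, not something computed elsewhere in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((5, 4, 6, 3))   # stand-in for one video subset
y = rng.random((5, 4, 6, 3))   # stand-in for the other

diff = x - y
frobenius = np.sqrt(np.sum(diff ** 2))

# Equivalent formulation: Euclidean norm of the flattened difference
same_value = np.linalg.norm(diff.ravel())

# The raw norm grows with the number of entries; dividing by sqrt(size)
# gives a root-mean-square difference per entry, which is easier to read
rms = frobenius / np.sqrt(diff.size)

print(frobenius, same_value, rms)
```

A value in the millions is therefore less alarming than it looks: it aggregates over roughly 311 million entries (50 x 1080 x 1920 x 3).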

# Unsupervised Learning

Now that we have the tensors, we can perform a Tucker decomposition to obtain a more compact representation of each video (its core tensor). The aim is to discard noise and get a better sense of the similarity between the two videos.

The main tuning parameter is the n-rank of the tensor. If we were seeking the optimal decomposition, a criterion such as AIC could be used to choose the best hyperparameter. In this context, however, we are not looking for an optimal setting, only a usable one. Besides, the core tensors must have the same dimensions for us to be able to compare them.

For this reason, we choose an n-rank of [2,2,2,2] for all tensors and compare the resulting core tensors. This also reduces the computational cost of our operations (trying an n-rank of [5,5,5,5] exceeds the capabilities of LAPACK, which is used under the hood).
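
To show what a core tensor of n-rank [2,2,2,2] looks like, here is a minimal NumPy sketch of a truncated higher-order SVD (HOSVD). It is not TensorLy's implementation of `tucker`, only a simpler one-pass relative of it, and the helper names `unfold`, `mode_dot` and `truncated_hosvd` are defined here for illustration.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_dot(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` (shape J x I_mode) along axis `mode`."""
    moved = np.moveaxis(tensor, mode, -1)    # (..., I_mode)
    result = moved @ matrix.T                # (..., J)
    return np.moveaxis(result, -1, mode)

def truncated_hosvd(tensor, ranks):
    """Return (core, factors) with one orthonormal factor per mode."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of the mode-n unfolding
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, factor in enumerate(factors):
        core = mode_dot(core, factor.T, mode)   # project onto each factor
    return core, factors

def reconstruct(core, factors):
    """Expand a core tensor back to the original shape."""
    full = core
    for mode, factor in enumerate(factors):
        full = mode_dot(full, factor, mode)
    return full

rng = np.random.default_rng(0)
video = rng.random((6, 5, 4, 3))    # tiny stand-in: frames x height x width x channels

core, factors = truncated_hosvd(video, ranks=[2, 2, 2, 2])
print(core.shape)                   # (2, 2, 2, 2)

# With full ranks nothing is discarded, so the reconstruction is exact
full_core, full_factors = truncated_hosvd(video, ranks=list(video.shape))
print(np.allclose(reconstruct(full_core, full_factors), video))
```

Whatever the size of the input, the core has only 2 x 2 x 2 x 2 = 16 entries, which is why the core tensors of two videos can be subtracted directly.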

In [14]:
# Get core tensor for the parking lot video
# (as far as I know, newer TensorLy releases name this argument `rank` instead of `ranks`)
core_parking_lot, factors_parking_lot = tucker(subset_parking_lot, ranks=[2, 2, 2, 2])

In [15]:
# Get core tensor for the patio video
core_patio, factors_patio = tucker(subset_patio, ranks=[2, 2, 2, 2])

In [17]:
# Compare the two core tensors via the norm of their difference
parking_patio_diff = tl.norm(core_parking_lot - core_patio)
parking_patio_diff

3719206.800381976