With the incredible amount of information added to the internet every minute, it is humanly impossible to sift through it all and find the content I need to stay informed about everything I care about. With classes and lectures moved online, I have an even harder time staying focused on the material. I often switch the playback speed to 1.5x or 2x so the slow parts move at a comfortable pace, but a few moments later I realize I missed something important, and I have to rewind and replay that section at normal speed to understand the concepts in the video. Especially when reviewing before a test, I always have trouble sifting through hours of footage to find the example or content I am looking for.

## Background

In the spring semester of 2019, I participated in a hackathon hosted by my school, where I came up with this idea and started working on it. My idea was to assign an importance value to each moment in the video, creating a sort of "importance curve". The "total importance" of the video can then be described as the area under this curve. The toy example below shows spikes and valleys of importance; the orange area represents the most important moments of the video, which are the moments selected for the summary.
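To make the importance-curve idea concrete, here is a minimal sketch in plain Python. The scores in `curve`, the helper names `total_importance` and `select_moments`, and the `keep_fraction` parameter are all made up for illustration; how the importance values are actually computed is a separate question that this sketch does not address.

```python
def total_importance(importance, dt=1.0):
    """Approximate the area under the importance curve (trapezoid rule)."""
    return sum((a + b) / 2.0 * dt for a, b in zip(importance, importance[1:]))


def select_moments(importance, keep_fraction=0.25):
    """Return the indices of the highest-scoring moments, in playback order."""
    n_keep = max(1, round(len(importance) * keep_fraction))
    ranked = sorted(range(len(importance)),
                    key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:n_keep])


# Hypothetical per-second importance scores for an 8-second clip.
curve = [0.1, 0.2, 0.9, 0.8, 0.1, 0.3, 0.7, 0.2]

selected = select_moments(curve, keep_fraction=0.25)
print("total importance:", total_importance(curve))
print("selected seconds:", selected)
```

Picking the top-scoring moments is only one possible selection rule; a threshold on the curve, or a budget on the summary length, would work just as well with the same representation.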

It was a fun project to work on and I learned a lot from it! I figured that while I'm practicing social distancing, I might as well revisit this project and learn more about the established video summarization techniques out there today.

## Available Free Videos

There is plenty of content online we can use for this. I'm going to use some of these [public test videos](https://gist.github.com/jsturgis/3b19447b304616f18657).

In [1]:
import os
import cv2
import json
import requests
import numpy as np
import pandas as pd
from skimage import transform
from IPython.display import Video
from matplotlib import pyplot as plt
from sklearn import metrics, cluster, decomposition

In [2]:
data_root = "../data/"
url_root = "https://commondatastorage.googleapis.com/gtv-videos-bucket/sample/"

In [3]:
with open(os.path.join(data_root, "videos.json"), 'r') as f:
    videos = json.load(f)
[v['title'] for v in videos]

['Big Buck Bunny',
 'Elephant Dream',
 'For Bigger Blazes',
 'For Bigger Escape',
 'For Bigger Fun',
 'For Bigger Joyrides',
 'For Bigger Meltdowns',
 'Sintel',
 'Subaru Outback On Street And Dirt',
 'Tears of Steel',
 'Volkswagen GTI Review',
 'We Are Going On Bullrun',
 'What care can you get for a grand?']

In [4]:
vid_url = url_root + videos[0]['sources'][0]
Video(vid_url, width=400)

In [5]:
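%%time
# Stream the video and store every frame as a downscaled grayscale image.
cap = cv2.VideoCapture(vid_url)
frameCount = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
frameWidth = 128 # int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
# Scale the height by the source aspect ratio.
frameHeight = int(frameWidth * (cap.get(cv2.CAP_PROP_FRAME_HEIGHT) / cap.get(cv2.CAP_PROP_FRAME_WIDTH))) + 1
V = np.zeros((frameCount, frameHeight, frameWidth), np.dtype('uint8'))
f = 0
# The reported frame count can be inexact, so never write past the end of V.
while cap.isOpened() and f < frameCount:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # cv2.resize takes the target size as (width, height).
    V[f] = cv2.resize(gray, (frameWidth, frameHeight), interpolation=cv2.INTER_CUBIC)
    f += 1

cap.release()
V = V[:f]  # drop unused slots if fewer frames were decoded than reported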

Wall time: 41.2 s


## Summarization Techniques

I read this [survey](#SURVEY) on automatic video summarization and used the papers it references, which explain the algorithms in more detail, to implement the following summarization techniques:

- VSCAN
- VSUMM
- STILL and MOVING (STIMO)
- Delaunay Triangulation (DT)
- Video Summarization Using Higher Order Color Moments (VSUHCM)
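Several of these techniques share a common pattern: describe each frame with a compact feature, cluster the features, and keep one representative frame per cluster. The sketch below illustrates that pattern only; it is not a faithful implementation of any of the papers listed above. It assumes grayscale frames given as nested lists of 0-255 pixel values, a 4-bin intensity histogram as the feature, and a bare-bones k-means with evenly spaced initial centroids. All function names are hypothetical.

```python
def histogram(frame, bins=4, max_value=256):
    """Normalized intensity histogram of a grayscale frame (rows of pixels)."""
    counts = [0] * bins
    n_pixels = 0
    for row in frame:
        for pixel in row:
            counts[pixel * bins // max_value] += 1
            n_pixels += 1
    return [c / n_pixels for c in counts]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Plain k-means; centroids start at evenly spaced points (an assumption)."""
    step = len(points) / k
    centroids = [list(points[int(i * step)]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def keyframes(frames, k):
    """Pick, for each cluster, the frame whose histogram is nearest the centroid."""
    feats = [histogram(f) for f in frames]
    labels, centroids = kmeans(feats, k)
    picks = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            picks.append(min(members, key=lambda i: sq_dist(feats[i], centroids[c])))
    return sorted(picks)


# Toy "video": three dark 2x2 frames followed by three bright ones.
toy_frames = [
    [[10, 70], [15, 12]], [[12, 18], [14, 11]], [[11, 19], [16, 13]],
    [[240, 250], [245, 242]], [[241, 251], [246, 243]], [[240, 180], [245, 242]],
]
print(keyframes(toy_frames, k=2))
```

On real footage the frames would come from an array like `V`, and a library clustering routine would replace the hand-written loop; the plain-Python version is only meant to keep the pattern visible.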

## References

1. <a id="VSUHCM" href="https://github.com/coenvalk/CondenseMyTalk/blob/master/literature/VSUHCM.pdf">Jadhav, Mrs. Poonam S., and Dipti S. Jadhav. “Video Summarization Using Higher Order Color Moments (VSUHCM).” Procedia Computer Science, vol. 45, 2015, pp. 275–281., doi:10.1016/j.procs.2015.03.140.</a>
1. <a id="VSCAN" href="https://github.com/coenvalk/CondenseMyTalk/blob/master/literature/VSCAN.pdf">Karim M. Mohamed, Mohamed A. Ismail, and Nagia M. Ghanem (2014). “VSCAN: An Enhanced Video Summarization using Density-based Spatial Clustering.” Computer and Systems Engineering Department, Faculty of Engineering, Alexandria University, Alexandria, Egypt.</a>
1. <a id="STIMO" href="https://github.com/coenvalk/CondenseMyTalk/blob/master/literature/STIMO.pdf">Marco, Geraci, and Montenegro (2010). “STIMO: Still and Moving video storyboard for the Web Scenario.” Multimedia Tools and Applications, vol. 46, no. 1, January 2010, pp. 47–69.</a>
1. <a id="DT" href="https://github.com/coenvalk/CondenseMyTalk/blob/master/literature/DT.pdf">Padmavathi Mundur, Yong Rao, and Yelena Yesha (2006). “Key frame-based video summarization using Delaunay clustering.” International Journal on Digital Libraries, vol. 6, no. 2, April 2006, pp. 219–232.</a>
1. <a id="VSUMM" href="https://github.com/coenvalk/CondenseMyTalk/blob/master/literature/VSUMM.pdf">Sandra E. F. de Avila, Antonio da Luz Jr., Arnaldo de A. Araujo, and Matthieu Cord (2008). “VSUMM: An Approach for Automatic Video Summarization and Quantitative Evaluation.” XXI Brazilian Symposium on Computer Graphics and Image Processing, IEEE.</a>
1. <a id="SURVEY" href="https://github.com/coenvalk/CondenseMyTalk/blob/master/literature/SURVEY.pdf">Sebastian, Tinumol, and Jiby J. “A Survey on Video Summarization Techniques.” International Journal of Computer Applications, vol. 132, no. 13, 2015, pp. 30–32., doi:10.5120/ijca2015907592.</a>