<a href="https://colab.research.google.com/github/doubleblindreview2/jbr_video_mining/blob/master/video_mining.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **JBR Video Mining Script**
- **Output**: This script extracts various video features from a given set of videos and stores these features in multiple .csv files, which can be opened with Excel or any text editor afterwards
- **Input**: You need to provide a Google Drive account that contains a folder with your video files
- **Execution**: In order to execute this script, run all cells from top to bottom and follow the instructions

Before you start:  **<font color='red'>Please click -> Runtime -> Change runtime type -> GPU in top menu</font>**

## **1. Download Models**
_You need a Google Account to verify your legitimate use. The models will be downloaded automatically from a public GDrive._
_____________________________________________________________________________



In [0]:
import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
from tqdm import tqdm

Running the next cell will create a link. **You need to click this link and enter your Google Account credentials. Copy the code and enter it in the space below.**

In [0]:
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

Running the next cell will download the required models from a public GDrive. After running it, you can verify success by clicking on the folder icon on the left side of this screen. There should be a 'downloads' folder containing various files (you may need to click 'Refresh').

In [3]:
os.makedirs('/content/downloads/', exist_ok=True)  # create the folder if it is missing
os.chdir('/content/downloads/')  # absolute path, so the cell also works after the working directory has changed

folder_id = '1e-UQc-ylzVOOvW2ZiOCpnP-EEziHA4cQ'
file_list = drive.ListFile({'q': "'{}' in parents and trashed=false".format(folder_id)}).GetList()
for file in tqdm(file_list):
    file.GetContentFile(file['title'])

os.chdir('/content/')

100%|██████████| 12/12 [00:39<00:00,  3.30s/it]
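
To confirm the download programmatically rather than through the file browser, a minimal sketch along the following lines may help. The helper name `summarize_folder` is hypothetical and not part of the script; the only thing taken from the notebook is the `/content/downloads/` path used above.

```python
import os

def summarize_folder(folder):
    """Return (file name, size in bytes) pairs for the regular files in `folder`.

    Hypothetical helper for illustration only. Returns an empty list if the
    folder does not exist.
    """
    if not os.path.isdir(folder):
        return []
    entries = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            entries.append((name, os.path.getsize(path)))
    return entries

# Print one line per downloaded file; prints nothing if the folder is missing.
for name, size in summarize_folder('/content/downloads/'):
    print(f'{name}: {size / 1e6:.1f} MB')
```

A file with a size of 0 bytes would suggest an interrupted download, in which case re-running the download cell is probably the simplest fix.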


## **2. Provide Input & Select Features**
_You need to provide a Google Drive Account, which includes a folder with video files for video mining._
_____________________________________________________________


Running the next cell will create a link. **You need to click this link and enter your Google Account credentials. Copy the code and enter it in the space below.**

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive


**Please provide the correct path to your folder with videos. You will probably need to change the text after '/My Drive/'.**

You can verify the correct path by clicking on the folder icon on the left side of this screen and browsing the 'drive' folder (you may need to click 'Refresh').

In [0]:
in_folder  = '/content/drive/My Drive/trailer/vids/'  # folder with videos, names are used as IDs
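
Before starting a long run, it can be worth checking that the path really points at a folder with videos. The sketch below is one way to do this. The helper `list_videos` and the extension list are assumptions made for illustration; which file types `jbr_run.py` actually accepts is not stated in this notebook.

```python
import os

# Assumption: common video extensions; adjust to match your own files.
VIDEO_EXTENSIONS = ('.mp4', '.mov', '.avi', '.mkv')

def list_videos(folder, extensions=VIDEO_EXTENSIONS):
    """Return the sorted names of files in `folder` that look like videos.

    Hypothetical helper. Raises FileNotFoundError if the folder is missing,
    which usually means the path after '/My Drive/' needs to be corrected.
    """
    if not os.path.isdir(folder):
        raise FileNotFoundError(f'Folder not found: {folder}')
    return sorted(
        name for name in os.listdir(folder)
        if name.lower().endswith(extensions)
    )

# Example use in the notebook, once Drive is mounted:
#   print(list_videos(in_folder))
```

Because file names are used as IDs, this is also a convenient place to spot duplicate or oddly named files.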

You **can unselect features** you are not interested in **by writing 'False' instead of 'True'**. This may drastically increase processing speed.

In [0]:
extract_length           = True # get length of video
extract_cuts             = True # get scene cuts
extract_colors           = True # get brightness and color information
extract_faces            = True # get faces
extract_emotions         = True # get 8 different emotions per face 
extract_objects          = True # get 80 objects
extract_variance         = True # get semantic variance
extract_quality          = True # get quality

## **3. Provide Optional Input**
_**Run all cells - no additional input required, however you may change things to customize execution**_
_____________________________________________________________

The following folders and the logfile will be created to store the results of your analysis. **You do not have to change them unless you want to use a custom folder structure** to store the results.

In [0]:
out_folder = '/content/drive/My Drive/trailer/preds/' # folder to store extracted features
log_name   = '2020-05-21_logfile.csv' # name of logfile, please include the .csv ending
log_folder = '/content/drive/My Drive/trailer/logs/' # folder for logfile

Setting a start and end index allows you to analyze only a subset of the videos in your folder. **You do not have to change this if you want to analyze all videos in the folder you provided.**

In [0]:
start_index = 0
end_index = 3 #len(os.listdir(in_folder))
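
As a rough illustration of how such a pair of indices selects a subset, the sketch below uses Python's half-open slicing, in which `start_index` is included and `end_index` is excluded. This is an assumption about the behavior of `jbr_run.py`, whose source is not shown here; the file names are hypothetical.

```python
def select_subset(names, start_index, end_index):
    """Return names[start_index:end_index] (end_index itself is excluded).

    Assumption: the script slices its file list the same way.
    """
    return names[start_index:end_index]

videos = ['a.mp4', 'b.mp4', 'c.mp4', 'd.mp4', 'e.mp4']  # hypothetical file names
print(select_subset(videos, 0, 3))  # ['a.mp4', 'b.mp4', 'c.mp4']
```

Under this assumption, `start_index = 0` and `end_index = 3` select the first three videos, and setting `end_index` to `len(os.listdir(in_folder))` selects everything from `start_index` onward.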

The following cell sets the number of the GPU device used for processing. **You do not have to change this.**

In [0]:
device = 0 # device number of GPU; do not change, and set Runtime Type to GPU
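
If you are unsure whether the runtime actually has a GPU attached, a quick check such as the following sketch can be run first. It relies only on the `nvidia-smi` command-line tool being available, which is typically the case on GPU runtimes; the helper name `gpu_visible` is hypothetical.

```python
import shutil
import subprocess

def gpu_visible():
    """Return True if `nvidia-smi -L` can be run and exits successfully.

    Hypothetical helper: a False result suggests the runtime type is not
    set to GPU (or no NVIDIA driver is installed).
    """
    exe = shutil.which('nvidia-smi')
    if exe is None:
        return False
    try:
        result = subprocess.run([exe, '-L'], capture_output=True,
                                text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0

print('GPU visible:', gpu_visible())
```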

## **4. Extract Selected Features**
_**Run all cells - no additional input required**_

_This will extract your selected features by looping through the video files in the folder you provided. (This may take a significant amount of time.)_

In [0]:
# Build the command-line argument string that is passed to jbr_run.py
variables = ''
for variable in ['extract_length',
                 'extract_cuts',
                 'extract_colors',
                 'extract_faces',
                 'extract_emotions',
                 'extract_variance',
                 'extract_objects',
                 'extract_quality',
                 'start_index',
                 'end_index',
                 'device']:
  variables += ' --' + variable.replace('_','-') + ' ' + str(eval(variable))

for variable in ['in_folder',
                 'out_folder',
                 'log_folder',
                 'log_name']:
  variables += ' --' + variable.replace('_','-') + ' "' + str(eval(variable))+'"'

In [17]:
variables

' --extract-length True --extract-cuts True --extract-colors True --extract-faces True --extract-emotions True --extract-variance True --extract-objects True --extract-quality True --start-index 0 --end-index 3 --device 0 --in-folder "/content/drive/My Drive/trailer/vids/" --out-folder "/content/drive/My Drive/trailer/preds/" --log-folder "/content/drive/My Drive/trailer/logs/" --log-name "2020-05-21_logfile.csv"'
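
The same kind of argument string can be built without `eval` by collecting the settings in a dictionary and letting `shlex.quote` handle paths that contain spaces. The sketch below is an alternative under stated assumptions, not the notebook's own method; `build_arguments` is a hypothetical helper, and only the `--flag value` naming scheme is taken from the output above.

```python
import shlex

def build_arguments(settings):
    """Turn a {name: value} dict into '--name value' command-line text.

    Underscores in names become hyphens, matching the flags shown above.
    Values are shell-quoted, so paths with spaces survive intact.
    """
    parts = []
    for name, value in settings.items():
        parts.append('--' + name.replace('_', '-'))
        parts.append(shlex.quote(str(value)))
    return ' '.join(parts)

# A shortened, illustrative subset of the notebook's settings.
example = {
    'extract_length': True,
    'start_index': 0,
    'in_folder': '/content/drive/My Drive/trailer/vids/',
}
print(build_arguments(example))
# --extract-length True --start-index 0 --in-folder '/content/drive/My Drive/trailer/vids/'
```

The result uses single quotes instead of double quotes around paths, which a POSIX shell should treat equivalently for these values.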

In [19]:
!python "/content/downloads/jbr_run.py" {variables}

Visual variance requires scene cuts and will automatically extract scene cuts

Analysis of 3 videos begins:

  0% 0/3 [00:00<?, ?it/s]

Analyze pzysZI-1LN8_tt1724597.mp4
2020-05-21 18:07:15.843076: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-05-21 18:07:17.425573: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-05-21 18:07:17.425965: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-21 18:07:17.426859: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:00:04.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2020-05-21 18:07:17.426917: I tensorflow/strea