<a href="https://colab.research.google.com/github/doubleblindreview2/jbr_video_mining/blob/master/video_mining.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **JBR Video Mining Script**
- **Output**: This script extracts various video features from a given set of videos and stores them in multiple .csv files, which can be opened with Excel or any text editor.
- **Input**: You need a Google Drive account that contains a folder with video files.
- **Execution**: To execute this script, run all cells from top to bottom and follow the instructions.

## **1. Download Models**
_You need a Google Account to verify your legitimate use; the models will be downloaded automatically from a public Google Drive._
_____________________________________________________________


In [0]:
import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
from tqdm import tqdm

Running the next cell will create a link. **You need to click this link and enter your Google Account credentials. Copy the code and enter it in the space below.**

In [0]:
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

Running the next cell will download the required models from a public Google Drive. After running it, you can verify success by clicking on the folder icon on the left side of this screen. There should be a 'downloads' folder containing various files (you may need to click 'Refresh').

In [3]:
if not os.path.exists('downloads'): os.makedirs('downloads')
os.chdir('./downloads/')

folder_id = '1e-UQc-ylzVOOvW2ZiOCpnP-EEziHA4cQ'
file_list = drive.ListFile({'q': "'{}' in parents and trashed=false".format(folder_id)}).GetList()
for file in tqdm(file_list):
    file.GetContentFile(file['title'])

os.chdir('/content/')

100%|██████████| 6/6 [00:10<00:00,  1.81s/it]
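
As an alternative to checking the folder icon by hand, the download could also be verified in code. The following is only a minimal sketch: `missing_files` is a hypothetical helper, and the two expected file names are an assumption derived from the modules imported later in this notebook (`video_mining_dependencies` and `video_mining_functions`), with a `.py` ending assumed.

```python
import os
import tempfile


def missing_files(folder, expected_names):
    """Return the expected file names that are absent from `folder`."""
    present = set(os.listdir(folder)) if os.path.isdir(folder) else set()
    return [name for name in expected_names if name not in present]


# Assumption: the modules imported later are stored as plain .py files.
EXPECTED = ["video_mining_dependencies.py", "video_mining_functions.py"]

# Demo on a throwaway folder that contains only one of the two files.
with tempfile.TemporaryDirectory() as demo_folder:
    open(os.path.join(demo_folder, "video_mining_functions.py"), "w").close()
    demo_missing = missing_files(demo_folder, EXPECTED)
    print(demo_missing)
```

In this notebook, the call would probably be `missing_files('/content/downloads', EXPECTED)`, assuming the download cell was started from `/content/`. An empty list means every expected file was found.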


## **2. Provide Input & Select Features**
_You need to provide a Google Drive account that includes a folder with the video files to be mined._
_____________________________________________________________


Running the next cell will create a link. **You need to click this link and enter your Google Account credentials. Copy the code and enter it in the space below.**

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive


**Please provide the correct path to your folder with videos. You probably need to change the text after '/My Drive/'.**

You can verify the correct path by clicking on the folder icon on the left side of this screen and searching within the 'drive' folder. (You may need to click 'Refresh'.)

In [0]:
in_folder  = '/content/drive/My Drive/trailer/vids/'  # folder with videos, names are used as IDs
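
Before starting a long run, it can help to preview which files will be picked up and which ID each one receives. The sketch below is an illustration under assumptions: `preview_video_ids` and `VIDEO_EXTENSIONS` are hypothetical names, and the extension filter is an addition (the extraction loop later in this notebook iterates over every entry in the folder). The ID rule itself mirrors that loop, which takes the text before the first dot of the file name.

```python
import os
import tempfile

# Assumption: a plausible set of video extensions; adjust as needed.
VIDEO_EXTENSIONS = (".mp4", ".mov", ".avi", ".mkv")


def preview_video_ids(folder, extensions=VIDEO_EXTENSIONS):
    """Return sorted (file name, ID) pairs for the video files in `folder`."""
    if not os.path.isdir(folder):
        raise FileNotFoundError("Folder not found: " + folder)
    pairs = []
    for file_name in sorted(os.listdir(folder)):
        if file_name.lower().endswith(extensions):
            # Same rule as the extraction loop: text before the first dot.
            pairs.append((file_name, file_name.split(".")[0]))
    return pairs


# Demo on a throwaway folder with one video file and one unrelated file.
with tempfile.TemporaryDirectory() as demo_folder:
    for file_name in ("pzysZI-1LN8_tt1724597.mp4", "notes.txt"):
        open(os.path.join(demo_folder, file_name), "w").close()
    demo_pairs = preview_video_ids(demo_folder)
    print(demo_pairs)
```

Calling `preview_video_ids(in_folder)` would raise a `FileNotFoundError` if the path is mistyped. Because the ID is cut at the first dot, a file named `my.trailer.mp4` would receive the ID `my`, so file names with additional dots are best avoided.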

You **can deselect features** you are not interested in **by writing 'False' instead of 'True'**. This may drastically reduce processing time.

In [0]:
extract_length   = False  # get length of video
extract_cuts     = False  # get scene cuts
extract_colors   = False  # get brightness and color information
extract_faces    = False  # get faces
extract_emotions = False  # get 8 different emotions per face
extract_objects  = False  # get 80 objects
extract_variance = True   # get semantic variance

The following folders and the logfile will be created to store the results of your analysis. **You do not have to change them unless you want to use a custom folder** structure to store results.

In [0]:
out_folder = '/content/drive/My Drive/trailer/preds/' # folder to store extracted features
log_name   = '2020-05-12_logfile.csv' # name of logfile; please include the .csv ending
log_folder = '/content/drive/My Drive/trailer/logs/' # folder for logfile
yolo_folder= 'D:/JBR_video_mining/yolov3/' # TO BE REPLACED directory for yolov3 folder    

## **3. Install Requirements**
_**Run all cells - no additional input required**_

_This will install all required packages. (May take a few minutes)_
______________________________________________________________________________

In [0]:
from IPython.utils import io
from downloads.video_mining_dependencies import *
with io.capture_output() as captured:
  if extract_cuts: install_dependencies_cuts()
  if extract_emotions: install_dependencies_emotions()
  if extract_variance: install_dependencies_variance()

## **4. Extract Selected Features**
_**Run all cells - no additional input required**_

_This will extract your selected features by looping through the folder of video files you provided. (This may take a significant amount of time.)_
______________________________________________________________________________

In [0]:
import os
import pandas as pd
from datetime import datetime
from tqdm import tqdm
from downloads.video_mining_functions import *

In [0]:
if not os.path.exists(out_folder): os.makedirs(out_folder)
if not os.path.exists(log_folder): os.makedirs(log_folder)
logcols = ['time','name','log.length','log.cuts','log.colors','log.faces','log.emotions','log.objects','log.variance']
pd.DataFrame(columns=logcols).to_csv(log_folder+log_name,index_label=False)

In [7]:
for i in tqdm(os.listdir(in_folder)[:2]): # Loop through folder with videos to extract selected features; [:2] limits the run to the first two entries, remove it to process the whole folder

  ### Provide name, input- and output_path
  name = i.split('.')[0]
  input_path = in_folder+i
  output_path = out_folder +name +'/'
  if not os.path.exists(output_path): os.makedirs(output_path)
  
  ### Extract selected features
  if extract_length: log_length = vid_length(input_path, output_path, name)
  else: log_length = False
  if extract_cuts: log_cuts = get_cuts(input_path, output_path, name)
  else: log_cuts = False
  if extract_colors: log_colors = color_loop(input_path, output_path, name)
  else: log_colors = False
  if extract_faces: log_faces = video_loop_faces(input_path, output_path, name)
  else: log_faces = False
  if extract_emotions: log_emotions = video_loop_faces_predict_emotion(input_path, output_path, name)
  else: log_emotions = False
  if extract_objects: log_objects = coco_loop(input_path, output_path, name)
  else: log_objects = False
  if extract_variance: log_variance = get_visual_variance(input_path, output_path, name)
  else: log_variance = False
        
  ### Write potential errors into logfile
  row = pd.Series([datetime.now(),name,log_length,log_cuts,log_colors,log_faces,log_emotions,log_objects,log_variance])
  pd.DataFrame([row]).to_csv(log_folder+log_name,mode='a+',header=False,index=None)

  0%|          | 0/2 [00:00<?, ?it/s]

scenedetect --input "/content/drive/My Drive/trailer/vids/pzysZI-1LN8_tt1724597.mp4" --output "/content/drive/My Drive/trailer/preds/pzysZI-1LN8_tt1724597/" detect-content list-scenes -f pzysZI-1LN8_tt1724597_FrameLevel_Scenes.csv -q 


 50%|█████     | 1/2 [00:34<00:34, 34.82s/it]

scenedetect --input "/content/drive/My Drive/trailer/vids/6Duh20o-qWs_tt6057032.mp4" --output "/content/drive/My Drive/trailer/preds/6Duh20o-qWs_tt6057032/" detect-content list-scenes -f 6Duh20o-qWs_tt6057032_FrameLevel_Scenes.csv -q 


100%|██████████| 2/2 [00:43<00:00, 21.86s/it]
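
The if/else chain in the loop above could also be expressed as a table that maps each feature name to its function, which keeps the log row and the feature switches in sync. The following is a sketch under assumptions, not the notebook's actual implementation: `run_extractors` is a hypothetical helper, and `fake_length` and `fake_variance` are stand-ins for the real functions from `video_mining_functions` (for example `vid_length` and `get_visual_variance`), whose return values are assumed here to be simple status flags.

```python
from datetime import datetime


def run_extractors(name, input_path, output_path, extractors, enabled):
    """Run every enabled extractor and return one log row as a dict."""
    row = {"time": datetime.now(), "name": name}
    for feature, func in extractors.items():
        if enabled.get(feature, False):
            row["log." + feature] = func(input_path, output_path, name)
        else:
            # Same placeholder the notebook loop writes for skipped features.
            row["log." + feature] = False
    return row


# Hypothetical stand-ins for the real extraction functions.
def fake_length(input_path, output_path, name):
    return True


def fake_variance(input_path, output_path, name):
    return True


extractors = {"length": fake_length, "variance": fake_variance}
enabled = {"length": False, "variance": True}

row = run_extractors("demo_video", "demo_video.mp4", "preds/demo_video/",
                     extractors, enabled)
print(row)
```

With this layout, adding a feature would mean adding one entry to `extractors` and one to `enabled`, rather than a new if/else pair plus a new log column.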


In [22]:
watch_log = pd.read_csv(log_folder + log_name)
watch_log.head()

Unnamed: 0,time,name,log.length,log.cuts,log.colors,log.faces,log.emotions,log.objects,log.variance
0,2020-05-12 14:34:32.120498,pzysZI-1LN8_tt1724597,False,False,False,False,False,False,True
1,2020-05-12 14:36:07.708185,pzysZI-1LN8_tt1724597,False,False,False,False,False,False,True
2,2020-05-12 14:36:21.593296,pzysZI-1LN8_tt1724597,False,False,False,False,False,False,True
3,2020-05-12 14:36:21.599589,6Duh20o-qWs_tt6057032,False,False,False,False,False,False,True
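
The log above lists `pzysZI-1LN8_tt1724597` three times, most likely because the extraction loop was run several times while rows were appended to the same logfile. A minimal sketch for keeping only the most recent entry per video follows. It assumes rows appear in chronological order (which appending implies), uses the standard `csv` module instead of pandas to stay dependency-free, and works on a sample trimmed to three columns of the table shown above; `latest_row_per_video` is a hypothetical helper.

```python
import csv
import io


def latest_row_per_video(csv_text):
    """Return {video name: last logged row} for a logfile given as text."""
    latest = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        latest[row["name"]] = row  # later rows overwrite earlier ones
    return latest


# Sample trimmed from the log shown above (only three of the columns).
sample_log = """time,name,log.variance
2020-05-12 14:34:32.120498,pzysZI-1LN8_tt1724597,True
2020-05-12 14:36:21.593296,pzysZI-1LN8_tt1724597,True
2020-05-12 14:36:21.599589,6Duh20o-qWs_tt6057032,True
"""

latest = latest_row_per_video(sample_log)
for video_name, log_row in latest.items():
    print(video_name, log_row["time"])
```

To apply this to the real logfile, its contents could be read with `open(log_folder + log_name).read()` and passed to the function.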
