<a href="https://colab.research.google.com/github/doubleblindreview2/jbr_video_mining/blob/master/video_mining.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **JBR Video Mining Script**
- **Output**: This script extracts various video features from a given set of videos and stores these features in multiple .csv files, which can be opened with Excel or any text editor afterwards. It will store results in your Google Drive. This requires some free space (<1 MB).
- **Input**: You need to provide a Google Drive account with a folder containing video files. Alternatively, you can use the provided example video.
- **Execution**: In order to execute this script, **run all cells from top to bottom and follow the instructions**

## **1. Download Models**
_You need a Google Account to verify your legitimate use; the models will be downloaded automatically from a public GDrive._
_____________________________________________________________________________



In [1]:
!pip install -U pillow==6.1
print('\n successfully executed, please proceed with next step\n')

Collecting pillow==6.1
  Downloading https://files.pythonhosted.org/packages/14/41/db6dec65ddbc176a59b89485e8cc136a433ed9c6397b6bfe2cd38412051e/Pillow-6.1.0-cp36-cp36m-manylinux1_x86_64.whl (2.1MB)
     |████████████████████████████████| 2.1MB 2.9MB/s
ERROR: albumentations 0.1.12 has requirement imgaug<0.2.7,>=0.2.5, but you'll have imgaug 0.2.9 which is incompatible.
Installing collected packages: pillow
  Found existing installation: Pillow 7.0.0
    Uninstalling Pillow-7.0.0:
      Successfully uninstalled Pillow-7.0.0
Successfully installed pillow-6.1.0



 successfully executed, please proceed with next step



**<font color='red'>After completion: please select Runtime -> Change runtime type -> GPU in the top menu</font>**

If the runtime is already set to GPU, please select Runtime -> Restart runtime instead, so that the newly installed Pillow version is loaded.
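After switching the runtime, you may want to confirm that a GPU is actually visible before starting a long run. A minimal sketch, assuming NVIDIA's `nvidia-smi` tool is on the PATH (as it normally is on Colab GPU runtimes); the helper name `list_gpus` is hypothetical and not part of this notebook:

```python
import shutil
import subprocess

def list_gpus():
    """Return one line per GPU reported by `nvidia-smi -L`, or [] if none is visible."""
    if shutil.which('nvidia-smi') is None:  # tool missing: most likely a CPU-only runtime
        return []
    result = subprocess.run(['nvidia-smi', '-L'], stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    if result.returncode != 0:  # tool present, but no usable device
        return []
    return [line for line in result.stdout.splitlines() if line.strip()]

gpus = list_gpus()
print(gpus if gpus else 'No GPU found: check Runtime -> Change runtime type')
```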

In [1]:
import os
import shutil
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
from tqdm import tqdm
print('\n successfully executed, please proceed with next step\n')


 successfully executed, please proceed with next step



Running the next cell will create a link. **You need to click this link and enter your Google Account credentials. Copy the code and enter it in the space below.**

In [2]:
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
print('\n successfully executed, please proceed with next step\n')


 successfully executed, please proceed with next step



Running the next cell downloads the required models from a public GDrive. Afterwards, you can verify success by clicking the folder icon on the left side of this screen. There should be a 'downloads' folder containing various files (you may need to click 'Refresh').

In [3]:
os.makedirs('/content/downloads/', exist_ok=True)  # target folder for models and scripts
os.chdir('/content/downloads/')  # absolute path, so re-running this cell does not fail

folder_id = '1e-UQc-ylzVOOvW2ZiOCpnP-EEziHA4cQ'
file_list = drive.ListFile({'q': "'{}' in parents and trashed=false".format(folder_id)}).GetList()
for file in tqdm(file_list):
    file.GetContentFile(file['title'])

os.chdir('/content/')
print('\n successfully executed, please proceed with next step\n')

100%|██████████| 19/19 [01:46<00:00,  5.60s/it]


 successfully executed, please proceed with next step
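To double-check the download without clicking through the file browser, you can inspect the folder programmatically. A sketch, assuming the 19 files reported by the progress bar above are the complete set; the helper `check_downloads` is hypothetical, while `jbr_run.py` and `example_video.mp4` are the two downloaded files this notebook uses later:

```python
import os

def check_downloads(folder, expected_count=19,
                    required=('jbr_run.py', 'example_video.mp4')):
    """Return a list of problems found in the download folder (empty list = looks fine)."""
    if not os.path.isdir(folder):
        return ['folder does not exist: ' + folder]
    files = os.listdir(folder)
    problems = ['missing file: ' + name for name in required if name not in files]
    if len(files) < expected_count:  # expected_count taken from the tqdm output above
        problems.append('only {} of {} files found'.format(len(files), expected_count))
    return problems

print(check_downloads('/content/downloads/') or 'all model files present')
```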






## **2. Provide Input & Select Features**
_You need to provide a Google Drive Account, which includes a folder with video files for video mining._
_____________________________________________________________


Running the next cell will create a link. **You need to click this link and enter your Google Account credentials. Copy the code and enter it in the space below.** If you just want to test the example video, you can skip this step.

In [0]:
from google.colab import drive
drive.mount('/content/drive')
print('\n successfully executed, please proceed with next step\n')

**Please provide the correct path to your folder with videos: replace 'None' in the next cell with your path (the commented-out example shows the format; you probably need to change the text after '/My Drive/').** If you just want to test the example video, leave the folder path set to 'None'.

You can verify the path by clicking the folder icon on the left side of this screen and browsing within the 'drive' folder (you may need to click 'Refresh').

In [4]:
in_folder  = None  # e.g. '/content/drive/My Drive/trailer/vids/'; folder with videos, file names are used as IDs
print('\n successfully executed, please proceed with next step\n')


 successfully executed, please proceed with next step



If you do not provide a video folder (i.e. `in_folder` is 'None'), the following cell creates a folder containing the example video for analysis.


In [5]:
if not in_folder:  # None or empty string: fall back to the example video
  os.makedirs('/content/examples/', exist_ok=True)
  shutil.copy('/content/downloads/example_video.mp4','/content/examples/example_video.mp4')
  in_folder = '/content/examples/'
print('\n successfully executed, please proceed with next step\n')


 successfully executed, please proceed with next step
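Once `in_folder` is set, a quick sanity check can confirm that the folder exists and actually contains videos before a long run starts. A minimal sketch; the helper `list_videos` and the set of extensions are assumptions, since this notebook does not state which formats `jbr_run.py` accepts:

```python
import os

VIDEO_EXTS = ('.mp4', '.avi', '.mov', '.mkv', '.webm')  # assumed set of video formats

def list_videos(folder, exts=VIDEO_EXTS):
    """Return the sorted file names in `folder` whose extension looks like a video."""
    if not os.path.isdir(folder):
        raise FileNotFoundError('in_folder not found: {}'.format(folder))
    return sorted(f for f in os.listdir(folder) if f.lower().endswith(exts))

# In the notebook: print(list_videos(in_folder))
```

Files with other extensions are ignored by this check only; it does not change which files the extraction script processes.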



You **can unselect features** you are not interested in **by writing 'False' instead of 'True'**. This may drastically increase processing speed.

In [6]:
extract_length           = True # get length of video
extract_cuts             = True # get scene cuts
extract_colors           = True # get brightness and color information
extract_faces            = True # get faces
extract_emotions         = True # get 8 different emotions per face 
extract_objects          = True # get 80 objects
extract_variance         = True # get semantic variance
extract_quality          = True # get quality
print('\n successfully executed, please proceed with next step\n')


 successfully executed, please proceed with next step



## **3. Provide Optional Input**
_**Run all cells - no additional input required; however, you may change settings to customize execution**_
_____________________________________________________________

The following folders and the logfile will be created to store the results of your analysis. **You do not have to change them, unless you want to use your custom folder** structure to store results

In [7]:
out_folder = '/content/drive/My Drive/video_analysis/preds/' # folder to store extracted features
log_name   = '2020-06-09_logfile.csv' # name of logfile, must end in .csv
log_folder = '/content/drive/My Drive/video_analysis/logs/' # folder for logfile
print('\n successfully executed, please proceed with next step\n')


 successfully executed, please proceed with next step
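The log file name above follows a `YYYY-MM-DD_logfile.csv` pattern. If you prefer not to edit the date by hand for every run, a sketch like the following builds the name from today's date. The helper `dated_log_name` is hypothetical; `jbr_run.py` only receives the final string via the `--log-name` flag assembled later in this notebook:

```python
from datetime import date

def dated_log_name(suffix='logfile', day=None):
    """Build a log file name like '2020-06-09_logfile.csv' (the .csv ending is required)."""
    day = day or date.today()
    return '{}_{}.csv'.format(day.isoformat(), suffix)

print(dated_log_name())
```

For example, `log_name = dated_log_name()` would replace the hard-coded name.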



Setting a start and end index allows you to analyze only a subset of the videos in your folder. **You do not have to change this if you want to analyze all videos in your provided folder.**

In [8]:
start_index = 0
end_index = len(os.listdir(in_folder))
print('\n successfully executed, please proceed with next step\n')


 successfully executed, please proceed with next step
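By default, `end_index` is the number of entries in `in_folder`, so every file is included. When you pick a smaller range, keep in mind that `os.listdir` returns names in arbitrary order; whether `jbr_run.py` sorts the listing is not stated here. The following sketch (hypothetical helper `select_subset`) previews which files a range covers on a sorted listing, assuming the end index is exclusive, as `len(os.listdir(in_folder))` suggests:

```python
import os

def select_subset(folder, start_index, end_index):
    """Preview which files the index range [start_index, end_index) covers.

    Assumes a sorted listing; os.listdir itself returns names in arbitrary order.
    """
    files = sorted(os.listdir(folder))
    return files[start_index:end_index]

# In the notebook: print(select_subset(in_folder, start_index, end_index))
```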



## **4. Extract Selected Features**
_**Run all cells - no additional input required**_
_____________________________________________________________________________

_This extracts your selected features by looping through the video files in your folder. Depending on the number and size of your videos, this may take anywhere from a few minutes to several hours._

In [9]:
variables = ''
for variable in ['extract_length',
                 'extract_cuts',
                 'extract_colors',
                 'extract_faces',
                 'extract_emotions',
                 'extract_variance',
                 'extract_objects',
                 'extract_quality',
                 'start_index',
                 'end_index']:
  variables += ' --' + variable.replace('_','-') + ' ' + str(eval(variable))

for variable in ['in_folder',
                 'out_folder',
                 'log_folder',
                 'log_name']:
  variables += ' --' + variable.replace('_','-') + ' "' + str(eval(variable))+'"'
print('\n successfully executed, please proceed with next step\n')


 successfully executed, please proceed with next step
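The cell above turns each setting into a command-line flag by replacing underscores with hyphens: `extract_length = True` becomes `--extract-length True`, and `in_folder` becomes `--in-folder "/content/examples/"`, quoted so that paths such as `My Drive` stay a single argument. The same string can be built from a dictionary without `eval`. A sketch with the hypothetical helper `build_cli_args`, producing the same format as the cell above:

```python
def build_cli_args(values, quoted):
    """Turn {'extract_length': True} into ' --extract-length True'.

    Entries in `quoted` are wrapped in double quotes so paths containing
    spaces (such as '/content/drive/My Drive/...') stay a single argument.
    """
    args = ''
    for name, value in values.items():
        args += ' --' + name.replace('_', '-') + ' ' + str(value)
    for name, value in quoted.items():
        args += ' --' + name.replace('_', '-') + ' "' + str(value) + '"'
    return args

example = build_cli_args({'extract_length': True, 'start_index': 0},
                         {'in_folder': '/content/examples/'})
print(example)
```

In the notebook you would pass, for instance, `{'extract_length': extract_length, ...}` as `values` and `{'in_folder': in_folder, ...}` as `quoted`.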



In [0]:
!python "/content/downloads/jbr_run.py" {variables}
print('\n successfully executed, please proceed with next step\n')

Using TensorFlow backend.
2020-06-16 10:51:47.107585: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
Visual variance requires scene cuts and will automatically extract scene cuts, setting scene_cuts to False

Analysis of 1 videos begins:

  0% 0/1 [00:00<?, ?it/s]

Analyze example_video.mp4
2020-06-16 10:55:51.547160: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-06-16 10:55:51.547632: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-16 10:55:51.548449: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:00:04.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s

In [0]:
print(f'Your feature extraction is completed. You will find your results in {out_folder} (see the file browser on the left side).'
      '\nIf you have mounted your Google Drive, the results have been stored automatically in this folder in your Drive.')
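To get a quick overview of what was written, you can list the .csv files in `out_folder` together with their number of data rows. A sketch using only the standard library; the exact file names produced by `jbr_run.py` are not documented here, so the hypothetical helper `summarize_csvs` simply looks for `*.csv` and assumes each file has one header row:

```python
import csv
import glob
import os

def summarize_csvs(folder):
    """Map each .csv file in `folder` to its number of data rows (one header row assumed)."""
    summary = {}
    for path in sorted(glob.glob(os.path.join(glob.escape(folder), '*.csv'))):
        with open(path, newline='') as f:
            rows = sum(1 for _ in csv.reader(f))
        summary[os.path.basename(path)] = max(rows - 1, 0)  # exclude the header
    return summary

# In the notebook: print(summarize_csvs(out_folder))
```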

## _**5. APPENDIX: Optional Extras**_
_The following code snippets may be used for specific purposes._
_____________________________________________________________________________

### **5.1 Analyzing videos from Youtube or similar sources**
Instead of a folder with videos, this code only requires a list of URLs. **Please provide this list in the following cell.**

In [0]:
urls = ['https://www.youtube.com/watch?v=TcMBFSGVi1c',
        'https://www.youtube.com/watch?v=VyHV0BRtdxo'] #replace with your list of URLs
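Further below, each video is saved under the last 11 characters of its URL (`name = video[-11:]`), which matches the video ID only for plain `watch?v=` links. URLs with extra parameters (e.g. `&t=30s`) or short `youtu.be` links would produce a wrong name. A more robust sketch using `urllib.parse`; the helper `youtube_id` is hypothetical and only covers these two URL shapes:

```python
from urllib.parse import urlparse, parse_qs

def youtube_id(url):
    """Return the video ID of a youtube.com/watch?v=... or youtu.be/... URL, else None."""
    parsed = urlparse(url)
    host = parsed.hostname or ''
    if host in ('youtu.be', 'www.youtu.be'):
        return parsed.path.lstrip('/') or None  # short link: ID is the path
    if host == 'youtube.com' or host.endswith('.youtube.com'):
        ids = parse_qs(parsed.query).get('v')  # long link: ID is the 'v' parameter
        return ids[0] if ids else None
    return None
```

In the download loop, `name = youtube_id(video) or video[-11:]` would fall back to the original behavior for other URLs.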

You **can unselect features** you are not interested in **by writing 'False' instead of 'True'**. This may drastically increase processing speed.



In [0]:
extract_length           = True # get length of video
extract_cuts             = True # get scene cuts
extract_colors           = True # get brightness and color information
extract_faces            = True # get faces
extract_emotions         = True # get 8 different emotions per face 
extract_objects          = True # get 80 objects
extract_variance         = True # get semantic variance
extract_quality          = True # get quality

The following folders and the logfile will be created to store the results of your analysis. You do not have to change them unless you want to use your own folder structure to store results.

In [0]:
out_folder = '/content/drive/My Drive/video_analysis/preds/' # folder to store extracted features
log_name   = '2020-06-09_logfile_downloads.csv' # name of logfile, must end in .csv
log_folder = '/content/drive/My Drive/video_analysis/logs/' # folder for logfile

**Now you can run all cells to analyze the videos in the URLs provided.** The code will temporarily create a copy of the video in your Colab work environment.

In [0]:
import os
import pandas as pd
!pip install -q youtube-dl

In [0]:
os.makedirs('/content/download_vids/', exist_ok=True)  # temporary storage for downloaded videos

In [0]:
in_folder = '/content/download_vids/'
start_index = 0
device = 0

In [0]:
for video in urls:
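  name = video[-11:]  # YouTube video ID, assuming plain 'watch?v=<ID>' URLs
  print(name, video)

  if name + '.mp4' not in os.listdir(in_folder):  # skip videos downloaded in an earlier run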
    # !youtube-dl -f 'bestvideo[ext=mp4]' --output $in_folder$name".%(ext)s" $video
    !youtube-dl --output $in_folder$name".%(ext)s" $video
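The skip check above only looks for `<name>.mp4`. Without a format selection, youtube-dl may save a video with another container extension (e.g. `.mkv` or `.webm`), in which case the check never matches and the video is downloaded again on every run. A sketch of a check that accepts any extension; the helper `already_downloaded` is hypothetical, and it ignores youtube-dl's `.part`/`.ytdl` leftovers from interrupted downloads:

```python
import glob
import os

def already_downloaded(folder, name):
    """True if `folder` already holds a finished file called `name` with any extension."""
    pattern = os.path.join(glob.escape(folder), glob.escape(name) + '.*')
    # skip partial or temporary files left by an interrupted download
    return any(not m.endswith(('.part', '.ytdl')) for m in glob.glob(pattern))

# In the download loop: if not already_downloaded(in_folder, name): ...
```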


In [0]:
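end_index = len(os.listdir(in_folder))  # set after downloading, so all downloaded videos are analyzed
variables = ''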
for variable in ['extract_length',
                'extract_cuts',
                'extract_colors',
                'extract_faces',
                'extract_emotions',
                'extract_variance',
                'extract_objects',
                'extract_quality',
                'start_index',
                'end_index',
                'device']:
  variables += ' --' + variable.replace('_','-') + ' ' + str(eval(variable))

for variable in ['in_folder',
                'out_folder',
                'log_folder',
                'log_name']:
  variables += ' --' + variable.replace('_','-') + ' "' + str(eval(variable))+'"'

!python "/content/downloads/jbr_run.py" {variables}
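Once the analysis has finished and the results are in `out_folder`, the temporary copies in `/content/download_vids/` are no longer needed. They disappear anyway when the Colab runtime is reset, but you can free the disk space right away. A sketch with the hypothetical helper `clear_folder`, which deletes only the regular files directly inside the folder and keeps the folder itself (do not point it at a folder in your Drive):

```python
import os

def clear_folder(folder):
    """Delete all regular files directly inside `folder`; return how many were removed."""
    removed = 0
    for entry in os.listdir(folder):
        path = os.path.join(folder, entry)
        if os.path.isfile(path):  # subfolders are left untouched
            os.remove(path)
            removed += 1
    return removed

# In the notebook: clear_folder('/content/download_vids/')
```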


### **5.2 Extracting Text from Videos or Images**
**TBD.**