# Video Processing Example

This example shows how to use the `interactionvideo` package to process a video for studies of human interaction. Please also refer to our research paper: Hu and Ma (2020), "Persuading Investors: A Video-Based Study", available at https://songma.github.io/files/hm_video.pdf.

## Overview

The video processing pipeline involves the following steps:
1. Set up folders and check dependencies (requirements)
2. Extract images and audio from a video using `pliers`
3. Extract text from the audio using the Google Speech-to-Text API
4. Process images (faces) using the Face++ API
5. Process text using the Loughran and McDonald (2011) Finance Dictionary and the Nicolas, Bai, and Fiske (2019) Social Psychology Dictionary
6. Process audio using pre-trained ML models in `pyAudioAnalysis` and `speechemotionrecognition`
7. Aggregate information from the 3V (visual, vocal, and verbal) to the video level

## Structure

```bash
├── interactionvideo
│   ├── __pycache__
│   ├── prepare.py
│   ├── decompose.py
│   ├── faceppml.py
│   ├── googleml.py
│   ├── textualanalysis.py
│   ├── audioml.py
│   ├── aggregate.py
│   └── utils.py
├── data
│   ├── example_video.mp4
│   └── VideoDictionary.csv
├── mlmodel
│   ├── pyAudioAnalysis
│   └── speechemotionrecognition
├── output
│   ├── audio_temp
│   ├── image_temp
│   └── result_temp
├── PythonSDK
├── example.py
├── Video Processing Example.ipynb
├── README.md
└── requirement.txt
```

## Dependencies
 - pandas
 - tqdm
 - codecs (part of the Python standard library)
 - pliers
 - pydub
 - PIL (installed as `Pillow`)
 - google-cloud-speech
 - google-cloud-storage
 - speechemotionrecognition
 - pyAudioAnalysis

## 1. Set up folders and check dependencies (requirements)

```python
from os.path import join

# Set your root path here
RootPath = r''
# Set your video file path here
VideoFilePath = join(RootPath, 'data', 'example_video.mp4')
# Set your work path here
# The work path is where meta files and output files are stored
WorkPath = join(RootPath, 'output')
```

```python
from interactionvideo.prepare import setup_folder, check_requirements

# Set up the folders under WorkPath
setup_folder(WorkPath)

# Check the requirements for interactionvideo
check_requirements()
```
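A dependency check of this kind can be done with `importlib.util.find_spec` from the standard library, which locates a module without fully importing it. The sketch below is a minimal illustration, not necessarily how `check_requirements` is implemented; the helper `find_missing_modules` and the import-name mapping (for example `PIL` for Pillow, `google.cloud.speech` for google-cloud-speech) are assumptions based on the Dependencies list.

```python
import importlib.util

# Import names for the packages in the Dependencies list (assumed mapping;
# e.g. Pillow is imported as PIL, google-cloud-speech as google.cloud.speech)
REQUIRED_MODULES = [
    "pandas", "tqdm", "codecs", "pliers", "pydub", "PIL",
    "google.cloud.speech", "google.cloud.storage",
    "speechemotionrecognition", "pyAudioAnalysis",
]


def find_missing_modules(module_names):
    """Return the subset of module_names that cannot be found (hypothetical helper)."""
    missing = []
    for name in module_names:
        try:
            # find_spec returns None when a module is not installed; for a dotted
            # name it imports the parent package first, which raises
            # ModuleNotFoundError if the parent itself is missing
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    print("Missing modules:", missing if missing else "none")
```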

## 2. Extract images and audio from the video

```python
from interactionvideo.decompose import convert_video_to_images

# Decompose the video into a stream of images
# The default sampling rate is 10 frames per second
# Find the output at WorkPath\image_temp
convert_video_to_images(VideoFilePath, WorkPath)
```
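With a default of 10 frames per second, a source video recorded at a higher frame rate keeps only a subset of its frames. A minimal sketch of one way to choose them, assuming frames are picked at evenly spaced timestamps; the helper `sampled_frame_indices` is hypothetical and not part of `interactionvideo` or `pliers`.

```python
def sampled_frame_indices(native_fps, duration_s, target_fps=10.0):
    """Indices of source frames to keep when downsampling to target_fps.

    Hypothetical helper: for each target timestamp t = k / target_fps it keeps
    the source frame that covers t, i.e. floor(t * native_fps).
    """
    if native_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    total_frames = int(duration_s * native_fps)
    n_samples = int(duration_s * target_fps)
    indices = []
    for k in range(n_samples):
        # k * native_fps / target_fps == t * native_fps; the tiny epsilon guards
        # against float error landing just below an integer
        idx = int(k * native_fps / target_fps + 1e-9)
        if idx < total_frames:
            indices.append(idx)
    return indices


# A 2-second clip recorded at 30 fps yields 20 sampled frames: 0, 3, 6, ..., 57
print(sampled_frame_indices(30, 2.0))
```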

```python
from interactionvideo.decompose import convert_video_to_audios

# Decompose the video into audio files
# Find the output at WorkPath\audio_temp
convert_video_to_audios(VideoFilePath, WorkPath)
```
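Splitting a long audio track into shorter clips keeps each clip small enough for downstream models. A minimal sketch of computing fixed-length clip boundaries in milliseconds, the unit `pydub` uses when slicing an `AudioSegment`; the helper `clip_boundaries` and the idea of fixed-length clips are assumptions, not the documented behavior of `convert_video_to_audios`.

```python
def clip_boundaries(duration_ms, clip_ms):
    """Return (start_ms, end_ms) pairs covering [0, duration_ms) in fixed-size clips.

    Hypothetical helper; the last clip is shorter when duration_ms is not a
    multiple of clip_ms.
    """
    if clip_ms <= 0:
        raise ValueError("clip_ms must be positive")
    return [(start, min(start + clip_ms, duration_ms))
            for start in range(0, duration_ms, clip_ms)]


# With pydub (not run here), each pair could be used to slice and export a clip;
# len(audio) is the duration in milliseconds:
#   from pydub import AudioSegment
#   audio = AudioSegment.from_file("audio.wav")
#   for i, (s, e) in enumerate(clip_boundaries(len(audio), 10_000)):
#       audio[s:e].export(f"clip_{i}.wav", format="wav")
print(clip_boundaries(25_000, 10_000))
```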

## 3. Extract text from audio using the Google Speech-to-Text API

Set up your Google Cloud environment by following these guides:

 - https://cloud.google.com/python
 - https://cloud.google.com/storage/docs/quickstart-console
 - https://cloud.google.com/speech-to-text

Then create a Google Cloud Storage bucket to hold the audio files.

```python
from interactionvideo.googleml import upload_audio_to_googlecloud

# Set your Google Cloud Storage bucket name here
GoogleBucketName = ''

# Upload the audio files to Google Cloud Storage
upload_audio_to_googlecloud(WorkPath, GoogleBucketName)
```
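For reference, an upload with the `google-cloud-storage` client uses `storage.Client`, `Client.bucket`, `Bucket.blob`, and `Blob.upload_from_filename`, and the uploaded object can then be addressed by a `gs://` URI. A minimal sketch, assuming credentials are already configured as in the guides above; the helpers `gcs_uri` and `upload_to_bucket` are hypothetical and not necessarily what `upload_audio_to_googlecloud` does.

```python
import os


def gcs_uri(bucket_name, blob_name):
    """Build the gs:// URI for an object in Google Cloud Storage."""
    return f"gs://{bucket_name}/{blob_name}"


def upload_to_bucket(local_path, bucket_name, blob_name=None):
    """Upload local_path to bucket_name and return its gs:// URI (hypothetical helper)."""
    # Imported lazily so gcs_uri works even without google-cloud-storage installed
    from google.cloud import storage

    blob_name = blob_name or os.path.basename(local_path)
    client = storage.Client()  # uses Application Default Credentials
    bucket = client.bucket(bucket_name)
    bucket.blob(blob_name).upload_from_filename(local_path)
    return gcs_uri(bucket_name, blob_name)
```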

```python
from interactionvideo.googleml import convert_audio_to_text_by_google

# Use the Google Speech-to-Text API to convert audio to text
# Returns the full speech script and a panel of text and punctuation,
# also saved as a txt file and a csv file
# Find the output at:
# - WorkPath\result_temp\script_google.txt (full speech script)
# - WorkPath\result_temp\text_panel_google.csv (text panel from Google)
google_result_text, google_result_df = convert_audio_to_text_by_google(WorkPath, GoogleBucketName)
```
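Speech-to-Text's synchronous `recognize` call is limited to short audio (about one minute), so longer recordings are usually transcribed with `long_running_recognize` on a `gs://` URI, which is one reason to upload the audio to a bucket first. A minimal sketch with the `google-cloud-speech` 2.x client that also flattens word-level timestamps into rows similar to a text panel. The encoding, sample rate, language code, and the helpers `words_to_rows` and `transcribe_gcs` are assumptions; `convert_audio_to_text_by_google` may work differently.

```python
def words_to_rows(words):
    """Flatten Speech-to-Text word results into (word, start_s, end_s) rows.

    Each item needs .word, .start_time and .end_time, where the times are
    datetime.timedelta objects as returned by google-cloud-speech 2.x.
    """
    return [(w.word, w.start_time.total_seconds(), w.end_time.total_seconds())
            for w in words]


def transcribe_gcs(uri, sample_rate_hertz=16000, language_code="en-US"):
    """Transcribe a gs:// audio file; returns (script, rows). Hypothetical helper."""
    # Imported lazily so words_to_rows can be used without the client installed
    from google.cloud import speech

    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri=uri)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,  # assumed WAV/PCM input
        sample_rate_hertz=sample_rate_hertz,
        language_code=language_code,
        enable_word_time_offsets=True,      # per-word start/end times
        enable_automatic_punctuation=True,  # punctuation in the transcript
    )
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=900)

    script_parts, rows = [], []
    for result in response.results:
        best = result.alternatives[0]  # highest-confidence alternative
        script_parts.append(best.transcript.strip())
        rows.extend(words_to_rows(best.words))
    return " ".join(script_parts), rows
```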

```python
# Check the full speech script from Google
print(google_result_text)
```

```python
# Check the text panel from Google
google_result_df.head(10)
```

## 4. Process images (faces) using the Face++ API

Get your API key and secret from https://www.faceplusplus.com.

If you registered at https://console.faceplusplus.com/register, use https://api-us.faceplusplus.com as the server.

If you registered at https://console.faceplusplus.com.cn/register, use https://api-cn.faceplusplus.com as the server.

The Face++ Python SDK is included in this package (the `PythonSDK` folder). You can also download it from https://github.com/FacePlusPlus/facepp-python-sdk.
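A single call to the Face++ Detect API (`/facepp/v3/detect`) sends the key, secret, an image, and a comma-separated `return_attributes` list. A minimal sketch using the `requests` library and asking for gender, age, and emotion; the helpers `detect_request` and `detect_faces` are hypothetical, and the packaged `PythonSDK` may wrap the call differently.

```python
def detect_request(server, key, secret, attributes=("gender", "age", "emotion")):
    """Build the URL and form fields for a Face++ Detect call (hypothetical helper)."""
    url = server.rstrip("/") + "/facepp/v3/detect"
    data = {
        "api_key": key,
        "api_secret": secret,
        "return_attributes": ",".join(attributes),  # comma-separated list
    }
    return url, data


def detect_faces(image_path, server, key, secret):
    """POST one image file to Face++ and return the parsed JSON response."""
    import requests  # imported lazily; only needed when actually calling the API

    url, data = detect_request(server, key, secret)
    with open(image_path, "rb") as f:
        resp = requests.post(url, data=data, files={"image_file": f}, timeout=30)
    resp.raise_for_status()
    return resp.json()
```

The server argument is whichever regional endpoint matches your registration (`https://api-us.faceplusplus.com` or `https://api-cn.faceplusplus.com`).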

```python
from interactionvideo.faceppml import process_image_by_facepp

# Use the Face++ ML API to process images
# Returns csv files of facial emotion, gender, and predicted age
# Find the output at:
# - WorkPath\result_temp\face_panel_facepp.csv (full returns from Face++)
# - WorkPath\result_temp\face_panel.csv (clean results)

# Set your key, secret, and server here
FaceppKey = ''
FaceppSecret = ''
FaceppServer = 'https://api-us.faceplusplus.com'

facepp_result_df, facepp_result_clean_df = process_image_by_facepp(
    VideoFilePath, WorkPath, FaceppKey, FaceppSecret, FaceppServer)
```
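The Detect response lists one entry per face under `faces`, with the requested attributes nested under `attributes` (for example `gender.value`, `age.value`, and an `emotion` dict of scores for anger, disgust, fear, happiness, neutral, sadness, and surprise). A minimal sketch of flattening one response into one row per face, in the spirit of the clean `face_panel.csv`; the function name `flatten_faces` and the output column names are assumptions.

```python
EMOTIONS = ["anger", "disgust", "fear", "happiness", "neutral", "sadness", "surprise"]


def flatten_faces(response, frame_id):
    """Turn one Face++ Detect JSON response into a list of flat row dicts."""
    rows = []
    for face in response.get("faces", []):
        attrs = face.get("attributes", {})
        row = {
            "frame_id": frame_id,
            "face_token": face.get("face_token"),
            "gender": attrs.get("gender", {}).get("value"),
            "age": attrs.get("age", {}).get("value"),
        }
        emotion = attrs.get("emotion", {})
        for name in EMOTIONS:
            row[f"emotion_{name}"] = emotion.get(name)  # None if the score is absent
        rows.append(row)
    return rows
```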

```python
# Check the full returns from Face++
facepp_result_df.head(10)
```

```python
# Check the clean results
facepp_result_clean_df.head(10)
```

## 5. Process text using LM and NBF Dictionaries

Use the Loughran-McDonald (2011) Finance Dictionary (LM) to construct verbal positive and verbal negative.

For more details, please see https://sraf.nd.edu/textual-analysis/resources.

Use the Nicolas, Bai, and Fiske (2019) Social Psychology Dictionary (NBF) to construct verbal warmth and verbal ability.

For more details, please see https://psyarxiv.com/afm8k.
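Dictionary-based measures count how many words in a text fall into each word list. A minimal sketch, assuming a measure defined as the share of tokens that belong to a category; the tiny word lists below are made up for illustration and are not entries from the LM or NBF dictionaries, and the tokenization and normalization used by `process_text_by_dict` may differ.

```python
import re


def dictionary_scores(text, dictionary):
    """Share of tokens in `text` that appear in each category's word set."""
    tokens = re.findall(r"[a-z']+", text.lower())  # simple lowercase word tokenizer
    if not tokens:
        return {category: 0.0 for category in dictionary}
    return {
        category: sum(token in words for token in tokens) / len(tokens)
        for category, words in dictionary.items()
    }


# Illustrative word lists only (hypothetical, not the real dictionaries)
toy_dictionary = {
    "positive": {"strong", "growth"},
    "negative": {"loss", "risk"},
    "warmth": {"friendly", "trust"},
}
# 10 tokens: 2 positive, 1 negative, 1 warmth
print(dictionary_scores("Strong growth, low risk, and a team you can trust.", toy_dictionary))
```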

```python
from interactionvideo.textualanalysis import process_text_by_dict

# Set the LM-NBF dictionary path
DictionaryPath = join(RootPath, 'data', 'VideoDictionary.csv')

# Dictionary-based textual analysis to get verbal measures
# (verbal positive, negative, warmth, and ability)
# Returns a panel of these measures, also saved as a csv file
# Find the output at:
# - WorkPath\result_temp\text_panel.csv
text_result_df = process_text_by_dict(WorkPath, DictionaryPath)
```

```python
# Check the text panel from the dictionaries
text_result_df.head(10)
```

## 6. Process audio using pre-trained ML models

Construct vocal arousal and vocal valence from the pre-trained SVM models in `pyAudioAnalysis`.

The pre-trained models are located at `mlmodel\pyAudioAnalysis`:
- svmSpeechEmotion_arousal
- svmSpeechEmotion_arousalMEANS
- svmSpeechEmotion_valence
- svmSpeechEmotion_valenceMEANS

For more details, please see https://github.com/tyiannak/pyAudioAnalysis/wiki/4.-Classification-and-Regression.

Construct vocal positive and vocal negative from the pre-trained LSTM model in `speechemotionrecognition`.

The pre-trained model is located at `mlmodel\speechemotionrecognition`:
- best_model_LSTM_39.h5

For more details, please see https://github.com/harry-7/speech-emotion-recognition.

Note: `speechemotionrecognition` requires TensorFlow and Keras.

```python
from interactionvideo.audioml import process_audio_by_pyAudioAnalysis

# Set the model path
pyAudioAnalysisModelPath = join(RootPath, 'mlmodel', 'pyAudioAnalysis')

# Construct vocal arousal and vocal valence
# Find the output at:
# - WorkPath\result_temp\audio_panel_pyAudioAnalysis.csv
audio_result_df1 = process_audio_by_pyAudioAnalysis(WorkPath, pyAudioAnalysisModelPath)
```

```python
# Check the audio panel from pyAudioAnalysis
audio_result_df1.head()
```

```python
from interactionvideo.audioml import process_audio_by_speechemotionrecognition

# Set the model path
speechemotionrecognitionModelPath = join(RootPath, 'mlmodel', 'speechemotionrecognition')

# Construct vocal positive and vocal negative
# Find the output at:
# - WorkPath\result_temp\audio_panel_speechemotionrecognition.csv
audio_result_df2 = process_audio_by_speechemotionrecognition(WorkPath, speechemotionrecognitionModelPath)
```

```python
# Check the audio panel from speechemotionrecognition
audio_result_df2.head()
```

## 7. Aggregate information from the 3V to the video level

```python
from interactionvideo.aggregate import aggregate_3v_to_video

# Aggregate the 3V information
# Find the output at:
# - WorkPath\result_temp\video_panel.csv
video_result_df = aggregate_3v_to_video(WorkPath)
```
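Aggregating to the video level can be thought of as collapsing each time-indexed panel (faces, text, audio) into per-video summary statistics and joining the results. A minimal pandas sketch, assuming each panel has a `video_id` column and numeric measure columns, and using the mean as the summary; the function name `aggregate_panels`, the column names, and the choice of statistic are assumptions, and `aggregate_3v_to_video` may use different ones.

```python
import pandas as pd


def aggregate_panels(panels, key="video_id"):
    """Average each panel's numeric columns by `key` and join the results.

    `panels` maps a prefix (e.g. "visual", "verbal", "vocal") to a DataFrame.
    """
    summaries = []
    for prefix, df in panels.items():
        means = df.groupby(key).mean(numeric_only=True)  # one row per video
        summaries.append(means.add_prefix(f"{prefix}_"))
    # Outer join keeps videos that are missing from one of the panels
    return pd.concat(summaries, axis=1, join="outer")
```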

```python
# Check the video panel
video_result_df.T
```