# Example: Obtaining MOSI through the CMU-Multimodal Data SDK

## Usage

Please refer to https://github.com/A2Zadeh/CMU-MultimodalDataSDK for more details.

### 1 Installation
In bash:

git clone git@github.com:A2Zadeh/CMU-MultimodalDataSDK.git

export PYTHONPATH="/path/to/cloned/directory/CMU-MultimodalDataSDK:$PYTHONPATH"


In [1]:
import sys
sys.path.append("./CMU-MultimodalDataSDK/")
from mmdata import Dataloader, Dataset

### 2 Merging and Accessing Datasets

In [2]:
mosi = Dataloader('http://sorena.multicomp.cs.cmu.edu/downloads/MOSI') # feed in the URL for the dataset. 
mosi_visual = mosi.facet()
mosi_text = mosi.embeddings()
mosi_audio = mosi.covarep()
mosi_all = Dataset.merge(mosi_visual, mosi_text)
mosi_all = Dataset.merge(mosi_all, mosi_audio)

Let's see which modalities the merged dataset contains:

In [3]:
print mosi_all.keys()

['facet', 'covarep', 'embeddings']
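
The indexing used in the rest of this walkthrough suggests that the merged dataset behaves like a nested mapping: modality, then video id, then segment id, then a list of features. The sketch below imitates that layout with plain dictionaries. `toy_visual`, `toy_text`, and `merge_modalities` are hypothetical stand-ins rather than SDK names, and the real `Dataset.merge` may work differently.

```python
# Toy stand-ins for per-modality datasets (hypothetical data, not real MOSI features).
# Assumed layout: {modality: {video_id: {segment_id: [(start, end, values), ...]}}}
toy_visual = {'facet': {'vid_a': {'1': [(0.0, 0.5, [0.1, 0.2])]}}}
toy_text = {'embeddings': {'vid_a': {'1': [(0.0, 0.5, [0.3])]}}}


def merge_modalities(first, second):
    """Return a new mapping that holds the modalities of both inputs."""
    merged = dict(first)   # shallow copy, so the inputs are left untouched
    merged.update(second)  # add the modalities of the second dataset
    return merged


toy_all = merge_modalities(toy_visual, toy_text)
print(sorted(toy_all.keys()))
```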


### 3 Loading Train/Validation/Test Splits



In [4]:
train_ids = mosi.train()
#valid_ids = mosi.valid()
#test_ids = mosi.test()
vid = list(train_ids)[0]  
print vid # print the first video id in training split

2iD-tVS8NPw
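
Because each split is a collection of video ids, selecting the data for a split amounts to filtering on those ids. A minimal sketch with toy data follows. `toy_features`, `toy_train_ids`, and `segments_in_split` are hypothetical names, and the nested-dictionary layout is an assumption based on the indexing shown in this walkthrough.

```python
# Hypothetical features for one modality: {video_id: {segment_id: [features]}}
toy_features = {
    'vid_a': {'1': ['f0', 'f1'], '2': ['f2']},
    'vid_b': {'1': ['f3']},
    'vid_c': {'1': ['f4']},
}
toy_train_ids = {'vid_a', 'vid_c'}  # stand-in for the ids returned for a split


def segments_in_split(features, split_ids):
    """List (video_id, segment_id) pairs whose video belongs to the split."""
    pairs = []
    for vid in sorted(split_ids):
        # Ids without any features are skipped instead of raising KeyError.
        # Segment ids are strings, so they are sorted lexicographically.
        for seg_id in sorted(features.get(vid, {})):
            pairs.append((vid, seg_id))
    return pairs


train_segments = segments_in_split(toy_features, toy_train_ids)
print(train_segments)
```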


### 4 Access Segments and Features

In [5]:
segment_data = mosi_all['facet'][vid]['3'] # access the facet data for segment '3' of the first training video

Check how many features a segment contains. Note that the number of features may differ between modalities, because each modality is sampled at a different rate.

In [6]:
print len(mosi_all['facet'][vid]['3']) # number of visual features (30 features per second)
print len(mosi_all['embeddings'][vid]['3']) # number of text features (1 feature per word)
print len(mosi_all['covarep'][vid]['3']) # number of audio features (100 features per second)

68
9
229
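
The counts differ because each modality produces features at its own rate over the same stretch of time. One way to check this, sketched below on toy data, is to divide the number of features by the time span they cover. `features_per_second` and `toy_segment` are hypothetical, and the sketch assumes each feature carries a start and end time as its first two fields.

```python
def features_per_second(segment):
    """Estimate the feature rate from the first start time and the last end time."""
    start = segment[0][0]
    end = segment[-1][1]
    return len(segment) / float(end - start)


# Hypothetical segment: 6 features, each covering 0.5 seconds.
toy_segment = [(i * 0.5, (i + 1) * 0.5, [0.0]) for i in range(6)]
rate = features_per_second(toy_segment)
print(rate)
```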


Each feature is a tuple of the form `(start_time, end_time, numpy.array([...]))`: the time interval the feature covers, followed by the feature vector.

In [7]:
print mosi_all['facet'][vid]['3'][0] # print the first visual feature

(0.027177550000001105, 0.06051085000000111, array([ 4.91000e+02,  1.87000e+02,  2.79000e+02,  2.79000e+02,
       -2.45659e+00, -1.37122e+00, -1.71153e+00, -5.12618e-01,
        5.34201e-01, -1.32983e+00, -1.14383e+00,  5.05123e-01,
       -8.44108e-01, -5.20902e-01, -2.22139e+00, -1.70561e+00,
        1.12229e-04,  8.84197e-03,  3.62235e-03,  8.39275e-02,
        6.54749e-01,  3.75348e-02,  6.69964e-02,  3.62424e-01,
        1.33997e-02,  5.98657e-02,  1.96531e-10,  4.67588e-07,
        1.43613e+00,  9.61095e-01,  1.94466e-01, -5.75646e-01,
       -8.92864e-01,  7.80307e-01, -2.38334e+00, -1.24573e-01,
       -4.10652e-01, -9.14627e-01, -1.10465e+00, -9.49969e-01,
       -3.46717e-01,  5.19613e-01, -2.13318e-01, -1.92194e+00,
        2.29312e-01,  5.81445e-01], dtype=float32))
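
For modelling, it is often convenient to separate the time intervals from the feature vectors. The sketch below does this for a toy segment in the tuple format shown above. `split_segment` and `toy_segment` are hypothetical, and plain lists stand in for the NumPy arrays.

```python
def split_segment(segment):
    """Separate a segment into (start, end) intervals and a list of feature vectors."""
    intervals = [(start, end) for start, end, _ in segment]
    vectors = [list(values) for _, _, values in segment]
    return intervals, vectors


# Hypothetical segment with two features of dimension 3.
toy_segment = [
    (0.00, 0.03, [1.0, 2.0, 3.0]),
    (0.03, 0.06, [4.0, 5.0, 6.0]),
]
intervals, vectors = split_segment(toy_segment)
print((len(vectors), len(vectors[0])))  # (number of features, feature dimension)
```

With real SDK output, passing the vectors to `numpy.stack` should give a 2-D array with one row per feature, assuming all vectors in the segment have the same length.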


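### 5 Feature Alignment between Modalities

Align the features of all modalities to the time intervals of one reference modality, so that every modality has the same number of features per segment.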

In [8]:
#aligned_text = mosi_all.align('embeddings') # aligning features according to the textual features.
#aligned_audio = mosi_all.align('covarep') # aligning features according to the audio features.
aligned_visual = mosi_all.align('facet') # aligning features according to the visual features.

# check that the features are aligned
print len(aligned_visual['embeddings'][vid]['3']) == len(aligned_visual['facet'][vid]['3']) 

True
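
To illustrate what alignment to a reference modality can mean, the sketch below averages, for each reference interval, the vectors of another modality that overlap it in time. This is one plausible strategy written for illustration only; the SDK's `align` may use a different rule. `overlap`, `align_to_reference`, and the toy data are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals (0 if they are disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def align_to_reference(reference, other):
    """For each reference interval, average the `other` vectors that overlap it."""
    aligned = []
    for ref_start, ref_end, _ in reference:
        hits = [vec for start, end, vec in other
                if overlap(ref_start, ref_end, start, end) > 0]
        if hits:
            dim = len(hits[0])
            mean = [sum(vec[i] for vec in hits) / float(len(hits))
                    for i in range(dim)]
        else:
            mean = []  # nothing overlaps this reference interval
        aligned.append((ref_start, ref_end, mean))
    return aligned


# Two "words" of one second each, and audio features every half second.
toy_words = [(0.0, 1.0, [0.0]), (1.0, 2.0, [0.0])]
toy_audio = [(0.0, 0.5, [1.0]), (0.5, 1.0, [3.0]),
             (1.0, 1.5, [5.0]), (1.5, 2.0, [7.0])]
aligned_audio = align_to_reference(toy_words, toy_audio)
print(len(aligned_audio) == len(toy_words))
```

After this step the audio modality has exactly one vector per word, which mirrors the length check performed on the SDK output above.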


### 6 Loading Labels

In [9]:
labels = mosi.sentiments()
print labels[vid]['3'] # print the labels for the segment

1.4
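
MOSI sentiment labels are real-valued scores in the range [-3, 3]. For binary classification, one possible convention is to treat scores at or above a threshold as positive. The sketch below shows this on toy labels. The helper `to_binary`, the default threshold of 0, and `toy_labels` are assumptions made for illustration, not part of the SDK.

```python
def to_binary(score, threshold=0.0):
    """Map a real-valued sentiment score to 1 (positive) or 0 (negative)."""
    return 1 if score >= threshold else 0


# Hypothetical labels for three segments of one video: {segment_id: score}
toy_labels = {'1': 1.4, '2': -0.6, '3': 0.0}
binary = {seg: to_binary(score) for seg, score in toy_labels.items()}
print(sorted(binary.items()))
```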


### 7 Tutorials

Install Keras and at least one backend (TensorFlow or Theano). Then experiment with `early_fusion_lstm.py` in `CMU-MultimodalDataSDK/examples`.
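
Early fusion generally means concatenating the per-time-step features of all modalities before they reach the model. As a rough sketch of how aligned segments could be prepared for a sequence model, the code below fuses two toy modalities and pads the result to a fixed length. `early_fuse`, `pad_sequence`, and the toy data are hypothetical and are not taken from `early_fusion_lstm.py`.

```python
def early_fuse(*modalities):
    """Concatenate time-aligned vectors from several modalities, step by step."""
    fused = []
    for step in zip(*modalities):
        row = []
        for vec in step:
            row.extend(vec)
        fused.append(row)
    return fused


def pad_sequence(sequence, max_len, dim):
    """Truncate or zero-pad a sequence of vectors to exactly max_len steps."""
    padded = [list(vec) for vec in sequence[:max_len]]
    while len(padded) < max_len:
        padded.append([0.0] * dim)
    return padded


# Two aligned time steps: 2-dimensional text vectors and 1-dimensional audio vectors.
toy_text = [[0.1, 0.2], [0.3, 0.4]]
toy_audio = [[1.0], [2.0]]
fused = early_fuse(toy_text, toy_audio)
padded = pad_sequence(fused, max_len=3, dim=3)
print(padded)
```

Padding to a common length lets segments with different numbers of time steps be stacked into one batch.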