<font style="font-size:25px;text-align:center;">**Machine Learning Assignment on Video Memorability Prediction**</font>

This Python notebook is divided into two main sections:
<ol type="I">
    <li>Final ML Model - The best model, which I chose from my explorations in predicting memorability.</li>
    <li>Exploration - The explorations I did in predicting memorability.
        <ol type="1">
            <li>Video feature: HMP features</li>
            <li>Video features: C3D features</li>
            <li>Semantic Feature : Captions
                <ol type="A">
                    <li>Using TfidfVectorizer</li>
                    <li>Using CountVectorizer</li>
                </ol>
            </li>
            <li>Semantic Feature : Captions with weights
                <ol type="A">
                    <li>Using weighted scores for positive words</li>
                </ol>
            </li>
            <li>Video & Semantic feature: C3D & Captions features</li>
        </ol>
    </li> 
</ol>

**NOTE: For each of the above features, three models were run:**

<ul>
    <li><b>Linear Regression Model</b></li>
    <li><b>Decision Tree Regression Model</b></li>
    <li><b>Random Forest Regression Model</b></li>
</ul>

My best working model is **Semantic Feature : Captions with weights - Using weighted scores for positive words.**  

I'll be using that as my Final ML Model.

<font style="font-size:21px;text-align:center;">**I. Final ML Model**</font>

<font style="font-size:18px;text-align:center;">**Semantic Feature : Captions with weights**</font>

<font style="font-size:16px;text-align:center;">**Using weighted scores for positive words**</font>

**According to the paper by Rohit Gupta et al., "Linear Models for Video Memorability Prediction Using Visual and Semantic Features", certain caption words affect memorability more than others. I give such words extra weight and pass the summed weight to the model as an additional feature.**

Importing the required libraries

In [375]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import glob
import os

Importing the Ground truth Dataset for train set

In [376]:
train_ground_truth = pd.read_csv('dev-set_ground-truth.csv')

In [377]:
train_ground_truth.head()

Unnamed: 0,video,short-term_memorability,nb_short-term_annotations,long-term_memorability,nb_long-term_annotations
0,video3.webm,0.924,34,0.846,13
1,video4.webm,0.923,33,0.667,12
2,video6.webm,0.863,33,0.7,10
3,video8.webm,0.922,33,0.818,11
4,video10.webm,0.95,34,0.9,10


Dropping the annotations columns as these features/columns don't really contribute to memorability scores

In [378]:
train_ground_truth = train_ground_truth.drop(['nb_short-term_annotations', 'nb_long-term_annotations'], axis=1)

In [379]:
train_ground_truth.head()

Unnamed: 0,video,short-term_memorability,long-term_memorability
0,video3.webm,0.924,0.846
1,video4.webm,0.923,0.667
2,video6.webm,0.863,0.7
3,video8.webm,0.922,0.818
4,video10.webm,0.95,0.9


Importing the Ground truth Dataset for test set

In [380]:
test_ground_truth = pd.read_csv('testset/ground_truth_template.csv')

In [381]:
test_ground_truth.head()

Unnamed: 0,video,short-term_memorability,nb_short-term_annotations,long-term_memorability,nb_long-term_annotations
0,7494,,33,,12
1,7495,,34,,10
2,7496,,32,,13
3,7497,,33,,10
4,7498,,33,,10


Importing the Captions for train set

In [382]:
train_captions = pd.read_csv('dev-set_video-captions.txt',delimiter='\t',header= None,names=('video','Captions'))

In [383]:
train_captions.head()

Unnamed: 0,video,Captions
0,video3.webm,blonde-woman-is-massaged-tilt-down
1,video4.webm,roulette-table-spinning-with-ball-in-closeup-shot
2,video6.webm,khr-gangsters
3,video8.webm,medical-helicopter-hovers-at-airport
4,video10.webm,couple-relaxing-on-picnic-crane-shot


Importing the Captions for test set

In [384]:
test_captions = pd.read_csv('testset/test-set-1_video-captions.txt',delimiter='\t',header= None,names=('video','Captions'))

In [385]:
test_captions.head()

Unnamed: 0,video,Captions
0,video7494.webm,green-jeep-struggling-to-drive-over-huge-rocks
1,video7495.webm,hiking-woman-tourist-is-walking-forward-in-mou...
2,video7496.webm,close-up-of-african-american-doctors-hands-usi...
3,video7497.webm,slow-motion-of-a-man-using-treadmill-in-the-gy...
4,video7498.webm,slow-motion-of-photographer-in-national-park


In [386]:
test_captions.tail()

Unnamed: 0,video,Captions
1995,video10004.webm,astronaut-in-outer-space-against-the-backdrop-...
1996,video10005.webm,young-women-lying-on-sunbed-and-applying-sun-c...
1997,video10006.webm,doctor-talking-to-patient-using-a-tablet-to-ex...
1998,video10007.webm,businessman-sitting-on-the-beach-on-inflatable...
1999,video10008.webm,woman-eating-ice-cream-and-sitting-in-the-stre...


Combining the train and test captions so that features are extracted from both together.

In [387]:
merged_captions = pd.concat([train_captions, test_captions],ignore_index=True)

In [388]:
merged_captions.head()

Unnamed: 0,video,Captions
0,video3.webm,blonde-woman-is-massaged-tilt-down
1,video4.webm,roulette-table-spinning-with-ball-in-closeup-shot
2,video6.webm,khr-gangsters
3,video8.webm,medical-helicopter-hovers-at-airport
4,video10.webm,couple-relaxing-on-picnic-crane-shot


In [389]:
merged_captions.tail()

Unnamed: 0,video,Captions
7995,video10004.webm,astronaut-in-outer-space-against-the-backdrop-...
7996,video10005.webm,young-women-lying-on-sunbed-and-applying-sun-c...
7997,video10006.webm,doctor-talking-to-patient-using-a-tablet-to-ex...
7998,video10007.webm,businessman-sitting-on-the-beach-on-inflatable...
7999,video10008.webm,woman-eating-ice-cream-and-sitting-in-the-stre...


In [394]:
merged_captions['Captions'][0]

'blonde-woman-is-massaged-tilt-down'

**Following Rohit Gupta et al., the words below receive extra weight; the summed weight per caption is passed to the model as a new feature.**

In [391]:
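# Extra weight per word; a caption's score is the sum of the weights of its words.
# Each key is listed once ('super' keeps the value 5, which Python used for the repeated key).
weights_to_certain_words = {'women':16,'woman':16,'eating':15,'putting':14,'lying':13,'girl':12,'selfie':11,'relaxing':10,'jellyfish':9,'cat':8,'slow':6,'super':5,'american':4,'portrait':3,'pregnant':2,'couple':1}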

Cleaning the text data

In [392]:
import re
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
#from nltk.stem.porter import PorterStemmer

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/arunabellgutteramesh/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


Cleaning involves:
<ul>
    <li>Removing the special characters and retaining only text data in lowercase</li>
    <li>Removing Stopwords</li>
    <li>Storing the cleaned text data into a list named corpus</li>
    <li>Creating Bag of Words model</li>
</ul>

If a caption word appears in the `weights_to_certain_words` dictionary, its weight is added to a running total for that caption; captions with no listed words keep a total of zero. The total forms a separate feature.
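
As a worked example of the weighting rule described above, here is a minimal sketch. The helper name `caption_weight` and the three-word table `word_weights` are hypothetical and only illustrate the idea; stop-word removal is skipped because none of the weighted words are stop words.

```python
import re

# Illustrative subset of the weight table (assumption: values copied for three words only)
word_weights = {"woman": 16, "relaxing": 10, "couple": 1}

def caption_weight(caption, weights):
    """Sum the weights of all listed words that occur in a caption."""
    # Keep letters only, lowercase, and split into words
    words = re.sub("[^a-zA-Z]", " ", caption).lower().split()
    # Words missing from the table contribute zero
    return sum(weights.get(word, 0) for word in words)

print(caption_weight("couple-relaxing-on-picnic-crane-shot", word_weights))  # 11
print(caption_weight("khr-gangsters", word_weights))  # 0
```

The first caption contains "couple" (1) and "relaxing" (10), so its extra feature value is 11; the second contains no listed word and stays at 0.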

In [395]:
corpus = []
weights = []
stop_words = set(stopwords.words('english'))  # build the stop-word set once
for raw_caption in merged_captions['Captions']:
    # Keep letters only, lowercase, and split into words
    words = re.sub('[^a-zA-Z]', ' ', raw_caption).lower().split()
    words = [word for word in words if word not in stop_words]
    # Sum the weights of listed words; unlisted words contribute zero
    weights.append(sum(weights_to_certain_words.get(word, 0) for word in words))
    corpus.append(' '.join(words))

In [399]:
weights_df = pd.DataFrame(np.array(weights).reshape(8000,1))

In [403]:
len(corpus)

8000

Since TfIdf Vectorizer outperformed Count Vectorizer, using TfIdf Vectorizer for further evaluations.

Using the TfIdf value of each word occurrence in the bag of words.

In [400]:
from sklearn.feature_extraction.text import TfidfVectorizer
tf = TfidfVectorizer()

In [404]:
captions_array = tf.fit_transform(corpus).toarray()
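
The cell above fits the vectorizer on train and test captions together. A possible alternative, sketched here on made-up mini corpora (`train_corpus` and `test_corpus` are assumptions, not the notebook's data), is to fit on the training captions only and reuse that vocabulary for the test captions, so that test text does not influence the learned vocabulary or IDF values.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Tiny illustrative corpora (already cleaned, as in the notebook's corpus list)
train_corpus = ["blonde woman massaged tilt", "medical helicopter hovers airport"]
test_corpus = ["woman eating ice cream"]

vectorizer = TfidfVectorizer()
# Learn vocabulary and IDF weights from the training captions only
X_train_tfidf = vectorizer.fit_transform(train_corpus)
# Reuse them for the test captions; words unseen in training are ignored
X_test_tfidf = vectorizer.transform(test_corpus)

print(X_train_tfidf.shape, X_test_tfidf.shape)
```

Both matrices share the same columns, so a model trained on one can predict on the other without splitting a merged array afterwards.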

In [405]:
merged_array = np.concatenate((captions_array, weights_df), axis=1)

In [406]:
captions_array.shape

(8000, 5762)

In [407]:
merged_array.shape

(8000, 5763)

**Building Models**

Defining the method for Spearman's correlation coefficient

In [408]:
def Get_score(Y_pred,Y_true):
    """Print Spearman's correlation coefficient for each target column."""
    Y_pred = np.squeeze(Y_pred)
    Y_true = np.squeeze(Y_true)
    if Y_pred.shape != Y_true.shape:
        print('Input shapes don\'t match!')
    else:
        if len(Y_pred.shape) == 1:
            Res = pd.DataFrame({'Y_true':Y_true,'Y_pred':Y_pred})
            score_mat = Res[['Y_true','Y_pred']].corr(method='spearman',min_periods=1)
            # Off-diagonal entry of the 2x2 correlation matrix
            print('The Spearman\'s correlation coefficient is: %.3f' % score_mat.iloc[1, 0])
        else:
            # One score per target column (short-term, then long-term)
            for ii in range(Y_pred.shape[1]):
                Get_score(Y_pred[:,ii],Y_true[:,ii])
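
To make the metric concrete: Spearman's coefficient is the Pearson correlation of the ranks of the two series. The sketch below uses a hypothetical helper `spearman` and four made-up score pairs; it returns the value instead of printing it.

```python
import pandas as pd

def spearman(y_true, y_pred):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    ranks_true = pd.Series(y_true).rank()
    ranks_pred = pd.Series(y_pred).rank()
    # Series.corr uses Pearson correlation by default
    return ranks_true.corr(ranks_pred)

# Made-up scores: the two top items swap places in the prediction
y_true = [0.9, 0.8, 0.7, 0.6]
y_pred = [0.5, 0.6, 0.2, 0.1]
rho = spearman(y_true, y_pred)
print(round(rho, 3))  # 0.8
```

Only the ordering matters: the predicted values are far from the true ones, yet the score is high because just one adjacent pair is out of order.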

Split the dataset back into test and train

In [409]:
train_caption_feature = merged_array[:6000]

In [410]:
test_caption_feature = merged_array[6000:]

In [411]:
train_caption_feature.shape

(6000, 5763)

In [412]:
test_caption_feature.shape

(2000, 5763)

Inputting the data into X and Y variables

In [418]:
X_train = train_caption_feature
Y_train = train_ground_truth.iloc[:, 1:3].values

In [419]:
X_train.shape

(6000, 5763)

In [420]:
Y_train.shape

(6000, 2)

**Using my best model :**

**Random Forest Regression Model with n_estimators=100**

In [421]:
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=100)
regressor.fit(X_train, Y_train)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False)
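
The forest above is fit with `random_state=None`, so its predictions can change from run to run. A minimal sketch, assuming synthetic data (`X_demo`, `Y_demo`) and an arbitrary seed of 42, shows how fixing `random_state` makes a two-target forest repeatable.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data: 60 samples, 5 features, 2 targets
rng = np.random.default_rng(0)
X_demo = rng.random((60, 5))
Y_demo = np.column_stack([X_demo[:, 0], 1.0 - X_demo[:, 1]])

# Two forests with the same seed are built identically
forest_a = RandomForestRegressor(n_estimators=10, random_state=42).fit(X_demo, Y_demo)
forest_b = RandomForestRegressor(n_estimators=10, random_state=42).fit(X_demo, Y_demo)

pred_a = forest_a.predict(X_demo[:5])
pred_b = forest_b.predict(X_demo[:5])
print(pred_a.shape, np.allclose(pred_a, pred_b))
```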

In [422]:
X_test = test_caption_feature

In [423]:
X_test.shape

(2000, 5763)

In [424]:
Y_pred = regressor.predict(X_test)

In [425]:
Y_pred.shape

(2000, 2)

In [427]:
type(Y_pred)

numpy.ndarray

In [428]:
Y_pred

array([[0.8680944 , 0.81992   ],
       [0.764564  , 0.772126  ],
       [0.89924238, 0.80948643],
       ...,
       [0.89565   , 0.77094   ],
       [0.91036825, 0.86115958],
       [0.90093917, 0.75564542]])

In [429]:
# Save the two prediction columns (short-term, long-term) as CSV
result = np.asarray(Y_pred)
np.savetxt("result.csv", result, delimiter=",")

**Conclusion**

**Short-term memorability and long-term memorability scores were copied from this result.csv file into the original test-set template (ground_truth_template.csv) and saved as testSetResult.csv.**
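
The copy step described above could also be scripted. The sketch below is an assumption about how that might look, not the procedure actually used: it builds a three-row stand-in for the template with the same column names and fills it from a small, made-up prediction array.

```python
import numpy as np
import pandas as pd

# Stand-in for the test-set template (same column names, three rows only)
template = pd.DataFrame({
    "video": [7494, 7495, 7496],
    "short-term_memorability": [np.nan] * 3,
    "nb_short-term_annotations": [33, 34, 32],
    "long-term_memorability": [np.nan] * 3,
    "nb_long-term_annotations": [12, 10, 13],
})
# Illustrative predictions: column 0 is short-term, column 1 is long-term
predictions = np.array([[0.868, 0.820], [0.765, 0.772], [0.899, 0.809]])

filled = template.copy()
filled["short-term_memorability"] = predictions[:, 0]
filled["long-term_memorability"] = predictions[:, 1]
# filled.to_csv("testSetResult.csv", index=False)  # uncomment to write the file
print(filled)
```

This relies on the prediction rows being in the same order as the template rows, which holds when both follow the order of the test captions file.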

<font style="font-size:21px;text-align:center;">**II. Exploration**</font>

Importing the required libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
import glob

<font style="font-size:18px;text-align:center;">**1) Video feature: HMP features**</font>

Reading HMP Features and the video names from the folder

In [2]:
def read_HMP(fname):
    """Scan HMP(Histogram of Motion Patterns) features from file"""
    with open(fname) as f:
        for line in f:
            pairs=line.split()
            HMP_temp = { int(p.split(':')[0]) : float(p.split(':')[1]) for p in pairs}
    # there are 6075 bins, fill zeros
    HMP = np.zeros(6075)
    for idx in HMP_temp.keys():
        HMP[idx-1] = HMP_temp[idx]            
    return HMP
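
Judging from the parser above, each HMP file holds whitespace-separated `index:value` pairs with 1-based indices. The following sketch shows the same sparse-to-dense conversion on a made-up line with six bins; `parse_sparse_line` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def parse_sparse_line(line, n_bins):
    """Expand 'index:value' pairs (1-based indices) into a dense vector."""
    dense = np.zeros(n_bins)
    for pair in line.split():
        index, value = pair.split(":")
        # Shift to 0-based positions; bins not listed stay at zero
        dense[int(index) - 1] = float(value)
    return dense

vec = parse_sparse_line("1:0.5 4:0.25", n_bins=6)
print(vec)
```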

Fetching the HMP Features

In [3]:
HMP_feature_list= []
video_names_list = []
path = 'HMP/*.txt'
for filename in glob.glob(path):
    # File name without directory or extension, e.g. 'video4960'
    name = os.path.splitext(os.path.basename(filename))[0]
    video_names_list.append(name)
    HMP_features = read_HMP(filename)
    HMP_feature_list.append(HMP_features)

Converting features into dataframe

In [4]:
HMP_features = pd.DataFrame(np.array(HMP_feature_list).reshape(6000,6075))
HMP_features["video"] = video_names_list

In [5]:
HMP_features.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,6066,6067,6068,6069,6070,6071,6072,6073,6074,video
0,0.039095,0.013401,0.000546,0.0,0.004734,0.00032,0.0,0.0,4.2e-05,0.0,...,0.000199,6.7e-05,9e-06,0.00017,0.00049,3.4e-05,0.000275,0.00057,0.000177,video4960
1,0.00715,0.002285,0.000158,0.0,0.000995,0.000197,0.0,0.0,5.5e-05,0.0,...,0.000371,0.000101,1.8e-05,0.0002,0.000715,0.000134,0.000292,0.00084,0.001522,video1818
2,0.014682,0.003656,4.7e-05,0.0,0.001622,2.5e-05,0.0,0.0,1.1e-05,0.0,...,3.8e-05,1.1e-05,0.0,0.000105,5.4e-05,0.0,9.4e-05,0.00011,7e-06,video6811
3,0.090945,0.012822,0.000117,0.0,0.007057,8.8e-05,0.0,0.0,3.1e-05,0.0,...,1e-05,0.0,0.0,1.6e-05,2.1e-05,0.0,3.6e-05,3.6e-05,3e-06,video3969
4,0.017401,0.003635,1.8e-05,0.0,0.002006,3.5e-05,0.0,0.0,1.3e-05,0.0,...,7.8e-05,7.8e-05,2e-06,0.000126,0.000468,1.6e-05,0.000168,0.000497,2.9e-05,video993


Reading the ground truth from the csv 

In [6]:
ground_truth = pd.read_csv('dev-set_ground-truth.csv')

Dropping the annotations columns as these features/columns don't really contribute to memorability scores

In [7]:
ground_truth = ground_truth.drop(['nb_short-term_annotations', 'nb_long-term_annotations'], axis=1)

Cleaning the names of the video so that we can match with the HMP features

In [8]:
ground_truth['video'] = ground_truth['video'].apply(lambda x : x.split('.')[0])

In [9]:
ground_truth.head()

Unnamed: 0,video,short-term_memorability,long-term_memorability
0,video3,0.924,0.846
1,video4,0.923,0.667
2,video6,0.863,0.7
3,video8,0.922,0.818
4,video10,0.95,0.9


Merging the HMP features and the ground truth values together into a dataframe

In [10]:
df = pd.merge(ground_truth, HMP_features , on='video')
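
Because `glob` returns files in arbitrary order, the feature rows do not line up with the ground-truth rows, and the merge on `video` is what aligns them. A small sketch with made-up frames (`scores`, `features`, and the column `f0` are assumptions) illustrates this; `validate="one_to_one"` is an optional safeguard that raises an error if a video name repeats on either side.

```python
import pandas as pd

# Ground truth in file order
scores = pd.DataFrame({
    "video": ["video3", "video4", "video6"],
    "short-term_memorability": [0.924, 0.923, 0.863],
})
# Features in a different (glob-like) order
features = pd.DataFrame({
    "video": ["video6", "video3", "video4"],
    "f0": [0.6, 0.3, 0.4],
})

# Rows are matched by video name, not by position
merged = pd.merge(scores, features, on="video", how="inner", validate="one_to_one")
print(merged)
```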

In [11]:
df.head()

Unnamed: 0,video,short-term_memorability,long-term_memorability,0,1,2,3,4,5,6,...,6065,6066,6067,6068,6069,6070,6071,6072,6073,6074
0,video3,0.924,0.846,0.125563,0.024036,0.000314,0.0,0.015864,0.000358,0.0,...,0.0,0.000393,0.000279,0.0,0.000289,0.001926,0.0,8.6e-05,0.00058,0.0
1,video4,0.923,0.667,0.007526,0.001421,6.8e-05,0.0,0.001184,0.000143,0.0,...,5.3e-05,0.000244,6.6e-05,0.0,8.1e-05,0.000617,9.4e-05,0.00022,0.000762,0.001224
2,video6,0.863,0.7,0.109584,0.018978,0.000289,0.0,0.008774,0.000208,0.0,...,7e-06,5.4e-05,4.5e-05,0.0,2.8e-05,0.000291,3.3e-05,5.2e-05,0.000258,0.000215
3,video8,0.922,0.818,0.120431,0.013561,0.000277,0.0,0.018974,0.000913,0.0,...,5.9e-05,0.00111,7.5e-05,8e-06,0.000333,0.000793,0.000101,0.000588,0.000503,0.000452
4,video10,0.95,0.9,0.005026,0.001356,5.5e-05,0.0,0.000665,2.9e-05,0.0,...,9e-06,0.000882,0.0002,9e-06,0.000559,0.001097,1.8e-05,0.000632,0.001128,6.4e-05


In [12]:
df.shape

(6000, 6078)

**Building Models**

Defining the method for Spearman's correlation coefficient

In [13]:
def Get_score(Y_pred,Y_true):
    """Print Spearman's correlation coefficient for each target column."""
    Y_pred = np.squeeze(Y_pred)
    Y_true = np.squeeze(Y_true)
    if Y_pred.shape != Y_true.shape:
        print('Input shapes don\'t match!')
    else:
        if len(Y_pred.shape) == 1:
            Res = pd.DataFrame({'Y_true':Y_true,'Y_pred':Y_pred})
            score_mat = Res[['Y_true','Y_pred']].corr(method='spearman',min_periods=1)
            # Off-diagonal entry of the 2x2 correlation matrix
            print('The Spearman\'s correlation coefficient is: %.3f' % score_mat.iloc[1, 0])
        else:
            # One score per target column (short-term, then long-term)
            for ii in range(Y_pred.shape[1]):
                Get_score(Y_pred[:,ii],Y_true[:,ii])

Assigning the features to X and Y

In [14]:
X = df.iloc[:,3:6078].values
Y = df.iloc[:, 1:3].values

Splitting the dataset into Train and Test sets

In [15]:
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.20)
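
The split above is drawn at random on every run, so the scores reported for each model would vary if the notebook were re-executed. A minimal sketch on toy arrays (`X_demo`, `Y_demo`, and the seed 0 are assumptions) shows that passing `random_state` yields the same split each time, which makes model comparisons repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each
X_demo = np.arange(20).reshape(10, 2)
Y_demo = np.arange(10)

# The same seed produces the same partition on every call
split_a = train_test_split(X_demo, Y_demo, test_size=0.20, random_state=0)
split_b = train_test_split(X_demo, Y_demo, test_size=0.20, random_state=0)

same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same, split_a[1].shape)
```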

**a) Using Linear Regression Model**

In [16]:
from sklearn.linear_model import LinearRegression
regressor1 = LinearRegression()
regressor1.fit(X_train, Y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [17]:
Y_pred1 = regressor1.predict(X_test)
Get_score(Y_pred1, Y_test)

The Spearman's correlation coefficient is: 0.066
The Spearman's correlation coefficient is: 0.045


**b) Using Decision Tree Model**

In [18]:
from sklearn.tree import DecisionTreeRegressor
regressor2 = DecisionTreeRegressor()
regressor2.fit(X_train, Y_train)

DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
           max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           presort=False, random_state=None, splitter='best')

In [19]:
Y_pred2 = regressor2.predict(X_test)
Get_score(Y_pred2, Y_test)

The Spearman's correlation coefficient is: 0.093
The Spearman's correlation coefficient is: 0.005


**c) Using Random Forest Regression Model**

In [21]:
from sklearn.ensemble import RandomForestRegressor
regressor3 = RandomForestRegressor()
regressor3.fit(X_train, Y_train)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False)

In [22]:
Y_pred3 = regressor3.predict(X_test)
Get_score(Y_pred3, Y_test)

The Spearman's correlation coefficient is: 0.168
The Spearman's correlation coefficient is: 0.067


In [23]:
from sklearn.ensemble import RandomForestRegressor
regressor3 = RandomForestRegressor(n_estimators=100)
regressor3.fit(X_train, Y_train)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False)

In [25]:
Y_pred3 = regressor3.predict(X_test)
Get_score(Y_pred3, Y_test)

The Spearman's correlation coefficient is: 0.285
The Spearman's correlation coefficient is: 0.095


**Conclusion**

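**As we can see, using only HMP video features yields poor results: the best model (random forest with 100 trees) reaches a Spearman correlation of only 0.285 for short-term and 0.095 for long-term memorability.**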

<font style="font-size:18px;text-align:center;">**2) Video features: C3D features**</font>

Reading C3D Features and the video names from the folder

In [26]:
C3D_feature_list= []
video_names_list = []
path = 'C3D/*.txt'
for filename in glob.glob(path):
    # File name without directory or extension, e.g. 'video4960'
    name = os.path.splitext(os.path.basename(filename))[0]
    video_names_list.append(name)
    with open(filename) as f:
        # Each file holds the feature values on a single line
        for line in f:
            C3D_features = [float(item) for item in line.split()]
    C3D_feature_list.append(C3D_features)

Adding these features into a dataframe

In [27]:
C3D_features = pd.DataFrame(np.array(C3D_feature_list).reshape(6000,101))
C3D_features["video"] = video_names_list

In [28]:
C3D_features.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,92,93,94,95,96,97,98,99,100,video
0,0.01016593,0.003529,0.00035963,3.73e-06,7.1e-07,2e-06,1.1e-05,1.24e-06,2e-06,7.41e-06,...,4.5e-07,7.3e-07,7.4e-05,8.93e-06,8e-08,1.515e-05,0.00692074,0.00575775,0.001361,video4960
1,1.1e-07,3e-06,2e-08,2e-08,1e-08,0.99834,0.001243,7e-08,2e-06,6.2e-07,...,2e-08,5e-08,0.0,1.2e-07,3.74e-06,3e-08,6e-08,3.3e-07,2e-06,video1818
2,0.00509931,0.003971,0.04524705,0.01191236,0.00047978,0.001651,2.8e-05,0.00520616,0.001073,0.00030112,...,0.00505672,0.0003543,0.015899,0.00049365,1.232e-05,0.00119673,0.00360873,0.00130745,0.015382,video6811
3,0.00072623,0.000772,0.00086538,8.51e-06,1.606e-05,4.6e-05,0.000535,7.48e-06,0.000392,1.807e-05,...,0.00019035,3.845e-05,5e-05,0.00044901,1.281e-05,0.00065495,5.448e-05,0.00074431,0.042389,video3969
4,0.0002519,0.002037,8.34e-06,2.389e-05,0.00016576,2e-06,1e-06,3.01e-06,2e-05,1.483e-05,...,1.52e-05,7.5e-07,7.5e-05,1.07e-05,7.6e-07,1.825e-05,6.562e-05,0.00031318,2e-06,video993


Reading the ground truth from the csv 

In [29]:
ground_truth = pd.read_csv('dev-set_ground-truth.csv')

Dropping the annotations columns as these features/columns don't really contribute to memorability scores

In [30]:
ground_truth = ground_truth.drop(['nb_short-term_annotations', 'nb_long-term_annotations'], axis=1)

Cleaning the names of the video so that we can match the C3D features

In [31]:
ground_truth['video'] = ground_truth['video'].apply(lambda x : x.split('.')[0])

In [32]:
ground_truth.head()

Unnamed: 0,video,short-term_memorability,long-term_memorability
0,video3,0.924,0.846
1,video4,0.923,0.667
2,video6,0.863,0.7
3,video8,0.922,0.818
4,video10,0.95,0.9


Merging the C3D features and the ground truth values together into a dataframe

In [33]:
df = pd.merge(ground_truth, C3D_features , on='video')

In [34]:
df.head()

Unnamed: 0,video,short-term_memorability,long-term_memorability,0,1,2,3,4,5,6,...,91,92,93,94,95,96,97,98,99,100
0,video3,0.924,0.846,0.020249,0.001578,0.000826,0.000945,6.3e-05,3e-06,0.001162,...,0.001042,0.000161,0.000257,0.046617,0.000156,6e-06,0.000537,0.000339,0.008437,0.00047
1,video4,0.923,0.667,0.000118,0.000891,0.000188,4.5e-05,6.3e-05,2e-06,0.000641,...,0.000582,0.000393,0.000864,0.000947,0.000136,7e-06,0.00036,0.000159,0.001025,2e-05
2,video6,0.863,0.7,0.011765,0.000746,0.000784,1.3e-05,7e-06,2.8e-05,4.1e-05,...,0.000224,3e-06,3.1e-05,0.002538,0.000104,5e-06,6.4e-05,0.00538,0.001027,0.001384
3,video8,0.922,0.818,0.000223,0.000165,7e-06,1.6e-05,5e-06,1.4e-05,0.000154,...,4.6e-05,9e-06,2.3e-05,5.3e-05,4.8e-05,1.9e-05,1e-06,4e-06,0.00038,2.9e-05
4,video10,0.95,0.9,9e-05,0.000615,0.003436,0.001281,0.003551,0.000313,4.2e-05,...,3.7e-05,0.00069,0.000171,0.000231,0.000637,4e-05,6.1e-05,7.5e-05,2e-06,0.001323


In [35]:
df.shape

(6000, 104)

**Building Models**

Defining the method for Spearman's correlation coefficient

In [36]:
def Get_score(Y_pred,Y_true):
    """Print Spearman's correlation coefficient for each target column."""
    Y_pred = np.squeeze(Y_pred)
    Y_true = np.squeeze(Y_true)
    if Y_pred.shape != Y_true.shape:
        print('Input shapes don\'t match!')
    else:
        if len(Y_pred.shape) == 1:
            Res = pd.DataFrame({'Y_true':Y_true,'Y_pred':Y_pred})
            score_mat = Res[['Y_true','Y_pred']].corr(method='spearman',min_periods=1)
            # Off-diagonal entry of the 2x2 correlation matrix
            print('The Spearman\'s correlation coefficient is: %.3f' % score_mat.iloc[1, 0])
        else:
            # One score per target column (short-term, then long-term)
            for ii in range(Y_pred.shape[1]):
                Get_score(Y_pred[:,ii],Y_true[:,ii])

Assigning the features to X and Y

In [37]:
X = df.iloc[:,3:104].values
Y = df.iloc[:, 1:3].values

Splitting the dataset into Train and Test sets

In [38]:
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.20)

**a) Using Linear Regression Model**

In [39]:
from sklearn.linear_model import LinearRegression
regressor1 = LinearRegression()
regressor1.fit(X_train, Y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [40]:
Y_pred1 = regressor1.predict(X_test)
Get_score(Y_pred1, Y_test)

The Spearman's correlation coefficient is: 0.266
The Spearman's correlation coefficient is: 0.090


**b) Using Decision Tree Model**

In [41]:
from sklearn.tree import DecisionTreeRegressor
regressor2 = DecisionTreeRegressor()
regressor2.fit(X_train, Y_train)

DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
           max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           presort=False, random_state=None, splitter='best')

In [42]:
Y_pred2 = regressor2.predict(X_test)
Get_score(Y_pred2, Y_test)

The Spearman's correlation coefficient is: 0.081
The Spearman's correlation coefficient is: -0.007


**c) Using Random Forest Regression Model**

In [43]:
from sklearn.ensemble import RandomForestRegressor
regressor3 = RandomForestRegressor()
regressor3.fit(X_train, Y_train)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False)

In [44]:
Y_pred3 = regressor3.predict(X_test)
Get_score(Y_pred3, Y_test)

The Spearman's correlation coefficient is: 0.161
The Spearman's correlation coefficient is: 0.080


In [45]:
from sklearn.ensemble import RandomForestRegressor
regressor3 = RandomForestRegressor(n_estimators=100)
regressor3.fit(X_train, Y_train)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False)

In [46]:
Y_pred3 = regressor3.predict(X_test)
Get_score(Y_pred3, Y_test)

The Spearman's correlation coefficient is: 0.278
The Spearman's correlation coefficient is: 0.104


**Conclusion**

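**As we can see again, using only C3D video features also yields poor results: the best model (random forest with 100 trees) reaches a Spearman correlation of 0.278 for short-term and 0.104 for long-term memorability.**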

<font style="font-size:18px;text-align:center;">**3) Semantic Feature : Captions**</font>

Importing the Ground truth Dataset

In [49]:
df = pd.read_csv('dev-set_ground-truth.csv')

In [50]:
df.head()

Unnamed: 0,video,short-term_memorability,nb_short-term_annotations,long-term_memorability,nb_long-term_annotations
0,video3.webm,0.924,34,0.846,13
1,video4.webm,0.923,33,0.667,12
2,video6.webm,0.863,33,0.7,10
3,video8.webm,0.922,33,0.818,11
4,video10.webm,0.95,34,0.9,10


Dropping the annotations columns as these features/columns don't really contribute to memorability scores

In [51]:
df = df.drop(['nb_short-term_annotations', 'nb_long-term_annotations'], axis=1)

Cleaning the names of the video

In [52]:
df['video'] = df['video'].apply(lambda x : x.split('.')[0])

In [53]:
df.head()

Unnamed: 0,video,short-term_memorability,long-term_memorability
0,video3,0.924,0.846
1,video4,0.923,0.667
2,video6,0.863,0.7
3,video8,0.922,0.818
4,video10,0.95,0.9


Importing the captions file

In [54]:
captions_features = pd.read_csv('dev-set_video-captions.txt',delimiter='\t',header= None,names=('video','Captions'))

Cleaning the names of the video

In [55]:
captions_features['video'] = captions_features['video'].apply(lambda x : x.split('.')[0])

In [57]:
captions_features.head()

Unnamed: 0,video,Captions
0,video3,blonde-woman-is-massaged-tilt-down
1,video4,roulette-table-spinning-with-ball-in-closeup-shot
2,video6,khr-gangsters
3,video8,medical-helicopter-hovers-at-airport
4,video10,couple-relaxing-on-picnic-crane-shot


Merging the ground truth file and captions file

In [58]:
# Align captions with the ground truth by video name
df = pd.merge(df, captions_features, on='video')

In [59]:
df.head()

Unnamed: 0,video,short-term_memorability,long-term_memorability,Captions
0,video3,0.924,0.846,blonde-woman-is-massaged-tilt-down
1,video4,0.923,0.667,roulette-table-spinning-with-ball-in-closeup-shot
2,video6,0.863,0.7,khr-gangsters
3,video8,0.922,0.818,medical-helicopter-hovers-at-airport
4,video10,0.95,0.9,couple-relaxing-on-picnic-crane-shot


Cleaning the text data

In [60]:
import re
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
#from nltk.stem.porter import PorterStemmer

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/arunabellgutteramesh/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


Cleaning involves:
<ul>
    <li>Removing the special characters and retaining only text data in lowercase</li>
    <li>Removing Stopwords</li>
    <li>Storing the cleaned text data into a list named corpus</li>
    <li>Creating Bag of Words model</li>
</ul>
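
The cleaning steps listed above can be illustrated on a single caption. This is a minimal sketch: `clean_caption` is a hypothetical helper, and `demo_stop_words` is a small hand-written stand-in for the NLTK English stop-word list so that the example runs without downloading anything.

```python
import re

# Small stand-in for the NLTK stop-word list (assumption, for illustration only)
demo_stop_words = {"is", "in", "on", "at", "with", "down"}

def clean_caption(caption, stop_words):
    """Keep letters only, lowercase the text, and drop stop words."""
    words = re.sub("[^a-zA-Z]", " ", caption).lower().split()
    return " ".join(word for word in words if word not in stop_words)

cleaned = clean_caption("blonde-woman-is-massaged-tilt-down", demo_stop_words)
print(cleaned)  # blonde woman massaged tilt
```

Hyphens become spaces, and the remaining words are what the vectorizer later turns into bag-of-words columns.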

In [139]:
corpus = []
stop_words = set(stopwords.words('english'))  # build the stop-word set once
for raw_caption in df['Captions']:
    # Keep letters only, lowercase, split into words, and drop stop words
    words = re.sub('[^a-zA-Z]', ' ', raw_caption).lower().split()
    corpus.append(' '.join(word for word in words if word not in stop_words))

Using the **TfIdf** value of each word occurrence in the bag of words

**A)** Using **TfidfVectorizer**

In [63]:
from sklearn.feature_extraction.text import TfidfVectorizer
tf = TfidfVectorizer()

**Building Models**

Defining the method for Spearman's correlation coefficient

In [64]:
def Get_score(Y_pred,Y_true):
    """Print Spearman's correlation coefficient for each target column."""
    Y_pred = np.squeeze(Y_pred)
    Y_true = np.squeeze(Y_true)
    if Y_pred.shape != Y_true.shape:
        print('Input shapes don\'t match!')
    else:
        if len(Y_pred.shape) == 1:
            Res = pd.DataFrame({'Y_true':Y_true,'Y_pred':Y_pred})
            score_mat = Res[['Y_true','Y_pred']].corr(method='spearman',min_periods=1)
            # Off-diagonal entry of the 2x2 correlation matrix
            print('The Spearman\'s correlation coefficient is: %.3f' % score_mat.iloc[1, 0])
        else:
            # One score per target column (short-term, then long-term)
            for ii in range(Y_pred.shape[1]):
                Get_score(Y_pred[:,ii],Y_true[:,ii])

Inputting the data into X and Y variables

In [65]:
X = tf.fit_transform(corpus).toarray()
Y = df.iloc[:, 1:3].values

Splitting the dataset into the Training set and Test set

In [66]:
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.20)

**a) Using Linear Regression Model**

In [67]:
from sklearn.linear_model import LinearRegression
regressor1 = LinearRegression()
regressor1.fit(X_train, Y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [68]:
X.shape

(6000, 5087)

In [69]:
Y_pred1 = regressor1.predict(X_test)
Get_score(Y_pred1, Y_test)

The Spearman's correlation coefficient is: 0.191
The Spearman's correlation coefficient is: 0.060


**b) Using Decision Tree Regression Model**

In [70]:
from sklearn.tree import DecisionTreeRegressor
regressor2 = DecisionTreeRegressor()
regressor2.fit(X_train, Y_train)

DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
           max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           presort=False, random_state=None, splitter='best')

In [71]:
Y_pred2 = regressor2.predict(X_test)
Get_score(Y_pred2, Y_test)

The Spearman's correlation coefficient is: 0.227
The Spearman's correlation coefficient is: 0.104


**c) Using RandomForest Regression Model**

In [72]:
from sklearn.ensemble import RandomForestRegressor
regressor3 = RandomForestRegressor()
regressor3.fit(X_train, Y_train)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False)

In [74]:
Y_pred3 = regressor3.predict(X_test)
Get_score(Y_pred3, Y_test)

The Spearman's correlation coefficient is: 0.351
The Spearman's correlation coefficient is: 0.158


In [75]:
from sklearn.ensemble import RandomForestRegressor
regressor3 = RandomForestRegressor(n_estimators=100)
regressor3.fit(X_train, Y_train)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False)

In [77]:
Y_pred3 = regressor3.predict(X_test)
Get_score(Y_pred3, Y_test)

The Spearman's correlation coefficient is: 0.396
The Spearman's correlation coefficient is: 0.182


Conclusion

Using the **count** of occurrences of each word in the bag of words

**B)** Using **CountVectorizer**

In [78]:
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer()
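
To make the difference between the two vectorizers concrete, here is a small sketch on three cleaned captions from the dev set. The name `toy_corpus` is illustrative and not part of the notebook. CountVectorizer stores raw word counts, whereas TfidfVectorizer down-weights words that occur in many captions and, with its default settings, scales each row to unit L2 norm.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy corpus (illustrative only): three captions after cleaning
toy_corpus = [
    "couple relaxing picnic crane shot",
    "roulette table spinning ball closeup shot",
    "medical helicopter hovers airport",
]

counts = CountVectorizer().fit_transform(toy_corpus).toarray()
tfidf = TfidfVectorizer().fit_transform(toy_corpus).toarray()

# Both vectorizers build the same vocabulary, so the shapes match
print(counts.shape, tfidf.shape)

# Count features are integers; TF-IDF features are real-valued
print(counts[0])
print(np.round(tfidf[0], 3))

# Each TF-IDF row has unit L2 norm with the default settings
print(np.linalg.norm(tfidf, axis=1))
```

In the first caption, "shot" also appears in the second caption, so its TF-IDF value is lower than that of the words unique to the first caption, while its count stays at 1.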

Inputting the data into X and Y variables

In [79]:
X = cv.fit_transform(corpus).toarray()
Y = df.iloc[:, 1:3].values

Splitting the dataset into the Training set and Test set

In [80]:
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.20)

**a) Using Linear Regression Model**

In [81]:
from sklearn.linear_model import LinearRegression
regressor1 = LinearRegression()
regressor1.fit(X_train, Y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [82]:
X.shape

(6000, 5087)

In [83]:
Y_pred1 = regressor1.predict(X_test)
Get_score(Y_pred1, Y_test)

The Spearman's correlation coefficient is: 0.087
The Spearman's correlation coefficient is: -0.010


**b) Using Decision Tree Regression Model**

In [84]:
from sklearn.tree import DecisionTreeRegressor
regressor2 = DecisionTreeRegressor()
regressor2.fit(X_train, Y_train)

DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
           max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           presort=False, random_state=None, splitter='best')

In [85]:
Y_pred2 = regressor2.predict(X_test)
Get_score(Y_pred2, Y_test)

The Spearman's correlation coefficient is: 0.257
The Spearman's correlation coefficient is: 0.049


**c) Using Random Forest Regression Model**

In [86]:
from sklearn.ensemble import RandomForestRegressor
regressor3 = RandomForestRegressor()
regressor3.fit(X_train, Y_train)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False)

In [87]:
Y_pred3 = regressor3.predict(X_test)
Get_score(Y_pred3, Y_test)

The Spearman's correlation coefficient is: 0.362
The Spearman's correlation coefficient is: 0.102


In [88]:
from sklearn.ensemble import RandomForestRegressor
regressor3 = RandomForestRegressor(n_estimators=100)
regressor3.fit(X_train, Y_train)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False)

In [90]:
Y_pred3 = regressor3.predict(X_test)
Get_score(Y_pred3, Y_test)

The Spearman's correlation coefficient is: 0.394
The Spearman's correlation coefficient is: 0.114


**Conclusion**

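**TfidfVectorizer outperformed CountVectorizer with the Random Forest Regression Model and n_estimators=100: the short-term scores were nearly identical (0.396 vs 0.394), but the long-term score was clearly higher (0.182 vs 0.114).**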

<font style="font-size:18px;text-align:center;">**4) Semantic Feature : Captions with weights**</font>

**A)** Using weighted scores for **positive** words


**According to the paper by Rohit Gupta et al., "Linear Models for Video Memorability Prediction Using Visual and Semantic Features", certain words have more impact on memorability than others. Here, such words are given extra weights, and the summed weight of each caption is added as a new feature.**

Importing the Ground truth Dataset

In [141]:
df = pd.read_csv('dev-set_ground-truth.csv')

In [142]:
df.head()

Unnamed: 0,video,short-term_memorability,nb_short-term_annotations,long-term_memorability,nb_long-term_annotations
0,video3.webm,0.924,34,0.846,13
1,video4.webm,0.923,33,0.667,12
2,video6.webm,0.863,33,0.7,10
3,video8.webm,0.922,33,0.818,11
4,video10.webm,0.95,34,0.9,10


Removing the annotations columns as they will not be used for prediction

In [143]:
df = df.drop(['nb_short-term_annotations', 'nb_long-term_annotations'], axis=1)

In [144]:
df['video'] = df['video'].apply(lambda x : x.split('.')[0])

In [145]:
df.head()

Unnamed: 0,video,short-term_memorability,long-term_memorability
0,video3,0.924,0.846
1,video4,0.923,0.667
2,video6,0.863,0.7
3,video8,0.922,0.818
4,video10,0.95,0.9


Importing the captions file

In [146]:
captions_features = pd.read_csv('dev-set_video-captions.txt',delimiter='\t',header= None,names=('video','Captions'))

In [147]:
captions_features['video'] = captions_features['video'].apply(lambda x : x.split('.')[0])

In [148]:
captions_features.head()

Unnamed: 0,video,Captions
0,video3,blonde-woman-is-massaged-tilt-down
1,video4,roulette-table-spinning-with-ball-in-closeup-shot
2,video6,khr-gangsters
3,video8,medical-helicopter-hovers-at-airport
4,video10,couple-relaxing-on-picnic-crane-shot


Merging the ground truth file and captions file

In [149]:
df = pd.merge(df, captions_features, on='video', how='left')
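
Joining on the `video` key rather than relying on row position matters whenever the two files might list the videos in a different order. A minimal sketch with made-up data (the frames `truth` and `caps` are hypothetical) contrasts the two approaches:

```python
import pandas as pd

# Hypothetical toy frames; the captions frame lists the videos in a different order
truth = pd.DataFrame({"video": ["video3", "video4", "video6"],
                      "short-term_memorability": [0.924, 0.923, 0.863]})
caps = pd.DataFrame({"video": ["video6", "video3", "video4"],
                     "Captions": ["khr-gangsters",
                                  "blonde-woman-is-massaged-tilt-down",
                                  "roulette-table-spinning-with-ball-in-closeup-shot"]})

# Positional assignment pairs row 0 with row 0, regardless of the video name
positional = truth.copy()
positional["Captions"] = caps["Captions"]

# Key-based merge pairs rows that share the same video name
merged = pd.merge(truth, caps, on="video", how="left")

print(positional[["video", "Captions"]])
print(merged[["video", "Captions"]])
```

In the positional result, video3 is paired with the caption of video6. The merged result keeps the ground-truth row order and attaches the correct caption to each video.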

In [150]:
df.head()

Unnamed: 0,video,short-term_memorability,long-term_memorability,Captions
0,video3,0.924,0.846,blonde-woman-is-massaged-tilt-down
1,video4,0.923,0.667,roulette-table-spinning-with-ball-in-closeup-shot
2,video6,0.863,0.7,khr-gangsters
3,video8,0.922,0.818,medical-helicopter-hovers-at-airport
4,video10,0.95,0.9,couple-relaxing-on-picnic-crane-shot


**Following that paper, each high-impact word is assigned a weight between 1 and 16 in the dictionary below.**

In [151]:
# Word -> weight lookup used to build the extra caption feature
weights_to_certain_words = {'women':16,'woman':16,'eating':15,'putting':14,'lying':13,'girl':12,'selfie':11,'relaxing':10,'jellyfish':9,'cat':8,'slow':6,'super':5,'american':4,'portrait':3,'pregnant':2,'couple':1}

Cleaning the text data

In [152]:
import re
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/arunabellgutteramesh/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


Cleaning involves:
<ul>
    <li>Removing the special characters and retaining only text data in lowercase</li>
    <li>Removing Stopwords</li>
    <li>Storing the cleaned text data into a list named corpus</li>
    <li>Creating Bag of Words model</li>
</ul>

If a word from the caption exists in the "weights_to_certain_words" dictionary, its weight is added to a running total for that caption. Captions without any such word keep a weight of zero. This total forms a separate feature.

In [187]:
corpus = []
weights = []
# Build the stopword set once instead of once per word
stop_words = set(stopwords.words('english'))
for raw_caption in df['Captions']:
    # Keep letters only, convert to lowercase, and split into words
    words = re.sub('[^a-zA-Z]', ' ', raw_caption).lower().split()
    words = [word for word in words if word not in stop_words]
    # Sum the weights of all weighted words in this caption (0 if none)
    weights.append(sum(weights_to_certain_words.get(word, 0) for word in words))
    corpus.append(' '.join(words))
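
The weighting rule can be illustrated in isolation on three captions from the dataset. This is a minimal sketch: `example_weights` is a small subset of the notebook's dictionary and `caption_weight` is a hypothetical helper, neither of which appears in the original code. Stopword removal is omitted because it does not affect the sum for these captions.

```python
import re

# Subset of the notebook's weight dictionary, repeated so the sketch is self-contained
example_weights = {'woman': 16, 'relaxing': 10, 'couple': 1}

def caption_weight(raw_caption, weights):
    """Sum the weights of all weighted words that occur in a caption."""
    words = re.sub('[^a-zA-Z]', ' ', raw_caption).lower().split()
    return sum(weights.get(word, 0) for word in words)

print(caption_weight('couple-relaxing-on-picnic-crane-shot', example_weights))  # 11 (1 + 10)
print(caption_weight('blonde-woman-is-massaged-tilt-down', example_weights))    # 16
print(caption_weight('khr-gangsters', example_weights))                         # 0
```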

In [205]:
weights_df = pd.DataFrame(np.array(weights).reshape(-1, 1))

Since TfidfVectorizer outperformed CountVectorizer, TfidfVectorizer is used for the remaining experiments.

Using the TF-IDF value of each word in the bag of words.

In [208]:
from sklearn.feature_extraction.text import TfidfVectorizer
tf = TfidfVectorizer()

In [209]:
captions_array = tf.fit_transform(corpus).toarray()

In [211]:
merged_array = np.concatenate((captions_array, weights_df), axis=1)
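
The concatenation appends the weight score as one extra column to the TF-IDF matrix. The sketch below uses toy values (`toy_tfidf` and `toy_weights` are illustrative names) to show the resulting shape. It also shows `np.column_stack` as an equivalent alternative, which accepts the 1-D weight list directly, so no explicit reshape is needed.

```python
import numpy as np

# Toy stand-ins: a 2 x 3 TF-IDF matrix and one weight score per caption
toy_tfidf = np.array([[0.0, 0.7, 0.7],
                      [1.0, 0.0, 0.0]])
toy_weights = [11, 0]

# Reshape the weights into a column, then concatenate along axis 1
as_column = np.array(toy_weights).reshape(-1, 1)
merged_a = np.concatenate((toy_tfidf, as_column), axis=1)

# Equivalent: column_stack turns the 1-D list into a column automatically
merged_b = np.column_stack((toy_tfidf, toy_weights))

print(merged_a.shape)                      # (2, 4)
print(np.array_equal(merged_a, merged_b))  # True
```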

**Building Models**

Defining the method for Spearman's correlation coefficient

In [215]:
def Get_score(Y_pred,Y_true):
    """Print Spearman's correlation coefficient, once per target column."""
    Y_pred = np.squeeze(Y_pred)
    Y_true = np.squeeze(Y_true)
    if Y_pred.shape != Y_true.shape:
        print('Input shapes don\'t match!')
    else:
        if len(Y_pred.shape) == 1:
            Res = pd.DataFrame({'Y_true':Y_true,'Y_pred':Y_pred})
            score_mat = Res[['Y_true','Y_pred']].corr(method='spearman',min_periods=1)
            # Off-diagonal entry of the 2x2 correlation matrix
            print('The Spearman\'s correlation coefficient is: %.3f' % score_mat.iloc[1, 0])
        else:
            # Score each target column (short-term, then long-term) separately
            for ii in range(Y_pred.shape[1]):
                Get_score(Y_pred[:,ii],Y_true[:,ii])
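
As a sanity check on the scoring function, the pandas-based calculation can be compared with `scipy.stats.spearmanr` on toy values. This is only a sketch: the arrays `y_true` and `y_pred` are made up, and SciPy is an extra dependency that the notebook itself does not use. Two adjacent items are swapped in the predicted ranking, which gives rho = 1 - 6*2 / (5*24) = 0.9.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

# Toy values (illustrative): the predictions swap the ranks of the top two items
y_true = np.array([0.95, 0.92, 0.86, 0.70, 0.60])
y_pred = np.array([0.90, 0.93, 0.80, 0.75, 0.50])

# pandas route, as used in Get_score
res = pd.DataFrame({'Y_true': y_true, 'Y_pred': y_pred})
rho_pandas = res.corr(method='spearman').iloc[1, 0]

# SciPy route; the second return value is the p-value
rho_scipy, p_value = spearmanr(y_true, y_pred)

print(round(rho_pandas, 3), round(rho_scipy, 3))
```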

Inputting the data into X and Y variables

In [219]:
X = merged_array
Y = df.iloc[:, 1:3].values

Splitting the dataset into the Training set and Test set

In [220]:
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.20)
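
Because `train_test_split` shuffles randomly by default, each run of the notebook evaluates on a different test set, so small score differences between experiments can come from the split alone. A hedged sketch of a repeatable split on toy data follows. The seed 42 is an arbitrary choice, and `X_toy` and `Y_toy` are illustrative names.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (illustrative): 10 samples, 2 features, 2 targets
X_toy = np.arange(20).reshape(10, 2)
Y_toy = np.arange(20, 40).reshape(10, 2)

# Any fixed integer for random_state gives a repeatable split
split_1 = train_test_split(X_toy, Y_toy, test_size=0.20, random_state=42)
split_2 = train_test_split(X_toy, Y_toy, test_size=0.20, random_state=42)

# Both calls return identical train/test partitions
print(all(np.array_equal(a, b) for a, b in zip(split_1, split_2)))  # True
print(split_1[1].shape)  # (2, 2): 20% of 10 samples held out
```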

**a) Using Linear Regression Model**

In [221]:
from sklearn.linear_model import LinearRegression
regressor1 = LinearRegression()
regressor1.fit(X_train, Y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [222]:
Y_pred1 = regressor1.predict(X_test)
Get_score(Y_pred1, Y_test)

The Spearman's correlation coefficient is: 0.136
The Spearman's correlation coefficient is: -0.006


**b) Using Decision Tree Regression Model**

In [223]:
from sklearn.tree import DecisionTreeRegressor
regressor2 = DecisionTreeRegressor()
regressor2.fit(X_train, Y_train)

DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
           max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           presort=False, random_state=None, splitter='best')

In [224]:
Y_pred2 = regressor2.predict(X_test)
Get_score(Y_pred2, Y_test)

The Spearman's correlation coefficient is: 0.280
The Spearman's correlation coefficient is: 0.097


**c) Using Random Forest Regression Model**

In [225]:
from sklearn.ensemble import RandomForestRegressor
regressor3 = RandomForestRegressor()
regressor3.fit(X_train, Y_train)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False)

In [226]:
Y_pred3 = regressor3.predict(X_test)
Get_score(Y_pred3, Y_test)

The Spearman's correlation coefficient is: 0.370
The Spearman's correlation coefficient is: 0.152


In [227]:
from sklearn.ensemble import RandomForestRegressor
regressor3 = RandomForestRegressor(n_estimators=100)
regressor3.fit(X_train, Y_train)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False)

In [229]:
Y_pred3 = regressor3.predict(X_test)
Get_score(Y_pred3, Y_test)

The Spearman's correlation coefficient is: 0.415
The Spearman's correlation coefficient is: 0.178


**Conclusion**

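**With the Random Forest Regression Model (n_estimators=100), the captions-with-weights feature improved the short-term score over plain TF-IDF captions (0.415 vs 0.396); the long-term score was essentially unchanged (0.178 vs 0.182).**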

<font style="font-size:18px;text-align:center;">**5) Video & Semantic feature: C3D & Captions features**</font>

Importing the Ground truth Dataset

In [346]:
df = pd.read_csv('dev-set_ground-truth.csv')

Removing the annotations columns as they will not be used for prediction

In [347]:
df = df.drop(['nb_short-term_annotations', 'nb_long-term_annotations'], axis=1)

Cleaning the names of the video so that we can match with the C3D features

In [348]:
df['video'] = df['video'].apply(lambda x : x.split('.')[0])

In [349]:
df.head()

Unnamed: 0,video,short-term_memorability,long-term_memorability
0,video3,0.924,0.846
1,video4,0.923,0.667
2,video6,0.863,0.7
3,video8,0.922,0.818
4,video10,0.95,0.9


Importing the captions file

In [350]:
captions_features = pd.read_csv('dev-set_video-captions.txt',delimiter='\t',header= None,names=('video','Captions'))

Cleaning the names of the video

In [351]:
captions_features['video'] = captions_features['video'].apply(lambda x : x.split('.')[0])

In [352]:
captions_features.head()

Unnamed: 0,video,Captions
0,video3,blonde-woman-is-massaged-tilt-down
1,video4,roulette-table-spinning-with-ball-in-closeup-shot
2,video6,khr-gangsters
3,video8,medical-helicopter-hovers-at-airport
4,video10,couple-relaxing-on-picnic-crane-shot


Cleaning the text data

In [353]:
import re
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
#from nltk.stem.porter import PorterStemmer

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/arunabellgutteramesh/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


Cleaning involves:
<ul>
    <li>Removing the special characters and retaining only text data in lowercase</li>
    <li>Removing Stopwords</li>
    <li>Storing the cleaned text data into a list named corpus</li>
    <li>Creating Bag of Words model</li>
</ul>

In [354]:
corpus = []
# Build the stopword set once instead of once per word
stop_words = set(stopwords.words('english'))
for raw_caption in captions_features['Captions']:
    # Keep letters only, convert to lowercase, and split into words
    words = re.sub('[^a-zA-Z]', ' ', raw_caption).lower().split()
    words = [word for word in words if word not in stop_words]
    corpus.append(' '.join(words))

In [355]:
from sklearn.feature_extraction.text import TfidfVectorizer
tf = TfidfVectorizer()

In [356]:
captions_array = tf.fit_transform(corpus).toarray()

Importing the C3D files

In [357]:
import glob
import os

C3D_feature_list = []
video_names_list = []
path = 'C3D/*.txt'
for filename in glob.glob(path):
    # File name without directory or extension, e.g. 'video3'
    name = os.path.basename(filename).split('.')[0]
    video_names_list.append(name)
    with open(filename) as f:
        # Parse space-separated floats; the values of the last line read are kept
        for line in f:
            C3D_features = [float(item) for item in line.split()]
    C3D_feature_list.append(C3D_features)
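
The same loading pattern can be tried on stand-in files. The sketch below is an illustration under assumptions: it writes two tiny files into a temporary directory (real C3D files hold 101 values each), uses `np.loadtxt` as an alternative to manual parsing, and sorts the file list because `glob` gives no ordering guarantee.

```python
import glob
import os
import tempfile

import numpy as np

names, rows = [], []
with tempfile.TemporaryDirectory() as tmp_dir:
    # Write two tiny stand-in feature files
    for name, values in [("video3", "0.1 0.2 0.7"), ("video10", "0.5 0.25 0.25")]:
        with open(os.path.join(tmp_dir, name + ".txt"), "w") as f:
            f.write(values + "\n")

    # sorted() makes the file order deterministic
    for filename in sorted(glob.glob(os.path.join(tmp_dir, "*.txt"))):
        # basename works regardless of the platform's path separator
        names.append(os.path.basename(filename).split(".")[0])
        rows.append(np.loadtxt(filename))

features = np.vstack(rows)
print(names)           # ['video10', 'video3'] (lexicographic order)
print(features.shape)  # (2, 3)
```

The file order is lexicographic rather than numeric, which is why the later merge on the video name is needed to line the features up with the ground truth.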

In [358]:
C3D_features = pd.DataFrame(np.array(C3D_feature_list).reshape(6000,101))
C3D_features["video"] = video_names_list

In [359]:
C3D_features.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,92,93,94,95,96,97,98,99,100,video
0,0.01016593,0.003529,0.00035963,3.73e-06,7.1e-07,2e-06,1.1e-05,1.24e-06,2e-06,7.41e-06,...,4.5e-07,7.3e-07,7.4e-05,8.93e-06,8e-08,1.515e-05,0.00692074,0.00575775,0.001361,video4960
1,1.1e-07,3e-06,2e-08,2e-08,1e-08,0.99834,0.001243,7e-08,2e-06,6.2e-07,...,2e-08,5e-08,0.0,1.2e-07,3.74e-06,3e-08,6e-08,3.3e-07,2e-06,video1818
2,0.00509931,0.003971,0.04524705,0.01191236,0.00047978,0.001651,2.8e-05,0.00520616,0.001073,0.00030112,...,0.00505672,0.0003543,0.015899,0.00049365,1.232e-05,0.00119673,0.00360873,0.00130745,0.015382,video6811
3,0.00072623,0.000772,0.00086538,8.51e-06,1.606e-05,4.6e-05,0.000535,7.48e-06,0.000392,1.807e-05,...,0.00019035,3.845e-05,5e-05,0.00044901,1.281e-05,0.00065495,5.448e-05,0.00074431,0.042389,video3969
4,0.0002519,0.002037,8.34e-06,2.389e-05,0.00016576,2e-06,1e-06,3.01e-06,2e-05,1.483e-05,...,1.52e-05,7.5e-07,7.5e-05,1.07e-05,7.6e-07,1.825e-05,6.562e-05,0.00031318,2e-06,video993


Merging the C3D features with the ground truth on the video column, which also puts the C3D rows in ground-truth order

In [360]:
pdf = pd.merge(df, C3D_features , on='video')

In [361]:
pdf.head()

Unnamed: 0,video,short-term_memorability,long-term_memorability,0,1,2,3,4,5,6,...,91,92,93,94,95,96,97,98,99,100
0,video3,0.924,0.846,0.020249,0.001578,0.000826,0.000945,6.3e-05,3e-06,0.001162,...,0.001042,0.000161,0.000257,0.046617,0.000156,6e-06,0.000537,0.000339,0.008437,0.00047
1,video4,0.923,0.667,0.000118,0.000891,0.000188,4.5e-05,6.3e-05,2e-06,0.000641,...,0.000582,0.000393,0.000864,0.000947,0.000136,7e-06,0.00036,0.000159,0.001025,2e-05
2,video6,0.863,0.7,0.011765,0.000746,0.000784,1.3e-05,7e-06,2.8e-05,4.1e-05,...,0.000224,3e-06,3.1e-05,0.002538,0.000104,5e-06,6.4e-05,0.00538,0.001027,0.001384
3,video8,0.922,0.818,0.000223,0.000165,7e-06,1.6e-05,5e-06,1.4e-05,0.000154,...,4.6e-05,9e-06,2.3e-05,5.3e-05,4.8e-05,1.9e-05,1e-06,4e-06,0.00038,2.9e-05
4,video10,0.95,0.9,9e-05,0.000615,0.003436,0.001281,0.003551,0.000313,4.2e-05,...,3.7e-05,0.00069,0.000171,0.000231,0.000637,4e-05,6.1e-05,7.5e-05,2e-06,0.001323


In [362]:
C3D_array = pdf.iloc[:,3:104].values

In [363]:
C3D_array.shape

(6000, 101)

Finally, concatenating the caption features and the C3D features into a single feature array

In [364]:
merged_array = np.concatenate((captions_array, C3D_array), axis=1)

**Building Models**

Defining the method for Spearman's correlation coefficient

In [365]:
def Get_score(Y_pred,Y_true):
    """Print Spearman's correlation coefficient, once per target column."""
    Y_pred = np.squeeze(Y_pred)
    Y_true = np.squeeze(Y_true)
    if Y_pred.shape != Y_true.shape:
        print('Input shapes don\'t match!')
    else:
        if len(Y_pred.shape) == 1:
            Res = pd.DataFrame({'Y_true':Y_true,'Y_pred':Y_pred})
            score_mat = Res[['Y_true','Y_pred']].corr(method='spearman',min_periods=1)
            # Off-diagonal entry of the 2x2 correlation matrix
            print('The Spearman\'s correlation coefficient is: %.3f' % score_mat.iloc[1, 0])
        else:
            # Score each target column (short-term, then long-term) separately
            for ii in range(Y_pred.shape[1]):
                Get_score(Y_pred[:,ii],Y_true[:,ii])

Inputting the data into X and Y variables

In [366]:
X = merged_array

In [367]:
Y = df.iloc[:, 1:3].values

Splitting the dataset into the Training set and Test set

In [368]:
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.30)

**Using Random Forest Regression Model**

In [369]:
from sklearn.ensemble import RandomForestRegressor
regressor3 = RandomForestRegressor(n_estimators=100)
regressor3.fit(X_train, Y_train)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False)

In [370]:
Y_pred3 = regressor3.predict(X_test)
Get_score(Y_pred3, Y_test)

The Spearman's correlation coefficient is: 0.369
The Spearman's correlation coefficient is: 0.158


**Conclusion**

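**Combining C3D and caption features yields better results than C3D features alone. However, at 0.369 (short-term) and 0.158 (long-term), it does not outperform captions alone (0.396 and 0.182) or captions with weights (0.415 and 0.178). Note that this experiment held out 30% of the data for testing, whereas the caption experiments held out 20%, so the scores are not strictly comparable.**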
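
One way to make comparisons between feature sets more direct is to evaluate all of them on the same train/test rows with fixed seeds. The sketch below is only an illustration on synthetic data: `feat_a`, `feat_b`, and the target `y` are made up, the seeds are arbitrary, and `scipy.stats.spearmanr` stands in for the notebook's `Get_score`.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200
feat_a = rng.random((n, 5))             # stand-in for one feature set
feat_b = rng.random((n, 3))             # stand-in for another feature set
y = feat_a[:, 0] + 0.1 * rng.random(n)  # synthetic target driven by feat_a

# Split the row indices once, then reuse them for every feature set
train_idx, test_idx = train_test_split(np.arange(n), test_size=0.20, random_state=0)

feature_sets = {"A": feat_a, "B": feat_b, "A+B": np.hstack((feat_a, feat_b))}
scores = {}
for label, X in feature_sets.items():
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    rho, _ = spearmanr(model.predict(X[test_idx]), y[test_idx])
    scores[label] = rho
    print(label, round(rho, 3))
```

Because every model sees the same rows, any difference in the scores reflects the features rather than the split.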