# **Predicting Media Memorability: MediaEval**

Task: predict two memorability scores for each of the 2000 test-set videos from extracted features:
1. Short-term Memorability Score.
2. Long-term Memorability Score.
---
This notebook is organized as follows:

SECTION 1   : Loading and Exploring Features

SECTION 2   : Data Preprocessing

SECTION 3   : Experiments with different Machine Learning models and features/combinations

Vectorizing Data
*   Section 3.1: Count Vectorization

  *   Section 3.1.1: Random Forest with Captions
  *   Section 3.1.2: Support Vector Machine with Captions

*   Section 3.2: TF-IDF Vectorization

  *   Section 3.2.1: Random Forest with Captions
  *   Section 3.2.2: Support Vector Machine with Captions

SECTION 4   : Selecting the best model and predicting the memorability scores on the Test-Set

*   Section 4.1: Training full 6000 Dev-Set
*   Section 4.2: Importing Test Datasets and Captions
*   Section 4.3: Predicting the Scores and Exporting the Results








Importing Libraries


In [0]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import seaborn as sns

In [127]:
from google.colab import drive
import os
drive.mount('/content/drive/')
os.chdir('/content/drive/My Drive/CA684_Assignment')

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


**SECTION 1 : Loading and Exploring Features**

In [0]:
# examine ground truth
groundTruth_file = './Dev-set/Ground-truth/ground-truth.csv'

In [129]:
groundTruth = pd.read_csv(groundTruth_file)
groundTruth.head()

Unnamed: 0,video,short-term_memorability,nb_short-term_annotations,long-term_memorability,nb_long-term_annotations
0,video3.webm,0.924,34,0.846,13
1,video4.webm,0.923,33,0.667,12
2,video6.webm,0.863,33,0.7,10
3,video8.webm,0.922,33,0.818,11
4,video10.webm,0.95,34,0.9,10


In [130]:
videoAndCaptions = list(open('./Dev-set/Captions/dev-set_video-captions.txt', 'r'))
videoAndCaptions = [i.split('\t') for i in videoAndCaptions]
videoAndCaptions = [[a, b.strip()] for a, b in videoAndCaptions]
videoAndCaptions = pd.DataFrame(videoAndCaptions, columns=['video', 'caption'])
videoAndCaptions.head(5)

Unnamed: 0,video,caption
0,video3.webm,blonde-woman-is-massaged-tilt-down
1,video4.webm,roulette-table-spinning-with-ball-in-closeup-shot
2,video6.webm,khr-gangsters
3,video8.webm,medical-helicopter-hovers-at-airport
4,video10.webm,couple-relaxing-on-picnic-crane-shot


In [132]:
videoAndCaptions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6000 entries, 0 to 5999
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   video    6000 non-null   object
 1   caption  6000 non-null   object
dtypes: object(2)
memory usage: 93.9+ KB


In [133]:

groundTruth_captions = videoAndCaptions.merge(groundTruth, left_on='video', right_on='video')
groundTruth_captions.head(5)



Unnamed: 0,video,caption,short-term_memorability,nb_short-term_annotations,long-term_memorability,nb_long-term_annotations
0,video3.webm,blonde-woman-is-massaged-tilt-down,0.924,34,0.846,13
1,video4.webm,roulette-table-spinning-with-ball-in-closeup-shot,0.923,33,0.667,12
2,video6.webm,khr-gangsters,0.863,33,0.7,10
3,video8.webm,medical-helicopter-hovers-at-airport,0.922,33,0.818,11
4,video10.webm,couple-relaxing-on-picnic-crane-shot,0.95,34,0.9,10


In [0]:
# The captions are sentences with dashes instead of spaces.
# Let's see which captions are the most and least memorable,
# using the merged dataframe (captions + ground truth) built above.


In [0]:
# Captions sorted by short-term memorability, most memorable first
topShortTerm = (groundTruth_captions
                .sort_values('short-term_memorability', ascending=False)['caption'])

# Captions sorted by long-term memorability, most memorable first
topLongTerm = (groundTruth_captions
               .sort_values('long-term_memorability', ascending=False)['caption'])

# Five most memorable captions (short-term)
for i in topShortTerm[:5]:
    print(i)


camera-moves-in-on-beared-man-with-shovel-taking-a-breather-with-truck-in-b
mather-and-daughter-enjoying-a-movie-on-tablet
happy-stylish-elegant-young-couple-welcoming-in-the-new-year-with-sparklers-looking-at-the-camera-with-warm-friendly-smiles-against-winkling-party-lights
head-of-big-yellow-eel
funny-little-boy-sitting-at-desk-eating-apple-and-drawing


In [0]:
# Five least memorable captions (short-term)
# reverse the sorted Series with a [::-1] slice
for i in (topShortTerm)[::-1][:5]:
    print(i)

timelapse-of-snow-mountains
snow-capped-mountain-at-dusk
grey-canyons-and-valleys
grassy-field-with-flowers-and-trees
dark-sea-with-bright-sky


In [0]:
# Five most memorable captions (long-term)
for i in topLongTerm[:5]:
    print(i)

nurses-moving-patient-from-one-to-another
cute-blond-woman-in-lifestyle-scene-in-white-luxury-bedroom
female-doctor-showing-patients-x-ray-of-chest-using-tablet
people-walking-down-snowy-street
kitten-playing-with-string




**SECTION 2 : Data Preprocessing**

In [0]:
!pip install pyprind
import pyprind
from string import punctuation
from collections import Counter
from nltk.corpus import stopwords
# requires the NLTK stopword corpus: nltk.download('stopwords')
stop_words = set(stopwords.words("english"))

counts = Counter()  # empty counter
# Counter is a dict subclass for counting hashable objects: elements are stored
# as dictionary keys and their counts as dictionary values.

# set up progress tracker
pbar = pyprind.ProgBar(len(groundTruth_captions['caption']), title='Counting word occurrences')

for i, cap in enumerate(groundTruth_captions['caption']):
    # replace punctuation with spaces and convert to lower case
    text = ''.join([c if c not in punctuation else ' ' for c in cap]).lower()

    # remove stopwords and overwrite the original caption with the cleaned text
    rmv_stopwords = ' '.join([word for word in text.split() if word not in stop_words])
    groundTruth_captions.loc[i, 'caption'] = rmv_stopwords

    pbar.update()
    # word counts are taken before stopword removal
    counts.update(text.split())



Counting word occurrences
0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:00:07
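
The cleaning loop above can also be written as a small reusable function. The sketch below is an illustration, not the notebook's own code: `clean_caption` and the two-word stopword set are hypothetical stand-ins (the notebook uses NLTK's full English list).

```python
from string import punctuation

def clean_caption(caption, stop_words):
    """Replace punctuation with spaces, lower-case, and drop stopwords."""
    text = ''.join(' ' if c in punctuation else c for c in caption).lower()
    return ' '.join(word for word in text.split() if word not in stop_words)

# Tiny stand-in stopword set, for illustration only.
demo_stop_words = {"is", "down"}
cleaned = clean_caption("blonde-woman-is-massaged-tilt-down", demo_stop_words)
print(cleaned)
```

Applied to a dataframe column, this would look like `groundTruth_captions['caption'].apply(lambda c: clean_caption(c, stop_words))`, which avoids the row-by-row `.loc` assignment.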


In [0]:
print(counts) # No.of occurrence



In [0]:
len(counts)

5191

In [0]:
videoAndCaptions.caption.values.shape

(6000,)

In [0]:
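def remove_punct(text):  # assumed helper: not defined in this notebook; behaviour inferred from the Section 4.2 output
    return ''.join(c for c in text if c not in punctuation)
groundTruth_captions['body_text_clean'] = groundTruth_captions['caption'].apply(remove_punct)
groundTruth_captions.head()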

Unnamed: 0,video,caption,short-term_memorability,nb_short-term_annotations,long-term_memorability,nb_long-term_annotations,body_text_clean
0,video3.webm,blonde woman massaged tilt,0.924,34,0.846,13,blonde woman massaged tilt
1,video4.webm,roulette table spinning ball closeup shot,0.923,33,0.667,12,roulette table spinning ball closeup shot
2,video6.webm,khr gangsters,0.863,33,0.7,10,khr gangsters
3,video8.webm,medical helicopter hovers airport,0.922,33,0.818,11,medical helicopter hovers airport
4,video10.webm,couple relaxing picnic crane shot,0.95,34,0.9,10,couple relaxing picnic crane shot


In [134]:
import re
# function to tokenise a caption on runs of non-word characters
def tokenize(text):
    tokens = re.split(r'\W+', text)
    return tokens
groundTruth_captions['tokenized_body'] = groundTruth_captions['caption'].apply(lambda x: tokenize(x.lower()))
groundTruth_captions.head()

Unnamed: 0,video,caption,short-term_memorability,nb_short-term_annotations,long-term_memorability,nb_long-term_annotations,tokenized_body
0,video3.webm,blonde-woman-is-massaged-tilt-down,0.924,34,0.846,13,"[blonde, woman, is, massaged, tilt, down]"
1,video4.webm,roulette-table-spinning-with-ball-in-closeup-shot,0.923,33,0.667,12,"[roulette, table, spinning, with, ball, in, cl..."
2,video6.webm,khr-gangsters,0.863,33,0.7,10,"[khr, gangsters]"
3,video8.webm,medical-helicopter-hovers-at-airport,0.922,33,0.818,11,"[medical, helicopter, hovers, at, airport]"
4,video10.webm,couple-relaxing-on-picnic-crane-shot,0.95,34,0.9,10,"[couple, relaxing, on, picnic, crane, shot]"


**SECTION 3 : Experiments with different Machine Learning models and features/combinations**

Function to calculate Spearman's correlation coefficient


In [0]:
def Calculate_score(Y_prd,Y_true):
    '''Print Spearman's rank correlation coefficient between predictions and ground truth.'''
    Y_prd = np.squeeze(Y_prd)
    Y_true = np.squeeze(Y_true)
    if Y_prd.shape != Y_true.shape:
        print('Input shapes don\'t match!')
    else:
        if len(Y_prd.shape) == 1:
            Res = pd.DataFrame({'Y_true':Y_true,'Y_prd':Y_prd})
            score_mat = Res[['Y_true','Y_prd']].corr(method='spearman',min_periods=1)
            print('The Spearman\'s correlation coefficient is: %.3f' % score_mat.iloc[1, 0])
        else:
            # multi-output case: score each target column separately
            for ii in range(Y_prd.shape[1]):
                Calculate_score(Y_prd[:,ii],Y_true[:,ii])
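
Spearman's coefficient is the Pearson correlation computed on ranks rather than raw values, so it rewards getting the ordering of videos right. The sketch below spells that out in plain Python. The function names `average_ranks` and `spearman` are hypothetical, and ties are assumed to receive the mean of their rank positions (the usual convention).

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Predictions that preserve the ordering of the true scores correlate perfectly,
# even though the actual values differ.
rho = spearman([0.9, 0.8, 0.7, 0.95], [0.5, 0.4, 0.3, 0.6])
print(round(rho, 3))
```

Because only the ordering matters, a model can score well on this metric even when its predicted values are systematically offset from the true ones.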

**Vectorizing Data:**

*   **Section 3.1 Count Vectorization**


In [0]:
from sklearn.feature_extraction.text import CountVectorizer
count_vect=CountVectorizer(analyzer = "word",max_features=3187) 

In [0]:
countsVect=count_vect.fit_transform(groundTruth_captions['caption'])
print(countsVect.shape)


(6000, 3187)


In [0]:
countsVect_seq=countsVect.toarray()
countsVect_seq

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])
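
To make the matrix above concrete: each row is a caption, each column a vocabulary word, and each cell the number of times that word occurs in that caption. The following is a simplified hand-rolled sketch of that idea, not scikit-learn's implementation; `count_vectorize` is a hypothetical name, and the sketch skips details such as `CountVectorizer`'s token pattern and `max_features` cut-off.

```python
from collections import Counter

def count_vectorize(captions):
    """Return (vocabulary, rows) where rows[i][j] counts vocabulary[j] in captions[i]."""
    vocabulary = sorted({word for caption in captions for word in caption.split()})
    rows = []
    for caption in captions:
        word_counts = Counter(caption.split())
        rows.append([word_counts[word] for word in vocabulary])
    return vocabulary, rows

vocabulary, rows = count_vectorize(["khr gangsters", "slow motion man slow"])
print(vocabulary)
print(rows)
```

Most cells are zero because each caption uses only a handful of the vocabulary words, which is why the array printed above shows only zeros at its corners.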

**Section 3.1.1: Random Forest with Captions**




In [0]:
X = countsVect_seq
y = groundTruth_captions[['short-term_memorability','long-term_memorability']].values

In [0]:
# Splitting the dataset into the Training set and Test set
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=42)


In [0]:
print('X_train ', X_train.shape)
print('X_test  ', X_test.shape)
print('Y_train ', y_train.shape)
print('Y_test  ', y_test.shape)

X_train  (4800, 3187)
X_test   (1200, 3187)
Y_train  (4800, 2)
Y_test   (1200, 2)


In [0]:
from sklearn.ensemble import RandomForestRegressor
captionsRandom = RandomForestRegressor(n_estimators=100,random_state=45)

In [0]:
captionsRandom.fit(X_train,y_train)

RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',
                      max_depth=None, max_features='auto', max_leaf_nodes=None,
                      max_samples=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=1,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      n_estimators=100, n_jobs=None, oob_score=False,
                      random_state=45, verbose=0, warm_start=False)

In [0]:
captionsPred = captionsRandom.predict(X_test)

In [0]:
Calculate_score(captionsPred, y_test)


The Spearman's correlation coefficient is: 0.411
The Spearman's correlation coefficient is: 0.172


**Section 3.1.2: Support Vector Machine with Captions**


In [0]:
svrX = countsVect_seq
svrY_short = groundTruth_captions[['short-term_memorability']].values
svrY_long = groundTruth_captions[['long-term_memorability']].values

In [0]:
# Splitting the dataset into the Training set and Test set
shortX_train,shortX_test,shortY_train,shortY_test = train_test_split(svrX,svrY_short,test_size=0.2,random_state=40)
longX_train,longX_test,longY_train,longY_test = train_test_split(svrX,svrY_long,test_size=0.2,random_state=40)

In [0]:
from sklearn.preprocessing import StandardScaler
shortX = StandardScaler()
shortY = StandardScaler()
shortX_train = shortX.fit_transform(shortX_train)
shortY_train = shortY.fit_transform(shortY_train)
longX = StandardScaler()
longY = StandardScaler()
longX_train = longX.fit_transform(longX_train)
longY_train = longY.fit_transform(longY_train)

In [0]:
from sklearn.svm import SVR
short_regressor = SVR(kernel = 'rbf')
long_regressor = SVR(kernel = 'rbf')
short_regressor.fit(shortX_train, shortY_train)
long_regressor.fit(longX_train,longY_train)

  y = column_or_1d(y, warn=True)
  y = column_or_1d(y, warn=True)


SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='scale',
    kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)

In [0]:
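# NOTE: the test features are passed unscaled, although the SVRs were fitted on
# standardized inputs; shortX.transform(shortX_test) would be the consistent input.
short_pred = short_regressor.predict(shortX_test)
# recent scikit-learn versions require a 2-D array for inverse_transform
short_pred = shortY.inverse_transform(short_pred.reshape(-1, 1))
long_pred = long_regressor.predict(longX_test)
long_pred = longY.inverse_transform(long_pred.reshape(-1, 1))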

In [0]:
Calculate_score(short_pred, shortY_test)
Calculate_score(long_pred, longY_test)

The Spearman's correlation coefficient is: 0.340
The Spearman's correlation coefficient is: 0.164
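
In the cells above, the scalers are fitted on the training split and the test features reach `predict` unscaled. One way to keep feature and target scaling consistent is to bundle them with the model. The sketch below uses scikit-learn's `make_pipeline` and `TransformedTargetRegressor` on small synthetic data (the random arrays are placeholders, not the caption features) and is an alternative arrangement, not the code that produced the scores above.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data: 40 training rows, 10 test rows, 5 features.
rng = np.random.default_rng(0)
X_train = rng.random((40, 5))
y_train = 0.5 * X_train[:, 0] + 0.25
X_test = rng.random((10, 5))

# The pipeline scales X with statistics learned on the training data and applies
# the same scaling at predict time; the wrapper scales y for fitting and
# inverts that scaling on the predictions.
model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR(kernel="rbf")),
    transformer=StandardScaler(),
)
model.fit(X_train, y_train)
pred = model.predict(X_test)
print(pred.shape)
```

With this arrangement there is no separate `inverse_transform` step to remember, and the 1-D target also avoids the `column_or_1d` warning.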


**Section 3.2: TF-IDF Vectorization**

In [0]:
from sklearn.feature_extraction.text import TfidfVectorizer
tf = TfidfVectorizer(analyzer = "word",max_features=3112)

In [0]:
captionsArray = tf.fit_transform(groundTruth_captions['caption']).toarray()

In [0]:
captionsArray.shape


(6000, 3112)
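
For reference, with scikit-learn's default settings (`smooth_idf=True`, `norm='l2'`, `sublinear_tf=False`), which the `TfidfVectorizer` call above leaves unchanged, each entry of the matrix is computed as

$$
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1,
\qquad
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d)\,\mathrm{idf}(t),
$$

where $n$ is the number of captions (6000 here), $\mathrm{df}(t)$ is the number of captions containing term $t$, and $\mathrm{tf}(t, d)$ is the raw count of $t$ in caption $d$. Each row is then divided by its Euclidean norm, so every non-empty caption has unit length. Compared with raw counts, words that appear in many captions are down-weighted.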

**Section 3.2.1 Random Forest with Captions**

In [0]:
X = captionsArray
y = groundTruth_captions[['short-term_memorability','long-term_memorability']].values

In [0]:
# Splitting the dataset into the Training set and Test set
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=42)

In [0]:
print('X_train ', X_train.shape)
print('X_test  ', X_test.shape)
print('Y_train ', y_train.shape)
print('Y_test  ', y_test.shape)

X_train  (4800, 3112)
X_test   (1200, 3112)
Y_train  (4800, 2)
Y_test   (1200, 2)


In [0]:
from sklearn.ensemble import RandomForestRegressor
captionsRandom = RandomForestRegressor(n_estimators=100,random_state=45)

In [114]:
captionsRandom.fit(X_train,y_train)


RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',
                      max_depth=None, max_features='auto', max_leaf_nodes=None,
                      max_samples=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=1,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      n_estimators=100, n_jobs=None, oob_score=False,
                      random_state=45, verbose=0, warm_start=False)

In [0]:
captionsPred = captionsRandom.predict(X_test)


In [0]:
Calculate_score(captionsPred, y_test)


The Spearman's correlation coefficient is: 0.433
The Spearman's correlation coefficient is: 0.157


**Section 3.2.2 Support Vector Machine with Captions**

In [0]:
svrX = captionsArray
svrY_short = groundTruth_captions[['short-term_memorability']].values
svrY_long = groundTruth_captions[['long-term_memorability']].values

In [0]:
# Splitting the dataset into the Training set and Test set
shortX_train,shortX_test,shortY_train,shortY_test = train_test_split(svrX,svrY_short,test_size=0.2,random_state=40)
longX_train,longX_test,longY_train,longY_test = train_test_split(svrX,svrY_long,test_size=0.2,random_state=40)

In [0]:
from sklearn.preprocessing import StandardScaler
shortX = StandardScaler()
shortY = StandardScaler()
shortX_train = shortX.fit_transform(shortX_train)
shortY_train = shortY.fit_transform(shortY_train)
longX = StandardScaler()
longY = StandardScaler()
longX_train = longX.fit_transform(longX_train)
longY_train = longY.fit_transform(longY_train)

In [0]:
from sklearn.svm import SVR
short_regressor = SVR(kernel = 'rbf')
long_regressor = SVR(kernel = 'rbf')
short_regressor.fit(shortX_train, shortY_train)
long_regressor.fit(longX_train,longY_train)

  y = column_or_1d(y, warn=True)
  y = column_or_1d(y, warn=True)


SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='scale',
    kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)

In [0]:
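# as in Section 3.1.2, the test features are passed to predict unscaled
short_pred = short_regressor.predict(shortX_test)
short_pred = shortY.inverse_transform(short_pred.reshape(-1, 1))
long_pred = long_regressor.predict(longX_test)
long_pred = longY.inverse_transform(long_pred.reshape(-1, 1))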

In [0]:
Calculate_score(short_pred, shortY_test)
Calculate_score(long_pred, longY_test)

The Spearman's correlation coefficient is: 0.364
The Spearman's correlation coefficient is: 0.169


**SECTION 4   : Selecting the best model and predicting the memorability scores on the Test-Set**

The Random Forest trained on TF-IDF features achieves the highest short-term Spearman score (0.433) of the four configurations, so it is used for the final predictions.



In [0]:
Calculate_score(captionsPred, y_test)

The Spearman's correlation coefficient is: 0.433
The Spearman's correlation coefficient is: 0.157
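
The comparison behind this choice can be laid out explicitly. The numbers below are the Spearman scores printed in Section 3; the dictionary layout and variable names are only an illustration.

```python
# Spearman scores printed in Section 3, as (short-term, long-term).
scores = {
    ("Count", "Random Forest"): (0.411, 0.172),
    ("Count", "SVR"): (0.340, 0.164),
    ("TF-IDF", "Random Forest"): (0.433, 0.157),
    ("TF-IDF", "SVR"): (0.364, 0.169),
}

best_short = max(scores, key=lambda k: scores[k][0])
best_long = max(scores, key=lambda k: scores[k][1])
print("best short-term:", best_short)
print("best long-term: ", best_long)
```

The ranking depends on the target: the TF-IDF Random Forest leads on short-term memorability, while the count-vector Random Forest is marginally ahead on long-term. The Random Forest and SVR runs also used different `random_state` values for the split (42 and 40), so the small differences between them should be read with caution.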



**Section 4.1: Training full 6000 Dev-Set**

In [0]:
X = captionsArray
y = groundTruth_captions[['short-term_memorability','long-term_memorability']].values

In [0]:

print(f'X: ({len(X)})')
print(f'y:{y.shape}')

X: (6000)
y:(6000, 2)


In [0]:
from sklearn.ensemble import RandomForestRegressor
captionsRandom = RandomForestRegressor(n_estimators=100,random_state=45)

In [0]:
captionsRandom.fit(X,y)

RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',
                      max_depth=None, max_features='auto', max_leaf_nodes=None,
                      max_samples=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=1,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      n_estimators=100, n_jobs=None, oob_score=False,
                      random_state=45, verbose=0, warm_start=False)

**Section 4.2: Importing Test Datasets and Captions**

In [0]:
#loading test captions
videoAndCaptions_test = list(open('./Test-set/Captions_test/test-set-1_video-captions.txt', 'r'))
videoAndCaptions_test = [i.split('\t') for i in videoAndCaptions_test] 
videoAndCaptions_test = pd.DataFrame(videoAndCaptions_test, columns=['video', 'caption'])
videoAndCaptions_test.head(5)


Unnamed: 0,video,caption
0,video7494.webm,green-jeep-struggling-to-drive-over-huge-rocks\n
1,video7495.webm,hiking-woman-tourist-is-walking-forward-in-mou...
2,video7496.webm,close-up-of-african-american-doctors-hands-usi...
3,video7497.webm,slow-motion-of-a-man-using-treadmill-in-the-gy...
4,video7498.webm,slow-motion-of-photographer-in-national-park\n


In [0]:
groundTruth_file_test = './Test-set/Ground-truth_test/groundTruth_template.csv'

In [0]:
groundTruth_test = pd.read_csv(groundTruth_file_test)
groundTruth_test.head()

Unnamed: 0,video,short-term_memorability,nb_short-term_annotations,long-term_memorability,nb_long-term_annotations
0,7494,,33,,12
1,7495,,34,,10
2,7496,,32,,13
3,7497,,33,,10
4,7498,,33,,10


In [0]:
videoAndCaptions_test['body_text_clean']=videoAndCaptions_test['caption'].apply(lambda x: remove_punct(x))
videoAndCaptions_test.head()

Unnamed: 0,video,caption,body_text_clean
0,video7494.webm,green-jeep-struggling-to-drive-over-huge-rocks\n,greenjeepstrugglingtodriveoverhugerocks\n
1,video7495.webm,hiking-woman-tourist-is-walking-forward-in-mou...,hikingwomantouristiswalkingforwardinmountainso...
2,video7496.webm,close-up-of-african-american-doctors-hands-usi...,closeupofafricanamericandoctorshandsusingasphy...
3,video7497.webm,slow-motion-of-a-man-using-treadmill-in-the-gy...,slowmotionofamanusingtreadmillinthegymregularp...
4,video7498.webm,slow-motion-of-photographer-in-national-park\n,slowmotionofphotographerinnationalpark\n


In [0]:
counts = Counter()  # empty counter
# Counter is a dict subclass for counting hashable objects: elements are stored
# as dictionary keys and their counts as dictionary values.

# set up progress tracker
pbar = pyprind.ProgBar(len(videoAndCaptions_test['caption']), title='Counting word occurrences')

for i, cap in enumerate(videoAndCaptions_test['caption']):
    # replace punctuation with spaces and convert to lower case
    text = ''.join([c if c not in punctuation else ' ' for c in cap]).lower()

    # remove stopwords and overwrite the original caption with the cleaned text
    rmv_stopwords = ' '.join([word for word in text.split() if word not in stop_words])
    videoAndCaptions_test.loc[i, 'caption'] = rmv_stopwords

    pbar.update()
    # word counts are taken before stopword removal
    counts.update(text.split())

Counting word occurrences
0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:00:00


In [0]:
videoAndCaptions_test.head()

Unnamed: 0,video,caption,body_text_clean
0,video7494.webm,green jeep struggling drive huge rocks,greenjeepstrugglingtodriveoverhugerocks\n
1,video7495.webm,hiking woman tourist walking forward mountains...,hikingwomantouristiswalkingforwardinmountainso...
2,video7496.webm,close african american doctors hands using sph...,closeupofafricanamericandoctorshandsusingasphy...
3,video7497.webm,slow motion man using treadmill gym regular ph...,slowmotionofamanusingtreadmillinthegymregularp...
4,video7498.webm,slow motion photographer national park,slowmotionofphotographerinnationalpark\n


In [0]:
print(f'Test-Captions: {videoAndCaptions_test.shape}')

from sklearn.feature_extraction.text import TfidfVectorizer
tf = TfidfVectorizer(analyzer = "word",max_features=3112)

Test-Captions: (2000, 3)


In [0]:
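# NOTE: fit_transform learns a new vocabulary from the test captions, so these columns do not line up
# with the dev-set features used for training; the Section 3.2 vectorizer with .transform() would.
captionsArray_test = tf.fit_transform(videoAndCaptions_test['caption']).toarray()
type(captionsArray_test)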

numpy.ndarray

In [0]:
captionsArray_test

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [115]:
print(f'Development Vocabulary Size   : {len(captionsArray[0])}')
print(f'Testing Vocabulary Size       : {len(captionsArray_test[0])}')


Development Vocabulary Size   : 3112
Testing Vocabulary Size       : 3112
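
Equal widths alone do not guarantee that column *j* means the same word in both matrices: here both are capped at 3112 by `max_features`, but each vectorizer was fitted on a different set of captions. A minimal sketch of the usual fit-on-dev, transform-on-test pattern follows; the short caption lists are stand-ins taken from the cleaned captions shown earlier, and the variable names are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

dev_captions = [
    "blonde woman massaged tilt",
    "medical helicopter hovers airport",
    "couple relaxing picnic crane shot",
]
test_captions = [
    "green jeep struggling drive huge rocks",
    "medical helicopter airport",
]

vectorizer = TfidfVectorizer(analyzer="word")
X_dev = vectorizer.fit_transform(dev_captions)   # learns vocabulary and IDF weights
X_new = vectorizer.transform(test_captions)      # reuses them; unseen words are ignored

print(X_dev.shape, X_new.shape)
```

Both matrices now share the same 13 columns in the same order. The first test caption shares no word with the dev captions, so its row is all zeros, which is the expected behaviour for out-of-vocabulary text.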


**Section 4.3: Predicting the Scores and Exporting the Results**


In [0]:
test_pred = captionsRandom.predict(captionsArray_test)

In [0]:
pred = pd.DataFrame()

In [119]:
type(test_pred)

numpy.ndarray

In [0]:
pred['short-term'] = test_pred[:,0]

In [0]:
pred['long-term'] = test_pred[:,1]

In [122]:
pred.head()

Unnamed: 0,short-term,long-term
0,0.869751,0.731998
1,0.865063,0.727088
2,0.85356,0.77386
3,0.903008,0.805529
4,0.843034,0.714571


In [123]:
pred.describe()

Unnamed: 0,short-term,long-term
count,2000.0,2000.0
mean,0.853293,0.743177
std,0.025894,0.052874
min,0.725846,0.338266
25%,0.840538,0.715254
50%,0.855052,0.742081
75%,0.869247,0.773967
max,0.935965,0.907855


In [0]:
pred.to_csv("/content/drive/My Drive/Anshika_Sharma_19210993_Results  .csv",index=False)
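
The exported file contains only the two score columns. If the video identifiers are needed alongside them, they could be attached before writing. This is a sketch under the assumption that the rows of `test_pred` are in the same order as the rows of the ground-truth template; the two-row arrays below are stand-ins built from values shown in the outputs above.

```python
import numpy as np
import pandas as pd

# Stand-ins for groundTruth_test['video'] and test_pred.
video_ids = pd.Series([7494, 7495], name="video")
demo_pred = np.array([[0.869751, 0.731998], [0.865063, 0.727088]])

submission = pd.DataFrame({
    "video": video_ids,
    "short-term_memorability": demo_pred[:, 0],
    "long-term_memorability": demo_pred[:, 1],
})
print(submission)
```

Calling `submission.to_csv(..., index=False)` would then write one row per video, with column names matching the template.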
