# Word2VisualVec for Sentence Representation

This note answers the following two questions:
1. How to load a trained Word2VisualVec model?
2. How to predict visual features from a new sentence?

## 0. Setup

Use the following script to download and extract a Word2VisualVec model trained on Flickr30k.
To download the dataset itself, follow the instructions [here](https://github.com/danieljf24/w2vv#required-data).


```shell
ROOTPATH=$HOME/trained_w2vv_model
mkdir -p $ROOTPATH && cd $ROOTPATH

# download and extract the pre-trained model
wget http://lixirong.net/data/w2vv-tmm2018/flickr30k_trained_model.tar.gz
tar zxf flickr30k_trained_model.tar.gz
```
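Before moving on, it can help to confirm that the archive was extracted where the later cells expect it. The following is a minimal sketch, not part of the w2vv code base: the helper `missing_model_files` is a hypothetical name, and the three file names are simply the ones this note reads later (`model.json`, `best_model.h5`, `option.pkl`).

```python
import os

# File names taken from the cells later in this note.
EXPECTED_FILES = ('model.json', 'best_model.h5', 'option.pkl')


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected files that are absent from model_dir (hypothetical helper)."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(model_dir, name))]


if __name__ == '__main__':
    # Same location as ROOTPATH in the shell script above.
    model_dir = os.path.join(os.path.expanduser('~'),
                             'trained_w2vv_model', 'flickr30k_trained_model')
    missing = missing_model_files(model_dir)
    print('missing files: %s' % (', '.join(missing) if missing else 'none'))
```

An empty result means the directory contains every file that the loading and evaluation cells read.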

In [1]:
import os
import keras
from basic.common import readPkl
from w2vv_pred import W2VV_MS_pred, pred_mutual_error_ms
from util.text import encode_text
from util.text2vec import get_text_encoder
from util.util import readImgSents 
from simpleknn.bigfile import BigFile
from util.losser import get_losser
from util.evaluation import i2t

Using TensorFlow backend.


## 1. Load a trained Word2VisualVec model

In [2]:
model_path = os.path.join(os.environ['HOME'],'trained_w2vv_model/flickr30k_trained_model')
abs_model_path = os.path.join(model_path, 'model.json')
weight_path = os.path.join(model_path, 'best_model.h5')
predictor = W2VV_MS_pred(abs_model_path, weight_path)

Instructions for updating:
If using Keras pass *_constraint arguments to layers.
12/11/2019 22:47:22 INFO [w2vv_pred.pyc.W2VV_MS_pred] loaded a trained Word2VisualVec model successfully


## 2. Evaluate prediction performance on the test set

In [3]:
# set up multi-scale sentence vectorization
trainCollection='flickr30kenctrain'
opt = readPkl(os.path.join(model_path, 'option.pkl'))
rootpath=os.path.join(os.environ['HOME'],'VisualSearch')
rnn_style, bow_style, w2v_style = opt.text_style.strip().split('@')
text_data_path = os.path.join(rootpath, trainCollection, "TextData", "vocabulary", "bow", opt.rnn_vocab)
bow_data_path = os.path.join(rootpath, trainCollection, "TextData", "vocabulary", bow_style, opt.bow_vocab)
w2v_data_path = os.path.join(rootpath, "word2vec", opt.corpus,  opt.word2vec)

text2vec = get_text_encoder(rnn_style)(text_data_path)
bow2vec = get_text_encoder(bow_style)(bow_data_path)
w2v2vec = get_text_encoder(w2v_style)(w2v_data_path)

12/11/2019 22:47:22 INFO [util/text2vec.pyc.Index2Vec] initializing ...
12/11/2019 22:47:22 INFO [util/text2vec.pyc.BoW2VecFilterStop] initializing ...
12/11/2019 22:47:22 INFO [util/text2vec.pyc.BoW2VecFilterStop] 7253 words
12/11/2019 22:47:22 INFO [util/text2vec.pyc.AveWord2VecFilterStop] initializing ...
[BigFile] 1743364x500 instances loaded from /home/oleh/VisualSearch/word2vec/flickr/vec500flickr30m
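
The three encoders above turn one sentence into three views: a word-index sequence, a bag-of-words vector, and an averaged word2vec vector. The sketch below illustrates that idea on toy data only. It is not the code in `util.text2vec`: the vocabulary, stop-word list, and 2-dimensional word vectors are made up for illustration, and whether the real index encoder keeps stop words or how it treats unknown words is an assumption here.

```python
# Toy stand-ins (hypothetical values, not the trained vocabulary or word2vec model).
STOP_WORDS = {'a', 'is', 'with'}
VOCAB = ['dog', 'cat', 'playing']
WORD_VECS = {'dog': [1.0, 0.0], 'cat': [0.0, 1.0], 'playing': [1.0, 1.0]}


def tokenize(sent, drop_stop):
    """Lowercase, split on whitespace, optionally drop stop words."""
    words = sent.lower().split()
    return [w for w in words if not (drop_stop and w in STOP_WORDS)]


def index_sequence(sent):
    """Word-index sequence for a recurrent encoder; 0 marks out-of-vocabulary words."""
    index = dict((w, i + 1) for i, w in enumerate(VOCAB))
    return [index.get(w, 0) for w in tokenize(sent, drop_stop=False)]


def bag_of_words(sent):
    """Count of each vocabulary word after stop-word filtering."""
    words = tokenize(sent, drop_stop=True)
    return [float(words.count(w)) for w in VOCAB]


def average_word_vectors(sent):
    """Mean of the word vectors of the non-stop words that have a vector."""
    dim = len(next(iter(WORD_VECS.values())))
    vecs = [WORD_VECS[w] for w in tokenize(sent, drop_stop=True) if w in WORD_VECS]
    if not vecs:
        return [0.0] * dim
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]


sent = 'a dog is playing with a cat'
print(index_sequence(sent))
print(bag_of_words(sent))
print(average_word_vectors(sent))
```

In the real pipeline the bag-of-words and averaged word2vec vectors appear to be combined into one input (`bow_w2v_vec`) while the index sequence feeds the recurrent branch (`rnn_vec`), as the call to `encode_text` in Section 3 suggests.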


In [4]:
testCollection='flickr30kenctest'

# img2vec
img_feats_path = os.path.join(rootpath, testCollection, 'FeatureData', opt.img_feature)
img_feats = BigFile(img_feats_path)

# similarity function
losser = get_losser(opt.simi_fun)()

test_sent_file = os.path.join(rootpath, testCollection, 'TextData','%s.caption.txt' % testCollection)
img_list, sents_id, sents = readImgSents(test_sent_file)
all_errors = pred_mutual_error_ms(img_list, sents, predictor, text2vec, bow2vec, w2v2vec, img_feats, losser, opt=opt)


# compute performance: recall at 1, 5, 10, then median rank and mean rank
(r1i, r5i, r10i, medri, meanri) = i2t(all_errors, n_caption=opt.n_caption)
print("Image to text: %.1f, %.1f, %.1f, %.1f, %.1f" % (r1i, r5i, r10i, medri, meanri))

[BigFile] 31783x2048 instances loaded from /home/oleh/VisualSearch/flickr30kenctest/FeatureData/pyresnet152-pool5os
embedding all sentences ...

embedding all images ...
matching image and text ...
(5000, 1000)
Image to text: 34.0, 57.3, 66.8, 4.0, 26.7
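
Judging by the variable names, the five numbers are recall at 1, 5, and 10 (in percent), followed by the median and mean rank. The sketch below shows one way such metrics can be computed from an error matrix. It is an illustration under stated assumptions, not the repository's `i2t`: it assumes `errors[s][i]` is the error between sentence `s` and image `i` (lower is better), that the `n_caption` ground-truth sentences of image `i` occupy consecutive rows, and that ranks are 1-based with a plain median.

```python
def image_to_text_metrics(errors, n_caption):
    """Recall@1/5/10 (percent), median rank, mean rank for image-to-text retrieval.

    errors[s][i] is the error between sentence s and image i (an assumption
    about the layout; rows i*n_caption .. (i+1)*n_caption-1 describe image i).
    """
    n_sent = len(errors)
    n_img = len(errors[0])
    ranks = []
    for i in range(n_img):
        # Sentences ordered from smallest to largest error for image i.
        order = sorted(range(n_sent), key=lambda s: errors[s][i])
        truth = set(range(i * n_caption, (i + 1) * n_caption))
        # 1-based position of the best-ranked ground-truth sentence.
        best = min(pos for pos, s in enumerate(order) if s in truth) + 1
        ranks.append(best)

    def recall_at(k):
        return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)

    ordered = sorted(ranks)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = float(ordered[mid])
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2.0
    mean = sum(ranks) / float(len(ranks))
    return recall_at(1), recall_at(5), recall_at(10), median, mean


# Toy matrix: 4 sentences (rows) x 2 images (columns), 2 captions per image.
toy_errors = [
    [0.1, 0.1],  # sentence 0, describes image 0
    [0.5, 0.8],  # sentence 1, describes image 0
    [0.3, 0.7],  # sentence 2, describes image 1
    [0.6, 0.2],  # sentence 3, describes image 1
]
print(image_to_text_metrics(toy_errors, n_caption=2))
```

In the toy matrix, image 0 finds one of its own captions at rank 1, while image 1 finds its best caption at rank 2, so only half of the images count towards recall at 1.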


## 3. Predict visual features of a novel sentence

In [5]:
sent='a dog is playing with a cat'
rnn_vec, bow_w2v_vec = encode_text(opt,text2vec,bow2vec,w2v2vec,sent)
predicted_text_feat = predictor.predict_one(rnn_vec,bow_w2v_vec)
print(len(predicted_text_feat))
print(predicted_text_feat)

2048
[0.22264993 0.2669875  0.5451147  ... 0.9105733  0.87271416 0.5411026 ]
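
Because the predicted vector lives in the same space as the image features, it can be compared against them directly, for example to rank images for the sentence. The sketch below assumes cosine similarity and uses made-up 3-dimensional vectors and image names; the similarity actually used by the trained model is whatever `opt.simi_fun` selects, and `rank_images` is a hypothetical helper, not a function from the repository.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_images(sentence_feat, image_feats):
    """Return (image name, similarity) pairs, most similar first (hypothetical helper)."""
    scored = [(name, cosine_similarity(sentence_feat, feat))
              for name, feat in image_feats.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data standing in for predicted_text_feat and the 2048-d image features.
toy_sentence_feat = [0.2, 0.3, 0.5]
toy_image_feats = {
    'img_a': [0.2, 0.3, 0.5],
    'img_b': [0.9, 0.1, 0.0],
    'img_c': [0.0, 0.0, 1.0],
}
ranking = rank_images(toy_sentence_feat, toy_image_feats)
for name, score in ranking:
    print('%s %.3f' % (name, score))
```

With real data, `toy_sentence_feat` would be replaced by `predicted_text_feat` and the dictionary by features read from the image feature file.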
