# Word2VisualVec for Sentence Representation

This note answers the following two questions:
1. How to load a trained Word2VisualVec model?
2. How to predict visual features from a new sentence?

## 0. Setup

Use the following script to download and extract a Word2VisualVec model trained on Flickr30k.
The dataset itself is downloaded separately; see the [required data](https://github.com/danieljf24/w2vv#required-data) instructions.


```shell
ROOTPATH=$HOME/trained_w2vv_model
mkdir -p $ROOTPATH && cd $ROOTPATH

# download and extract the pre-trained model
wget http://lixirong.net/data/w2vv-tmm2018/flickr30k_trained_model.tar.gz
tar zxf flickr30k_trained_model.tar.gz
```
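
Before moving on, it can help to confirm that the archive unpacked into the layout the rest of this note expects. The following is a minimal sketch and not part of the w2vv code base: the helper name `missing_model_files` is hypothetical, and the three file names are simply the ones this note reads later (`model.json`, `best_model.h5`, `option.pkl`).

```python
import os

# Files this note loads from the extracted model directory.
EXPECTED_FILES = ('model.json', 'best_model.h5', 'option.pkl')


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from model_dir."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(model_dir, name))]


# Directory assumed from the download script above.
model_dir = os.path.join(os.path.expanduser('~'),
                         'trained_w2vv_model', 'flickr30k_trained_model')
missing = missing_model_files(model_dir)
if missing:
    print('missing files: ' + ', '.join(missing))
else:
    print('all expected model files found')
```

An empty list means the download and extraction steps completed and the model directory is ready to be loaded.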

In [1]:
import os
import keras
from basic.common import readPkl
from w2vv_pred import W2VV_MS_pred
from util.text import encode_text
from util.text2vec import get_text_encoder

Using TensorFlow backend.


## 1. Load a trained Word2VisualVec model

In [2]:
model_path = os.path.join(os.environ['HOME'],'trained_w2vv_model/flickr30k_trained_model')
abs_model_path = os.path.join(model_path, 'model.json')
weight_path = os.path.join(model_path, 'best_model.h5')
predictor = W2VV_MS_pred(abs_model_path, weight_path)

05/05/2018 21:41:43 INFO [w2vv_pred.pyc.W2VV_MS_pred] loaded a trained Word2VisualVec model successfully


## 2. Predict visual features of a novel sentence

In [3]:
# set up multi-scale sentence vectorization
trainCollection = 'flickr30kenctrain'
# options stored alongside the trained model
opt = readPkl(os.path.join(model_path, 'option.pkl'))
rootpath = os.path.join(os.environ['HOME'], 'VisualSearch')
# text_style holds three '@'-separated encoder styles: RNN, bag-of-words, word2vec
rnn_style, bow_style, w2v_style = opt.text_style.strip().split('@')
# vocabulary and word2vec files under the dataset root
text_data_path = os.path.join(rootpath, trainCollection, "TextData", "vocabulary", "bow", opt.rnn_vocab)
bow_data_path = os.path.join(rootpath, trainCollection, "TextData", "vocabulary", bow_style, opt.bow_vocab)
w2v_data_path = os.path.join(rootpath, "word2vec", opt.corpus, opt.word2vec)

# build one text encoder per style
text2vec = get_text_encoder(rnn_style)(text_data_path)
bow2vec = get_text_encoder(bow_style)(bow_data_path)
w2v2vec = get_text_encoder(w2v_style)(w2v_data_path)

05/05/2018 21:42:11 INFO [util/text2vec.pyc.Index2Vec] initializing ...
05/05/2018 21:42:11 INFO [util/text2vec.pyc.BoW2VecFilterStop] initializing ...
05/05/2018 21:42:11 INFO [util/text2vec.pyc.BoW2VecFilterStop] 7253 words
05/05/2018 21:42:11 INFO [util/text2vec.pyc.AveWord2VecFilterStop] initializing ...
[BigFile] 1743364x500 instances loaded from /home/daniel/VisualSearch/word2vec/flickr/vec500flickr30m


In [4]:
sent = 'a dog is playing with a cat'
# encode the sentence at each scale, then predict its visual feature
rnn_vec, bow_w2v_vec = encode_text(opt, text2vec, bow2vec, w2v2vec, sent)
predicted_text_feat = predictor.predict_one(rnn_vec, bow_w2v_vec)
print(len(predicted_text_feat))
print(predicted_text_feat)

2048
[ 0.30943465  0.29305869  0.40463841 ...,  0.90311915  0.62051922
  0.58120167]
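
The predicted vector is a visual feature, so a sentence can be compared directly with image features of the same dimensionality. Below is a minimal sketch, assuming cosine similarity as the matching score and NumPy for the arithmetic. The function name `cosine_rank` is hypothetical, and the image features are random placeholders rather than real CNN features.

```python
import numpy as np


def cosine_rank(text_feat, image_feats):
    """Rank images by cosine similarity to a predicted text feature.

    Returns (order, scores): image indices from best to worst match,
    and the cosine score of every image in its original position.
    """
    text_feat = np.asarray(text_feat, dtype=np.float64)
    image_feats = np.asarray(image_feats, dtype=np.float64)
    # L2-normalise so that a dot product equals cosine similarity
    text_unit = text_feat / np.linalg.norm(text_feat)
    image_unit = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    scores = image_unit.dot(text_unit)
    order = np.argsort(-scores)
    return order, scores


# Placeholder data: five random 2048-d "image features" and a query
# vector built to lie close to image 3 (stand-in for predicted_text_feat).
rng = np.random.RandomState(0)
image_feats = rng.rand(5, 2048)
text_feat = image_feats[3] + 0.01 * rng.rand(2048)

order, scores = cosine_rank(text_feat, image_feats)
print(order[0])
```

In practice `text_feat` would be `predicted_text_feat` from the cell above, and `image_feats` would hold features extracted from the candidate images with the same network used to train the model.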
