TWEC requires a specific fork of gensim (https://github.com/valedica/gensim.git). Later releases of gensim removed the Word2Vec model parameter "learn_hidden (bool)", and without it there is no simple, direct way to keep the hidden-layer weights (our atemporal "compass" embeddings) frozen while we train the slices.
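To make the idea of a frozen hidden layer concrete, here is a minimal NumPy sketch of one skip-gram negative-sampling update with a switch that skips the update of the output matrix. It is an illustration only, not the code of the gensim fork: the function `sgns_step`, its arguments, and the toy matrices are assumptions made for this example. Only the names `syn0`, `syn1neg`, and `learn_hidden` echo the ones used in this notebook.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(syn0, syn1neg, center, context, negatives, lr=0.025, learn_hidden=True):
    """Apply one negative-sampling update in place (hypothetical helper).

    syn0    : (vocab, dim) word vectors, always updated
    syn1neg : (vocab, dim) output-layer weights, updated only if learn_hidden
    """
    targets = np.array([context] + list(negatives))
    labels = np.zeros(len(targets))
    labels[0] = 1.0                        # the true context word is the positive example

    h = syn0[center].copy()                # word vector before the update
    out = syn1neg[targets]                 # fancy indexing returns a copy
    g = (labels - sigmoid(out @ h)) * lr   # per-target error scaled by the learning rate
    grad_h = g @ out                       # update for the word vector

    if learn_hidden:
        # np.add.at accumulates correctly even if a target index repeats
        np.add.at(syn1neg, targets, np.outer(g, h))
    syn0[center] += grad_h

# Toy run: with learn_hidden=False only the word vector moves.
rng = np.random.default_rng(0)
syn0 = rng.normal(size=(5, 4))
syn1neg = rng.normal(size=(5, 4))
frozen = syn1neg.copy()
sgns_step(syn0, syn1neg, center=0, context=1, negatives=[2, 3], learn_hidden=False)
print(np.array_equal(syn1neg, frozen))
```

With `learn_hidden=False` the output matrix acts as a fixed reference, so every slice trained against it ends up in the same coordinate system.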

In [1]:
import sys
sys.path.append('../twec')
from twec import TWEC
from gensim.models import Word2Vec

In [2]:
aligner = TWEC(size=30, siter=10, diter=10, workers=4)

# train the compass: the text should be the concatenation of the text from the slices
aligner.train_compass("training/compass.txt", overwrite=True) # keep an eye on the overwrite behaviour

Training the compass.
Compass will be overwritten after training
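The compass text used above is the concatenation of the slice texts. A minimal sketch of building such a file follows. The helper name `build_compass` is hypothetical and not part of TWEC, and the sketch assumes the slices are plain UTF-8 text files with one sentence or document per line.

```python
def build_compass(slice_paths, compass_path):
    """Concatenate the slice files into a single compass file (hypothetical helper)."""
    with open(compass_path, "w", encoding="utf-8") as out:
        for path in slice_paths:
            with open(path, encoding="utf-8") as src:
                for line in src:
                    # make sure the last line of one slice does not run into the next slice
                    out.write(line if line.endswith("\n") else line + "\n")
```

With the file names used in this notebook, the call would look like `build_compass(["training/arxiv_9.txt", "training/arxiv_14.txt"], "training/compass.txt")`.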


In [3]:
# now train the slices; they are aligned to the compass as they are trained
# train_slice returns gensim Word2Vec objects
slice_one = aligner.train_slice("training/arxiv_9.txt", save=True)
slice_two = aligner.train_slice("training/arxiv_14.txt", save=True)

Training temporal embeddings: slice training/arxiv_9.txt.
Initializing temporal embeddings from the atemporal compass.
Training temporal embeddings: slice training/arxiv_14.txt.
Initializing temporal embeddings from the atemporal compass.


In [4]:
compass = Word2Vec.load("model/compass.model")
model1 = Word2Vec.load("model/arxiv_9.model")
model2 = Word2Vec.load("model/arxiv_14.model")

In [5]:
# vocabulary entry for the word 'i' in each model; note that the row index differs between models
print(model1.wv.vocab['i'])
print(model2.wv.vocab['i'])
print(compass.wv.vocab['i'])

Vocab(count:213, index:34, sample_int:4294967296)
Vocab(count:1371, index:26, sample_int:4294967296)
Vocab(count:1584, index:26, sample_int:4294967296)


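The outputs below show that the hidden-layer weights (syn1neg) for the word 'i' are identical in both slices and in the compass, i.e. they stayed frozen, while the word vectors (wv.vectors) differ from model to model, i.e. they were updated during slice training. The row index of 'i' is 34 in the first slice and 26 in the second slice and in the compass, so each lookup uses that model's own index.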

In [7]:
# hidden-layer (negative-sampling) weights for 'i', using each model's own row index
print(model1.trainables.syn1neg[34])
print(model2.trainables.syn1neg[26])
print(compass.trainables.syn1neg[26])

[-4.6020630e-01 -1.0139167e-01 -1.2678191e-01  7.9771250e-01
 -6.3616329e-01 -9.7823456e-02  8.2215977e-01  1.1889621e+00
  9.1399753e-04 -2.1845105e-01  4.9245119e-01  1.5929724e-01
 -1.3990672e-01 -1.7479196e-01  6.2276489e-01 -4.3548945e-01
  6.2521026e-02  1.8913534e-01 -4.2202330e-01 -1.3638224e-01
  1.6924796e-01 -1.0052008e-01 -7.3055524e-01 -7.4516141e-01
  5.1119733e-01 -2.5474387e-01  5.5112261e-01 -7.8023352e-02
  5.1973049e-02 -1.8943286e-01]
[-4.6020630e-01 -1.0139167e-01 -1.2678191e-01  7.9771250e-01
 -6.3616329e-01 -9.7823456e-02  8.2215977e-01  1.1889621e+00
  9.1399753e-04 -2.1845105e-01  4.9245119e-01  1.5929724e-01
 -1.3990672e-01 -1.7479196e-01  6.2276489e-01 -4.3548945e-01
  6.2521026e-02  1.8913534e-01 -4.2202330e-01 -1.3638224e-01
  1.6924796e-01 -1.0052008e-01 -7.3055524e-01 -7.4516141e-01
  5.1119733e-01 -2.5474387e-01  5.5112261e-01 -7.8023352e-02
  5.1973049e-02 -1.8943286e-01]
[-4.6020630e-01 -1.0139167e-01 -1.2678191e-01  7.9771250e-01
 -6.3616329e-01 -9.78
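
Comparing printed rows by eye does not scale beyond a single word. The same check can be sketched programmatically: look up the row for a word through each model's own index and compare the rows numerically. The helper `rows_match` and the toy arrays are assumptions made for this example; the index maps are plain dicts standing in for a model's vocabulary.

```python
import numpy as np

def rows_match(matrix_a, index_a, matrix_b, index_b, word, atol=1e-6):
    """Return True if the rows for `word` agree in both matrices (hypothetical helper).

    index_a and index_b map a word to its row index in the respective matrix.
    """
    row_a = matrix_a[index_a[word]]
    row_b = matrix_b[index_b[word]]
    return bool(np.allclose(row_a, row_b, atol=atol))

# Toy data: the same two rows stored in a different order in each "model".
compass_hidden = np.array([[0.1, 0.2], [0.3, 0.4]])
compass_index = {"i": 0, "we": 1}
slice_hidden = compass_hidden[[1, 0]]
slice_index = {"i": 1, "we": 0}

print(all(rows_match(slice_hidden, slice_index, compass_hidden, compass_index, w)
          for w in slice_index))
```

For the models loaded in this notebook, the matrix would be `model1.trainables.syn1neg` and the index for a word would come from `model1.wv.vocab[word].index`, as in the cells above.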

In [11]:
# word vectors for 'i', again using each model's own row index
print(model1.wv.vectors[34])
print(model2.wv.vectors[26])
print(compass.wv.vectors[26])

[ 1.0844331  -0.16820982  0.2617854   1.6284348   0.4933021  -1.5635933
  1.2304232   2.228016   -0.9040732  -0.6613041   0.6190973   1.061683
 -0.875423   -0.79606354 -1.2418106   0.49976146  1.2265491   0.10147548
  1.4628812   1.8915395   1.5940462   1.6657975   0.33031133 -1.9388858
 -0.5735354  -0.97300076  1.7425116  -0.04958301  0.88604975 -4.511893  ]
[ 1.7116133  -0.9994501   0.87861973  0.60883236 -0.3904306  -1.6749368
  2.2780495   2.274824   -1.4698108  -0.76882994  0.7390684   1.4430445
 -1.2902341  -1.5848297  -0.2060239   1.1574911   1.4623234   1.136705
  1.396191    1.4307218   0.6584964   1.0620544  -0.15717137 -1.9170482
 -0.9684632  -1.4746758   2.1567512   0.7164301   1.1155984  -4.0872602 ]
[ 1.397193   -0.5185469   0.7240241   0.86851764 -0.06375024 -1.8007306
  1.9351279   2.1237721  -1.4743081  -0.72067106  0.8976281   1.2958043
 -1.1209677  -1.1460305  -0.5169425   0.9858284   1.7558266   0.83620036
  1.4640749   1.0972955   0.82376605  1.1139374  -0.08424934