TWEC requires a very specific version of gensim's Word2Vec (https://github.com/valedica/gensim.git). Later releases of the gensim library removed the model parameter "learn_hidden (bool)", and without it there is no simple or direct way to keep the weights of the hidden layer (our atemporal "compass" embeddings) frozen while we train the slices.
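To make the role of "learn_hidden" concrete, here is a minimal NumPy sketch of a single skip-gram negative-sampling update in which the output-side ("hidden") weights can be left frozen. It is an illustration only, not the code of the gensim version linked above: the function sgns_step, its arguments, and the toy data are hypothetical, and only the flag name learn_hidden is borrowed from the text.

```python
import numpy as np

def sgns_step(w_in, w_out, center, context, negatives, lr=0.025, learn_hidden=False):
    """Apply one skip-gram negative-sampling update in place.

    w_in  : (vocab, dim) input word vectors (what a slice learns)
    w_out : (vocab, dim) output-side weights (the frozen "compass" in this sketch)
    """
    targets = np.array([context] + list(negatives))
    labels = np.zeros(len(targets))
    labels[0] = 1.0                        # the true context word is the positive example

    h = w_in[center].copy()                # centre-word vector before the update
    out = w_out[targets]                   # fancy indexing returns a copy
    scores = 1.0 / (1.0 + np.exp(-(out @ h)))
    g = lr * (labels - scores)             # per-target error signal

    if learn_hidden:
        # Only executed when the hidden layer is allowed to learn.
        np.add.at(w_out, targets, np.outer(g, h))
    w_in[center] += g @ out                # the word vector is always updated


# Toy demo with random weights: 5 words, 4 dimensions (hypothetical data).
rng = np.random.default_rng(0)
w_in = rng.normal(size=(5, 4))
w_out = rng.normal(size=(5, 4))
w_in_before, w_out_before = w_in.copy(), w_out.copy()

sgns_step(w_in, w_out, center=0, context=1, negatives=[2, 3])

print(np.array_equal(w_out, w_out_before))  # compares hidden weights before/after
print(np.array_equal(w_in, w_in_before))    # compares word vectors before/after
```

With learn_hidden=False only the row of w_in for the centre word moves, which is the behaviour the rest of this notebook checks on the trained models.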

In [1]:
import sys
sys.path.append('../twec')
from twec import TWEC
from gensim.models import Word2Vec

In [2]:
aligner = TWEC(size=30, siter=10, diter=10, workers=4)

# train the compass: the text should be the concatenation of the text from the slices
aligner.train_compass("training/compass.txt", overwrite=True) # keep an eye on the overwrite behaviour

Training the compass.
Compass will be overwritten after training
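
The comment in the cell above says the compass text should be the concatenation of the slice texts. If that file does not exist yet, a sketch like the following could build it. build_compass is a hypothetical helper, not part of TWEC, and it assumes the slices are plain UTF-8 text files.

```python
def build_compass(slice_paths, compass_path):
    """Concatenate the slice files into one compass file, streaming line by line."""
    with open(compass_path, "w", encoding="utf-8") as out:
        for path in slice_paths:
            last = ""
            with open(path, encoding="utf-8") as src:
                for line in src:
                    out.write(line)
                    last = line
            # Make sure the last line of one slice does not merge
            # with the first line of the next one.
            if last and not last.endswith("\n"):
                out.write("\n")
```

In this notebook the call would be build_compass(["training/arxiv_9.txt", "training/arxiv_14.txt"], "training/compass.txt"), run before train_compass.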


In [3]:
# now you can train the slices; they will already be aligned to the compass
# the returned objects are gensim Word2Vec models
slice_one = aligner.train_slice("training/arxiv_9.txt", save=True)
slice_two = aligner.train_slice("training/arxiv_14.txt", save=True)

Training temporal embeddings: slice training/arxiv_9.txt.
Initializing temporal embeddings from the atemporal compass.
Training temporal embeddings: slice training/arxiv_14.txt.
Initializing temporal embeddings from the atemporal compass.


In [4]:
compass = Word2Vec.load("model/compass.model")
model1 = Word2Vec.load("model/arxiv_9.model")
model2 = Word2Vec.load("model/arxiv_14.model")

In [5]:
print(model1.wv.vocab['i'])
print(model2.wv.vocab['i'])
print(compass.wv.vocab['i'])

Vocab(count:213, index:34, sample_int:4294967296)
Vocab(count:1371, index:26, sample_int:4294967296)
Vocab(count:1584, index:26, sample_int:4294967296)


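The word 'i' has a different vocabulary index in each model (34 in model1, 26 in model2 and in the compass), so each row below is looked up with that model's own index. Here we can see that the hidden-layer weights (syn1neg) are frozen: the row for 'i' is the same in both slices and in the compass. The word vectors (wv.vectors), in contrast, are updated during slice training and differ between models.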
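
Comparing printed arrays by eye is error-prone, so the same check can be sketched programmatically. The helpers hidden_row and hidden_rows_match are hypothetical names; they assume the attributes used in this notebook (wv.vocab[word].index and trainables.syn1neg), which belong to the older gensim API this notebook depends on.

```python
import numpy as np

def hidden_row(model, word):
    """Return the syn1neg row for `word`, using that model's own vocab index."""
    return model.trainables.syn1neg[model.wv.vocab[word].index]

def hidden_rows_match(models, word):
    """True if every model stores exactly the same hidden-layer row for `word`."""
    rows = [hidden_row(m, word) for m in models]
    return all(np.array_equal(rows[0], r) for r in rows[1:])
```

Once the models are loaded, hidden_rows_match([model1, model2, compass], 'i') would be expected to return True if the hidden layer really stayed frozen.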

In [6]:
print(model1.trainables.syn1neg[34])
print(model2.trainables.syn1neg[26])
print(compass.trainables.syn1neg[26])

[-0.6092696  -0.1193039   0.31543    -0.09908039 -0.16217746  0.32050592
 -0.339508   -0.19728503  0.19355364  0.79276305 -0.22316808  0.41986364
 -0.07552402 -0.07269513 -0.70865303  0.25023764  0.0485032  -0.5808323
 -0.5504872  -0.1951308   0.10067026 -0.11043486  0.7298163   0.38919333
  0.8057359  -0.05181003  0.10824297 -0.1369167   0.51104206 -0.24305834]
[-0.6092696  -0.1193039   0.31543    -0.09908039 -0.16217746  0.32050592
 -0.339508   -0.19728503  0.19355364  0.79276305 -0.22316808  0.41986364
 -0.07552402 -0.07269513 -0.70865303  0.25023764  0.0485032  -0.5808323
 -0.5504872  -0.1951308   0.10067026 -0.11043486  0.7298163   0.38919333
  0.8057359  -0.05181003  0.10824297 -0.1369167   0.51104206 -0.24305834]
[-0.6092696  -0.1193039   0.31543    -0.09908039 -0.16217746  0.32050592
 -0.339508   -0.19728503  0.19355364  0.79276305 -0.22316808  0.41986364
 -0.07552402 -0.07269513 -0.70865303  0.25023764  0.0485032  -0.5808323
 -0.5504872  -0.1951308   0.10067026 -0.11043486  0.

In [7]:
print(model1.wv.vectors[34])
print(model2.wv.vectors[26])
print(compass.wv.vectors[26])

[-1.8303326   0.960173    0.8533724  -1.1779368  -0.0146331   1.5683289
 -0.6500142   1.8434411   0.5951439   0.5357325  -1.7522988   1.7497929
  2.462072   -1.2949187  -0.44782975  0.15375601 -0.09395226 -2.0915394
 -0.3335416  -1.3523211   0.7639021   0.49989834  1.7284914   0.8403828
  0.90838116 -0.9943843   0.7300248  -0.30375084 -2.2289257   0.9640173 ]
[-2.1177497   1.1407727   0.31315944 -0.99036926  0.15467346  2.978759
  0.15068464  1.2055897   0.66350013  1.5574903  -1.9094104   2.3935375
  1.584484    0.18266088 -1.5866187  -1.2914145  -0.9644868  -1.2309283
 -0.3670567  -1.4482507   0.12896751 -0.19316551  2.4483917   0.2885205
  1.387664   -1.0024163   0.8901706   0.14246742 -2.347861    0.2977486 ]
[-2.0718882   1.0770032   0.27714574 -1.1934673   0.06192912  2.3588622
  0.10399392  1.3670722   1.0115632   1.3740172  -1.5741489   2.2163308
  2.093579   -0.33118257 -0.7855288  -0.8735054  -0.6658289  -1.4344666
 -0.34792137 -1.3833925   0.68499106 -0.11436374  2.67084    
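
Since the slices are already aligned, their word vectors can be compared directly. A minimal sketch of such a comparison, using cosine similarity; the helper cosine is a hypothetical name and is not part of TWEC or gensim.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

For the word inspected above, cosine(model1.wv['i'], model2.wv['i']) gives one number summarising how far its vector moved between the two slices; values close to 1 indicate little change.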