Author: Bogdan Asztalos (abogdan@caesar.elte.hu)
Based on William Hamilton's code: Original Repository
This repository contains the code needed to reproduce the results of this paper. The diachronic word embeddings can be produced by executing `full_process.sh`. This pipeline carries out the steps shown in panels a-h of the figure below (see the paper for details). To extract information about the subdiffusive behavior of words from the embedding data, use the scripts in the `data_analysis` directory. The figures of the paper were made with the IPython notebooks in the `notebooks` directory.
The code is a further developed and (in some cases) modified version of William Hamilton's code for historical word embeddings.
The structure of the code (in terms of folder organization) is as follows:
- `cooc_randomization` contains code for randomizing co-occurrence matrices before embedding. (This is not relevant to the paper, but it is useful for understanding the logic of Word2vec.)
- `data_analysis` contains code for extracting and studying information from the embedding data. Subdiffusive behavior can be observed through this information.
- `googlengram` contains code for pulling and processing historical Google N-Gram Data (Version 2).
- `notebooks` contains IPython notebooks to reproduce figures from the paper.
- `representations` contains code that provides a high-level interface to (historical) word vectors and is originally based upon Omer Levy's hyperwords package (https://bitbucket.org/omerlevy/hyperwords).
- `sgns` contains a modified version of Google's word2vec code (https://code.google.com/archive/p/word2vec/).
- `vecanalysis` contains code for evaluating and analyzing historical word vectors.
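
Subdiffusion is commonly diagnosed through the mean squared displacement (MSD), which grows as a power of the time lag with an exponent below 1. The sketch below illustrates that idea on a single word trajectory. It is not taken from the `data_analysis` scripts: the function names, the array layout (one embedding vector per time step) and the synthetic random walk are all assumptions made for illustration, and it targets Python 3 with numpy.

```python
import numpy as np


def mean_squared_displacement(trajectory, max_lag):
    """Time-averaged MSD of one trajectory.

    trajectory: array of shape (T, d), one d-dimensional vector per time step
                (hypothetical layout, not the repository's data format).
    Returns the lags 1..max_lag and the MSD at each lag.
    """
    trajectory = np.asarray(trajectory, dtype=float)
    lags = np.arange(1, max_lag + 1)
    msd = np.empty(len(lags))
    for i, lag in enumerate(lags):
        # Displacement between all pairs of time steps separated by `lag`.
        disp = trajectory[lag:] - trajectory[:-lag]
        msd[i] = np.mean(np.sum(disp ** 2, axis=1))
    return lags, msd


def diffusion_exponent(lags, msd):
    """Slope of log(MSD) against log(lag), i.e. alpha in MSD ~ lag**alpha."""
    slope, _intercept = np.polyfit(np.log(lags), np.log(msd), 1)
    return slope


# Synthetic stand-in for a word trajectory: an ordinary random walk,
# for which the exponent should come out close to 1.
rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(size=(2000, 10)), axis=0)
lags, msd = mean_squared_displacement(walk, max_lag=50)
alpha = diffusion_exponent(lags, msd)
print("estimated exponent:", alpha)
```

An exponent clearly below 1 would indicate subdiffusive motion, while a value near 1 corresponds to normal diffusion. For real embedding data the choice of `max_lag` and of the distance measure matters, so the paper should be consulted for the procedure actually used.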
For the diachronic embedding:
- python 2.7
- numpy: http://numpy.org/install
- sklearn: http://scikit-learn.org/stable/
- cython: http://docs.cython.org/src/quickstart/install.html
- Natural Language Toolkit: http://nltk.org/install.html
For the IPython notebooks:
- python 3.8
- jupyter: http://docs.jupyter.org/en/latest/install.html
- numpy: http://numpy.org/install
- scipy: http://scipy.org/install
- matplotlib: http://matplotlib.org/stable
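
Before opening the notebooks, it can help to confirm that the Python 3 packages listed above are importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of this repository. Jupyter itself is left out because it is normally started as a command rather than imported.

```python
import importlib.util

# Packages from the notebook dependency list above.
NOTEBOOK_PACKAGES = ["numpy", "scipy", "matplotlib"]


def missing_packages(names):
    """Return the names for which no importable top-level module is found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(NOTEBOOK_PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All notebook packages are available.")
```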