EvanZhuang/dynamic-clustering-of-dynamic-embeddings

 
 


Dynamic Bernoulli Embeddings for Language Evolution

This repository contains scripts for running (dynamic) Bernoulli embeddings with dynamic clustering on text data. They have been run and tested on Linux.

To execute, go into the source folder (src/) and run

python main.py --dynamic True --dclustering True --fpath [path/to/data]

Substitute [path/to/data] with the path to the folder containing your data. The data folder and files must follow a specific structure; see dat/README.md for the required format. For convenience, dat/src/ contains scripts that help preprocess text data.

For all command-line options, run:

python main.py --help

For fastest convergence, we recommend the following two-step training procedure. First run

python main.py --fpath [path/to/data]

This runs Bernoulli embeddings without dynamics. The script uses the current timestamp to create a folder where the results are saved ([path/to/result]). Use these results to initialize the dynamic embeddings:

python main.py --dynamic True --fpath [path/to/data] --init [path/to/result]/alpha_constant

Make sure to use the same --K for both runs.
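
The two-step procedure above can be sketched as a small Python helper that builds both commands with a shared --K. This is only an illustration, not part of the repository: the function name build_two_step_commands, the example paths, and the value 50 for --K are hypothetical, and passing --K as an integer argument is an assumption. The result folder is named after a timestamp, so its path has to be filled in by hand after the first run.

```python
def build_two_step_commands(data_path, result_path, k):
    """Build argv lists for the static run and the dynamic run initialized from it.

    Hypothetical helper: only the flags shown in this README are used.
    """
    # Sharing one list guarantees that both runs get the same --K.
    k_args = ["--K", str(k)]

    # Step 1: Bernoulli embeddings without dynamics.
    static_cmd = ["python", "main.py", "--fpath", data_path] + k_args

    # Step 2: dynamic embeddings initialized from the static results.
    dynamic_cmd = [
        "python", "main.py",
        "--dynamic", "True",
        "--fpath", data_path,
        "--init", f"{result_path}/alpha_constant",
    ] + k_args

    return static_cmd, dynamic_cmd


if __name__ == "__main__":
    # Example values are placeholders, not real paths.
    static_cmd, dynamic_cmd = build_two_step_commands("dat/my_corpus", "results/run", 50)
    print(" ".join(static_cmd))
    print(" ".join(dynamic_cmd))
```

Each list can be passed to subprocess.run(cmd, cwd="src", check=True) to launch the corresponding step from the source folder.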

Two inference methods are implemented for the GMM: Hamiltonian Monte Carlo (HMC) and stochastic gradient descent variational inference (VI).

Hamiltonian Monte Carlo:

python main.py --dynamic True --dclustering True --HMC True --fpath [path/to/data]

Stochastic gradient descent variational inference:

python main.py --dynamic True --dclustering True --VI True --fpath [path/to/data]
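
As a minimal sketch of how the choice between the two inference methods could be scripted, the helper below maps a method name to the matching flag and builds the command line. The function name build_clustering_command and the INFERENCE_FLAGS table are hypothetical; treating the two methods as mutually exclusive is an assumption based on the commands above.

```python
# Map a method name to the flag shown in this README (hypothetical table).
INFERENCE_FLAGS = {"hmc": "--HMC", "vi": "--VI"}


def build_clustering_command(data_path, method):
    """Build the argv list for dynamic clustering with one inference method."""
    key = method.lower()
    if key not in INFERENCE_FLAGS:
        raise ValueError(f"unknown inference method: {method!r} (expected 'HMC' or 'VI')")
    return [
        "python", "main.py",
        "--dynamic", "True",
        "--dclustering", "True",
        INFERENCE_FLAGS[key], "True",
        "--fpath", data_path,
    ]


if __name__ == "__main__":
    # The data path is a placeholder.
    print(" ".join(build_clustering_command("dat/my_corpus", "VI")))
```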

Reference

Maja Rudolph and David Blei, 2017. Dynamic Bernoulli Embeddings for Language Evolution. arXiv preprint arXiv:1703.08052.
