This is the TensorFlow implementation of the AAAI-2020 paper "Topic Modeling on Document Networks with Adjacent-Encoder" by Ce Zhang and Hady W. Lauw.

Adjacent-Encoder is a neural topic model that extracts topics from networked documents for downstream tasks such as document classification, clustering, and link prediction.

Implementation Environment

  • Python == 3.6
  • TensorFlow == 1.9.0
  • Numpy == 1.17.4



Parameter Setting

  • -lr: learning rate, default = 0.001
  • -ne: number of epochs, default = 1000
  • -ti: transductive or inductive learning; transductive inputs all documents and links for unsupervised training, while inductive splits 80% of the documents for training and 20% for testing, default = inductive
  • -dn: dataset name
  • -nt: number of topics, default = 64
  • -x: 0 == Adjacent-Encoder, 1 == Adjacent-Encoder-X, default = 0 (Adjacent-Encoder)
  • -tr: training ratio; the program automatically holds out 10% of the training set for validation, default = 0.8
  • -ms: minibatch size, default = 128
  • -sm: standard deviation (sigma) of the Gaussian noise used by Denoising Adjacent-Encoder(-X); if you do not want the denoising variant, set this value to 0, default = 0
  • -c: contractive regularizer weight for Contractive Adjacent-Encoder(-X), best performance = 1e-11; if you do not want the contractive variant, set this value to 0, default = 0
  • -sp: 0 == no k-sparse, 1 == K-Sparse Adjacent-Encoder(-X), default = 0
  • -k: number of nonzero hidden neurons of K-Sparse Adjacent-Encoder(-X); if you do not use the k-sparse variant, set the above argument -sp to 0, default = 0.5 * num_topics
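As a hedged illustration, a typical invocation combining the flags above might look like the following. The entry-point script name (`main.py`) is an assumption, since the actual filename is not stated in this README:

```shell
# Hypothetical entry point; substitute the repository's actual script name.
python main.py -dn ml -nt 64 -lr 0.001 -ne 1000 -ti inductive -tr 0.8 -ms 128 -x 0
```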


We extracted four independent datasets (DS, HA, ML, and PL) from the Cora source. Note that the well-known benchmark Cora dataset used in GAT is actually the ML subset; in addition to ML, we further created three new citation networks.

In the ./cora folder we release these datasets, each of which contains an adjacency matrix, content, labels, label names, and a vocabulary.

  • adjacency matrix (NxN): a 0-1 symmetric matrix (A^T == A) whose diagonal elements are set to 1.
  • content (Nx|V|): each row is a Bag-of-Words representation of the corresponding document, and each column is a word in the vocabulary. Documents are represented by word counts.
  • label (Nx1): label or category of each document. Labels are used only for evaluation, not for learning in our model.
  • label name: the name of each label or category.
  • vocabulary (|V|x1): words.
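To make the format concrete, here is a minimal sketch of a toy dataset satisfying the properties above. The names, sizes, and values are illustrative only, not taken from the released data:

```python
import numpy as np

# A toy "network" of 3 documents over a 4-word vocabulary, following the
# dataset format described above (contents are illustrative only).
vocabulary = ["topic", "model", "graph", "neural"]

# Adjacency matrix (N x N): 0-1, symmetric, with diagonal set to 1.
adjacency = np.array([[1, 1, 0],
                      [1, 1, 1],
                      [0, 1, 1]])

# Content (N x |V|): Bag-of-Words word counts per document.
content = np.array([[2, 1, 0, 0],
                    [0, 1, 3, 1],
                    [1, 0, 0, 2]])

# Labels (N x 1): used only for evaluation, not for learning.
labels = np.array([0, 1, 1])

# Sanity checks matching the stated properties.
assert (adjacency == adjacency.T).all()       # symmetric: A^T == A
assert (np.diag(adjacency) == 1).all()        # diagonal elements are 1
assert content.shape == (3, len(vocabulary))  # N x |V|
```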


The document embeddings are output to the ./results folder. Each row represents one document embedding, and each column represents one dimension of the embedding, i.e., one topic.

In transductive learning, the training embeddings are identical to the test embeddings. In inductive learning, the training embeddings correspond to the training documents (excluding validation documents), and the test embeddings are inferred for the test documents.
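A common way to use these embeddings downstream is nearest-neighbor retrieval. The sketch below assumes the output is a plain-text matrix with one whitespace-separated row per document (the exact filename under ./results is not stated here, so a small synthetic matrix stands in for the real file):

```python
import numpy as np

# In practice you would load the real output, e.g.:
#   embeddings = np.loadtxt("./results/<output_file>")
# Here, a synthetic (N x num_topics) matrix stands in for it.
embeddings = np.array([[0.9, 0.1, 0.0],
                       [0.8, 0.2, 0.0],
                       [0.0, 0.1, 0.9]])

# Cosine similarity between all document pairs.
norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
unit = embeddings / norms
sim = unit @ unit.T

# Nearest neighbor of document 0 (excluding itself).
np.fill_diagonal(sim, -np.inf)
nearest = int(np.argmax(sim[0]))
print(nearest)  # -> 1: document 1 is most similar to document 0
```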


If you use our paper, including code and data, please cite:

    @inproceedings{zhang2020topic,
      title={Topic Modeling on Document Networks with Adjacent-Encoder},
      author={Zhang, Ce and Lauw, Hady W},
      booktitle={Thirty-Fourth AAAI Conference on Artificial Intelligence},
      year={2020}
    }

