This project contains:
- A re-implementation of "Graph Convolutional Networks for Text Classification" in TensorFlow 2.1.
- Some of the baseline models mentioned in the original paper.
- The code is largely based on the official repository.
- The preprocessing part is modified from the official repository.
- The model part is written by ourselves.
- python 3.6
- tensorflow 2.1.0
- nltk 3.4.5
- fasttext 0.9.2 (Optional)
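
For orientation, the two-layer graph convolution described in the paper can be sketched in a few lines of NumPy. This is a minimal illustration of the published formulation, softmax(Â · ReLU(Â X W0) · W1) with Â = D^(-1/2) A D^(-1/2), and not the TensorFlow code in this repository. The function names, the toy graph, and the random weights are assumptions made for the example.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^(-1/2) A D^(-1/2).

    Assumes `adj` already contains self-loops (A_ii = 1), as in the paper.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    # Scale row i by d_i^(-1/2) and column j by d_j^(-1/2).
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(x):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = x - x.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def gcn_forward(adj, features, w0, w1):
    """Two-layer GCN forward pass: softmax(A_hat ReLU(A_hat X W0) W1)."""
    a_hat = normalize_adjacency(adj)
    hidden = np.maximum(a_hat @ features @ w0, 0.0)  # first layer + ReLU
    return softmax(a_hat @ hidden @ w1)              # second layer + softmax


# Toy graph with 4 nodes (hypothetical data, self-loops on the diagonal).
adj = np.array([
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.2, 0.0],
    [0.0, 0.2, 1.0, 0.7],
    [0.0, 0.0, 0.7, 1.0],
])
features = np.eye(4)                # one-hot node features, as in the paper
rng = np.random.default_rng(0)
w0 = rng.normal(size=(4, 8))        # input dim -> hidden dim
w1 = rng.normal(size=(8, 2))        # hidden dim -> number of classes
probs = gcn_forward(adj, features, w0, w1)
```

Each row of `probs` is a class distribution for one node. In the real graph, nodes are words and documents, and only the document rows are used for classification.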
cd ./preprocess
python remove_words.py <dataset>
python build_graph.py <dataset>
Valid values for <dataset> are R8, R52, 20NG, ohsumed, THUCTC, and CHINESE.
python train.py <dataset>
Valid values for <dataset> are R8, R52, 20NG, ohsumed, THUCTC, and CHINESE.
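
The preprocessing and training steps above can be chained from a small helper that validates the dataset name before anything runs. This is a sketch, not part of the repository: `build_pipeline` and `run_pipeline` are hypothetical names, and the layout (the two preprocessing scripts inside `preprocess`, `train.py` in the repository root) is an assumption inferred from the commands shown here.

```python
import subprocess
from pathlib import Path

# Dataset names accepted by the scripts, as listed in this README.
DATASETS = ("R8", "R52", "20NG", "ohsumed", "THUCTC", "CHINESE")


def build_pipeline(dataset, repo_root="."):
    """Return (working_directory, argv) pairs for preprocessing and training.

    Assumes remove_words.py and build_graph.py live in <repo_root>/preprocess
    and train.py lives in <repo_root>; adjust if the layout differs.
    """
    if dataset not in DATASETS:
        raise ValueError(
            f"unknown dataset {dataset!r}; expected one of {', '.join(DATASETS)}"
        )
    root = Path(repo_root)
    return [
        (root / "preprocess", ["python", "remove_words.py", dataset]),
        (root / "preprocess", ["python", "build_graph.py", dataset]),
        (root, ["python", "train.py", dataset]),
    ]


def run_pipeline(dataset, repo_root="."):
    """Run every step in order, stopping at the first failing command."""
    for cwd, argv in build_pipeline(dataset, repo_root):
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling `run_pipeline("R8")` from the repository root would then run the three commands in sequence, while a misspelled dataset name fails immediately with a `ValueError` instead of partway through preprocessing.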
cd visual
python tsne.py <dataset> <length>
Valid values for <dataset> are R8, R52, 20NG, ohsumed, THUCTC, and CHINESE.
Valid values for <length> are 1 and 2.
R8 is provided in the cleaned_data directory. Other datasets can be downloaded from Google Drive.
R8 embeddings in first layer:
R8 embeddings in second layer:
More images can be found in the visual directory.
- The official implementation: https://github.com/yao8839836/text_gcn
- PyTorch version: https://github.com/iworldtong/text_gcn.pytorch
- Paper: https://arxiv.org/abs/1809.05679