Complexity-Guided Curriculum Learning for Text Graphs

TGCL is an advanced spaced repetition framework designed to enhance the training efficacy of GNNs and to understand their learning dynamics. It schedules examples for training, optimizing their timing and order based on the evolving complexity of the training data. To achieve this, it combines multiview graph and text complexity formalisms. TGCL can tailor the curriculum to the learning dynamics of each model and can learn curricula that transfer across different GNN models and datasets.
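To make the idea of spaced repetition for training samples concrete, here is a minimal sketch of a generic Leitner-style scheduler. It is not the TGCL scheduler: the doubling rule, the `is_learned` callback, and all function names are assumptions chosen for illustration. Samples the model has already learned are revisited at growing intervals, while samples it still gets wrong return in the next epoch.

```python
def update_schedule(delays, due_epoch, sample_id, epoch, learned):
    """Leitner-style update (illustrative rule, not TGCL's):
    double the delay of a learned sample, reset it to 1 otherwise."""
    delays[sample_id] = delays[sample_id] * 2 if learned else 1
    due_epoch[sample_id] = epoch + delays[sample_id]


def run_schedule(sample_ids, is_learned, num_epochs):
    """Return, for each epoch, the list of samples scheduled for training."""
    delays = {s: 1 for s in sample_ids}     # current review interval per sample
    due_epoch = {s: 0 for s in sample_ids}  # next epoch at which a sample is due
    history = []
    for epoch in range(num_epochs):
        # Train only on the samples that are due at this epoch.
        batch = [s for s in sample_ids if due_epoch[s] <= epoch]
        history.append(batch)
        for s in batch:
            update_schedule(delays, due_epoch, s, epoch, is_learned(s, epoch))
    return history


# Hypothetical toy run: "easy" is always learned, "hard" never is.
history = run_schedule(["easy", "hard"], lambda s, epoch: s == "easy", 4)
```

In this toy run the hard sample is trained on in every epoch, while the easy sample is skipped in epochs 1 and 3 because its delay has grown.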

Figure: The architecture of the proposed model, TGCL. It takes subgraphs and the text(s) of their target node(s) as input. The radar chart shows graph complexity indices, which quantify the difficulty of each subgraph from different perspectives (text complexity indices are not shown for simplicity). Subgraphs are ranked according to each complexity index, and these rankings are provided to the TGCL scheduler to space samples over time for training.
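The ranking step described above can be sketched as follows. The two indices used here (density and average degree) are illustrative assumptions and are not necessarily among the indices TGCL uses; the function names and the toy subgraphs are hypothetical as well. The point is only that each index yields its own ordering of the same subgraphs.

```python
def complexity_indices(num_nodes, edges):
    """Compute two illustrative indices for an undirected subgraph."""
    num_edges = len(edges)
    max_edges = num_nodes * (num_nodes - 1) / 2
    return {
        "density": num_edges / max_edges if max_edges else 0.0,
        "avg_degree": 2 * num_edges / num_nodes if num_nodes else 0.0,
    }


def rank_by_index(subgraphs, index_name):
    """Return subgraph ids sorted from lowest to highest value of one index."""
    scores = {
        sid: complexity_indices(n, edges)[index_name]
        for sid, (n, edges) in subgraphs.items()
    }
    # Ties are broken by id so the ranking is deterministic.
    return sorted(scores, key=lambda sid: (scores[sid], sid))


# Hypothetical subgraphs given as (number of nodes, undirected edge list).
subgraphs = {
    "path": (4, [(0, 1), (1, 2), (2, 3)]),
    "triangle": (3, [(0, 1), (1, 2), (0, 2)]),
    "big": (6, [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (3, 4), (4, 5), (3, 5)]),
}

# One ranking per index; a scheduler could consume these rankings.
rankings = {name: rank_by_index(subgraphs, name) for name in ("density", "avg_degree")}
```

Here the two indices disagree on the relative order of "triangle" and "big", which is why several views of complexity can carry different information about the same sample.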

Data

Node Classification

There are three datasets for node classification: Arxiv, Cora, and Citeseer.

Arxiv: Arxiv is downloaded from ogbn (https://ogb.stanford.edu/docs/nodeprop/#ogbn-arxiv) using the following code:

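from ogb.nodeproppred import PygNodePropPredDataset

# Download (on first use) and load the ogbn-arxiv dataset linked above
dataset = PygNodePropPredDataset(name="ogbn-arxiv")

# Standard train/validation/test node index split provided by OGB
split_idx = dataset.get_idx_split()
train_idx, valid_idx, test_idx = split_idx["train"], split_idx["valid"], split_idx["test"]
graph = dataset[0]  # pyg graph object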

Cora: Cora is downloaded from the PyTorch Geometric library (https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.datasets.Planetoid.html#torch_geometric.datasets.Planetoid). We used the following code:

from torch_geometric.datasets import Planetoid

dataset = Planetoid(root='/tmp/Cora', name='Cora')

Citeseer: Citeseer is downloaded from the PyTorch Geometric library (https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.datasets.Planetoid.html#torch_geometric.datasets.Planetoid). We used the following code:

from torch_geometric.datasets import Planetoid

dataset = Planetoid(root='/tmp/Citeseer', name='Citeseer')

Link Prediction

There are two datasets for link prediction: PGR and GDPR.

Phenotype Gene Relation (PGR): PGR was created by Sousa et al., NAACL 2019 (https://aclanthology.org/N19-1152/) from PubMed articles and contains sentences describing relations between given genes and phenotypes. In our experiments, we include only the PGR samples whose genes and phenotypes both have available text descriptions. This amounts to ~71% of the original dataset.
Gene, Disease, Phenotype Relation (GDPR): This dataset is obtained by combining and linking entities across two freely-available datasets: Online Mendelian Inheritance in Man (OMIM, https://omim.org/) and Human Phenotype Ontology (HPO, https://hpo.jax.org/). The dataset contains relations between genes, diseases and phenotypes.
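The PGR filtering step described above (keeping only samples whose gene and phenotype both have a text description) could look roughly like the sketch below. The tuple layout, the `descriptions` lookup, and all identifiers are hypothetical and do not reflect the actual file format of the released data.

```python
def filter_with_descriptions(samples, descriptions):
    """Keep (gene, phenotype, label) samples where both entities have non-empty text."""

    def has_text(entity):
        # Missing keys, None values, and whitespace-only strings all count as "no text".
        return bool((descriptions.get(entity) or "").strip())

    return [s for s in samples if has_text(s[0]) and has_text(s[1])]


# Hypothetical toy data: PHEN_Y has an empty description, GENE_B has none at all.
samples = [
    ("GENE_A", "PHEN_X", 1),
    ("GENE_B", "PHEN_X", 0),
    ("GENE_A", "PHEN_Y", 1),
]
descriptions = {
    "GENE_A": "toy gene description",
    "PHEN_X": "toy phenotype description",
    "PHEN_Y": "",
}

kept = filter_with_descriptions(samples, descriptions)
coverage = len(kept) / len(samples)  # fraction of samples that survive the filter
```

On the real data, the analogous coverage figure is the ~71% reported above; the toy data here is only meant to show the filtering logic.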

To download the datasets with embeddings and train/validation/test splits, run the download.sh script in the data directory as follows:

sh data/download.sh

Indices

Indices for all the datasets can be downloaded using the command below:

sh indices/download.sh

To run the code

Use the following commands with the appropriate arguments:

Node Classification

cd node_classification
python3 node_classification.py

Link Prediction

cd link_prediction
python3 link_prediction.py

Citation

@inproceedings{nidhi-etal-2023-tgcl,
    title = "Complexity-Guided Curriculum Learning for Text Graphs",
    author = "Vakil, Nidhi and Amiri, Hadi",
    booktitle = "Proceedings of the 2023 Empirical Methods in Natural Language Processing",
    publisher = "Association for Computational Linguistics",
    year = "2023"
}
