

NeurIPS 2025 CROSS

This is the official implementation of the NeurIPS 2025 Research Track paper: Unifying Text Semantics and Graph Structures for Temporal Text-attributed Graphs with Large Language Models

We provide the related resources here: [📃 Paper] [🎬 Video]

We propose an LLM-based framework, CROSS, for TTAG representation learning:

  • We design a temporal-aware LLM prompting paradigm and develop the Temporal Semantics Extractor, which equips LLMs with dynamic reasoning capability to capture the evolving semantics of nodes' neighborhoods, effectively detecting semantic dynamics.
  • We introduce a modal-cohesive co-encoding architecture and propose the Semantic-structural Co-encoder, which jointly propagates semantic and structural information, facilitating mutual reinforcement between the two modalities.
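As a rough illustration only (not the actual CROSS implementation), "co-encoding" of the two modalities can be sketched as a shared transformation over concatenated semantic and structural node embeddings, followed by neighborhood aggregation; every name and dimension below is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

num_nodes, sem_dim, struct_dim, hidden_dim = 4, 8, 6, 5

# Hypothetical per-node embeddings from the two modalities.
semantic = rng.normal(size=(num_nodes, sem_dim))       # e.g. text embeddings
structural = rng.normal(size=(num_nodes, struct_dim))  # e.g. temporal graph embeddings

# A toy row-normalized adjacency standing in for message passing.
adj = np.ones((num_nodes, num_nodes)) / num_nodes

# Jointly encode: concatenate modalities, mix them with a shared weight,
# then aggregate over neighbors so each modality informs the other.
w = rng.normal(size=(sem_dim + struct_dim, hidden_dim))
fused = np.tanh(np.concatenate([semantic, structural], axis=1) @ w)
out = adj @ fused  # neighborhood aggregation of the fused representation

print(out.shape)  # (4, 5)
```

The point of the sketch is only that both modalities pass through one shared encoder rather than being encoded separately and concatenated at the end.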

We summarize the key differences between CROSS and existing methods in our paper.

Feel free to cite this work if you find it useful! 😄

@inproceedings{zhang2025unifying,
  title={Unifying Text Semantics and Graph Structures for Temporal Text-attributed Graphs with Large Language Models}, 
  author={Siwei Zhang and Yun Xiong and Yateng Tang and Jiarong Xu and Xi Chen and Zehao Gu and Xuezheng Hao and Zian Jia and Jiawei Zhang},
  booktitle={Advances in Neural Information Processing Systems},
  year={2025}
}

🔥 News

  • Oct 17 2025 🚀 We released the code and data of CROSS! See you in San Diego!! 👋
  • Sep 30 2025 📝 We posted the camera-ready version of our paper!
  • Sep 18 2025 🎉 Our paper was accepted by NeurIPS 2025. A new version of the paper will be released soon!
  • Mar 2025 📚 We posted the first version of our paper!

How to run CROSS

Step 1: Prepare the dataset.

We use the Enron dataset as an example; the other datasets follow a similar setup.

First, download the original datasets from here and save them in the ../DyLink_Datasets directory. Then, download the LLM-generated texts from here and replace the corresponding files in the dataset directories.

The final directory structure should be:

DyLink_Datasets/
├── Enron/
│   ├── entity_text.csv
│   ├── edge_list.csv
│   ├── relation_text.csv
│   ├── LLM_temporal_chain_data.pkl
│   └── chain_results.json
├── GDELT/
│   ├── entity_text.csv
│   ├── edge_list.csv
│   ├── relation_text.csv
│   ├── LLM_temporal_chain_data.pkl
│   └── chain_results.json
├── ICESW1819/
│   ├── entity_text.csv
│   ├── edge_list.csv
│   ├── relation_text.csv
│   ├── LLM_temporal_chain_data.pkl
│   └── chain_results.json
├── Googlemap_CT/
│   ├── entity_text.csv
│   ├── edge_list.csv
│   ├── relation_text.csv
│   ├── LLM_temporal_chain_data.pkl
│   └── chain_results.json
└── ... (for your own datasets)
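To sanity-check the layout before training, a small helper script like the following (ours, not part of the repository) can report which expected files are missing from each dataset directory:

```python
from pathlib import Path

EXPECTED = [
    "entity_text.csv",
    "edge_list.csv",
    "relation_text.csv",
    "LLM_temporal_chain_data.pkl",
    "chain_results.json",
]

def missing_files(root, datasets):
    """Return {dataset: [missing file names]} for incomplete dataset dirs."""
    root = Path(root)
    report = {}
    for name in datasets:
        missing = [f for f in EXPECTED if not (root / name / f).is_file()]
        if missing:
            report[name] = missing
    return report

# Example: check the four benchmark datasets.
print(missing_files("../DyLink_Datasets", ["Enron", "GDELT", "ICESW1819", "Googlemap_CT"]))
```

An empty dict means every dataset directory is complete.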

Step 2: Prepare the text embeddings.

Run the following commands to generate the text embeddings using MiniLM:

For raw texts:

CUDA_VISIBLE_DEVICES=0 python get_pretrained_embeddings.py

For LLM-generated texts:

CUDA_VISIBLE_DEVICES=0 python get_temporal_chain_embeddings.py
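Conceptually, both scripts map each text to a fixed-size vector and save a node-aligned embedding matrix. The sketch below mimics that output format with a deterministic dummy encoder standing in for MiniLM; the output file name, the example texts, and the 384-dimensional size (MiniLM's usual sentence-embedding dimensionality) are our assumptions, not the repository's exact behavior:

```python
import hashlib

import numpy as np

EMB_DIM = 384  # MiniLM's usual sentence-embedding dimensionality

def dummy_encode(texts, dim=EMB_DIM):
    """Deterministic stand-in for a MiniLM sentence encoder."""
    out = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        # Seed from a stable hash so the same text always maps to the same vector.
        seed = int.from_bytes(hashlib.md5(text.encode()).digest()[:4], "little")
        out[i] = np.random.default_rng(seed).normal(size=dim)
    return out

node_texts = ["Employee: Alice, role: trader", "Employee: Bob, role: analyst"]
emb = dummy_encode(node_texts)
np.save("node_features_minilm.npy", emb)  # hypothetical output file name

print(emb.shape)  # (2, 384)
```

Row i of the saved matrix is the embedding of node i, so downstream models can index it directly by node ID.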

Step 3: Run temporal link prediction.

To train CROSS using DyGFormer for temporal link prediction on different datasets, run:

CUDA_VISIBLE_DEVICES=0 python train_link_prediction.py --dataset_name Enron --use_feature MiniLM --model_name DyGFormer

CUDA_VISIBLE_DEVICES=0 python train_link_prediction.py --dataset_name GDELT --use_feature MiniLM --model_name DyGFormer

CUDA_VISIBLE_DEVICES=0 python train_link_prediction.py --dataset_name ICESW1819 --use_feature MiniLM --model_name DyGFormer

CUDA_VISIBLE_DEVICES=0 python train_link_prediction.py --dataset_name Googlemap_CT --use_feature MiniLM --model_name DyGFormer
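Temporal link prediction is typically scored with ranking metrics such as average precision over positive edges and sampled negatives. A minimal pure-Python AP computation (an illustration of the metric, not the repository's evaluator) looks like:

```python
def average_precision(scores, labels):
    """AP: mean of precision at each positive hit, scanning by descending score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(hits, 1)

# Positive (observed) edges should be scored above sampled negatives.
scores = [0.9, 0.8, 0.3, 0.1]  # model scores for candidate edges
labels = [1, 0, 1, 0]          # 1 = real edge, 0 = negative sample
print(average_precision(scores, labels))  # 5/6 ≈ 0.833
```

A perfect ranking (all positives above all negatives) gives AP = 1.0.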

How to get the LLM-generated texts

Add your own authorization key from your API account, and change the URL if you use other LLMs. Please make sure your network connection is stable. In addition, our code supports multi-threaded workers, which you can adjust to your needs.

Run:

CUDA_VISIBLE_DEVICES=0 python temporal_chain_LLMs.py --model_name deepseek --dataset_name Enron --num_workers 80

CUDA_VISIBLE_DEVICES=0 python temporal_chain_LLMs.py --model_name deepseek --dataset_name GDELT --num_workers 80

CUDA_VISIBLE_DEVICES=0 python temporal_chain_LLMs.py --model_name deepseek --dataset_name ICESW1819 --num_workers 80

CUDA_VISIBLE_DEVICES=0 python temporal_chain_LLMs.py --model_name deepseek --dataset_name Googlemap_CT --num_workers 80
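The multi-threaded worker pattern behind --num_workers can be sketched with the standard library's concurrent.futures; here query_llm is a hypothetical placeholder for the actual API request (with its auth key, URL, and retry logic) in temporal_chain_LLMs.py:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def query_llm(prompt):
    """Placeholder for the real API request (auth key, URL, retries, etc.)."""
    return f"summary for: {prompt}"

def run_workers(prompts, num_workers=80):
    """Fan prompts out to a thread pool; I/O-bound API calls overlap well."""
    results = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = {pool.submit(query_llm, p): p for p in prompts}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results

out = run_workers(["node 1 history", "node 2 history"], num_workers=4)
print(len(out))  # 2
```

Threads (rather than processes) fit here because the workload is dominated by waiting on network responses, not CPU work.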

Training logs

To facilitate quick verification of the reproducibility of our results, we provide the training logs of CROSS in ./logs/, which contain the main experimental results and other details.

Acknowledgments

Our code and model implementations are based on DTGB and DyGLib. Thanks for their great contributions!
