This is the official implementation for the NeurIPS 2025 Research Track paper: Unifying Text Semantics and Graph Structures for Temporal Text-attributed Graphs with Large Language Models
We have provided the related resources here: [📃 Paper] [🎬 Video]
We propose an LLM-based framework, CROSS, for TTAG representation learning:
- We design a temporal-aware LLM prompting paradigm and develop the Temporal Semantics Extractor, which equips LLMs with dynamic reasoning capability to capture the evolving semantics of nodes’ neighborhoods, effectively detecting semantic dynamics.
- We introduce a modal-cohesive co-encoding architecture and propose the Semantic-structural Co-encoder, which jointly propagates semantic and structural information, facilitating mutual reinforcement between the two modalities.
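As a rough illustration of the co-encoding idea (not the actual CROSS implementation), one propagation step might mix each modality's neighborhood aggregate with the other modality. The function name, the mean-aggregation rule, and the mixing coefficient below are all illustrative assumptions:

```python
import numpy as np

def co_encode(sem, struct, adj, alpha=0.5):
    """One hypothetical co-encoding step (illustrative, not the paper's exact rule):
    each modality aggregates over the graph neighborhood, then is mixed with
    the other modality so semantics and structure reinforce each other.

    sem, struct: (num_nodes, dim) feature matrices; adj: (num_nodes, num_nodes).
    """
    deg = np.clip(adj.sum(axis=1, keepdims=True), 1, None)  # avoid division by zero
    agg_sem = adj @ sem / deg          # semantics propagated along the structure
    agg_struct = adj @ struct / deg    # structural features propagated likewise
    new_sem = alpha * agg_sem + (1 - alpha) * struct    # semantics reinforced by structure
    new_struct = alpha * agg_struct + (1 - alpha) * sem  # structure reinforced by semantics
    return new_sem, new_struct
```

Stacking several such steps lets both modalities influence each other at every layer, rather than fusing them only once at the end.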
We summarize the key differences between CROSS and existing methods as follows:
Feel free to cite this work if you find it useful! 😄
@inproceedings{zhang2025unifying,
title={Unifying Text Semantics and Graph Structures for Temporal Text-attributed Graphs with Large Language Models},
author={Siwei Zhang and Yun Xiong and Yateng Tang and Jiarong Xu and Xi Chen and Zehao Gu and Xuezheng Hao and Zian Jia and Jiawei Zhang},
booktitle={Advances in Neural Information Processing Systems},
year={2025}
}
- Oct 17 2025 🚀 We released the codes and data of CROSS! See you in San Diego!! 👋
- Sep 30 2025 📝 We posted the camera-ready version of our paper!
- Sep 18 2025 🎉 Our paper has been accepted by NeurIPS 2025. A new version of the paper will be released soon!
- Mar 2025 📚 We posted the first version of our paper!
We provide an example using the Enron dataset; the other datasets follow a similar setup.
First, download the original dataset from here and save it into the ../DyLink_Datasets directory. Then, download the LLM-generated texts from here and replace the corresponding files in the dataset directory.
The final directory structure should be:
DyLink_Datasets/
├── Enron/
│   ├── entity_text.csv
│   ├── edge_list.csv
│   ├── relation_text.csv
│   ├── LLM_temporal_chain_data.pkl
│   └── chain_results.json
├── GDELT/
│   ├── entity_text.csv
│   ├── edge_list.csv
│   ├── relation_text.csv
│   ├── LLM_temporal_chain_data.pkl
│   └── chain_results.json
├── ICESW1819/
│   ├── entity_text.csv
│   ├── edge_list.csv
│   ├── relation_text.csv
│   ├── LLM_temporal_chain_data.pkl
│   └── chain_results.json
├── Googlemap_CT/
│   ├── entity_text.csv
│   ├── edge_list.csv
│   ├── relation_text.csv
│   ├── LLM_temporal_chain_data.pkl
│   └── chain_results.json
└── ... (for your own datasets)
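Before running anything, you may want to verify that each dataset directory contains the expected files. A minimal check (the root path and dataset names follow the layout above; the helper itself is not part of the repo):

```python
import os

# Files each dataset directory is expected to contain (per the layout above)
REQUIRED_FILES = [
    "entity_text.csv",
    "edge_list.csv",
    "relation_text.csv",
    "LLM_temporal_chain_data.pkl",
    "chain_results.json",
]

def check_dataset(root, name):
    """Return the list of required files missing from root/name."""
    path = os.path.join(root, name)
    return [f for f in REQUIRED_FILES if not os.path.exists(os.path.join(path, f))]

# Example: report missing files for each dataset
for ds in ["Enron", "GDELT", "ICESW1819", "Googlemap_CT"]:
    missing = check_dataset("../DyLink_Datasets", ds)
    if missing:
        print(f"{ds}: missing {missing}")
```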
Run the following commands to generate the text embeddings using MiniLM:
For raw texts:
CUDA_VISIBLE_DEVICES=0 python get_pretrained_embeddings.py
For LLM-generated texts:
CUDA_VISIBLE_DEVICES=0 python get_temporal_chain_embeddings.py
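Both scripts follow the same general pattern: encode the texts in batches and stack the resulting vectors. A minimal sketch of that pattern (the `encode_batch` stand-in and the 384-dimensional output mirror MiniLM but are placeholders; the provided scripts perform the real encoding with the pretrained model):

```python
import numpy as np

def encode_batch(texts, dim=384):
    """Stand-in for a real encoder such as MiniLM via sentence-transformers
    (hypothetical here); it just returns fixed-size random vectors."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(texts), dim)).astype(np.float32)

def embed_texts(texts, batch_size=64):
    """Encode texts batch by batch and stack into one (n, dim) matrix."""
    chunks = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    return np.vstack([encode_batch(chunk) for chunk in chunks])
```

The resulting matrices are what the training scripts load as node/edge text features.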
To train CROSS using DyGFormer for temporal link prediction on different datasets, run:
CUDA_VISIBLE_DEVICES=0 python train_link_prediction.py --dataset_name Enron --use_feature MiniLM --model_name DyGFormer
CUDA_VISIBLE_DEVICES=0 python train_link_prediction.py --dataset_name GDELT --use_feature MiniLM --model_name DyGFormer
CUDA_VISIBLE_DEVICES=0 python train_link_prediction.py --dataset_name ICESW1819 --use_feature MiniLM --model_name DyGFormer
CUDA_VISIBLE_DEVICES=0 python train_link_prediction.py --dataset_name Googlemap_CT --use_feature MiniLM --model_name DyGFormer
Add your own authorization token from your API account and change the URL if using other LLMs. Please make sure your network connection is stable. In addition, our code supports multi-threaded workers, which you can adjust according to your needs.
Run:
CUDA_VISIBLE_DEVICES=0 python temporal_chain_LLMs.py --model_name deepseek --dataset_name Enron --num_workers 80
CUDA_VISIBLE_DEVICES=0 python temporal_chain_LLMs.py --model_name deepseek --dataset_name GDELT --num_workers 80
CUDA_VISIBLE_DEVICES=0 python temporal_chain_LLMs.py --model_name deepseek --dataset_name ICESW1819 --num_workers 80
CUDA_VISIBLE_DEVICES=0 python temporal_chain_LLMs.py --model_name deepseek --dataset_name Googlemap_CT --num_workers 80
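The multi-threaded workers follow a standard fan-out pattern over prompts. A minimal sketch with Python's `ThreadPoolExecutor` (the `query_llm` stub stands in for the real API call, which would POST the prompt to the DeepSeek endpoint with your key):

```python
from concurrent.futures import ThreadPoolExecutor

def query_llm(prompt):
    # Stand-in for a real LLM API request (hypothetical); in practice this
    # would send the prompt over HTTPS with your authorization token.
    return f"summary of: {prompt}"

def run_chains(prompts, num_workers=8):
    """Query the LLM for each prompt in parallel, preserving input order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(query_llm, prompts))
```

Because the work is I/O-bound (waiting on API responses), threads scale well here; `--num_workers` plays the role of `max_workers`, and throttling it down helps avoid API rate limits.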
To facilitate quick verification of the reproducibility of our results, we provide the training logs of CROSS in ./logs/, which contain the main experimental results and other details.
Our code and model implementations refer to DTGB and DyGLib. Thanks for their great contributions!

