A curated and continuously updated list of papers, code, and resources for Foundation Models on Structured Data.
This is the official repository for our survey:
A Survey on Foundation Models for Structured Data: Tabular, Time Series, and Graphs
- Three major structured data types: Tabular, Time Series, and Graph Foundation Models
- Unified comparison schema: data/tasks, objective, tokenization, architecture, adaptation and advance
- Includes benchmark and dataset resources for fast entry and reproducible research
| Model | Paper | Code | Pretraining Data | Pre-training Objective & Task | Tokenization | Architecture | Adaptation | Transfer | Task | Venue |
|---|---|---|---|---|---|---|---|---|---|---|
| MITRA | paper | HF (classifier), HF (regressor) | Synthetic (SCM, tree-based) | Classification, Regression | Cell | Transformer | FT, ICL | 1:N | CLS, REG | NeurIPS 2025 |
| UniTabE | paper | code | Real-world datasets | Masked cell prediction + Row-wise contrastive | Name-Value | Transformer | FT | N:N | CLS, REG | ICLR 2024 |
| CARTE | paper | code | Knowledge base | Contrastive (graphlet & truncation) | Row | Transformer | FT | N:N | CLS, REG | ICML 2024 |
| PORTAL | paper | code | Real-world datasets | Masked cell modeling | Row | Transformer | FT | N:N | CLS, REG | NeurIPS 2024 (WS) |
| TabForestPFN | paper | code | Synthetic (SCM, tree-based) | Classification | Cell | Transformer | FT, ICL | 1:N | CLS | arXiv 2024 |
| TabPFNv2 | paper | code | Synthetic (SCM) | Masked cell prediction | Cell | Transformer | ICL | 1:N | CLS, REG | Nature 2025 |
| TabICL | paper | code | Synthetic (SCM, tree-based) | Classification | Row | Transformer | ICL | 1:N | CLS | ICML 2025 |
| TabDPT | paper | train code, infer code | Real-world datasets | Masked column prediction | Row | Transformer | ICL | N:N | CLS, REG | NeurIPS 2025 |
| TabSTAR | paper | code | Real-world datasets | Classification, Regression | Name-Value | Transformer | FT | N:N | CLS, REG | NeurIPS 2025 |
| TABULA | paper | code | Real-world datasets | Column-wise reconstruction | Name-Value | Transformer | FT | N:N | IMP | NeurIPS 2025 |
| TARTE | paper | code | Knowledge base | Contrastive (entities & facts) | Name-Value | Transformer | FT | N:N | CLS, REG | TMLR 2025 |
| LimiX | paper | code | Synthetic (SCM) | Context-conditional masked modeling | Cell | Transformer | ICL | 1:N | CLS, REG, IMP, GEN | arXiv 2025 |
| Real-TabPFN | paper | HF | Synthetic + Real-world | Classification | Cell | Transformer | ICL | 1:N | CLS | arXiv 2025 |
| TabLLM | paper | code | Text | Table-to-text generation | Name-Value | LLM | FT | 1:N | CLS | AISTATS 2023 |
| UniPredict | paper | -- | Real-world datasets | Table-to-text generation | Name-Value | LLM | IT | 1:N | CLS, REG | arXiv 2023 |
| TP-BERTa | paper | code | Real-world datasets | Classification, Regression | Name-Value | LLM | FT | N:N | CLS, REG | ICLR 2024 |
| TABULA-8B | paper | HF | Real-world datasets | Tabular prediction | Row | LLM | ICL | 1:N | CLS, REG | NeurIPS 2024 |
| IngesTables | paper | -- | Real-world datasets | Attention-based tabular modeling | Name-Value | Transformer + LLM | FT | N:N | CLS, REG | NeurIPS 2023 (WS) |
Notes
- Adaptation: "FT": Fine-Tuning; "ICL": In-Context Learning; "IT": Instruction Tuning
- Task: "CLS": Classification; "REG": Regression; "IMP": Imputation; "GEN": Generation
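
To filter the table by these abbreviations, a rough Python sketch might look like the following. It is only an illustration and not part of the survey tooling: the field names in `FIELDS` and the helpers `parse_row` and `expand` are hypothetical, and the three sample rows are copied from the table above. It assumes every row starts and ends with `|` and has exactly 11 cells.

```python
# Hypothetical field names, one per table column (not defined by the repository).
FIELDS = [
    "model", "paper", "code", "pretraining_data", "objective",
    "tokenization", "architecture", "adaptation", "transfer", "task", "venue",
]

# Abbreviations copied from the notes above.
ADAPTATION = {"FT": "Fine-Tuning", "ICL": "In-Context Learning", "IT": "Instruction Tuning"}
TASK = {"CLS": "Classification", "REG": "Regression", "IMP": "Imputation", "GEN": "Generation"}


def parse_row(line):
    """Split one Markdown table row into a dict keyed by FIELDS."""
    text = line.strip()
    if not (text.startswith("|") and text.endswith("|")):
        raise ValueError("row must start and end with '|'")
    cells = [cell.strip() for cell in text[1:-1].split("|")]
    if len(cells) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} cells, got {len(cells)}")
    return dict(zip(FIELDS, cells))


def expand(cell, glossary):
    """Expand a comma-separated abbreviation cell; unknown entries are kept as-is."""
    parts = [part.strip() for part in cell.split(",")]
    return [glossary.get(part, part) for part in parts]


# Three rows copied verbatim from the table above.
rows = [
    "| TabICL | paper | code | Synthetic (SCM, tree-based) | Classification | Row | Transformer | ICL | 1:N | CLS | ICML 2025 |",
    "| LimiX | paper | code | Synthetic (SCM) | Context-conditional masked modeling | Cell | Transformer | ICL | 1:N | CLS, REG, IMP, GEN | arXiv 2025 |",
    "| TabSTAR | paper | code | Real-world datasets | Classification, Regression | Name-Value | Transformer | FT | N:N | CLS, REG | NeurIPS 2025 |",
]
records = [parse_row(row) for row in rows]

# Keep only models whose adaptation includes in-context learning.
icl_models = [
    record["model"]
    for record in records
    if "In-Context Learning" in expand(record["adaptation"], ADAPTATION)
]
print(icl_models)
print(expand(records[1]["task"], TASK))
```

The same pattern should carry over to the other tables if the glossary dictionaries are swapped for the abbreviations listed in their notes.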
| Model | Paper | Code | Pretraining Data | Pre-training Objective & Task | Tokenization | Architecture | Adaptation | Transfer | Task | Venue |
|---|---|---|---|---|---|---|---|---|---|---|
| ForecastPFN | paper | code | Synthetic (periodicity) | Point forecasting | Point | Transformer | - | 1:N | FCT | NeurIPS 2023 |
| Lag-Llama | paper | code | Real-world datasets | Probabilistic forecasting | Lag feature vector | Transformer | - | N:N | FCT | NeurIPS 2023 (WS) |
| TimeGPT-1 | paper | code | Real-world datasets | Forecasting | Sliding window | Transformer | FT | N:N | FCT | arXiv 2023 |
| UniTime | paper | code | Real-world datasets | Forecasting + Reconstruction | Fixed-length patch | Transformer | ICL | N:N | FCT | WWW 2024 |
| TimesFM | paper | code | Synthetic + Real-world | Point forecasting | Fixed-length patch | Transformer | FT | N:N | FCT | ICML 2024 |
| MOMENT | paper | code | Real-world datasets | Masked reconstruction | Fixed-length patch | Transformer | FT | N:N | FCT, CLS, IMP, AD | ICML 2024 |
| MOIRAI | paper | code | Real-world datasets | Probabilistic forecasting | Adaptive patch | Transformer | ICL | N:N | FCT | ICML 2024 |
| Timer | paper | code | Real-world datasets | Next token prediction | Fixed-length patch | Transformer | - | N:N | FCT, IMP, AD | ICML 2024 |
| UniTS | paper | code | Real-world datasets | Masked reconstruction | Fixed-length patch | Transformer | PL | N:N | FCT, CLS, IMP, AD | NeurIPS 2024 |
| Time-MoE | paper | code | Real-world datasets | Multi-resolution forecasting | Point | Transformer | FT, ICL | N:N | FCT | ICLR 2025 |
| WaveToken | paper | -- | Real-world datasets | Next token prediction | Wavelet | Transformer | ICL | N:N | FCT | ICML 2025 |
| ROSE | paper | -- | Real-world datasets | Masked reconstruction | Fixed-length patch | Transformer | FT | N:N | FCT | ICML 2025 |
| GPT4TS | paper | code | - | - | Fixed-length patch | LLM | FT | 1:N | FCT, CLS, IMP, AD | NeurIPS 2023 |
| LLMTime | paper | code | - | - | Point / digit sequence | LLM | - | 1:N | FCT | NeurIPS 2023 |
| PromptCast | paper | code | - | - | Point / digit sequence | LLM | FT | 1:N | FCT | TKDE 2023 |
| GPT4MTS | paper | code | | | | | | | | AAAI 2024 |
| TIME-LLM | paper | code | - | - | Fixed-length patch | LLM | PL | 1:N | FCT | ICLR 2024 |
| AutoTimes | paper | code | Real-world datasets | Next token prediction | Fixed-length patch | LLM | PL, ICL | N:N | FCT | NeurIPS 2024 |
| Chronos | paper | code | Real + Synthetic | Autoregressive density estimation | Quantization | LLM | - | N:N | FCT | TMLR 2024 |
| CALF | paper | code | - | - | Text + TS embedding | LLM | FT | N:N | FCT | AAAI 2025 |
| LLM4TS | paper | code | - | Autoregressive alignment | Fixed-length patch | LLM | FT | 1:N | FCT | TIST 2025 |
| LLM-Mixer | paper | code | - | - | Text + TS embedding | LLM | FT | 1:N | FCT | ACL 2025 (WS) |
| TEMPO | paper | code | - | Point forecasting | Fixed-length patch | Transformer + LLM | PL | N:N | FCT | ICLR 2024 |
Notes
- Adaptation: "FT": Fine-Tuning; "PL": Prompt Learning; "ICL": In-Context Learning
- Task: "FCT": Forecasting; "CLS": Classification; "IMP": Imputation; "AD": Anomaly Detection
| Model | Paper | Code | Pretraining Data | Pre-training Objective & Task | Tokenization | Architecture | Adaptation | Transfer | Task | Venue |
|---|---|---|---|---|---|---|---|---|---|---|
| GraphPrompt | paper | code | Text-free | Subgraph similarity | Subgraph | GNN | PL | 1:1 | NC, GC | WWW 2023 |
| HGPrompt | paper | code | Text-free | Subgraph similarity | Subgraph | GNN | PL | 1:1 | NC, GC | AAAI 2024 |
| GCOPE | paper | code | Text-free | Contrastive + feature reconstruction | Node | GNN | FT, PL | N:N | NC | KDD 2024 |
| MultiGPrompt | paper | code | Text-free | Subgraph similarity | Encoder layer | GNN | PL | 1:1 | NC, GC | WWW 2024 |
| OpenGraph | paper | code | Text-free | Masked autoencoding | Node | GNN | - | N:N | NC, LP | EMNLP 2024 |
| GFT | paper | code | Text-attributed | Tree reconstruction | Computation tree | GNN | FT | N:N | NC, GC, LP | NeurIPS 2024 |
| AnyGraph | paper | code | Text-free | Link prediction | Node | GNN | FT | 1:N | NC, GC, LP | arXiv 2024 |
| MDGPT | paper | -- | Text-free | Subgraph similarity | Domain | GNN | PL | N:N | NC, GC | arXiv 2024 |
| OMOG | paper | -- | Text-attributed | Contrastive pretraining | Node | GNN | - | N:N | NC, LP | arXiv 2024 |
| GraphMoRE | paper | code | Text-free | Topology heterogeneity modeling | Node | GNN | FT | 1:1 | NC, LP | AAAI 2025 |
| GraphAny | paper | code | Text-free | Node classification | Node | GNN | - | 1:N | NC | ICLR 2025 |
| GraphPrompter | paper | code | Text-free | Neighbor matching + reconstruction | Subgraph | GNN | ICL | N:N | NC, GC, LP | ICDE 2025 |
| BRIDGE | paper | code | Text-free | Subgraph similarity | Aligner | GNN | PL | N:N | NC, GC | ICML 2025 |
| GIT | paper | code | Text-attributed | Tree reconstruction | Task tree | GNN | FT, IT, ICL | N:N | NC, GC, LP | ICML 2025 |
| MDGFM | paper | code | Text-free | Subgraph similarity | Domain | GNN | PL | N:N | NC | ICML 2025 |
| AutoGFM | paper | -- | Text-attributed | Disentangled contrastive learning | Subgraph | GNN | FT | N:N | NC, GC, LP | ICML 2025 |
| GCoT | paper | -- | Text-free | Link prediction | Node | GNN | PL | 1:1 | NC, GC | KDD 2025 |
| PatchNet | paper | code | Text-free | Attribute masking + context prediction | Node patch | GNN | FT | N:N | NC, GC | KDD 2025 |
| SAMGPT | paper | code | Text-free | Subgraph similarity | Structural token | GNN | PL | N:N | NC, GC | WWW 2025 |
| UniGraph2 | paper | code | Multimodal | Reconstruction | Node | GNN | - | N:N | Multimodal | WWW 2025 |
| RiemannGFM | paper | code | Text-attributed + Text-free | Geometric contrastive learning | Subgraph | GNN | FT | N:N | NC, LP | WWW 2025 |
| GraphCLIP | paper | code | Text-attributed | Contrastive alignment | Subgraph | GNN | PL | N:N | NC, LP | WWW 2025 |
| UniPrompt | paper | code | Text-free | - | Prompt graph | GNN | PL | N:N | NC | NeurIPS 2025 |
| GraphKeeper | paper | code | Text-free | Continual pretraining | Node | GNN | FT | N:1 | NC, GC | NeurIPS 2025 |
| H²GFM | paper | -- | Text-attributed | Text-space encoding + context-path modeling | Node | GNN | - | N:N | NC, LP | arXiv 2025 |
| RWPT | paper | -- | Text-attributed | Contrastive pretraining | Node sequence | GNN | FT | N:N | NC, GC, LP | arXiv 2025 |
| MDGCL | paper | -- | Text-free | Contrastive pretraining | Subgraph | GNN | FT | N:1 | NC, GC | arXiv 2025 |
| GILT | paper | -- | Text-free | Few-shot meta-pretraining | Node/Edge/Graph | GNN | ICL | N:N | NC, GC, LP | arXiv 2025 |
| GMoPE | paper | -- | Text-free | Contrastive pretraining | Node | GNN | FT | N:N | NC, GC, LP | arXiv 2025 |
| GraphGlue | paper | code | Text-free | Geometric pretraining | Manifold patch | GNN | FT, PL | N:1 | NC, GC, LP | ICLR 2026 |
| LLaGA | paper | code | Text-attributed | Alignment tuning | Node sequence | LLM | IT | N:N | NC, LP | ICML 2024 |
| LangGFM | paper | code | Mixed | Instruction tuning | Text | LLM | IT, ICL | N:N | NC, GC, LP | arXiv 2024 |
| PromptGFM | paper | code | Text-attributed | Multi-task instruction tuning | Node | LLM | IT | N:N | NC, LP | arXiv 2025 |
| OFA | paper | code | Text-attributed | Graph classification | Subgraph | GNN + LLM | PL, ICL | N:N | NC, GC, LP | ICLR 2024 |
| GraphGPT | paper | code | Text-attributed | Contrastive alignment + matching | Subgraph | GNN + LLM | IT | N:N | NC, LP | SIGIR 2024 |
| ZeroG | paper | code | Text-attributed | Semantic similarity | Node/Subgraph | GNN + LLM | - | N:N | NC | KDD 2024 |
| GOFA | paper | code | Text-attributed | Generative modeling | Node/Edge | GNN + LLM | IT | N:N | NC, GC, LP | ICLR 2025 |
| UniGraph | paper | code | Text-attributed | Text reconstruction | Node/Subgraph | GNN + LLM | IT, ICL | N:N | NC, GC, LP | KDD 2025 |
| BooG | paper | code | Text-attributed | Super-node matching | Subgraph | GNN + LLM | FT | N:N | NC, GC, LP | FCS 2025 |
| GRAVER | paper | code | Text-attributed | Subgraph similarity | Subgraph | GNN + LLM | PL | N:N | NC, GC | NeurIPS 2025 |
| SA²GFM | paper | code | Text-attributed | Subgraph similarity | Node + structural entropy | GNN + LLM | PL | N:N | NC, GC | AAAI 2026 |
Notes
- Adaptation: "FT": Fine-Tuning; "PL": Prompt Learning; "IT": Instruction Tuning; "ICL": In-Context Learning
- Task: "NC": Node Classification; "GC": Graph Classification; "LP": Link Prediction
| Data | Name | Description | Paper | Code | Web |
|---|---|---|---|---|---|
| Tabular | OpenTabs | Large-scale table dataset | paper | code | -- |
| Tabular | TabArena | Tabular benchmark | paper | code | https://tabarena.ai |
| Tabular | -- | Benchmarking Privacy Leakage | paper | -- | -- |
| Tabular | TabularFM | Open Framework | paper | code | https://tabularfm.github.io/ |
| Time Series | TSFM-Bench | Benchmark for TSFM | paper | code | -- |
| Time Series | LTSM-Bundle | Toolbox/Bench on LLM for Time Series Forecasting | paper | code | https://ltsm-doc.github.io/ |
| Time Series | PriceFM | Benchmark for Probabilistic Electricity Price Forecasting | paper | code | https://runyao-yu.com/PriceFM/ |
| Graph | TSGFM | Bench/Dataset for Text-space GFM | paper | code | -- |
| Graph | GFMBench | Benchmark + Pipeline | paper | code | https://ggfm.readthedocs.io/en/latest/ |
| Graph | GFMBenchmark | Codebase for GFM | paper | code | -- |
- Survey Paper: xxx
- Cite:
@article{XXX}
