
DryDew/AutoDV


Official repository of the ICLR 2026 paper: AutoDV: An End-to-End Deep Learning Model for High-Dimensional Data Visualization

Following the double-blind policy, we have removed path names that include the authors' names. Please set your own data paths manually.

This project requires several Python packages, including DGL, PyTorch Geometric, Keras, and other packages common in deep learning research.

  1. Use prepare_data/dataset_utils.py to downsample each dataset and generate the feature and pairwise-distance data.
  2. Use prepare_data/bo to search for the optimal Z* for t-SNE and UMAP.
  3. Use the gnn/LargeCompleteMVGraphDatasets.py object to compute positional encodings (PEs) via its "precompute" method.
  4. Set the paths carefully. There are four: cdist_path (path to the pairwise distances), visual_path (path to Z* from t-SNE), visual_path_umap (path to Z* from UMAP), and precomputed_pe_path (path to the pre-computed PEs).
  5. Use gnn/train_gin_large_complete_g_mv*.py to train the AutoDV model.
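Steps 1 and 4 above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the repo's actual preprocessing code: the downsampling and distance computation are stand-ins, and only the four `*_path` variable names come from the instructions above.

```python
import os
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))      # raw high-dimensional features (dummy data)

# Step 1: downsample to n points and compute the pairwise Euclidean distance matrix.
n = 200
idx = rng.choice(len(X), size=n, replace=False)
X_sub = X[idx]
diff = X_sub[:, None, :] - X_sub[None, :, :]
cdist = np.sqrt((diff ** 2).sum(axis=-1))  # shape (n, n)

# Step 4: the four paths the training scripts expect (locations are illustrative).
os.makedirs("data", exist_ok=True)
cdist_path = "data/cdist.npy"              # pairwise distances
visual_path = "data/z_tsne.npy"            # Z* from t-SNE
visual_path_umap = "data/z_umap.npy"       # Z* from UMAP
precomputed_pe_path = "data/pe.npy"        # pre-computed PEs

np.save(cdist_path, cdist)
```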

All raw datasets used in the project can be found at https://drive.google.com/drive/folders/1Cv15e5A5W5a9K8OZkAA-TVRURvMpdgbL?usp=sharing (an anonymous file-sharing link). The data preprocessing code can be found in the prepare_data folder. We clarify dataset provenance here to address potential copyright concerns. For image data, we access the raw MNIST, FMNIST, and CIFAR10 data through the PyTorch (torchvision) interface: https://docs.pytorch.org/vision/main/datasets.html.

For gene data, we download Campbell from https://github.com/perslab/campbell-2017. We download PBMC68k from https://www.10xgenomics.com/datasets/fresh-68-k-pbm-cs-donor-a-1-standard-1-1-0. We download Mouse Retina from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE201402. We download Baron Human from https://www.ncbi.nlm.nih.gov/gds/?term=GSE84133[Accession].

For tabular data from the UCI repository, we download each dataset by searching for its name at https://archive.ics.uci.edu/, which is publicly accessible.

Note that AutoDV has two modes: "parallel gt" and "non-parallel gt". In "parallel gt" mode, the graph transformers accept the k-view graphs at the very beginning and process the input in parallel with the GINs. We use this mode for AutoDV-UMAP.
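The difference between the two modes can be sketched with toy stand-ins. Everything below is illustrative, assuming a simple interpretation of the description above; the pooling functions are placeholders for the repo's GIN and graph-transformer modules, not their actual implementations.

```python
import numpy as np

rng = np.random.default_rng(0)

def gin_branch(views):
    # Placeholder for the GIN encoders: mean-pool each view's node features.
    return np.stack([v.mean(axis=0) for v in views]).mean(axis=0)

def gt_branch(views):
    # Placeholder for the graph transformers: max-pool each view's node features.
    return np.stack([v.max(axis=0) for v in views]).mean(axis=0)

k, n, d = 3, 10, 8
views = [rng.standard_normal((n, d)) for _ in range(k)]  # k-view graphs (dummy)

# "parallel gt": both branches see the raw k-view input; outputs are fused.
z_parallel = np.concatenate([gin_branch(views), gt_branch(views)])

# "non-parallel gt": the transformer consumes the GIN branch's output instead.
z_sequential = gt_branch([gin_branch(views)[None, :]])
```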

To evaluate the model, see the code in gnn/transfer_exp or gnn/large_res. gnn/runing_time_exp provides running-time experiments on dummy data.
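A running-time check on dummy data, in the spirit of gnn/runing_time_exp, might look like the hypothetical sketch below; the matrix product is a stand-in for a model forward pass and nothing here is the repo's actual benchmark.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 64))   # dummy high-dimensional input
W = rng.standard_normal((64, 2))      # dummy projection to 2-D

start = time.perf_counter()
for _ in range(10):
    Z = X @ W                         # stand-in for a model forward pass
elapsed = time.perf_counter() - start

print(f"10 dummy forward passes took {elapsed:.4f}s, output shape {Z.shape}")
```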

Some irrelevant code may remain; it dates from the early stages of this project.

Aside from the main training code, some scripts (data preparation, results reading, plotting) may not be well written, and some useful code may be commented out. Use them with care.

About

Full datasets used in AutoDV
