HEXST: Hexagonal Shifted-Window Transformer for Spatial Transcriptomics Gene Expression Prediction

Official implementation of HEXST, a geometry-aligned Transformer for spatial gene expression prediction from histology.

Given spot-level image features and spatial coordinates, HEXST predicts spot-wise gene expression.

The implementation uses:

model/HEXST.py: main HEXST architecture
model/hex.py: hexagonal coordinate conversion, window construction, slot packing
model/pos_embed.py: HexRoPE implementation
utils/loss.py: MSE, Pearson loss, deviation-matching loss, and feature-alignment loss
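
For readers unfamiliar with hexagonal grids, the following is a minimal, generic sketch of the kind of operations model/hex.py is described as handling (coordinate conversion and window construction). It is not the repository's code: the function names, the odd-r offset layout, and the radius-based window are assumptions chosen for illustration, and the actual layout used by HEXST may differ.

```python
# Generic hexagonal-grid helpers (illustrative; NOT taken from model/hex.py).
# Assumption: spots are indexed by (row, col) in an "odd-r" offset layout,
# where odd rows are shifted half a spot to the right.

def offset_to_axial(row: int, col: int) -> tuple[int, int]:
    """Convert odd-r offset coordinates to axial (q, r) coordinates."""
    q = col - (row - (row & 1)) // 2
    return q, row


def hex_distance(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of hexagonal steps between two axial coordinates."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dq + dr) + abs(dr)) // 2


def hex_window(center: tuple[int, int], radius: int) -> list[tuple[int, int]]:
    """All axial coordinates within `radius` steps of `center`."""
    cq, cr = center
    cells = []
    for dq in range(-radius, radius + 1):
        # Bounds on dr that keep the cell inside the hexagonal window.
        lo = max(-radius, -dq - radius)
        hi = min(radius, -dq + radius)
        for dr in range(lo, hi + 1):
            cells.append((cq + dq, cr + dr))
    return cells
```

Under these assumptions a window of radius R contains 1 + 3R(R + 1) cells, so radius 1 covers a spot and its six neighbours.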

Installation

The spatial gene expression prediction experiments in the paper were run with Python 3.10 and PyTorch 2.6.

Create a conda environment:

conda create -n hexst python=3.10 -y
conda activate hexst

Install PyTorch according to your CUDA environment, then install the remaining dependencies:

pip install torch torchvision
pip install numpy scipy scikit-learn pandas h5py pyyaml pillow tqdm
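
To confirm that the environment contains the packages listed above, a small check like the following may help. It is a sketch, not part of the repository; the helper name installed_versions is hypothetical, and the package list is simply copied from the pip commands above.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Distribution names copied from the pip commands in this README.
PACKAGES = [
    "torch", "torchvision", "numpy", "scipy", "scikit-learn",
    "pandas", "h5py", "pyyaml", "pillow", "tqdm",
]


def installed_versions(packages: list[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    print(f"python {sys.version_info.major}.{sys.version_info.minor}")
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:14s} {ver or 'NOT INSTALLED'}")
```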

Data Preparation

The training and evaluation scripts expect pre-cached .pt files.

The caching script, dataloader/data_cacheing.py, contains project-specific absolute paths and needs to be edited before use.
The current implementation assumes pre-extracted UNI image features and scFoundation transcriptomic embeddings.

Each cached file should contain the following structure:

{
    "train": {
        "UNI_feats": Tensor[N_train, 1024],
        "scFM_feats": Tensor[N_train, 3072],
        "metadata": [
            [img_path, spot_id, split, slide_id, gene_expression, coords],
            ...
        ],
    },
    "val": {
        "UNI_feats": Tensor[N_val, 1024],
        "scFM_feats": Tensor[N_val, 3072],
        "metadata": [...],
    },
    "test": {
        "UNI_feats": Tensor[N_test, 1024],
        "scFM_feats": Tensor[N_test, 3072],
        "metadata": [...],
    },
    "num_genes": 128
}
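
Before launching training, it can help to verify that a cached file matches this layout. The sketch below is not part of the repository: check_cache is a hypothetical helper, and the feature widths (1024 and 3072) are taken from the structure above. It only relies on each feature object exposing a .shape attribute, as PyTorch tensors do.

```python
# Hypothetical validator for the cache layout described in this README.
EXPECTED_DIMS = {"UNI_feats": 1024, "scFM_feats": 3072}
SPLITS = ("train", "val", "test")


def check_cache(cache: dict) -> dict:
    """Return the number of spots per split, raising ValueError on a mismatch."""
    if "num_genes" not in cache:
        raise ValueError("missing top-level key 'num_genes'")
    counts = {}
    for split in SPLITS:
        if split not in cache:
            raise ValueError(f"missing split '{split}'")
        block = cache[split]
        n_spots = len(block["metadata"])
        # Each feature tensor should have one row per metadata entry.
        for key, dim in EXPECTED_DIMS.items():
            shape = tuple(block[key].shape)
            if shape != (n_spots, dim):
                raise ValueError(
                    f"{split}/{key}: expected {(n_spots, dim)}, got {shape}"
                )
        # Each metadata row should carry the six fields shown above.
        for i, row in enumerate(block["metadata"]):
            if len(row) != 6:
                raise ValueError(f"{split}/metadata[{i}]: expected 6 fields")
        counts[split] = n_spots
    return counts
```

Typical usage would be check_cache(torch.load(path)). Depending on how the file was written, torch.load may need weights_only=False to unpickle non-tensor metadata.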

The current dataloader reads the following fields:

UNI_feats: spot-level pathology foundation model features
scFM_feats: transcriptomic embeddings used for feature alignment
metadata[i][3]: slide ID
metadata[i][4]: target gene expression vector
metadata[i][5]: spot coordinate
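
The index positions above can be made explicit with a small unpacking helper. The sketch below groups spot indices by slide ID, which is one plausible way to organise spots before building per-slide inputs. It is not the repository's dataloader, and the names SpotRecord and group_by_slide are hypothetical.

```python
from collections import defaultdict
from typing import Any, NamedTuple


class SpotRecord(NamedTuple):
    """Named view of one metadata row, using the field order listed above."""
    img_path: Any
    spot_id: Any
    split: Any
    slide_id: Any          # metadata[i][3]
    gene_expression: Any   # metadata[i][4]
    coords: Any            # metadata[i][5]


def group_by_slide(metadata: list) -> dict:
    """Map each slide ID to the list of spot indices that belong to it."""
    groups = defaultdict(list)
    for i, row in enumerate(metadata):
        record = SpotRecord(*row)
        groups[record.slide_id].append(i)
    return dict(groups)
```

Assuming row i of each feature tensor corresponds to metadata[i], the returned indices can be used to select the matching rows of UNI_feats and scFM_feats for a given slide.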

Supported Dataset Configs

Dataset configs are provided under config/data/:

abalo_human_squamous_cell_carcinoma.yaml
erickson_human_prostate_cancer_p1.yaml
mirzazadeh_mouse_bone.yaml
mirzazadeh_mouse_brain_p1.yaml
mirzazadeh_mouse_brain_p2.yaml
vicari_mouse_brain.yaml
villacampa_lung_organoid.yaml

Additional configs are also included:

mirzazadeh_human_small_intestine.yaml
vicari_human_striatium.yaml
villacampa_mouse_brain.yaml

Training

Train HEXST on a single dataset:

python train.py \
  --base_config ./config/baseline.yaml \
  --data_config ./config/data/abalo_human_squamous_cell_carcinoma.yaml \
  --model_config ./config/model/HEXST.yaml \
  --loss_function_config ./config/loss/function/MSEPL.yaml \
  --loss_mode_config ./config/loss/mode/DIOR.yaml \
  --loss_type_config ./config/loss/type/IF.yaml

Checkpoints and logs are saved to:

./results/<project>/HEXST_MSEPL_DIOR_IF/
├── model_best.pth
├── model_last.pth
└── log.txt
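
To train on several datasets in sequence, the training command above can be generated once per dataset config. The following is a sketch rather than a script shipped with the repository: build_train_cmd and train_all are hypothetical helpers, and they only reuse the flags and paths shown in that command.

```python
import subprocess
from pathlib import Path

# Flags and paths copied from the training command in this README.
LOSS_AND_MODEL_ARGS = [
    ("--model_config", "./config/model/HEXST.yaml"),
    ("--loss_function_config", "./config/loss/function/MSEPL.yaml"),
    ("--loss_mode_config", "./config/loss/mode/DIOR.yaml"),
    ("--loss_type_config", "./config/loss/type/IF.yaml"),
]


def build_train_cmd(data_config: str, script: str = "train.py") -> list[str]:
    """Build the argument list for one run on the given dataset config."""
    cmd = [
        "python", script,
        "--base_config", "./config/baseline.yaml",
        "--data_config", data_config,
    ]
    for flag, value in LOSS_AND_MODEL_ARGS:
        cmd += [flag, value]
    return cmd


def train_all(config_dir: str = "./config/data") -> None:
    """Run training once per YAML file found in config_dir, stopping on failure."""
    for path in sorted(Path(config_dir).glob("*.yaml")):
        subprocess.run(build_train_cmd(str(path)), check=True)
```

Passing script="eval.py" yields the matching evaluation command, since this README shows both scripts taking the same flags.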

Evaluation

Evaluate a trained model on the test split:

python eval.py \
  --base_config ./config/baseline.yaml \
  --data_config ./config/data/abalo_human_squamous_cell_carcinoma.yaml \
  --model_config ./config/model/HEXST.yaml \
  --loss_function_config ./config/loss/function/MSEPL.yaml \
  --loss_mode_config ./config/loss/mode/DIOR.yaml \
  --loss_type_config ./config/loss/type/IF.yaml

Predicted expression tensors are saved to:

/data/SpaRED_pred/HEXST/<project>/HEXST_MSEPL_DIOR_IF/<slide_id>.pt

The evaluation code reports:

PCC_F: gene-wise Pearson correlation
PCC_S: spot-wise Pearson correlation
PCC_M: matrix-level Pearson correlation
MI_F: gene-wise mutual information
MI_M: matrix-level mutual information
NRMSE_F
NRMSE_M
AUC_0vNZ
AUC_Q50
JSDIV_M
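
The three Pearson variants differ in the axis along which correlation is computed. The dependency-free sketch below illustrates that distinction on a spots-by-genes matrix given as nested lists. It is not the repository's evaluation code: the function names, the averaging over genes or spots, and the choice to skip constant vectors are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; None if undefined."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return None  # correlation is undefined for a constant vector
    return cov / math.sqrt(vx * vy)


def _mean_defined(values):
    """Average the values that are not None (assumed handling of constants)."""
    kept = [v for v in values if v is not None]
    return sum(kept) / len(kept) if kept else float("nan")


def pcc_gene_wise(pred, true):
    """Correlate each gene (column) across spots, then average over genes."""
    n_genes = len(pred[0])
    return _mean_defined(
        pearson([row[g] for row in pred], [row[g] for row in true])
        for g in range(n_genes)
    )


def pcc_spot_wise(pred, true):
    """Correlate each spot (row) across genes, then average over spots."""
    return _mean_defined(pearson(p, t) for p, t in zip(pred, true))


def pcc_matrix(pred, true):
    """Correlate the flattened prediction and target matrices."""
    flat_p = [v for row in pred for v in row]
    flat_t = [v for row in true for v in row]
    return pearson(flat_p, flat_t)
```

The gene-wise and spot-wise values can disagree sharply on the same prediction, which is one reason to report both alongside the matrix-level value.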

Third-party Resources and Data

The current implementation assumes pre-extracted UNI image features and scFoundation transcriptomic embeddings. The spatial transcriptomics benchmark datasets and splits follow SpaRED.

UNI
Chen, Richard J., et al. "Towards a general-purpose foundation model for computational pathology." Nature Medicine 30.3 (2024): 850–862.
We use the official implementation from: https://github.com/mahmoodlab/UNI
scFoundation
Hao, Minsheng, et al. "Large-scale foundation model on single-cell transcriptomics." Nature Methods 21.8 (2024): 1481–1491.
We use the official implementation from: https://github.com/biomap-research/scFoundation
SpaRED
Mejia, Gabriel, et al. "Enhancing gene expression prediction from histology images with spatial transcriptomics completion." International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2024.
We use the SpaRED benchmark data and splits from: https://bcv-uniandes.github.io/spared_webpage/

Please refer to the original repositories, papers, and dataset webpage for license terms, model access, data access, and usage restrictions.

License

This repository is released under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Commercial use is not permitted. Non-commercial research and educational use is permitted with appropriate attribution.
This license applies only to the original HEXST source code in this repository. Third-party resources are subject to their own licenses and usage restrictions.

Citation

If you find this repository useful, please cite HEXST:

@article{byeon2026hexst,
  title   = {HEXST: Hexagonal Shifted-Window Transformer for Spatial Transcriptomics Gene Expression Prediction},
  author  = {Byeon, Keunho and Kwak, Jin Tae},
  journal = {arXiv preprint arXiv:2605.04682},
  year    = {2026}
}

Acknowledgements

This work was supported by a grant of the National Research Foundation of Korea (NRF) (No. RS-2025-00558322 and RS-2024-00397293) and the AI Computing Infrastructure Enhancement (GPU Rental Support) User Support Program funded by the Ministry of Science and ICT (MSIT) (No. RQT-25-120213), Republic of Korea.
