Official/author-maintained partial code release for:
MoLA: Molecular multimodal layerwise adaptive network for molecular property prediction
Jiayi Li, Zihang Zhang, Zhenyu Lei, Jiujun Cheng, Lianbo Ma, Cong Liu, Shangce Gao
Knowledge-Based Systems, Volume 338, 115563, 2026.
- DOI: https://doi.org/10.1016/j.knosys.2026.115563
- ScienceDirect: https://www.sciencedirect.com/science/article/pii/S0950705126003059
- Share Link: https://authors.elsevier.com/c/1md~S3OAb9Kb5p
- Multimodal molecular representation learning with:
  - molecular graph features
  - Morgan fingerprint features
  - SMILES sequence features
  - optional MoLFormer embeddings
- Layerwise adaptive fusion with cross-layer attention.
- Reproducible training script for MoleculeNet classification tasks.
- Built-in ablation switch for MoLFormer: `--use-molformer` (default) / `--no-molformer`
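The SMILES-sequence branch can be illustrated with a minimal character-level encoder. This is a sketch, not the repository's actual featurizer; the vocabulary construction, special tokens, and padding scheme are assumptions.

```python
# Minimal character-level SMILES encoding sketch (illustrative only;
# the real featurizer used by main_Cla_MoLA.py may tokenize differently).
def build_vocab(smiles_list):
    """Build a character vocabulary with pad/unk special tokens."""
    chars = sorted({ch for s in smiles_list for ch in s})
    vocab = {"<pad>": 0, "<unk>": 1}
    vocab.update({ch: i + 2 for i, ch in enumerate(chars)})
    return vocab

def encode_smiles(smiles, vocab, max_len=128):
    """Map each character to an integer id, then truncate/right-pad to max_len."""
    ids = [vocab.get(ch, vocab["<unk>"]) for ch in smiles]
    ids = ids[:max_len]                              # truncate long sequences
    ids += [vocab["<pad>"]] * (max_len - len(ids))   # right-pad short ones
    return ids

if __name__ == "__main__":
    vocab = build_vocab(["CCO", "c1ccccc1O"])
    print(encode_smiles("CCO", vocab, max_len=8))  # 3 character ids, then pad zeros
```

The `max_len` parameter plays the same role as the `--max-len` training flag: all sequences in a batch are brought to a fixed length.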
```
.
|-- main_Cla_MoLA.py                  # training/evaluation entry
|-- model_MoLA.py                     # MoLA architecture
|-- prepare_molformer_embeddings.py   # build MoLFormer embeddings (reference utility)
|-- prepare_MoLFormer.py              # backward-compatible launcher
|-- environment.yml                   # base conda environment specification
|-- setup_env.sh                      # one-command environment bootstrap
|-- utils.py                          # metrics (PRC-AUC / ROC-AUC)
|-- datasets/
|   |-- featurized/                   # DeepChem cached data
|   `-- Molformer_OUTPUT/             # optional MoLFormer embeddings
|-- result_Cla/                       # training logs
`-- weights/                          # checkpoints
```
Requirements:

- conda (or mamba)
- Python 3.10
- CUDA 12.1 (for GPU training with the default setup)
- Linux/macOS shell (or Git Bash on Windows) for `setup_env.sh`

Setup:

```bash
bash setup_env.sh
conda activate chem
```

The training script uses DeepChem MoleculeNet loaders and expects:
- DeepChem cache directory: `./datasets/featurized/...`
- (Optional) MoLFormer embedding file: `./datasets/Molformer_OUTPUT/<dataset_name>/Molformer_Emb_2025.h5`
  - H5 keys: `train_fp`, `valid_fp`, `test_fp`

This repository provides `prepare_molformer_embeddings.py` to compute the embeddings and save them to `datasets/Molformer_OUTPUT/<dataset_name>/Molformer_Emb_2025.h5`.
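The expected H5 layout (datasets `train_fp`, `valid_fp`, `test_fp`) can be sketched with `h5py`. The embedding width and array contents below are placeholders, not values from the repository.

```python
import numpy as np
import h5py

def write_embedding_file(path, train, valid, test):
    """Write the three splits under the key names the training script expects."""
    with h5py.File(path, "w") as f:
        f.create_dataset("train_fp", data=train)
        f.create_dataset("valid_fp", data=valid)
        f.create_dataset("test_fp", data=test)

def read_embedding_file(path):
    """Load all three splits back into NumPy arrays."""
    with h5py.File(path, "r") as f:
        return {k: f[k][()] for k in ("train_fp", "valid_fp", "test_fp")}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Dummy embeddings: one row per molecule; the 768-dim width is an assumption.
    splits = {name: rng.standard_normal((n, 768)).astype("float32")
              for name, n in (("train", 10), ("valid", 2), ("test", 3))}
    write_embedding_file("Molformer_Emb_demo.h5",
                         splits["train"], splits["valid"], splits["test"])
    print({k: v.shape for k, v in read_embedding_file("Molformer_Emb_demo.h5").items()})
```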
Notes:

- This script is a project helper for MoLA and is provided for reference.
- It must be used together with the official IBM MoLFormer release.
Run:

```bash
python prepare_molformer_embeddings.py
```

Training with MoLFormer embeddings (default):

```bash
python main_Cla_MoLA.py \
    --use-molformer \
    --run-times 1 \
    --epochs 300 \
    --batch-size 256
```

Ablation without MoLFormer:

```bash
python main_Cla_MoLA.py \
    --no-molformer \
    --run-times 1 \
    --epochs 300 \
    --batch-size 256
```

- Dataset loaders currently configured: `bace_classification`, `bbbp`, `clintox`, `muv`, `sider`, `tox21`
- `main_Cla_MoLA.py` does not require `--datasets`; it runs the built-in dataset list.
- Seed control: `--seed` (default: 2025)
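A typical way such a `--seed` flag is applied is to seed every random-number source at startup. This is a hedged sketch; how `main_Cla_MoLA.py` actually seeds is not visible in this snapshot.

```python
import os
import random

import numpy as np

def set_seed(seed: int = 2025) -> None:
    """Seed Python, NumPy, and the hash seed so repeated runs are comparable.
    If the repo uses PyTorch, torch.manual_seed(seed) and
    torch.cuda.manual_seed_all(seed) would also belong here (assumption)."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)

if __name__ == "__main__":
    set_seed(2025)
    a = np.random.rand(3)
    set_seed(2025)
    b = np.random.rand(3)
    assert np.allclose(a, b)  # identical draws after re-seeding
```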
- Typical output paths:
  - logs: `result_Cla/<model_name>/result_<model_name>_<run_id>.txt`
  - checkpoints: `weights/<model_name>_<dataset>_<run_id>.pth`
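The log and checkpoint path patterns can be built with small helpers like the following. These helper names are hypothetical, for illustration only.

```python
import os

def log_path(model_name: str, run_id: int) -> str:
    """Path matching result_Cla/<model_name>/result_<model_name>_<run_id>.txt."""
    return os.path.join("result_Cla", model_name,
                        f"result_{model_name}_{run_id}.txt")

def ckpt_path(model_name: str, dataset: str, run_id: int) -> str:
    """Path matching weights/<model_name>_<dataset>_<run_id>.pth."""
    return os.path.join("weights", f"{model_name}_{dataset}_{run_id}.pth")

if __name__ == "__main__":
    print(log_path("MoLA", 1))             # e.g. result_Cla/MoLA/result_MoLA_1.txt on POSIX
    print(ckpt_path("MoLA", "bbbp", 1))
```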
Command-line arguments:

- `--model-name`: experiment name prefix
- `--run-times`: repeated runs
- `--epochs`: maximum epochs
- `--batch-size`: mini-batch size
- `--embed-dim`: hidden width
- `--num-layers`: encoder depth
- `--max-len`: SMILES max length
- `--fp-bits`: Morgan fingerprint bits (default 2048)
- `--lr`: learning rate
- `--weight-decay`: weight decay
- `--scheduler-patience`: LR scheduler patience
- `--early-stop-patience`: early-stopping patience
- `--device`: `cuda` or `cpu`
- `--use-molformer` / `--no-molformer`: whether to use MoLFormer embeddings
This repository is a partial release and currently includes limited prepared data (mainly BACE-related files in this snapshot). It focuses on core model/training logic and reproducible experiments.
If you use this codebase, please cite:
```bibtex
@article{li2026mola,
  title   = {{MoLA}: Molecular multimodal layerwise adaptive network for molecular property prediction},
  author  = {Li, Jiayi and Zhang, Zihang and Lei, Zhenyu and Cheng, Jiujun and Ma, Lianbo and Liu, Cong and Gao, Shangce},
  journal = {Knowledge-Based Systems},
  volume  = {338},
  pages   = {115563},
  year    = {2026},
  doi     = {10.1016/j.knosys.2026.115563}
}
```