MoLA: Molecular Multimodal Layerwise Adaptive Network


Official/author-maintained partial code release for:

MoLA: Molecular multimodal layerwise adaptive network for molecular property prediction
Jiayi Li, Zihang Zhang, Zhenyu Lei, Jiujun Cheng, Lianbo Ma, Cong Liu, Shangce Gao
Knowledge-Based Systems, Volume 338, Article 115563, 2026.

Highlights

  • Multimodal molecular representation learning with:
    • molecular graph features
    • Morgan fingerprint features
    • SMILES sequence features
    • optional MoLFormer embeddings
  • Layerwise adaptive fusion with cross-layer attention.
  • Reproducible training script for MoleculeNet classification tasks.
  • Built-in ablation switch for MoLFormer:
    • --use-molformer (default)
    • --no-molformer
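
The exact fusion module lives in model_MoLA.py; as a rough, stand-alone illustration of the idea, cross-layer attention can score each layer's representation and take a softmax-weighted sum. The scoring vector `score_w` below stands in for a learned attention parameter and is purely hypothetical:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def layerwise_fusion(layer_feats, score_w):
    # layer_feats: list of L vectors (one per encoder layer), each of length d.
    # score_w: length-d scoring vector (stand-in for a learned parameter).
    scores = [sum(w * x for w, x in zip(score_w, h)) for h in layer_feats]
    alphas = softmax(scores)                      # attention weight per layer
    d = len(layer_feats[0])
    # Weighted sum of layer representations.
    return [sum(a * h[i] for a, h in zip(alphas, layer_feats)) for i in range(d)]
```

With a strongly peaked score the fused vector tracks the dominant layer; with uniform scores it averages all layers.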

Repository Structure

.
|-- main_Cla_MoLA.py        # training/evaluation entry
|-- model_MoLA.py           # MoLA architecture
|-- prepare_molformer_embeddings.py  # build MoLFormer embeddings (reference utility)
|-- prepare_MoLFormer.py    # backward-compatible launcher
|-- environment.yml         # base conda environment specification
|-- setup_env.sh            # one-command environment bootstrap
|-- utils.py                # metrics (PRC-AUC / ROC-AUC)
|-- datasets/
|   |-- featurized/         # DeepChem cached data
|   `-- Molformer_OUTPUT/   # optional MolFormer embeddings
|-- result_Cla/             # training logs
`-- weights/                # checkpoints
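
As context for the metrics in utils.py: ROC-AUC equals the probability that a randomly chosen positive is scored above a randomly chosen negative (the rank-sum identity). The snippet below is an illustrative self-contained version, not the repository's implementation:

```python
def roc_auc(labels, scores):
    # ROC-AUC via the Mann-Whitney U statistic: count pairwise "wins"
    # of positive scores over negative scores (ties count as half).
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```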

Environment

Requirements

  • conda (or mamba)
  • Python 3.10
  • CUDA 12.1 (for GPU training with the default setup)
  • Linux/macOS shell (or Git Bash on Windows) for setup_env.sh

Quick Setup (recommended)

bash setup_env.sh
conda activate chem

Data Preparation

The training script uses DeepChem MoleculeNet loaders and expects:

  1. DeepChem cache directory:
    • ./datasets/featurized/...
  2. (Optional) MoLFormer embedding file:
    • ./datasets/Molformer_OUTPUT/<dataset_name>/Molformer_Emb_2025.h5
  3. H5 keys:
    • train_fp, valid_fp, test_fp
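
Assuming a standard h5py layout with the three keys above, the embedding file can be written and read back as follows. The path, array shapes, and the 768-dimensional hidden size are illustrative assumptions, not values taken from the released code:

```python
import h5py
import numpy as np

path = "Molformer_Emb_2025.h5"  # illustrative local path

# Write dummy embeddings under the three expected keys.
with h5py.File(path, "w") as f:
    for split in ("train_fp", "valid_fp", "test_fp"):
        f.create_dataset(split, data=np.random.rand(8, 768).astype("float32"))

# Read one split back the way a training script might.
with h5py.File(path, "r") as f:
    train_emb = f["train_fp"][:]
```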

Build MoLFormer embeddings (reference utility)

This repository provides prepare_molformer_embeddings.py to compute embeddings and save:

  • datasets/Molformer_OUTPUT/<dataset_name>/Molformer_Emb_2025.h5

Notes:

  • This script is a project helper for MoLA and is provided for reference.
  • It must be used together with the official IBM MoLFormer release.

Run:

python prepare_molformer_embeddings.py

Quick Start

1) Train with MoLFormer embeddings

python main_Cla_MoLA.py \
  --use-molformer \
  --run-times 1 \
  --epochs 300 \
  --batch-size 256

2) Train without MoLFormer embeddings

python main_Cla_MoLA.py \
  --no-molformer \
  --run-times 1 \
  --epochs 300 \
  --batch-size 256

Reproducibility Notes

  • Dataset loaders currently configured:
    • bace_classification, bbbp, clintox, muv, sider, tox21
  • main_Cla_MoLA.py does not require --datasets; it runs the built-in dataset list.
  • Seed control:
    • --seed (default: 2025)
  • Typical output paths:
    • logs: result_Cla/<model_name>/result_<model_name>_<run_id>.txt
    • checkpoints: weights/<model_name>_<dataset>_<run_id>.pth
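
For completeness, a minimal seeding helper in the spirit of --seed. This is a hypothetical sketch (the actual script's seeding may differ); it covers Python's RNG and extends to NumPy/PyTorch when those packages are present:

```python
import random

def set_seed(seed=2025):
    # Seed the stdlib RNG; optionally seed NumPy and PyTorch if installed.
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass
```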

Main Arguments

  • --model-name: experiment name prefix
  • --run-times: repeated runs
  • --epochs: maximum epochs
  • --batch-size: mini-batch size
  • --embed-dim: hidden width
  • --num-layers: encoder depth
  • --max-len: SMILES max length
  • --fp-bits: Morgan fingerprint bits (default 2048)
  • --lr: learning rate
  • --weight-decay: weight decay
  • --scheduler-patience: LR scheduler patience
  • --early-stop-patience: early stop patience
  • --device: cuda or cpu
  • --use-molformer / --no-molformer: whether to use MoLFormer embeddings
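
The paired --use-molformer / --no-molformer switch can be reproduced with plain argparse. The parser below is a hypothetical reconstruction covering a subset of the flags; defaults other than those stated in this README are guesses:

```python
import argparse

def build_parser():
    # Illustrative reconstruction of the CLI; flag names follow the README.
    p = argparse.ArgumentParser(description="MoLA training (sketch)")
    p.add_argument("--model-name", default="MoLA")
    p.add_argument("--run-times", type=int, default=1)
    p.add_argument("--epochs", type=int, default=300)
    p.add_argument("--batch-size", type=int, default=256)
    p.add_argument("--fp-bits", type=int, default=2048)   # stated in README
    p.add_argument("--seed", type=int, default=2025)      # stated in README
    p.add_argument("--device", default="cuda")
    # Paired boolean flags sharing one destination.
    p.add_argument("--use-molformer", dest="use_molformer",
                   action="store_true", default=True)
    p.add_argument("--no-molformer", dest="use_molformer",
                   action="store_false")
    return p
```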

Current Release Scope

This repository is a partial release: the current snapshot ships limited prepared data (mainly BACE-related files) and focuses on the core model/training logic and reproducible experiments.

Citation

If you use this codebase, please cite:

@article{li2026mola,
  title   = {{MoLA}: Molecular multimodal layerwise adaptive network for molecular property prediction},
  author  = {Li, Jiayi and Zhang, Zihang and Lei, Zhenyu and Cheng, Jiujun and Ma, Lianbo and Liu, Cong and Gao, Shangce},
  journal = {Knowledge-Based Systems},
  volume  = {338},
  pages   = {115563},
  year    = {2026},
  doi     = {10.1016/j.knosys.2026.115563}
}
