Methodology Report

Data Source

All single-task models were trained only on the dataset provided by the competition organizers.
The original dataset was used as provided, without any preprocessing.

Graph-Based Molecular Representation Learning

A pretrained CheMeleon bond message passing neural network (MPNN) (https://github.com/JacksonBurns/chemeleon) was used as the base molecular representation model.

For each of the nine target properties, an independent MPNN was fine-tuned using the corresponding target-specific training dataset. Each dataset was randomly split into training (90%) and validation (10%) sets.
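The random 90/10 split described above can be sketched as follows. This is a minimal illustration, not the notebook's code: the function name `random_split`, the seed value, and the toy records are assumptions made for the example.

```python
import random

def random_split(records, val_fraction=0.1, seed=0):
    """Shuffle a copy of `records` and split it into (train, val) lists."""
    shuffled = list(records)
    # A dedicated Random instance keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# Toy stand-in for one target-specific dataset: (SMILES, measured value) pairs.
toy_records = [(f"C{'C' * i}O", float(i)) for i in range(20)]
train_set, val_set = random_split(toy_records, val_fraction=0.1, seed=0)
```

With one such split per target, each of the nine fine-tuning runs sees its own training and validation partition.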

After training, the best-performing checkpoint for each target-specific MPNN was frozen and used exclusively as a feature extractor. The resulting molecular embeddings encode both molecular structural information and target-dependent patterns learned during supervised fine-tuning.

The same target-specific MPNN models were then used to extract embeddings for the shared external test set. These embeddings served as numerical feature vectors for downstream regression modeling.

TabM Regression Model

For downstream modeling, the TabM framework (https://github.com/yandex-research/tabm) was employed.

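TabM is a deep learning architecture for tabular data, built on an MLP backbone, that produces k predictions per input through parameter-efficient ensembling: its k submodels are trained together and share most of their weights rather than being fully independent. It is combined here with specialized embeddings for numerical features. The model integrates several key components: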

Piecewise-linear embeddings for numerical features, enabling adaptive partitioning of feature value ranges;
A compact TabM-mini backbone consisting of multiple computational blocks with residual connections and dropout regularization;
An ensemble of k submodels whose predictions are averaged, reducing variance and improving model robustness, particularly on small datasets.
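To make the piecewise-linear idea concrete, the sketch below encodes a single scalar feature against a set of bin edges. It shows only the encoding step, in plain Python; it is not the TabM library's implementation, and the learned projection that usually follows the encoding is omitted. The function name and the bin edges are hypothetical (in practice the edges would be derived from the training data).

```python
def piecewise_linear_encode(x, edges):
    """Encode scalar x against sorted bin edges [b0, b1, ..., bT].

    Returns T values: bins entirely below x hold 1.0, bins entirely above x
    hold 0.0, and the bin containing x holds its fractional position.
    The first and last bins are left unclamped on their outer side, so
    out-of-range values extrapolate linearly.
    """
    n_bins = len(edges) - 1
    encoded = []
    for t in range(n_bins):
        lo, hi = edges[t], edges[t + 1]
        ratio = (x - lo) / (hi - lo)
        if t > 0:
            ratio = max(ratio, 0.0)  # clamp to 0 when x is below this bin
        if t < n_bins - 1:
            ratio = min(ratio, 1.0)  # clamp to 1 when x is above this bin
        encoded.append(ratio)
    return encoded

edges = [0.0, 1.0, 2.0, 4.0]  # hypothetical bin edges for one feature
print(piecewise_linear_encode(0.5, edges))  # [0.5, 0.0, 0.0]
print(piecewise_linear_encode(3.0, edges))  # [1.0, 1.0, 0.5]
```

Each bin gets its own coordinate, so a downstream layer can treat different ranges of the same feature differently.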

Prior to training, input features were transformed using a QuantileTransformer, and the target variable was standardized (centered and scaled) using statistics computed on the training set.
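The target standardization step can be illustrated with a small sketch. The helper names and toy values are assumptions, as is the use of the population standard deviation; the feature transform is not shown. The point is that the mean and standard deviation come from the training targets and are then reused for validation data and for mapping predictions back to the original scale.

```python
import statistics

def fit_target_scaler(train_targets):
    """Compute mean and standard deviation from the training targets."""
    mean = statistics.fmean(train_targets)
    std = statistics.pstdev(train_targets)
    if std == 0.0:
        std = 1.0  # avoid division by zero for a constant target
    return mean, std

def standardize(values, mean, std):
    return [(v - mean) / std for v in values]

def inverse_standardize(values, mean, std):
    """Map standardized predictions back to the original target scale."""
    return [v * std + mean for v in values]

train_y = [1.0, 2.0, 3.0, 4.0]  # toy training targets
val_y = [2.5, 5.0]              # toy validation targets

mean, std = fit_target_scaler(train_y)
train_z = standardize(train_y, mean, std)
val_z = standardize(val_y, mean, std)  # reuses the training statistics
```

Predictions made in the standardized space would be passed through `inverse_standardize` before scoring or submission.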

Model optimization was performed using the AdamW optimizer with gradient clipping and an early stopping strategy based on validation performance. Final TabM predictions were obtained by ensembling models trained from five different random initializations.
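The sketch below illustrates two of the ideas in this paragraph in plain Python: patience-based early stopping on a validation curve, and combining predictions from several random initializations. The patience value, the toy validation curve, the toy predictions, and the choice of a simple mean as the combination rule are all assumptions made for illustration; the notebook may differ on each point.

```python
def best_epoch_with_early_stopping(val_losses, patience=3):
    """Return (best_epoch, best_loss), stopping once `patience` epochs
    have passed without an improvement in validation loss."""
    best_epoch, best_loss = -1, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_epoch, best_loss

def average_predictions(per_seed_predictions):
    """Average predictions element-wise across models (one list per seed)."""
    n_models = len(per_seed_predictions)
    return [sum(column) / n_models for column in zip(*per_seed_predictions)]

# Toy validation curve: training stops before the late improvement is seen.
val_curve = [0.90, 0.70, 0.65, 0.66, 0.68, 0.67, 0.50]
print(best_epoch_with_early_stopping(val_curve, patience=3))  # (2, 0.65)

# Toy predictions for two molecules from five differently seeded models.
per_seed = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.1, 2.1], [0.9, 1.9]]
final_predictions = average_predictions(per_seed)
```

The toy curve also shows the trade-off of early stopping: a later improvement is missed once the patience budget is exhausted.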
