This repository introduces FRAME, a framework for learning fragment-based molecular representations to enhance the interpretability of graph neural networks in drug discovery. FRAME represents chemically meaningful fragments as graph nodes and is compatible with several GNN architectures, including GCN, GAT, and AttentiveFP. It also integrates Integrated Gradients to generate more transparent and chemically grounded model explanations.
- Clone the repo.
- Create and activate your `virtualenv` with Python 3.12, for example as described here.
- Install PyTorch 2.8.0 using:

      pip install torch==2.8.0 -f https://download.pytorch.org/whl/cu129

- Install FRAME using:

      python -m pip install .

  or for development:

      python -m pip install -e .

The CSV file used in FRAME must include the following columns:
- `id` – A unique identifier for each entry.
- `smiles` – The SMILES representation of the molecule.
- `label` – The target value or class associated with each molecule.
- `set` – Indicates the data split for each entry. This column must contain one of the following values:
  - `train` (training data)
  - `valid` (data used for early stopping)
  - `test` (external test data)

Please ensure that all entries follow this structure so the dataset can be correctly loaded and processed by the pipeline.
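
Before running the pipeline, it can help to check a CSV against the column rules above. The following is a minimal sketch using only the Python standard library; the function name `validate_rows` and the sample rows are illustrative assumptions and are not part of FRAME.

```python
import csv
import io

# Column names and split values taken from the list above.
REQUIRED_COLUMNS = {"id", "smiles", "label", "set"}
ALLOWED_SPLITS = {"train", "valid", "test"}


def validate_rows(handle):
    """Return a list of problems found in a FRAME-style CSV (empty if none).

    Hypothetical helper for illustration; FRAME's own loader may behave differently.
    """
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]

    problems = []
    seen_ids = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if row["id"] in seen_ids:
            problems.append(f"line {line_no}: duplicate id {row['id']!r}")
        seen_ids.add(row["id"])
        if row["set"] not in ALLOWED_SPLITS:
            problems.append(f"line {line_no}: unexpected set value {row['set']!r}")
    return problems


# Made-up sample data; the last row uses a split name that is not allowed.
sample = io.StringIO(
    "id,smiles,label,set\n"
    "mol-1,CCO,1,train\n"
    "mol-2,c1ccccc1,0,valid\n"
    "mol-3,CC(=O)O,1,holdout\n"
)
problems = validate_rows(sample)
print(problems)  # ["line 4: unexpected set value 'holdout'"]
```

To check a real file, pass an open file handle instead of the in-memory sample, for example `validate_rows(open("data.csv", newline=""))`.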
All model parameters and runtime settings are defined in a YAML configuration file.
An example file, parameters.yaml, is provided.
To enable hyperparameter optimization, define parameters using `min` and `max`:

    Tune:
      hidden_channels:
        min: 64
        max: 128

If you want to specify fixed values without optimization, use `value`:

    Tune:
      hidden_channels:
        value: 64

All entry points accept a `-c`/`--config` parameter pointing to the YAML config file.
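
As a rough illustration of the two `Tune` conventions described above, the sketch below separates parameters that define a search range from those that are fixed. It works on a plain dict shaped like the parsed YAML; the helper `split_tune_section` and the parameter `example_fixed_param` are hypothetical, and how FRAME itself interprets the section is not shown here.

```python
# A dict shaped like the YAML snippets above, as a YAML loader would produce it.
config = {
    "Tune": {
        "hidden_channels": {"min": 64, "max": 128},
        "example_fixed_param": {"value": 64},  # hypothetical name, for illustration only
    }
}


def split_tune_section(tune):
    """Split a Tune mapping into (search ranges, fixed values).

    Hypothetical helper; not part of FRAME.
    """
    ranges, fixed = {}, {}
    for name, spec in tune.items():
        if "value" in spec:
            fixed[name] = spec["value"]
        elif "min" in spec and "max" in spec:
            if spec["min"] > spec["max"]:
                raise ValueError(f"{name}: min exceeds max")
            ranges[name] = (spec["min"], spec["max"])
        else:
            raise ValueError(f"{name}: expected 'value' or both 'min' and 'max'")
    return ranges, fixed


ranges, fixed = split_tune_section(config["Tune"])
print(ranges)  # {'hidden_channels': (64, 128)}
print(fixed)   # {'example_fixed_param': 64}
```

A check like this catches entries that mix up the two forms, such as a parameter with `min` but no `max`, before a tuning run starts.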
- Generate a processed dataset:

      frame_gen -c parameters.yaml

- Run Optuna hyperparameter tuning:

      frame_tune -c parameters.yaml

- Train a single model using values in the `Tune` section:

      frame_train -c parameters.yaml

- Evaluate the trained model on the test set:

      frame_eval -c parameters.yaml

- Run model predictions and explain them:

      frame_explain -c parameters.yaml

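
The entry points above can also be chained from a small Python script. This is a minimal sketch that assumes the commands are on `PATH` after installation and that running them in the listed order is appropriate for your workflow; `build_commands` and `run_pipeline` are hypothetical helper names, not part of FRAME.

```python
import shlex
import subprocess

# Entry points in the order they are listed above.
STAGES = ["frame_gen", "frame_tune", "frame_train", "frame_eval", "frame_explain"]


def build_commands(config_path, skip=()):
    """Build one argument list per stage, leaving out any stage named in `skip`."""
    return [[stage, "-c", config_path] for stage in STAGES if stage not in skip]


def run_pipeline(config_path, skip=()):
    """Run each stage in turn and stop at the first one that exits with an error."""
    for command in build_commands(config_path, skip):
        print("running:", shlex.join(command))
        subprocess.run(command, check=True)


# Example: prepare the commands for a run that skips hyperparameter tuning.
commands = build_commands("parameters.yaml", skip=("frame_tune",))
for command in commands:
    print(shlex.join(command))
```

Calling `run_pipeline("parameters.yaml", skip=("frame_tune",))` would then execute the remaining stages. Because `check=True` raises `subprocess.CalledProcessError` on a non-zero exit code, later stages do not run against a failed earlier step.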