TumFlow

This repository contains the code used to generate the results reported in the paper: TumFlow: An AI Model for Predicting New Anticancer Molecules.

@Article{ijms25116186,
AUTHOR = {Rigoni, Davide and Yaddehige, Sachithra and Bianchi, Nicoletta and Sperduti, Alessandro and Moro, Stefano and Taccioli, Cristian},
TITLE = {TumFlow: An AI Model for Predicting New Anticancer Molecules},
JOURNAL = {International Journal of Molecular Sciences},
VOLUME = {25},
YEAR = {2024},
NUMBER = {11},
ARTICLE-NUMBER = {6186},
URL = {https://www.mdpi.com/1422-0067/25/11/6186},
ISSN = {1422-0067},
ABSTRACT = {Melanoma is the fifth most common cancer in the United States. Conventional drug discovery methods are inherently time-consuming and costly, which imposes significant limitations. However, the advent of Artificial Intelligence (AI) has opened up new possibilities for simulating and evaluating numerous drug candidates, thereby mitigating the requisite time and resources. In this context, normalizing flow models by employing machine learning techniques to create new molecular structures holds promise for accelerating the discovery of effective anticancer therapies. This manuscript introduces TumFlow, a novel AI model designed to generate new molecular entities with potential therapeutic value in cancer treatment. It has been trained on the NCI-60 dataset, encompassing thousands of molecules tested across 60 tumour cell lines, with an emphasis on the melanoma SK-MEL-28 cell line. The model successfully generated new molecules with predicted improved efficacy in inhibiting tumour growth while being synthetically feasible. This represents a significant advancement over conventional generative models, which often produce molecules that are challenging or impossible to synthesize. Furthermore, TumFlow has also been utilized to optimize molecules known for their efficacy in clinical melanoma treatments. This led to the creation of novel molecules with a predicted enhanced likelihood of effectiveness against melanoma, currently undocumented on PubChem.},
DOI = {10.3390/ijms25116186}
}

Overall Model

Dependencies

This project uses a conda environment. The root folder contains the .yml file (tumflow_env.yml) that defines it. Note that some dependency versions can cause problems when the environment is created. For this reason, although the setup.bash file can configure the project, it is better to configure the environment manually or to use the Dockerfile.

Structure

The project is structured as follows:

data: contains the code that preprocesses the cleaned NCI-60 dataset;
mflow: contains the model code;
results: contains the checkpoints and the results.

Usage

Data Download

First, you need to download the necessary files and configure the conda environment by running the following commands:

bash setup.bash install         # install env
bash setup.bash download        # download dataset
#bash setup.bash uninstall      # delete the conda env
conda activate tumflow          # activate conda env

Data Pre-processing

To build the datasets, run the following command:

python tumflow.py preprocess --data_name melanoma_skmel28

Model Training

To train the model, first use:

python tumflow.py train --data_name melanoma_skmel28  \
                        --batch_size  256  \
                        --max_epochs 300 \
                        --gpu 0  \
                        --debug False  \
                        --save_dir [EXP_FOLDER] \
                        --b_n_flow 7  \
                        --b_hidden_ch '128,128'  \
                        --a_n_flow 130 \
                        --a_hidden_gnn 128  \
                        --a_hidden_lin '128,128'  \
                        --mask_row_size_list 1 \
                        --mask_row_stride_list 1 \
                        --noise_scale 0.6 \
                        --b_conv_lu 2 \
                        --num_workers 32 \
                        --save_interval 50

Then:

python tumflow.py train_optimizer   -snapshot [BEST_EPOCH]  \
                                    --hyperparams_path tumflow-params.json \
                                    --batch_size 256 \
                                    --model_dir [EXP_FOLDER]   \
                                    --gpu 0 \
                                    --max_epochs 10  \
                                    --weight_decay 1e-3  \
                                    --data_name melanoma_skmel28  \
                                    --hidden 100,10  \
                                    --temperature 1.0  \
                                    --property_name AVERAGE_GI50

where:

[BEST_EPOCH] is the model snapshot with the best results obtained in the first step, e.g., model_snapshot_epoch_3.pt;
[EXP_FOLDER] is the folder in which the model results are saved, e.g., ./results/melanoma_skmel28_v2_256.
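
When choosing a value for [BEST_EPOCH], it can help to see which snapshots an experiment folder actually contains. The following is a minimal sketch, not part of the repository: it assumes snapshot files follow the name pattern suggested by the example above (model_snapshot_epoch_N.pt), and the helper name list_snapshots is hypothetical. It only lists the files; deciding which epoch gave the best results is still up to you.

```python
import re
from pathlib import Path

# Assumed file-name pattern, inferred from the example "model_snapshot_epoch_3.pt".
SNAPSHOT_RE = re.compile(r"model_snapshot_epoch_(\d+)\.pt$")


def list_snapshots(exp_folder):
    """Return (epoch, file name) pairs found in exp_folder, sorted by epoch."""
    found = []
    for path in Path(exp_folder).iterdir():
        match = SNAPSHOT_RE.match(path.name)
        if match:
            found.append((int(match.group(1)), path.name))
    return sorted(found)


if __name__ == "__main__":
    # Example folder name taken from the text above; adjust it to your own run.
    folder = Path("./results/melanoma_skmel28_v2_256")
    if folder.is_dir():
        for epoch, name in list_snapshots(folder):
            print(epoch, name)
```

Sorting by the parsed integer rather than by file name keeps epoch 10 after epoch 3.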

Molecule Generation

Standard Molecule Generation

To generate new molecules:

python tumflow.py generate  --model_dir [EXP_FOLDER] \
                            -snapshot [BEST_EPOCH] \
                            --gpu 0 \
                            --data_name melanoma_skmel28 \
                            --hyperparams-path tumflow-params.json \
                            --batch-size 256 \
                            --temperature 0.85 \
                            --delta 0.1 \
                            --n_experiments 100 \
                            --save_fig false \
                            --correct_validity true

Structure Optimization

To optimize a molecule, use the following command:

python tumflow.py optimize  -snapshot [BEST_EPOCH]  \
                            --hyperparams_path tumflow-params.json \
                            --batch_size 256 \
                            --model_dir [EXP_FOLDER]   \
                            --gpu 0   \
                            --data_name melanoma_skmel28   \
                            --property_name AVERAGE_GI50 \
                            --topk 150  \
                            --property_model_path [PROPERTY_PREDICTOR_BEST_EPOCH] \
                            --debug false  \
                            --topscore

where:

[PROPERTY_PREDICTOR_BEST_EPOCH] is the file name of the property predictor, e.g., model_predictor_snapshot_average_gi50.pt.
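
The bracketed placeholders must all be replaced before the command is run. As an illustration only, the sketch below fills them in from a dictionary and raises an error if one is missing. The helper fill_placeholders is hypothetical and not part of TumFlow; the flags are copied from the command above and the values from the examples in this README.

```python
import re
import shlex

# Matches placeholders written as [UPPER_CASE_NAME].
PLACEHOLDER_RE = re.compile(r"\[([A-Z_]+)\]")


def fill_placeholders(template, values):
    """Replace every [NAME] in template with a shell-quoted value from values."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value given for placeholder [{name}]")
        return shlex.quote(values[name])
    return PLACEHOLDER_RE.sub(substitute, template)


template = (
    "python tumflow.py optimize -snapshot [BEST_EPOCH] "
    "--hyperparams_path tumflow-params.json --batch_size 256 "
    "--model_dir [EXP_FOLDER] --gpu 0 --data_name melanoma_skmel28 "
    "--property_name AVERAGE_GI50 --topk 150 "
    "--property_model_path [PROPERTY_PREDICTOR_BEST_EPOCH] "
    "--debug false --topscore"
)

# Example values taken from this README; replace them with your own.
values = {
    "BEST_EPOCH": "model_snapshot_epoch_3.pt",
    "EXP_FOLDER": "./results/melanoma_skmel28_v2_256",
    "PROPERTY_PREDICTOR_BEST_EPOCH": "model_predictor_snapshot_average_gi50.pt",
}

print(fill_placeholders(template, values))
```

Quoting each value with shlex.quote keeps paths that contain spaces intact when the printed line is pasted into a shell.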

Results

Pre-trained Models

The pre-trained models, which should be placed in the folder ./results/, can be downloaded at the following link.

Generation from molecules in the NCI-60 dataset

Generation from clinical molecules

Docker

Once the repository is configured, you can use a Docker container to execute TumFlow.

Standard Approach

First, you need to build the Docker image:

docker build -t tumflow .

Then, you need to run a container from it:

docker run --gpus all -v ./:/TumFlow/ -it tumflow [COMMAND]

where [COMMAND] is the command to execute in the container.
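
If you launch containers from a script, the same invocation can be assembled programmatically. The sketch below is an assumption-laden illustration, not part of the repository: the helper docker_run_argv is hypothetical, and it mounts the repository by absolute path in case your Docker version does not accept a relative host path in -v. The image name tumflow and the mount point /TumFlow/ are taken from the commands above.

```python
import shlex
from pathlib import Path


def docker_run_argv(command, repo_dir="."):
    """Build the `docker run` argument list for a command given as a string."""
    host_dir = Path(repo_dir).resolve()  # absolute host path for the bind mount
    return [
        "docker", "run", "--gpus", "all",
        "-v", f"{host_dir}:/TumFlow/",
        "-it", "tumflow",
        *shlex.split(command),
    ]


argv = docker_run_argv("python tumflow.py preprocess --data_name melanoma_skmel28")
print(shlex.join(argv))
# To launch the container, pass the list to subprocess.run(argv, check=True).
```

Building a list of arguments instead of one long string avoids quoting problems when a path contains spaces.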

Example:

docker run --gpus all -v ./:/TumFlow/ -it tumflow python tumflow.py optimize \
    -snapshot [BEST_EPOCH] \
    --hyperparams_path tumflow-params.json \
    --batch_size 256 \
    --model_dir [EXP_FOLDER] \
    --gpu 0 \
    --data_name melanoma_skmel28 \
    --property_name AVERAGE_GI50 \
    --topk 150 \
    --property_model_path [PROPERTY_PREDICTOR_BEST_EPOCH] \
    --debug false \
    --topscore

Docker-Compose Approach

First, you need to create and start the service:

docker compose up

Then, you need to run a command in it:

docker compose run tumflow conda run --no-capture-output -n tumflow [COMMAND]

where [COMMAND] is the command to execute in the container.

Example:

docker compose run tumflow conda run --no-capture-output -n tumflow python tumflow.py optimize \
    -snapshot [BEST_EPOCH] \
    --hyperparams_path tumflow-params.json \
    --batch_size 256 \
    --model_dir [EXP_FOLDER] \
    --gpu 0 \
    --data_name melanoma_skmel28 \
    --property_name AVERAGE_GI50 \
    --topk 150 \
    --property_model_path [PROPERTY_PREDICTOR_BEST_EPOCH] \
    --debug false \
    --topscore

Information

Our code is based on MoFlow. Thanks!

License

Creative Commons Attribution-NonCommercial 4.0 International (CC-BY-NC-4.0)
