TumFlow: an AI model for predicting new anticancer molecules with a focus on the melanoma SK-MEL-28 cell line.

drigoni/TumFlow

TumFlow

This repository contains the code used to generate the results reported in the paper: TumFlow: An AI Model for Predicting New Anticancer Molecules.

@Article{ijms25116186,
AUTHOR = {Rigoni, Davide and Yaddehige, Sachithra and Bianchi, Nicoletta and Sperduti, Alessandro and Moro, Stefano and Taccioli, Cristian},
TITLE = {TumFlow: An AI Model for Predicting New Anticancer Molecules},
JOURNAL = {International Journal of Molecular Sciences},
VOLUME = {25},
YEAR = {2024},
NUMBER = {11},
ARTICLE-NUMBER = {6186},
URL = {https://www.mdpi.com/1422-0067/25/11/6186},
ISSN = {1422-0067},
ABSTRACT = {Melanoma is the fifth most common cancer in the United States. Conventional drug discovery methods are inherently time-consuming and costly, which imposes significant limitations. However, the advent of Artificial Intelligence (AI) has opened up new possibilities for simulating and evaluating numerous drug candidates, thereby mitigating the requisite time and resources. In this context, normalizing flow models by employing machine learning techniques to create new molecular structures holds promise for accelerating the discovery of effective anticancer therapies. This manuscript introduces TumFlow, a novel AI model designed to generate new molecular entities with potential therapeutic value in cancer treatment. It has been trained on the NCI-60 dataset, encompassing thousands of molecules tested across 60 tumour cell lines, with an emphasis on the melanoma SK-MEL-28 cell line. The model successfully generated new molecules with predicted improved efficacy in inhibiting tumour growth while being synthetically feasible. This represents a significant advancement over conventional generative models, which often produce molecules that are challenging or impossible to synthesize. Furthermore, TumFlow has also been utilized to optimize molecules known for their efficacy in clinical melanoma treatments. This led to the creation of novel molecules with a predicted enhanced likelihood of effectiveness against melanoma, currently undocumented on PubChem.},
DOI = {10.3390/ijms25116186}
}

Overall Model

[Figure: overview of the TumFlow model]

Dependencies

This project runs in a conda environment. The root folder contains the .yml file that defines the environment. Some dependency versions can cause problems when the environment is built, so although the setup.bash file is provided to configure the project, configuring the environment manually or with the Dockerfile is more reliable.

Structure

The project is structured as follows:

  • data: contains the code that preprocesses the cleaned NCI-60 dataset;
  • mflow: contains the model code;
  • results: contains the checkpoints and the results.

Usage

Data Download

First, configure the conda environment and download the necessary files by running the following commands:

bash setup.bash install         # install env
bash setup.bash download        # download dataset
#bash setup.bash uninstall      # delete the conda env
conda activate tumflow          # activate conda env
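
Before running the later steps, it can help to confirm that the shell is really inside the tumflow environment. The following is a minimal sketch, not part of the repository: it assumes conda's usual behaviour of exporting the CONDA_DEFAULT_ENV variable on activation, and the helper names are hypothetical.

```python
import os


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if unset."""
    # Assumption: `conda activate` exports CONDA_DEFAULT_ENV in the shell.
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


def in_tumflow_env(environ=None, expected="tumflow"):
    """True when the active environment matches the expected name."""
    return active_conda_env(environ) == expected


if __name__ == "__main__":
    if in_tumflow_env():
        print("tumflow environment is active")
    else:
        print("tumflow environment is not active; run: conda activate tumflow")
```

Passing a dictionary instead of reading os.environ directly keeps the check easy to try out without touching the real shell state.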

Data Pre-processing

To build the datasets, run the following command:

python tumflow.py preprocess --data_name melanoma_skmel28

Model Training

To train the model, first use:

python tumflow.py train --data_name melanoma_skmel28  \
                        --batch_size  256  \
                        --max_epochs 300 \
                        --gpu 0  \
                        --debug False  \
                        --save_dir [EXP_FOLDER] \
                        --b_n_flow 7  \
                        --b_hidden_ch '128,128'  \
                        --a_n_flow 130 \
                        --a_hidden_gnn 128  \
                        --a_hidden_lin '128,128'  \
                        --mask_row_size_list 1 \
                        --mask_row_stride_list 1 \
                        --noise_scale 0.6 \
                        --b_conv_lu 2 \
                        --num_workers 32 \
                        --save_interval 50

Then train the property predictor used for optimization:

python tumflow.py train_optimizer   -snapshot [BEST_EPOCH]  \
                                    --hyperparams_path tumflow-params.json \
                                    --batch_size 256 \
                                    --model_dir [EXP_FOLDER]   \
                                    --gpu 0 \
                                    --max_epochs 10  \
                                    --weight_decay 1e-3  \
                                    --data_name melanoma_skmel28  \
                                    --hidden 100,10  \
                                    --temperature 1.0  \
                                    --property_name AVERAGE_GI50 

where:

  • [BEST_EPOCH] is the model snapshot that gave the best results in the first step, e.g., model_snapshot_epoch_3.pt;
  • [EXP_FOLDER] is the folder where the model results are saved, e.g., ./results/melanoma_skmel28_v2_256.
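
To see which snapshots are available as candidates for [BEST_EPOCH], the experiment folder can be scanned for snapshot files. The sketch below is an illustration only: it assumes snapshots follow the model_snapshot_epoch_<N>.pt pattern suggested by the example above, and list_snapshots is a hypothetical helper, not a function from the repository. Deciding which epoch is actually best still depends on the training results.

```python
import re
from pathlib import Path

# Assumed naming pattern, inferred from the example model_snapshot_epoch_3.pt.
SNAPSHOT_RE = re.compile(r"model_snapshot_epoch_(\d+)\.pt")


def list_snapshots(exp_folder):
    """Return (epoch, file name) pairs found in exp_folder, sorted by epoch."""
    found = []
    for path in Path(exp_folder).iterdir():
        match = SNAPSHOT_RE.fullmatch(path.name)
        if match:
            found.append((int(match.group(1)), path.name))
    return sorted(found)


if __name__ == "__main__":
    folder = Path("./results/melanoma_skmel28_v2_256")
    if folder.is_dir():
        for epoch, name in list_snapshots(folder):
            print(epoch, name)
```

Files that do not match the pattern, such as a property predictor snapshot, are ignored.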

Molecule Generation

Standard Molecule Generation

To generate new molecules:

python tumflow.py generate  --model_dir [EXP_FOLDER] \
                            -snapshot [BEST_EPOCH] \
                            --gpu 0 \
                            --data_name melanoma_skmel28 \
                            --hyperparams-path tumflow-params.json \
                            --batch-size 256 \
                            --temperature 0.85 \
                            --delta 0.1 \
                            --n_experiments 100 \
                            --save_fig false \
                            --correct_validity true

Structure Optimization

To optimize a molecule, use the following command:

python tumflow.py optimize  -snapshot [BEST_EPOCH]  \
                            --hyperparams_path tumflow-params.json \
                            --batch_size 256 \
                            --model_dir [EXP_FOLDER]   \
                            --gpu 0   \
                            --data_name melanoma_skmel28   \
                            --property_name AVERAGE_GI50 \
                            --topk 150  \
                            --property_model_path [PROPERTY_PREDICTOR_BEST_EPOCH] \
                            --debug false  \
                            --topscore

where:

  • [PROPERTY_PREDICTOR_BEST_EPOCH] is the file name of the trained property predictor, e.g., model_predictor_snapshot_average_gi50.pt.
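
Because the optimize command takes three placeholders, it is easy to launch it with one of them still unfilled. The following sketch assembles the same argument list as the command above and rejects values that still look like placeholders. It is an illustration under stated assumptions: build_optimize_command is a hypothetical helper, not part of tumflow.py, and the flags are copied verbatim from the command shown in this section.

```python
import shlex


def build_optimize_command(exp_folder, snapshot, predictor, topk=150):
    """Build the argument list for `python tumflow.py optimize`."""
    values = {"exp_folder": exp_folder, "snapshot": snapshot, "predictor": predictor}
    for name, value in values.items():
        # A value such as "[EXP_FOLDER]" means the placeholder was never replaced.
        if not value or value.startswith("["):
            raise ValueError(f"{name} looks like an unfilled placeholder: {value!r}")
    return [
        "python", "tumflow.py", "optimize",
        "-snapshot", snapshot,
        "--hyperparams_path", "tumflow-params.json",
        "--batch_size", "256",
        "--model_dir", exp_folder,
        "--gpu", "0",
        "--data_name", "melanoma_skmel28",
        "--property_name", "AVERAGE_GI50",
        "--topk", str(topk),
        "--property_model_path", predictor,
        "--debug", "false",
        "--topscore",
    ]


if __name__ == "__main__":
    cmd = build_optimize_command(
        "./results/melanoma_skmel28_v2_256",
        "model_snapshot_epoch_3.pt",
        "model_predictor_snapshot_average_gi50.pt",
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the repository root once the values have been checked.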

Results

Pre-trained Models

The pre-trained models, which should be placed in the folder ./results/, can be downloaded at the following link.

Generation from molecules in the NCI-60 dataset

[Figures: molecules generated from molecules in the NCI-60 dataset]

Generation from clinical molecules

[Figure: molecules generated from clinical molecules]

Docker

Once the repository is configured, you can use a Docker container to execute TumFlow.

Standard Approach

First, build the Docker image:

docker build -t tumflow .

Then, run a container from it:

docker run --gpus all -v ./:/TumFlow/ -it tumflow [COMMAND]

where [COMMAND] is the command to execute in the container.
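
The docker run invocation above mounts the current directory with a relative path (./). As a hedged sketch, the helper below wraps any TumFlow command in the same docker run call but resolves the repository folder to an absolute host path, which may help on setups where a relative bind-mount path is not accepted. docker_run_command is a hypothetical name, and the image name tumflow and mount target /TumFlow/ are taken from the commands in this section.

```python
from pathlib import Path


def docker_run_command(inner_command, repo_dir="."):
    """Wrap inner_command (a list of arguments) in the docker run call."""
    # Resolve to an absolute path so the bind mount does not depend on
    # how the Docker CLI interprets "./".
    host_dir = Path(repo_dir).resolve()
    return [
        "docker", "run",
        "--gpus", "all",
        "-v", f"{host_dir}:/TumFlow/",
        "-it", "tumflow",
        *inner_command,
    ]


if __name__ == "__main__":
    cmd = docker_run_command(
        ["python", "tumflow.py", "preprocess", "--data_name", "melanoma_skmel28"]
    )
    print(" ".join(cmd))
```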

Example:

docker run --gpus all -v ./:/TumFlow/ -it tumflow \
    python tumflow.py optimize \
        -snapshot [BEST_EPOCH] \
        --hyperparams_path tumflow-params.json \
        --batch_size 256 \
        --model_dir [EXP_FOLDER] \
        --gpu 0 \
        --data_name melanoma_skmel28 \
        --property_name AVERAGE_GI50 \
        --topk 150 \
        --property_model_path [PROPERTY_PREDICTOR_BEST_EPOCH] \
        --debug false \
        --topscore

Docker-Compose Approach

First, bring up the service:

docker compose up

Then, run a command in it:

docker compose run tumflow conda run --no-capture-output -n tumflow [COMMAND]

where [COMMAND] is the command to execute in the container.

Example:

docker compose run tumflow conda run --no-capture-output -n tumflow \
    python tumflow.py optimize \
        -snapshot [BEST_EPOCH] \
        --hyperparams_path tumflow-params.json \
        --batch_size 256 \
        --model_dir [EXP_FOLDER] \
        --gpu 0 \
        --data_name melanoma_skmel28 \
        --property_name AVERAGE_GI50 \
        --topk 150 \
        --property_model_path [PROPERTY_PREDICTOR_BEST_EPOCH] \
        --debug false \
        --topscore

Information

Our code is based on moflow. Thanks!

License

Creative Commons Attribution-NonCommercial 4.0 International (CC-BY-NC-4.0)
