GexMolGen 🐰

Have you ever thought about designing personalized drugs based on your own genes? Sounds fascinating, doesn't it? This is what GexMolGen aims for!

Introduction

GexMolGen is a model for generating hit-like molecules based on gene expression signatures. The workflow of GexMolGen is shown below:

We divide the task of generating hit-like molecules from gene expression profiles into four steps:

1. Encoding of gene expression and small molecular data
2. Matching of genetic modality and small molecular modality
3. Transformation from genetic modality to small molecular modality
4. Generation of small molecules

To simplify the process, we use pre-trained models for the encoders in step 1, namely scGPT for gene expression and hierVAE for small molecules. Step 2 aligns the genetic and molecular modalities, while step 3 transforms genetic embeddings into molecular ones. These stages are inspired by DALL·E: simple yet effective!
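The four steps can be sketched as a chain of functions. This is a minimal illustration and not the GexMolGen code: every function name below is hypothetical, the toy encoders stand in for scGPT and hierVAE, cosine similarity is assumed as the matching score for step 2, and a nearest-match lookup in a three-molecule library stands in for the generative decoder of step 4.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, assumed here as a stand-in matching score (step 2)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def generate_from_expression(
    profile: Sequence[float],
    encode_genes: Callable[[Sequence[float]], Vector],  # step 1 (scGPT in GexMolGen)
    transform: Callable[[Vector], Vector],              # step 3
    decode: Callable[[Vector], str],                    # step 4 (hierVAE in GexMolGen)
) -> str:
    """Chain the stages: gene embedding -> molecular embedding -> molecule."""
    gene_embedding = encode_genes(profile)
    molecular_embedding = transform(gene_embedding)
    return decode(molecular_embedding)


# Toy stand-ins so the sketch runs; none of these are the real models.
def toy_gene_encoder(profile: Sequence[float]) -> Vector:
    return [sum(profile) / len(profile), max(profile)]


def toy_molecule_encoder(smiles: str) -> Vector:
    return [float(len(smiles)), float(smiles.count("O"))]


def toy_transform(gene_embedding: Vector) -> Vector:
    return [2.0 * v for v in gene_embedding]


TOY_LIBRARY = ["CCO", "c1ccccc1", "CC(=O)O"]


def toy_decoder(molecular_embedding: Vector) -> str:
    # Return the library molecule whose toy embedding best matches the target.
    return max(
        TOY_LIBRARY,
        key=lambda s: cosine(toy_molecule_encoder(s), molecular_embedding),
    )


result = generate_from_expression(
    [0.5, 1.5, 1.0], toy_gene_encoder, toy_transform, toy_decoder
)
print(result)
```

Passing the stages in as callables keeps the sketch independent of any particular model, so the toy functions could be swapped for real encoders and a real decoder without changing the chaining logic.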

GexMolGen is an attempt to explore the chemical and biological relationships in the drug discovery process using large language models and multimodal techniques. It generates molecules effectively, accepts flexible input, and offers strong controllability over the output. For further details, please refer to our paper, "GexMolGen: Cross-modal Generation of Hit-like Molecules via Large Language Model Encoding of Gene Expression Signatures".

Installation

Before running pip install -r requirements.txt, we strongly advise that you install RDKit, FlashAttention, and PyTorch individually on your device. Here is the configuration from our device for reference:

CUDA == 11.7
Python == 3.8
rdkit == 2023.3.2
flash-attn == 1.0.1
torch == 1.13.0+cu117
gradio == 3.40.1

Please be mindful of version compatibility during your actual setup.
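A small script can compare your environment against the reference configuration above. This is a sketch under assumptions: the helper names are hypothetical, and the distribution names (rdkit, flash-attn, torch, gradio) are assumed to match how the packages were installed, which may not hold for every install method (for example, some conda builds). It only reports differences and does not judge whether a different version is compatible.

```python
from importlib import metadata
from typing import Callable, Dict, List, Optional

# Versions listed in the reference configuration above.
REFERENCE_VERSIONS: Dict[str, str] = {
    "rdkit": "2023.3.2",
    "flash-attn": "1.0.1",
    "torch": "1.13.0+cu117",
    "gradio": "3.40.1",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(
    reference: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> List[str]:
    """Describe every package whose installed version differs from the reference."""
    report = []
    for package, wanted in reference.items():
        found = lookup(package)
        if found is None:
            report.append(f"{package}: not installed (reference {wanted})")
        elif found != wanted:
            report.append(f"{package}: found {found} (reference {wanted})")
    return report


if __name__ == "__main__":
    for line in compare_versions(REFERENCE_VERSIONS):
        print(line)
```

An empty report means every listed package matches the reference version exactly.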

Next, clone the scGPT repository into this project's directory. Installing scGPT as a package is not necessary.

Model Parameters

If you want to use our model, you can download it from the provided link. This link already includes the pre-trained 'whole-human' version of the scGPT weights, so there's no need for an additional download.

Demo

To make our model easier to use, we have created an interactive interface. After configuring the environment and adjusting the file paths to match your installation, run python server.py from the command line to launch the interface.

We currently have three integrated functions: Standard, Screen, and Retrieving.

Standard: This function generates a specified number of molecules from gene expression profile data.
Screen: This function takes reference molecules and a similarity calculation method as input, and outputs the generated molecules in descending order of similarity to the reference molecules.
Retrieving: This function retrieves potential small molecules from a molecular database of your choice, given gene expression profiles. 🆕
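The ranking performed by Screen can be illustrated with a short sketch. It is not the code in server.py: the function names are hypothetical, the character-bigram "fingerprint" is a toy stand-in for a real molecular fingerprint (such as those RDKit provides), and scoring each generated molecule by its best match among the references is an assumption about how multiple references are combined.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def toy_fingerprint(smiles: str) -> FrozenSet[str]:
    """Hypothetical stand-in fingerprint: the set of character bigrams in a SMILES string."""
    return frozenset(smiles[i:i + 2] for i in range(len(smiles) - 1))


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def screen(
    generated: Sequence[str],
    references: Sequence[str],
    fingerprint: Callable[[str], FrozenSet[str]] = toy_fingerprint,
) -> List[Tuple[str, float]]:
    """Rank generated molecules by their best similarity to any reference, highest first."""
    if not references:
        raise ValueError("at least one reference molecule is required")
    reference_fps = [fingerprint(r) for r in references]
    scored = [
        (smiles, max(tanimoto(fingerprint(smiles), ref) for ref in reference_fps))
        for smiles in generated
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = screen(["CCN", "c1ccccc1", "CCO"], references=["CCO"])
for smiles, score in ranked:
    print(f"{smiles}\t{score:.3f}")
```

Because the fingerprint is passed in as a parameter, a chemically meaningful fingerprint function can replace the toy one without touching the ranking logic.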

We provide experimental data for AKT2 (server_test_ctl_AKT2.csv and server_test_pert_AKT2.csv) and reference inhibitors (AKT2_ref.csv). You can use the Screen function to verify Result 2.3 in our paper.

To-do-list

Upload video version explanation of the demo
Upload the complete dataset
Upload training code

Acknowledgements

Finally, we would like to express our deepest gratitude to the authors of scGPT and hierVAE. They have not only created excellent work but also made it open source for the benefit of researchers worldwide.

Whatever questions you have, feel free to contact us via email or raise issues on GitHub. We firmly believe that different perspectives help us develop better tools. 😉
