ConfGen

ConfGen is diffusion model to generate molecule conformation condition with key atom positions.

Installing

Conda enviroment

$ conda create -n confgen python=3.9
$ conda activate confgen
$ pip install -r requirements.txt

Dataset (Use only for training)

We use a custom ZINC20 drug-like dataset to train our model, detailed filtering is described in the dataset/README.md.

$ cd datset
$ sh ZINC-downloader-3D-sdf.gz.curl
$ sh additional-ZINC-downloader-3D-sdf.gz.curl

$ mkdir ZINC
$ gzip -f -d *.xaa.sdf.gz
$ mv *.xaa.sdf ZINC/

Model weight

Model weight can be downloaded from Zenodo

$ wget -P confgen/model https://zenodo.org/records/10663250/files/model.pt

Train

Preprocessing

Preprocessing takes about an hour.

$ python confgen/sdf_to_dataset.py

Training

$ python confgen/trainer.py --input dataset/input --T 500 --lr 2e-4 --coords_agg mean --noise_schedule cosine --num_egnn_layers 6 --num_gcl_layers 1 --attention True --batch 64 --scale 5 --scale_eps True

Optional flags:

Flag	Description
--input	Input directory path
--wandb	Using Wandb
--T	Timesteps for diffusion process
--lr	Learning rate
--batch	Batch size
--Attention	Use attention layer
--coords_agg	Aggregation method for coordinate (`sum` or `mean`)
--noise_schedule	Noise schedule for diffusion process (`linear` or `cosine`)
--num_egnn_layers	Number of egnn layer
--num_gcl_layers	Number of sub-egnn layer
--resume	Resume with pretrained model
--shuffle	Shuffle dataloader
--cutoff	Max molecule samples of each heavy atoms number
--scale	Reduction factor for coordinate scale
--scale_eps	Use epsilon scaling for diffusion process
--num_epochs	Num of maxinum epoch

Inference

Option 1. GUI (jupyter notebook)

You can easily inference from confgen/inference.ipynb as a user-friendly interface with GUI enviroments.

Option 2. CLI

Unconditional Sampling

Generate molecule conformation w/o conditioning. Sample trajectory pdb file was saved in confgen/sample

$ cd confgen
$ python inference.py --smiles <query_smiles> --model_path <model_path>

Optional flags:

Flag	Description
--smiles	Query molecule smiles
--n_samples	Number of sampled molecule
--timesteps	Compressed time steps for inference speed acceleration (using DDIM)
--model_path	Model weight path
--save	Save sampled molecule with diffusion trajectories

Conditional Sampling

Generate molecule conformation w conditioning.

$ cd confgen
$ python inference.py --pdb_path <query_molecule_pdb> --key_atom_list <key_atom_list> --model_path <model_path>

Optional flags:

Flag	Description
--pdb_path	Query molecule pdb path
--key_atom_list	Key atom name list (e.g. O1 O2 C10 C15)
--resampling	Number of resampling for each steps
--refix	Number of refix steps
--mode	Conditioning mode (`fixed` or `replacement`)
--n_samples	Number of sampled molecule
--timesteps	Compressed time steps for inference speed acceleration (using DDIM)
--model_path	Model weight path
--save	Save sampled molecule with diffusion trajectories

Acknowledgements

ehoogeboom / e3_diffusion_for_molecules (Model Architecture)

ermongroup / ddim (Accelerate Inference Speed)

arneschneuing / DiffSBDD (Conditioning Method)

blt2114 / ProtDiff_SMCDiff (Data Representation)

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
confgen		confgen
dataset		dataset
example		example
figure		figure
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

ConfGen

Installing

Conda enviroment

Dataset (Use only for training)

Model weight

Train

Preprocessing

Training

Inference

Option 1. GUI (jupyter notebook)

Option 2. CLI

Unconditional Sampling

Conditional Sampling

Acknowledgements

About

Uh oh!

Releases

Packages

Uh oh!

Languages

HParklab/ConfDiff

Folders and files

Latest commit

History

Repository files navigation

ConfGen

Installing

Conda enviroment

Dataset (Use only for training)

Model weight

Train

Preprocessing

Training

Inference

Option 1. GUI (jupyter notebook)

Option 2. CLI

Unconditional Sampling

Conditional Sampling

Acknowledgements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages