# Uncertainty-guided virtual screening of small molecules
This tutorial describes how to perform uncertainty-guided virtual screening with a deterministic, pre-trained property predictor. We use the active subspace approach to enable uncertainty quantification (UQ) of the pre-trained predictor.

![Overview of the uncertainty-guided virtual screening pipeline](image.png)

1. The UQ-guided virtual screening pipeline takes the pre-trained property predictor and enables uncertainty quantification by constructing an active subspace (AS) around the pre-trained model weights.
2. Next, the properties of the candidate molecules are predicted via Bayesian inference, which also yields the uncertainty of each prediction.
3. Finally, we screen the candidates based on their predictions and the corresponding uncertainties. By removing samples whose predicted class labels have high uncertainty (low confidence), we aim to improve the hit rate (the fraction of truly active samples in the screened pool of candidates) of the virtual screening process.
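Steps 2 and 3 can be illustrated with a small sketch. It is not the `UQ_VS` implementation: it assumes we already have `S` posterior samples of the predicted probability of being active for each of `N` candidates (the numbers below are made-up placeholders), averages them to obtain the Bayesian prediction, uses the predictive entropy of that average as the uncertainty, and keeps only confidently predicted actives. The function names `screen_candidates` and `hit_rate` and the thresholds are hypothetical choices.

```python
import numpy as np


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli distribution with success probability p."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))


def screen_candidates(prob_samples, threshold=0.5, max_entropy=0.5):
    """Select candidates predicted active with low predictive uncertainty.

    prob_samples: array of shape (S, N) with P(active) for N molecules under
    S posterior samples of the model (hypothetical input format).
    """
    mean_prob = prob_samples.mean(axis=0)        # Bayesian model average
    uncertainty = binary_entropy(mean_prob)      # predictive entropy, at most ln 2
    predicted_active = mean_prob >= threshold
    confident = uncertainty <= max_entropy
    return predicted_active & confident, mean_prob, uncertainty


def hit_rate(selected, labels):
    """Fraction of selected molecules that are truly active (NaN if none selected)."""
    if selected.sum() == 0:
        return float("nan")
    return float(labels[selected].mean())


# Placeholder data: 3 posterior samples for 4 candidates, plus their true labels.
prob_samples = np.array([
    [0.95, 0.60, 0.20, 0.90],
    [0.90, 0.40, 0.10, 0.85],
    [0.97, 0.55, 0.15, 0.95],
])
labels = np.array([1, 0, 0, 1])

selected, mean_prob, unc = screen_candidates(prob_samples)
print(selected.tolist())                  # [True, False, False, True]
print(hit_rate(mean_prob >= 0.5, labels)) # 0.666... without the uncertainty filter
print(hit_rate(selected, labels))         # 1.0 with the uncertainty filter
```

In this toy case the second candidate has a mean probability just above 0.5 but near-maximal entropy, so the uncertainty filter drops it, which is exactly the kind of low-confidence hit the pipeline aims to remove.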

In this tutorial, we describe the first step, i.e., enabling UQ with an active subspace around the pre-trained property predictor weights. For more details, see the `UQ_VS` repository.

# Install dependencies
The `basic_env.yml` file lists the required packages. Run the following commands to create a conda environment for the project and install it.


```bash
conda env create -f basic_env.yml
conda activate vs_env
pip install -e .
```

# Train the predictor model (Step 0)
We first train a predictor model for the DRD2 (dopamine receptor D2) activity property. Note that this network provides a deterministic prediction of molecular activity against DRD2.

```bash
python train_surrogate.py --prop_name=DRD2
```

# Enable UQ for the predictor model (Step 1)
The trained predictor does not provide uncertainty estimates for its predictions. To perform UQ through AS, we first construct an active subspace around the pre-trained model weights and then learn a posterior distribution over the active-subspace parameters with variational inference. This approximate posterior is later used for Bayesian inference on the candidate molecules during screening.
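The subspace construction can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the code in `run_active_subspace_construction.py`: it assumes one common definition of a parameter-space active subspace, namely the dominant eigenvectors of the uncentered covariance $C = \frac{1}{M}\sum_m g_m g_m^\top$ of training-loss gradients $g_m$ sampled at Gaussian perturbations of the pre-trained weights $\theta_0$. A logistic-regression model stands in for the DRD2 predictor, and `build_active_subspace`, `scale`, and `n_samples` are hypothetical names. The weights are then reparameterized as $\theta = \theta_0 + P z$ with a low-dimensional coordinate $z$ (10-dimensional here, mirroring `--AS_dim=10`).

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def loss_grad(theta, X, y):
    """Gradient of the mean binary cross-entropy of a logistic-regression model."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)


def build_active_subspace(theta0, X, y, as_dim, n_samples=200, scale=0.1, seed=0):
    """Active subspace of the training loss around pre-trained weights theta0.

    Collects loss gradients at Gaussian perturbations of theta0 and returns the
    top `as_dim` eigenvectors of C = (1/M) sum_m g_m g_m^T. They are computed as
    the leading right singular vectors of the (M, D) gradient matrix, so the
    D x D matrix C is never formed (important when D is a network's weight count).
    """
    rng = np.random.default_rng(seed)
    G = np.stack([
        loss_grad(theta0 + scale * rng.standard_normal(theta0.shape), X, y)
        for _ in range(n_samples)
    ])                                                # (M, D)
    _, s, Vt = np.linalg.svd(G, full_matrices=False)  # s is sorted in descending order
    P = Vt[:as_dim].T                                 # (D, as_dim), orthonormal columns
    eigvals = s**2 / n_samples                        # eigenvalues of C, descending
    return P, eigvals


# Toy data and "pre-trained" weights, placeholders for the real predictor.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 50))
theta_true = rng.standard_normal(50)
y = (rng.random(500) < sigmoid(X @ theta_true)).astype(float)
theta0 = theta_true + 0.05 * rng.standard_normal(50)

P, eigvals = build_active_subspace(theta0, X, y, as_dim=10)
z = np.zeros(10)          # active-subspace coordinates; z = 0 recovers theta0
theta = theta0 + P @ z    # full weight vector as a function of z
print(P.shape)            # (50, 10)
```

The design point is that uncertainty is then modeled only over the few coordinates in `z` rather than over every weight, which is what makes Bayesian inference on a large pre-trained network affordable.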

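```bash
# Construct a 10-dimensional active subspace around the pre-trained DRD2 weights
python run_active_subspace_construction.py --prop_name=DRD2 --AS_dim=10
# Learn the posterior over the active-subspace parameters with variational inference
python run_vi_training.py --prop_name=DRD2 --AS_dim=10
```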
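Learning the posterior over the subspace coordinates and then using it for Bayesian inference can be sketched in the same kind of toy setting. This is a hedged illustration, not `run_vi_training.py`: it assumes a mean-field Gaussian posterior $q(z) = \mathcal{N}(\mu, \operatorname{diag}\sigma^2)$, a standard normal prior on $z$, single-sample reparameterized gradients of the ELBO, and a logistic-regression stand-in for the predictor. To keep the block self-contained, a random orthonormal basis `P` replaces the active-subspace basis produced by `run_active_subspace_construction.py`; `fit_subspace_vi` and `predict` are hypothetical names.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_subspace_vi(theta0, P, X, y, prior_std=1.0, lr=0.5, n_iters=2000, seed=0):
    """Mean-field Gaussian VI over active-subspace coordinates z.

    Model: theta = theta0 + P @ z, y ~ Bernoulli(sigmoid(X @ theta)),
    prior z ~ N(0, prior_std^2 I), posterior q(z) = N(mu, diag(exp(rho)^2)).
    Performs gradient ascent on ELBO / n using one reparameterized sample per step.
    """
    rng = np.random.default_rng(seed)
    n, k = len(y), P.shape[1]
    mu, rho = np.zeros(k), np.full(k, -3.0)
    for _ in range(n_iters):
        sigma = np.exp(rho)
        eps = rng.standard_normal(k)
        z = mu + sigma * eps                          # reparameterization trick
        theta = theta0 + P @ z
        grad_theta = X.T @ (y - sigmoid(X @ theta))   # d log-likelihood / d theta
        grad_z = P.T @ grad_theta                     # chain rule into the subspace
        # ELBO gradient = likelihood term minus gradient of KL(q || prior).
        grad_mu = grad_z - mu / prior_std**2
        grad_rho = grad_z * eps * sigma - (sigma**2 / prior_std**2 - 1.0)
        mu += lr * grad_mu / n
        rho += lr * grad_rho / n
    return mu, np.exp(rho)


def predict(theta0, P, mu, sigma, X_new, n_samples=100, seed=0):
    """Bayesian model average: mean and std of P(active) over posterior samples."""
    rng = np.random.default_rng(seed)
    Z = mu + sigma * rng.standard_normal((n_samples, len(mu)))  # (S, k)
    probs = sigmoid(X_new @ (theta0[:, None] + P @ Z.T))       # (N, S)
    return probs.mean(axis=1), probs.std(axis=1)


# Toy stand-ins: random data, "pre-trained" weights, and a random orthonormal
# basis P in place of a real active-subspace basis.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 50))
theta_true = rng.standard_normal(50)
y = (rng.random(500) < sigmoid(X @ theta_true)).astype(float)
theta0 = theta_true + 0.05 * rng.standard_normal(50)
P, _ = np.linalg.qr(rng.standard_normal((50, 10)))

mu, sigma = fit_subspace_vi(theta0, P, X, y)
mean_prob, std_prob = predict(theta0, P, mu, sigma, X[:5])
```

Here `mean_prob` plays the role of the Bayesian prediction and `std_prob` (or an entropy computed from `mean_prob`) the role of the uncertainty that the screening step thresholds on.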