In [2]:
# # (optional) check if the installation is successful
# import torch_sparse
# import torch_geometric
# import torch_cluster
# import torch_scatter
# import torch_spline_conv

# Introduction

This notebook shows how to run zero-shot scoring of mutations with an ensemble of non-structure-informed models (ESM-1v, ESM-2 3B) using the ```zero_shot_esm_dms``` function, and with a structure-informed model (ESM-IF) using the ```zero_shot_esm_if_dms``` function.

```zero_shot_esm_dms``` requires the wild-type amino acid sequence of the protein of interest.

```zero_shot_esm_if_dms``` requires the wild-type amino acid sequence, a structure file for the protein of interest, and the chain ID of that protein within the structure file.

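Both functions return a dataframe listing every possible single amino acid mutation with its log likelihood ratio score (i.e. the log of the mutant's likelihood relative to the wild type's). A positive score means the model considers the mutant more likely than the wild type; a negative score means less likely.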
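To make the score concrete, here is a minimal sketch of how a log likelihood ratio can be computed for every single substitution. It is not the code used inside ```multievolve```: the three-residue sequence, the probability values, and the helper name ```log_likelihood_ratios``` are all made up for illustration, and a real model would output a distribution over all 20 amino acids at each position.

```python
import math

# Hypothetical toy data: a 3-residue "protein" and, for each position,
# made-up model probabilities for a handful of amino acids.
wt_seq = "MKV"
probs = [
    {"M": 0.70, "L": 0.20, "K": 0.10},
    {"K": 0.50, "R": 0.40, "V": 0.10},
    {"V": 0.25, "I": 0.50, "M": 0.25},
]


def log_likelihood_ratios(wt_seq, probs):
    """Return {mutation: log(p_mutant / p_wildtype)} for each single substitution."""
    scores = {}
    for pos, (wt_aa, dist) in enumerate(zip(wt_seq, probs), start=1):
        for mut_aa, p_mut in dist.items():
            if mut_aa == wt_aa:
                continue  # skip the wild-type residue itself
            # Mutation label in the usual <wt><position><mutant> form, 1-indexed
            scores[f"{wt_aa}{pos}{mut_aa}"] = math.log(p_mut / dist[wt_aa])
    return scores


llr = log_likelihood_ratios(wt_seq, probs)

# Print mutations from most to least favourable
for mutation, score in sorted(llr.items(), key=lambda kv: -kv[1]):
    print(f"{mutation}: {score:+.3f}")
```

In this toy example only ```V3I``` gets a positive score, because it is the only substitution the made-up model rates as more probable than the wild-type residue.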

In [3]:
from Bio import SeqIO

from multievolve import zero_shot_esm_dms, zero_shot_esm_if_dms

In [4]:
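wt_file = "../../data/example_protein/apex.fasta"  # wild-type sequence (FASTA)
pdb_file = "../../data/example_protein/apex.cif"  # structure file (mmCIF)

In [None]:
# Read the wild-type sequence as a plain string
wt_seq = str(SeqIO.read(wt_file, "fasta").seq)

# Score all single mutations with the sequence-only ensemble
esm_zeroshot = zero_shot_esm_dms(wt_seq)
# Score all single mutations with ESM-IF, using chain A of the structure
esm_if_zeroshot = zero_shot_esm_if_dms(wt_seq, pdb_file, chain_id='A', scoring_strategy='wt-marginals')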

The ```zero_shot_esm_dms``` function returns a dataframe with the log likelihood ratio score from each model. For each model, the corresponding ```model#_pass``` column indicates whether the mutation's likelihood ratio is greater than 1 (equivalently, whether its log likelihood ratio is greater than 0).

In [None]:
esm_zeroshot.head(10)
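One common follow-up is to keep only the mutations that every model in the ensemble favours. The sketch below shows the idea on a small made-up dataframe. The column names (```mutation```, ```model1_logratio```, ```model1_pass```, and so on) and the values are assumptions chosen to mirror the ```model#_pass``` pattern described above; check ```esm_zeroshot.columns``` for the real names before adapting it.

```python
import pandas as pd

# Toy stand-in for the returned dataframe. Column names and values are
# hypothetical; they are not taken from the multievolve output.
toy = pd.DataFrame(
    {
        "mutation": ["A2V", "K5R", "L9P", "G12S"],
        "model1_logratio": [0.8, -0.3, -2.1, 0.4],
        "model2_logratio": [0.5, 0.2, -1.7, -0.1],
    }
)

# A mutation "passes" for a model when its log likelihood ratio is above 0,
# i.e. when its likelihood ratio is above 1.
score_cols = [c for c in toy.columns if c.endswith("_logratio")]
for col in score_cols:
    toy[col.replace("_logratio", "_pass")] = toy[col] > 0

# Count how many models each mutation passes, then keep the unanimous ones.
pass_cols = [c for c in toy.columns if c.endswith("_pass")]
toy["n_pass"] = toy[pass_cols].sum(axis=1)
consensus = toy[toy["n_pass"] == len(pass_cols)]

print(consensus[["mutation", "n_pass"]])
```

Requiring agreement from all models is a conservative choice; relaxing the threshold on ```n_pass``` keeps more candidates at the cost of including mutations that only some models favour.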

The ```zero_shot_esm_if_dms``` function returns a dataframe with the log likelihood ratio score for each mutation.

In [None]:
esm_if_zeroshot.head(10)