<a href="https://colab.research.google.com/github/bzhu8/NanoQmodel1/blob/main/NanoQ_Model_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**NanoQ-model 1.0: An end-to-end trained protein language model predicts the quenching efficiency of quenchbody biosensors**

Update: Jan. 25th, 2025

Usage guide:
1. Run the Step 1 and Step 2 cells to prepare the necessary environment and the model files.
2. Enter the CDR1 and CDR3 sequences with a space between amino acids and a [SEP] token between CDR1 and CDR3. To reproduce the prediction results in the manuscript, include the three-amino-acid framework sequences flanking each CDR.

 Example sequence: A S G T I F Q V G S V G W [SEP] Y C A A L G Q V S E Y N S A S Y E W T Y P Y W G

 Format example: A S G (CDR1 sequence) M G W [SEP] Y C A (CDR3 sequence) Y W G
3. Run the Step 3 cell to predict the probability score for quenching. The three-amino-acid framework sequences can be adjusted to match the antibody sequence of interest.
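
The input format described in step 2 can be assembled programmatically. The following is a minimal sketch, not part of the original notebook: the helper name `format_nanoq_input`, its parameter names, and the restriction to the 20 standard amino acids are assumptions made for illustration. The default flanks are taken from the format example above, and the usage call assumes the example sequence splits into CDR1 `TIFQVGS` (followed by `VGW`) and CDR3 `ALGQVSEYNSASYEWTYP`.

```python
# Hypothetical helper (not from the original notebook): builds the
# space-separated "CDR1 [SEP] CDR3" string expected by the Step 3 cell.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: standard residues only


def format_nanoq_input(cdr1, cdr3,
                       fw_before_cdr1="ASG", fw_after_cdr1="MGW",
                       fw_before_cdr3="YCA", fw_after_cdr3="YWG"):
    """Return the model input string for unspaced CDR1 and CDR3 sequences."""
    # Attach the three-residue framework flanks to each CDR
    segment1 = (fw_before_cdr1 + cdr1 + fw_after_cdr1).upper()
    segment2 = (fw_before_cdr3 + cdr3 + fw_after_cdr3).upper()
    # Reject characters outside the assumed residue alphabet
    for segment in (segment1, segment2):
        unknown = set(segment) - VALID_RESIDUES
        if unknown:
            raise ValueError(f"Unexpected residue(s): {sorted(unknown)}")
    # One space between residues, [SEP] between the two segments
    return " ".join(segment1) + " [SEP] " + " ".join(segment2)


# Rebuilds the example sequence from the usage guide
CDR = format_nanoq_input("TIFQVGS", "ALGQVSEYNSASYEWTYP", fw_after_cdr1="VGW")
print(CDR)
```

The resulting string can be pasted into the `CDR` field of the Step 3 cell. Pass different flank arguments when the antibody of interest has other framework residues next to its CDRs.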

(This model is available under the CC BY-NC-ND 4.0 license.)

**Reference**: **Akihito Inoue**#, **Bo Zhu**#, Keisuke Mizutani, Ken Kobayashi, Takanobu Yasuda, Alon Wellner, Chang C. Liu, Tetsuya Kitaguchi, Prediction of single-mutation effects for fluorescent immunosensor engineering with an end-to-end trained protein language model, 2024, Jxiv, 971. (#Equal contributions) DOI: https://doi.org/10.51094/jxiv.971

#@title Step 1: Install Transformers (click the 'Restart runtime' button if it appears) < 1 min
#Change to transformers version 4.28.1
%%capture
!pip install transformers==4.28.1

import transformers
print(transformers.__version__)

#@title Step 2: Environment preparation + NanoQ-model 1.0 loading ~ 1-2 min.
%%capture
#import transformers
import pandas as pd
import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
#import torch.optim as optim
from transformers import AutoModel, AutoTokenizer

#Download NanoQ-model-1.0
!wget https://huggingface.co/bzhu8/Qmodel1/resolve/main/230527Qmodel1.pth

# Initialize the tokenizer of the ProtBERT-BFD protein language model (pLM)
model_name = "Rostlab/prot_bert_bfd"
tokenizer = AutoTokenizer.from_pretrained(model_name)

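# ProtBERT encoder with a single-output linear head
class Transformer(nn.Module):
    def __init__(self):
        super().__init__()
        # Pretrained ProtBERT-BFD encoder; the whole model is moved to the device after construction
        self.bert = AutoModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(0.1)
        # Maps the 1024-dimensional pooled embedding to one logit
        self.linear = nn.Linear(1024, out_features=1)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids, attention_mask=attention_mask)
        # outputs[1] is the pooled representation of the first ([CLS]) token
        pooled_output = outputs[1]
        pooled_output = self.dropout(pooled_output)
        # Raw logit per sequence; the sigmoid is applied at prediction time
        x = self.linear(pooled_output).squeeze()
        return x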

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Define model
model = Transformer().to(device)
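# Load the end-to-end trained NanoQ-model 1.0 weights
model_path = '/content/230527Qmodel1.pth'
# map_location lets a checkpoint saved on GPU load on a CPU-only runtime
model.load_state_dict(torch.load(model_path, map_location=device))
# Run inference on CPU
model = model.cpu()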
max_len = 512

%%time
#@title Step 3: Input the CDR1 and CDR3 sequences to predict the quenching efficiency (probability score, range 0-1)

CDR = "A S G T I F Q V G S V G W [SEP] Y C A A L G Q V S E Y N S A S Y E W T Y P Y W G" #@param {type:"string"}

sequences = CDR
labels = [0.]

# Build a one-row table holding the input sequence and a placeholder label
pred_seq = pd.DataFrame({"sequence": sequences, "label": labels})
# Tokenize the sequence for ProtBERT
encodings = tokenizer(list(pred_seq["sequence"]), truncation=True, padding=True, max_length=max_len, return_tensors='pt')

# prediction
model.eval()
with torch.no_grad():
    test_inputs = {
        'input_ids': encodings['input_ids'],
        'attention_mask': encodings['attention_mask'],
    }
    outputs = model(**test_inputs)
# torch.sigmoid replaces the deprecated F.sigmoid; it maps the logit to a 0-1 score
predictions = torch.sigmoid(outputs).cpu().detach().numpy()

# print score (probability score)

import sys
np.set_printoptions(threshold=sys.maxsize)
print('Probability Score for quenching =', predictions)

Probability Score for quenching = 0.672328
CPU times: user 933 ms, sys: 4.21 ms, total: 937 ms
Wall time: 964 ms
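
Step 3 turns the model's raw output (a logit) into the probability score by applying a sigmoid. The following is a minimal pure-Python sketch of that mapping and its inverse, included only to show how the score relates to the logit. The function names are illustrative and not part of the notebook, and the logit printed at the end is back-calculated from the score shown above rather than a value reported by the model.

```python
import math


def logit_to_probability(logit):
    """Logistic sigmoid, written to avoid overflow for large |logit|."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def probability_to_logit(p):
    """Inverse of the sigmoid; defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


# A logit of 0 corresponds to a score of exactly 0.5
print(logit_to_probability(0.0))

# Back-calculate the logit behind the example score (illustration only)
example_score = 0.672328
print(probability_to_logit(example_score))
```

Positive logits give scores above 0.5 and negative logits give scores below 0.5, so the printed score can be read as a monotonic rescaling of the model's single output value.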
