<a href="https://colab.research.google.com/github/bzhu8/NanoQmodel1/blob/main/NanoQ_Model_v1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
**NanoQ-model 1.0: An end-to-end trained protein language model predicts the fluorophore quenching efficiency of nano-Q-body biosensor (beta test v2).**
Update: Jan. 25th, 2025

All information in this notebook is for internal testing of NanoQ-model 1.0.

To use or test this model, you must agree not to copy or transfer the model to any other location before its official release.

Usage guide:
1. Run the Step 1 and Step 2 cells to prepare the necessary environment and the model files.
2. Enter the CDR1 and CDR3 sequences with a space between adjacent amino acids and a [SEP] token between CDR1 and CDR3. To reproduce the prediction results in the manuscript, include the three framework amino acids that flank each CDR (see the format example below).

 Example sequence: A S G T I F Q V G S V G W [SEP] Y C A A L G Q V S E Y N S A S Y E W T Y P Y W G

 Format example: A S G (CDR1 sequence) M G W [SEP] Y C A (CDR3 sequence) Y W G
3. Run the Step 3 cell to predict the probability of high quenching efficiency.
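The input format described in item 2 can also be produced programmatically, which helps avoid spacing mistakes when preparing many inputs. The following is a minimal sketch. It assumes each segment is supplied as a contiguous string that already includes its flanking framework residues and uses only the 20 standard one-letter amino-acid codes. `format_cdr_input` is a hypothetical helper, not part of NanoQ-model.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: only the 20 canonical residues are accepted


def format_cdr_input(cdr1: str, cdr3: str) -> str:
    """Hypothetical helper: turn contiguous CDR1/CDR3 strings (framework
    residues already included) into the 'A S G ... [SEP] Y C A ...' format."""
    formatted = []
    for name, seq in (("CDR1", cdr1), ("CDR3", cdr3)):
        seq = "".join(seq.split()).upper()  # drop any whitespace and normalize case
        unknown = set(seq) - STANDARD_AA
        if not seq or unknown:
            raise ValueError(f"{name} is empty or has non-standard residues: {sorted(unknown)}")
        formatted.append(" ".join(seq))  # one space between adjacent residues
    return " [SEP] ".join(formatted)  # [SEP] separates the CDR1 and CDR3 segments


# Reproduces the example sequence from the usage guide
print(format_cdr_input("ASGTIFQVGSVGW", "YCAALGQVSEYNSASYEWTYPYWG"))
```

The returned string can be pasted into the `CDR` field of Step 3.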

**Colab version: Bo Zhu and co-authors @ Science Tokyo**
(License: CC BY-NC-ND 4.0 Deed)

**Reference**: **Inoue**#, **Zhu**#, (other 5 co-authors), & Kitaguchi, Prediction of single-mutation effects for fluorescent immunosensor engineering with an end-to-end trained protein language model, in preparation. (#Equal contributions)

In [None]:
#@title Step 1: Install Transformers (restart the runtime if Colab prompts you to) < 1 min
# Pin transformers to version 4.28.1
%%capture
!pip install transformers==4.28.1
# Optional: uncomment the two lines below to force a runtime restart
#import os
#os.kill(os.getpid(), 9)
import transformers
print(transformers.__version__)  # output is hidden by %%capture above
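Because `%%capture` hides everything the Step 1 cell prints, including the version string, the pinned version is not visible once the cell finishes. The check below could be run in a separate cell. It is a minimal sketch that uses only the standard library's `importlib.metadata`. `is_pinned_version` is a hypothetical helper name. It reads the installed package metadata rather than the already-imported module, so it reflects what `pip` installed even before a runtime restart.

```python
from importlib.metadata import PackageNotFoundError, version


def is_pinned_version(package: str = "transformers", expected: str = "4.28.1") -> bool:
    """Hypothetical helper: True if the installed distribution matches `expected`."""
    try:
        installed = version(package)  # reads installed package metadata, not the imported module
    except PackageNotFoundError:
        return False  # package is not installed at all
    return installed == expected


print("transformers 4.28.1 installed:", is_pinned_version())
```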

In [None]:
#@title Step 2: Environment preparation + NanoQ-model 1.0 loading ~ 1-2 min.
%%capture
import transformers
import pandas as pd
import urllib.request
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score, precision_recall_curve, roc_curve, auc
from sklearn.model_selection import train_test_split
from tqdm.notebook import tqdm


import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from transformers import AutoModel, AutoTokenizer

#Download NanoQ-model-1.0
!wget https://huggingface.co/bzhu8/Qmodel1/resolve/main/230527Qmodel1.pth

# ProtBERT-BFD tokenizer
model_name = "Rostlab/prot_bert_bfd"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Use the GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# ProtBERT encoder + linear head that outputs one quenching logit per sequence
class Transformer(nn.Module):
    def __init__(self):
        super(Transformer, self).__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(0.1)
        self.linear = nn.Linear(1024, out_features=1)  # 1024 = ProtBERT hidden size

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids, attention_mask=attention_mask)
        pooled_output = outputs[1]  # pooler output for the [CLS] token
        pooled_output = self.dropout(pooled_output)
        x = self.linear(pooled_output).squeeze()
        return x

# Build the model and load the end-to-end trained NanoQ-model 1.0 weights
model = Transformer().to(device)
model_path = '/content/230527Qmodel1.pth'
# map_location lets a checkpoint saved on a GPU load on a CPU-only runtime
model.load_state_dict(torch.load(model_path, map_location=device))
# Prediction in Step 3 runs on the CPU
model = model.cpu()
max_len = 512
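In `Transformer.forward`, ProtBERT's pooled output (`outputs[1]`, a 1024-dimensional vector) passes through dropout and a single linear unit. This gives one raw score (logit) per sequence, and Step 3 then applies a sigmoid to it. Under `model.eval()`, dropout is an identity map, so inference reduces to the arithmetic below. This plain-Python sketch uses a toy 4-dimensional vector and made-up weights for illustration only. They are not the trained NanoQ-model parameters.

```python
import math


def head_logit(pooled, weight, bias):
    """Linear head at inference time: dropout is inactive, so logit = w . h + b."""
    if len(pooled) != len(weight):
        raise ValueError("pooled output and weight vector must have the same length")
    return sum(w * h for w, h in zip(weight, pooled)) + bias


def sigmoid(z):
    """Map a logit to a probability in (0, 1) without overflowing for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Toy values (illustrative only; the real model uses 1024 dimensions)
pooled = [0.5, -0.25, 1.0, 0.0]
weight = [1.0, 2.0, 0.5, 3.0]
bias = -0.5
z = head_logit(pooled, weight, bias)
print(z, sigmoid(z))  # 0.0 0.5
```

A logit of 0 maps to a probability score of 0.5. Positive logits give scores above 0.5, and negative logits give scores below it.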

In [None]:
%%time
#@title Step 3: Input the CDR1 and CDR3 sequences to predict the quenching efficiency (quenching probability score)

CDR = "A S G Y T F T D Y W M N W [SEP] Y C E S Q S G A Y W G" #@param {type:"string"}

sequences = CDR
labels = [0.]  # placeholder label; not used for prediction

# Wrap the input in a DataFrame and tokenize it
pred_seq = pd.DataFrame({"sequence": sequences, "label": labels})
encodings = tokenizer(list(pred_seq["sequence"]), truncation=True, padding=True, max_length=max_len, return_tensors='pt')

# Prediction: run on whichever device the model is on
model.eval()
model_device = next(model.parameters()).device
with torch.no_grad():
    test_inputs = {
        'input_ids': encodings['input_ids'].to(model_device),
        'attention_mask': encodings['attention_mask'].to(model_device),
    }
    outputs = model(**test_inputs)
# The sigmoid converts the logit into a probability score between 0 and 1
predictions = torch.sigmoid(outputs).cpu().numpy()

# Print the prediction score (probability score)

import sys
np.set_printoptions(threshold=sys.maxsize)
print('Quenching probability score =', predictions)

Quenching probability score = 0.70090234
CPU times: user 732 ms, sys: 1.25 ms, total: 733 ms
Wall time: 777 ms
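The reference above concerns single-mutation effects. To score many variants rather than one, the input strings can be generated first and then run through Step 3. The following is a minimal sketch that enumerates every single-residue substitution in an already formatted input. `single_mutants` is a hypothetical helper. It assumes that every residue, including the flanking framework residues, may be mutated to any of the 20 standard amino acids. Positions are counted from 1 within each segment, starting at the first framework residue.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: 20 canonical residues


def single_mutants(formatted: str):
    """Hypothetical helper: yield (label, variant) for every single substitution
    in a 'CDR1 [SEP] CDR3' input, skipping the [SEP] token itself."""
    tokens = formatted.split()
    if tokens.count("[SEP]") != 1:
        raise ValueError("expected exactly one [SEP] token")
    sep = tokens.index("[SEP]")
    for i, wt in enumerate(tokens):
        if i == sep:
            continue
        # 1-based position within its own segment (framework residues included)
        region, pos = ("CDR1", i + 1) if i < sep else ("CDR3", i - sep)
        for aa in AMINO_ACIDS:
            if aa != wt:
                variant = tokens[:i] + [aa] + tokens[i + 1:]
                yield f"{region}:{wt}{pos}{aa}", " ".join(variant)


seq = "A S G Y T F T D Y W M N W [SEP] Y C E S Q S G A Y W G"
mutants = list(single_mutants(seq))
print(len(mutants))    # 456
print(mutants[0][0])   # CDR1:A1C
```

The example input has 24 residues × 19 substitutions = 456 variants. The tokenizer call in Step 3 already accepts a list of strings, so the variant strings could be tokenized together (for example, `tokenizer([v for _, v in mutants], padding=True, truncation=True, max_length=max_len, return_tensors='pt')`) and scored in batches. Smaller batches keep memory use manageable on a CPU runtime.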
