# TOPT prediction with Prime

This tutorial demonstrates how to predict the TOPT of a protein from its sequence using a pretrained Prime model.

We provide:

- The sequences, as a FASTA file (`example.fasta`).

Goals:

- Obtain a predicted TOPT for each sequence.


## Import the necessary libraries and modules

In [11]:
from transformers import AutoTokenizer, AutoModel
import torch
import pandas as pd
from Bio import SeqIO
from tqdm.notebook import tqdm

## Prepare data path

In [12]:
sequence_file = "example.fasta"
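
Before loading the model, it can help to confirm that the FASTA input has the expected shape: a `>` header line per record, followed by one or more sequence lines. The sketch below is a plain-Python stand-in for that check, not the `SeqIO` parser used later in this tutorial. The helper `parse_fasta` and the example records are hypothetical and only illustrate the format.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (record_id, sequence) pairs.

    Hypothetical helper for illustration; the tutorial itself uses Bio.SeqIO.
    """
    records = []
    current_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if current_id is not None:
                records.append((current_id, "".join(chunks)))
            # The record ID is the first whitespace-delimited token after ">"
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        elif current_id is not None:
            # Sequence lines may be wrapped, so collect and join them
            chunks.append(line)
    if current_id is not None:
        records.append((current_id, "".join(chunks)))
    return records


# Made-up example input, not taken from example.fasta
example_text = """>seq1 hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>seq2
MGSSHHHHHH
"""

records = parse_fasta(example_text)
for record_id, sequence in records:
    print(record_id, len(sequence))
```

To inspect the real input, the contents of `sequence_file` could be read with `open(sequence_file).read()` and passed to the same function.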

## Load model and tokenizer

In [15]:
model_path = "AI4Protein/Prime_690M"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.eval()
model = model.to(device)

## Prediction

In [17]:
togt = []  # one predicted value per sequence, in FASTA order
with torch.no_grad():  # inference only, so skip gradient tracking
    for record in tqdm(list(SeqIO.parse(sequence_file, "fasta"))):
        sequence = str(record.seq)
        tokenized_results = tokenizer(sequence, return_tensors="pt")
        input_ids = tokenized_results.input_ids.to(device)
        attention_mask = tokenized_results.attention_mask.to(device)
        # The model returns its prediction in the `predicted_values` field
        predicted_value = model(input_ids=input_ids, attention_mask=attention_mask).predicted_values
        togt.append(predicted_value.item())


In [18]:
togt

[29.42494010925293,
 30.343338012695312,
 25.358503341674805,
 28.9854736328125,
 25.786643981933594,
 25.800437927246094,
 25.850440979003906,
 28.526737213134766,
 30.309772491455078,
 22.676509857177734,
 23.928354263305664,
 25.09626007080078,
 21.36566162109375,
 27.3638973236084]
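
The list above carries no sequence identifiers, so a natural next step is to pair each prediction with its record ID and save the result. A minimal sketch using only the standard library follows. The helper `save_predictions`, the column names, and the output filename are hypothetical, and the IDs and values shown are placeholders rather than real results.

```python
import csv


def save_predictions(ids, values, path):
    """Write (id, predicted value) pairs to a CSV file and return the rows.

    Hypothetical helper; the column names are illustrative.
    """
    if len(ids) != len(values):
        raise ValueError("ids and values must have the same length")
    # Round for readability; keep the raw values if full precision matters
    rows = [{"id": i, "predicted_topt": round(v, 2)} for i, v in zip(ids, values)]
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["id", "predicted_topt"])
        writer.writeheader()
        writer.writerows(rows)
    return rows


# Placeholder inputs for illustration only
example_ids = ["seq1", "seq2"]
example_values = [29.5, 30.25]
rows = save_predictions(example_ids, example_values, "predicted_topt.csv")
```

In this tutorial, the IDs could be collected with `[record.id for record in SeqIO.parse(sequence_file, "fasta")]` and passed together with `togt`. Since pandas is already imported, `pd.DataFrame(rows).to_csv("predicted_topt.csv", index=False)` would be an equivalent way to write the file.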